What are Data Flow Jobs?

Like Batch Jobs, Data Flow Jobs apply a Policy to concrete input data. Instead of producing output files on HDFS, however, they produce a flow of output data. Input and output flows are managed by the external platform that executes the Job. Both the input and output data can consist of a continuous stream of records or a series of batches.
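As a rough illustration of these two input shapes, the sketch below contrasts a per-record handler with a batch handler. The `Row` type and handler names are hypothetical and are not part of any Privitar interface:

```java
import java.util.List;

public class FlowShapes {
    // Hypothetical record type, invented for this example.
    record Row(String id, String email) {}

    // Shape 1: a continuous stream, handled one record at a time.
    static void onRecord(Row row) {
        System.out.println("processing record " + row.id());
    }

    // Shape 2: a series of batches, each a bounded list of records.
    static void onBatch(List<Row> batch) {
        System.out.println("processing batch of " + batch.size() + " records");
        batch.forEach(FlowShapes::onRecord);
    }

    public static void main(String[] args) {
        onRecord(new Row("1", "alice@example.com"));
        onBatch(List.of(new Row("2", "bob@example.com"),
                        new Row("3", "carol@example.com")));
    }
}
```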

Input and output data flows are connected directly to the Privitar Data Flow plug-ins in the executing platform. The output data is published to the Protected Data Domain (PDD) selected in the Data Flow Job configuration.

Pipelines using Data Flow Jobs that write to the same destination PDD produce consistent data, provided that the Preserve data consistency option is enabled.
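To see why a shared destination PDD keeps output consistent, imagine tokenization that is deterministic in a per-PDD secret: the same input value then always maps to the same token, no matter which pipeline produced it. This is only a conceptual sketch, not Privitar's actual tokenization scheme; the secret, token format, and truncation are invented for illustration:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ConsistentTokenSketch {
    // Illustrative only: the token depends solely on the PDD's secret and the
    // input value, so every pipeline writing to that PDD agrees on the token.
    static String tokenize(byte[] pddSecret, String value) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(pddSecret, "HmacSHA256"));
        byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
        // Shorten for readability; a real scheme would define its own format.
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(digest).substring(0, 16);
    }

    public static void main(String[] args) throws Exception {
        byte[] marketingPdd = "secret-for-marketing-pdd".getBytes(StandardCharsets.UTF_8);
        // Two independent pipelines sharing the same destination PDD:
        System.out.println(tokenize(marketingPdd, "alice@example.com"));
        System.out.println(tokenize(marketingPdd, "alice@example.com")); // identical token
    }
}
```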

Once created, a Data Flow Job is referenced from the external pipeline by its ID. 
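For example, a streaming pipeline might wrap the plug-in in a client constructed with that Job ID. The sketch below uses Kafka Streams; `PrivitarDataFlowClient` and its `applyPolicy` method are hypothetical stand-ins, not the real plug-in interface:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class DataFlowPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "privitar-data-flow-pipeline");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // The Data Flow Job is referenced only by the ID it was given at creation.
        PrivitarDataFlowClient privitar = new PrivitarDataFlowClient("df-job-1234");

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("raw-records");
        raw.mapValues(privitar::applyPolicy) // de-identify each record per the Job's Policy
           .to("protected-records");         // the output flow lands in the Job's destination PDD

        new KafkaStreams(builder.build(), props).start();
    }
}

// Hypothetical stand-in for the Privitar Data Flow plug-in API (not the real interface).
class PrivitarDataFlowClient {
    private final String jobId;

    PrivitarDataFlowClient(String jobId) {
        this.jobId = jobId;
    }

    String applyPolicy(String record) {
        // A real plug-in would apply the Job's Policy here; this stub only tags the record.
        return "[" + jobId + "] " + record;
    }
}
```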

Configuration such as bad record handling is delegated to the Data Flow execution platform.
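Privitar does not define that behaviour itself. As a minimal sketch, the executing platform might validate records and divert malformed ones to a dead-letter channel before they reach the Job; the record format and validation rule below are assumed for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class BadRecordRouting {
    public static void main(String[] args) {
        List<String> input = List.of("42,alice", "43,bob", "not-a-record");
        List<String> forwarded = new ArrayList<>();
        List<String> deadLetter = new ArrayList<>();

        for (String record : input) {
            String[] fields = record.split(",", -1);
            if (fields.length == 2 && fields[0].matches("\\d+")) {
                forwarded.add(record);   // well-formed: pass on to the Data Flow Job
            } else {
                deadLetter.add(record);  // malformed: quarantine per the platform's policy
            }
        }

        System.out.println("forwarded: " + forwarded);
        System.out.println("dead-lettered: " + deadLetter);
    }
}
```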