What is a Job?
A Privitar Job defines the application of a Policy to some data, with the goal of publishing the output into a Protected Data Domain (PDD). A Job represents the concrete execution of the data transformations defined in the Policy.
PDDs are selected when creating or running Jobs to control the consistency of the output data and, optionally, to associate the processed data with a Watermark.
There are three types of Jobs:
Batch Jobs apply a Policy to datasets as batch operations (a sketch of this pattern follows the list below). These Jobs are executed:
on Hadoop clusters, reading and writing data at rest in HDFS, Hive or cloud-based blob storage services, such as Amazon S3, Azure Blob Storage or Google Cloud Storage, or
on AWS Glue ETL, a serverless data processing service in AWS, reading and writing data on Amazon S3.
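Conceptually, a Batch Job is a distributed read-transform-write pass over data at rest. The PySpark sketch below illustrates that pattern only; it is not Privitar's implementation, and the paths, column name and hashing rule are hypothetical stand-ins for the transformations a Policy would define.

    # Illustrative sketch only; not Privitar's engine. Paths, columns and
    # the hashing rule are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import sha2, col

    spark = SparkSession.builder.appName("batch-deidentify-sketch").getOrCreate()

    # Read the raw dataset from HDFS (an S3, Azure or GCS URI works the same way).
    raw = spark.read.parquet("hdfs:///data/raw/customers")

    # Apply a rule per column, e.g. hash a direct identifier.
    protected = raw.withColumn("email", sha2(col("email"), 256))

    # Write the protected output back to storage.
    protected.write.mode("overwrite").parquet("hdfs:///data/protected/customers")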
Data Flow Jobs apply a Policy to flows of streaming data or to batches of data. These Jobs are executed on external data flow platforms. Input and output data flows are configured in the external data flow platform.
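In contrast to a batch pass, a Data Flow Job processes records as they arrive. The minimal kafka-python loop below sketches that record-at-a-time shape under stated assumptions: the real input and output flows are configured in the external platform, and the topic names and masking function here are hypothetical.

    # Illustrative sketch only; real input and output flows are configured in
    # the external data flow platform. Topics and the rule are hypothetical.
    import json

    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer("customers-raw", bootstrap_servers="localhost:9092")
    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def mask(record):
        # Stand-in for the transformations a Policy would define.
        record["email"] = "REDACTED"
        return record

    # Apply the rule to each record as it flows through.
    for message in consumer:
        record = json.loads(message.value)
        producer.send("customers-protected", json.dumps(mask(record)).encode("utf-8"))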
Privitar On Demand Jobs apply a Policy to batches of data received by a Privitar On Demand server over an HTTP(S) API. The result of applying the Policy is returned to the caller of the API.
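A Privitar On Demand Job is a synchronous request/response exchange, so calling it resembles any other HTTPS POST. The Python sketch below shows that pattern only; the endpoint path, payload shape and response format are hypothetical, not the documented Privitar On Demand API.

    # Illustrative sketch only; the URL, payload and response shape are
    # hypothetical, not the documented Privitar On Demand API.
    import requests

    batch = {"records": [{"name": "Alice Example", "email": "alice@example.com"}]}

    response = requests.post(
        "https://pod.example.com/api/deidentify",  # hypothetical POD endpoint
        json=batch,
        timeout=30,
    )
    response.raise_for_status()

    # The Policy-applied records are returned directly to the caller.
    protected = response.json()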