Advanced settings (HDFS Batch Jobs)

The Advanced tab contains additional settings for configuring HDFS Batch Jobs.

Overwrite Behavior

Use the Overwrite Behavior list box to define how the Job behaves if it encounters existing files during execution.

The following table defines the options available:

Option	Description
Always overwrite existing data	Existing files encountered during Job execution will be silently overwritten by Privitar.
Job fails if data already exists	Existing files encountered during Job execution will cause the Job to fail. This option may be useful when Privitar is controlled via the Automation API, where overwriting existing data is not expected and should be flagged as a production issue.
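The two options amount to a check made before any output is written. The following Python sketch models that decision for illustration only; it is not Privitar's implementation, and the names `OverwriteBehavior`, `OutputExistsError`, and `check_existing` are hypothetical. For simplicity, the Job's existing output is represented as a plain list of file names rather than an HDFS directory listing, and the file names in the usage lines are made up.

```python
from enum import Enum


class OverwriteBehavior(Enum):
    """Hypothetical model of the two Overwrite Behavior options."""
    ALWAYS_OVERWRITE = "Always overwrite existing data"
    FAIL_IF_EXISTS = "Job fails if data already exists"


class OutputExistsError(RuntimeError):
    """Raised when existing data is found and overwriting is not allowed."""


def check_existing(existing_files: list[str],
                   behavior: OverwriteBehavior) -> list[str]:
    """Return the files that the Job would overwrite.

    Raises OutputExistsError instead if any files exist and the
    behavior is FAIL_IF_EXISTS.
    """
    if existing_files and behavior is OverwriteBehavior.FAIL_IF_EXISTS:
        raise OutputExistsError(
            f"{len(existing_files)} existing file(s) found; failing the Job"
        )
    # Either nothing exists yet, or ALWAYS_OVERWRITE is set and the
    # existing files are replaced without any warning.
    return list(existing_files)


# Usage: the same (hypothetical) existing output under each option.
existing = ["part-00000", "part-00001"]
print(check_existing(existing, OverwriteBehavior.ALWAYS_OVERWRITE))
try:
    check_existing(existing, OverwriteBehavior.FAIL_IF_EXISTS)
except OutputExistsError as err:
    print(f"Job failed: {err}")
```

Failing loudly, rather than silently replacing data, is what makes the second option suited to automated runs: an unexpected collision surfaces as an error instead of lost data.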

Handling of Bad Records

Use the Bad Record Handling dropdown menu to define when the Job is deemed to have failed if it receives bad records, that is, records that cannot be processed.

The following table defines the options available:

Option	Description
Fail when all records are bad	The Job is only deemed to have failed when no records in the data input could be successfully processed.
Fail when bad record percentage exceeds threshold	The Job is deemed to have failed when the percentage of records that could not be processed is greater than the specified threshold.
Fail when any record is bad	The Job is deemed to have failed when one or more records could not be processed.
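The three options differ only in how the number of bad records is compared with the total number of input records. The following Python sketch expresses that comparison for illustration; it is not Privitar's implementation. The names `BadRecordPolicy` and `job_failed` are hypothetical, the threshold is assumed to be a percentage between 0 and 100, and treating an empty input as not failed is an assumption that the table above does not address.

```python
from enum import Enum
from typing import Optional


class BadRecordPolicy(Enum):
    """Hypothetical model of the three Bad Record Handling options."""
    FAIL_WHEN_ALL_BAD = "Fail when all records are bad"
    FAIL_ABOVE_THRESHOLD = "Fail when bad record percentage exceeds threshold"
    FAIL_WHEN_ANY_BAD = "Fail when any record is bad"


def job_failed(total: int, bad: int, policy: BadRecordPolicy,
               threshold_pct: Optional[float] = None) -> bool:
    """Return True if the Job is deemed to have failed under `policy`."""
    if not 0 <= bad <= total:
        raise ValueError("expected 0 <= bad <= total")
    if total == 0:
        # Assumption: an empty input is not treated as a failure.
        return False
    if policy is BadRecordPolicy.FAIL_WHEN_ANY_BAD:
        return bad > 0
    if policy is BadRecordPolicy.FAIL_WHEN_ALL_BAD:
        # Fails only when no record could be processed successfully.
        return bad == total
    # FAIL_ABOVE_THRESHOLD: fails when the bad-record percentage is
    # strictly greater than the configured threshold.
    if threshold_pct is None:
        raise ValueError("threshold_pct is required for FAIL_ABOVE_THRESHOLD")
    return 100.0 * bad / total > threshold_pct


# Usage: 1,000 input records, 51 of which could not be processed (5.1%).
for policy in BadRecordPolicy:
    print(policy.value, "->", job_failed(1000, 51, policy, threshold_pct=5.0))
```

With a 5% threshold, 50 bad records out of 1,000 (exactly 5%) would not fail the Job, while 51 would, because the threshold must be exceeded rather than merely reached.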

Note

If data could not be processed, details of the failure are displayed in the Details column of the View Job window. To open this window, click the name of the Job on the Jobs page.

Spark Parameters

The Spark settings define the parameters used when submitting the Spark jobs that perform anonymization processing on the Privitar platform. The optimal values depend on the specification of the Hadoop cluster and the characteristics of the input data.

Typically, these settings are specified centrally in the Environment selected for the Job. For more information, see Hadoop Cluster Environment Configuration. Privitar also supports overriding these settings for an individual Job.
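Per-Job overrides can be thought of as a layer on top of the Environment settings: a value set on the Job takes precedence, and any parameter not set on the Job falls back to the Environment. The Python sketch below illustrates that assumed precedence; it is not Privitar's implementation. The function name `effective_spark_settings` is hypothetical, the property names are standard Apache Spark configuration keys chosen only as examples (this page does not list which parameters Privitar exposes), and the values are illustrative. The final lines render the result as `spark-submit` `--conf` arguments purely to show how such settings are commonly passed to Spark; how Privitar submits its jobs is not described here.

```python
from typing import Mapping


def effective_spark_settings(environment: Mapping[str, str],
                             job_overrides: Mapping[str, str]) -> dict[str, str]:
    """Merge Environment-level Spark settings with per-Job overrides.

    Any key set on the Job replaces the Environment value; all other
    keys fall back to the Environment. Neither input is modified.
    """
    merged = dict(environment)
    merged.update(job_overrides)
    return merged


# Illustrative values only; the keys are standard Apache Spark properties.
environment_defaults = {
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
    "spark.driver.memory": "2g",
}
# Hypothetical override for a Job with unusually large input data.
job_overrides = {"spark.executor.memory": "8g"}

settings = effective_spark_settings(environment_defaults, job_overrides)

# Rendered as spark-submit style "--conf key=value" pairs (illustration only).
conf_args = [arg for key, value in sorted(settings.items())
             for arg in ("--conf", f"{key}={value}")]
print(" ".join(conf_args))
```

Keeping the Environment as the single source of defaults means a per-Job override only needs to list the parameters that actually differ for that Job.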
