The Advanced tab contains additional settings for configuring Hive Batch Jobs.
Use the Overwrite Behavior list box to define the Job's behavior when existing tables are encountered during execution.
This setting is useful when the Job must produce a strict, known output. For example, it may be necessary to guarantee that the table contains only the results of the current Job run and never the results of previous runs; in that case, use the Truncate option.
The following table describes the available options:
| Option | Description |
|---|---|
| Always insert new rows into existing tables | If the Hive table exists, insert into it, merging new records with existing data. |
| Fail if the tables already exist | If the Hive table exists, fail the Job. The table must not exist when the Job runs. |
| Always truncate existing tables before writing | If the Hive table exists, erase (truncate) it before writing any new records. This ensures that the table contains only the content of the most recent Job run. |
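These three behaviors mirror the standard write modes of Spark's DataFrame writer. The following is a minimal PySpark sketch of the equivalent semantics, not Privitar's implementation; the table name `output_table` is hypothetical, and note that Spark's `overwrite` mode replaces the table contents entirely, which approximates the truncate behavior:

```python
from pyspark.sql import SparkSession

# Hive support is needed so that saveAsTable writes to a Hive-managed table.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.createDataFrame([(1, "alice")], ["id", "name"])

# "Always insert new rows into existing tables": merge with existing data.
df.write.mode("append").saveAsTable("output_table")

# "Fail if the tables already exist": error out if the table exists.
df.write.mode("errorifexists").saveAsTable("output_table")

# "Always truncate existing tables before writing": replace all contents.
df.write.mode("overwrite").saveAsTable("output_table")
```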
Use the Bad Record Handling list box to define the Job's behavior when bad records are encountered during execution.
The following table describes the available options:
| Option | Description |
|---|---|
| Fail when all records are bad | The Job is deemed to have failed only when no records in the input data could be successfully processed. |
| Fail when bad record percentage exceeds threshold | The Job is deemed to have failed when the percentage of records that could not be processed exceeds the specified threshold. |
| Fail when any record is bad | The Job is deemed to have failed when one or more records could not be processed. |
Note
If any records could not be processed, the details of the failure are displayed in the Details column of the View Job window. To open this window, click the name of the Job on the Jobs page.
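As an illustration of how the three policies differ, the sketch below expresses each failure condition as a simple predicate over the record counts of a run. The function and policy names are hypothetical, chosen only to mirror the options above:

```python
def job_failed(total: int, bad: int, policy: str, threshold_pct: float = 0.0) -> bool:
    """Return True if a run with `bad` unprocessable records out of `total`
    fails under the given Bad Record Handling policy (illustrative only)."""
    if policy == "fail_when_all_bad":
        # Fails only when every input record was bad.
        return total > 0 and bad == total
    if policy == "fail_when_threshold_exceeded":
        # Fails when the bad-record percentage exceeds the threshold.
        return total > 0 and (bad / total) * 100 > threshold_pct
    if policy == "fail_when_any_bad":
        # Fails as soon as a single record is bad.
        return bad > 0
    raise ValueError(f"unknown policy: {policy}")
```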
The Spark settings determine the parameters used when submitting the Spark jobs that perform anonymization processing on the Privitar platform. The best values for these parameters depend on the specification of the Hadoop cluster and the characteristics of the input data.
Typically, these settings are specified centrally on the Job's selected Environment; for more information, see Hadoop Cluster Environment Configuration. Privitar also supports overriding these settings on a per-Job basis.
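For context, the parameters in question are standard Spark resource and parallelism settings. Below is a minimal sketch of the kind of configuration involved, using standard Spark property keys; the application name and the values shown are placeholders, not recommendations:

```python
from pyspark.sql import SparkSession

# Resource settings are typically tuned to the cluster's capacity
# and the size of the input data; the values below are placeholders.
spark = (
    SparkSession.builder
    .appName("anonymization-job")                    # hypothetical job name
    .config("spark.executor.memory", "8g")           # memory per executor
    .config("spark.executor.cores", "4")             # CPU cores per executor
    .config("spark.executor.instances", "10")        # number of executors
    .config("spark.sql.shuffle.partitions", "200")   # shuffle parallelism
    .getOrCreate()
)
```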