
Setting Diagnostic Options for Jobs (HDFS & Hive Batch Jobs)

The Advanced tab on the Run Job window contains options for generating diagnostic information about the execution of a Job.

To enable the options, select the Specify Diagnostic Options checkbox. The window then displays the available options.

The options are split into three sections:

  • Spark

  • Logging

  • Metrics

Note

The Advanced tab is only displayed in the Run Job window if the Privitar application.properties file has been configured to provide advanced diagnostic output for Jobs.

Spark

This section contains options that control how Spark executes the Job.

Maximum Partition Size

The Maximum Partition Size controls how much of an input file a Spark Executor processes in memory at once, and can be used to limit memory usage in active Spark Executors. The setting is optional and should be set only when needed.

As an approximation, this value should be set according to the formula:

Maximum Partition Size < Executor Memory / (2 × Executor Cores)
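
For example, on a hypothetical cluster where each Spark Executor is allocated 8 GB of memory and 4 cores, the Maximum Partition Size should be kept below 8 GB / (2 × 4) = 1 GB.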

Write Spark Events File

Select this checkbox to write all events generated by Spark during processing to an events file.
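
Spark writes the events file as newline-delimited JSON, one event per line, so it can be inspected with standard tools. The following Python sketch tallies the event types in such a file; the file name application_events.log is a placeholder for wherever your Job wrote its events.

    import json
    from collections import Counter

    # Tally event types in a Spark events file (one JSON object per line).
    # "application_events.log" is a placeholder for the file your Job wrote.
    counts = Counter()
    with open("application_events.log") as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("Event", "unknown")] += 1

    for name, count in counts.most_common():
        print(f"{count:6d}  {name}")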

Log Garbage Collection Details

Select this checkbox to write details of the garbage collection performed during Spark processing to the log file.
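
Enabling this option is broadly equivalent to passing the JVM's standard GC-logging flags yourself; the exact flags Privitar passes are an implementation detail, but for reference the conventional flags are:

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps   (Java 8 and earlier)
    -Xlog:gc*                                                 (Java 9 and later)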

Spark Driver and Executor Options

Enter additional JVM options to pass to the Spark driver in the spark.driver.extraJavaOptions edit field.

Enter additional JVM options to pass to each Spark Executor in the spark.executor.extraJavaOptions edit field.
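
For example, to capture a heap dump whenever an Executor exhausts its heap, you might enter something like the following in the spark.executor.extraJavaOptions field (the dump path is illustrative; use a directory that exists on your worker nodes):

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp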

Logging

Any additional log4j logging properties to pass to Spark may be specified in this section.
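
For example, to raise the log level for Spark's own classes, you could supply properties such as the following (log4j 1.x syntax, which Spark releases prior to 3.3 use; the logger names are standard Spark packages):

    log4j.logger.org.apache.spark=DEBUG
    log4j.logger.org.apache.spark.storage=TRACE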

Metrics

Any additional metrics properties may be specified in this section for the Spark job.
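
For example, Spark's metrics system can write periodic CSV snapshots when given properties such as the following (standard Spark metrics.properties syntax; the output directory is an example path):

    *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
    *.sink.csv.period=10
    *.sink.csv.unit=seconds
    *.sink.csv.directory=/tmp/spark-metrics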