Setting Diagnostic Options for Jobs (HDFS & Hive Batch Jobs)
The Advanced tab on the Run Job window contains options for generating diagnostic information about a Job's execution.
To enable the options, select the Specify Diagnostic Options checkbox. The window then displays the available options.
The options are split into three sections:
Spark
Logging
Metrics
Note
The Advanced tab is only displayed in the Run Job window if the Privitar application.properties file has been configured to provide advanced diagnostic output for Jobs.
Spark
This section contains options that control how Spark executes the Job.
Maximum Partition Size
The Maximum Partition Size controls how much of an input file a Spark Executor processes in memory at once, and can be used to limit memory usage in active Spark Executors. This setting is optional and should be set only if needed.
As an approximation, this value should be set according to the formula:
Maximum Partition Size < Executor Memory / (2 × Executor Cores)
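The formula above can be checked with a small sketch. The executor memory and core counts used here are illustrative values, not product defaults:

```python
def max_partition_size_bound(executor_memory_mb: int, executor_cores: int) -> float:
    """Upper bound (in MB) for Maximum Partition Size, per the rule:
    Maximum Partition Size < Executor Memory / (2 x Executor Cores)."""
    return executor_memory_mb / (2 * executor_cores)

# Example: an executor with 8 GB (8192 MB) of memory and 4 cores.
bound_mb = max_partition_size_bound(8192, 4)
print(f"Set Maximum Partition Size below {bound_mb:.0f} MB")  # below 1024 MB
```

In this example, the Maximum Partition Size should be set below 1024 MB (1 GB).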
Write Spark Events File
Select this checkbox to write all Spark events generated during processing to an events file.
Log Garbage Collection Details
Select this checkbox to write details of garbage collection performed during Spark processing to the log file.
Spark Driver and Executor Options
Enter additional JVM options to pass to the Spark driver in the spark.driver.extraJavaOptions field.
Enter additional JVM options to pass to the Spark Executor in the spark.executor.extraJavaOptions field.
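For illustration, a value such as the following could be entered in either field. These particular flags (enabling the G1 garbage collector with a pause-time target) are standard HotSpot JVM options shown only as an example, not a recommendation for any specific workload:

```
-XX:+UseG1GC -XX:MaxGCPauseMillis=200
```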
Logging
Specify any additional log4j logging properties to pass to Spark in this section.
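As an example, properties such as the following could be supplied to adjust log levels for specific packages. The exact property names depend on the log4j version bundled with the Spark distribution in use:

```
# Raise Spark's own logging to DEBUG (example only)
log4j.logger.org.apache.spark=DEBUG
```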
Metrics
Specify any additional metrics properties for the Spark job in this section.
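For illustration, properties in Spark's metrics.properties format could be supplied here, such as the following, which routes metrics from all instances to Spark's built-in CSV sink. The sink choice and output directory are example values only:

```
# Send metrics from all instances to a CSV sink (example only)
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=10
*.sink.csv.directory=/tmp/spark-metrics
```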