StreamSets Reference Guide

Configuration options

This section describes the configuration options for the Privacy Platform data processor. Most options are set to sensible defaults, so if you are unsure about a particular setting, keep the default value.

Some things to note about configuring the processor:

  • You can’t change the configuration of the processor while the pipeline is running. Stop the pipeline before you edit the configuration.

  • For convenience, some of the Configuration tabs contain an option to switch to Bulk Edit Mode. This enables you to enter the configuration options for that category in JSON format.
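As a minimal sketch, the following Python snippet checks that text is well-formed JSON before it is pasted into Bulk Edit Mode. It assumes, for illustration only, that the entry is a JSON array of strings. The helper name parse_bulk_edit and the sample expressions are hypothetical, and the exact structure expected depends on the tab being edited.

```python
import json


def parse_bulk_edit(text):
    """Parse bulk-edit text and confirm it is a JSON array of strings.

    Assumption: the category being edited accepts a list of strings.
    """
    value = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a JSON array of strings")
    return value


# Hypothetical entries, shown only to illustrate the shape of the input.
bulk_text = """
[
  "${record:exists('/customer_id')}",
  "${record:value('/amount') > 0}"
]
"""

preconditions = parse_bulk_edit(bulk_text)
print(len(preconditions), "entries parsed")
```

Validating the text first avoids losing a long entry to a single misplaced comma or quote.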

General

Attribute

Description

Default setting / Options available

Name

Name of the data processor.

Default setting (Apply Privitar Policy 1).

Description

A description of the use of the processor.

Default setting (Empty).

Required Fields

The fields in the Schema used by the Data Flow job that must contain data in order for the job to be processed.

Default setting (No fields are required.)

To select fields from the Schema, choose ‘Select Fields Using Preview Data’ and select the fields from the list that is displayed.

Preconditions

Records that don’t satisfy the specified preconditions are sent to error.

Default setting (No preconditions set.)

If there are many preconditions to define, select ‘Switch to bulk edit mode’ to add multiple preconditions in a single entry. For more information on the types of preconditions that can be set, refer to the StreamSets Data Collector User Guide.

On Record Error

What action to take if a data processing error occurs.

Default setting (Send to Error.)

Other options are: Discard or Stop Pipeline.
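The interaction between Required Fields, Preconditions and On Record Error can be sketched in Python as follows. This is a conceptual model, not the processor's implementation: the function handle and its arguments are hypothetical, and it assumes that a record missing a required field is treated the same way as a record that fails a precondition.

```python
def handle(record, required_fields, preconditions, process,
           on_record_error="Send to Error"):
    """Return the outcome for one record: 'processed', 'error' or 'discarded'.

    'Stop Pipeline' is modelled by re-raising the processing exception.
    """
    # Assumption: an absent or empty required field sends the record to error.
    if any(record.get(field) in (None, "") for field in required_fields):
        return "error"
    # Records that don't satisfy every precondition are sent to error.
    if not all(check(record) for check in preconditions):
        return "error"
    try:
        process(record)
        return "processed"
    except Exception:
        if on_record_error == "Send to Error":
            return "error"
        if on_record_error == "Discard":
            return "discarded"
        if on_record_error == "Stop Pipeline":
            raise
        raise ValueError(f"unknown On Record Error option: {on_record_error}")


def always_fails(record):
    raise RuntimeError("simulated processing error")


print(handle({"id": 7}, ["id"], [lambda r: r["id"] > 5], lambda r: None))
print(handle({"id": 7}, ["id"], [], always_fails, "Discard"))
```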

Authentication

Attribute

Description

Default setting / Options available

Privitar Policy Manager URL

The HTTP address and port number of the Policy Manager that runs the Data Flow job used by the data pipeline.

If using basic authentication, this address would be:

http://<address>:8080/

For Mutual TLS authentication, this address would be:

https://<address>:8443/

where <address> is the IP address of the Policy Manager.

Authentication Method

The method used for authenticating with the Privitar Policy Manager.

Basic Authentication

Privitar username

Username of the API user.

Default setting (Empty)

The API user must have a Role with Run Data Flow permission for Masking Jobs or Unmasking Jobs, in the Team that the Job is defined in.

Privitar password

Password for the API user.

Default setting (Empty)

Mutual TLS Authentication

TLS Client Certificate File Path (from local file system)

Specifies the location of the certificate file used for authenticating with the Privitar Policy Manager.

Default setting (Empty)

The Common Name (CN) entry in the TLS certificates should resolve to an API user in the platform.

The API user must have a Role with Run Data Flow permission for Masking Jobs or Unmasking Jobs, in the Team that the Job is defined in.

For more information about creating API users in the platform, see Configuring users.

For more information about Mutual TLS authentication, see Pipeline Configuration in the StreamSets documentation.

TLS Client Certificate Password

The password for the TLS client certificate file.

TLS Trusted CA Certificate File Path (from local file system)

Specifies the location of the TLS CA certificate file used for authenticating with the Privitar Policy Manager.
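The relationship between the authentication method and the Policy Manager URL can be sketched in Python as follows. The scheme and port pairs come from the table above. The function name policy_manager_url, the grouping of attributes in RELEVANT_SETTINGS, and the address 10.0.0.5 are illustrative assumptions, and a given deployment may use different ports.

```python
# Scheme and port for each authentication method, as listed in the table above.
ENDPOINTS = {
    "Basic Authentication": ("http", 8080),
    "Mutual TLS Authentication": ("https", 8443),
}

# Attributes that apply to each method (grouping taken from the table above).
RELEVANT_SETTINGS = {
    "Basic Authentication": ["Privitar username", "Privitar password"],
    "Mutual TLS Authentication": [
        "TLS Client Certificate File Path",
        "TLS Client Certificate Password",
        "TLS Trusted CA Certificate File Path",
    ],
}


def policy_manager_url(address, method):
    """Build the Policy Manager URL for the given authentication method."""
    scheme, port = ENDPOINTS[method]
    return f"{scheme}://{address}:{port}/"


# 10.0.0.5 is a placeholder address used only for this example.
for method in ENDPOINTS:
    print(method, "->", policy_manager_url("10.0.0.5", method))
    print("  settings:", ", ".join(RELEVANT_SETTINGS[method]))
```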

Data Flow Job

Attribute

Description

Default setting / Options available

Job ID

The Job ID of the Data Flow Job configured in the Policy Manager. This ID can be retrieved from the Data Flow Job details page in the Policy Manager UI.

Default setting (Empty).

Advanced settings

Attribute

Description

Default setting / Options available

Max Cache size

The maximum size (in bytes) of the local cache that stores tokens before they are written to the Token Vault.

Default setting (512000000)

Max Batch size

Incoming records will be processed in batches no larger than this size.

Default setting (1000)

Concurrent Batches

The maximum number of batches that can be processed in parallel.

Default setting (20)

Job Cache Expiration (minutes)

The interval after which a Job cache entry that is not in use will be expired and closed.

Default setting (60)

Job Cache Refresh Frequency (minutes)

The frequency at which Job definitions are refreshed from the Policy Manager.

Default setting (10)

Token Vault Connection Cache Expiration (minutes)

The interval after which a Token Vault connection that is not in use will be expired and closed.

Default setting (30)

Token Vault Kerberos Keytab Path (from local file system)

Specifies the location of the Kerberos keytab used for connecting to an HBase token vault.

Default setting (Empty)

Advanced Settings

Advanced settings for debugging, tuning, monitoring, etc.

Default setting (Empty)
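To illustrate how Max Batch size and Concurrent Batches relate, the following Python sketch splits incoming records into batches of at most 1000 and processes up to 20 batches in parallel. It is a conceptual model only: the names batches and process_all are hypothetical, and the use of a thread pool is an assumption rather than a description of how the processor schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_BATCH_SIZE = 1000      # default for Max Batch size
CONCURRENT_BATCHES = 20    # default for Concurrent Batches


def batches(records, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of records, each no larger than size."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def process_all(records, process_batch,
                size=MAX_BATCH_SIZE, workers=CONCURRENT_BATCHES):
    """Process batches in parallel, with at most `workers` running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input batches.
        return list(pool.map(process_batch, batches(records, size)))


# 2500 records are split into batches of 1000, 1000 and 500.
sizes = process_all(list(range(2500)), len)
print(sizes)
```

With the default values, at most 20 x 1000 = 20000 records are in flight at any one time in this model.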