Configuration options
This section describes the configuration options for the Privacy Platform data processor. Many of the configuration options are set to sensible defaults, so if you are unsure about a particular setting, keep the default value.
Some things to note about configuring the processor:
You can’t change the configuration of the processor while the pipeline is running. Stop the pipeline before you configure the processor.
For convenience, some of the Configuration tabs contain an option to switch to Bulk Edit Mode. This enables you to enter the configuration options for that category in JSON format.
General
Attribute | Description | Default setting / Options available |
---|---|---|
Name | Name of the data processor. | Default setting (Apply Privitar Policy 1). |
Description | A description of the use of the processor. | Default setting (Empty). |
Required Fields | The fields in the Schema used by the Data Flow job that must contain data in order for the job to be processed. | Default setting (No fields are required.) To select fields from the Schema, choose ‘Select Fields Using Preview Data’ and select the fields from the list that is displayed. |
Preconditions | Records that don’t satisfy the specified preconditions are sent to error. | Default setting (No preconditions set.) If there are many preconditions to define, select ‘Switch to bulk edit mode’ to add multiple preconditions in a single entry. For more information on the types of preconditions that can be set, refer to the StreamSets Data Collector User Guide. |
On Record Error | What action to take if a data processing error occurs. | Default setting (Send to Error.) Other options are: Discard or Stop Pipeline. |
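
As a rough illustration of entering several preconditions at once, the sketch below builds a JSON array of condition strings. It assumes that Bulk Edit Mode for Preconditions accepts a JSON list of StreamSets expression-language conditions, and the field paths `/customer_id` and `/country` are hypothetical. Check the StreamSets Data Collector User Guide for the exact format before relying on it.

```python
import json

# Hypothetical field paths; replace with fields from your own Schema.
required_fields = ["/customer_id", "/country"]

# One condition per field: a record passes only if the field exists.
preconditions = [f"${{record:exists('{path}')}}" for path in required_fields]

# Text to paste into the bulk-edit box (assumed to take a JSON array).
bulk_edit_text = json.dumps(preconditions, indent=2)
print(bulk_edit_text)
```

Generating the list from code keeps the conditions consistent when many fields need the same check.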
Authentication
Attribute | Description | Default setting / Options available |
---|---|---|
Privitar Policy Manager URL | The HTTP address and port number of the Policy Manager that is used to run the Data Flow job used by the data pipeline | If using basic authentication, this address would be:
For Mutual TLS authentication, this address would be:
where |
Authentication Method | The method used for authenticating with the Privitar Policy Manager. | |
Basic Authentication | ||
Privitar username | Username of the API user. | Default setting (Empty) The API user must have a Role with Run Data Flow permission for Masking Jobs or Unmasking Jobs, in the Team that the Job is defined in. |
Privitar password | Password for the API user. | Default setting (Empty) |
Mutual TLS Authentication | ||
TLS Client Certificate File Path (from local file system) | Specifies the location of the certificate file used for authenticating with the Privitar Policy Manager. | Default setting (Empty) The Common Name (CN) entry in the TLS certificates should resolve to an API user in the platform. The API user must have a Role with Run Data Flow permission for Masking Jobs or Unmasking Jobs, in the Team that the Job is defined in. For more information about creating API users in the platform, see Configuring users. For more information about Mutual TLS authentication, see Pipeline Configuration in the StreamSets documentation. |
TLS Client Certificate Password | The password for the TLS client certificate file. | |
TLS Trusted CA Certificate File Path (from local file system) | Specifies the location of the TLS CA certificate file used for authenticating with the Privitar Policy Manager. |
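
Before starting the pipeline, it can help to confirm that the settings required by the chosen authentication method are filled in. The following is a minimal sketch of such a pre-flight check, not part of the processor itself. The method names (`basic`, `mutual_tls`) and the dictionary keys are hypothetical labels chosen for this example, not the processor's own configuration keys.

```python
from pathlib import Path


def missing_auth_settings(method, settings):
    """Return the names of settings that are empty or point to missing files."""
    problems = []
    if method == "basic":
        # Basic authentication needs a non-empty username and password.
        for key in ("username", "password"):
            if not settings.get(key):
                problems.append(key)
    elif method == "mutual_tls":
        # Mutual TLS needs both certificate files on the local file system.
        for key in ("client_cert_path", "trusted_ca_cert_path"):
            path = settings.get(key)
            if not path or not Path(path).is_file():
                problems.append(key)
    else:
        problems.append("method")
    return problems


print(missing_auth_settings("basic", {"username": "api_user", "password": ""}))
```

An empty list means every setting the check knows about is present. It does not verify that the credentials or certificates are accepted by the Policy Manager.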
Data Flow Job
Attribute | Description | Default setting / Options available |
---|---|---|
Job ID | The Job ID of the Data Flow Job configured in the Policy Manager. This ID can be retrieved from the Data Flow Job details page in the Policy Manager UI. | Default setting (Empty). |
Advanced settings
Attribute | Description | Default setting / Options available |
---|---|---|
Max Cache size | The maximum size (in bytes) of the local cache that is used to store tokens prior to being written to the Token Vault. | Default setting (512000000) |
Max Batch size | Incoming records will be processed in batches no larger than this size. | Default setting (1000) |
Concurrent Batches | The maximum number of batches that can be processed in parallel. | Default setting (20) |
Job Cache Expiration (minutes) | The interval after which a Job cache entry that is not in use will be expired and closed. | Default setting (60) |
Job Cache Refresh Frequency (minutes) | The frequency at which Job definitions are refreshed from the Policy Manager. | Default setting (10) |
Token Vault Connection Cache Expiration (minutes) | The interval after which a Token Vault connection that is not in use will be expired and closed. | Default setting (30) |
Token Vault Kerberos Keytab Path (from local file system) | Specifies the location of the Kerberos keytab used for connecting to an HBase token vault. | Default setting (Empty) |
Advanced Settings | Advanced settings for debugging, tuning, monitoring, etc. | Default setting (Empty) |
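
To get a feel for what the default values in the table above imply, the sketch below works through the arithmetic. It assumes that Max Batch size is counted in records, so that batch size multiplied by concurrent batches gives an upper bound on records being processed at one time. The function names are illustrative only.

```python
# Defaults taken from the Advanced settings table.
MAX_CACHE_BYTES = 512_000_000
MAX_BATCH_SIZE = 1000
CONCURRENT_BATCHES = 20


def max_in_flight_records(batch_size, concurrent_batches):
    """Upper bound on records being processed at once (assumes size is in records)."""
    return batch_size * concurrent_batches


def bytes_to_mib(n):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n / (1024 * 1024)


print(max_in_flight_records(MAX_BATCH_SIZE, CONCURRENT_BATCHES))  # 20000
print(round(bytes_to_mib(MAX_CACHE_BYTES), 1))  # 488.3
```

With the defaults, at most 20,000 records are in flight under this assumption, and the token cache limit of 512,000,000 bytes is roughly 488 MiB.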