StreamSets Reference Guide

Installation

This section describes how to install the Privitar StreamSets Data Processor.

Pre-requisites

The following software must already be installed:

  • StreamSets v3.16.0

  • Privitar Data Privacy Platform v3.8.0 (or later)

Installation procedure

  1. Install the plug-in.

  2. Install the necessary Token Vault drivers.

  3. Restart StreamSets.

  4. Confirm that the plug-in is available in StreamSets as a Processor.

Installing the Privitar data connector plug-in

The Privitar StreamSets Data Processor is provided as a tar file called:

privitar-data-flow-streamsets-<x.x.x>.tar

where <x.x.x> is the version of the platform. For example:

privitar-data-flow-streamsets-3.8.0.tar

To un-tar the file and install the plug-in (assuming you are using v3.8 of the platform):

  1. Copy the tar file into the StreamSets directory that is defined by the StreamSets environment variable:

    USER_LIBRARIES_DIR

    (You can find the value of this variable by using the env command from StreamSets.)

  2. Un-tar the file, using the command:

    tar -xvf privitar-data-flow-streamsets-3.8.0.tar

    This command creates a directory called:

    USER_LIBRARIES_DIR/privitar-data-flow-streamsets/

    The Privitar StreamSets Data Processor jar file is located in:

    privitar-data-flow-streamsets/lib/privitar-data-flow-streamsets-3.8.0.jar
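
The two steps above can be sketched as a small shell function. This is only an illustration, not a script shipped with the platform: the function name install_privitar_plugin and its two arguments are hypothetical, and it assumes the tar file is not already inside the target directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Privitar or StreamSets distribution).
# Usage: install_privitar_plugin <tar-file> <user-libraries-dir>
install_privitar_plugin() {
    tar_file="$1"   # for example privitar-data-flow-streamsets-3.8.0.tar
    libs_dir="$2"   # the directory that USER_LIBRARIES_DIR points to

    [ -f "$tar_file" ] || { echo "tar file not found: $tar_file" >&2; return 1; }
    [ -d "$libs_dir" ] || { echo "directory not found: $libs_dir" >&2; return 1; }

    # Step 1: copy the tar file into the user libraries directory.
    cp "$tar_file" "$libs_dir/" || return 1

    # Step 2: un-tar it inside that directory.
    tar -xf "$libs_dir/$(basename "$tar_file")" -C "$libs_dir" || return 1

    # List the jar files that the extraction produced.
    ls "$libs_dir"/privitar-data-flow-streamsets/lib/*.jar
}
```

Assuming USER_LIBRARIES_DIR is set in the current shell, the function could be called as:

    install_privitar_plugin privitar-data-flow-streamsets-3.8.0.tar "$USER_LIBRARIES_DIR"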

Installing the Token Vault driver

The Privitar plug-in requires drivers to connect to the Privitar Token Vault. Add these drivers to the same location as the jar file for the Privitar StreamSets Data Processor, that is:

USER_LIBRARIES_DIR/privitar-data-flow-streamsets/lib/

The drivers to include depend on the type of database that stores the Privitar Token Vault. For a StreamSets processing environment, the following types of Token Vault are supported:

  • Relational Database (JDBC), including PostgreSQL (v9.6 and later) and Oracle (11g, 12c and later)

  • HBase v2.2.x and later

For JDBC, you can use the drivers that are provided by PostgreSQL and Oracle.

For HBase, the platform provides a custom version of the HBase driver that can be used with the HBase Token Vault. The driver to use depends on the Hadoop vendor that the HBase Token Vault is running on:

Hadoop vendor            Privitar HBase driver jar name

Google Bigtable          privitar-hbase-bigtable-driver-[version].jar

Cloudera CDH6 - HBase    privitar-hbase-cdh6-driver-[version].jar

For more information on accessing the Privitar HBase drivers, contact support@privitar.com.
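
Placing a driver next to the plug-in jar, as described in this section, can be sketched as follows. The function name install_token_vault_driver is hypothetical, and the driver file name in the usage line is a placeholder for whichever JDBC or HBase driver jar you obtained.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Privitar or StreamSets distribution).
# Usage: install_token_vault_driver <driver-jar> <user-libraries-dir>
install_token_vault_driver() {
    driver_jar="$1"   # path to the JDBC or HBase driver jar
    libs_dir="$2"     # the directory that USER_LIBRARIES_DIR points to
    target="$libs_dir/privitar-data-flow-streamsets/lib"

    [ -f "$driver_jar" ] || { echo "driver jar not found: $driver_jar" >&2; return 1; }
    [ -d "$target" ] || { echo "plug-in lib directory not found: $target" >&2; return 1; }

    # Copy the driver next to the plug-in jar, then list the directory contents.
    cp "$driver_jar" "$target/" && ls "$target"
}
```

For example, with a placeholder driver file name:

    install_token_vault_driver /path/to/driver.jar "$USER_LIBRARIES_DIR"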

Restart StreamSets

To restart StreamSets:

  1. Select the Administration icon in the top-right corner of the StreamSets main page.

  2. Select Restart from the menu.

Confirm that the Privitar data connector is available

The Privitar StreamSets Data Processor should now be available from StreamSets in the Processors list box. For example:

[Image: processor-icons.png]

Two processing components are available:

  • Apply Privitar Policy - use this component to apply a Privitar Policy (de-identify data).

  • Apply Privitar Unmasking - use this component to unmask data (re-identify data).