Create a Connection to Apache Spark

You can use Apache Spark as a data source with the Privitar Data Security Platform.

To connect to Apache Spark, you must:

  1. Meet the Apache Spark Connection Prerequisites

  2. Build an Apache Spark Connection String

  3. Authenticate to Apache Spark

Meet the Apache Spark Connection Prerequisites

Note

Most of the settings for the Spark Thrift server are the same as those for HiveServer2. To learn more, see https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html

Before you connect to Apache Spark, you must:

  1. Have a system user that can authenticate to Apache Spark with a username and password and that has read access to the relevant databases and tables.

  2. Have access to the SSL certificate used to encrypt the connection (or the relevant certificate authority certificates).

If your data source uses Secure Sockets Layer (SSL) with privately signed server certificates, you must modify your data plane's truststore so that it trusts those certificates, as follows:

  1. Obtain the SSL certificate from the data source.

  2. Convert the SSL certificate to a Java KeyStore (JKS) truststore. (An example keytool command is shown after this list.)

  3. Copy the truststore into the shared/truststores/ location of your data plane configuration mounted volume (the volume used to store JDBC drivers).

    Note

    You will need to refer to this truststore when configuring the SSL JDBC properties. By default, the truststore is mounted on /config/shared/truststores/truststore.jks.

    The mounted volume's directory structure should look similar to the following:

    ├── shared/
    │   ├── jdbc-drivers/
    │   │   └── hive-42.2.23.jar
    │   └── truststores/
    │       └── truststore.jks
    ├── data-agent/
    │   └── EMPTY
    └── data-proxy/
        └── EMPTY
  4. Download the JDBC driver JAR that you will use to connect to the data source.

  5. Place the JDBC driver JAR in the shared/jdbc-drivers/ location of your data plane configuration mounted volume (the volume used to store JDBC drivers).
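
As an illustration of step 2, the following keytool command (keytool is part of the JDK) creates a JKS truststore from a PEM-encoded server certificate. The file name spark-server.pem and the alias spark-server are placeholders; the changeit password matches the default used in the examples below:

keytool -importcert -noprompt -alias spark-server -file spark-server.pem -keystore truststore.jks -storepass changeit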

For example, the SSL settings for Spark might look like the following:

jdbc:hive2://ip-172-31-26-172.eu-west-2.compute.internal:10000/default;ssl=true;sslTrustStore=/config/shared/truststores/truststore.jks;trustStorePassword=changeit
Build an Apache Spark Connection String

Note

Most of the settings for the Spark Thrift server are the same as those for HiveServer2. To learn more, see https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html

The following is an example of a complete Apache Spark connection string:

jdbc:hive2://localhost:10000/database1

Note

The Spark Thrift server uses the same JDBC driver as HiveServer2.

To build an Apache Spark connection string, use the following format, which has these segments:

jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>

If you configured SSL as described in the previous section, the SSL settings for Spark might look like the following:

jdbc:hive2://ip-172-31-26-172.eu-west-2.compute.internal:10000/default;ssl=true;sslTrustStore=/config/shared/truststores/truststore.jks;trustStorePassword=changeit

The following table includes a description of each segment.

Table 2. Apache Spark Connection String

String Segment   Description
host             The host name of the node running the Spark server. Required.
port             The port on which the Spark server listens. Required.
dbName           The name of the Hive database. Required.
sessionConfs     Key-value pairs for the JDBC driver, in the format <key1>=<value1>;<key2>=<value2>; Optional.
hiveConfs        Key-value pairs for Hive, in the format <key1>=<value1>;<key2>=<value2>; Optional.
hiveVars         Key-value pairs for Hive variables, in the format <key1>=<value1>;<key2>=<value2>; Optional.
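
For illustration, a connection string that uses the optional segments might look like the following. The hive.exec.dynamic.partition configuration key and the deptName variable are only examples; substitute the keys and variables that your queries actually need:

jdbc:hive2://localhost:10000/database1;ssl=true?hive.exec.dynamic.partition=true#deptName=finance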



Authenticate to Apache Spark

The Privitar Data Security Platform currently supports username/password authentication for Apache Spark.

Enter the system user's Apache Spark credentials in the Username and Password fields on the platform's Connections page.
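
As a minimal sketch of how a client authenticates (assuming the Hive JDBC driver JAR from the prerequisites is on the classpath; the host, database, user name, and password below are placeholders), the credentials are passed to the JDBC driver alongside the connection string:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftConnectionExample {
    public static void main(String[] args) throws Exception {
        // Connection string in the format described above (placeholder host and database).
        String url = "jdbc:hive2://localhost:10000/database1;ssl=true;"
                + "sslTrustStore=/config/shared/truststores/truststore.jks;"
                + "trustStorePassword=changeit";

        // The system user's Apache Spark credentials (placeholders).
        String username = "spark_system_user";
        String password = "example-password";

        // The Spark Thrift server uses the same JDBC driver as HiveServer2
        // (org.apache.hive.jdbc.HiveDriver); JDBC 4 loads it automatically
        // when the driver JAR is on the classpath.
        try (Connection conn = DriverManager.getConnection(url, username, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            // A trivial query to confirm that the connection and credentials work.
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}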