
Table of Contents

  1. Introducing Mass Ingestion
  2. Getting Started with Mass Ingestion
  3. Connectors and Connections
  4. Mass Ingestion Applications
  5. Mass Ingestion Databases
  6. Mass Ingestion Files
  7. Mass Ingestion Streaming
  8. Monitoring Mass Ingestion Jobs
  9. Asset Management
  10. Troubleshooting

Mass Ingestion

Guidelines for Databricks Delta targets

Consider the following guidelines when you use Databricks Delta targets:
  • When you use a Databricks Delta target for the first time, perform the following steps before you configure an application ingestion task for the target:
    1. Download the Databricks JDBC driver from the Databricks website.
    2. Copy the Databricks JDBC driver jar file, SparkJDBC42.jar, to the following directory: Secure_Agent_installation_directory/apps/Database_Ingestion/ext/ (see the driver-staging sketch after this list).
    3. On Windows, install the Visual C++ Redistributable Packages for Visual Studio 2013 on the computer where the Secure Agent runs.
  • For incremental load jobs, you must enable Change Data Capture (CDC) for all source fields.
  • You can access Databricks Delta tables created on top of the following storage types:
    • Microsoft Azure Data Lake Storage (ADLS) Gen2
    • Amazon Web Services (AWS) S3
    The Databricks Delta connection uses a JDBC URL to connect to the Databricks cluster. When you configure the target, specify the JDBC URL and credentials to use for connecting to the cluster (see the JDBC URL sketch after this list). Also define the connection information that the target uses to connect to the staging location in Amazon S3 or ADLS Gen2.
  • Before writing data to Databricks Delta target tables, application ingestion jobs stage the data in an Amazon S3 bucket or ADLS directory. You must specify the directory for the data when you configure the application ingestion task. Mass Ingestion Applications does not use the ADLS Staging Filesystem Name and S3 Staging Bucket properties in the Databricks Delta connection properties to determine the directory.
  • Mass Ingestion Applications uses jobs that run once to load data from the staging files on AWS S3 or ADLS Gen2 to external tables. By default, Mass Ingestion Applications runs the jobs on the cluster that is specified in the Databricks Delta connection properties. If you want to run the jobs on another cluster, set the dbDeltaUseExistingCluster custom property to false on the Target page in the application ingestion task wizard.
  • By default, Mass Ingestion Applications uses the Databricks Delta COPY INTO feature to load data from the staging file to Databricks Delta target tables (see the COPY INTO sketch after this list). You can disable this feature for all load types by setting the writerDatabricksUseSqlLoad custom property to false on the Target page in the application ingestion task wizard.
  • If you use an AWS cluster, you must specify the S3 Service Regional Endpoint value in the Databricks Delta connection properties.
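
The driver-staging step in the first guideline can be scripted. The following Python sketch is a minimal illustration, not part of the product: the agent root /opt/infaagent and the Downloads location are assumptions, and only the SparkJDBC42.jar file name and the apps/Database_Ingestion/ext/ destination come from this page.

    # Minimal sketch: stage the Databricks JDBC driver for the Secure Agent.
    # The agent root and download location below are assumptions; substitute
    # your own Secure_Agent_installation_directory.
    import shutil
    from pathlib import Path

    agent_root = Path("/opt/infaagent")                         # hypothetical install directory
    driver_jar = Path.home() / "Downloads" / "SparkJDBC42.jar"  # downloaded driver jar
    ext_dir = agent_root / "apps" / "Database_Ingestion" / "ext"

    ext_dir.mkdir(parents=True, exist_ok=True)                  # create ext/ if it is missing
    shutil.copy2(driver_jar, ext_dir / driver_jar.name)         # place the jar where the agent loads drivers
    print(f"Staged {driver_jar.name} in {ext_dir}")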
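
For the JDBC URL that the Databricks Delta connection uses, the sketch below assembles the URL shape commonly used with the SparkJDBC42 driver and token authentication. The host, HTTP path, and token are placeholders, and the URL layout is an assumption based on the Databricks driver documentation; copy the exact URL from your cluster's JDBC/ODBC configuration tab in Databricks rather than building it by hand.

    # Illustrative only: typical JDBC URL shape for the SparkJDBC42 driver
    # with token-based authentication. Every value here is a placeholder.
    host = "dbc-a1b2c3d4-e5f6.cloud.databricks.com"  # workspace host (placeholder)
    http_path = "sql/protocolv1/o/1234567890123456/0123-456789-abcde"  # cluster HTTP path (placeholder)
    token = "<personal-access-token>"

    jdbc_url = (
        f"jdbc:spark://{host}:443/default;"
        "transportMode=http;ssl=1;"
        f"httpPath={http_path};"
        f"AuthMech=3;UID=token;PWD={token}"
    )
    print(jdbc_url)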
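
The COPY INTO behavior described above can be pictured with a statement like the one below. Application ingestion jobs generate their own SQL internally; this hand-written Databricks SQL, with a hypothetical table name, staging path, and options, only illustrates what loading staged files into a Delta table with COPY INTO looks like.

    # Illustrative Databricks SQL showing the COPY INTO mechanism that the
    # writer uses by default. Table, staging path, and options are hypothetical.
    copy_into_sql = """
    COPY INTO sales_db.orders_target
    FROM 's3://my-staging-bucket/mass-ingestion/orders/'
    FILEFORMAT = PARQUET
    COPY_OPTIONS ('mergeSchema' = 'true')
    """
    print(copy_into_sql)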
