
Databricks Delta connection properties

When you set up a Databricks Delta connection, configure the connection properties.
You can configure the connection properties for both SQL warehouse and Databricks clusters.
The following table describes the Databricks Delta connection properties that are required to connect to Databricks Delta:
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Maximum length is 255 characters.
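For example, a name such as DBDelta_SQLWH_01 (a hypothetical value) satisfies these rules.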
Description
Description of the connection. Maximum length is 4000 characters.
Type
The Databricks Delta connection type.
Runtime Environment
Name of the runtime environment where you want to run the tasks.
You can specify a Secure Agent, Hosted Agent, or serverless runtime environment.
Hosted Agent is not applicable for mappings in advanced mode.
You cannot run an application ingestion, database ingestion, or streaming ingestion task on a Hosted Agent or serverless runtime environment.
Databricks Token
Required for SQL warehouse and Databricks clusters.
Personal access token to access Databricks.
Ensure that you have permissions to attach to the cluster identified in the Cluster ID property.
For mappings, you must have additional permissions to create Databricks clusters.
SQL Warehouse JDBC URL
Required for SQL warehouse.
Databricks SQL Warehouse JDBC connection URL.
To get the SQL Warehouse JDBC URL, go to the Databricks console and select JDBC driver version 2.6.22 or earlier from the JDBC URL menu.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
JDBC URLs for driver versions 2.6.25 or later, which begin with the prefix jdbc:databricks://, are not applicable to Data Integration tasks and mappings.
Application ingestion and database ingestion tasks can use JDBC URLs for driver versions 2.6.25 or later as well as 2.6.22 or earlier. URLs for versions 2.6.25 or later must begin with the prefix jdbc:databricks://, as follows:
jdbc:databricks://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
Ensure that you set the required environment variables in the Secure Agent.
If you configure the SQL Warehouse JDBC URL property, the Databricks Host, Organization ID, and Cluster ID properties are not considered.
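For example, a completed SQL warehouse URL might look like the following, where the host name dbc-a1b2c3d4-e5f6.cloud.databricks.com and the endpoint ID 1234567890abcdef are hypothetical placeholder values:
jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/1234567890abcdef;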
Cluster Environment
Required for SQL warehouse and Databricks clusters.
The cloud provider where the Databricks cluster is deployed.
Choose from the following options:
  • AWS
  • Azure
Default is AWS.
You cannot switch between clusters once you establish a connection. Databricks Delta does not support multi-level dependent connection attributes across clusters.
Cluster properties are required in the following scenarios:
  • When you create and run mappings to write data to Databricks Delta.
  • When you create and run mappings in advanced mode to read from and write to Databricks Delta.
The connection attributes depend on the cluster environment you select. For more information, see the AWS cluster properties and Azure cluster properties sections.
Databricks Host
Required for Databricks cluster.
The host name of the endpoint that the Databricks account belongs to.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
You can get the URL from the Advanced Options of JDBC or ODBC in the Databricks Delta analytics cluster or all-purpose cluster.
The value of PWD in the JDBC URL for Databricks Host, Organization ID, and Cluster ID is always <personal-access-token>.
Cluster ID
Required for Databricks cluster.
The ID of the Databricks analytics cluster.
You can get the cluster ID from the JDBC URL.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
Organization ID
Required for Databricks cluster.
The unique organization ID for the workspace in Databricks.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Organization Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
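As an illustration, consider the following hypothetical JDBC URL. Here the Databricks Host is dbc-a1b2c3d4-e5f6.cloud.databricks.com, the Organization ID is 1234567890123456, and the Cluster ID is 0123-456789-abcde123; all three are placeholder values:
jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcde123;AuthMech=3;UID=token;PWD=<personal-access-token>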
Database
Optional for SQL warehouse and Databricks clusters.
The database name that you want to connect to in Databricks Delta.
Specify a database name, or specify default to use the default database.
For Data Integration, by default, all databases available in the workspace are listed.
JDBC Driver Class Name
Optional for SQL warehouse and Databricks clusters.
The name of the JDBC driver class.
If you do not specify the driver class, the following class name is used by default:
com.simba.spark.jdbc.Driver
For application ingestion and database ingestion tasks, specify the driver class name as:
com.databricks.client.jdbc.Driver
Min Workers¹
Required for Databricks cluster.
The minimum number of worker nodes to be used for the Spark job. Minimum value is 1.
Max Workers¹
Optional for Databricks cluster. The maximum number of worker nodes to be used for the Spark job. If you don't want the cluster to autoscale, set Max Workers equal to Min Workers or leave Max Workers unset.
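For example, to let the cluster autoscale between two and eight workers, you might set the following hypothetical values:
Min Workers: 2
Max Workers: 8
Setting Max Workers to 2 as well, or leaving it unset, keeps the cluster at a fixed two workers.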
DB Runtime Version¹
Required for Databricks cluster.
The Databricks runtime version.
Determines the version of the Databricks cluster to spawn when you connect to a Databricks cluster to process mappings.
Select the runtime version 9.1 LTS.
Worker Node Type¹
Optional for Databricks cluster. The worker node instance type that is used to run the Spark job.
For example, the worker node type for AWS can be i3.2xlarge. The worker node type for Azure can be Standard_DS3_v2.
Driver Node Type¹
Optional for Databricks cluster. The driver node instance type that is used to collect data from the Spark workers.
For example, the driver node type for AWS can be i3.2xlarge. The driver node type for Azure can be Standard_DS3_v2.
If you don't specify the driver node type, Databricks uses the value you specify in the worker node type field.
Instance Pool ID¹
Optional for Databricks cluster. The instance pool ID used for the Spark cluster.
If you specify the Instance Pool ID to run mappings, the following connection properties are ignored:
  • Driver Node Type
  • EBS Volume Count
  • EBS Volume Type
  • EBS Volume Size
  • Enable Elastic Disk
  • Worker Node Type
  • Zone ID
Enable Elastic Disk¹
Optional for Databricks cluster. Enables the cluster to get additional disk space.
Enable this option if the Spark workers are running low on disk space.
Spark Configuration¹
Optional for Databricks cluster. The Spark configuration to use in the Databricks cluster.
The configuration must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"spark.executor.userClassPathFirst"="False"
Doesn't apply to a data loader task or to Mass Ingestion tasks.
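To set more than one Spark property, separate the pairs with semicolons, as in the following sketch. The key spark.sql.shuffle.partitions and its value are shown only as an illustration:
"spark.executor.userClassPathFirst"="False";"spark.sql.shuffle.partitions"="200"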
Spark Environment Variables¹
Optional for Databricks cluster. The environment variables to export before launching the Spark driver and workers.
The variables must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"MY_ENVIRONMENT_VARIABLE"="true"
Doesn't apply to a data loader task or to Mass Ingestion tasks.
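To export more than one variable, separate the pairs with semicolons, as in the following sketch. The variable names here are hypothetical:
"MY_ENVIRONMENT_VARIABLE"="true";"MY_PROXY_HOST"="proxy.example.com"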
¹ Doesn't apply to mappings in advanced mode or when you use SQL warehouse to connect to Databricks Delta.
