Table of Contents

  1. Introducing Mass Ingestion
  2. Getting Started with Mass Ingestion
  3. Connectors and Connections
  4. Mass Ingestion Applications
  5. Mass Ingestion Databases
  6. Mass Ingestion Files
  7. Mass Ingestion Streaming
  8. Monitoring Mass Ingestion Jobs
  9. Asset Management
  10. Troubleshooting

Mass Ingestion

Databricks Delta connection properties

When you create a Databricks Delta connection, you must configure the connection properties.
The following list describes the Databricks Delta connection properties:
Connection Name
Required. The name of the connection. The name is not case sensitive and must be unique within the domain.
You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
Description
Description of the connection.
The description cannot exceed 4,000 characters.
Type
Required. Select Databricks Delta.
Runtime Environment
Required. Name of the runtime environment where you want to run the tasks.
Databricks Host
Required. The host name of the endpoint that the Databricks account belongs to.
The host name appears in the JDBC URL, which uses the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
You can get the JDBC URL from the Databricks analytics cluster or all-purpose cluster under Advanced Options > JDBC/ODBC.
In the URL syntax for the Databricks Host, Org Id, and Cluster ID properties, the value of PWD is always <personal-access-token>.
Org Id
Required. The unique organization ID for the workspace in Databricks.
The organization ID appears in the JDBC URL as the <Org Id> segment:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
Cluster ID
Required. The ID of the Databricks analytics cluster. You can obtain the cluster ID from the JDBC URL, where it appears as the <Cluster ID> segment:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<Org Id>/<Cluster ID>;AuthMech=3;UID=token;PWD=<personal-access-token>
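For illustration, consider the following hypothetical JDBC URL, in which all values are made up. Here the Databricks Host is dbc-a1b2c3d4-e5f6.cloud.databricks.com, the Org Id is 1234567890123456, and the Cluster ID is 0123-456789-abcde123:
jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcde123;AuthMech=3;UID=token;PWD=<personal-access-token>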
Databricks Token
Required. Personal access token to access Databricks.
You must have permission to attach to the cluster identified in the Cluster ID property.
SQL Endpoint JDBC URL
The Databricks SQL endpoint JDBC connection URL.
Use the following syntax:
jdbc:spark://<Databricks Host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<SQL endpoint cluster ID>;
If you configure the SQL Endpoint JDBC URL property, the Databricks Host, Org Id, and Cluster ID properties are ignored.
For more information about the Databricks Delta SQL endpoint, contact Informatica Global Customer Support.
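For illustration, a hypothetical SQL endpoint URL with a made-up host and endpoint ID might look like the following:
jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/a1b2c3d4e5f67890;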
Database
The database in Databricks Delta that you want to connect to.
JDBC Driver Class Name
Required. The name of the JDBC driver class.
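For example, connections that use the Simba Spark JDBC driver typically specify com.simba.spark.jdbc.Driver. Check your driver documentation for the exact class name, because it varies by driver vendor and version.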
Cluster Environment
The cloud provider where the Databricks cluster is deployed.
You can select from the following options:
  • AWS
  • Azure
Default is AWS.
The connection attributes differ depending on the cluster environment you select. For more information, see the AWS and Azure cluster properties sections.
Min Workers
The minimum number of worker nodes to be used for the Spark job.
Max Workers
The maximum number of worker nodes to be used for the Spark job.
If you do not want the cluster to autoscale, set Max Workers equal to Min Workers or leave Max Workers unset. For example, Min Workers set to 2 and Max Workers set to 8 allows the cluster to scale between 2 and 8 worker nodes.
DB Runtime Version
The Databricks runtime version.
Select 7.3 LTS from the list.
Worker Node Type
Required. The instance type of the machine used for the Spark worker node.
Driver Node Type
The instance type of the machine used for the Spark driver node. If you do not specify a driver node type, the worker node type is used.
Instance Pool ID
The instance pool used for the Spark cluster.
Enable Elastic Disk
Enable this option for the cluster to dynamically acquire additional disk space when the Spark workers are running low on disk space.
Spark Configuration
The Spark configuration to be used in the Databricks cluster.
The configuration must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"spark.executor.userClassPathFirst"="False"
Spark Environment Variables
The environment variables that you need to export before launching the Spark driver and workers.
The variables must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"MY_ENVIRONMENT_VARIABLE"="true"
