

Mass Ingestion

Databricks Delta targets

To use Databricks Delta targets in database ingestion tasks, first prepare the target and review the usage considerations.
Target preparation:
  1. Download the Databricks JDBC driver from the Databricks website.
  2. Copy the Databricks JDBC driver jar file, SparkJDBC42.jar, to the following directory:
    Secure_Agent_installation_directory/apps/Database_Ingestion/ext/
  3. On Windows, install Visual C++ Redistributable Packages for Visual Studio 2013 on the computer where the Secure Agent runs.
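The expected driver location from the preparation steps can be sketched as a small helper. The jar name and the `apps/Database_Ingestion/ext/` path come from the steps above; the function names and the sample installation directory are illustrative assumptions, not part of the product.

```python
from pathlib import Path

def driver_jar_path(agent_dir: str) -> Path:
    """Return the expected location of SparkJDBC42.jar under a
    Secure Agent installation, per the preparation steps above."""
    return Path(agent_dir) / "apps" / "Database_Ingestion" / "ext" / "SparkJDBC42.jar"

def driver_installed(agent_dir: str) -> bool:
    """Check whether the Databricks JDBC driver jar has been copied into place."""
    return driver_jar_path(agent_dir).is_file()

# Example: the directory to copy the jar into for a hypothetical install path.
print(driver_jar_path("/opt/infaagent").parent)
```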
Usage considerations:
  • For incremental load jobs, you must enable Change Data Capture (CDC) for all source columns.
  • You can access Databricks Delta tables created on top of the following storage types:
    • Microsoft Azure Data Lake Storage (ADLS) Gen2
    • Amazon Web Services (AWS) S3
    The Databricks Delta connection uses a JDBC URL to connect to the Databricks cluster. When you configure the target, specify the JDBC URL and credentials to use for connecting to the cluster. Also define the connection information that the target uses to connect to the staging location in Amazon S3 or ADLS Gen2.
  • Before writing data to Databricks Delta target tables, database ingestion jobs stage the data in an Amazon S3 bucket or ADLS directory. You must specify the directory for the data when you configure the database ingestion task. Mass Ingestion Databases does not use the ADLS Staging Filesystem Name and S3 Staging Bucket properties in the Databricks Delta connection properties to determine the directory.
  • Mass Ingestion Databases uses jobs that run once to load data from staging files on AWS S3 or ADLS Gen2 to external tables. By default, Mass Ingestion Databases runs these jobs on the cluster that is specified in the Databricks Delta connection properties. If you want to run the jobs on another cluster, set the dbDeltaUseExistingCluster custom property to false on the Target page in the database ingestion task wizard.
  • By default, Mass Ingestion Databases uses the Databricks Delta COPY INTO feature to load data from the staging file to Databricks Delta target tables. You can disable it for all load types by setting the writerDatabricksUseSqlLoad custom property to false on the Target page in the database ingestion task wizard.
  • If you use an AWS cluster, you must specify the S3 Service Regional Endpoint value in the Databricks Delta connection properties.
  • If you use a Databricks Delta SQL endpoint to load data, you must specify the JDBC URL in the SQL Endpoint JDBC URL field in the Databricks Delta connection properties.
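The two custom properties discussed above can be summarized in a small sketch. The property names (dbDeltaUseExistingCluster, writerDatabricksUseSqlLoad) and their false values come from this section; the helper function itself is an illustrative assumption, not the product's configuration API — in practice you set these properties on the Target page of the database ingestion task wizard.

```python
def target_custom_properties(use_existing_cluster: bool = True,
                             use_copy_into: bool = True) -> dict:
    """Build the custom-property map that corresponds to the defaults
    and overrides described in the usage considerations above."""
    props = {}
    if not use_existing_cluster:
        # Run the one-time load jobs on a cluster other than the one
        # specified in the Databricks Delta connection properties.
        props["dbDeltaUseExistingCluster"] = "false"
    if not use_copy_into:
        # Disable the COPY INTO load path for all load types.
        props["writerDatabricksUseSqlLoad"] = "false"
    return props

# With both defaults in effect, no custom properties need to be set.
print(target_custom_properties())
print(target_custom_properties(use_existing_cluster=False, use_copy_into=False))
```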
