

Mass Ingestion


Databricks Delta target properties


The following table describes the Databricks Delta target properties on the Target tab when you define a streaming ingestion task:
| Property | Description |
|---|---|
| Connection | Name of the Databricks Delta target connection. |
| Connection Type | The Databricks Delta connection type. The connection type populates automatically based on the connection that you select. |
| Use Existing Cluster | Choose whether to use an existing cluster or provision a new cluster. Choose True to use an existing cluster. If you choose True, provide the existing cluster ID. |
| Retry Attempts | The maximum number of times the Secure Agent retries a REST API call to Databricks when a network connection error occurs or the REST endpoint returns a 5xx HTTP error code. Default is 0. |
| Retry Delay Interval | The time interval, in milliseconds, at which the Secure Agent retries a REST API call when a network connection error occurs or the REST endpoint returns a 5xx HTTP error code. Default is 1,000 milliseconds. |
| Job Status Poll Interval | The interval, in seconds, at which the Secure Agent checks the job completion status. |
| Staging Location | Relative directory path to store the staging files. If the Databricks cluster is deployed on AWS, use the relative path of the Amazon S3 staging bucket. If the Databricks cluster is deployed on Azure, use the relative path of the Azure Data Lake Storage Gen2 staging file system. |
| Target Table Name | Name of the Databricks Delta table to which the task appends data. |
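The Retry Attempts, Retry Delay Interval, and Job Status Poll Interval properties describe a common pattern: retry a REST call a bounded number of times with a fixed delay, then poll a job until it finishes. The following Python sketch illustrates that pattern only; it is not the Secure Agent's implementation. The function names, the `NetworkError` class, and the injected `call`, `get_state`, and `sleep` callables are hypothetical. The terminal states are borrowed from the Databricks Jobs API run life-cycle states as an assumption, and the poll interval is assumed to be in seconds.

```python
import time
from typing import Callable, Tuple


class NetworkError(Exception):
    """Hypothetical stand-in for a network connection failure."""


def call_with_retry(
    call: Callable[[], Tuple[int, str]],
    retry_attempts: int = 0,       # Retry Attempts (default 0)
    retry_delay_ms: int = 1000,    # Retry Delay Interval (default 1,000 ms)
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Invoke a REST call, retrying on network errors and 5xx responses."""
    attempt = 0
    while True:
        try:
            status, body = call()
            if not 500 <= status <= 599:
                # Success or a non-retryable (for example 4xx) response.
                return status, body
            error: Exception = RuntimeError(f"HTTP {status} from REST endpoint")
        except NetworkError as exc:
            error = exc
        if attempt >= retry_attempts:
            raise error  # retries exhausted: surface the last error
        attempt += 1
        sleep(retry_delay_ms / 1000.0)  # wait before the next attempt


# Assumed terminal states, taken from the Databricks Jobs API run life-cycle states.
TERMINAL_STATES = frozenset({"TERMINATED", "SKIPPED", "INTERNAL_ERROR"})


def wait_for_job(
    get_state: Callable[[], str],
    poll_interval_s: float,        # Job Status Poll Interval, assumed in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the job state until it reaches a terminal state."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_interval_s)
```

With the default Retry Attempts value of 0, the first network error or 5xx response fails the call immediately; each additional retry attempt adds one Retry Delay Interval of waiting before the call is tried again.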
The following table describes the Databricks Delta target advanced properties that you can configure on the Target tab when you define a streaming ingestion task:
| Property | Description |
|---|---|
| Data Location | Relative path to store the data. If you do not provide a value, a managed table is created with the name specified in the Target Table Name property. |
| Target Database Name | Overrides the database name provided in the Databricks Delta connection in Administrator. |
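In Databricks SQL terms, the Data Location property corresponds to the difference between a managed Delta table and an external table created with a LOCATION clause. The following Python sketch builds such a statement to illustrate how Data Location and Target Database Name could combine; it is not the DDL that the streaming ingestion task issues. The function name and its parameters are hypothetical, and the sketch assumes that the relative Data Location has already been resolved to a full storage path.

```python
from typing import Optional, Sequence, Tuple


def build_create_table_sql(
    table_name: str,                        # Target Table Name
    columns: Sequence[Tuple[str, str]],     # hypothetical (name, type) pairs
    connection_database: str,               # database set in the connection
    target_database: Optional[str] = None,  # Target Database Name override
    data_location: Optional[str] = None,    # Data Location, already resolved
) -> str:
    """Build an illustrative CREATE TABLE statement for a Delta table.

    Identifiers and paths are not escaped; this is an illustration only.
    """
    # Target Database Name, when set, overrides the connection's database.
    database = target_database or connection_database
    column_list = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    sql = (
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table_name}` "
        f"({column_list}) USING DELTA"
    )
    if data_location:
        # External table: the Delta files are stored at the given path.
        sql += f" LOCATION '{data_location}'"
    # Without a LOCATION clause, Databricks creates a managed table.
    return sql
```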
For a Databricks Delta target, source messages must be in JSON format only.
In a streaming ingestion job with a Databricks Delta target, if you change the source schema to include additional data columns, Informatica recommends that you redeploy the job so that it captures the schema change.
When you use a Filter transformation in a streaming ingestion task with a Databricks Delta target, ensure that the ingested data is in valid JSON format. A Filter transformation with the JSONPath filter type validates the incoming data. If the incoming data is not valid JSON, the streaming ingestion task rejects the data and moves it to the configured reject directory. If no reject directory is configured, the rejected data is lost.
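The following Python sketch illustrates the reject behavior described above under simplifying assumptions: it only checks that each message parses as JSON with `json.loads` and does not evaluate JSONPath expressions. The function name, the reject directory argument, and the `rejected.txt` file name are hypothetical.

```python
import json
import os
from typing import Any, Iterable, List, Optional


def filter_json_messages(
    messages: Iterable[str],
    reject_dir: Optional[str] = None,  # hypothetical reject directory
) -> List[Any]:
    """Return parsed valid JSON messages and route invalid ones to reject_dir."""
    accepted: List[Any] = []
    rejected: List[str] = []
    for message in messages:
        try:
            accepted.append(json.loads(message))
        except json.JSONDecodeError:
            rejected.append(message)
    if rejected and reject_dir is not None:
        os.makedirs(reject_dir, exist_ok=True)
        # Hypothetical file name; one rejected message per line.
        path = os.path.join(reject_dir, "rejected.txt")
        with open(path, "a", encoding="utf-8") as reject_file:
            for message in rejected:
                reject_file.write(message.replace("\n", " ") + "\n")
    # Without a reject directory, rejected messages are discarded (lost).
    return accepted
```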
Informatica recommends that you use a Combiner transformation in a streaming ingestion task that contains a Databricks Delta target. Add the Combiner transformation before the target. The streaming ingestion task then combines all the staged data before it writes to the Databricks Delta target.
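Combining staged data means that the target receives fewer, larger writes instead of one write per message. The following Python sketch shows count-based batching as one way to combine records; the function name and the `batch_size` parameter are hypothetical and do not represent the Combiner transformation's actual configuration options.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def combine_staged_records(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group staged records into batches so each target write covers many records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch  # one combined write to the target
            batch = []
    if batch:
        yield batch  # flush the remaining records
```

For example, five staged records with a batch size of 2 produce three writes instead of five.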
