Databricks Delta Connector

Rules and guidelines for pushdown optimization

Use the following rules and guidelines when you enable a mapping for pushdown optimization to a Databricks Delta database:

Mapping with Databricks Delta source and target

Use the following rules and guidelines when you configure pushdown optimization in a mapping that reads from and writes to Databricks Delta:
  • The LAST function is non-deterministic. It can return different results each time it is called, even when you provide the same input values.
  • When you configure a Filter transformation or specify a filter condition, do not specify special characters.
  • When you connect to Databricks clusters to process the mapping and define a custom query with multiple tables in the SELECT statement, the mapping displays incorrect data for fields that have the same name. This does not apply to Databricks runtime version 9.1 LTS or later.
  • When you configure a mapping enabled for full pushdown optimization to read from multiple sources and you override the database name and table name from the advanced properties, the mapping fails.
  • To configure a Filter transformation or specify a filter condition on columns of the Date or Timestamp data type in a Databricks Delta table, you must pass the data through the TO_DATE() function as an expression in the filter condition, as shown in the example after this list.
  • When you specify a custom query as a source object, ensure that the SQL query does not contain partitioning hints such as COALESCE, REPARTITION, or REPARTITION_BY_RANGE. See the example after this list.
  • When you configure a mapping enabled for full pushdown optimization on the Databricks Delta SQL engine, you cannot configure single commit to write to multiple targets.
  • When you configure a mapping enabled for full pushdown optimization on the Databricks Delta SQL engine and push the data to the Databricks Delta target, ensure that you map all the fields in the target. Otherwise, the mapping fails.
  • When you create a new target at runtime, do not specify a database name and table name in the Target Database Name and Target Table Name fields in the target advanced properties.
  • When you read data from a column of the Date data type and write it to a column of the Date data type, the pushdown query casts the column to the Timestamp data type.
  • You cannot completely parameterize a multi-line custom query using a parameter file. If you specify a multi-line custom query in a parameter file, the mapping considers only the first line of the query, as sketched in the example after this list.
  • When you push the GREATEST() function to Databricks Delta and configure input value arguments of String data type, you must not specify the caseFlag argument.
  • To push the TO_CHAR(DATE) function to Databricks Delta, use the following string and format arguments (an example follows this list):
    • YYYY
    • YY
    • MM
    • MON
    • MONTH
    • DD
    • DDD
    • DY
    • DAY
    • HH12
    • HH24
    • MI
    • Q
    • SS
    • SS.MS
    • SS.US
    • SS.NS
  • To push the TO_DATE(string, format) function to Databricks Delta, you must use the following format arguments (an example follows this list):
    • YYYY
    • YY
    • MM
    • MON
    • MONTH
    • DD
    • DDD
    • HH12
    • HH24
    • MI
    • SS
    • SS.MS
    • SS.US
    • SS.NS
  • When you enable full pushdown optimization in a mapping and use the IFF() condition in an Expression transformation, the mapping fails for the following functions:
    • IS_SPACES
    • IS_NUMBER
    • IS_DATE
  • A mapping that is enabled for full pushdown optimization and contains an SQL transformation fails when the column names in the SQL override query do not match the column names in the custom query.
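
For example, a minimal sketch of a filter condition on a Date column, assuming a hypothetical ORDER_DATE field; the comparison value passes through TO_DATE():

    ORDER_DATE >= TO_DATE('2023-01-01', 'YYYY-MM-DD')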
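
The partitioning hints to avoid are Spark SQL comment hints embedded in the SELECT statement. A hypothetical custom query that violates the guideline above:

    SELECT /*+ REPARTITION(8) */ order_id, amount FROM sales.orders

Remove the /*+ ... */ hint from the query before you use it as a source object.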
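
As a sketch of the single-line requirement for custom queries in a parameter file, assuming a hypothetical parameter named $$CustomQuery and illustrative table and column names:

    $$CustomQuery=SELECT order_id, customer_name, region FROM sales.customers WHERE region = 'WEST'

Keep the entire SELECT statement on one line; any subsequent lines of the query are ignored.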
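
For instance, a TO_CHAR(DATE) expression that combines only the supported format arguments listed above; the port name HIRE_DATE is illustrative:

    TO_CHAR(HIRE_DATE, 'YYYY-MM-DD HH24:MI:SS')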
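
Similarly, a TO_DATE(string, format) expression that uses only the supported format arguments:

    TO_DATE('15-01-2023 13:45:30', 'DD-MM-YYYY HH24:MI:SS')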

Mapping with Amazon S3 source and Databricks Delta target

Use the following rules and guidelines when you configure pushdown optimization in a mapping that reads from an Amazon S3 source and writes to a Databricks Delta target:
  • When you select the source type as Directory in the advanced source properties, ensure that all the files in the directory contain the same schema.
  • When you select query as the source type in a lookup, you cannot override the database name and table name in the advanced source properties.
  • When you include a Source transformation in a mapping enabled with pushdown optimization, exclude the FileName field from the source. The FileName field is not applicable.
  • When you parameterize a lookup object in a mapping enabled with pushdown optimization, the mapping fails because you cannot exclude the FileName port at runtime.
  • When you parameterize the source object in a mapping task, ensure that you pass the source object parameter value with the fully qualified path in the parameter file, as sketched in the example after this list.
  • You cannot use wildcard characters in the source file name or directory name in the Source transformation.
  • You cannot use wildcard characters for the folder path or file name in the advanced source properties.
  • When you read from a partition folder that has a transaction log file, select the source type as Directory in the advanced source properties.
  • You cannot configure dynamic lookup cache.
  • When you use a Joiner transformation in a mapping enabled with pushdown optimization and create a new target at runtime, ensure that the fields do not have a NOT NULL constraint.
  • Ensure that the field names in Parquet, ORC, Avro, or JSON files do not contain Unicode characters.
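
A minimal sketch of a parameter file entry that passes a fully qualified source path; the parameter name $$SourceObject and the bucket, folder, and file names are illustrative, and the exact path format depends on your connection configuration:

    $$SourceObject=my-s3-bucket/sales/2023/customer_data.parquet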

Mapping with Azure Data Lake Storage Gen2 source and Databricks Delta target

Use the following rules and guidelines when you configure pushdown optimization in a mapping that reads from an Azure Data Lake Storage Gen2 source and writes to a Databricks Delta target:
  • Mappings fail if the lookup object contains unsupported data types.
  • When you select the source type as Directory in the advanced source properties, ensure that all the files in the directory contain the same schema.
  • When you select query as the source type in a lookup, you cannot override the database name and table name in the advanced source properties.
  • When you include a Source transformation in a mapping enabled with pushdown optimization, exclude the FileName field from the source. The FileName field is not applicable.
  • When you parameterize a lookup object in a mapping enabled with pushdown optimization, the mapping fails because you cannot exclude the FileName port at runtime.
  • When you parameterize the source object in a mapping task, ensure that you pass the source object parameter value with the fully qualified path in the parameter file.
  • You cannot use wildcard characters in the source file name or directory name in the Source transformation.
  • When you read from a partition folder that has a transaction log file, select the source type as Directory in the advanced source properties.
  • You cannot configure dynamic lookup cache.
  • When you use a Joiner transformation in a mapping enabled with pushdown optimization and create a new target at runtime, ensure that the fields do not have a NOT NULL constraint.
  • Ensure that the field names in Parquet, ORC, Avro, or JSON files do not contain Unicode characters.

Cross workspace mappings

When you set up a mapping enabled with full pushdown optimization to access data from a Databricks Delta workspace, and the associated metastore resides in a separate workspace, the mapping runs without pushdown optimization.
