Data Ingestion and Replication
Property | Description
---|---
Ingestion Method | The ingestion method used to load data to Microsoft Azure Synapse SQL. Select one of the following options:
Command Type | The command type for the ingestion method. Select one of the following options:
Property | Description
---|---
Target Table Name | Name of the table in Microsoft Azure Synapse SQL to which the files are loaded.
Add Parameters | Create an expression to add the Target Table Name and Schema parameters. For more information, see Add Parameters.
Schema | The Microsoft Azure Synapse SQL schema name.
Truncate Target Table | Truncates the target table before loading.
Pre SQL | SQL command to run before the file ingestion task runs the PolyBase or COPY command.
Post SQL | SQL command to run after the file ingestion task runs the PolyBase or COPY command.
Field Delimiter | Character used to separate fields in the file. Default is 0x1e. You can select one of the following field delimiters from the list: ~ ` \| . TAB 0x1e
Quote Character | Specifies the quote character to skip when you write data to Microsoft Azure Synapse SQL. If the source table contains the specified quote character, the task fails. Change the quote character to a value that does not exist in the source table.
External Stage* | Specifies the external stage directory to use for loading files into Microsoft Azure Synapse SQL. You can stage the files in Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
File Compression* | Determines whether files are compressed before they are transferred to the target directory. The following options are available:
Number of Concurrent Connections* | Number of concurrent connections used to extract data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. When reading a large file or object, you can spawn multiple threads to process data. Configure Blob Part Size or Block Size to divide a large file into smaller parts. Default is 4. Maximum is 10.

*Not applicable when you read data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
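As an illustration of the Pre SQL and Post SQL properties above, the following sketch shows commands a task might run around a load. The audit table, target table, and task name are hypothetical, not part of the product:

```sql
-- Hypothetical Pre SQL: record the start of the load in an audit table.
INSERT INTO audit.load_log (task_name, started_at)
VALUES ('sales_ingestion', GETDATE());

-- Hypothetical Post SQL: refresh statistics on the target table after the load.
UPDATE STATISTICS dbo.sales;
```

Each property accepts a single SQL command, so pair one Pre SQL command with one Post SQL command as needed.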
Property | Description
---|---
File Format Definition | Applies to the PolyBase ingestion method. Transact-SQL CREATE EXTERNAL FILE FORMAT statement. For example: `CREATE EXTERNAL FILE FORMAT {{fileFormatName}} WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"'))` The following example creates an external file format for Parquet files: `CREATE EXTERNAL FILE FORMAT {{fileFormatName}} WITH (FORMAT_TYPE = PARQUET)` Similarly, you can create external file formats for the JSON, Avro, and ORC formats. For more information about the CREATE EXTERNAL FILE FORMAT statement, see the Microsoft documentation.
External Table Definition | Applies to the PolyBase ingestion method. Transact-SQL CREATE EXTERNAL TABLE statement. For example: `CREATE EXTERNAL TABLE {{externalTable}} (id INT, name NVARCHAR(100)) WITH (LOCATION = '{{blobLocation}}', DATA_SOURCE = {{dataSourceName}}, FILE_FORMAT = {{fileFormatName}})` The following example creates an external table for Parquet files: `CREATE EXTERNAL TABLE {{externalTable}} (username VARCHAR(100), number INT, colour VARCHAR(100)) WITH (LOCATION = '{{blobLocation}}', DATA_SOURCE = {{dataSourceName}}, FILE_FORMAT = {{fileFormatName}})` Similarly, you can create external tables for the JSON, Avro, and ORC formats. For more information about the CREATE EXTERNAL TABLE statement, see the Microsoft documentation.
Insert SQL Definition | Applies to the PolyBase ingestion method. Transact-SQL INSERT statement. For example: `INSERT INTO schema.table (id, name) SELECT id+5, name FROM {{externalTable}}` The following example defines the insert SQL for Parquet data: `INSERT INTO testing.test_parq (username, number, colour) SELECT username, number, colour FROM {{externalTable}};` Similarly, you can define insert SQL for the JSON, Avro, and ORC formats. For more information about the INSERT statement, see the Microsoft documentation.
Copy Command Definition | Applies to the COPY Command ingestion method. Transact-SQL COPY INTO statement. For example: `COPY INTO schema.table FROM EXTERNALLOCATION WITH (CREDENTIAL = (AZURECREDENTIALS), FIELDTERMINATOR = ',', FIELDQUOTE = '')` The following example defines the COPY command for Parquet data: `COPY INTO testing.test_parq FROM EXTERNALLOCATION WITH (CREDENTIAL = (AZURECREDENTIALS), FILE_TYPE = 'PARQUET')` Similarly, you can define the COPY command for the JSON, Avro, and ORC formats. For more information about the COPY INTO statement, see the Microsoft documentation.
Pre SQL | SQL command to run before the file ingestion task runs the PolyBase command.
Post SQL | SQL command to run after the file ingestion task runs the PolyBase command.
External Stage* | Specifies the external stage directory to use for loading files into Microsoft Azure Synapse SQL. You can stage the files in Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
Number of Concurrent Connections* | Number of concurrent connections used to extract data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. When reading a large file or object, you can spawn multiple threads to process data. Configure Blob Part Size or Block Size to divide a large file into smaller parts. Default is 4. Maximum is 10.

*Not applicable when you read data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
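Taken together, the PolyBase definitions run as a sequence: the file format describes the staged files, the external table maps them to columns, and the insert SQL loads the target table. The following sketch shows how the three definitions fit together; the `{{...}}` placeholders are resolved by the task at run time, and the target schema and table names (`dbo.customer`) are hypothetical:

```sql
-- 1. File Format Definition: describes the staged delimited files.
CREATE EXTERNAL FILE FORMAT {{fileFormatName}}
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- 2. External Table Definition: maps the staged files to columns.
CREATE EXTERNAL TABLE {{externalTable}} (
    id INT,
    name NVARCHAR(100)
)
WITH (
    LOCATION = '{{blobLocation}}',
    DATA_SOURCE = {{dataSourceName}},
    FILE_FORMAT = {{fileFormatName}}
);

-- 3. Insert SQL Definition: loads the target table from the external table.
INSERT INTO dbo.customer (id, name)
SELECT id, name
FROM {{externalTable}};
```

Any Pre SQL command runs before this sequence and any Post SQL command runs after it.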