Hi, I'm Ask INFA!
What would you like to know?
ASK INFAPreview
Please to access Bolo.

Table of Contents

Search

  1. Introducing Mass Ingestion
  2. Getting Started with Mass Ingestion
  3. Connectors and Connections
  4. Mass Ingestion Applications
  5. Mass Ingestion Databases
  6. Mass Ingestion Files
  7. Mass Ingestion Streaming
  8. Monitoring Mass Ingestion Jobs
  9. Asset Management
  10. Troubleshooting

Mass Ingestion

Mass Ingestion

Microsoft Azure Synapse SQL target properties

Microsoft Azure Synapse SQL target properties

When you define a file ingestion task with a Microsoft Azure Synapse SQL target, you must enter target options on the
Target
tab of the task wizard.
The following table describes the target options:
Property
Description
Ingestion Method
The ingestion method to load data to Microsoft Azure Synapse SQL.
Select one of the following options:
  • Polybase
  • COPY Command
Command Type
The command type for the ingestion method.
Select one of the following options:
  • Auto Generated
    . Select this option to define the command properties.
  • Custom
    . Select this option to provide a custom command that the
    file ingestion
    task uses.
The following table describes the Microsoft Azure Synapse SQL advanced target options when you select
Polybase
or
COPY Command
ingestion method and
Auto Generated
command type:
The
Auto Generated
command type is applicable only for files in text and CSV formats.
Property
Description
Target Table Name
Name of the table in Microsoft Azure Synapse SQL to which the files are loaded.
Add Parameters
Create an expression to add it as
Target Table Name
and
Schema
parameters. For more information, see Add Parameters.
Schema
The Microsoft Azure Synapse SQL schema name.
Truncate Target Table
Truncate the target table before loading.
Pre SQL
SQL command to run before the
file ingestion
task runs the PolyBase or Copy command.
Post SQL
SQL command to run after the
file ingestion
task runs the PolyBase or Copy command.
Field Delimiter
Character used to separate fields in the file. Default is 0x1e. You can select the following field delimiters from the list:
~ ` | . TAB 0x1e
Quote Character
Specifies the quote character to skip when you write data to Microsoft Azure Synapse SQL. When you write data to Microsoft Azure Synapse SQL and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table.
External Stage*
Specifies the external stage directory to use for loading files into Microsoft Azure Synapse SQL. You can stage the files in Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
File Compression*
Determines whether or not files are compressed before they are transferred to the target directory.
The following options are available:
  • None
    . Files are not compressed.
  • GZIP
    . Files are compressed using GZIP compression.
Number of Concurrent Connections*
Number of concurrent connections to extract data from the Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. When reading a large file or object, you can spawn multiple threads to process data. Configure
Blob Part Size
or
Block Size
to divide a large file into smaller parts.
Default is 4. Maximum is 10.
*Not applicable when you read data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
The following table describes the Microsoft Azure Synapse SQL advanced target properties when you select
Polybase
or
COPY Command
ingestion method and
Custom
command type:
Property
Description
File Format Definition
Applies to Polybase ingestion method.
Transact-SQL CREATE EXTERNAL FILE FORMAT statement. For example:
CREATE EXTERNAL FILE FORMAT {{fileFormatName}} WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"') )
The following is an example to create an external file in parquet format:
CREATE EXTERNAL FILE FORMAT {{fileFormatName}} WITH (FORMAT_TYPE = PARQUET)
Similarly, you can create an external file in JSON, Avro, and ORC formats.
For more information about the CREATE EXTERNAL FILE FORMAT statement, see the Microsoft documentation.
External Table Definition
Applies to Polybase ingestion method.
Transact-SQL CREATE EXTERNAL TABLE statement. For example:
CREATE EXTERNAL TABLE {{externalTable}} ( id INT, name NVARCHAR ( 100 ) ) WITH (LOCATION = '{{blobLocation}}', DATA_SOURCE = {{dataSourceName}}, FILE_FORMAT = {{fileFormatName}})
The following is an example to create an external table in parquet format:
CREATE EXTERNAL TABLE {{externalTable}} (username VARCHAR(100),number int,colour VARCHAR(100))WITH (LOCATION='{{blobLocation}}',DATA_SOURCE={{dataSourceName}},FILE_FORMAT={{fileFormatName}})
Similarly, you can create an external table in JSON, Avro, and ORC formats.
For more information about the CREATE EXTERNAL TABLE statement, see the Microsoft documentation.
Insert SQL Definition
Applies to Polybase ingestion method.
Transact-SQL INSERT statement. For example:
INSERT INTO schema.table (id, name) SELECT id+5, name FROM {{externalTable}}
The following is an example for defining insert SQL in parquet format:
INSERT INTO testing.test_parq(username,number,colour) SELECT username, number,colour FROM {{externalTable}};
Similarly, you can define insert SQL in JSON, Avro, and ORC formats.
For information about the INSERT statement, see the Microsoft documentation.
Copy Command Definition
Applies to COPY Command ingestion method.
Transact-SQL COPY INTO statement. For example:
COPY INTO schema.table FROM EXTERNALLOCATION WITH(CREDENTIAL = (AZURECREDENTIALS), FIELDTERMINATOR = ',', FIELDQUOTE = '')
The following is an example for defining COPY Command in parquet format:
COPY INTO testing.test_parq FROM EXTERNALLOCATION WITH(CREDENTIAL = (AZURECREDENTIALS), FILE_TYPE = 'PARQUET')
Similarly, you can define COPY Command in JSON, Avro, and ORC formats.
For more information about the COPY INTO statement, see the Microsoft documentation.
Pre SQL
SQL command to run before the
file ingestion
task runs the PolyBase command.
Post SQL
SQL command to run after the
file ingestion
task runs the PolyBase command.
External Stage*
Specifies the external stage directory to use for loading files into Microsoft Azure Synapse SQL. You can stage the files in Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.
Number of Concurrent Connections*
Number of concurrent connections to extract data from the Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2. When reading a large file or object, you can spawn multiple threads to process data. Configure
Blob Part Size
or
Block Size
to divide a large file into smaller parts.
Default is 4. Maximum is 10.
*Not applicable when you read data from Microsoft Azure Blob Storage or Microsoft Azure Data Lake Storage Gen2.

0 COMMENTS

We’d like to hear from you!