Hi, I'm Ask INFA!
What would you like to know?
ASK INFAPreview
Please to access Bolo.

Table of Contents

Search

  1. Introducing Mass Ingestion
  2. Getting Started with Mass Ingestion
  3. Connectors and Connections
  4. Mass Ingestion Applications
  5. Mass Ingestion Databases
  6. Mass Ingestion Files
  7. Mass Ingestion Streaming
  8. Monitoring Mass Ingestion Jobs
  9. Asset Management
  10. Troubleshooting

Mass Ingestion

Mass Ingestion

File format and copy options

File format and copy options

When you configure a file ingestion task to transfer a large number of files to Snowflake Data Cloud, specify the copy option and the file format to load the data.
Select a Snowflake Data Cloud connection in a file ingestion task and then specify the copy option and the file format in the target options to determine how to load the files to a Snowflake Data Cloud target table.
The copy option specifies the action that the task performs when an error is encountered while loading data from a file.
You can specify the following copy option to abort the COPY statement if any error is encountered:
ON_ERROR = ABORT_STATEMENT
The file ingestion task for Snowflake Data Cloud is certified for only the ABORT_STATEMENT for ON_ERROR copy option.
When you load files, you can specify the file format and define the rules for the data files. The task uses the specified file format and rules while bulk loading data into Snowflake Data Cloud tables.
The following list describes some of the format type options:
  • RECORD_DELIMITER = '<character>' | NONE
    . Single character string that separates records in an input file.
  • FIELD_DELIMITER = '<character>' | NONE
    . Specifies the single character string that separates records in an input file.
  • FILE_EXTENSION = '<string>' | NONE
    . Specifies the extension for files unloaded to a stage.
  • SKIP_HEADER = <integer>
    . Number of lines at the start of the file to skip.
  • DATE_FORMAT = '<string>' | AUTO.
    Defines the format of date values in the data files or table.
  • TIME_FORMAT = '<string>' | AUTO.
    Defines the format of time values in the data files or table.
  • TIMESTAMP_FORMAT = <string>' | AUTO.
    Defines the format of timestamp values in the data files or table.

Example of File format and copy options for loading files to Snowflake

You want to create a CSV file format and define the following rules to load files to Snowflake:
  • Delimit the fields using the pipe character ( |).
  • Files include a single header line that will be skipped.
Specify the following file format:
file_format = (type = csv field_delimiter = '|' skip_header = 1)
You can specify both the copy options and file format by using the following character:
&&
For example,
file_format = (type = csv field_delimiter = ',' skip_header = 2)&&on_error=ABORT_STATEMENT
Similarly, use the following file format in the
File Format and Copy Options
field to load data into separate columns:
  • For JSON:
    on_error='ABORT_STATEMENT'&&file_format = (type = json)&&MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE
  • For AVRO:
    on_error='ABORT_STATEMENT'&&file_format = (type = avro)&&MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE
  • For ORC:
    on_error='ABORT_STATEMENT'&&file_format = (type = orc)&&MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE
  • For PARQUET:
    on_error='ABORT_STATEMENT'&&file_format = (type = parquet)&&MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE
The string MATCH_BY_COLUMN_NAME specifies whether to load the semi-structured data into the columns in the target table that match the corresponding columns represented in the data. CASE_SENSITIVE, CASE_INSENSITIVE, and NONE are the supported options. Default is NONE.
Consider the following criteria for a column to match between the data and table:
  • The column represented in the data must have the same name as the column in the table. The column names are either case-sensitive (CASE_SENSITIVE ) or case-insensitive (CASE_INSENSITIVE).
  • The column can be in any order.
  • The column in the table must have a data type that is compatible with the values in the column represented in the data. For example, string, number, and Boolean values can be loaded into a variant column.
For more information about the various file formats that you can specify and the copy option, see the Snowflake Data Cloud documentation at the following website: https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html#copy-options-copyoptions

0 COMMENTS

We’d like to hear from you!