Mass Ingestion

Amazon S3, Flat File, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets

The following list identifies considerations for using Amazon S3, Flat File, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets:

When you define a database ingestion task that has an Amazon S3, Flat File, Google Cloud Storage, or Microsoft Azure Data Lake Storage target, you can select either CSV or Avro format for the generated output files that contain the source data to be applied to the target.

If you select the CSV output format, Mass Ingestion Databases creates the following files on the target for each source table:

A schema.ini file that describes the schema and includes some settings for the output file on the target.

One or multiple output files for each source table, which contain the source data.

Mass Ingestion Databases names these text files based on the name of the source table with an appended date and time.

The schema.ini file lists a sequence of columns for the rows in the corresponding output file. The following table describes the columns in the schema.ini file:

Column	Description
ColNameHeader	Indicates whether the source data files include column headers.
Format	Describes the format of the output files. Mass Ingestion Databases uses a comma (,) to delimit column values.
CharacterSet	Specifies the character set that is used for output files. Mass Ingestion Databases generates the files in the UTF-8 character set.
COL`<sequence_number>`	The name and data type of the column. If you selected any of the Add Operation... properties under Advanced on the Target page of the task wizard, the list of columns includes metadata columns for the operation type, time, owner, or transaction ID. If you selected the Add Before Images check box, for each source column, the job creates a `column_name`_OLD column for UNDO data and a `column_name`_NEW column for REDO data.

You should not edit the schema.ini file.
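
As a read-only illustration of the layout described above, the following Python sketch parses a schema.ini-style file and separates the file-level settings from the COL`<sequence_number>` entries. The sample content is an assumption made for this example: the section name, the Format value, and the column names and data types are hypothetical and might differ from what Mass Ingestion Databases actually writes. The function name read_schema is also hypothetical.

```python
import configparser

# Hypothetical schema.ini content. The section name, Format value, column
# names, and data types are illustrative assumptions, not product output.
SAMPLE_SCHEMA_INI = """\
[orders_20240101_120000.csv]
ColNameHeader=True
Format=CSVDelimited
CharacterSet=UTF-8
COL1=ORDER_ID Integer
COL2=CUSTOMER_NAME Text
"""


def read_schema(text):
    """Return {file_name: {"settings": {...}, "columns": [(name, type), ...]}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(text)

    result = {}
    for section in parser.sections():
        settings = {}
        columns = []
        for key, value in parser[section].items():
            suffix = key[3:]
            if key.upper().startswith("COL") and suffix.isdigit():
                # COL<sequence_number> entry: "<column name> <data type>"
                name, _, data_type = value.partition(" ")
                columns.append((int(suffix), name, data_type.strip()))
            else:
                settings[key] = value
        columns.sort()  # order columns by their sequence number
        result[section] = {
            "settings": settings,
            "columns": [(name, data_type) for _, name, data_type in columns],
        }
    return result


if __name__ == "__main__":
    for file_name, entry in read_schema(SAMPLE_SCHEMA_INI).items():
        print(file_name, entry["settings"], entry["columns"])
```

The sketch only reads the file, which is consistent with the guidance not to edit schema.ini.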

If you select the Avro output format, you can select an Avro format type, a file compression type, an Avro data compression type, and the directory that stores the Avro schema definitions generated for each source table. The schema definition files have the following naming pattern: schemaname_tablename.txt.

On Flat File and Microsoft Azure Data Lake Storage targets, Mass Ingestion Databases creates an empty directory for each empty source table. Mass Ingestion Databases does not create empty directories on Amazon S3 and Google Cloud Storage targets.

If you do not specify an access key and secret key in the Amazon S3 connection properties, Mass Ingestion Databases tries to find AWS credentials by using the default credential provider chain that is implemented by the DefaultAWSCredentialsProviderChain class. For more information, see the Amazon Web Services documentation.
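
The idea of a credential provider chain can be sketched in Python as a list of lookups that are tried in order until one returns credentials. This is a simplified illustration and not the DefaultAWSCredentialsProviderChain class itself, which is part of the AWS SDK for Java and checks additional sources. The sketch covers only two sources, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and the shared credentials file. The function names are hypothetical.

```python
import configparser
import os


def from_environment(env):
    """Look for credentials in environment variables."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    return (key, secret) if key and secret else None


def from_shared_file(path, profile="default"):
    """Look for credentials in a shared credentials file (INI format)."""
    parser = configparser.ConfigParser(interpolation=None)
    # read() returns the list of files it could open; empty means not found.
    if not parser.read(path) or not parser.has_section(profile):
        return None
    key = parser.get(profile, "aws_access_key_id", fallback=None)
    secret = parser.get(profile, "aws_secret_access_key", fallback=None)
    return (key, secret) if key and secret else None


def resolve_credentials(env=None, credentials_path=None):
    """Try each provider in order and return the first credentials found.

    Simplified sketch: a real provider chain checks more sources than
    the two shown here.
    """
    env = os.environ if env is None else env
    if credentials_path is None:
        credentials_path = os.path.expanduser("~/.aws/credentials")

    providers = [
        lambda: from_environment(env),
        lambda: from_shared_file(credentials_path),
    ]
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    return None  # no provider in the chain supplied credentials
```

Passing env and credentials_path explicitly, rather than always reading the real environment, keeps the lookup order easy to verify in isolation.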

If a database ingestion incremental load job replicates Update operations that change primary key values on the source to any of these targets that use the CSV output format, the job processes each Update record as two records on the target: a Delete followed by an Insert. The Delete contains the before image. The Insert contains the after image for the same row.

For Update operations that do not change primary key values, database ingestion jobs process each Update as one operation and write only the after image to the target.

If source tables do not have primary keys, Mass Ingestion Databases treats the tables as if all columns were part of the primary key. In this case, each Update operation is processed as a Delete followed by an Insert.
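
The Update handling rules for CSV output can be summarized in a short Python sketch. It is an illustration of the documented behavior under stated assumptions, not product code: the function name target_records, the record representation as dictionaries, and the operation labels are hypothetical.

```python
def target_records(before, after, pk_columns):
    """Return the records written to the target for one source Update.

    before, after: dicts that map column names to values (the before and
                   after images of the same row).
    pk_columns:    list of primary key column names; an empty list means
                   that the source table has no primary key.
    """
    # Without a primary key, all columns are treated as key columns, and
    # each Update is processed as a Delete followed by an Insert.
    no_primary_key = not pk_columns
    key_changed = any(before[col] != after[col] for col in pk_columns)

    if no_primary_key or key_changed:
        return [("DELETE", before), ("INSERT", after)]

    # Key values unchanged: one operation that carries only the after image.
    return [("UPDATE", after)]


if __name__ == "__main__":
    old_row = {"ID": 1, "NAME": "Ann"}
    print(target_records(old_row, {"ID": 1, "NAME": "Anne"}, ["ID"]))
    print(target_records(old_row, {"ID": 2, "NAME": "Ann"}, ["ID"]))
    print(target_records(old_row, {"ID": 1, "NAME": "Anne"}, []))
```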
