Data Ingestion and Replication
Property
| Description
|
---|---|
Output Format
| Select the format of the output file. Options are:
The default value is CSV.
Output files in CSV format enclose each field in double quotation marks ("").
|
Add Headers to CSV File
| If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
|
Avro Format
| If AVRO is selected as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat.
|
Avro Serialization Format
| If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary.
|
Avro Schema Directory
| If AVRO is selected as the output format, specify the local directory where Mass Ingestion Databases stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
If this directory is not specified, no Avro schema definition file is produced.
|
File Compression Type
| Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used.
|
Avro Compression Type
| If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used.
|
Deflate Compression Level
| If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
|
Data Directory
| For initial load tasks, define a directory structure for the directories where Mass Ingestion Databases stores output data files and, optionally, the schema. To define a directory pattern, you can use the following types of entries:
Placeholder values are not case sensitive.
Examples:
The default directory pattern is {TableName}_{Timestamp}.
For Amazon S3, Flat File, and Microsoft Azure Data Lake Storage Gen2 targets, Mass Ingestion Databases uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Mass Ingestion Databases uses the Bucket name that you specify in the target properties for the ingestion task.
|
Connection Directory as Parent
| For initial load tasks, select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. The parent directory is used in the Data Directory and Schema Directory.
|
Schema Directory
| For initial load tasks, you can specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. This field is optional.
The schema is stored in the data directory by default. For incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
You can use the same placeholders as for the Data Directory field. Ensure that the placeholders are enclosed in curly brackets { }.
|
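
The Data Directory pattern described in the table can be illustrated with a short Python sketch. This is not the product's implementation: the `expand_pattern` helper, the timestamp layout, and the sample values are assumptions made for illustration. Only the `{TableName}` and `{Timestamp}` placeholders, their case insensitivity, and the optional parent directory come from the table.

```python
import posixpath
import re
from datetime import datetime

DEFAULT_PATTERN = "{TableName}_{Timestamp}"


def expand_pattern(pattern, values, parent_dir=None):
    """Replace {Placeholder} entries, matching placeholder names case-insensitively."""
    lookup = {name.lower(): value for name, value in values.items()}

    def substitute(match):
        key = match.group(1).lower()
        if key not in lookup:
            raise KeyError(f"unknown placeholder: {match.group(0)}")
        return lookup[key]

    path = re.sub(r"\{(\w+)\}", substitute, pattern)
    # Mimics "Connection Directory as Parent": prefix the connection directory.
    return posixpath.join(parent_dir, path) if parent_dir else path


values = {
    "TableName": "ORDERS",
    # Assumed timestamp layout; the real format is not stated in this section.
    "Timestamp": datetime(2024, 1, 31, 12, 0, 0).strftime("%Y%m%d_%H%M%S"),
}

print(expand_pattern(DEFAULT_PATTERN, values))
print(expand_pattern("{tablename}_{TIMESTAMP}", values, parent_dir="/landing"))
```

The second call shows that a differently cased pattern resolves to the same directory name, here placed under a hypothetical `/landing` connection directory.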
Field
| Description
|
---|---|
Add Operation Type
| Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target.
For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert.
By default, this check box is cleared.
|
Add Operation Time
| Select this check box to add a metadata column that includes the source SQL operation time in the output that the job propagates to the target.
For initial loads, the job always writes the current date and time.
By default, this check box is cleared.
|
Add Operation Owner
| Select this check box to add a metadata column that includes the owner of the source SQL operation in the output that the job propagates to the target.
For initial loads, the job always writes "INFA" as the owner.
By default, this check box is cleared.
This property is not available for jobs that have a PostgreSQL source.
|
Add Operation Transaction Id
| Select this check box to add a metadata column that includes the source transaction ID of each SQL operation in the output that the job propagates to the target.
For initial loads, the job always writes "1" as the ID.
By default, this check box is cleared.
|
Add Before Images
| Select this check box to include UNDO data in the output that an incremental load job writes to the target.
For initial loads, the job writes nulls.
By default, this check box is cleared.
|
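
As a rough illustration of the initial-load values listed in the table, the following Python sketch appends the metadata values to a source row and writes it as a CSV record with a header and with every field enclosed in double quotation marks. The column names `OP_TYPE`, `OP_TIME`, `OP_OWNER`, and `OP_XID`, the timestamp layout, and the `initial_load_row` helper are hypothetical. The actual column names and formats that the product writes are not given in this section.

```python
import csv
import io
from datetime import datetime


def initial_load_row(source_row, now):
    """Append the metadata values that the table lists for initial loads."""
    return {
        **source_row,
        "OP_TYPE": "I",                     # initial loads always write "I"
        "OP_TIME": now.isoformat(sep=" "),  # current date and time (assumed layout)
        "OP_OWNER": "INFA",                 # owner written for initial loads
        "OP_XID": "1",                      # transaction ID written for initial loads
    }


row = initial_load_row({"ID": "42", "NAME": "Ada"}, datetime(2024, 1, 31, 12, 0, 0))

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=list(row),
    quoting=csv.QUOTE_ALL,  # enclose every field in double quotation marks
    lineterminator="\n",
)
writer.writeheader()  # corresponds to the "Add Headers to CSV File" option
writer.writerow(row)
print(buffer.getvalue())
```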