Mass Ingestion

Configuring schedule and runtime options

On the Schedule and Runtime Options page of the database ingestion task wizard, you can specify a schedule for running initial load jobs periodically and configure runtime options for jobs of any load type.
  1. Under Advanced, optionally edit the Number of Rows in Output File value to specify the maximum number of rows that the database ingestion task writes to an output data file for a Flat File, Amazon Redshift, Amazon S3, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, or Snowflake target.
    Advanced options are not displayed for incremental load tasks that have an Apache Kafka target.
    For incremental load operations and combined initial and incremental load operations, change data is flushed to the target either when this number of rows is reached or when the flush latency period expires and the job is not in the middle of processing a transaction. The flush latency period is the time that the job waits for more change data before flushing data to the target. The latency period is internally set to 10 seconds and cannot be changed. An illustrative sketch of this flush rule appears after this procedure.
    Valid values are 1 through 100000000. The default value for Amazon S3 and Microsoft Azure Data Lake Storage Gen2 targets is 1000 rows. For the other targets, the default value is 100000 rows.
    For Microsoft Azure Synapse Analytics targets, the data is first sent to a Microsoft Azure Data Lake Storage staging file before being written to the target tables. After data is written to the target, the entire contents of the table-specific directory that includes the staging files are emptied. For Snowflake targets, the data is first stored in an internal stage area before being written to the target tables.
  2. For initial load jobs only, optionally clear the File extension based on file type check box if you want the output data files for Flat File, Amazon S3, and Microsoft Azure Data Lake Storage targets to have the .dat extension. This check box is selected by default, which causes the output files to have file-name extensions based on their file types.
    For incremental load jobs with these target types, this option is not available. Mass Ingestion Databases always uses output file-name extensions based on file type.
  3. For database ingestion incremental load tasks that have Amazon S3 or Microsoft Azure Data Lake Storage Gen2 targets, configure the following apply cycle options:
    • Apply Cycle Interval. Specifies the amount of time that must elapse before a database ingestion job ends an apply cycle. You can specify days, hours, minutes, and seconds, or specify values for a subset of these time fields and leave the other fields blank. The default value is 15 minutes.
    • Apply Cycle Change Limit. Specifies the number of records that must be processed before a database ingestion job ends an apply cycle. When this record limit is reached, the database ingestion job ends the apply cycle and writes the change data to the target. The default value is 10000 records.
    • Low Activity Flush Interval. Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before a database ingestion job ends an apply cycle. When this time limit is reached, the database ingestion job ends the apply cycle and writes the change data to the target. If you do not specify a value for this option, a database ingestion job ends apply cycles only after either the Apply Cycle Change Limit or the Apply Cycle Interval limit is reached. No default value is provided.
    Note the following points:
    • Either the Apply Cycle Interval or Apply Cycle Change Limit field must have a non-zero value or use the default value.
    • An apply cycle ends when the job reaches any of the three limits, whichever limit is met first. A sketch of this rule appears after this procedure.
  4. Under Schema Drift Options, if the detection of schema drift is supported for your source and target combination, specify the schema drift option to use for each of the supported types of DDL operations.
    Schema drift options are supported for database ingestion incremental load tasks that propagate change data from Microsoft SQL Server, Oracle, or PostgreSQL sources to Amazon Redshift, Amazon S3, Databricks Delta, Google BigQuery, Google Cloud Storage, Kafka, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, or Snowflake targets. Schema drift options are also supported for database ingestion combined initial and incremental load tasks with the same source types and with Amazon Redshift, Databricks Delta, Google BigQuery, Kafka, Microsoft Azure Synapse Analytics, or Snowflake targets.
    The types of supported DDL operations are:
    • Add Column
    • Modify Column
    • Drop Column
    • Rename Column
    The Modify Column and Rename Column options are not supported and not displayed for database ingestion jobs that have Google BigQuery targets.
    You can set the following schema drift options for a DDL operation type (a sketch of how these options map DDL operations to actions appears after this procedure):
    • Ignore. Does not replicate DDL changes that occur on the source database to the target. For Amazon Redshift, Kafka, Microsoft Azure Synapse Analytics, or Snowflake targets, this option is the default for the Drop Column and Rename Column operation types. For Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets that use the CSV output format, the Ignore option is disabled. For the AVRO output format, this option is enabled.
    • Replicate. Allows the database ingestion job to replicate the DDL change to the target. For Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets, this option is the default for all operation types. For other targets, this option is the default for the Add Column and Modify Column operation types. Note the following restrictions:
      • If you try to replicate a type of schema change that is not supported on the target, database ingestion jobs associated with the task will end with an error. For example, if you select Replicate for Rename Column operations on Microsoft Azure Synapse Analytics targets, the jobs will end.
      • Add Column operations that add a primary-key column are not supported and can cause unpredictable results.
      • For Databricks Delta targets, the Replicate option is not available for Drop Column operations.
      • Modify Column operations that change the NULL or NOT NULL constraint for a column are not replicated to the target by design because changing the nullability of a target column can cause problems when subsequent changes are applied.
    • Stop Job. Stops the entire database ingestion job.
    • Stop Table. Stops processing the source table on which the DDL change occurred. When one or more of the tables are excluded from replication because of the Stop Table schema drift option, the job state changes to Running with Warning. The database ingestion job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resync option that is available when you resume the job with Resume With Options. For more information, see Overriding schema drift options when resuming a database ingestion job.
  5. For incremental load jobs that have an Apache Kafka target, configure the following checkpointing options (a sketch of the resulting checkpoint rule appears after this procedure):
    • Checkpoint All Rows. Indicates whether a database ingestion job performs checkpoint processing for every message that is sent to the Kafka target. If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored.
    • Checkpoint Every Commit. Indicates whether a database ingestion job performs checkpoint processing for every commit that occurs on the source.
    • Checkpoint Row Count. Specifies the maximum number of messages that a database ingestion job sends to the target before adding a checkpoint. If you set this option to 0, a database ingestion job does not perform checkpoint processing based on the number of messages. If you set this option to 1, a database ingestion job adds a checkpoint for each message.
    • Checkpoint Frequency (secs). Specifies the maximum number of seconds that must elapse before a database ingestion job adds a checkpoint. If you set this option to 0, a database ingestion job does not perform checkpoint processing based on elapsed time.
  6. Under Schedule, if you want to run job instances for an initial load task based on an existing schedule instead of manually starting the job from one of the monitoring interfaces after it is deployed, select Run this task based on a schedule and then select a predefined schedule. The default option is Do not run this task based on a schedule.
    This field is unavailable for incremental load and combined initial and incremental load tasks.
    You can view and edit the schedule options in Administrator. If you edit the schedule, the changes will apply to all jobs that use the schedule. If you edit the schedule after deploying the task, you do not need to redeploy the task.
    If the schedule criteria for running the job are met but the previous job run is still active, Mass Ingestion Databases skips the new job run, as shown in the sketch after this procedure.
  7. Under Custom Properties, you can specify custom properties that Informatica provides to meet your special requirements. To add a property, enter the property name and value in the Create Property fields, and then click Add Property.
    Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_). A sketch of an equivalent naming check appears after this procedure.
    To delete a property, click the Delete icon at the right end of the property row in the list.
  8. Click Save.
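
The following sketches illustrate the rules described in the steps above. They are simplified models written for this documentation, not product code, and all function and parameter names in them are assumptions.

The first sketch models the flush behavior described in step 1: change data is flushed when the configured row count is reached, or when the internal 10-second flush latency expires and the job is not in the middle of processing a transaction.

    # Illustrative sketch of the flush rule described in step 1 (not product code).
    def should_flush(buffered_rows, seconds_since_last_flush, in_open_transaction,
                     rows_per_file=100000, latency_seconds=10):
        """Hypothetical decision rule; parameter names are assumptions."""
        if buffered_rows >= rows_per_file:
            return True
        if seconds_since_last_flush >= latency_seconds and not in_open_transaction:
            return True
        return False

    # Example: row limit not yet reached, latency expired, no open transaction.
    print(should_flush(42000, 12, False))   # True
    print(should_flush(42000, 5, False))    # False - neither condition is met yet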
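The next sketch models the apply cycle options in step 3: an apply cycle ends when the first of the three limits is reached. The variable names and the use of minutes as the time unit are illustrative assumptions.

    # Illustrative sketch of the apply cycle rule in step 3 (not product code).
    def apply_cycle_ended(elapsed_minutes, records_processed, idle_minutes,
                          cycle_interval_minutes=15,     # Apply Cycle Interval default
                          change_limit=10000,            # Apply Cycle Change Limit default
                          low_activity_minutes=None):    # Low Activity Flush Interval, no default
        """Hypothetical check; returns True when any configured limit is reached."""
        if elapsed_minutes >= cycle_interval_minutes:
            return True
        if records_processed >= change_limit:
            return True
        if low_activity_minutes is not None and idle_minutes >= low_activity_minutes:
            return True
        return False

    # Example: the change limit is reached before the 15-minute interval elapses.
    print(apply_cycle_ended(elapsed_minutes=4, records_processed=10000, idle_minutes=0))  # True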
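The next sketch models the schema drift options in step 4 as a mapping from DDL operation types to the documented actions. The dictionary keys, the example defaults, and the dispatcher are assumptions for illustration only.

    # Illustrative model of the schema drift options in step 4 (not product code).
    schema_drift_options = {
        "ADD_COLUMN": "Replicate",     # default for Add Column on most targets
        "MODIFY_COLUMN": "Replicate",  # default for Modify Column on most targets
        "DROP_COLUMN": "Ignore",       # default for Drop Column on, for example, Snowflake targets
        "RENAME_COLUMN": "Ignore",     # default for Rename Column on, for example, Snowflake targets
    }

    def handle_ddl(operation_type, table_name):
        """Hypothetical dispatcher that mirrors the documented behaviors."""
        action = schema_drift_options[operation_type]
        if action == "Ignore":
            return f"DDL on {table_name} not replicated to the target"
        if action == "Replicate":
            return f"DDL on {table_name} replicated to the target"
        if action == "Stop Table":
            return f"{table_name} excluded; job state becomes Running with Warning"
        if action == "Stop Job":
            return "entire database ingestion job stopped"

    print(handle_ddl("RENAME_COLUMN", "ORDERS"))  # DDL on ORDERS not replicated to the target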
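The next sketch models the Kafka checkpointing options in step 5 as a precedence rule: Checkpoint All Rows overrides the other settings, and otherwise a checkpoint is added on a source commit, on the message count, or on elapsed time. Parameter names are assumptions.

    # Illustrative sketch of the Kafka checkpointing options in step 5 (not product code).
    def should_checkpoint(messages_since_checkpoint, seconds_since_checkpoint, at_source_commit,
                          checkpoint_all_rows=False, checkpoint_every_commit=False,
                          checkpoint_row_count=0, checkpoint_frequency_secs=0):
        """Hypothetical decision rule mirroring the documented options."""
        if checkpoint_all_rows:
            # When selected, the other checkpoint options are ignored.
            return True
        if checkpoint_every_commit and at_source_commit:
            return True
        # A value of 0 disables checkpointing based on message count or elapsed time.
        if checkpoint_row_count > 0 and messages_since_checkpoint >= checkpoint_row_count:
            return True
        if checkpoint_frequency_secs > 0 and seconds_since_checkpoint >= checkpoint_frequency_secs:
            return True
        return False

    print(should_checkpoint(500, 30, False, checkpoint_row_count=500))  # True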
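The next sketch shows the scheduling behavior noted in step 6: a scheduled run is skipped if the previous job run is still active. The function name is an assumption.

    # Illustrative sketch of the scheduling behavior described in step 6 (not product code).
    def on_schedule_trigger(previous_run_active):
        """Hypothetical handler: skip the new run if the previous run is still active."""
        return "skipped" if previous_run_active else "started"

    print(on_schedule_trigger(previous_run_active=True))   # skipped
    print(on_schedule_trigger(previous_run_active=False))  # started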
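The last sketch shows a validation check equivalent to the custom property naming rule in step 7: alphanumeric characters plus periods, hyphens, and underscores. The regular expression, function name, and example property names are illustrative assumptions.

    # Illustrative validation of the custom property naming rule in step 7 (not product code).
    import re

    # Alphanumeric characters plus periods, hyphens, and underscores only.
    _PROPERTY_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

    def is_valid_property_name(name):
        """Hypothetical check mirroring the documented naming rule."""
        return bool(_PROPERTY_NAME.match(name))

    print(is_valid_property_name("example.property_name-1"))  # True
    print(is_valid_property_name("bad name!"))                # False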
