Mass Ingestion

Configuring schedule and runtime options

On the Schedule and Runtime Options page of the database ingestion task wizard, you can specify a schedule for running initial load jobs periodically and configure runtime options for jobs of any load type.
  1. Under Advanced, optionally edit the Number of Rows in Output File value to specify the maximum number of rows that the database ingestion task writes to an output data file for a Flat File, Amazon Redshift, Amazon S3, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, or Snowflake target.
    Advanced options are not displayed for incremental load tasks that have an Apache Kafka target.
    For incremental load operations and combined initial and incremental load operations, change data is flushed to the target either when this number of rows is reached or when the flush latency period expires and the job is not in the middle of processing a transaction. The flush latency period is the time that the job waits for more change data before flushing data to the target. The latency period is internally set to 10 seconds and cannot be changed. An illustrative sketch of this flush rule appears after this procedure.
    Valid values are 1 through 100000000. The default value for Amazon S3 and Microsoft Azure Data Lake Storage Gen2 targets is 1000 rows. For the other targets, the default value is 100000 rows.
    For Microsoft Azure Synapse Analytics targets, the data is first sent to a Microsoft Azure Data Lake Storage staging file before being written to the target tables. After data is written to the target, the entire contents of the table-specific directory that includes the staging files are emptied. For Snowflake targets, the data is first stored in an internal stage area before being written to the target tables.
  2. For initial load jobs only, optionally clear the File extension based on file type check box if you want the output data files for Flat File, Amazon S3, and Microsoft Azure Data Lake Storage targets to have the .dat extension. This check box is selected by default, which causes the output files to have file-name extensions based on their file types.
    For incremental load jobs with these target types, this option is not available. Mass Ingestion Databases always uses output file-name extensions based on file type.
  3. For database ingestion incremental load tasks that have Amazon S3 or Microsoft Azure Data Lake Storage Gen2 targets, configure the following apply cycle options:
    • Apply Cycle Interval. Specifies the amount of time that must elapse before a database ingestion job ends an apply cycle. You can specify days, hours, minutes, and seconds, or specify values for a subset of these time fields and leave the other fields blank. The default value is 15 minutes.
    • Apply Cycle Change Limit. Specifies the number of records that must be processed before a database ingestion job ends an apply cycle. When this record limit is reached, the database ingestion job ends the apply cycle and writes the change data to the target. The default value is 10000 records.
    • Low Activity Flush Interval. Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before a database ingestion job ends an apply cycle. When this time limit is reached, the database ingestion job ends the apply cycle and writes the change data to the target. If you do not specify a value for this option, a database ingestion job ends apply cycles only after either the Apply Cycle Change Limit or the Apply Cycle Interval limit is reached. No default value is provided.
    Note the following points:
    • Either the Apply Cycle Interval or Apply Cycle Change Limit field must have a non-zero value or use the default value.
    • An apply cycle ends when the job reaches any of the three limits, whichever limit is met first. A sketch of this rule appears after this procedure.
  4. Under Schema Drift Options, if the detection of schema drift is supported for your source and target combination, specify the schema drift option to use for each of the supported types of DDL operations.
    Schema drift options are supported for database ingestion incremental load tasks that propagate change data from Microsoft SQL Server, Oracle, or PostgreSQL sources to Amazon Redshift, Amazon S3, Databricks Delta, Google BigQuery, Google Cloud Storage, Kafka, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, or Snowflake targets. Schema drift options are also supported for database ingestion combined initial and incremental load tasks with the same source types and with Amazon Redshift, Databricks Delta, Google BigQuery, Kafka, Microsoft Azure Synapse Analytics, or Snowflake targets.
    The types of supported DDL operations are:
    • Add Column
    • Modify Column
    • Drop Column
    • Rename Column
    The Modify Column and Rename Column options are not supported and not displayed for database ingestion jobs that have Google BigQuery targets.
    You can set the following schema drift options for a DDL operation type (a sketch of how these options map DDL operations to actions appears after this procedure):
    • Ignore. Does not replicate DDL changes that occur on the source database to the target. For Amazon Redshift, Kafka, Microsoft Azure Synapse Analytics, or Snowflake targets, this option is the default for the Drop Column and Rename Column operation types. For Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets that use the CSV output format, the Ignore option is disabled. For the AVRO output format, this option is enabled.
    • Replicate. Allows the database ingestion job to replicate the DDL change to the target. For Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets, this option is the default for all operation types. For other targets, this option is the default for the Add Column and Modify Column operation types. Note the following restrictions:
      • If you try to replicate a type of schema change that is not supported on the target, database ingestion jobs associated with the task will end with an error. For example, if you select Replicate for Rename Column operations on Microsoft Azure Synapse Analytics targets, the jobs will end.
      • Add Column operations that add a primary-key column are not supported and can cause unpredictable results.
      • For Databricks Delta targets, the Replicate option is not available for Drop Column operations.
      • Modify Column operations that change the NULL or NOT NULL constraint for a column are not replicated to the target by design because changing the nullability of a target column can cause problems when subsequent changes are applied.
    • Stop Job. Stops the entire database ingestion job.
    • Stop Table. Stops processing the source table on which the DDL change occurred. When one or more of the tables are excluded from replication because of the Stop Table schema drift option, the job state changes to Running with Warning. The database ingestion job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resync option that is available when you resume the job with Resume With Options. For more information, see Overriding schema drift options when resuming a database ingestion job.
  5. For incremental load jobs that have an Apache Kafka target, configure the following checkpointing options (a sketch of the resulting checkpoint rule appears after this procedure):
    • Checkpoint All Rows. Indicates whether a database ingestion job performs checkpoint processing for every message that is sent to the Kafka target. If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored.
    • Checkpoint Every Commit. Indicates whether a database ingestion job performs checkpoint processing for every commit that occurs on the source.
    • Checkpoint Row Count. Specifies the maximum number of messages that a database ingestion job sends to the target before adding a checkpoint. If you set this option to 0, a database ingestion job does not perform checkpoint processing based on the number of messages. If you set this option to 1, a database ingestion job adds a checkpoint for each message.
    • Checkpoint Frequency (secs). Specifies the maximum number of seconds that must elapse before a database ingestion job adds a checkpoint. If you set this option to 0, a database ingestion job does not perform checkpoint processing based on elapsed time.
  6. Under Schedule, if you want to run job instances for an initial load task based on an existing schedule instead of manually starting the job from one of the monitoring interfaces after it is deployed, select Run this task based on a schedule and then select a predefined schedule. The default option is Do not run this task based on a schedule.
    This field is unavailable for incremental load and combined initial and incremental load tasks.
    You can view and edit the schedule options in Administrator. If you edit the schedule, the changes will apply to all jobs that use the schedule. If you edit the schedule after deploying the task, you do not need to redeploy the task.
    If the schedule criteria for running the job are met but the previous job run is still active, Mass Ingestion Databases skips the new job run, as shown in the sketch after this procedure.
  7. Under Custom Properties, you can specify custom properties that Informatica provides to meet your special requirements. To add a property, enter the property name and value in the Create Property fields, and then click Add Property.
    Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_). A sketch of an equivalent naming check appears after this procedure.
    To delete a property, click the Delete icon at the right end of the property row in the list.
  8. Click Save.
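
The following sketches illustrate the rules described in the steps above. They are simplified models written for this documentation, not product code, and all function and parameter names in them are assumptions.

The first sketch models the flush behavior described in step 1: change data is flushed when the configured row count is reached, or when the internal 10-second flush latency expires and the job is not in the middle of processing a transaction.

    # Illustrative sketch of the flush rule described in step 1 (not product code).
    def should_flush(buffered_rows, seconds_since_last_flush, in_open_transaction,
                     rows_per_file=100000, latency_seconds=10):
        """Hypothetical decision rule; parameter names are assumptions."""
        if buffered_rows >= rows_per_file:
            return True
        if seconds_since_last_flush >= latency_seconds and not in_open_transaction:
            return True
        return False

    # Example: row limit not yet reached, latency expired, no open transaction.
    print(should_flush(42000, 12, False))   # True
    print(should_flush(42000, 5, False))    # False - neither condition is met yet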
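The next sketch models the apply cycle options in step 3: an apply cycle ends when the first of the three limits is reached. The variable names and the use of minutes as the time unit are illustrative assumptions.

    # Illustrative sketch of the apply cycle rule in step 3 (not product code).
    def apply_cycle_ended(elapsed_minutes, records_processed, idle_minutes,
                          cycle_interval_minutes=15,     # Apply Cycle Interval default
                          change_limit=10000,            # Apply Cycle Change Limit default
                          low_activity_minutes=None):    # Low Activity Flush Interval, no default
        """Hypothetical check; returns True when any configured limit is reached."""
        if elapsed_minutes >= cycle_interval_minutes:
            return True
        if records_processed >= change_limit:
            return True
        if low_activity_minutes is not None and idle_minutes >= low_activity_minutes:
            return True
        return False

    # Example: the change limit is reached before the 15-minute interval elapses.
    print(apply_cycle_ended(elapsed_minutes=4, records_processed=10000, idle_minutes=0))  # True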
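The next sketch models the schema drift options in step 4 as a mapping from DDL operation types to the documented actions. The dictionary keys, the example defaults, and the dispatcher are assumptions for illustration only.

    # Illustrative model of the schema drift options in step 4 (not product code).
    schema_drift_options = {
        "ADD_COLUMN": "Replicate",     # default for Add Column on most targets
        "MODIFY_COLUMN": "Replicate",  # default for Modify Column on most targets
        "DROP_COLUMN": "Ignore",       # default for Drop Column on, for example, Snowflake targets
        "RENAME_COLUMN": "Ignore",     # default for Rename Column on, for example, Snowflake targets
    }

    def handle_ddl(operation_type, table_name):
        """Hypothetical dispatcher that mirrors the documented behaviors."""
        action = schema_drift_options[operation_type]
        if action == "Ignore":
            return f"DDL on {table_name} not replicated to the target"
        if action == "Replicate":
            return f"DDL on {table_name} replicated to the target"
        if action == "Stop Table":
            return f"{table_name} excluded; job state becomes Running with Warning"
        if action == "Stop Job":
            return "entire database ingestion job stopped"

    print(handle_ddl("RENAME_COLUMN", "ORDERS"))  # DDL on ORDERS not replicated to the target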
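The next sketch models the Kafka checkpointing options in step 5 as a precedence rule: Checkpoint All Rows overrides the other settings, and otherwise a checkpoint is added on a source commit, on the message count, or on elapsed time. Parameter names are assumptions.

    # Illustrative sketch of the Kafka checkpointing options in step 5 (not product code).
    def should_checkpoint(messages_since_checkpoint, seconds_since_checkpoint, at_source_commit,
                          checkpoint_all_rows=False, checkpoint_every_commit=False,
                          checkpoint_row_count=0, checkpoint_frequency_secs=0):
        """Hypothetical decision rule mirroring the documented options."""
        if checkpoint_all_rows:
            # When selected, the other checkpoint options are ignored.
            return True
        if checkpoint_every_commit and at_source_commit:
            return True
        # A value of 0 disables checkpointing based on message count or elapsed time.
        if checkpoint_row_count > 0 and messages_since_checkpoint >= checkpoint_row_count:
            return True
        if checkpoint_frequency_secs > 0 and seconds_since_checkpoint >= checkpoint_frequency_secs:
            return True
        return False

    print(should_checkpoint(500, 30, False, checkpoint_row_count=500))  # True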
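The next sketch shows the scheduling behavior noted in step 6: a scheduled run is skipped if the previous job run is still active. The function name is an assumption.

    # Illustrative sketch of the scheduling behavior described in step 6 (not product code).
    def on_schedule_trigger(previous_run_active):
        """Hypothetical handler: skip the new run if the previous run is still active."""
        return "skipped" if previous_run_active else "started"

    print(on_schedule_trigger(previous_run_active=True))   # skipped
    print(on_schedule_trigger(previous_run_active=False))  # started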
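The last sketch shows a validation check equivalent to the custom property naming rule in step 7: alphanumeric characters plus periods, hyphens, and underscores. The regular expression, function name, and example property names are illustrative assumptions.

    # Illustrative validation of the custom property naming rule in step 7 (not product code).
    import re

    # Alphanumeric characters plus periods, hyphens, and underscores only.
    _PROPERTY_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

    def is_valid_property_name(name):
        """Hypothetical check mirroring the documented naming rule."""
        return bool(_PROPERTY_NAME.match(name))

    print(is_valid_property_name("example.property_name-1"))  # True
    print(is_valid_property_name("bad name!"))                # False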
