Amazon S3, Flat File, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets

The following list identifies considerations for using Amazon S3, Flat File, Google Cloud Storage, and Microsoft Azure Data Lake Storage targets:
  • When you define a database ingestion task that has an Amazon S3, Flat File, Google Cloud Storage, or Microsoft Azure Data Lake Storage target, you can select either CSV or Avro format for the generated output files that contain the source data to be applied to the target.
  • If you select the CSV output format, Mass Ingestion Databases creates the following files on the target for each source table:
    • A schema.ini file that describes the schema and includes some settings for the output file on the target.
    • One or more output files for each source table, which contain the source data. Mass Ingestion Databases names these text files based on the name of the source table with an appended date and time.
    The schema.ini file lists a sequence of columns for the rows in the corresponding output file. The following table describes the columns in the schema.ini file:
    Column | Description
    ColNameHeader | Indicates whether the source data files include column headers.
    Format | Describes the format of the output files. Mass Ingestion Databases uses a comma (,) to delimit column values.
    CharacterSet | Specifies the character set that is used for output files. Mass Ingestion Databases generates the files in the UTF-8 character set.
    COL<sequence_number> | The name and data type of the column.
      • If you selected any of the Add Operation... properties under Advanced on the Target page of the task wizard, the list of columns includes metadata columns for the operation type, time, owner, or transaction ID.
      • If you selected the Add Before Images check box, for each source column, the job creates a column_name_OLD column for UNDO data and a column_name_NEW column for REDO data.
    You should not edit the schema.ini file.
  • If you select the Avro output format, you can select an Avro format type, a file compression type, an Avro data compression type, and the directory that stores the Avro schema definitions generated for each source table. The schema definition files have the following naming pattern: schemaname_tablename.txt.
  • On Flat File and Microsoft Azure Data Lake Storage targets, Mass Ingestion Databases creates an empty directory for each empty source table. Mass Ingestion Databases does not create empty directories on Amazon S3 and Google Cloud Storage targets.
  • If you do not specify an access key and secret key in the Amazon S3 connection properties, Mass Ingestion Databases tries to find AWS credentials by using the default credential provider chain that is implemented by the DefaultAWSCredentialsProviderChain class. For more information, see the Amazon Web Services documentation.
  • If a database ingestion incremental load job replicates Update operations that change primary key values on the source to any of these targets that use the CSV output format, the job processes each Update record as two records on the target: a Delete followed by an Insert. The Delete contains the before image. The Insert contains the after image for the same row.
    For Update operations that do not change primary key values, database ingestion jobs process each Update as one operation and write only the after image to the target.
    If source tables do not have primary keys, Mass Ingestion Databases treats the tables as if all columns were part of the primary key. In this case, each Update operation is processed as a Delete followed by an Insert.
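
The Update handling described in the last consideration can be sketched as a small decision function. This is only an illustration of the rule, not the product's implementation: the function name `apply_update`, the record shape, and the operation labels `DELETE`, `INSERT`, and `UPDATE` are assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def apply_update(
    before: Row,
    after: Row,
    primary_key: Optional[Sequence[str]] = None,
) -> List[dict]:
    """Return the target records for one source Update (CSV output).

    Hypothetical helper: the names and record layout are illustrative only.
    """
    if not primary_key:
        # No primary key: all columns are treated as key columns, so every
        # Update is processed as a Delete followed by an Insert.
        key_changed = True
    else:
        # Compare the before and after images on the key columns only.
        key_changed = any(before.get(c) != after.get(c) for c in primary_key)

    if key_changed:
        return [
            {"op": "DELETE", "row": before},  # carries the before image
            {"op": "INSERT", "row": after},   # carries the after image
        ]
    # Key unchanged: one operation, only the after image is written.
    return [{"op": "UPDATE", "row": after}]
```

For example, calling `apply_update({"id": 1, "name": "a"}, {"id": 2, "name": "a"}, ["id"])` yields a Delete record followed by an Insert record, while the same call with an unchanged `id` yields a single record that holds the after image.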
