MongoDB sources

To use MongoDB sources in database ingestion tasks, review the following considerations.

Usage considerations

  • Mass Ingestion Databases supports MongoDB sources for initial load and incremental load jobs.
  • Mass Ingestion Databases supports the following targets for MongoDB sources: Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage Gen2.
  • The database ingestion task moves the MongoDB data to the target as key-value pairs, where the key is the ObjectID and the value is a JSON string that represents the BSON document.
  • For MongoDB sources, data type mappings do not occur. All data is persisted on the target as string data.
  • In incremental load operations, changes to the source data are tracked by the unique key (ObjectID), and the changed JSON string is applied to the target.
  • Mass Ingestion Databases uses MongoDB change streams to access real-time data changes on a single collection, a database, or an entire deployment.
  • To open a change stream on a single database, a custom role with privileges that grant find and changeStream actions is required. Use the following statement to grant the actions on all non-system collections in a database:
    { resource: { db: <dbname>, collection: "" }, actions: [ "find", "changeStream" ] }
  • Mass Ingestion Databases does not support time series collections in incremental load jobs that have MongoDB sources.
  • In incremental load operations, Mass Ingestion Databases retrieves the change records from the date and time specified as the restart point. For MongoDB sources, the default value for the restart point is the current time. You can change this value and specify a different date and time. You must specify the time in Greenwich Mean Time (GMT).
  • If schema drift occurs on the MongoDB source, the data in the BSON documents that are sent to the target reflects the schema changes. However, Mass Ingestion Databases does not specifically detect and report the schema changes.
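
The privilege statement in the custom-role consideration above has to be wrapped in a role definition before MongoDB can apply it. The following is a minimal sketch, not an Informatica-provided procedure: it builds a MongoDB createRole command document around that privilege. The function name build_change_stream_role and the default role name changeStreamReader are hypothetical and chosen only for illustration.

```python
from typing import Any


def build_change_stream_role(
    db_name: str, role_name: str = "changeStreamReader"
) -> dict[str, Any]:
    """Build a createRole command that grants find and changeStream.

    The empty collection name targets all non-system collections in
    db_name, matching the privilege statement in the considerations.
    The role name is a hypothetical placeholder.
    """
    if not db_name:
        raise ValueError("db_name must be a non-empty string")
    return {
        # The command name must be the first key of the command document.
        "createRole": role_name,
        "privileges": [
            {
                "resource": {"db": db_name, "collection": ""},
                "actions": ["find", "changeStream"],
            }
        ],
        # No inherited roles in this sketch.
        "roles": [],
    }


if __name__ == "__main__":
    import json

    # Print the command so that it can be reviewed before it is run.
    print(json.dumps(build_change_stream_role("sales"), indent=2))
```

With a driver such as PyMongo, a document like this could be passed to the command method of the database object, for example client["sales"].command(cmd), by a user who is allowed to create roles. The role must then be granted to the user that the MongoDB connection uses.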
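
Because the restart point must be specified in GMT, a local wall-clock time has to be converted before you enter it. The following sketch shows one way to do the conversion with the Python standard library. The function name to_gmt and the fixed-offset parameter are assumptions made for illustration, and the date format that the task wizard expects is not stated here, so the example prints an ISO 8601 value only.

```python
from datetime import datetime, timedelta, timezone


def to_gmt(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Convert a naive local time with a known UTC offset to GMT (UTC).

    utc_offset_hours is the local offset from UTC, for example -5 for
    a zone five hours behind UTC or 5.5 for one 5 hours 30 minutes ahead.
    """
    if local_time.tzinfo is not None:
        raise ValueError("local_time must be naive (no tzinfo attached)")
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # Attach the local offset, then shift the value to UTC.
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)


if __name__ == "__main__":
    restart_local = datetime(2024, 3, 1, 9, 30)  # example value only
    print(to_gmt(restart_local, -5).isoformat())
```

A fixed offset keeps the sketch free of time zone database dependencies. If the local zone observes daylight saving time, use the offset that applies on the restart date.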
