To use MongoDB sources in database ingestion tasks, review the following considerations.
Usage considerations
Mass Ingestion Databases supports MongoDB sources for initial load and incremental load jobs.
Mass Ingestion Databases supports the following targets for MongoDB sources: Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage Gen2.
The database ingestion task moves the MongoDB data to the target as key-value pairs, where the key is the ObjectID and the value is a JSON string that represents the BSON document.
For MongoDB sources, data type mappings do not occur. All data is persisted on the target as string data.
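As a rough illustration of the key-value layout described above, the following Python sketch turns one document into a pair of strings. The to_key_value helper and the sample document are hypothetical, and the _id is assumed to be already rendered as a 24-character hex string. The exact serialization that Mass Ingestion Databases writes to the target may differ.

```python
import json


def to_key_value(document):
    """Return (key, value) strings for one MongoDB-style document.

    Hypothetical helper: the key is the document's _id and the value is
    the whole document serialized as a JSON string.
    """
    key = str(document["_id"])
    # default=str converts any value that JSON cannot encode (for example a
    # datetime) to its string form, so the result is always plain text.
    value = json.dumps(document, default=str, sort_keys=True)
    return key, value


# Sample document with an assumed hex-string _id.
sample = {"_id": "64b7f0c2a1b2c3d4e5f60718", "name": "widget", "qty": 3}
key, value = to_key_value(sample)
print(key)
print(value)
```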
In incremental load operations, changes to the source data are tracked by the unique key (ObjectID), and the changed JSON string is applied to the target.
Mass Ingestion Databases uses MongoDB change streams to access real-time data changes on a single collection, a database, or an entire deployment.
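The three change stream scopes can be sketched in Python as follows. This is only an illustration of the scopes, not the product's implementation. The open_change_stream helper is hypothetical, and the client argument is assumed to behave like a PyMongo MongoClient, where the client, a database, and a collection each expose a watch() method.

```python
def open_change_stream(client, db_name=None, collection_name=None):
    """Open a change stream at the narrowest scope that was requested.

    `client` is assumed to behave like pymongo.MongoClient: indexing it by
    name yields a database, indexing a database yields a collection, and
    all three objects expose a watch() method.
    """
    if db_name is None:
        if collection_name is not None:
            raise ValueError("collection_name requires db_name")
        return client.watch()  # entire deployment
    database = client[db_name]
    if collection_name is None:
        return database.watch()  # every non-system collection in one database
    return database[collection_name].watch()  # a single collection
```

With a real deployment, client would be something like pymongo.MongoClient(uri), and the returned cursor can be iterated to read change events.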
To open a change stream on a single database, a custom role with privileges that grant the find and changeStream actions is required. Use the following statement to grant the actions on all non-system collections in a database:
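A privilege of this kind, following the MongoDB privilege document format, can be sketched as a Python dict. The database name mydb, the role name cdcReader, and the build_create_role_command helper are hypothetical placeholders. An empty collection string targets all non-system collections in the named database.

```python
def build_create_role_command(db_name, role_name):
    """Build a createRole command document (hypothetical helper)."""
    privilege = {
        # An empty collection name covers every non-system collection in db_name.
        "resource": {"db": db_name, "collection": ""},
        "actions": ["find", "changeStream"],
    }
    # createRole requires both the privileges and roles fields.
    return {"createRole": role_name, "privileges": [privilege], "roles": []}


command = build_create_role_command("mydb", "cdcReader")
print(command)
# With PyMongo, a document like this could be sent with:
#   client["mydb"].command(command)
```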
Mass Ingestion Databases does not support time series collections in incremental load jobs that have MongoDB sources.
In incremental load operations, Mass Ingestion Databases retrieves the change records from the date and time specified as the restart point. For MongoDB sources, the default value for the restart point is the current time. You can change this value and specify a different date and time. You must specify the time in Greenwich Mean Time (GMT).
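Because the restart point must be given in GMT, a local timestamp has to be converted first. The following Python sketch shows one way to do that conversion. The to_gmt helper, the sample timestamp, and the output format are assumptions for illustration; check the task wizard for the format that the restart point field actually expects.

```python
from datetime import datetime, timedelta, timezone


def to_gmt(local_time):
    """Convert a timezone-aware datetime to GMT (UTC)."""
    if local_time.tzinfo is None:
        raise ValueError("restart point must carry a timezone")
    return local_time.astimezone(timezone.utc)


# 09:30 on 1 March 2024 in a zone five hours behind GMT.
local = datetime(2024, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
restart_point = to_gmt(local)
print(restart_point.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-03-01 14:30:00
```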
If schema drift occurs on the MongoDB source, the data in BSON documents that are sent to the target reflect the schema changes. However, Mass Ingestion Databases does not specifically detect and report the schema changes.