Mass Ingestion

Microsoft Azure Data Lake Storage Gen2 target properties

When you define a file ingestion task with a Microsoft Azure Data Lake Storage Gen2 target, you must enter target options on the Target tab of the task wizard.
The following table describes the target options:
Target Directory
Directory to which files are transferred. If the directory does not exist, it is created at run time. The directory path that you specify in the task overrides the directory path specified in the connection.
Default is the target directory specified in the connection.
You can enter a relative path. To do so, start the path with a period followed by a slash (./). The path is then relative to the target directory specified in the connection.

Add Parameters
Create an expression to add as a Target Directory parameter. For more information, see Add Parameters.

File Compression*
Determines whether files are compressed before they are transferred to the target directory. The following options are available:
  • None. Files are not compressed.
  • GZIP. Files are compressed using GZIP compression.

If File Exists*
Determines what to do if a file with the same name already exists in the target directory. The following options are available:
  • Overwrite
  • Append
  • Fail

Block Size (Bytes)*
Size of the blocks into which a large file is divided. When you write a large file, the file is divided into blocks of this size, and you can configure concurrent connections to spawn the required number of threads to process the blocks in parallel.
Default is 8388608 bytes (8 MB).

*Not applicable when you read data from Databricks Delta.
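
The following is a minimal Python sketch of how the Target Directory and If File Exists options could behave, simulated against a local file system. It assumes that an empty task value falls back to the connection directory, that a value starting with ./ is joined to the connection directory, and that any other value replaces the connection directory. The names resolve_target_directory, write_file, and IfFileExists are hypothetical, and the sketch is not Informatica's implementation.

```python
import os
import posixpath
from enum import Enum
from typing import Optional


class IfFileExists(Enum):
    """The three If File Exists options."""
    OVERWRITE = "Overwrite"
    APPEND = "Append"
    FAIL = "Fail"


def resolve_target_directory(connection_dir: str, task_dir: Optional[str]) -> str:
    """Hypothetical helper: return the directory that files are written to.

    Assumed rules: no task value -> connection directory; a value that starts
    with "./" -> relative to the connection directory; otherwise the task
    value overrides the connection directory.
    """
    if not task_dir:
        return connection_dir
    if task_dir.startswith("./"):
        # Join the remainder of the relative path onto the connection directory.
        return posixpath.normpath(posixpath.join(connection_dir, task_dir[2:]))
    return task_dir


def write_file(target_dir: str, name: str, data: bytes, policy: IfFileExists) -> str:
    """Hypothetical helper: write data locally, honoring the If File Exists option."""
    os.makedirs(target_dir, exist_ok=True)  # create the directory if it is missing
    path = os.path.join(target_dir, name)
    if policy is IfFileExists.FAIL and os.path.exists(path):
        raise FileExistsError(f"{path} already exists")
    # Append adds to the end of an existing file; Overwrite truncates it first.
    mode = "ab" if policy is IfFileExists.APPEND else "wb"
    with open(path, mode) as f:
        f.write(data)
    return path
```

For example, with a connection directory of /landing, the task value ./sales/2024 resolves to /landing/sales/2024, while /archive replaces the connection directory entirely.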

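The File Compression and Block Size (Bytes) options can be illustrated with a minimal Python sketch: GZIP compression using Python's gzip module, and a file split into blocks of at most the configured size that a thread pool processes in parallel. The helper names and the upload_block callback are hypothetical stand-ins for the actual transfer to Microsoft Azure Data Lake Storage Gen2, and the mapping of concurrent connections to worker threads is an assumption.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

DEFAULT_BLOCK_SIZE = 8388608  # documented default: 8 MB


def apply_compression(data: bytes, option: str) -> bytes:
    """Apply the File Compression option ("None" or "GZIP") before transfer."""
    if option == "GZIP":
        return gzip.compress(data)
    if option == "None":
        return data
    raise ValueError(f"unsupported File Compression option: {option}")


def block_ranges(total_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Split total_size bytes into (offset, length) blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (offset, min(block_size, total_size - offset))
        for offset in range(0, total_size, block_size)
    ]


def process_blocks(
    data: bytes,
    block_size: int,
    concurrency: int,
    upload_block: Callable[[int, bytes], T],
) -> List[T]:
    """Hand each block to upload_block(offset, chunk) on a pool of worker threads.

    upload_block is a hypothetical callback standing in for the real transfer;
    results are returned in block order.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(upload_block, offset, data[offset:offset + length])
            for offset, length in block_ranges(len(data), block_size)
        ]
        return [future.result() for future in futures]
```

With the default block size, a 20971520-byte file is split into three blocks: two of 8388608 bytes and one of 4194304 bytes.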