Amazon S3 V2 source properties


When you define a file ingestion task with an Amazon S3 V2 source, you must enter source options on the Source tab of the task wizard. The options vary based on the file pickup method that you select for the task.
You can overwrite the file name pattern, folder, and table parameters, and define your own variable for sources by using the job resource of the Mass Ingestion Files REST API. For more information, see Mass Ingestion Files REST API.
The following table describes the source options:
Option
Description
File Pickup
The file ingestion task supports the following file pickup methods:
  • By Pattern. The file ingestion task picks up files by pattern.
  • By File List. The file ingestion task picks up files based on a file list.
Source Directory
Amazon S3 folder path from which files are transferred, including the bucket name. The default value is the Folder Path value specified in the connection properties.
You can enter a relative path to the source file system. To enter a relative path, start the path with a period, followed by a slash (./). The path is relative to the source directory specified in the connection.
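The relative-path rule can be illustrated with a short sketch. The function name and the sample bucket and folder names are hypothetical. The sketch only mirrors the behavior described above: an empty value falls back to the connection's Folder Path, and a value starting with ./ is resolved against it.

```python
import posixpath


def resolve_source_directory(connection_folder: str, source_directory: str) -> str:
    """Resolve the task's Source Directory against the connection's Folder Path."""
    if not source_directory:
        # No value entered: the default is the connection's Folder Path.
        return connection_folder
    if source_directory.startswith("./"):
        # Relative path: join it to the connection folder and drop the "./".
        joined = posixpath.join(connection_folder, source_directory)
        return posixpath.normpath(joined)
    # Anything else is treated as a full path that includes the bucket name.
    return source_directory


print(resolve_source_directory("my-bucket/landing", "./2024/jan"))
```

With a connection folder of my-bucket/landing, the value ./2024/jan resolves to my-bucket/landing/2024/jan.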
Add Parameters
Create an expression to add it as a Folder Path parameter. For more information, see Add Parameters.
Include files from sub-folders
This applies when File Pickup is By Pattern. Transfer files from all subfolders under the defined source directory.
File Pattern
This applies when File Pickup is By Pattern. File name pattern used to select the files to transfer.
In the pattern, you can use the following wildcard characters:
  • An asterisk (*) to represent any number of characters.
  • A question mark (?) to represent a single character.
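To make the wildcard semantics concrete, here is a minimal sketch that translates a file pattern into a regular expression. It is an illustration, not the product's matcher. The function names are hypothetical, and the sketch assumes that an asterisk can also match zero characters and that the whole file name must match.

```python
import re


def pattern_to_regex(file_pattern: str) -> "re.Pattern[str]":
    """Translate a pattern that uses * and ? into a compiled regular expression."""
    parts = []
    for ch in file_pattern:
        if ch == "*":
            parts.append(".*")   # any number of characters (assumed to include zero)
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(file_pattern: str, file_name: str) -> bool:
    """Return True if the whole file name matches the pattern."""
    return pattern_to_regex(file_pattern).fullmatch(file_name) is not None


print(matches("sales_*.csv", "sales_2024.csv"))
print(matches("sales_?.csv", "sales_12.csv"))
```

The first call matches because the asterisk covers 2024. The second does not, because a question mark stands for a single character only.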
File Date
This applies when File Pickup is By Pattern. A date and time expression for filtering the files to transfer.
Select one of the following options:
  • Greater than or Equal. Filters files that are modified on or after the specified date and time.
    To specify a date, click the calendar. To specify a time, click the clock.
  • Less than or Equal. Filters files that are modified before or on the specified date and time.
  • Equal. Filters files that are modified on the specified date and time.
    Click the calendar to select the date and the clock to select the time.
  • Days before today. Filters files that are modified within the specified number of days until the current date. Enter the number of days. The current date calculation starts from 00:00 hours.
For example, if you schedule the file ingestion task to run weekly and want to filter for the files that were modified in the previous week, set Days before today to 7. The task will pick up any file with a date between 7 days ago and the date on which it runs.
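The Days before today calculation can be sketched as follows. This is one reading of the description, not the product's implementation. It assumes that the window starts at 00:00 on the day that lies the given number of days before the run date and ends at the run time. The function names are hypothetical, and the datetimes are naive values already expressed in the selected time zone.

```python
from datetime import datetime, timedelta


def days_before_today_cutoff(now: datetime, days: int) -> datetime:
    """Return the start of the filter window, counted from 00:00 of the run date."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days)


def modified_within_days(modified: datetime, now: datetime, days: int) -> bool:
    """Return True if the file was modified inside the window (assumed inclusive)."""
    return days_before_today_cutoff(now, days) <= modified <= now


run_time = datetime(2024, 5, 15, 9, 30)
print(days_before_today_cutoff(run_time, 7))
print(modified_within_days(datetime(2024, 5, 10, 18, 0), run_time, 7))
```

For a run at 09:30 on 15 May 2024 with a value of 7, the window under these assumptions starts at 00:00 on 8 May 2024.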
Time Zone
This applies when File Pickup is By Pattern. If you selected a File Date option, enter the time zone of the location where the files are located.
File Size
This applies when File Pickup is By Pattern. Filters the files to transfer based on file size. Enter the file size, and then select the file size unit and a filter option.
Select one of the following filter options:
  • Greater than or Equal. Filters files that are greater than or equal to the specified size.
  • Less than or Equal. Filters files that are less than or equal to the specified size.
  • Equal. Filters files that have the specified size.
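The three size filter options amount to a comparison against a threshold converted to bytes. The sketch below shows that logic. The unit names and the binary multipliers (1 KB = 1024 bytes) are assumptions, because the available units are not listed here, and the function name is hypothetical.

```python
import operator

# Assumed units and multipliers; the task wizard may offer different ones.
UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# Filter options as named in the source options table.
COMPARATORS = {
    "Greater than or Equal": operator.ge,
    "Less than or Equal": operator.le,
    "Equal": operator.eq,
}


def passes_size_filter(file_size_bytes: int, size: int, unit: str, option: str) -> bool:
    """Return True if the file size satisfies the selected filter option."""
    threshold = size * UNIT_BYTES[unit]
    return COMPARATORS[option](file_size_bytes, threshold)


print(passes_size_filter(5 * 1024 ** 2, 5, "MB", "Greater than or Equal"))
```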
The file path containing the list of files
This applies when File Pickup is By File List. Select this option to provide the path that contains the list of files to pick up and enter the file path.
File list
This applies when File Pickup is By File List. Select this option to provide the list of files to pick up and enter a comma-separated list of file names.
Skip Duplicate Files
Indicates whether to skip duplicate files. If you select this option, the file ingestion task does not transfer files that have the same name and creation date as another file. The file ingestion task marks these files as duplicate in the job log. If you do not select this option, the task transfers all files, even files with duplicate names and creation dates.
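The duplicate rule compares the file name together with the creation date. A minimal sketch of that check follows, assuming the task keeps a record of the name and creation date pairs it has already handled. How the product stores that record is not described here, so the already_seen set and the function name are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Set, Tuple

FileKey = Tuple[str, datetime]  # (file name, creation date)


def split_duplicates(
    files: Iterable[FileKey],
    already_seen: Optional[Set[FileKey]] = None,
) -> Tuple[List[FileKey], List[FileKey]]:
    """Split files into those to transfer and those to mark as duplicate."""
    seen = set(already_seen or ())
    to_transfer: List[FileKey] = []
    duplicates: List[FileKey] = []
    for key in files:
        if key in seen:
            duplicates.append(key)   # same name and same creation date
        else:
            seen.add(key)
            to_transfer.append(key)
    return to_transfer, duplicates


created = datetime(2024, 1, 1, 8, 0)
files = [("a.csv", created), ("a.csv", created), ("a.csv", datetime(2024, 1, 2, 8, 0))]
print(split_duplicates(files))
```

A file with the same name but a different creation date is not a duplicate under this rule, so the third entry in the example is transferred.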
Check file stability
Indicates whether to verify that a file is stable before a file ingestion task attempts to pick it up. The task skips unstable files that it detects in the current run.
Stability check interval
This applies when you enable the Check file stability option. Time in seconds that a file ingestion task waits to check the file stability.
For example, if the stability check interval is 15 seconds, the file ingestion task detects all the files in the source folder that match the defined file pattern, waits for 15 seconds, and then processes only the stable files.
The interval ranges from 10 seconds to 300 seconds. Default is 10 seconds.
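The detect, wait, and recheck sequence can be sketched as below. The criterion used to decide that a file is stable is not stated here, so the sketch assumes a file is stable when its size is unchanged after the interval. The snapshot callable, which returns a mapping of file name to size, is a hypothetical stand-in for listing the source folder.

```python
import time
from typing import Callable, Dict, List


def stable_files(
    snapshot: Callable[[], Dict[str, int]],
    interval_seconds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Return the files whose size did not change during the check interval."""
    if not 10 <= interval_seconds <= 300:
        raise ValueError("interval must be between 10 and 300 seconds")
    before = snapshot()          # files that match the pattern right now
    sleep(interval_seconds)      # wait for the stability check interval
    after = snapshot()
    # Assumption: an unchanged size means that nobody is still writing the file.
    return sorted(name for name, size in before.items() if after.get(name) == size)


# Simulated listings: b.csv grows between the two snapshots.
listings = iter([{"a.csv": 10, "b.csv": 5}, {"a.csv": 10, "b.csv": 9}])
print(stable_files(lambda: next(listings), 15, sleep=lambda _: None))
```

In the simulated run only a.csv is returned, because b.csv grew during the interval and would be skipped in the current run.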
Batch Size
The number of files a file ingestion task can transfer in a batch.
Default is 5.
The maximum value of the batch depends on whether the files transfer through an intermediate staging server.
A file ingestion task does not transfer files through an intermediate staging server if the files are transferred from the following source to target endpoints:
  • Amazon S3 to Amazon Redshift, if you choose to transfer files without using intermediate staging.
  • Amazon S3 to Snowflake
When you transfer files using a command line, the file ingestion task transfers files through an intermediate staging server.
Consider the following guidelines when you define a batch size:
  • If files are transferred from the source to target without an intermediate staging server, the maximum number of files you can transfer in a batch is 8000.
  • If files pass through an intermediate staging server, the maximum number of files you can transfer in a batch is 20.
  • If you transfer files from any source to a Snowflake target, the maximum number of files you can transfer in a batch is 1000.
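The guidelines above can be expressed as a small validation helper. The guidelines do not say which limit wins when more than one applies, for example for an Amazon S3 to Snowflake transfer without staging. This sketch assumes that the Snowflake limit takes precedence. The function names and that precedence are assumptions, not documented behavior.

```python
def max_batch_size(uses_staging: bool, target_is_snowflake: bool) -> int:
    """Return the maximum batch size under the guidelines above."""
    if target_is_snowflake:
        return 1000   # assumption: the Snowflake limit overrides the other two
    if uses_staging:
        return 20     # files pass through an intermediate staging server
    return 8000       # direct source-to-target transfer


def validate_batch_size(batch_size: int, uses_staging: bool, target_is_snowflake: bool) -> int:
    """Return batch_size if it is allowed; otherwise raise ValueError."""
    limit = max_batch_size(uses_staging, target_is_snowflake)
    if not 1 <= batch_size <= limit:
        raise ValueError(f"batch size must be between 1 and {limit}")
    return batch_size


print(max_batch_size(uses_staging=True, target_is_snowflake=False))
```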
File Encryption Type
Type of Amazon S3 file encryption to use during file transfer.
Select one of the following options:
  • None. Files are not encrypted during file transfer. Default is None.
  • S3 server-side encryption. Amazon S3 encrypts the file by using AWS-managed encryption keys.
  • S3 client-side encryption. Ensure that unrestricted policies are implemented for the AgentJVM, and that the master symmetric key for the connection is set.
S3 Accelerated Transfer
Select whether to use Amazon S3 Transfer Acceleration on the S3 bucket.
To use Transfer Acceleration, accelerated transfer must be enabled for the bucket. The following options are available:
  • Disabled. Do not use Amazon S3 Transfer Acceleration.
  • Accelerated. Use Amazon S3 Transfer Acceleration.
  • Dualstack Accelerated. Use Amazon S3 Transfer Acceleration on a dual-stack endpoint.
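For orientation, the three options correspond to different bucket endpoint host names in the AWS naming scheme. The sketch below maps each option to the host name format that AWS documents for S3. It does not show how the file ingestion task selects an endpoint internally. The function name, the bucket name, and the default region are hypothetical.

```python
def s3_endpoint_host(bucket: str, option: str, region: str = "us-east-1") -> str:
    """Return the S3 host name that corresponds to an accelerated transfer option."""
    if option == "Disabled":
        # Regular regional virtual-hosted-style endpoint.
        return f"{bucket}.s3.{region}.amazonaws.com"
    if option == "Accelerated":
        return f"{bucket}.s3-accelerate.amazonaws.com"
    if option == "Dualstack Accelerated":
        return f"{bucket}.s3-accelerate.dualstack.amazonaws.com"
    raise ValueError(f"unknown option: {option}")


print(s3_endpoint_host("my-bucket", "Dualstack Accelerated"))
```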
Minimum Download Part Size
Minimum download part size in megabytes when downloading a large file as a set of multiple independent parts.
Multipart Download Threshold
Minimum threshold in megabytes that is used to determine when to download objects in multiple parts in parallel.
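The interaction between the two multipart settings can be sketched as a planning function. It is an illustration under assumptions: an object at or above the threshold is split into parts of exactly the minimum part size, with a shorter final part. A real client may choose larger parts. The function name is hypothetical.

```python
from typing import List, Tuple


def plan_download_parts(
    object_size_mb: int, threshold_mb: int, min_part_size_mb: int
) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in MB for each part of a download."""
    if min_part_size_mb <= 0:
        raise ValueError("part size must be positive")
    if object_size_mb < threshold_mb:
        # Below the threshold: download the object in a single request.
        return [(0, object_size_mb)]
    parts = []
    start = 0
    while start < object_size_mb:
        end = min(start + min_part_size_mb, object_size_mb)
        parts.append((start, end))   # each part can be fetched independently
        start = end
    return parts


print(plan_download_parts(25, threshold_mb=10, min_part_size_mb=10))
```

Under these assumptions, a 25 MB object with a 10 MB threshold and a 10 MB part size is split into three parts, and a 5 MB object is downloaded as one piece.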
After File Pickup
Determines what to do with the source files after the task streams them to the target.
Select one of the following options:
  • Keep the files in the source directory.
  • Delete the files from the source directory.
  • Rename the files in the source directory. You must specify a file name suffix that the file ingestion task adds to the file name when renaming the files. You can choose to suffix the new file name with a timestamp ($timestamp), date ($date), runID ($runId), or time ($time).
  • Archive the files to a different location. You must specify an archive directory, which is an absolute path or a path relative to the source file system.
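The rename option substitutes variables in the suffix and adds the result to the file name. The sketch below shows that substitution with Python's string.Template, which uses the same $name syntax. The date and time formats and the placement of the suffix after the full file name are assumptions. The formats that the product actually produces are not stated here, and the function name is hypothetical.

```python
from datetime import datetime
from string import Template


def renamed_file_name(file_name: str, suffix: str, run_id: int, now: datetime) -> str:
    """Expand the suffix variables and append the suffix to the file name."""
    values = {
        "timestamp": now.strftime("%Y%m%d%H%M%S"),  # assumed format
        "date": now.strftime("%Y%m%d"),             # assumed format
        "time": now.strftime("%H%M%S"),             # assumed format
        "runId": str(run_id),
    }
    # safe_substitute leaves unknown variables untouched instead of raising.
    return file_name + Template(suffix).safe_substitute(values)


print(renamed_file_name("orders.csv", "_$runId", 42, datetime(2024, 1, 2, 3, 4, 5)))
```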
