Hi, I'm Ask INFA!
What would you like to know?
ASK INFAPreview
Please to access Bolo.

Table of Contents

Search

  1. Introducing Mass Ingestion
  2. Getting Started with Mass Ingestion
  3. Connectors and Connections
  4. Mass Ingestion Applications
  5. Mass Ingestion Databases
  6. Mass Ingestion Files
  7. Mass Ingestion Streaming
  8. Monitoring Mass Ingestion Jobs
  9. Asset Management
  10. Troubleshooting

Mass Ingestion

Mass Ingestion

Microsoft SQL Server and Azure SQL Database sources

Microsoft SQL Server and Azure SQL Database sources

To use Microsoft SQL Server or Microsoft Azure SQL Database sources in database ingestion tasks, first prepare the source database and review the usage considerations.

Source preparation

  • For SQL Server sources, verify that you use Microsoft SQL Server Enterprise Edition, Standard Edition, or Developer Edition. Other SQL Server editions, such as SQL Server Express, are not supported.
  • To deploy and run a
    database ingestion
    task that has a SQL Server or Azure SQL Database source, the source connection must specify a database user who has the privileges required to perform an initial or incremental load operation.
    For initial load operations with a SQL Server or Azure SQL Database source, use the following SQL statements to grant VIEW ANY DEFINITION and SELECT privileges to the user:
    use [master] CREATE USER
    user_name
    FOR LOGIN
    login_name
    ; GRANT VIEW ANY DEFINITION TO [
    user_name
    ]; GRANT SELECT TO
    user_name
    ; GO
    For incremental load operations with SQL Server sources, the database user that you specify in the SQL Server source connection must have the sysadmin role, or have the db_owner role and the SELECT permission on the master.sys.fn_dblog function. Use the following SQL statements if you want the user to have the db_owner role:
    use [master] GRANT SELECT ON master.sys.fn_dblog TO
    user_name
    ; GRANT VIEW SERVER STATE TO
    user_name
    ; GRANT VIEW ANY DEFINITION TO
    user_name
    ; GO use [
    source_database
    ] EXEC sp_addrolemember '
    db_owner
    ', '
    user_name
    '; GO
    For SQL Server on-premises instances, the user must have the sysadmin role.
  • For database ingestion incremental load and combined initial and incremental load jobs that have SQL Server sources, you must enable SQL Server Change Data Capture (CDC) on the source database by running the
    sys.sp_cdc_enable_db
    stored procedure in the database context. To run the procedure, you must have the sysadmin role. When SQL Server CDC is enabled, SQL Server writes additional information to the transaction log and CDC tables, which Mass Ingestion Databases uses during incremental load processing.
    Also ensure that SQL Server CDC is enabled on the source tables.
    Mass Ingestion Databases
    cannot enable CDC for tables that contain more than 1019 columns.
    When you create a database ingestion task, you have the option of generating a script that enables CDC on the database and on all columns in the selected source tables.

Usage considerations

  • Mass Ingestion Databases
    supports Amazon Relational Database Service (RDS) for SQL Server sources for initial, incremental, and combined initial and incremental load jobs.
  • Mass Ingestion Databases
    supports cloud Microsoft Azure SQL Database sources for initial loads. For CDC, you must run the Azure SQL Database source instance on a cloud Azure Managed Instance (AMI).
  • Mass Ingestion Databases
    reads change data for incremental load jobs from the SQL Server transaction log or from the enabled SQL Server CDC tables. The change data is read from the transaction log if the required restart point (LSN) is available there. If the restart point pre-dates the active transaction log,
    Mass Ingestion Databases
    automatically transitions to reading the change data from the CDC tables instead. After reading data from the CDC tables,
    Mass Ingestion Databases
    switches back to the transaction log in a transparent manner. The same behavior can occur if you resume an incremental load job that stopped for any reason, or if the transaction log is truncated while capture processing is down.
  • When you enable CDC on the SQL Server database, SQL Server automatically creates a capture job and a cleanup job that will be executed by the SQL Server Agent. The capture job is responsible for populating the SQL Server CDC tables. The cleanup job is responsible for cleaning up records from the CDC tables. The default value for data retention in the CDC table is 72 hours, or 3 days. You can check the current retention period by running the
    sys.sp_cdc_help_jobs
    stored procedure and checking the retention value in the results. If you expect a downtime greater than 3 days, you can adjust the retention in the
    sys.sp_cdc_change_job
    stored procedure or in the SQL Server Agent cleanup job. You can also suspend the cleanup job.
  • Mass Ingestion Databases
    supports SQL Server page compression and row compression of source data.
  • Informatica recommends that each source table have a primary key.
    Mass Ingestion Databases
    does not honor unique indexes in place of a primary key. If no primary key is specified,
    Mass Ingestion Databases
    treats all columns as if they are part of the primary key.
  • Mass Ingestion Databases
    requires read-write access on the source database. If you use SQL Server Always On availability groups, this requirement means that
    Mass Ingestion Databases
    can capture change data from the read-write primary replica but not from the read-only secondary replica.
  • If a Microsoft SQL Server source uses the Always Encrypted method to encrypt column data, the CDC script that is generated from the
    CDC Script
    field on the
    Source
    page in the database ingestion task fails to run. This problem is caused by a SQL Server limitation. This problem does not occur with Transparent Data Encryption (TDE).
  • Mass Ingestion Databases supports schema drift options for Microsoft SQL Server sources in database ingestion incremental load jobs. The following limitations apply:
    • Microsoft SQL Server does not support renaming tables and columns for which Change Data Capture (CDC) is enabled.
    • Microsoft SQL Server does not support changing primary keys for CDC tables.
  • If source table partition changes cause rowset IDs to change, Mass Ingestion Databases can process the changes to enable database ingestion jobs to continue capturing DML changes from the tables.
  • If the
    Mass Ingestion Databases
    log reader encounters a row for an Insert, Delete, or Update operation on the source that is greater than 8000 bytes in size, change data loss occurs with a warning message.
  • Database ingestion initial, incremental, and combined initial and incremental load jobs can replicate data from Microsoft SQL Server large-object columns to Snowflake targets. If an IMAGE or VARBINARY(MAX) column contains more than 8 MB of data, the data is truncated to 8388608 bytes before being written to a BINARY column on the target. If a VARCHAR(MAX), NVARCHAR(MAX), TEXT, NTEXT, or XML column contains more than 16 MB of data, the data is truncated to 16777216 bytes before being written to a VARCHAR column on the target. To replicate data from these large-object columns, you must select
    Include LOBs
    under
    Advanced
    on the
    Source
    page when you configure the task.
  • Mass Ingestion Databases does not replicate data from SQL Server computed columns.
  • Mass Ingestion Databases
    does not support the following SQL Server data types:
    • GEOGRAPHY
    • GEOMETRY
    • HIERARCHYID
    • IMAGE
    Database ingestion
    jobs propagate nulls for columns that have these data types.
    For information about the default mappings of supported data types, see the Data Type Mappings Reference.

0 COMMENTS

We’d like to hear from you!