Mass Ingestion

Back Next

Microsoft SQL Server and Azure SQL Database sources

To use Microsoft SQL Server or Microsoft Azure SQL Database sources in database ingestion tasks, first prepare the source database and review the usage considerations.

Source preparation

For SQL Server sources, verify that you use Microsoft SQL Server Enterprise Edition, Standard Edition, or Developer Edition. Other SQL Server editions, such as SQL Server Express, are not supported.

To deploy and run a

database ingestion

task that has a SQL Server or Azure SQL Database source, the source connection must specify a database user who has the privileges required to perform an initial or incremental load operation.

For initial load operations with a SQL Server or Azure SQL Database source, use the following SQL statements to grant VIEW ANY DEFINITION and SELECT privileges to the user:

use [master]
CREATE USER user_name FOR LOGIN login_name;
GRANT VIEW ANY DEFINITION TO [user_name];
GRANT SELECT TO user_name;
GO

For incremental load operations with SQL Server sources, the database user that you specify in the SQL Server source connection must have the sysadmin role, or have the db_owner role and the SELECT permission on the master.sys.fn_dblog function. Use the following SQL statements if you want the user to have the db_owner role:

use [master]
GRANT SELECT ON master.sys.fn_dblog TO user_name;
GRANT VIEW SERVER STATE TO user_name;
GRANT VIEW ANY DEFINITION TO user_name;
GO
use [source_database]
EXEC sp_addrolemember 'db_owner', 'user_name';
GO

For SQL Server on-premises instances, the user must have the sysadmin role.

For database ingestion incremental load and combined initial and incremental load jobs that have SQL Server sources, you must enable SQL Server Change Data Capture (CDC) on the source database by running the

sys.sp_cdc_enable_db

stored procedure in the database context. To run the procedure, you must have the sysadmin role. When SQL Server CDC is enabled, SQL Server writes additional information to the transaction log and CDC tables, which Mass Ingestion Databases uses during incremental load processing.

Also ensure that SQL Server CDC is enabled on the source tables.

Mass Ingestion Databases

cannot enable CDC for tables that contain more than 1019 columns.

When you create a database ingestion task, you have the option of generating a script that enables CDC on the database and on all columns in the selected source tables.

Usage considerations

Mass Ingestion Databases

supports Amazon Relational Database Service (RDS) for SQL Server sources for initial, incremental, and combined initial and incremental load jobs.

Mass Ingestion Databases

supports cloud Microsoft Azure SQL Database sources for initial loads. For CDC, you must run the Azure SQL Database source instance on a cloud Azure Managed Instance (AMI).

Mass Ingestion Databases

reads change data for incremental load jobs from the SQL Server transaction log or from the enabled SQL Server CDC tables. The change data is read from the transaction log if the required restart point (LSN) is available there. If the restart point pre-dates the active transaction log,

Mass Ingestion Databases

automatically transitions to reading the change data from the CDC tables instead. After reading data from the CDC tables,

Mass Ingestion Databases

switches back to the transaction log in a transparent manner. The same behavior can occur if you resume an incremental load job that stopped for any reason, or if the transaction log is truncated while capture processing is down.

When you enable CDC on the SQL Server database, SQL Server automatically creates a capture job and a cleanup job that will be executed by the SQL Server Agent. The capture job is responsible for populating the SQL Server CDC tables. The cleanup job is responsible for cleaning up records from the CDC tables. The default value for data retention in the CDC table is 72 hours, or 3 days. You can check the current retention period by running the

sys.sp_cdc_help_jobs

stored procedure and checking the retention value in the results. If you expect a downtime greater than 3 days, you can adjust the retention in the

sys.sp_cdc_change_job

stored procedure or in the SQL Server Agent cleanup job. You can also suspend the cleanup job.

Mass Ingestion Databases

supports SQL Server page compression and row compression of source data.

Informatica recommends that each source table have a primary key.

Mass Ingestion Databases

does not honor unique indexes in place of a primary key. If no primary key is specified,

Mass Ingestion Databases

treats all columns as if they are part of the primary key.

Mass Ingestion Databases

requires read-write access on the source database. If you use SQL Server Always On availability groups, this requirement means that

Mass Ingestion Databases

can capture change data from the read-write primary replica but not from the read-only secondary replica.

If a Microsoft SQL Server source uses the Always Encrypted method to encrypt column data, the CDC script that is generated from the

CDC Script

field on the

Source

page in the database ingestion task fails to run. This problem is caused by a SQL Server limitation. This problem does not occur with Transparent Data Encryption (TDE).

Mass Ingestion Databases supports schema drift options for Microsoft SQL Server sources in database ingestion incremental load jobs. The following limitations apply:

Microsoft SQL Server does not support renaming tables and columns for which Change Data Capture (CDC) is enabled.

Microsoft SQL Server does not support changing primary keys for CDC tables.

If source table partition changes cause rowset IDs to change, Mass Ingestion Databases can process the changes to enable database ingestion jobs to continue capturing DML changes from the tables.

If the

Mass Ingestion Databases

log reader encounters a row for an Insert, Delete, or Update operation on the source that is greater than 8000 bytes in size, change data loss occurs with a warning message.

Database ingestion initial, incremental, and combined initial and incremental load jobs can replicate data from Microsoft SQL Server large-object columns to Snowflake targets. If an IMAGE or VARBINARY(MAX) column contains more than 8 MB of data, the data is truncated to 8388608 bytes before being written to a BINARY column on the target. If a VARCHAR(MAX), NVARCHAR(MAX), TEXT, NTEXT, or XML column contains more than 16 MB of data, the data is truncated to 16777216 bytes before being written to a VARCHAR column on the target. To replicate data from these large-object columns, you must select

Include LOBs

under

Advanced

on the

Source

page when you configure the task.

Mass Ingestion Databases does not replicate data from SQL Server computed columns.

Mass Ingestion Databases

does not support the following SQL Server data types:

GEOGRAPHY

GEOMETRY

HIERARCHYID

IMAGE

Database ingestion

jobs propagate nulls for columns that have these data types.

For information about the default mappings of supported data types, see the Data Type Mappings Reference.

Rename Saved Search

Table of Contents

Mass Ingestion

Mass Ingestion

Microsoft SQL Server and Azure SQL Database sources

Microsoft SQL Server and Azure SQL Database sources

Source preparation

Usage considerations