Administrator Guide

10.2
- 10.5.2


Hadoop Connection Properties

Use the Hadoop connection to configure mappings to run on a Hadoop cluster. A Hadoop connection is a cluster type connection. You can create and manage a Hadoop connection in the Administrator tool or the Developer tool, or create one with infacmd. Hadoop connection properties are case sensitive unless otherwise noted.

Hadoop Cluster Properties

The following table describes the general connection properties for the Hadoop connection:

Property	Description
Name	The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
ID	String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description	The description of the connection. Enter a string that you can use to identify the connection. The description cannot exceed 4,000 characters.
Cluster Configuration	The name of the cluster configuration associated with the Hadoop environment.
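
The naming rules for the Name property can be checked before you create a connection. The following Python sketch is illustrative only: the function `validate_connection_name` is a hypothetical helper, not part of any Informatica tool, and it encodes only the limits stated in the table above (128 characters, no spaces, and the listed special characters). It does not check uniqueness within the domain.

```python
# Characters that the Name property does not allow, as listed in the table above.
FORBIDDEN_CHARS = set("~`!$%^&*()-+={[}]|\\:;\"'<,>.?/")
MAX_NAME_LENGTH = 128


def validate_connection_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes.

    Hypothetical helper for illustration. Uniqueness in the domain is not checked.
    """
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if " " in name:
        problems.append("name contains spaces")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains forbidden characters: " + "".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ("hadoop_conn_01", "my conn", "a-b"):
        print(candidate, validate_connection_name(candidate))
```

Because the name is not case sensitive, a uniqueness check layered on top of this sketch would presumably need to compare names after lowercasing them.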

Common Properties

The following table describes the common connection properties that you configure for the Hadoop connection:

Property	Description
Impersonation User Name	Required if the Hadoop cluster uses Kerberos authentication. The Hadoop impersonation user is the user name that the Data Integration Service impersonates to run mappings in the Hadoop environment. The Data Integration Service determines which user runs a mapping in the following order: (1) Operating system profile user. The mapping runs with the operating system profile user if the profile user is configured. (2) Hadoop impersonation user. The mapping runs with the Hadoop impersonation user if the operating system profile user is not configured. (3) Data Integration Service user. The mapping runs with the Data Integration Service user if neither the operating system profile user nor the Hadoop impersonation user is configured.
Temporary Table Compression Codec	Hadoop compression library for a compression codec class name.
Codec Class Name	Codec class name that enables data compression and improves performance on temporary staging tables.
Hive Staging Database Name	Namespace for Hive staging tables. Use the name default for tables that do not have a specified database name.
Hadoop Engine Custom Properties	Custom properties that are unique to the Hadoop connection. You can specify multiple properties. Use the following format: <property1>=<value>. To specify multiple properties, use &: as the property separator. If more than one Hadoop connection is associated with the same cluster configuration, you can override configuration set property values. Use Informatica custom properties only at the request of Informatica Global Customer Support.
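
The precedence described for the Impersonation User Name property can be modeled as a simple fallback chain. The sketch below is a minimal illustration, not product code: the function name `resolve_run_as_user` and its parameter names are hypothetical, and an unconfigured user is assumed to be represented as `None` or an empty string.

```python
from typing import Optional


def resolve_run_as_user(
    os_profile_user: Optional[str],
    impersonation_user: Optional[str],
    dis_user: str,
) -> str:
    """Pick the user for a mapping run, following the order in the table above.

    Hypothetical helper: None or "" stands for "not configured".
    """
    if os_profile_user:
        # 1. The operating system profile user wins when it is configured.
        return os_profile_user
    if impersonation_user:
        # 2. Otherwise, fall back to the Hadoop impersonation user.
        return impersonation_user
    # 3. Otherwise, the mapping runs as the Data Integration Service user.
    return dis_user


if __name__ == "__main__":
    # The user names here are placeholders for illustration.
    print(resolve_run_as_user(None, "hadoop_etl", "infa_dis"))
```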

Reject Directory Properties

The following table describes the connection properties that you configure for the Hadoop reject directory:

Property	Description
Write Reject Files to Hadoop	If you use the Blaze engine to run mappings, select the check box to specify a location for reject files. When the check box is selected, the Data Integration Service moves the reject files to the HDFS location set in the Reject File Directory property. By default, the Data Integration Service stores the reject files based on the RejectDir system parameter.
Reject File Directory	The directory for Hadoop mapping files on HDFS when you run mappings.

Hive Pushdown Configuration

The following table describes the connection properties that you configure to push mapping logic to the Hadoop cluster:

Property	Description
Environment SQL	SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan. The following rules and guidelines apply to the usage of environment SQL: Use the environment SQL to specify Hive queries. Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. The path must be the fully qualified path to the JAR files used for user-defined functions. Set the parameter hive.aux.jars.path with all the entries in infapdo.aux.jars.path and the path to the JAR files for user-defined functions. You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries. If you use multiple values for the environment SQL, ensure that there is no space between the values.
Hive Warehouse Directory	Optional. The absolute HDFS file path of the default database for the warehouse that is local to the cluster. If you do not configure the Hive warehouse directory, the Hive engine first tries to write to the directory specified in the cluster configuration property hive.metastore.warehouse.dir. If the cluster configuration does not have the property, the Hive engine writes to the default directory /user/hive/warehouse.
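
The fallback order for the Hive Warehouse Directory property can be written as a short lookup. This is a sketch under stated assumptions: the function `resolve_warehouse_dir` is hypothetical, the cluster configuration is modeled as a plain dictionary, and an empty value is treated the same as a missing one, which the table does not state explicitly.

```python
from typing import Mapping, Optional

# Default directory named in the table above.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def resolve_warehouse_dir(
    connection_value: Optional[str],
    cluster_config: Mapping[str, str],
) -> str:
    """Return the directory the Hive engine would write to, per the table above.

    Hypothetical helper; empty strings are assumed to mean "not configured".
    """
    if connection_value:
        # 1. The value configured in the Hadoop connection takes priority.
        return connection_value
    # 2. Next, the cluster configuration property, if present.
    from_cluster = cluster_config.get("hive.metastore.warehouse.dir")
    if from_cluster:
        return from_cluster
    # 3. Finally, the default directory.
    return DEFAULT_WAREHOUSE_DIR


if __name__ == "__main__":
    print(resolve_warehouse_dir(None, {}))
```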

Hive Configuration

The following table describes the connection properties that you configure for the Hive engine:

Property	Description
Engine Type	The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. You can choose MRv2 or Tez. You can select Tez if it is configured for the Hadoop cluster. Default is MRv2.

Blaze Configuration

The following table describes the connection properties that you configure for the Blaze engine:

Property	Description
Blaze Staging Directory	The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory. Default is /blaze/workdir. If you clear this property, the staging files are written to the Hadoop staging directory /tmp/blaze_<user name>.
Blaze User Name	The owner of the Blaze service and Blaze service logs. When the Hadoop cluster uses Kerberos authentication, the default user is the Data Integration Service SPN user. When the Hadoop cluster does not use Kerberos authentication and the Blaze user is not configured, the default user is the Data Integration Service user.
Minimum Port	The minimum value for the port number range for the Blaze engine. Default is 12300.
Maximum Port	The maximum value for the port number range for the Blaze engine. Default is 12600.
YARN Queue Name	The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster.
Blaze Job Monitor Address	The host name and port number for the Blaze Job Monitor. Use the following format: <hostname>:<port>, where <hostname> is the host name or IP address of the Blaze Job Monitor server and <port> is the port on which the Blaze Job Monitor listens for remote procedure calls (RPC). For example, enter: myhostname:9080
Blaze Service Custom Properties	Custom properties that are unique to the Blaze engine. To enter multiple properties, separate each name-value pair with &: as the separator. Use Informatica custom properties only at the request of Informatica Global Customer Support.
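
A value for the Blaze Job Monitor Address property can be sanity-checked against the <hostname>:<port> format before you enter it. The following sketch is illustrative: `parse_monitor_address` is a hypothetical helper, the 1 to 65535 port bound is the general TCP port range rather than an Informatica rule, and bracketed IPv6 addresses are not handled.

```python
def parse_monitor_address(address: str) -> tuple[str, int]:
    """Split '<hostname>:<port>' into (hostname, port) or raise ValueError.

    Hypothetical helper for illustration only.
    """
    # Split on the last colon so the port is always the final component.
    host, sep, port_text = address.rpartition(":")
    if not sep or not host or not port_text.isdecimal():
        raise ValueError(f"expected <hostname>:<port>, got {address!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


if __name__ == "__main__":
    # Example value taken from the table above.
    print(parse_monitor_address("myhostname:9080"))
```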

Spark Configuration

The following table describes the connection properties that you configure for the Spark engine:

Property	Description
Spark Staging Directory	The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Data Integration Service user, and mapping impersonation user must have write permission on this directory. By default, the temporary files are written to the Hadoop staging directory /tmp/spark_<user name>.
Spark Event Log Directory	Optional. The HDFS file path of the directory that the Spark engine uses to log events.
YARN Queue Name	The YARN scheduler queue name used by the Spark engine that specifies available resources on a cluster. The name is case sensitive.
Spark Execution Parameters	An optional list of configuration parameters to apply to the Spark engine. You can change the default Spark configuration property values, such as spark.executor.memory or spark.driver.cores. Use the following format: <property1>=<value>. To enter multiple properties, separate each name-value pair with &: as the separator.
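
The <property1>=<value> format with the &: separator, used by the custom property fields on this page, can be assembled and checked programmatically. The sketch below is a minimal illustration: both function names are hypothetical, the values 4G and 2 are placeholders rather than recommendations, and the sketch adds no whitespace around the separator because the source does not say whether whitespace is tolerated.

```python
SEPARATOR = "&:"


def format_custom_properties(properties: dict[str, str]) -> str:
    """Join name-value pairs as '<name>=<value>' separated by '&:'."""
    for name, value in properties.items():
        if not name or "=" in name:
            raise ValueError(f"invalid property name: {name!r}")
        if SEPARATOR in name or SEPARATOR in value:
            raise ValueError(f"{SEPARATOR!r} cannot appear inside {name!r}")
    return SEPARATOR.join(f"{name}={value}" for name, value in properties.items())


def parse_custom_properties(text: str) -> dict[str, str]:
    """Inverse of format_custom_properties, for checking an existing string."""
    result = {}
    for pair in text.split(SEPARATOR):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected <property>=<value>, got {pair!r}")
        result[name] = value
    return result


if __name__ == "__main__":
    # Property names come from the table above; the values are placeholders.
    params = {"spark.executor.memory": "4G", "spark.driver.cores": "2"}
    print(format_custom_properties(params))
```

The resulting string is what you would paste into the Spark Execution Parameters field; the same format applies to the Hadoop Engine Custom Properties and Blaze Service Custom Properties fields.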
