Hash Text
This section provides a comprehensive description of the Hash Text Rule.
For a summary of the rule and its compatibility with Privitar jobs and execution environments, see Masking Rule Types.
Data Types
The supported data types for this rule are:
Text
Description
The original value is completely replaced by a generated SHA256 salted hash of the value and a pepper (secret salt). By default, the hash will be a base 64 string; however, you can provide a regular expression for the hash output.
The rule does not utilize a Token Vault, so there is no support for Unmasking of any output value. As the input value is hashed, there is no way to return to the input value; the rule output is always irreversible.
Consistency in tokenization is achieved by the hashing function. Effectively, the function will always return the same output value for the same input value within a given PDD for a specific rule.
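The consistency property can be illustrated with a minimal sketch. It assumes, purely for illustration, that the pepper is prepended to the value before a single SHA-256 pass and that the result is encoded as standard base 64. This section does not specify how the product combines the value, salt, and pepper, so the helper name `hash_text`, the concatenation order, and the example pepper are all hypothetical.

```python
import base64
import hashlib


def hash_text(value: str, pepper: bytes) -> str:
    """Return a base 64 SHA-256 digest of the pepper followed by the value.

    Hypothetical helper: the concatenation order and the encoding are
    assumptions, not the product's documented construction.
    """
    digest = hashlib.sha256(pepper + value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Illustrative pepper only; in the product the secret is held in the KMS.
PEPPER = b"example-pepper"

first = hash_text("alice@example.com", PEPPER)
second = hash_text("alice@example.com", PEPPER)
other = hash_text("bob@example.com", PEPPER)
rotated = hash_text("alice@example.com", b"a-different-pepper")
```

In this sketch `first` and `second` are identical because the input and the pepper are the same, while `rotated` differs from `first` because the pepper changed. No inverse function exists, which matches the irreversibility described above.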
Note
This rule can only be used in a Job that has a KMS configured in the Privitar Environment. The only KMS that is currently supported is the AWS Secrets Manager. For more information, see Key Management Environment Configuration.
Masking Behavior
The options are described in the following table and assume that the original value is not null:
Option | Description |
---|---|
default | If you do not specify a regular expression, the default output is a (hashed) base 64 string. |
Regular expression | The pattern that the generated text should match. Using a regular expression with the rule ensures that it is possible to add a Watermark to the dataset. It is not possible to add a watermark if the rule is used without a regular expression. For more information, see Watermarking a Dataset. For more information about the regular expression syntax supported in Privitar, see Regular Expression Syntax. (Click on the RegExp class in the Class Summary table.) |
Examples
The following diagram illustrates the output behavior of the rule. The first two examples show how the same input value produces the same output value. The final example shows how the output value can be changed using a regular expression:

Here are some other examples of regular expressions that could be used to match some example fields and formats:
Field | Format | Expression |
---|---|---|
Email address | xxxxxxx@xxxxx.com | [a-z]{7}\@[a-z]{5}\.com |
Surname | xxxxxxxx | [a-z]{8} |
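Before configuring the rule, it can help to check the expressions in the table against sample values. The sketch below uses Python's `re` module, which is an assumption made for illustration: the regular expression syntax supported in Privitar may differ in detail from Python's. The helper `matches_format` and the sample tokens are hypothetical.

```python
import re

# Expressions copied from the table above.
PATTERNS = {
    "Email address": r"[a-z]{7}\@[a-z]{5}\.com",
    "Surname": r"[a-z]{8}",
}


def matches_format(field: str, token: str) -> bool:
    """Return True if the whole token matches the expression for the field."""
    return re.fullmatch(PATTERNS[field], token) is not None


email_ok = matches_format("Email address", "abcdefg@hijkl.com")
surname_ok = matches_format("Surname", "abcdefgh")
surname_too_short = matches_format("Surname", "abc")
```

Here `email_ok` and `surname_ok` are true, and `surname_too_short` is false because the token has fewer than eight characters.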
Tokenization Behavior
Tokenization Behavior contains various settings that determine how tokenization is performed when the rule is applied to a dataset.
For the Hash text rule, the Behavior setting is fixed as:
Consistency enforced by hashing function but duplicate tokens are possible
This means that the hashing function ensures that the same input value always returns the same output value. However, there is an unlikely possibility that duplicate tokens are generated, that is, that two different input values produce the same output value. (The collision resistance of the SHA256 hashing algorithm is discussed in many external publications.)
Collisions are much more likely if a regular expression is specified with the rule, for example when the regular expression defines an output space that is smaller than the default hash output.
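The effect of a small output space can be estimated with the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of distinct input values and N is the number of possible outputs. The sketch below assumes that outputs are spread uniformly over all strings matching the expression, which is an idealization; the function name and the figure of one million inputs are illustrative only.

```python
import math


def collision_probability(num_inputs: int, output_space: int) -> float:
    """Approximate the chance of at least one collision (birthday bound)."""
    exponent = -num_inputs * (num_inputs - 1) / (2 * output_space)
    # -expm1(x) equals 1 - exp(x) and stays accurate for tiny probabilities.
    return -math.expm1(exponent)


surname_space = 26 ** 8   # number of strings matching [a-z]{8}
sha256_space = 2 ** 256   # number of possible SHA-256 digests

p_regex = collision_probability(1_000_000, surname_space)
p_default = collision_probability(1_000_000, sha256_space)
```

Under these assumptions, one million distinct inputs mapped onto `[a-z]{8}` give a collision probability of about 0.91, whereas the same number of inputs over the full SHA-256 output space gives a probability that is negligibly small.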
If Retain NULL values is checked, NULL values in the input will not be replaced or tokenized and will be retained as NULL in the output.
Hash Text Environment Requirements
The Hash Text rule can only be used in environments that have a KMS configured. At the time of writing (4.3 release), the only supported KMS type that is compatible with the Hash Text rule is AWS Secrets Manager (see the KMS configuration guide, Key Management Environment Configuration).
The rule does not require the user to specify any key details, because the key is created automatically on the first execution of any Hash Text rule in a given environment. The Privitar Platform does not support rotation or deletion of the key (even if the environment is deleted or the KMS type is changed). If the user deletes or rotates the key directly in the KMS, any newly processed data will be inconsistent with previously tokenized data.
Required Permissions for Hash Text Rule
For the Hash Text rule, all communication with the KMS (in this case AWS Secrets Manager) is performed on the execution engine (POD, Hadoop Batch Processor, SDK, or other dataflow processor) using the AWS SDK, which uses the configured Region and Endpoint to connect to AWS Secrets Manager.
The Hash Text rule requires both read and write access to AWS Secrets Manager, because it creates the secret on first use if it is missing. AWS has a managed policy for Secrets Manager that grants the necessary permissions; however, a customer might choose another policy or create their own, in which case that policy must grant the necessary permissions for the following AWS Secrets Manager actions:
CreateSecret - see Minimum permissions required (we always require the secretsmanager:TagResource permission as we tag the secret with the key algorithm)
If a customer-managed CMK is used, access to AWS KMS must also be granted for the following actions:
GenerateDataKey - needed only if you use a customer-managed AWS KMS key to encrypt the secret. You do not need this permission to use the account default AWS managed CMK for Secrets Manager.
Decrypt - needed only if you use a customer-managed AWS KMS key to encrypt the secret. You do not need this permission to use the account default AWS managed CMK for Secrets Manager.
By default, the Secret created by the Hash Text rule is accessible as determined by the existing IAM policies. It is advisable to restrict access to a specific role. This can be done manually by attaching an AWS Resource Policy to the Secret, as described in the AWS guide.
To restrict access to just RoleA in account 111111111111, the Resource Policy would have the form:
{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Deny",
      "Principal" : "*",
      "Action" : "secretsmanager:*",
      "Resource" : "*",
      "Condition" : {
        "ArnNotLike" : {
          "aws:PrincipalArn" : "arn:aws:iam::111111111111:role/RoleA"
        }
      }
    }
  ]
}
The user updating the policy must assume the role to which access is being restricted; otherwise they would lock themselves out, which AWS prevents. If they cannot assume this role, they should either obtain temporary access in order to update the resource policy, or the resource policy should include the user, in a form similar to the one below. The choice depends on the individual use case requirements.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalARN": [
            "<ARN-OF-ALLOWED-ROLE>",
            "<ARN-OF-ALLOWED-USER>"
          ]
        }
      }
    }
  ]
}
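A policy of this shape can also be generated programmatically, which avoids hand-editing JSON when the list of allowed principals changes. The following is a minimal sketch, not part of the product: the function name `build_deny_policy` and the example user ARN are hypothetical, and the sketch scopes the denied actions to `secretsmanager:*`, as in the first policy above.

```python
import json
from typing import List


def build_deny_policy(allowed_arns: List[str]) -> str:
    """Return a resource policy that denies every principal not listed."""
    if not allowed_arns:
        # An empty allow-list would deny all principals, including the caller.
        raise ValueError("at least one allowed principal ARN is required")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "secretsmanager:*",
                "Resource": "*",
                "Condition": {
                    "StringNotLike": {"aws:PrincipalArn": allowed_arns}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


# RoleA comes from the example above; the user ARN is a made-up placeholder.
policy_json = build_deny_policy(
    [
        "arn:aws:iam::111111111111:role/RoleA",
        "arn:aws:iam::111111111111:user/ExampleUser",
    ]
)
```

The resulting string could then be attached to the secret, for example with the Secrets Manager PutResourcePolicy operation, by a principal that appears in the allowed list.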
Glue Environments
Glue deployments have permission to create secrets. The first time a Hash Text rule is used, the deployment creates a secret for that rule.
The following resource policy is automatically attached to the secret created.
{
  "Version": "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Deny",
      "Principal": "*",
      "NotAction" : [
        "secretsmanager:DeleteSecret",
        "secretsmanager:RestoreSecret",
        "secretsmanager:GetResourcePolicy"
      ],
      "Resource" : "*",
      "Condition" : {
        "ArnNotLike" : {
          "aws:PrincipalArn" : "<Glue job IAM role arn>"
        }
      }
    }
  ]
}
This denies access to all principals except the IAM role that the Glue job uses. To allow the secret to be deleted in the future, it doesn't deny DeleteSecret permissions. Note that the secret would have to be deleted via the AWS CLI command below, because deletion from the AWS console UI requires DescribeSecret permissions.
aws secretsmanager delete-secret --secret-id <secret name> --force-delete-without-recovery