Masking Rules
Masking Rules apply a de-identification transformation to each column in the input Schema. The supported types of transformation include:
Passing through a column unchanged.
Removing a column entirely.
Generating artificial token values based on user-specified patterns.
Masking data values by truncation (clipping), substitution or encryption.
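As a rough illustration of these behaviors, the sketch below applies one rule per column to a single record. It is not Privitar's API: the names retain, drop, clip, substitute and apply_rules, and the sample record, are all hypothetical, and token generation and encryption are left out to keep the example short.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Rule = Callable[[Any], Any]

# Sentinel a rule returns to signal that its column should be removed.
DROP = object()


def retain(value: Any) -> Any:
    """Pass the value through unchanged."""
    return value


def drop(value: Any) -> Any:
    """Remove the column entirely."""
    return DROP


def clip(keep: int) -> Rule:
    """Build a rule that truncates the text form of a value to `keep` characters."""
    def rule(value: Any) -> str:
        return str(value)[:keep]
    return rule


def substitute(mapping: Dict[Any, Any], default: Any = "OTHER") -> Rule:
    """Build a rule that swaps a value for its replacement in a lookup table."""
    def rule(value: Any) -> Any:
        return mapping.get(value, default)
    return rule


def apply_rules(record: Record, rules: Dict[str, Rule]) -> Record:
    """Apply exactly one rule to every column; a missing rule is an error."""
    masked: Record = {}
    for column, value in record.items():
        result = rules[column](value)  # KeyError if a column has no rule
        if result is not DROP:
            masked[column] = result
    return masked


record = {"name": "Ada Lovelace", "postcode": "SW1A 1AA", "city": "London", "visits": 3}
rules = {
    "name": drop,
    "postcode": clip(4),
    "city": substitute({"London": "City A"}),
    "visits": retain,
}
masked = apply_rules(record, rules)
```

Requiring an explicit rule for every column, rather than defaulting to pass-through, is a design choice of this sketch: it means a newly added column cannot leak through unmasked by accident.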
To decide which behavior is most appropriate, consider the input Schema and determine:
Which columns are direct identifiers.
Which columns are quasi-identifiers or indirect identifiers that may allow linkage attacks when combined with data from other datasets.
Which columns are sensitive (to your organization or the data subjects in the dataset).
What the use cases for the data are after de-identification (for example, whether any columns must be masked consistently or may need to be unmasked later on).
This exercise gives a good indication of where masking should be applied.
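One way to record the outcome of this exercise is a small per-column assessment, sketched below. The Category and Assessment types, the example schema, and the assumption that every column other than a non-sensitive one needs some masking are illustrative only and are not part of Privitar.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Category(Enum):
    DIRECT_IDENTIFIER = "direct identifier"
    QUASI_IDENTIFIER = "quasi-identifier"
    SENSITIVE = "sensitive"
    NON_SENSITIVE = "non-sensitive"


@dataclass(frozen=True)
class Assessment:
    category: Category
    consistent: bool = False  # the same input must always give the same output
    unmaskable: bool = False  # the original value may need to be recovered later


def needs_masking(assessment: Assessment) -> bool:
    """Assumption of this sketch: only non-sensitive columns are left alone."""
    return assessment.category is not Category.NON_SENSITIVE


# Hypothetical schema assessment.
schema: Dict[str, Assessment] = {
    "customer_id": Assessment(Category.DIRECT_IDENTIFIER, consistent=True, unmaskable=True),
    "email": Assessment(Category.DIRECT_IDENTIFIER),
    "postcode": Assessment(Category.QUASI_IDENTIFIER),
    "diagnosis": Assessment(Category.SENSITIVE),
    "visit_count": Assessment(Category.NON_SENSITIVE),
}

to_mask = sorted(name for name, a in schema.items() if needs_masking(a))
to_keep_consistent = sorted(name for name, a in schema.items() if a.consistent)
```

Here to_mask lists the columns that need a masking decision, and to_keep_consistent lists those whose downstream use cases require the same input to map to the same output.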
In general, it is desirable to remove all identifying information from a dataset. Replacing identifier columns with randomly generated tokens (for example, using a Regular Expression Text Generator Rule) eliminates some of the privacy risk.
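The idea of pattern-based random tokens can be sketched as follows. This is not the Regular Expression Text Generator Rule itself: instead of parsing a regular expression, the hypothetical generate_token function takes a list of (alphabet, count) pairs, and the CUSTOMER_ID pattern is an invented example roughly equivalent to C[A-Z]{2}-[0-9]{6}.

```python
import secrets
import string
from typing import List, Tuple

# Each part is (characters to choose from, how many characters to emit).
Pattern = List[Tuple[str, int]]


def generate_token(parts: Pattern) -> str:
    """Build a random token by drawing `count` characters from each alphabet in turn."""
    return "".join(
        secrets.choice(alphabet)
        for alphabet, count in parts
        for _ in range(count)
    )


# Hypothetical pattern: a literal "C", two uppercase letters, a hyphen, six digits.
CUSTOMER_ID: Pattern = [
    ("C", 1),
    (string.ascii_uppercase, 2),
    ("-", 1),
    (string.digits, 6),
]

token = generate_token(CUSTOMER_ID)
```

Because each token is drawn independently of the original value, the output carries no information about the identifier it replaces. For the same reason, the same input will generally receive a different token each time in this sketch, so it is not suitable as-is where consistent masking is needed.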
Privitar preserves input data types during and after processing. The exception is the Encrypt Rule, which always outputs Text regardless of the input data type. Privitar also supports Rule configurations that retain or closely resemble the original data format after masking (for example, for credit card numbers, email addresses, dates and IDs).
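To make type and format preservation concrete, the sketch below masks a date so that the result is still a date, and masks a card number so that its length and separators are unchanged. The functions truncate_to_month and randomize_digits are hypothetical illustrations, not Privitar Rules.

```python
import secrets
from datetime import date

DIGITS = "0123456789"


def truncate_to_month(value: date) -> date:
    """Date in, date out: only the day component is generalized."""
    return value.replace(day=1)


def randomize_digits(value: str) -> str:
    """Replace every ASCII digit with a random digit; keep length and separators."""
    return "".join(
        secrets.choice(DIGITS) if ch in DIGITS else ch
        for ch in value
    )


masked_date = truncate_to_month(date(1984, 7, 23))
masked_card = randomize_digits("4111-1111-1111-1111")
```

The masked card number has the same shape as the original, so downstream systems that validate only the layout will still accept it. A value randomized this way will generally not pass a checksum such as the Luhn check, which is one reason a real format-preserving configuration needs more care than this sketch.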