
User Guide

Watermarking a Dataset

The watermarking capability of the Privitar platform embeds a unique stamp in the content of files written to a Protected Data Domain (PDD). The stamp can be used to trace a file back to the PDD for which it was originally produced. A watermark gives recipients an incentive not to handle the data carelessly or maliciously. It also provides traceability if a data breach occurs or the data turns up somewhere unexpected.

The stamp is carried within the tokenized values used for de-identification. Given an arbitrary file, Privitar can analyze its content to identify which PDD, if any, the file belongs to. Once the PDD is located, its metadata properties are available, such as the file's intended purpose, who authorized the release of the data, and the Policy under which it was released.
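This guide does not describe how Privitar encodes the stamp. To illustrate the general idea of attributing a file to a PDD from its tokens, here is a toy keyed statistical watermark. It is a swapped-in technique, not Privitar's algorithm. Each hypothetical PDD holds a secret key. Tokens are drawn at random, but only tokens whose keyed HMAC bit is 0 are kept. A file is attributed to the PDD whose key explains nearly all of its tokens. The names `watermarked_token`, `mark_score`, `identify_pdd`, the keys, and the 0.9 threshold are all hypothetical. In a scheme like this, detection is statistical, so it becomes reliable only with many values.

```python
from __future__ import annotations

import hashlib
import hmac
import random
import string
from typing import Optional

ALPHABET = string.ascii_uppercase + string.digits


def _mark_bit(key: bytes, token: str) -> int:
    """Keyed bit of a token: the lowest bit of HMAC-SHA256(key, token)."""
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).digest()
    return digest[-1] & 1


def watermarked_token(key: bytes, rng: random.Random, length: int = 8) -> str:
    """Toy tokenizer: draw random tokens until one has keyed bit 0 (about 2 draws on average)."""
    while True:
        token = "".join(rng.choice(ALPHABET) for _ in range(length))
        if _mark_bit(key, token) == 0:
            return token


def mark_score(key: bytes, tokens: list[str]) -> float:
    """Fraction of tokens whose keyed bit is 0: near 1.0 for the issuing key, near 0.5 otherwise."""
    if not tokens:
        return 0.0
    return sum(_mark_bit(key, t) == 0 for t in tokens) / len(tokens)


def identify_pdd(pdd_keys: dict[str, bytes], tokens: list[str], threshold: float = 0.9) -> Optional[str]:
    """Return the PDD whose key best explains the tokens, or None if no key reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, key in pdd_keys.items():
        score = mark_score(key, tokens)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

For example, 1,000 tokens produced under the hypothetical `pdd-research` key score 1.0 for that key and roughly 0.5 for any other key. `identify_pdd` therefore attributes them to `pdd-research`. Unmarked random tokens return `None`. With only a handful of values, an unrelated key could clear the threshold by chance. That is why a scheme of this kind needs a large number of values.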

Conditions for Watermarking

To embed a watermark, the following conditions must be met:

  • The Policy must include at least one Text Regular Expression rule.

  • The PDD must have been created with the Embed Watermarks option selected. This option is available in the PDD creation dialog.

  • Because of the way the watermark is embedded, the file must have enough rows to contain it. As a rule of thumb, files with more than 15,000 rows can contain watermarks.

Watermarks can be embedded in the output data of both Data Flow Jobs and Batch Jobs.

If these conditions are met, every column mapped to a Text Regular Expression rule will contain a watermark when a Job for that Policy is run in the PDD.
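The conditions above can be expressed as a pre-flight check. The following is a minimal sketch, assuming a Policy can be represented as a plain mapping from column name to rule name. `Pdd`, `watermark_problems`, `watermarked_columns`, and the `"Other rule"` label are hypothetical and are not part of any Privitar API. The 15,000-row figure is the guide's rule of thumb, so the check reports it as a possible problem rather than a hard limit.

```python
from __future__ import annotations

from dataclasses import dataclass

REGEX_RULE = "Text Regular Expression"
MIN_ROWS = 15_000  # guide's rule of thumb: files with more than 15,000 rows


@dataclass
class Pdd:
    """Hypothetical stand-in for a Protected Data Domain."""
    name: str
    embed_watermarks: bool  # the Embed Watermarks option chosen at PDD creation


def watermark_problems(column_rules: dict[str, str], pdd: Pdd, row_count: int) -> list[str]:
    """List the reasons a watermark may not be embedded (an empty list means all conditions hold)."""
    problems = []
    if REGEX_RULE not in column_rules.values():
        problems.append("Policy has no Text Regular Expression rule")
    if not pdd.embed_watermarks:
        problems.append(f"PDD {pdd.name!r} was created without Embed Watermarks")
    if row_count <= MIN_ROWS:
        problems.append(f"{row_count} rows may be too few (rule of thumb: more than {MIN_ROWS})")
    return problems


def watermarked_columns(column_rules: dict[str, str], pdd: Pdd, row_count: int) -> list[str]:
    """Columns expected to carry a watermark: those mapped to a Text Regular Expression rule."""
    if watermark_problems(column_rules, pdd, row_count):
        return []
    return [col for col, rule in column_rules.items() if rule == REGEX_RULE]


rules = {"email": REGEX_RULE, "postcode": REGEX_RULE, "age": "Other rule"}
pdd = Pdd("research", embed_watermarks=True)
print(watermarked_columns(rules, pdd, row_count=50_000))  # ['email', 'postcode']
```

Returning a list of problems, rather than a single boolean, lets a caller see every unmet condition at once before running a Data Flow Job or Batch Job.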