User Guide

Batch Job Output Visualisations (HDFS and Hive)

Privitar produces four visualisations of output data. They are available from the Job Details section of a Job. Select Manual Generalization or Automatic Generalization to see the charts.

  • Distortion Measures

  • Cluster Size Histogram

  • Cluster Size Bubble Chart

  • Distortion Histogram

Distortion Measures

Privitar calculates a distortion measure for each quasi-identifier. This is the average error between the original data and the generalized output.

This value gives a high-level summary of the trade-off between utility and privacy that results from the generalization process.
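As a rough illustration of what a per-column distortion figure of this kind might look like, the sketch below computes a mean absolute error for one numeric quasi-identifier. It is not Privitar's implementation: the function name, the sample `age` values, and the choice to represent each generalized value by the midpoint of its range are all assumptions made for the example.

```python
def mean_absolute_error(original, generalized):
    """Return the mean of |original - generalized| over paired records."""
    if len(original) != len(generalized):
        raise ValueError("inputs must have the same length")
    if not original:
        raise ValueError("inputs must not be empty")
    total = sum(abs(o - g) for o, g in zip(original, generalized))
    return total / len(original)


# Hypothetical 'age' quasi-identifier. Each generalized value is assumed to be
# the midpoint of the range that the record was generalized into
# (for example, 20-30 becomes 25.0).
ages_original = [23, 27, 31, 38]
ages_generalized = [25.0, 25.0, 35.0, 35.0]

distortion = mean_absolute_error(ages_original, ages_generalized)
print(distortion)  # 2.75
```

A larger value means the generalized column has moved further from the original, which usually indicates more privacy protection at the cost of utility.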

Cluster Size Histogram

The Cluster Size Histogram visualises the distribution of cluster sizes in the output.

Clusters shown in grey do not meet the minimum cluster size threshold. For more information about cluster sizes, see What is k-anonymity?
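The data behind a chart like this can be sketched as follows, assuming that a cluster is the set of output records sharing the same combination of quasi-identifier values and that the minimum cluster size threshold is k. The function name, field names, and sample records are hypothetical and are not taken from Privitar.

```python
from collections import Counter


def cluster_size_histogram(records, quasi_identifiers, k):
    """Count how many clusters exist at each cluster size.

    A cluster is taken to be all records with identical values for the
    given quasi-identifiers. Sizes below k are flagged, mirroring the
    clusters that the chart shows in grey.
    """
    clusters = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    size_counts = Counter(clusters.values())
    return {
        size: {"clusters": count, "below_threshold": size < k}
        for size, count in sorted(size_counts.items())
    }


# Hypothetical generalized output with two quasi-identifiers.
records = [
    {"age": "20-29", "postcode": "AB1"},
    {"age": "20-29", "postcode": "AB1"},
    {"age": "20-29", "postcode": "AB1"},
    {"age": "30-39", "postcode": "AB1"},
    {"age": "30-39", "postcode": "CD2"},
    {"age": "30-39", "postcode": "CD2"},
]

histogram = cluster_size_histogram(records, ["age", "postcode"], k=2)
for size, info in histogram.items():
    print(size, info)
```

In this sample, the single record with age 30-39 and postcode AB1 forms a cluster of size 1, which falls below k = 2 and would be the one drawn in grey.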

Cluster Size Bubble Chart

The Cluster Size Bubble Chart visualises the relative sizes and counts of clusters in the output.

Clusters shown in grey do not meet the minimum cluster size threshold. For more information about cluster sizes, see What is k-anonymity?

Distortion Histogram

The Distortion Histogram compares the distributions of the original data and the generalized output data.

The original data is shown as a blue line; the generalized output is shown in grey.