User Guide

Creating Hierarchical Schemas

The Privitar Platform supports the processing of complex types through the Automation API, for example, data in Avro or JSON format in Kafka messages. (Complex types are not supported in the user interface.)

The following table lists the support levels for complex types for each Privitar Job type:

Job Type     Supported        Data Platforms

Batch        Not supported

Data Flow    Yes              NiFi, Kafka, StreamSets (also the Privitar SDK)

POD          Yes

Supporting these types requires a Hierarchical Schema. Once a Hierarchical Schema has been created, a Policy can be created to describe the transformations required for the primitive data values. The process of creating a Policy is identical to that for a non-hierarchical Schema.

The Automation API can be used to create Schemas in which the fields are arranged in a hierarchy that maps onto complex types. When such data is de-identified using a Policy, Rules are applied in place to the primitive components of the complex types, producing de-identified output that retains the original structure.

For example, consider the following input record, shown in JSON syntax:

    {
        "customer": {
            "name": "John Smith",
            "addresses": [
                {
                    "address": "4 Cherry Tree Dr, London",
                    "postal-code": "W12 8PQ"
                },
                {
                    "address": "18B Willow Road, Liverpool",
                    "postal-code": "L7 7GH"
                }
            ]
        }
    }

To process this input, the Automation API should be used to create a Hierarchical Schema containing the following fields and paths:

  • customer > name

  • customer > addresses > address

  • customer > addresses > postal-code
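The mapping from a nested record to the field paths listed above can be sketched in Python. This is only an illustration of how the paths relate to the data: the helper name `leaf_paths` is hypothetical, it is not part of the Automation API, and it assumes that every element of an array shares the path of the array itself.

```python
def leaf_paths(value, prefix=()):
    """Return the unique paths to primitive values, in order of first appearance.

    Hypothetical helper for illustration only; not a Privitar API.
    Array elements do not add a path component of their own.
    """
    paths = []
    if isinstance(value, dict):
        children = [(child, prefix + (key,)) for key, child in value.items()]
    elif isinstance(value, list):
        children = [(item, prefix) for item in value]
    else:
        # A primitive value: the path built so far identifies the field.
        return [prefix]
    for child, child_prefix in children:
        for path in leaf_paths(child, child_prefix):
            if path not in paths:
                paths.append(path)
    return paths


record = {
    "customer": {
        "name": "John Smith",
        "addresses": [
            {"address": "4 Cherry Tree Dr, London", "postal-code": "W12 8PQ"},
            {"address": "18B Willow Road, Liverpool", "postal-code": "L7 7GH"},
        ],
    }
}

field_paths = [" > ".join(path) for path in leaf_paths(record)]
for field_path in field_paths:
    print(field_path)
```

Both address entries contribute the same two paths, so each path appears once in the result, matching the three fields of the Hierarchical Schema.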

A Policy can then be defined, using either the user interface or the Automation API, to describe the required de-identification. For example:

  • (TOKENIZE) customer > name

  • (DROP) customer > addresses > address

  • (CLIP) customer > addresses > postal-code

Finally, applying the Policy using Privitar produces a structured de-identified output:

    {
        "customer": {
            "name": "Xjdhgidkkidhg",
            "addresses": [
                {
                    "postal-code": "W12"
                },
                {
                    "postal-code": "L7"
                }
            ]
        }
    }
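The effect of applying Rules to the primitive values while keeping the structure can be sketched in Python. This is a minimal illustration, not Privitar's implementation: `tokenize`, `clip`, `drop`, and `apply_rules` are hypothetical names, the hash-based token merely stands in for a real tokenization Rule, and clipping is assumed to keep the text before the first space. Unlike the in-place processing described above, the sketch builds a new structure rather than modifying the input.

```python
import hashlib

# Sentinel returned by drop() so the caller can omit the field entirely.
DROPPED = object()


def tokenize(value):
    # Stand-in for a tokenization Rule: a truncated SHA-256 hex digest.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def clip(value):
    # Assumption: keep only the text before the first space.
    return value.split(" ", 1)[0]


def drop(value):
    return DROPPED


def apply_rules(value, rules, path=()):
    """Return a copy of value with rules applied to the primitives at each path."""
    if isinstance(value, dict):
        result = {}
        for key, child in value.items():
            new_child = apply_rules(child, rules, path + (key,))
            if new_child is not DROPPED:
                result[key] = new_child
        return result
    if isinstance(value, list):
        # Array elements share the path of the array itself.
        items = (apply_rules(item, rules, path) for item in value)
        return [item for item in items if item is not DROPPED]
    rule = rules.get(path)
    return rule(value) if rule else value


rules = {
    ("customer", "name"): tokenize,
    ("customer", "addresses", "address"): drop,
    ("customer", "addresses", "postal-code"): clip,
}

record = {
    "customer": {
        "name": "John Smith",
        "addresses": [
            {"address": "4 Cherry Tree Dr, London", "postal-code": "W12 8PQ"},
            {"address": "18B Willow Road, Liverpool", "postal-code": "L7 7GH"},
        ],
    }
}

deidentified = apply_rules(record, rules)
print(deidentified)
```

The output keeps the nesting of `customer` and `addresses`, omits each `address` field, and reduces the postal codes to `W12` and `L7`; the tokenized name differs from the example above because the token function here is only a placeholder.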