Creating Hierarchical Schemas
The Privitar Platform supports the processing of complex types through the Automation API, for example data in Avro or JSON format carried in Kafka messages. (Complex types are not supported in the user-interface.)
The following table lists the support levels for complex types for each Privitar Job type:
Job Type | Supported | Data Platforms |
---|---|---|
Batch | No | |
Data Flow | Yes | NiFi, Kafka, StreamSets (also the Privitar SDK) |
POD | Yes | |
Supporting these types requires a Hierarchical Schema. Once a Hierarchical Schema has been created, a Policy can be created to describe the required transformations of the primitive data values. The process of creating a Policy is identical to the process for a non-Hierarchical Schema.
The Automation API can be used to create Schemas in which the fields are arranged in a hierarchy that maps onto the complex types. When such data is de-identified using a Policy, Rules are applied in place to the primitive components of the complex types, producing a de-identified output that retains the original structure.
For example, consider the following input record, shown in JSON-like syntax:
    customer: {
      name: "John Smith",
      addresses : [
        { address: "4 Cherry Tree Dr, London", postal-code: "W12 8PQ" },
        { address: "18B Willow Road, Liverpool", postal-code: "L7 7GH" }
      ]
    }
To process this input, the Automation API should be used to create a Hierarchical Schema containing the following fields and paths:
customer > name
customer > addresses > address
customer > addresses > postal-code
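The mapping from a nested record to these field paths can be illustrated with a short Python sketch. This is not part of the Privitar Automation API; the function name `leaf_paths` and the treatment of array elements (all elements share the path of the array field) are assumptions made for illustration, based on the paths listed above.

```python
from typing import Any, List, Tuple


def leaf_paths(value: Any, prefix: Tuple[str, ...] = ()) -> List[str]:
    """Collect the distinct paths to primitive values in a nested record.

    Hypothetical helper for illustration only, not a Privitar API call.
    """
    paths: List[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            paths.extend(leaf_paths(child, prefix + (key,)))
    elif isinstance(value, list):
        # Assumption: every element of an array shares the array field's path.
        for item in value:
            paths.extend(leaf_paths(item, prefix))
    else:
        # A primitive value: its path is the chain of field names leading to it.
        paths.append(" > ".join(prefix))
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(paths))


record = {
    "customer": {
        "name": "John Smith",
        "addresses": [
            {"address": "4 Cherry Tree Dr, London", "postal-code": "W12 8PQ"},
            {"address": "18B Willow Road, Liverpool", "postal-code": "L7 7GH"},
        ],
    }
}

for path in leaf_paths(record):
    print(path)
```

Run against the example record, the sketch prints the three paths listed above, one per line.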
A Policy can then be defined, using either the user-interface or the Automation API, to describe the required de-identification. For example:
TOKENIZE: customer > name
DROP: customer > addresses > address
CLIP: customer > addresses > postal-code
Finally, applying the Policy using Privitar produces a structured de-identified output:
    customer: {
      name: "Xjdhgidkkidhg",
      addresses : [
        { postal-code: "W12" },
        { postal-code: "L7" }
      ]
    }
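The way Rules act on primitive values while the surrounding structure is preserved can be sketched in Python as follows. This is a conceptual illustration only and does not reflect how Privitar implements its Rules. The helpers `tokenize`, `clip`, `drop` and `apply_policy` are hypothetical: tokenization is stood in for by a truncated SHA-256 hash, and clipping is assumed to keep the text before the first space, which matches the postal codes in the example output.

```python
import hashlib
from typing import Any, Callable, Dict, Tuple

Path = Tuple[str, ...]

# Sentinel returned by a rule to signal that the field should be removed.
DROP = object()


def tokenize(value: str) -> str:
    # Hypothetical stand-in for tokenization: a truncated, deterministic hash.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def clip(value: str) -> str:
    # Assumption: clipping keeps only the text before the first space.
    return value.split(" ", 1)[0]


def drop(value: Any) -> Any:
    return DROP


# The example Policy, keyed by field path.
POLICY: Dict[Path, Callable[[Any], Any]] = {
    ("customer", "name"): tokenize,
    ("customer", "addresses", "address"): drop,
    ("customer", "addresses", "postal-code"): clip,
}


def apply_policy(value: Any, policy: Dict[Path, Callable[[Any], Any]],
                 path: Path = ()) -> Any:
    """Return a de-identified copy of `value` with the same nesting."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            result = apply_policy(child, policy, path + (key,))
            if result is not DROP:
                out[key] = result
        return out
    if isinstance(value, list):
        # Array elements share the path of the array field itself.
        results = (apply_policy(item, policy, path) for item in value)
        return [r for r in results if r is not DROP]
    rule = policy.get(path)
    # Primitive values with no matching rule pass through unchanged.
    return rule(value) if rule else value


record = {
    "customer": {
        "name": "John Smith",
        "addresses": [
            {"address": "4 Cherry Tree Dr, London", "postal-code": "W12 8PQ"},
            {"address": "18B Willow Road, Liverpool", "postal-code": "L7 7GH"},
        ],
    }
}

result = apply_policy(record, POLICY)
print(result["customer"]["addresses"])
```

The sketch builds a new structure rather than mutating the input, so the original record is left untouched; each rule still lands at the same position in the hierarchy as the value it replaces.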