Glossary of Data Security Terminology
This glossary defines terms that relate to the Privitar Data Security Platform.
Glossary
A
- access control policy
An access control policy is a reusable set of access control rules that serves a business context. An access control policy is a flexible construct that allows you to apply access control rules according to desired conditions. For example, you can write access control policies to define rules that examine and drop rows (records) according to the business condition and the actual data in those records.
- access control rule
Access control rules act at the field level. They examine the actual data and discard queried (requested) records according to the rule’s conditions.
- access request
See project request.
- asset
Assets are data structures; for example, the tables in an Oracle® or PostgreSQL database.
- asset registration request
An asset registration request is an inquiry made by a data owner to add a data asset (a database table, for example) to a dataset. A data guardian approves or denies asset registration requests.
- attribute-based access control (ABAC)
Attribute-based access controls (ABACs) are conditional policies and rules that regulate how users access fields or rows, based on specific attributes, such as location, terms, and tags.
ABACs determine how the platform applies policies and rules. In contrast, field-level access controls and record-level access controls determine where (on which assets, rows, or fields) the platform applies the policies and rules.
B
- business information
Business information provides definition, structure, and clarity to data assets, projects, policies, and rules by representing the context and semantics of an organization.
Business information includes data classes, tags, terms, and purpose.
Business information helps users find and understand content on the platform, and it guides when to apply transformations based on attributes and conditions.
C
- cell-level transformation
Cell-level transformations allow you to select a different transformation for each distinct record of a specified field (column), that is, a cell, based on varying (logical) conditions.
For example, you can instruct the platform to apply different transformations to an identity number or postal code in a given record based on the value of country of residence in a specific cell.
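The postal code example above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function name, the field names, and the per-country masking rules are all hypothetical.

```python
def mask_postal_code(value: str, country: str) -> str:
    """Choose a transformation for one cell based on another cell in the same record."""
    if country == "GB":
        # Hypothetical rule: keep only the outward part of a UK postcode.
        return value.split(" ")[0]
    if country == "US":
        # Hypothetical rule: keep the first three digits of a ZIP code.
        return value[:3] + "**"
    # Fallback for any other country: redact the value completely.
    return "REDACTED"


records = [
    {"country": "GB", "postal_code": "SW1A 1AA"},
    {"country": "US", "postal_code": "94105"},
    {"country": "FR", "postal_code": "75001"},
]

# The same field (postal_code) receives a different transformation per record.
transformed = [
    {**r, "postal_code": mask_postal_code(r["postal_code"], r["country"])}
    for r in records
]
```

Each record keeps its country value, while the postal code is transformed according to that value.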
- connection
A connection is a configuration for connecting to and reading data from a data source, such as a JDBC connection string. The platform uses this connection information to read metadata attributes from a data asset, to read the data itself, and to write the processed data to the target location.
- control plane
The control plane is a logical perimeter that does not have direct access to data but may host components that drive operations in the data plane.
The control plane is where policies, rules, projects, and assets are created and managed.
The architectural split between the control plane and the data plane allows for configuration, orchestration, and administration (control) without the need to access data, while retaining the ability to process data close to the source within a given jurisdiction. The control plane achieves this by working with metadata, data classes, and other representations of the data.
D
- data agent
The data agent provides access to the data plane whenever required by the control plane, for example to retrieve the schema for a data asset. It makes a long-lived connection to the data bridge on startup.
- data bridge
The data bridge is the component in the control plane that handles communication with the data plane. It acts as a gRPC (remote procedure call) server. It is replicated, and it sits behind an ingress with a load balancer.
- data class (class)
A data class is a categorization that data owners apply to fields within data assets to indicate the category of data. Within the Privitar Data Security Platform, applying a data class ensures that each kind of data is classified consistently throughout your organization. For example, data classes can classify birth dates, national identifiers, and postal codes.
- data consumer (consumer)
Data consumers are users on the Privitar Data Security Platform who request and consume data from the platform. Data consumers require direct access to data as part of their job responsibilities.
- data exchange (exchange)
A data exchange is a secure online portal where data owners can classify sensitive datasets, and data consumers can access them, without compromising data safety.
Each data exchange is separate and different from other data exchanges, being a discrete entity within an enterprise.
- data guardian (guardian)
Data guardians are users on the Privitar Data Security Platform who develop and maintain company policies and rules that govern data usage, including how the organization adheres to regulatory and compliance guidelines and requirements.
Data guardians are responsible for approving all data requests, including requests to register data on the platform and requests to access data outside the platform.
- data owner (owner)
Data owners are users on the Privitar Data Security Platform who register and classify data on the platform. Data owners understand where the data comes from, its quality, its meaning, and for what purposes it can be used.
- data plane
A data plane is a set of services used for the reading, writing, and processing of data. It contains a data agent and services capable of provisioning data, such as a data proxy or an integration using the Privitar SDK.
- data proxy (proxy)
The data proxy is a Java Database Connectivity proxy (JDBC proxy) that allows data consumers to access sensitive data to which de-identification policies have been applied. It makes calls to the data bridge to fetch the information it needs, for example the details of how to connect to the sensitive data and the policies to be applied.
- dataset
A dataset is a logical container of assets that is also known as a "data product." Its purpose is to group and facilitate an easier search experience. Data owners make datasets available to data consumers.
- data type (type)
A data type is the storage-level categorization of the data, as read from the source. Examples include integer and string. The data type reflects how data is stored in a database, and each data type can have a different corresponding transformation. For example, you can store a person’s age as an integer or a string.
E
- encryption
Encryption is the act of using a cryptographic algorithm to transform a value in a dataset in such a way that only authorized parties can access the original value. In an encryption scheme, the original value, referred to as plaintext, is encrypted using an encryption algorithm to generate ciphertext, which authorized parties can read only after decrypting it. Encryption can be used as a de-identification technique.
It is good practice to encrypt data at rest and in transit. However, while encryption can help protect against unauthorized access, it does not protect the privacy of individuals’ data when that data is used by people who are authorized. Misuse of data by authorized users is known as an insider attack.
- enterprise administrator (enterprise admin)
Enterprise administrators are users who perform operations within the Privitar Data Security Platform, such as creating a data exchange, creating a data plane, and configuring a data plane.
- exchange
See data exchange.
- exchange administrator (exchange admin)
Exchange administrators are users who perform tasks within a data exchange, such as creating and editing a data plane, managing users and groups, and performing everyday administration tasks.
F
- field-level access control
Field-level access controls are conditional policies and rules that regulate users’ ability to access individual fields of a data asset. Field-level access controls determine which fields of the original dataset the platform retrieves prior to applying data transformation rules. Field-level access controls are implemented through drop field transformation, conditioned on attributes (ABAC), data consumer roles (RBAC), or purpose (PBAC).
Field-level access controls determine where (on which fields) the platform applies policies and rules.
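As a rough sketch of the idea, the following drops fields from a record based on the data consumer's role. The role names, field names, and the mapping itself are hypothetical and serve only to illustrate a role-conditioned drop field step; they are not taken from the platform.

```python
# Hypothetical mapping from a consumer role to the fields that role must not see.
DROPPED_FIELDS_BY_ROLE = {
    "analyst": {"national_id", "email"},
    "auditor": {"email"},
}


def apply_field_level_access(record: dict, role: str) -> dict:
    """Return a copy of the record without the fields dropped for this role."""
    dropped = DROPPED_FIELDS_BY_ROLE.get(role)
    if dropped is None:
        # Assumption: an unknown role gets the most restrictive result (no fields).
        return {}
    return {name: value for name, value in record.items() if name not in dropped}


record = {"name": "A. Jones", "email": "a.jones@example.com", "national_id": "X123"}
analyst_view = apply_field_level_access(record, "analyst")
auditor_view = apply_field_level_access(record, "auditor")
```

In this sketch the fields are removed before any further transformation would run, which mirrors the ordering described in the definition.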
- field-level transformation
Field-level transformations apply the same transformation to the entire field (column).
The platform determines whether to apply a field-level transformation based on the data class of the column.
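A minimal sketch of selecting a transformation by data class, assuming a hypothetical mapping from fields to data classes and from data classes to transformations. None of these names come from the platform.

```python
def redact(value: str) -> str:
    # Hypothetical transformation: replace the whole value.
    return "REDACTED"


def year_only(value: str) -> str:
    # Hypothetical transformation: keep only the year of an ISO date string.
    return value[:4]


# Assumed classification of fields (columns) by a data owner.
DATA_CLASS_BY_FIELD = {"dob": "birth_date", "ssn": "national_id"}

# Assumed choice of transformation per data class.
TRANSFORMATION_BY_DATA_CLASS = {"birth_date": year_only, "national_id": redact}


def transform_record(record: dict) -> dict:
    """Apply the same transformation to every value of a field, chosen by data class."""
    result = {}
    for field, value in record.items():
        data_class = DATA_CLASS_BY_FIELD.get(field)
        transform = TRANSFORMATION_BY_DATA_CLASS.get(data_class)
        # Fields without a classified data class pass through unchanged.
        result[field] = transform(value) if transform else value
    return result


output = transform_record({"dob": "1990-05-17", "ssn": "123-45-6789", "city": "Leeds"})
```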
H
- HashiCorp® Vault Key Management System (HashiCorp® Vault KMS)
The HashiCorp® Vault KMS is a key management system (KMS) used to create and control encryption keys, which you use to encrypt data. A KMS is a system for the management (generation, distribution, storage, and more) of cryptographic keys and their metadata.
K
- key format
The Privitar Data Security Platform uses "asymmetric" (or public key) encryption, which uses a pair of distinct, yet related keys. One key (the public key) is used for encryption, while the other in the pair (the private key) is used for decryption by an authenticated recipient (user).
L
- linkability
"Linkability" is the probability of inferring the original value of transformed data by linking values from different datasets. Applying different tokens to the same value in different datasets reduces the ability to re-identify or de-anonymize data.
P
- policy
A policy is a reusable set of rules that serves a business context. The types of policies defined in this glossary are the access control policy and the transformation policy.
- privacy enhancing technology (PET)
A privacy enhancing technology is a transformation type used to modify raw data to remove sensitive data elements. The Privitar Data Security Platform offers many PETs. These are the transformation types that data guardians select when building policies.
- Privitar NOVLT
Privitar NOVLT is a feature of the Privitar Data Security Platform that applies consistent tokenization without a token vault. NOVLT allows for data linkability across regions. NOVLT also offers faster throughput and less latency than most vaulted solutions.
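One generic way to produce consistent tokens without storing them is to derive each token from the input value with a keyed hash. The sketch below illustrates that general idea only. It is not Privitar NOVLT's algorithm, which this glossary does not describe, and the function name and keys are hypothetical. It also illustrates linkability: the same key yields the same token wherever it is used, while a different key per dataset yields unrelated tokens.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Derive a token from the value and a secret key; no lookup table is stored."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; a keyed hash is one-way, unlike a token vault.
    return digest[:16]


# Hypothetical keys, one per dataset.
key_a = b"hypothetical-key-for-dataset-a"
key_b = b"hypothetical-key-for-dataset-b"

token_a = tokenize("alice@example.com", key_a)
token_a_again = tokenize("alice@example.com", key_a)  # same key: consistent token
token_b = tokenize("alice@example.com", key_b)        # different key: unrelated token
```

Sharing a key across regions keeps tokens linkable between them; using separate keys reduces linkability.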
- Privitar Query Engine
The Privitar Query Engine retrieves relevant policies and applies them to assets. The Query Engine transforms SQL queries, and the data retrieved with them, in compliance with the applicable policies.
- project
A project is a collection of data assets that a team of data consumers wishes to provision safely. While data owners manage the data assets themselves, data consumers manage projects, including linkability between assets. However, data consumers will not have access to the data within a project until a data guardian approves their access.
- project request (request)
A project request is an inquiry made by a data consumer to use the assets in a data project. A data guardian approves or denies project requests.
- provision
Provisioning is the act of making data available in a secure way to users and applications.
- purpose
A purpose is the data consumer’s intended use for the data in a project. Data guardians use purposes as attributes in rules. Examples might include “to find sources of bad loans” or “to build customer 360 profiles.”
- purpose-based access control (PBAC)
Purpose-based access controls (PBACs) are conditional policies and rules that regulate how users access fields, rows, or entire data assets, based on a project purpose selected by a data consumer.
PBACs determine how the platform applies policies and rules. In contrast, field-level access controls and record-level access controls (RLACs) determine where (on which fields, rows, or assets) the platform applies policies and rules.
R
- record-level access control (RLAC)
Record-level access controls (RLACs) are conditional policies and rules that regulate users’ ability to access individual records of an asset based on the values of selected fields of the same record. Record-level access controls determine which records of the original dataset the platform retrieves prior to applying transformation rules. Unlike data transformation rules, which are based solely on metadata, record-level access control rules are based on a combination of the data itself and metadata.
Record-level access controls (RLACs) determine where (on which records) the platform applies policies and rules. Attribute-based access controls (ABACs), purpose-based access controls (PBACs), and role-based access controls (RBACs) determine how the platform applies those policies and rules.
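A minimal sketch of a record-level filter, assuming a hypothetical rule in which a user group may only retrieve records whose country value is allowed for that group. The group names, field names, and mapping are illustrative only.

```python
# Hypothetical rule data: which country values each user group may see.
ALLOWED_COUNTRIES_BY_GROUP = {
    "emea_analysts": {"GB", "FR"},
    "us_analysts": {"US"},
}


def filter_records(records: list, group: str) -> list:
    """Keep only the records whose 'country' value is allowed for the group."""
    # Assumption: a group with no rule sees no records.
    allowed = ALLOWED_COUNTRIES_BY_GROUP.get(group, set())
    return [record for record in records if record["country"] in allowed]


records = [
    {"id": 1, "country": "GB"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "FR"},
]

emea_rows = filter_records(records, "emea_analysts")
```

The decision depends on both the data itself (the country value in each record) and metadata about the requester (the user group), as the definition describes.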
- region
In the Privitar Data Security Platform, a region is a named geographical location, such as the location of a data exchange or a data agent. A region is closely tied to jurisdiction, because some regulations require that data remain within certain jurisdictions.
In cloud computing, a region (also known as a “geography”) is a named set of cloud resources in the same geographical area. A region comprises availability zones.
- regular expression (regex)
A regular expression is a series of characters that specifies a pattern to match text and numeric data formats. The Privitar Data Security Platform uses regular expressions to replace text strings and numbers with random characters.
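A minimal sketch of this kind of replacement, assuming the pattern [a-z]{6}. Python's standard re module can check a string against a pattern but cannot generate strings from one, so the generator below is hand-written for this single pattern. The function name is hypothetical, and this is not the platform's implementation.

```python
import random
import re
import string

# The pattern that the replacement value must match.
PATTERN = re.compile(r"[a-z]{6}")


def random_replacement(rng: random.Random) -> str:
    """Generate six random lowercase letters, which always match [a-z]{6}."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(6))


original = "abcdef"
replacement = random_replacement(random.Random())

# The replacement has the same format as the original but random content.
matches_pattern = PATTERN.fullmatch(replacement) is not None
```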
For example, for an initial value of abcdef, you could use the regular expression [a-z]{6} to produce an output such as mvskyc.
- request
See project request.
- role-based access control (RBAC)
Role-based access controls (RBACs) are conditional policies and rules that regulate how users access fields or rows, based on specific roles provided as user groups.
RBACs determine how the platform applies policies and rules. In contrast, field-level access controls and record-level access controls determine where (on which fields, rows, or assets) the platform applies policies and rules.
- rule
Rules are building blocks of policies. Rules are conditional based on attributes, such as user groups, terms, tags, locations, and so on. Rules also take actions specific to data classes and transformations.
The types of rules defined in this glossary are the access control rule and the transformation rule.
S
- source connection
A source connection is the connection from which a data owner reads data.
- system administrator (SysAdmin)
System administrators are users who perform activities to install and set up the Privitar Data Security Platform. Most of these activities are external to the platform, such as deploying the platform, managing secrets required for installation, performing backup and restore activities, and performing updates to the platform.
T
- tag
A tag is a keyword that you can define to describe objects, such as when you want to group objects together or add context to those objects. For example, you might want to define tags that correspond to geography, line of business, project names, or applications. Tags help facilitate search and filtering.
- target connection
A target connection is the connection to which a data consumer provisions data.
- term
Terms are words used within your organization to describe business concepts in plain language. Adding them to the platform ensures consistent use of those words throughout your organization. Terms also lend meaning to physical assets and their fields and give them context. When data consumers are browsing assets, terms allow them to understand the business meaning and semantics of the physical asset. Examples of terms could be “account type,” “customer level,” or “credit risk rating.”
- tokenization
Tokenization is a form of fine-grained data protection that replaces a clear value with a randomly generated synthetic value that stands in for the original as a "token." The pattern for the tokenized value is configurable and can retain the same format as the original, which means fewer down-stream application changes, enhanced data sharing, and more meaningful testing and development with the protected data.
- token vault
A token vault is a secure database (for example, PostgreSQL or Amazon DynamoDB) where you can store tokens generated during the de-identification of a dataset. Token vaults are only used for consistent tokenization (always returning the same token for the input value). Each token in a token vault is unique. That is, each token is only returned for one value. Token vaults allow for re-identification. That is, you are able to take a token from a de-identified dataset and look up the original input value.
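The three properties above (consistency, uniqueness, and re-identification) can be sketched with an in-memory stand-in for a vault. This is a toy under stated assumptions: a real token vault is a secured database, and the class name, token alphabet, and token length here are hypothetical.

```python
import secrets
import string

# Assumed token alphabet and length, chosen only for this illustration.
ALPHABET = string.ascii_uppercase + string.digits
TOKEN_LENGTH = 12


class InMemoryTokenVault:
    """Toy token vault that keeps both directions of the value/token mapping."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Consistency: a value seen before always returns its stored token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # Uniqueness: draw random tokens until one is not already in use.
        token = self._random_token()
        while token in self._value_by_token:
            token = self._random_token()
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def reidentify(self, token: str) -> str:
        # Re-identification: look up the original input value for a token.
        return self._value_by_token[token]

    @staticmethod
    def _random_token() -> str:
        return "".join(secrets.choice(ALPHABET) for _ in range(TOKEN_LENGTH))


vault = InMemoryTokenVault()
first = vault.tokenize("4111-1111")
second = vault.tokenize("4111-1111")
other = vault.tokenize("5500-0000")
```

Here first and second are the same token, other is a different token, and vault.reidentify(first) returns the original value.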
- transformation
A transformation defines a set of behaviors (privacy enhancing technologies) for the platform to execute on a field in a dataset to de-identify it, while still preserving data utility.
- transformation policy
A transformation policy is a reusable set of transformation rules that serves a business context. A transformation policy is a flexible construct that allows you to apply transformation rules in the way that best meets your needs. For example, you can write a policy around a regulation (such as HIPAA or GDPR) or around a business context (such as provisioning data for marketing analytics).
The order of transformation policies matters. The platform applies them in the order that they are arranged by the data guardian.
- transformation rule
Transformation rules are conditional based on attributes, such as user group, terms, tags, location, and so on. Transformation rules apply pre-defined transformations to data classes.
W
- watermark
A watermark is a unique digital pattern created by the Privitar platform that is added into the records of de-identified datasets for traceability reasons. The platform adds watermarks to the data during the process of de-identification. They are invisibly embedded and distributed throughout the data, and as a result are robust against tampering and operations, such as filtering or reorganizing of the data.
In the event of a leak or data breach, watermarks can be used to identify the data and plug potential security holes faster. Watermarks can also act as a deterrent to anyone handling the data, encouraging them to take the security of the dataset seriously when they know that the data can be traced.