Generalize Date
This section contains a comprehensive description of the Generalize Date rule.
For a summary of the rule and its compatibility with Privitar jobs and execution environments, see Masking Rule Types.
Data Types
The supported data types for this rule are:
Date
Timestamp
Description
The input date is replaced by a generalized version of the date, or a constant date if the input date is outside of a user-defined date range.
The date format used for input and output of Date data types is `yyyy-MM-dd`. For example, `2018-12-23` would become `2018-01-01` if the date is generalized to preserve only the year and to set the month and day to `01`.
The date format used for the input and output of Timestamp data types is `dd-MM-yyyy HH:mm:ss.SSS`. The output time will always be generalized to midnight. For example, `2018-09-16T11:33:00.465` would become `2018-01-01T00:00:00.000` if the date is generalized to preserve only the year and to set the month and day to `01`. There are no options to use a specific time of day when setting the date range.
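The year-only generalization described above can be illustrated with a minimal Python sketch. This is not the product's implementation; the function names `generalize_date` and `generalize_timestamp`, and the convention that `None` means "keep the original value", are assumptions made for illustration only.

```python
from datetime import date, datetime
from typing import Optional


def generalize_date(value: date,
                    day: Optional[int] = 1,
                    month: Optional[int] = 1,
                    year: Optional[int] = None) -> date:
    """Replace each date part with a fixed number, or keep it when None.

    Hypothetical helper: this sketch does not correct invalid days
    (for example day 31 in a 30-day month).
    """
    return date(
        year if year is not None else value.year,
        month if month is not None else value.month,
        day if day is not None else value.day,
    )


def generalize_timestamp(value: datetime,
                         day: Optional[int] = 1,
                         month: Optional[int] = 1,
                         year: Optional[int] = None) -> datetime:
    """Generalize the date part and reset the time of day to midnight."""
    d = generalize_date(value.date(), day, month, year)
    return datetime(d.year, d.month, d.day)  # time defaults to 00:00:00.000


print(generalize_date(date(2018, 12, 23)).isoformat())
print(generalize_timestamp(datetime(2018, 9, 16, 11, 33, 0, 465000)).isoformat())
```

With the defaults shown, only the year is preserved and the month and day are both set to 1, matching the two examples in the text.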
Alternatively, a date range can be specified and a constant date returned if the input date is outside of the range specified. For example, if you set the minimum date to `1900-01-01` and the maximum date to `1970-01-01`, then an input date that is before `1900-01-01` or after `1970-01-01` will be replaced with a constant date. For example:
`1897-05-06` would become `1899-12-31` as it is before the minimum date (`1900-01-01`). The date is replaced with the constant date that has been specified in the rule for the Set constant before a date option (in this case `1899-12-31`).
Similarly, `1978-03-04` would become `1969-12-31` as it is after the maximum date (`1970-01-01`). The date is replaced with the constant date that has been specified in the rule for the Set constant after a date option (in this case `1969-12-31`).
The date format is always preserved in the output.
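The range check described above can be sketched as follows. This is an illustrative sketch only, not the rule's actual implementation; the function name `constant_for_out_of_range` and its parameter names are hypothetical, and the sketch assumes that a date equal to the minimum or maximum counts as inside the range.

```python
from datetime import date
from typing import Optional


def constant_for_out_of_range(value: date,
                              minimum: Optional[date],
                              maximum: Optional[date],
                              constant_before: Optional[date],
                              constant_after: Optional[date]) -> Optional[date]:
    """Return the replacement constant if value is outside the range.

    Returns None when the value is inside the range, in which case the
    caller would apply the normal generalization instead.
    """
    if minimum is not None and value < minimum:
        return constant_before
    if maximum is not None and value > maximum:
        return constant_after
    return None


minimum, maximum = date(1900, 1, 1), date(1970, 1, 1)
before, after = date(1899, 12, 31), date(1969, 12, 31)

for sample in (date(1897, 5, 6), date(1978, 3, 4), date(1950, 6, 15)):
    print(sample, "->", constant_for_out_of_range(sample, minimum, maximum, before, after))
```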
Note
The behavior of this rule is designed to satisfy the HIPAA (Health Insurance Portability and Accountability Act) requirement for the storage of ages and dates contained in patient healthcare data. For more information about HIPAA and how to use this rule to de-identify data for compliance with the HIPAA Privacy rule, see Setting HIPAA Privacy Rules.
Generalization Behavior
The following table describes the Generalization options that can be used to define the constant date to be returned to replace the input value:
Option | Description |
---|---|
Set the day to | This can be set to a specific number, or set to return the original input value of the day. |
Set the month to | This can be set to a specific number, or set to return the original input value of the month. |
Set the year to | This can be set to a specific number, or set to return the original input value of the year. |
Important considerations when setting the date
To ensure that a valid date is returned by the rule, note the following points about setting the date:
If a generalized date is set as `-/-/31` (where `-` is Original value), the date will be generalized to return the last valid day of the month for months that don't contain the specified number of days. For example, if the input date is `2013/09/16`, the date returned by the rule would be `2013/09/30`.
The rule will also check if the year specified is a leap year and make similar changes to the date to return the closest valid date.
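The day correction described above can be sketched in Python using the standard `calendar` module. This is a minimal sketch under the assumption that "closest valid date" means the last day of the same month; the function name `set_day_clamped` is hypothetical.

```python
import calendar
from datetime import date


def set_day_clamped(value: date, day: int) -> date:
    """Set the day of the month, falling back to the month's last valid day.

    calendar.monthrange returns (weekday of first day, number of days),
    and accounts for leap years when the month is February.
    """
    last_day = calendar.monthrange(value.year, value.month)[1]
    return value.replace(day=min(day, last_day))


print(set_day_clamped(date(2013, 9, 16), 31))  # September has 30 days
print(set_day_clamped(date(2019, 2, 10), 29))  # 2019 is not a leap year
print(set_day_clamped(date(2020, 2, 10), 29))  # 2020 is a leap year
```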
Masking Behavior
Masking behavior can be used to specify a date range within which to apply the Rule.
The following table defines the Masking behavior:
Option | Sub-option | Description |
---|---|---|
Set constant date based on | Absolute date | Use this option to specify a Static Date. That is, a date that is not relative to the current date. |
| Relative date | Use this option to specify a Dynamic Date. That is, a date (time period) that is relative to the current date. |
| Disable Masking behavior | Select this button to disable Masking behavior. |
Set constant before a date | | Use this option to set a Minimum date. That is, the date constant that will be used to replace the input date. This date constant will be applied if the input date is calculated to be earlier than the Minimum date. |
Set constant after a date | | Use this option to set a Maximum date. That is, the date constant that will be used to replace the input date. This date constant will be applied if the input date is calculated to be later than the Maximum date. |
Some important points to note when defining the Masking behavior:
When setting a date constant, the Date picker will prevent the setting of an invalid date in terms of days of the month. For example, it is not possible to set a date of `2020/09/31`. Note, however, that it is possible to enter a date directly into the set date to field. It is recommended that you use the Date picker to enter the date ranges.
If a Minimum date is specified, but a Maximum date is not specified, then all input dates that are later than the Minimum date will be generalized according to the setting in Generalization Behavior.
If a Minimum date is not specified, but a Maximum date is specified, then all input dates that are earlier than the Maximum date will be generalized according to the setting in Generalization Behavior.
If specifying both Minimum and Maximum Static dates, then the Maximum date must be later than the Minimum date. An error will be displayed when the rule is saved, if a clash of dates is detected.
If specifying both Before and After Relative dates, there must not be an overlap between the two specified dates. An error will be displayed when the rule is saved, if an overlap of dates is detected.
For a Relative date, the exact generalization behavior depends on the environment in which the job is run:
For Batch jobs, the current date used will be the same for all records, even if the Job runs into the following day.
For Dataflow jobs, the current date will update over time.
The current date is updated when the Dataflow processor refreshes the Job definition. The frequency of the update is determined by the cache refresh setting in the `application.properties` file. By default, the refresh is set to 10 minutes, so in the default case the current date will be updated every 10 minutes. This setting should be sufficient for most use cases that apply this rule to a dataset, but contact your system administrator if you need a shorter refresh period.
For both Absolute and Relative Dates, the replacement date constant that is used for dates outside the date range will not be affected by a change in the current date. That is, the date constant will not be affected by a Job running into the following day.
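The Batch job behavior for Relative dates can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the product's implementation: the source does not state how a relative period is expressed, so the sketch assumes a hypothetical offset in days, and the names `resolve_relative_date` and `flag_before_threshold` are invented for this example.

```python
from datetime import date, timedelta
from typing import List


def resolve_relative_date(current_date: date, days_before: int) -> date:
    """Turn a relative period (assumed here to be a number of days)
    into a concrete threshold date."""
    return current_date - timedelta(days=days_before)


def flag_before_threshold(records: List[date],
                          job_start: date,
                          days_before: int) -> List[bool]:
    """Batch-style evaluation: the threshold is resolved once from the
    job's start date and reused for every record, so it does not move
    even if processing continues into the following day."""
    threshold = resolve_relative_date(job_start, days_before)
    return [record < threshold for record in records]


job_start = date(2024, 3, 1)
records = [date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1)]
print(resolve_relative_date(job_start, 30))
print(flag_before_threshold(records, job_start, 30))
```

A Dataflow-style evaluation would differ only in that the current date passed to `resolve_relative_date` is re-read whenever the Job definition is refreshed, rather than being fixed at the start of the job.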