User Guide

Creating a Schema from HDFS

A Schema can be created by importing table definitions from data files in HDFS. The supported HDFS file types are:

  • Avro (.avro)

  • Avro Schema (.avsc)

  • Parquet (.parquet)
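
As a quick pre-check before pointing Privitar at a directory, it can help to see which files carry one of the extensions listed above. The following is a minimal sketch, not part of Privitar: the function names and the example paths are hypothetical, and it assumes extensions are matched exactly as written in the list (the guide does not say whether matching is case-sensitive).

```python
from pathlib import PurePosixPath

# Extensions listed in this guide as supported for HDFS import.
SUPPORTED_EXTENSIONS = {".avro", ".avsc", ".parquet"}


def is_supported_hdfs_file(path: str) -> bool:
    """Return True if the path ends in one of the supported extensions.

    Assumption: exact, case-sensitive comparison against the listed extensions.
    """
    return PurePosixPath(path).suffix in SUPPORTED_EXTENSIONS


def filter_supported(paths):
    """Keep only the paths whose extension is in the supported list."""
    return [p for p in paths if is_supported_hdfs_file(p)]


if __name__ == "__main__":
    # Hypothetical HDFS paths, for illustration only.
    candidates = [
        "/data/customers.avro",
        "/data/customers.avsc",
        "/data/orders.parquet",
        "/data/readme.txt",
    ]
    for path in filter_supported(candidates):
        print(path)
```

The check looks only at file names; it does not open the files or confirm that their contents are valid Avro or Parquet.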

Importing from HDFS requires you to specify an Environment to which a Hadoop Cluster has been added, because Privitar must be able to connect to the Hadoop Cluster to read the data files. (For more information about adding a Hadoop Cluster to an Environment, see Hadoop Cluster Environment Configuration.)

To import a Schema from HDFS, follow these steps:

  1. Select Schemas from the Navigation sidebar. The Schemas page is displayed.

  2. Click on Create New Schema. The New Schema window is displayed.

  3. Enter a name for the new Schema in the Name field.

  4. Select Import from File from the Import tables list box. The Import from File window is displayed.

  5. From the Environment list box, select the HDFS Environment that represents the cluster containing the data files to import.

  6. Enter a path in the Path box, referencing either a single data file containing the table definition to import, or a directory containing multiple such data files.

  7. If the tables that you are importing contain Date or Timestamp fields, Privitar will import these tables according to the following default Date and Timestamp formats:

    • Date Format: yyyy-MM-dd

    • Timestamp Format: yyyy-MM-dd'T'HH:mm:ss

    To change these formats, click on the Cog icon, update the formats, and click on OK. (For more information about the Date and Timestamp formats supported by Privitar, see Date and Timestamp formats.)

  8. Click on Fetch Tables.

    The table details are fetched from the location specified and displayed in the left-hand pane.

    Use the Eye icon to the right of a table to inspect its contents.

  9. Pick the tables to import by selecting the checkbox alongside each table. Each selected table is added to the list of tables to import in the right-hand pane.

    To quickly select all tables, use the Select All checkbox. You can also use Shift-click to select a number of tables.

    To de-select a table from the list of tables to import, click on the Trash icon.

  10. Click on Import tables to add the selected tables to the Schema.

    The tables are imported and the New Schema window is updated with the tables that have been imported.

  11. Click on a table in the left-hand pane to preview its definition on the right and to edit any of its columns. For more information about the editing actions and how to finalize the Schema definition, see Adding Tables and Columns to a Schema.

  12. Click on Save to save the new Schema.

    The new Schema is added to the list of Schemas on the Schemas page.
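
The default formats in step 7 (yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss) appear to follow Java-style date pattern letters. To check whether sample values from your data files fit those defaults before importing, something like the sketch below may help. It is not Privitar code: the function names are hypothetical, and the Python format strings are assumed equivalents of the two patterns. Python's parser is also somewhat lenient (for example, it accepts single-digit months), so a match here is only an indication.

```python
from datetime import date, datetime

# Assumed strptime equivalents of the default patterns in step 7.
DATE_FORMAT = "%Y-%m-%d"                # yyyy-MM-dd
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # yyyy-MM-dd'T'HH:mm:ss


def parse_date(value: str) -> date:
    """Parse a value written in the default Date format."""
    return datetime.strptime(value, DATE_FORMAT).date()


def parse_timestamp(value: str) -> datetime:
    """Parse a value written in the default Timestamp format."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def matches_default_timestamp(value: str) -> bool:
    """Return True if the value parses with the default Timestamp format."""
    try:
        parse_timestamp(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Sample values, for illustration only.
    print(parse_date("2021-03-15"))
    print(parse_timestamp("2021-03-15T09:30:00"))
    print(matches_default_timestamp("2021-03-15 09:30:00"))
```

If sample values do not parse with these defaults, that suggests the formats should be changed through the Cog icon before fetching tables.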