Creating a Schema from CSV
A Schema can be created by importing definitions from a CSV file stored locally. This option is particularly useful when you want to create a Schema very quickly to try out the functionality of the platform with local test data.
Only one CSV file can be read to create a Schema. The single CSV file will create a Schema with a single table. To create a Schema with multiple tables from multiple CSV files, you need to create a Schema using HDFS. Using HDFS, you can specify a directory that contains multiple CSV files. These files will be read into a Schema with multiple tables. For more information, see Creating a Schema from HDFS.
To create a Schema with a single table from a locally stored CSV file:
Select Schemas from the Navigation sidebar. The Schemas page is displayed.
Click on Create New Schema. The New Schema window is displayed.
Enter a name for the new Schema in the Name field.
Select Import from CSV (Local) from the Import tables list box. The Import from CSV (Local) window is displayed.
If required, edit the CSV settings to match the format of the CSV file that will be read to create the Schema.
The following table describes the CSV settings:
Setting
Description
CSV Delimiter
The character that is used to separate the fields in a row.
CSV Escape Char
The character that starts an escape sequence. For example, you can escape a quote (using \" or \') within a quoted string to include the character literally without it being interpreted as a quote. For example, 'It\'s OK' is interpreted as It's OK. If you use an escape character outside of a quoted context, it is not interpreted as an escape character but instead appears verbatim.
CSV Quote Char
The character that is used to delimit text strings in the input.
CSV Timestamp Format
The timestamp format used by the platform for processing columns containing Date or Timestamp fields.
If the tables that you are importing contain Date or Timestamp fields, the platform will import these tables according to the following default Date and Timestamp formats:
Date Format:
yyyy-MM-dd
Timestamp Format:
yyyy-MM-dd'T'HH:mm:ss
Click in the field to change the format.
(For more information about the Date and Timestamp formats supported by the platform, see Date and Timestamp formats.)
Contains Header Row
Check this box if the first row of the CSV file is a header row. The column names in the header row will be read and used as the names for the columns in the Schema.
If the box is not checked, default column names are assigned to the columns in the Schema, that is, _c0, _c1, and so on.
Click on the Browse link to select a CSV file to upload, or drag and drop a CSV file directly onto the Source CSV File field.
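To make the delimiter, quote, escape, and timestamp settings described above more concrete, here is a minimal sketch that uses Python's standard csv and datetime modules. It is not the platform's own parser. The constant names, the choice of a single quote as the quote character, and the strptime patterns (taken here as rough equivalents of yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss) are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical values standing in for the CSV settings fields.
DELIMITER = ","
QUOTE_CHAR = "'"
ESCAPE_CHAR = "\\"

# Assumed strptime equivalents of yyyy-MM-dd and yyyy-MM-dd'T'HH:mm:ss.
DATE_FORMAT = "%Y-%m-%d"
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_rows(text):
    """Split CSV text into rows of string fields using the settings above."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=DELIMITER,
        quotechar=QUOTE_CHAR,
        escapechar=ESCAPE_CHAR,
        doublequote=False,  # quotes inside strings are escaped, not doubled
    )
    return [row for row in reader]


# The file content of this sample is: 1,'It\'s OK',2021-03-04,2021-03-04T10:15:30
sample = "1,'It\\'s OK',2021-03-04,2021-03-04T10:15:30\n"
row = parse_rows(sample)[0]

message = row[1]                                         # escaped quote kept literally
day = datetime.strptime(row[2], DATE_FORMAT).date()      # Date field
moment = datetime.strptime(row[3], TIMESTAMP_FORMAT)     # Timestamp field
print(message, day, moment)
```

The sketch only exercises an escape character inside a quoted string. Python's csv module also applies the escape character outside quoted strings, which differs from the behavior described in the settings table, so that case is left out.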
The CSV file is read by the platform and the Import from CSV (Local) window is updated with information about the CSV file:
One table is available to be imported.
The name of the table is derived from the name of the CSV file. For example, if the file is called text.csv, the name of the table to be imported will be text.
Click on the Eye icon to inspect the columns in the table.
The table is automatically added to the right column as the table to be imported.
Click on Import tables to add the selected table to the Schema.
The table is imported and the New Schema window is updated to show the imported table.
Click on a table in the left-hand pane to preview its definition on the right and to edit any of its columns. For more information about the editing actions and how to finalize the Schema definition, see Adding Tables and Columns to a Schema.
Click on Save to save the new Schema.
The new Schema is added to the list of Schemas on the Schemas page.
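The naming rules in this procedure (the table name taken from the CSV file name, and column names taken from the header row or defaulting to _c0, _c1, and so on) can be sketched roughly as follows. This is an illustrative approximation in Python, not the platform's implementation. The function names are hypothetical, and how the platform treats file names that contain more than one dot is not stated here, so the sketch makes no claim about that case.

```python
import csv
import io
from pathlib import Path


def table_name_from_file(file_name):
    """Derive the table name from the CSV file name, e.g. text.csv -> text."""
    return Path(file_name).stem


def column_names(csv_text, contains_header_row):
    """Return the header row's names, or _c0, _c1, ... when there is no header."""
    first_row = next(csv.reader(io.StringIO(csv_text)))
    if contains_header_row:
        return first_row
    # No header row: fall back to positional default names.
    return [f"_c{i}" for i in range(len(first_row))]


sample = "id,name\n1,Ada\n"
print(table_name_from_file("text.csv"))
print(column_names(sample, contains_header_row=True))
print(column_names(sample, contains_header_row=False))
```

Running the same sample with and without the header option shows why the Contains Header Row box matters: left unchecked, the first row would be treated as data and the columns would receive the default names.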