Data Dictionary File
Data Dictionary File
Column Name | Description | Examples | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
name | Name of the column. No spaces. | ind_id, site_id, etc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
unit | Units of the data collected in the column. | For a column weight, the units can be pounds, kilograms, etc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
type | Data type of the information collected in the column. It can be string, integer, decimal and can be combined with fixed_set, or encoded. |
* For fixed-set and encoded values, the corresponding values in the file must match EXACTLY, in both case and content. Using the "string, fixed_set" example above, "schizophrenia,disorganized" or "schizophrenia, Disorganized" will NOT be recognized as "schizophrenia, disorganized"
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
min |
Valid for numeric data. The minimum value allowed for the data. |
You may use pre-defined variables in min/max columns, which will be substituted by the system. Available variables are current_year or current_date.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
max | Valid for numeric data. The maximum value allowed for the data. |
You may use pre-defined variables in min/max columns, which will be substituted by the system. Available variables are current_year or current_date.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
min_length |
Valid for string data. The data value must have a minimum length of min_length This column will be ignored if the entered "type" is integer or decimal |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
max_length |
Valid for string data. The data value must have a maximum length of max_length This column will be ignored if the entered "type" is integer or decimal |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
unique |
Assign a one or more labels (separated by a pipe '|') to identify a column, or a set of columns whose values should be unique. Note: Unique columns allow null values. |
For a column ind_id which should contain unique data the value can be set to u_ind_id For a set of columns site_id, family_id, subject_id which combined should contain unique data the value can be set to u_sfs_id. The same label u_sfs_id should be assigned for all three columns
Example: Desired effect (user, col_a), and (user, col_b) combinations should be unique.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
mandatory |
Columns which do not allow null values.
|
Expression Grammar Access value of a field using format, c["<column-name>"], i.e. c["sex"], etc. Constant string should be enclosed in single or double quotes, i.e "a" or 'a' Constant integers should not be quoted, i.e. 159, 35.0 Constant booleans should be either True or False Constant date's must be reprsented as date( "<date>"), where
<date> is date in IS0 8601 format
Examples
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
primary_key | For column, or set of columns which combined uniquely identify records in the phenotypic file set this to Y |
If site_id, family_id, subject_id uniquely identify records in a file
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
resolution | Valid for floating point values. |
Data contained in the column will be truncated to n digits after the decimal point. If the data is 4.1212, and resolution for column is specified to 2 the data will be truncated to 4.12. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
values | Valid when the type column contains fixed_set or encoded value. |
Fixed Set * Specifies the set of values which are allowed in the column. Example: For a column Gender the valid values are Male/Female/Unknown Then values column should contain Male|Female|Unknown Format: <value_1>|<value_2>|..|<value_n> separated by a pipe '|'. Encoded * Specifies the set of values which are allowed in the column and their corresponding meaning. Example: For a column Gender the valid values are M/F/U Then values column should contain M=Male|F=Female|U=Unknown Format: <abbr_1>=<value_1>|<abbr_2>=<value_2>|..|<abbr_n>=<value_n> separated by a pipe '|'.
* For fixed-set and encoded values, the corresponding values in the file must match EXACTLY, in both case and content. Using the "Encoded" example above, "m" will NOT be recognized as "M" |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
description |
Detailed text describing the information contained in this column. This field cannot be left empty. |
Advanced Examples
Consider a column Age which contains integer values between 0 and 120. The column has specially designated values -1 for Missing, -2 for Not collected
name | type | min | max | values |
---|---|---|---|---|
age | integer,encoded | 0 | 120 | -1=Missing|-2=Not Collected |
Consider a column Twins which can contain only the following values Monozygotic, or Dizygotic.
name | type | values |
---|---|---|
Twins | string,fixed_set | Monozygotic|Dizygotic |