The following requirements and formats are necessary for validation through our AutoQC system. Files are checked for agreement with defined data dictionaries. They are also checked for pedigree validity (are mothers female? are children younger than their parents?), cross-database accuracy (are cell_id (RUID) and ind_id pairs accurate according to Sampled (formerly IBX) records?), and diagnosis (are submitted diagnoses defined by the study?). Although the error-checking is automated, the AutoQC system will not correct errors for you. You will be required to review the error logs generated by the system, make the necessary changes, and re-submit a corrected package. There is no limit on the number of re-submissions.
- Submissions must be packaged as a .zip file containing every component marked (*) as required in the list below. Data files within the submission must be in .txt or .csv format.
- File names must not contain any spaces and must match the formulas provided for each item in the list below.
- ID pairs must match Sampled (formerly IBX) records. Use the same ID (ind_id) that you used when reporting to Sampled (formerly IBX). Our AutoQC system will check for validity of your reported ind_id / cell_id (RUID) pairs against Sampled (formerly IBX) records.
- Case of data dictionaries must match that of uploaded files. For example, ind_id does not match Ind_id and will flag an error. The same applies to user-defined dictionaries.
- Order of columns must also match. Columns in the data dictionaries must be in the same order as the columns in your uploaded files, for both the Repository-defined files (_sub, _dx, etc.) and user-defined files (_phen, _phen_dd).
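The case and column-order rules above can be checked locally before uploading. The following is a minimal sketch and not part of AutoQC: the function name `header_mismatches` is hypothetical, and the column name `sex` in the example is an assumption used only for illustration.

```python
def header_mismatches(dd_columns, file_columns):
    """Return a list of problems; an empty list means the headers agree.

    The comparison is case-sensitive and position-sensitive, mirroring the
    case and column-order rules described above.
    """
    problems = []
    if len(dd_columns) != len(file_columns):
        problems.append(
            f"column count differs: dictionary has {len(dd_columns)}, "
            f"file has {len(file_columns)}"
        )
    # Compare columns pairwise, position by position.
    for position, (expected, actual) in enumerate(
        zip(dd_columns, file_columns), start=1
    ):
        if expected == actual:
            continue
        if expected.lower() == actual.lower():
            problems.append(
                f"column {position}: case differs ({expected!r} vs {actual!r})"
            )
        else:
            problems.append(
                f"column {position}: expected {expected!r}, found {actual!r}"
            )
    return problems


# Hypothetical example: 'ind_id' and 'Ind_id' differ only in case.
print(header_mismatches(["ind_id", "sex"], ["Ind_id", "sex"]))
```

A swapped column order shows up as two "expected ..., found ..." entries, one for each affected position.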
* = required file
Submission File* (FILENAME_SUB)
The main file in the submission. It includes demographic information, ID numbers, and study-defined diagnoses.
This file has the same format across all diseases, with one exception: for Autism, age must be reported in months rather than years.
Diagnosis File* (FILENAME_DX)
The purpose of this file is to define for future users what diagnoses were used in your study’s analyses and what diagnostic criteria were used.
Alternate ID File (FILENAME_ID)
Contains alternate IDs for individuals, such as internal ID, NDAR GUID, dbGaP ID, additional Sampled (formerly IBX) sample IDs, etc. This file is required for studies that also submit data to other public repositories such as NDA and dbGaP. Any ind_id in this file must have a record in the _sub file.
Extended Diagnostic Information File (FILENAME_EDX)
Contains diagnostic information using a validated standard diagnostic system [Diagnostic Code Sets]. The file allows submission of multiple diagnoses per individual, but it is also important to report the standard diagnostic code for individuals with only one diagnosis. This file facilitates cross-study and cross-disorder analyses and is required for many studies; see the terms of your Notice of Grant award. Any ind_id in this file must have a record in the _sub file.
Race/Ethnicity File (FILENAME_RE)
Contains extended race and ethnicity information to supplement the standard NIMH race and ethnicity assignments. Multiple races can be reported per individual, and for their parents and grandparents. Any ind_id in this file must have a record in the _sub file.
Publications File (FILENAME_PUBS)
Contains publications to be reported for the submissions.
Study-Specific File(s) (FILENAME_PHEN)
You may submit files specific to your individual study, provided that corresponding data dictionaries are provided for each file. Most studies require at least one _phen file per submission. Refer to your Data Sharing Plan to determine how many different clinical instruments NIMH expects you to submit, and whether that data must be item-level or summary-level data. Any ind_id in this file must have a record in the _sub file.
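The rule that any ind_id in a supplementary file must have a record in the _sub file can be checked before packaging. This is a minimal sketch under stated assumptions: it assumes comma-delimited files with a header row containing an `ind_id` column, the function name `missing_ind_ids` is hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io


def missing_ind_ids(sub_file, other_file, id_column="ind_id"):
    """Return sorted ind_id values in other_file with no record in sub_file.

    Both arguments are open text files (or any iterable of lines) that
    start with a header row.
    """
    sub_ids = {row[id_column] for row in csv.DictReader(sub_file)}
    other_ids = {row[id_column] for row in csv.DictReader(other_file)}
    return sorted(other_ids - sub_ids)


# Hypothetical in-memory data standing in for a _sub file and a _phen file.
sub = io.StringIO("ind_id,age\nA001,34\nA002,29\n")
phen = io.StringIO("ind_id,score\nA001,12\nA003,7\n")
print(missing_ind_ids(sub, phen))  # A003 has no record in the _sub data
```

For tab-delimited .txt files, `csv.DictReader` accepts a `delimiter="\t"` argument.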
Data Dictionaries for Study-Specific files (FILENAME_PHEN_DD)
Required to accompany any study-specific files. File names must match the corresponding files exactly, but with the “_dd” suffix added.
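A quick local check can catch file names that contain spaces and study-specific files that lack a matching dictionary. This is a sketch only: the exact file-name formulas are not reproduced here, so it assumes that study-specific files end in `_phen` before the extension and their dictionaries in `_phen_dd`. The function name `package_name_problems` and the `study_` prefix are hypothetical.

```python
from pathlib import PurePosixPath


def package_name_problems(names):
    """Report names containing spaces and _phen files without a _dd partner."""
    problems = []
    # Compare names without their extensions (.txt or .csv).
    stems = {PurePosixPath(name).stem for name in names}
    for name in names:
        if " " in name:
            problems.append(f"{name}: file name contains a space")
    for stem in sorted(stems):
        if stem.endswith("_phen") and stem + "_dd" not in stems:
            problems.append(f"{stem}: no matching data dictionary ({stem}_dd)")
    return problems


# Hypothetical package contents: the _phen file has no dictionary.
print(package_name_problems(["study_sub.csv", "study_phen.csv"]))
```

The list of names could come from `zipfile.ZipFile(path).namelist()` for a package that is already zipped.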
For longitudinal studies submitting samples AND data from the same subject on multiple different dates, there are 3 Timepoint Files that must be submitted together with the rest of the submission package: Timepoint Definition (_tp), Timepoint Phenotypic File (_tp_phen), and Timepoint Phenotypic Data Dictionary File (_tp_phen_dd).
In addition to the above files, you may include supporting material, such as study acknowledgements, blank interview forms, training manuals, and any other study documentation that you believe would be useful for future users of your data and samples. These files must use PDF (.pdf), OpenDocument Text (.odt), or Text (.txt) extensions.