The following requirements and formats are necessary for validation through our AutoQC system. Files are checked for agreement with defined data dictionaries. They are also checked for pedigree validity (are mothers female? are children younger than parents?), cross-database accuracy (do cell_id (RUID) and ind_id pairs agree with RUCDR records?), and diagnosis (are submitted diagnoses defined by the study?). Although the error-checking is automated, the AutoQC system will not automatically correct errors. You will be required to review the error logs generated by the system, make the necessary changes, and re-submit a corrected package. There is no limit on the number of re-submissions.
- Submissions must be packaged as a .zip file containing all of the components listed below that are marked (*) as required.
- File names must not contain any spaces and must match the formulas provided for each item in the list below.
- ID pairs must match RUCDR records. Use the same ID (ind_id) that you used when reporting to RUCDR. Our AutoQC system will check for validity of your reported ind_id / cell_id (RUID) pairs against RUCDR records.
- Case of data dictionaries must match that of uploaded files. For example, ind_id does not match Ind_id and will flag an error. The same applies to user-defined dictionaries.
- Order of columns must also match. Columns in the data dictionaries must be in the same order as the columns in your uploaded files, for both the Repository-defined files (_sub, _dx, etc.) and user-defined files (_phen, _phen_dd).
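The case and column-order rules above can be checked locally before uploading. The following is a minimal sketch, not the AutoQC implementation: the function name compare_columns is hypothetical, and it assumes you have already read the column names from the data dictionary and from the data file header into two lists.

```python
def compare_columns(dd_columns, file_columns):
    """Return mismatch messages; an empty list means the headers agree."""
    problems = []
    if len(dd_columns) != len(file_columns):
        problems.append(
            f"column count differs: dictionary has {len(dd_columns)}, "
            f"file has {len(file_columns)}"
        )
    # Compare position by position, because column order must match too.
    for position, (expected, actual) in enumerate(
        zip(dd_columns, file_columns), start=1
    ):
        if expected == actual:
            continue
        if expected.lower() == actual.lower():
            problems.append(
                f"column {position}: case mismatch ({expected!r} vs {actual!r})"
            )
        else:
            problems.append(
                f"column {position}: expected {expected!r}, found {actual!r}"
            )
    return problems
```

An empty result means the header matches the dictionary in both case and order. A swapped pair of columns is reported once per affected position.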
File Formats and Naming
Your AutoQC submission requires a number of different file types. When you are creating files for submission, please consider these best practices for file naming:
- Do not include spaces or special characters (!@#$%^&*()?)
- Include your NIMH Study ID
- For phenotypic files, include the name or abbreviation of the clinical instrument
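As a rough local check of the first naming practice, the sketch below reports any space or listed special character found in a file name. The function name forbidden_characters and the sample file names are hypothetical, and the check covers only the characters listed above.

```python
# A space plus the special characters listed in the naming practices.
FORBIDDEN = set(" !@#$%^&*()?")


def forbidden_characters(file_name):
    """Return the sorted, distinct forbidden characters found in file_name."""
    return sorted(set(file_name) & FORBIDDEN)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for name in ["study123_sub.csv", "my file(1).csv"]:
        found = forbidden_characters(name)
        print(name, "ok" if not found else f"contains {found}")
```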
* = required file
Submission File* (FILENAME_SUB)
The main file in the submission. It includes demographic information, ID numbers, and study-defined diagnoses.
This file is identical across all diseases, except that for Autism age must be reported in months rather than years.
Diagnosis File* (FILENAME_DX)
The purpose of this file is to define for future users what diagnoses were used in your study’s analyses and what diagnostic criteria were used. The file must contain four columns: study_id (the unique id for the study), code (the diagnosis code, usually an abbreviation, e.g. CT), definition (what the code stands for), and description (details specifying what the code means).
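The four-column requirement can be verified with a short script. This is a sketch under stated assumptions: it treats the file as comma-delimited text with a header row (the delimiter is an assumption and can be overridden), and the function name read_dx_codes and the sample row are hypothetical.

```python
import csv
import io

# Column names and order as listed for the diagnosis file.
DX_COLUMNS = ["study_id", "code", "definition", "description"]


def read_dx_codes(text, delimiter=","):
    """Validate the header and row widths, then return the set of codes."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header != DX_COLUMNS:
        raise ValueError(f"expected columns {DX_COLUMNS}, found {header}")
    codes = set()
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(DX_COLUMNS):
            raise ValueError(f"line {line_number}: expected 4 fields, found {len(row)}")
        codes.add(row[1])
    return codes


if __name__ == "__main__":
    # Hypothetical content; the definition and description are placeholders.
    sample = (
        "study_id,code,definition,description\n"
        "99,CT,placeholder definition,placeholder description\n"
    )
    print(read_dx_codes(sample))
```

The returned set of codes could then be compared against the diagnoses reported in the _sub file, since submitted diagnoses must be defined by the study.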
Alternate ID File (FILENAME_ID)
Contains alternate IDs for individuals, such as internal ID, NDAR GUID, dbGaP ID, and additional RUCDR sample IDs. This file is required for studies that also submit data to other public repositories such as NDA and dbGaP. Any ind_id in this file must have a record in the _sub file.
Extended Diagnostic Information File (FILENAME_EDX)
Contains diagnostic information using a validated standard diagnostic system [text or Excel]. The file allows submission of multiple diagnoses per individual, but it is also important to report the standard diagnostic code for individuals with only one diagnosis. This file facilitates cross-study and cross-disorder analyses and is required for many studies – see the terms of your Notice of Grant Award. Any ind_id in this file must have a record in the _sub file.
Race/Ethnicity File (FILENAME_RE)
Contains extended race and ethnicity information to supplement the standard NIMH race and ethnicity assignments. Multiple races can be reported per individual, and for their parents and grandparents. Any ind_id in this file must have a record in the _sub file.
Study-Specific File(s) (FILENAME_PHEN)
You may submit files specific to your individual study, provided that corresponding data dictionaries are provided for each file. Most studies require at least one _phen file per submission. Refer to your Data Sharing Plan to determine how many different clinical instruments NIMH expects you to submit, and whether that data must be item-level or summary-level data. Any ind_id in this file must have a record in the _sub file.
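The _id, _edx, _re, and _phen files all share the rule that any ind_id must have a record in the _sub file. A minimal sketch of that check follows. It assumes comma-delimited text with a header row containing an ind_id column. The function names and the sex and score columns in the sample are hypothetical.

```python
import csv
import io


def ind_ids(text, delimiter=","):
    """Collect the ind_id values from delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["ind_id"] for row in reader}


def orphan_ind_ids(sub_text, other_text, delimiter=","):
    """Return ind_id values in the other file that have no _sub record."""
    return sorted(ind_ids(other_text, delimiter) - ind_ids(sub_text, delimiter))


if __name__ == "__main__":
    # Hypothetical file contents, used only for illustration.
    sub = "ind_id,sex\nA1,F\nA2,M\n"
    phen = "ind_id,score\nA1,3\nA3,5\n"
    print(orphan_ind_ids(sub, phen))
```

Any ID returned here would need either a matching row in the _sub file or removal from the other file before submission.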
Data Dictionaries for Study-Specific files (FILENAME_PHEN_DD)
Required to accompany any study-specific files. File names must match the corresponding files exactly, but with the “_dd” extension added.
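The pairing rule can be checked against a list of file names. In this sketch the function names and sample file names are hypothetical, and it assumes that "_dd" is appended to the name before the file extension.

```python
from pathlib import PurePath


def expected_dd_name(phen_name):
    """Build the dictionary name by adding _dd before the file extension."""
    path = PurePath(phen_name)
    return f"{path.stem}_dd{path.suffix}"


def phen_files_missing_dd(file_names):
    """Return the _phen files that have no matching _phen_dd file."""
    names = set(file_names)
    return sorted(
        name
        for name in names
        if PurePath(name).stem.lower().endswith("_phen")
        and expected_dd_name(name) not in names
    )


if __name__ == "__main__":
    # Hypothetical package contents, used only for illustration.
    package = [
        "study123_instr1_phen.csv",
        "study123_instr1_phen_dd.csv",
        "study123_instr2_phen.csv",
    ]
    print(phen_files_missing_dd(package))
```

The lookup for the dictionary name is case-sensitive, in line with the requirement that names match exactly.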
In addition to the above files, you may include additional information, such as study acknowledgements, blank interview forms, training manuals, and any other study documentation that you believe will be useful for future users of your data and samples. These files must use PDF (.pdf), OpenDocument Text (.odt), or Text (.txt) extensions.
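A quick way to screen supplementary documents against the permitted extensions is sketched below. The function name is hypothetical, and treating the extension as case-insensitive is an assumption; the source does not say whether upper-case extensions are accepted.

```python
from pathlib import PurePath

# Extensions permitted for supplementary documentation.
ALLOWED_DOC_EXTENSIONS = {".pdf", ".odt", ".txt"}


def has_allowed_doc_extension(file_name):
    """Return True if the file name ends in a permitted extension."""
    # Lower-casing the suffix is an assumption about case handling.
    return PurePath(file_name).suffix.lower() in ALLOWED_DOC_EXTENSIONS
```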