Download Data Overview
Standard Data Files
All collections have a set of standard data files. The primary data file is the Distribution File, which contains demographic, diagnostic, and pedigree information, along with RUCDR sample IDs (cell_id) and consent level. You can use the Data Explorer to query the Distribution File.
For studies that submitted data to additional data repositories (such as NDA or dbGaP), the linking IDs for those repositories are contained in the Alternate ID file. Original study IDs are also included in this file when the ID has been harmonized to the standard NRGR format. Efforts are ongoing to collect linking IDs that have not yet been submitted to NRGR.
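As a rough illustration of how the Alternate ID file can be used alongside the Distribution File, the sketch below attaches linking IDs to each subject record. Only cell_id is a column name taken from this page. The other column names (subject_id, consent_level, repository, alt_id), the sample values, and the helper functions are hypothetical, so adjust them to the headers in the files you actually download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents; real NRGR column names and values may differ.
DISTRIBUTION_CSV = """subject_id,cell_id,consent_level
S001,C100,GRU
S002,C101,GRU
S003,,GRU
"""

ALTERNATE_ID_CSV = """subject_id,repository,alt_id
S001,dbGaP,DB-9
S001,NDA,NDA-7
S003,original_study,FAM12-03
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_alternate_ids(distribution_rows, alternate_rows, key="subject_id"):
    """Return copies of the distribution rows with an 'alternate_ids' dict added."""
    by_subject = defaultdict(dict)
    for row in alternate_rows:
        by_subject[row[key]][row["repository"]] = row["alt_id"]
    # Subjects with no linking ID get an empty dict.
    return [
        dict(row, alternate_ids=dict(by_subject.get(row[key], {})))
        for row in distribution_rows
    ]


linked = attach_alternate_ids(
    read_rows(DISTRIBUTION_CSV), read_rows(ALTERNATE_ID_CSV)
)
for row in linked:
    print(row["subject_id"], row["alternate_ids"])
```

To run this against downloaded files, replace the inline strings with csv.DictReader over the opened files. Subjects without an entry in the Alternate ID file are kept, since not every study has submitted linking IDs.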
Studies in multiple collections
In some cases, a study is cross-listed in more than one collection. For example:
- iPSC – When iPSCs are generated from existing cell lines, they are added into the iPSC collection. The original study may be from the SZ, BP, 22Q, or other disorder-specific collections.
- Controls – The controls distribution contains individuals recruited specifically for their lack of mental disorders, with consent for General Research Use. Some studies collected controls for use alongside their cases for a specific disorder, and those controls are released separately in this collection. This is true for several studies in the BP and SZ collections.
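Because a study can appear in more than one collection, combining Distribution Files from several collections may yield repeated subject records. The sketch below shows one way to merge them while recording which collections each subject came from. It assumes that a subject keeps the same identifier across collections and that the identifier column is named subject_id. Both are assumptions and not documented NRGR behavior, and merge_collections is a hypothetical helper.

```python
def merge_collections(collections, key="subject_id"):
    """Merge rows from several collections, keeping one record per subject.

    collections: dict mapping a collection name to a list of row dicts.
    Returns a dict keyed by subject ID; each value holds the first row seen
    and the list of collections in which the subject appeared.
    """
    merged = {}
    for name, rows in collections.items():
        for row in rows:
            entry = merged.setdefault(row[key], {"row": row, "collections": []})
            if name not in entry["collections"]:
                entry["collections"].append(name)
    return merged


# Hypothetical example data: S001 is cross-listed in SZ and iPSC.
collections = {
    "SZ": [{"subject_id": "S001"}, {"subject_id": "S002"}],
    "iPSC": [{"subject_id": "S001"}],
}
merged = merge_collections(collections)
for subject_id, entry in merged.items():
    print(subject_id, entry["collections"])
```

Keeping the list of source collections, instead of silently dropping duplicates, makes it easier to see afterwards which subjects were cross-listed.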
Not all subjects in a collection have available biomaterials. For example:
- Family member with only phenotypic data
- "Dummy" subject record created only to build complete pedigree drawings
- Subject once had a biomaterial, but the sample is depleted
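When working with a downloaded Distribution File, it can help to separate records that carry a sample ID from those that do not. The sketch below assumes that a subject without a biomaterial has an empty or missing cell_id. That is an assumption, not documented behavior, and a depleted sample may still carry an ID, so treat the result as a first pass only. The subject_id column and both helper functions are hypothetical.

```python
def has_sample_id(row, id_column="cell_id"):
    """True if the row has a non-blank value in the sample ID column."""
    value = row.get(id_column) or ""
    return bool(value.strip())


def split_by_sample_id(rows, id_column="cell_id"):
    """Split rows into (with_id, without_id) based on the sample ID column."""
    with_id = [r for r in rows if has_sample_id(r, id_column)]
    without_id = [r for r in rows if not has_sample_id(r, id_column)]
    return with_id, without_id


# Hypothetical example rows; real files may use different column names.
rows = [
    {"subject_id": "S001", "cell_id": "C100"},
    {"subject_id": "S002", "cell_id": ""},
    {"subject_id": "S003", "cell_id": "  "},
    {"subject_id": "S004"},
]
with_id, without_id = split_by_sample_id(rows)
print([r["subject_id"] for r in with_id])
print([r["subject_id"] for r in without_id])
```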
Additional Data Files
Collections also include additional data files for some studies. These files cannot be queried in Data Explorer and must be downloaded here. The amount and type of data available depend on the Data Sharing agreements with NIMH and on how the data were collected and released.
- Study Descriptions – The Study Description files contain information submitted by the studies to give context about the study design and resulting publications, and sometimes notes from NRGR about the released data.
- Pedigree Drawings – For studies with family data, pedigree drawings are available for download. Alternatively, you can use our Pedigree Generation tool to generate them.
The overview tables on the download data page show the kinds of genetic data available. Broadly, there are two sources of genetic data for download at NRGR:
- Primary source genetic data – From the investigator who collected the samples. Some of these datasets also include subjects from other studies, for instance when the primary investigator applied for access to NRGR controls. In those cases, we keep the genetic data together rather than splitting it for release in different collections.
- Secondary source genetic data – Generated not by the investigators who collected the samples, but by investigators who applied for access to the biomaterials and generated new data. In many of these cases, the genetic dataset includes subjects across multiple studies.
Clinical Instrument Data
Some studies have very limited or no additional data, while others have submitted databases of item-level data.