All Disorder Collections Updated July 18, 2022

Collection-specific updates:

  • The Anorexia distribution has been renamed to Eating Disorders.
  • SZ/CT Dataset 37: An additional genetic dataset is now available for previously released subjects from the Molecular Genetics of Schizophrenia (MGS) study, Studies 0, 6, 27 and 29. CNVs were detected with Birdsuite software from Affymetrix 6.0 genotyping array calls. Cases: 3721, Controls: 3851, 52 Families. Analysis data files were provided by Dr. Douglas F. Levinson. The CT dataset, available in the Controls collection, is a subset of SZ Dataset 37 containing only controls from Study 29.

General updates: All disorder collections have undergone an overhaul of the distribution file format.

  • Due to changes to the standard format for nrgr_ind_ids, some individuals from older studies have been reassigned a new nrgr_ind_id. Linking information for older versions of nrgr_ind_ids can be found in the alt_id file that accompanies each distribution. These entries have an 'alt_id_type' of 'nrgr_alias_id'.
  • The fam_id field has been renamed to nrgr_fam_id to better reflect the fact that the value is a concatenation of the study_id, site_id, and fam_id. If an nrgr_ind_id has been updated as explained above, the nrgr_fam_id will also have changed. In these cases, the old fam_id value can be found in the alt_id file by taking the first 3 fields of the corresponding nrgr_alias_id for a given individual.
  • The values for the subject_type field have changed.

    Previous values/definitions were:

    Value | Definition
    P     | Proband
    NP    | Not Proband
    D     | Dummy (used to link pedigrees)
    C     | Control

    New values/definitions are as follows:

    Value | Definition          | Description
    P     | Primary Subject     | Within a family study, this is an individual who has been subjected to a complete battery of clinical assessments (according to the standards of the study) and who typically has a sample. This category may include both affected (e.g., probands) and unaffected family members. Within a case/control study, this category represents affected cases.
    S     | Secondary Subject   | Within a family study, this is an individual who has not been subjected to a complete battery of clinical assessments (according to the standards of the study). They do, however, have some useful clinical and demographic information and typically share relevant familial relations with other subjects. Subjects may be affected or unaffected. In addition, they typically have a sample. Note that this category is not applicable to case/control studies.
    E     | Excluded from Study | Individuals who met exclusion criteria for a given study, but still have a sample, consent, and at least some demographic information available, and therefore may be of use to other investigators.
    L     | Linker              | Individual who is not a consented study subject but who is required to construct a complete pedigree (e.g., parents in the case of twin studies). Besides relevant identifiers (ind_id, site_id, study_id, father_id, mother_id, fam_id), the only demographic information that may be included is sex. Also, there must be no sample. Please note that if the subject is part of a family study and has a biomaterial or any clinical data, as well as the appropriate level of consent, then they should be classified as a Secondary Subject (S).
    C     | Control             | Individual is a control.

    Old values have been converted to the new values in the following manner (with the exception of 'E', for which there was no previous value):

    Old Value | New Value
    P         | P
    NP        | S
    D         | L
    C         | C
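
The conversion table above can be expressed as a small lookup. This is a minimal sketch rather than NRGR's own tooling: the names OLD_TO_NEW_SUBJECT_TYPE and convert_subject_type are hypothetical, and the whitespace/case normalization is an assumption made for convenience.

```python
# Old -> new subject_type codes, taken from the conversion table above.
# 'E' (Excluded from Study) is new and has no old equivalent, so it is
# deliberately absent from the keys.
OLD_TO_NEW_SUBJECT_TYPE = {
    "P": "P",   # Proband        -> Primary Subject
    "NP": "S",  # Not Proband    -> Secondary Subject
    "D": "L",   # Dummy          -> Linker
    "C": "C",   # Control        -> Control
}


def convert_subject_type(old_value: str) -> str:
    """Return the new subject_type code for an old code.

    Assumption: surrounding whitespace and letter case are not significant.
    Raises ValueError for anything that is not a known old code.
    """
    key = old_value.strip().upper()
    if key not in OLD_TO_NEW_SUBJECT_TYPE:
        raise ValueError(f"Unrecognized old subject_type: {old_value!r}")
    return OLD_TO_NEW_SUBJECT_TYPE[key]
```

Raising on unknown input, instead of passing it through, makes it harder to silently mix old and new codes in the same file.
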
  • The following changes have been made to consent:
    • consent now includes an additional value of 'HMB', indicating that use of the data is limited to health/medical/biomedical purposes.
    • A new field, 'consent_modifier', has been added to augment the existing 'consent' data. The table below outlines the values/definitions available for this field.
      Value | Definition
      COL   | Requestor must provide a letter of collaboration with the primary study investigator(s).
      GSO   | Use of the data is limited to genetic studies only.
      IRB   | Requestor must provide documentation of local IRB approval.
      MDS   | Use of the data includes methods development research.
      NAE   | Biospecimens cannot be engrafted onto animals or humans.
      NPU   | Use of the data is limited to not-for-profit organizations.
      PUB   | Requestor agrees to make results of studies using the data available to the larger scientific community.
      RPU   | If the requestor is a for-profit entity, the requestor must secure permission to use samples directly from the primary study investigator(s).
  • The following new fields have been added to the distribution files:
    • ind_id - This field contains the individual id submitted to NRGR; for newer studies, it also matches the id of record with the NRGR biorepository. Previously this value was stored in the alt_id file. Placing it directly in the distribution file should make it easier for investigators to have the mapping readily available along with demographic/diagnostic data.
    • genetic_dx - This field includes known genetic diagnoses that may have/had an impact on either the diagnosis submitted by the study (dx_study) or the harmonized diagnosis generated by NRGR (dx_nimh). In most cases for older studies this value is unknown, but we are reaching out to previously submitted studies and will provide updates when they are available.
    • consent_modifier - See consent changes mentioned above.
  • The following existing fields have been updated to include additional values:
    • race - Now includes an option 'R' for refused to answer
    • ethnicity - Now includes an option 'R' for refused to answer
    • sex - Now includes an option 'I' for intersex
    • zygosity - Now includes options 'A' for adopted and 'S' for singleton. 'TZ' for trizygotic has been replaced by 'PZ' for polyzygotic to allow for higher values than 3. The other values remain unchanged.
    • consent - See consent changes mentioned above.
  • The 'dist' field definition has been updated to better communicate its contents. This column represents the first version of the distribution into which a study was released.
  • In all distribution files, values of 'NULL' for dx_confidence have been updated to '1' (provided the dx_study value for the same individual is not NULL). The definition for a '1' in this field is 'Unknown; diagnosis asserted but completeness of diagnosis and status of supporting information is unknown'.
  • In all distributions where the dx_system was reported as 'study_xx' (where xx is the study_id), the value has been changed to 'SS' to indicate a study-specific diagnosis.
  • The family_race field has been removed. In many cases the data presented here were potentially misleading without several generations' worth of information.
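
For investigators who still hold files in the older format, several of the value-level rules above are mechanical enough to sketch in code. The following is an illustration under stated assumptions, not NRGR's conversion procedure: it assumes each row has been parsed into a dict keyed by field name, that missing values appear as the string 'NULL' (or as None), and that the separator between fields of an nrgr_alias_id is known to the caller (the notes above do not state it). The function names update_record and old_fam_id_from_alias are hypothetical.

```python
import re

NULL = "NULL"  # Assumption: NULL values are stored as this literal string.


def is_null(value) -> bool:
    """Treat None and the literal 'NULL' as missing."""
    return value is None or value == NULL


def update_record(record: dict) -> dict:
    """Return a copy of an old-format row with the value updates applied."""
    updated = dict(record)

    # dx_confidence: NULL becomes '1' only when dx_study is not NULL.
    if is_null(updated.get("dx_confidence")) and not is_null(updated.get("dx_study")):
        updated["dx_confidence"] = "1"

    # dx_system: 'study_xx' (xx = study_id) becomes 'SS' (study-specific).
    dx_system = updated.get("dx_system")
    if isinstance(dx_system, str) and re.fullmatch(r"study_\w+", dx_system):
        updated["dx_system"] = "SS"

    # zygosity: 'TZ' (trizygotic) was replaced by 'PZ' (polyzygotic).
    if updated.get("zygosity") == "TZ":
        updated["zygosity"] = "PZ"

    # family_race was removed from the distribution files.
    updated.pop("family_race", None)
    return updated


def old_fam_id_from_alias(nrgr_alias_id: str, sep: str) -> str:
    """Recover the old family id as the first 3 fields of an nrgr_alias_id.

    The separator is an assumption supplied by the caller; check the alt_id
    file of the distribution in hand before relying on a particular value.
    """
    fields = nrgr_alias_id.split(sep)
    if len(fields) < 3:
        raise ValueError(f"Expected at least 3 fields in {nrgr_alias_id!r}")
    return sep.join(fields[:3])
```

The sketch returns a new dict rather than modifying its input, so the original row stays available for comparison against the released distribution files.
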