The Omicia OMIM track is intended for research purposes. The data are obtained from OMIM and mapped by computational methods and/or human annotation, and may contain errors.
This track contains the OMIM allelic variants mapped to the human genome sequence. To obtain a more "genome-centric" view of OMIM and increase its usefulness to the community, Omicia has mapped OMIM variants as nucleotide changes leading to protein changes. Specifically, OMIM allelic variants with headings that refer to amino acid mutations, e.g., ARG123CYS, and allelic variants that refer to mutations within or relative to introns, e.g., IVS1AS G-A -1, are mapped.
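As an illustration of the two heading styles named above, the following minimal sketch classifies a heading as an amino acid variant or an intron variant. The regular expressions and the function name classify_heading are assumptions made for this example; they cover only the formats quoted in this description and are not Omicia's parser.

```python
import re

# Hypothetical patterns covering only the heading styles quoted in the text;
# real OMIM headings are more varied than this.
AA_RE = re.compile(r"^([A-Z]{3})(\d+)([A-Z]{3})$")
IVS_RE = re.compile(
    r"^IVS(\d+)(AS|DS)?[,\s]+([ACGT])[->]([ACGT])[,\s]+([+-]\d+)$"
)

def classify_heading(heading):
    """Return a tuple describing the heading, or None if it is not recognized."""
    text = heading.strip().upper()
    m = AA_RE.match(text)
    if m:
        ref, pos, alt = m.groups()
        return ("amino_acid", ref, int(pos), alt)
    m = IVS_RE.match(text)
    if m:
        intron, site, ref, alt, offset = m.groups()
        # site is None when the heading carries no AS/DS designation
        return ("intron", int(intron), site, ref, alt, int(offset))
    return None
```

Headings that match neither pattern return None, which in this sketch stands in for variants that would need hand curation.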
There are two subtracks. The first, produced for all genes with mappable allelic variant data within OMIM, contains data mapped by computational methods. The second, produced for approximately 50 genes, contains data from the computational mapping method that have been reviewed and confirmed for accuracy, together with additional allelic variants that could not be mapped computationally and were mapped by hand.
The track is color-coded by score. (See Methods for details on scoring.)
Mapping amino acid variants to the genome is often non-trivial because protein sequences derived from the reference human genome differ from those in early publications. Omicia has employed a shifting algorithm to resolve discrepancies, whereby a best alignment is found between the reference sequence and the data available directly from OMIM. If a single nucleotide change can represent an OMIM variant, then it is given a score of 1. The score is then increased according to how well the variant overlaps with data from other sources: overlap with a dbSNP entry adds 1 (validation code "d"), overlap with additional data sources, when available, adds 1.5 (validation code "c"), and overlap with a variant mapped between OMIM and dbSNP by the NCBI adds 2 (validation code "D"). Mapped variants that require shifting have reduced scores based on the quality of the shift, as assessed by the percentage of variants for a gene that are mapped when shifted. Shifted variants receive the validation code "s".
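The shifting and scoring steps for amino acid variants might be sketched as follows. This is an illustration under stated assumptions, not Omicia's implementation: the function names, the one-letter residue codes, the max_shift search window, and the caller-supplied shift_penalty are all hypothetical, since the exact alignment procedure and the size of the shift reduction are not given here.

```python
def best_shift(protein, variants, max_shift=30):
    """Find the position offset under which most variants match the protein.

    protein  : reference protein sequence (one-letter codes, an assumption)
    variants : list of (reference_residue, 1-based position) from OMIM
    Returns (offset, fraction_of_variants_matched).
    """
    best_offset, best_fraction = 0, 0.0
    # Try offset 0 first, then larger shifts, so ties favour the smallest shift.
    for offset in sorted(range(-max_shift, max_shift + 1), key=abs):
        hits = 0
        for ref, pos in variants:
            idx = pos + offset - 1
            if 0 <= idx < len(protein) and protein[idx] == ref:
                hits += 1
        fraction = hits / len(variants) if variants else 0.0
        if fraction > best_fraction:
            best_offset, best_fraction = offset, fraction
    return best_offset, best_fraction


def score_variant(in_dbsnp=False, in_other_source=False,
                  in_ncbi_mapping=False, shifted=False, shift_penalty=0.0):
    """Apply the score increments described in the text.

    shift_penalty is a placeholder: the text says only that the reduction
    depends on the quality of the shift, not how large it is.
    """
    score, codes = 1.0, []
    if in_dbsnp:
        score += 1.0
        codes.append("d")
    if in_other_source:
        score += 1.5
        codes.append("c")
    if in_ncbi_mapping:
        score += 2.0
        codes.append("D")
    if shifted:
        score -= shift_penalty
        codes.append("s")
    return score, "".join(codes)
```

For example, a variant found in dbSNP and in the NCBI OMIM-to-dbSNP mapping, with no shift required, would score 1 + 1 + 2 = 4 with validation codes "dD".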
To map variants in and near introns, it is first necessary to determine the exon and intron coordinates for each transcript, numbering the first exon as exon 1, followed by intron 1. All exons are counted, including those that become untranslated regions. Intron variants are scored based on the information in the annotation. If the acceptor site (AS) or donor site (DS) is reported, the variant receives a starting score of 1.5. If the annotation does not contain the AS/DS designation, then the variant is assumed to be relative to the DS if the sign of the position is positive (e.g., IVS3, G-A, +1), and relative to the AS if the sign of the position is negative (e.g., IVS9, G-C, -1). Because the designation is inferred, these variants incur a penalty of 0.5 and receive a starting score of 1.0. Results are only reported and scored if the nucleotide in the annotation matches the nucleotide found at the mapped position. For example, in the case of IVS1DS, T>C, +2, if the +2 position from the donor site were not T, then this variant would not be scored. The score is also reduced by 0.25 if the OMIM annotation indicates that the variant is in the exon (e.g., IVS1AS, G-A, +3; validation code "o"), and is increased by 0.5 if the position is greater than 2 nucleotides from the splice site (validation code "f").
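The intron numbering and scoring rules can be illustrated with the short sketch below. The function names, the half-open forward-strand coordinates, and the test for "in the exon" (an AS variant with a positive offset or a DS variant with a negative offset, inferred from the IVS1AS, G-A, +3 example) are assumptions made for this example; the adjustments themselves follow the values stated in the text.

```python
def introns_from_exons(exons):
    """Number introns so that intron N follows exon N.

    exons: list of (start, end) half-open intervals in transcript order on
    the forward strand (an assumption; reverse-strand transcripts would
    need their coordinates flipped first).
    """
    return {i + 1: (exons[i][1], exons[i + 1][0])
            for i in range(len(exons) - 1)}


def score_intron_variant(site, offset, annotated_ref, genome_ref):
    """Score one intron variant, or return None if it is not reported.

    site          : "AS", "DS", or None when OMIM gives no designation
    offset        : signed position relative to the splice site
    annotated_ref : reference nucleotide given in the OMIM heading
    genome_ref    : nucleotide found at the mapped genomic position
    """
    if annotated_ref != genome_ref:
        return None  # nucleotide mismatch: not reported or scored
    codes = []
    # 1.5 with an explicit AS/DS designation, 1.0 when it had to be inferred
    score = 1.5 if site in ("AS", "DS") else 1.0
    # Assumed reading of "in the exon"; an inferred site is never flagged here
    in_exon = (site == "AS" and offset > 0) or (site == "DS" and offset < 0)
    if in_exon:
        score -= 0.25
        codes.append("o")
    if abs(offset) > 2:
        score += 0.5
        codes.append("f")
    return score, "".join(codes)
```

Under these assumptions, IVS1DS, T>C, +2 scores 1.5 when the genome has T at that position and is dropped otherwise, while IVS1AS, G-A, +3 scores 1.5 - 0.25 + 0.5 = 1.75 with codes "of".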
In cases where there is more than one transcript of a gene, amino acid and intron variants are mapped and scored for each transcript.
Approximately 50 genes have been selected for hand curation of alleles. Omicia selected these genes with the assistance of UCSC, and included 54 genes frequently queried over a six-month period spanning 2006-2007. Additional genes were selected based on feedback from the community. The hand curation focused on confirming the accuracy of the computationally mapped allelic variants and on mapping the allelic variants that could not be mapped by computational methods, including nucleotide changes in non-coding regions, small deletions and insertions, and rearrangements.
OMIM
Publication is in preparation.