M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity

Karthik A Jagadeesh; Aaron M Wenger; Mark J Berger; Harendra Guturu; Peter D Stenson; David N Cooper; Jonathan A Bernstein; Gill Bejerano

doi:10.1038/ng.3703

Abstract

Variant pathogenicity classifiers such as SIFT, PolyPhen-2, CADD, and MetaLR assist in interpretation of the hundreds of rare, missense variants in the typical patient genome by deprioritizing some variants as likely benign. These widely used methods misclassify 26 to 38% of known pathogenic mutations, which could lead to missed diagnoses if the classifiers are trusted as definitive in a clinical setting. We developed M-CAP, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.
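To make the phrase "dismisses 60% of variants at 95% sensitivity" concrete, the following is a minimal sketch of how an operating point of this kind can be computed for any score-based classifier. It is not the M-CAP implementation: the function names, the toy scores, and the convention that a higher score means "more likely pathogenic" are all assumptions made for illustration.

```python
import math
from typing import Sequence


def threshold_at_sensitivity(pathogenic_scores: Sequence[float],
                             target: float = 0.95) -> float:
    """Return the highest cutoff such that at least `target` of the known
    pathogenic variants score at or above it (hypothetical helper)."""
    if not pathogenic_scores:
        raise ValueError("need at least one pathogenic score")
    ranked = sorted(pathogenic_scores, reverse=True)
    # Number of top-ranked pathogenic variants needed to reach the target.
    # The small epsilon guards against floating-point overshoot in the product.
    needed = max(1, math.ceil(target * len(ranked) - 1e-9))
    return ranked[needed - 1]


def fraction_dismissed(other_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of variants scoring below the cutoff, i.e. deprioritized."""
    if not other_scores:
        raise ValueError("need at least one score")
    return sum(s < cutoff for s in other_scores) / len(other_scores)


# Toy data (assumed, not taken from the paper): 20 pathogenic scores and
# 5 scores for variants of uncertain significance.
pathogenic = [i / 20 for i in range(1, 21)]   # 0.05, 0.10, ..., 1.00
uncertain = [0.01, 0.03, 0.07, 0.40, 0.90]

cutoff = threshold_at_sensitivity(pathogenic, target=0.95)
sensitivity = sum(s >= cutoff for s in pathogenic) / len(pathogenic)
dismissed = fraction_dismissed(uncertain, cutoff)
```

Fixing the sensitivity first and then measuring how many variants fall below the resulting cutoff reflects the clinical priority described in the abstract: the cost of missing a pathogenic variant is bounded up front, and classifiers are compared on how much of the remaining workload they remove.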



Acknowledgements

We thank the members of the Bejerano laboratory, particularly J. Notwell, S. Chinchali, and J. Birgmeier, for technical advice and helpful discussions. P.D.S. and D.N.C. receive financial support from Qiagen through a license agreement with Cardiff University. We thank the PolyPhen-2, CADD, Eigen, FATHMM, MutationTaster, and MetaLR teams for making their training and testing data readily available. This work was funded in part by the Stanford Pediatrics Department, DARPA, a Packard Foundation Fellowship, and a Microsoft Faculty Fellowship to G.B.

Author information

Karthik A Jagadeesh and Aaron M Wenger contributed equally to this work.

Affiliations

Department of Computer Science, Stanford University, Stanford, California, USA.
- Karthik A Jagadeesh, Mark J Berger & Gill Bejerano

Department of Pediatrics, Stanford University, Stanford, California, USA.
- Aaron M Wenger, Harendra Guturu, Jonathan A Bernstein & Gill Bejerano

Department of Medical Genetics, Cardiff University, Heath Park, Cardiff, UK.
- Peter D Stenson & David N Cooper

Department of Developmental Biology, Stanford University, Stanford, California, USA.
- Gill Bejerano


Contributions

K.A.J., A.M.W., M.J.B., and G.B. designed the study and analyzed results. K.A.J. and M.J.B. implemented the model and performed the experiments. K.A.J., A.M.W., and H.G. wrote software tools that were used for analysis. P.D.S. and D.N.C. curated the HGMD data and provided feedback. J.A.B. provided patient exome cases and feedback. K.A.J., A.M.W., and G.B. wrote the manuscript. All authors reviewed and commented on the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Gill Bejerano.

Supplementary information

PDF files

1. Supplementary Text and Figures: Supplementary Tables 1–4, 6, 9 and 10.

Excel files

1. Supplementary Table 5: M-CAP scores for disease-causing mutations found in BRCA1, BRCA2, CFTR and MLL2.
2. Supplementary Table 7: Clinical phenotypes for case study patients.
3. Supplementary Table 8: Rare missense variants in case study patients.


About this article

Publication history

Received: 30 June 2016
Accepted: 26 September 2016
Published: 24 October 2016
DOI: https://doi.org/10.1038/ng.3703
