1977 B.A., Chemistry and Biology, University of Rochester, NY
1978-1982 Ph.D. California Institute of Technology, CA Advisor: Dr. Norman Davidson
1982-1986 Postdoctoral Research Stanford University School of Medicine, CA Advisor: Dr. Ronald Davis
1986-2009 Faculty Dept of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT
2009-present Dept of Genetics, Stanford University School of Medicine, Stanford, CA

Administrative Appointments

  • Chair, Dept. of Genetics (2009 - Present)
  • Director, Center for Genomics and Personalized Medicine (2009 - Present)

Research & Scholarship

Current Research and Scholarly Interests

We are presently in an omics revolution in which genomes and other omes can be readily characterized. Our laboratory uses a variety of approaches to analyze genomes and regulatory networks. Our research focuses on humans and on yeast, a model organism ideally suited to genetic analysis.

1) Transcriptomes
We developed RNA sequencing to annotate the yeast and human transcriptomes. We discovered that the eukaryotic transcriptome is much more complex than previously appreciated and that embryonic stem cells have more transcript isoforms than differentiated cells.

2) Transcription Factor Binding Networks
We have also developed methods for mapping transcription factor binding sites throughout the genome. We used these methods to build regulatory maps and are using the maps to help decipher the combinatorial regulatory code: which factors work together to regulate which genes. Using this approach, we have mapped out pathways crucial for metabolism and inflammation.

3) Integrated Regulatory Networks
In addition to transcription factor binding networks, we have also been mapping phosphorylation and metabolite-protein interaction networks. These studies have revealed novel global regulators and key points in integrated regulatory networks.

4) Variation
We have been analyzing differences between individuals and species at two levels: DNA sequence variation and variation in regulatory information. We developed paired-end sequencing for humans and found that humans have extensive structural variation (SV), i.e., deletions, insertions and inversions. This is likely to be a major cause of phenotypic variation and human disease. In addition, by mapping binding site differences among different yeast strains and among humans, we have found that individuals differ much more in their regulatory information than in their coding sequences. We can correlate these differences with those in SNPs and SVs, thereby associating noncoding DNA differences with regulatory information.

5) Human Disease
Finally, we are applying omics approaches, including genome sequencing, transcriptomics, proteomics, metabolomics, DNA methylation and microbiome assays, to the analysis of human disease. These integrative omics approaches are being applied to help understand the molecular basis of disease and to support the development of diagnostics and therapeutics.

Clinical Trials

  • Understanding and Diagnosing Allergic Disease in Twins (Recruiting)

    The purpose of this study is to gain better understanding of how the immune system works in twins with and without allergic disease. Healthy volunteers are not specifically targeted. Healthy non-allergic study participants may be found through the course of evaluation for the presence of allergies.


All Publications

  • Fetal de novo mutations and preterm birth. PLoS genetics Li, J., Oehlert, J., Snyder, M., Stevenson, D. K., Shaw, G. M. 2017; 13 (4)


    Preterm birth (PTB) affects ~12% of pregnancies in the US. Despite its high mortality and morbidity, the molecular etiology underlying PTB has been unclear. Numerous studies have been devoted to identifying genetic factors in maternal and fetal genomes, but so far few genomic loci have been associated with PTB. By analyzing whole-genome sequencing data from 816 trio families, for the first time, we observed the role of fetal de novo mutations in PTB. We observed a significant increase in de novo mutation burden in PTB fetal genomes. Our genomic analyses further revealed that affected genes by PTB de novo mutations were dosage sensitive, intolerant to genomic deletions, and their mouse orthologs were likely developmentally essential. These genes were significantly involved in early fetal brain development, which was further supported by our analysis of copy number variants identified from an independent PTB cohort. Our study indicates a new mechanism in PTB occurrence independently contributed from fetal genomes, and thus opens a new avenue for future PTB research.

    View details for DOI 10.1371/journal.pgen.1006689

    View details for PubMedID 28388617

  • De novo and rare mutations in the HSPA1L heat shock gene associated with inflammatory bowel disease GENOME MEDICINE Takahashi, S., Andreoletti, G., Chen, R., Munehira, Y., Batra, A., Afzal, N. A., Beattie, R. M., Bernstein, J. A., Ennis, S., Snyder, M. 2017; 9


    Inflammatory bowel disease (IBD) is a chronic, relapsing inflammatory disease of the gastrointestinal tract which includes ulcerative colitis and Crohn's disease. Genetic risk factors for IBD are not well understood. We performed a family-based whole exome sequencing (WES) analysis on a core family (Family A) to identify potential causal mutations and then analyzed exome data from a Caucasian pediatric cohort (136 patients and 106 controls) to validate the presence of mutations in the candidate gene, heat shock 70 kDa protein 1-like (HSPA1L). Biochemical assays of the de novo and rare (minor allele frequency, MAF < 0.01) mutation variant proteins further validated the predicted deleterious effects of the identified alleles. In the proband of Family A, we found a heterozygous de novo mutation (c.830C > T; p.Ser277Leu) in HSPA1L. Through analysis of WES data of 136 patients, we identified five additional rare HSPA1L mutations (p.Gly77Ser, p.Leu172del, p.Thr267Ile, p.Ala268Thr, p.Glu558Asp) in six patients. In contrast, rare HSPA1L mutations were not observed in controls, and were significantly enriched in patients (P = 0.02). Interestingly, we did not find non-synonymous rare mutations in the HSP70 isoforms HSPA1A and HSPA1B. Biochemical assays revealed that all six rare HSPA1L variant proteins showed decreased chaperone activity in vitro. Moreover, three variants demonstrated dominant negative effects on HSPA1L and HSPA1A protein activity. Our results indicate that de novo and rare mutations in HSPA1L are associated with IBD and provide insights into the pathogenesis of IBD, and also expand our understanding of the roles of HSP70s in human disease.

    View details for DOI 10.1186/s13073-016-0394-9

    View details for Web of Science ID 000393834000001

    View details for PubMedID 28126021

  • Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information. PLoS biology Li, X., Dunn, J., Salins, D., Zhou, G., Zhou, W., Schüssler-Fiorenza Rose, S. M., Perelman, D., Colbert, E., Runge, R., Rego, S., Sonecha, R., Datta, S., McLaughlin, T., Snyder, M. P. 2017; 15 (1)


    A new wave of portable biosensors allows frequent measurement of health-related physiology. We investigated the use of these devices to monitor human physiological changes during various activities and their role in managing health and diagnosing and analyzing disease. By recording over 250,000 daily measurements for up to 43 individuals, we found personalized circadian differences in physiological parameters, replicating previous physiological findings. Interestingly, we found striking changes in particular environments, such as airline flights (decreased peripheral capillary oxygen saturation [SpO2] and increased radiation exposure). These events are associated with physiological macro-phenotypes such as fatigue, providing a strong association between reduced pressure/oxygen and fatigue on high-altitude flights. Importantly, we combined biosensor information with frequent medical measurements and made two important observations: First, wearable devices were useful in identification of early signs of Lyme disease and inflammatory responses; we used this information to develop a personalized, activity-based normalization framework to identify abnormal physiological signals from longitudinal data for facile disease detection. Second, wearables distinguish physiological differences between insulin-sensitive and -resistant individuals. Overall, these results indicate that portable biosensors provide useful information for monitoring personal activities and physiology and are likely to play an important role in managing health and enabling affordable health care access to groups traditionally limited by socioeconomic class or remote geography.

    View details for DOI 10.1371/journal.pbio.2001402

    View details for PubMedID 28081144

  • Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers. Cell stem cell Gu, M., Shao, N., Sa, S., Li, D., Termglinchan, V., Ameen, M., Karakikes, I., Sosa, G., Grubert, F., Lee, J., Cao, A., Taylor, S., Ma, Y., Zhao, Z., Chappell, J., Hamid, R., Austin, E. D., Gold, J. D., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2016


    In familial pulmonary arterial hypertension (FPAH), the autosomal dominant disease-causing BMPR2 mutation is only 20% penetrant, suggesting that genetic variation provides modifiers that alleviate the disease. Here, we used comparison of induced pluripotent stem cell-derived endothelial cells (iPSC-ECs) from three families with unaffected mutation carriers (UMCs), FPAH patients, and gender-matched controls to investigate this variation. Our analysis identified features of UMC iPSC-ECs related to modifiers of BMPR2 signaling or to differentially expressed genes. FPAH-iPSC-ECs showed reduced adhesion, survival, migration, and angiogenesis compared to UMC-iPSC-ECs and control cells. The "rescued" phenotype of UMC cells was related to an increase in specific BMPR2 activators and/or a reduction in inhibitors, and the improved cell adhesion could be attributed to preservation of related signaling. The improved survival was related to increased BIRC3 and was independent of BMPR2. Our findings therefore highlight protective modifiers for FPAH that could help inform development of future treatment strategies.

    View details for DOI 10.1016/j.stem.2016.08.019

    View details for PubMedID 28017794

  • Simul-seq: combined DNA and RNA sequencing for whole-genome and transcriptome profiling. Nature methods Reuter, J. A., Spacek, D. V., Pai, R. K., Snyder, M. P. 2016; 13 (11): 953-958


    Paired DNA and RNA profiling is increasingly employed in genomics research to uncover molecular mechanisms of disease and to explore personal genotype and phenotype correlations. Here, we introduce Simul-seq, a technique for the production of high-quality whole-genome and transcriptome sequencing libraries from small quantities of cells or tissues. We apply the method to laser-capture-microdissected esophageal adenocarcinoma tissue, revealing a highly aneuploid tumor genome with extensive blocks of increased homozygosity and corresponding increases in allele-specific expression. Among this widespread allele-specific expression, we identify germline polymorphisms that are associated with response to cancer therapies. We further leverage this integrative data to uncover expressed mutations in several known cancer genes as well as a recurrent mutation in the motor domain of KIF3B that significantly affects kinesin-microtubule interactions. Simul-seq provides a new streamlined approach for generating comprehensive genome and transcriptome profiles from limited quantities of clinically relevant samples.

    View details for DOI 10.1038/nmeth.4028

    View details for PubMedID 27723755

  • Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations NATURE GENETICS Araya, C. L., Cenik, C., Reuter, J. A., Kiss, G., Pande, V. S., Snyder, M. P., Greenleaf, W. J. 2016; 48 (2): 117-125

    View details for DOI 10.1038/ng.3471

    View details for Web of Science ID 000369043900008

  • Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature communications Yu, K., Zhang, C., Berry, G. J., Altman, R. B., Ré, C., Rubin, D. L., Snyder, M. 2016; 7: 12474


    Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs.

    View details for DOI 10.1038/ncomms12474

    View details for PubMedID 27527408

  • Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nature biotechnology Kuleshov, V., Jiang, C., Zhou, W., Jahanbani, F., Batzoglou, S., Snyder, M. 2016; 34 (1): 64-69


    Identifying bacterial strains in metagenome and microbiome samples using computational analyses of short-read sequences remains a difficult problem. Here, we present an analysis of a human gut microbiome using TruSeq synthetic long reads combined with computational tools for metagenomic long-read assembly, variant calling and haplotyping (Nanoscope and Lens). Our analysis identifies 178 bacterial species, of which 51 were not found using shotgun reads alone. We recover bacterial contigs that comprise multiple operons, including 22 contigs of >1 Mbp. Furthermore, we observe extensive intraspecies variation within microbial strains in the form of haplotypes that span up to hundreds of Kbp. Incorporation of synthetic long-read sequencing technology with standard short-read approaches enables more precise and comprehensive analyses of metagenomic samples.

    View details for DOI 10.1038/nbt.3416

    View details for PubMedID 26655498

  • Identification of Human Neuronal Protein Complexes Reveals Biochemical Activities and Convergent Mechanisms of Action in Autism Spectrum Disorders CELL SYSTEMS Li, J., Ma, Z., Shi, M., Malty, R. H., Aoki, H., Minic, Z., Phanse, S., Jin, K., Wall, D. P., Zhang, Z., Urban, A. E., Hallmayer, J., Babu, M., Snyder, M. 2015; 1 (5): 361-374


    The prevalence of autism spectrum disorders (ASDs) is rapidly growing, yet its molecular basis is poorly understood. We used a systems approach in which ASD candidate genes were mapped onto the ubiquitous human protein complexes and the resulting complexes were characterized. The studies revealed the role of histone deacetylases (HDAC1/2) in regulating the expression of ASD orthologs in the embryonic mouse brain. Proteome-wide screens for the co-complexed subunits with HDAC1 and six other key ASD proteins in neuronal cells revealed a protein interaction network, which displayed preferential expression in fetal brain development, exhibited increased deleterious mutations in ASD cases, and were strongly regulated by FMRP and MECP2 causal for Fragile X and Rett syndromes, respectively. Overall, our study reveals molecular components in ASD, suggests a shared mechanism between the syndromic and idiopathic forms of ASDs, and provides a systems framework for analyzing complex human diseases.

    View details for DOI 10.1016/j.cels.2015.11.002

    View details for Web of Science ID 000209926300009

    View details for PubMedCentralID PMC4776331

  • Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions CELL Grubert, F., Zaugg, J. B., Kasowski, M., Ursu, O., Spacek, D. V., Martin, A. R., Greenside, P., Srivas, R., Phanstiel, D. H., Pekowska, A., Heidari, N., Euskirchen, G., Huber, W., Pritchard, J. K., Bustamante, C. D., Steinmetz, L. M., Kundaje, A., Snyder, M. 2015; 162 (5): 1051-1065


    Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.

    View details for DOI 10.1016/j.cell.2015.07.048

    View details for Web of Science ID 000360589900015

    View details for PubMedCentralID PMC4556133

  • Recurrent somatic mutations in regulatory regions of human cancer genomes. Nature genetics Melton, C., Reuter, J. A., Spacek, D. V., Snyder, M. 2015; 47 (7): 710-716


    Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate whole-genome sequencing data from The Cancer Genome Atlas (TCGA) for 436 patients from 8 cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a new method that adjusts for sample- and genomic locus-specific mutation rates, we identify recurrently mutated sites across individuals with cancer. Mutated regulatory sites include known sites in the TERT promoter and many new sites, including a subset in proximity to cancer-related genes. In reporter assays, two new sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a greater role for regulatory mutations in cancer than previously appreciated.

    View details for DOI 10.1038/ng.3332

    View details for PubMedID 26053494

  • Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events NATURE BIOTECHNOLOGY Tilgner, H., Jahanbani, F., Blauwkamp, T., Moshrefi, A., Jaeger, E., Chen, F., Harel, I., Bustamante, C. D., Rasmussen, M., Snyder, M. P. 2015; 33 (7): 736-742


    Alternative splicing shapes mammalian transcriptomes, with many RNA molecules undergoing multiple distant alternative splicing events. Comprehensive transcriptome analysis, including analysis of exon co-association in the same molecule, requires deep, long-read sequencing. Here we introduce an RNA sequencing method, synthetic long-read RNA sequencing (SLR-RNA-seq), in which small pools (≤1,000 molecules/pool, ≤1 molecule/gene for most genes) of full-length cDNAs are amplified, fragmented and short-read-sequenced. We demonstrate that these RNA sequences reconstructed from the short reads from each of the pools are mostly close to full length and contain few insertion and deletion errors. We report many previously undescribed isoforms (human brain: ∼13,800 affected genes, 14.5% of molecules; mouse brain ∼8,600 genes, 18% of molecules) and up to 165 human distant molecularly associated exon pairs (dMAPs) and distant molecularly and mutually exclusive pairs (dMEPs). Of 16 associated pairs detected in the mouse brain, 9 are conserved in human. Our results indicate conserved mechanisms that can produce distant but phased features on transcript and proteome isoforms.

    View details for DOI 10.1038/nbt.3242

    View details for Web of Science ID 000358396100029

  • Comparison of the transcriptional landscapes between human and mouse tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lin, S., Lin, Y., Nery, J. R., Urich, M. A., Breschi, A., Davis, C. A., Dobin, A., Zaleski, C., Beer, M. A., Chapman, W. C., Gingeras, T. R., Ecker, J. R., Snyder, M. P. 2014; 111 (48): 17224-17229


    Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.

    View details for DOI 10.1073/pnas.1413624111

    View details for Web of Science ID 000345920800059

    View details for PubMedID 25413365

    View details for PubMedCentralID PMC4260565

  • Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature Araya, C. L., Kawli, T., Kundaje, A., Jiang, L., Wu, B., Vafeados, D., Terrell, R., Weisdepp, P., Gevirtzman, L., Mace, D., Niu, W., Boyle, A. P., Xie, D., Ma, L., Murray, J. I., Reinke, V., Waterston, R. H., Snyder, M. 2014; 512 (7515): 400-405

    View details for DOI 10.1038/nature13497

    View details for PubMedID 25164749

  • Comparative analysis of regulatory information and circuits across distant species. Nature Boyle, A. P., Araya, C. L., Brdlik, C., Cayting, P., Cheng, C., Cheng, Y., Gardner, K., Hillier, L. W., Janette, J., Jiang, L., Kasper, D., Kawli, T., Kheradpour, P., Kundaje, A., Li, J. J., Ma, L., Niu, W., Rehm, E. J., Rozowsky, J., Slattery, M., Spokony, R., Terrell, R., Vafeados, D., Wang, D., Weisdepp, P., Wu, Y., Xie, D., Yan, K., Feingold, E. A., Good, P. J., Pazin, M. J., Huang, H., Bickel, P. J., Brenner, S. E., Reinke, V., Waterston, R. H., Gerstein, M., White, K. P., Kellis, M., Snyder, M. 2014; 512 (7515): 453-456

    View details for DOI 10.1038/nature13668

    View details for PubMedID 25164757

  • Defining a personal, allele-specific, and single-molecule long-read transcriptome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Tilgner, H., Grubert, F., Sharon, D., Snyder, M. P. 2014; 111 (27): 9869-9874

  • Clinical interpretation and implications of whole-genome sequencing. JAMA : the journal of the American Medical Association Dewey, F. E., Grove, M. E., Pan, C., Goldstein, B. A., Bernstein, J. A., Chaib, H., Merker, J. D., Goldfeder, R. L., Enns, G. M., David, S. P., Pakdaman, N., Ormond, K. E., Caleshu, C., Kingham, K., Klein, T. E., Whirl-Carrillo, M., Sakamoto, K., Wheeler, M. T., Butte, A. J., Ford, J. M., Boxer, L., Ioannidis, J. P., Yeung, A. C., Altman, R. B., Assimes, T. L., Snyder, M., Ashley, E. A., Quertermous, T. 2014; 311 (10): 1035-1045


    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

    View details for DOI 10.1001/jama.2014.1717

    View details for PubMedID 24618965

    View details for PubMedCentralID PMC4119063

  • Divergence in a master variator generates distinct phenotypes and transcriptional responses GENES & DEVELOPMENT Gallagher, J. E., Zheng, W., Rong, X., Miranda, N., Lin, Z., Dunn, B., Zhao, H., Snyder, M. P. 2014; 28 (4): 409-421


    Genetic basis of phenotypic differences in individuals is an important area in biology and personalized medicine. Analysis of divergent Saccharomyces cerevisiae strains grown under different conditions revealed extensive variation in response to both drugs (e.g., 4-nitroquinoline 1-oxide [4NQO]) and different carbon sources. Differences in 4NQO resistance were due to amino acid variation in the transcription factor Yrr1. Yrr1(YJM789) conferred 4NQO resistance but caused slower growth on glycerol, and vice versa with Yrr1(S96), indicating that alleles of Yrr1 confer distinct phenotypes. The binding targets of Yrr1 alleles from diverse yeast strains varied considerably among different strains grown under the same conditions as well as for the same strain under different conditions, indicating that distinct molecular programs are conferred by the different Yrr1 alleles. Our results demonstrate that genetic variations in one important control gene (YRR1), lead to distinct regulatory programs and phenotypes in individuals. We term these polymorphic control genes "master variators."

    View details for DOI 10.1101/gad.228940.113

    View details for Web of Science ID 000331616100009

    View details for PubMedID 24532717

    View details for PubMedCentralID PMC3937518

  • Integrated systems analysis reveals a molecular network underlying autism spectrum disorders. Molecular systems biology Li, J., Shi, M., Ma, Z., Zhao, S., Euskirchen, G., Ziskin, J., Urban, A., Hallmayer, J., Snyder, M. 2014; 10 (12): 774


    Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.

    View details for DOI 10.15252/msb.20145487

    View details for PubMedID 25549968

  • Extensive Variation in Chromatin States Across Humans SCIENCE Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J. B., Kundaje, A., Liu, Y., Boyle, A. P., Zhang, Q. C., Zakharia, F., Spacek, D. V., Li, J., Xie, D., Olarerin-George, A., Steinmetz, L. M., Hogenesch, J. B., Kellis, M., Batzoglou, S., Snyder, M. 2013; 342 (6159): 750-752


    The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.

    View details for DOI 10.1126/science.1242510

    View details for Web of Science ID 000326647600047

    View details for PubMedID 24136358

  • A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2013; 31 (11): 1009-1014


    Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

    View details for DOI 10.1038/nbt.2705

    View details for PubMedID 24108091

  • Dynamic trans-Acting Factor Colocalization in Human Cells CELL Xie, D., Boyle, A. P., Wu, L., Zhai, J., Kawli, T., Snyder, M. 2013; 155 (3): 713-724


    Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.

    View details for DOI 10.1016/j.cell.2013.09.043

    View details for Web of Science ID 000326571800023

    View details for PubMedID 24243024

  • Whole-exome sequencing identifies tetratricopeptide repeat domain 7A (TTC7A) mutations for combined immunodeficiency with intestinal atresias. Journal of allergy and clinical immunology Chen, R., Giliani, S., Lanzi, G., Mias, G. I., Lonardi, S., Dobbs, K., Manis, J., Im, H., Gallagher, J. E., Phanstiel, D. H., Euskirchen, G., Lacroute, P., Bettinger, K., Moratto, D., Weinacht, K., Montin, D., Gallo, E., Mangili, G., Porta, F., Notarangelo, L. D., Pedretti, S., Al-Herz, W., Alfahdli, W., Comeau, A. M., Traister, R. S., Pai, S., Carella, G., Facchetti, F., Nadeau, K. C., Snyder, M., Notarangelo, L. D. 2013; 132 (3): 656-664 e17


    Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects. We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families. We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA. Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA. We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.

    View details for DOI 10.1016/j.jaci.2013.06.013

    View details for Web of Science ID 000323612000018

    View details for PubMedID 23830146

  • Systematic functional regulatory assessment of disease-associated variants. Proceedings of the National Academy of Sciences of the United States of America Karczewski, K. J., Dudley, J. T., Kukurba, K. R., Chen, R., Butte, A. J., Montgomery, S. B., Snyder, M. 2013; 110 (23): 9607-9612


    Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.

    View details for DOI 10.1073/pnas.1219099110

    View details for PubMedID 23690573

  • Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports Mias, G. I., Chen, R., Zhang, Y., Sridhar, K., Sharon, D., Xiao, L., Im, H., Snyder, M. P., Greenberg, P. L. 2013; 3: 3311-?


    Increased autoantibody reactivity in plasma from Myelodysplastic Syndromes (MDS) patients may provide novel disease signatures, and possible early detection. In a two-stage study we investigated Immunoglobulin G reactivity in plasma from MDS, Acute Myeloid Leukemia post MDS patients, and a healthy cohort. In exploratory Stage I we utilized high-throughput protein arrays to identify 35 high-interest proteins showing increased reactivity in patient subgroups compared to healthy controls. In validation Stage II we designed new arrays focusing on 25 of the proteins identified in Stage I and expanded the initial cohort. We validated increased antibody reactivity against AKT3, FCGR3A and ARL8B in patients, which enabled sample classification into stable MDS and healthy individuals. We also detected elevated AKT3 protein levels in MDS patient plasma. The discovery of increased specific autoantibody reactivity in MDS patients provides molecular signatures for classification, supplementing existing risk categorizations, and may enhance diagnostic and prognostic capabilities for MDS.

    View details for DOI 10.1038/srep03311

    View details for PubMedID 24264604

  • Extensive genetic variation in somatic human tissues PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA O'Huallachain, M., Karczewski, K. J., Weissman, S. M., Urban, A. E., Snyder, M. P. 2012; 109 (44): 18018-18023


    Genetic variation between individuals has been extensively investigated, but differences between tissues within individuals are far less understood. It is commonly assumed that all healthy cells that arise from the same zygote possess the same genomic content, with a few known exceptions in the immune system and germ line. However, a growing body of evidence shows that genomic variation exists between differentiated tissues. We investigated the scope of somatic genomic variation between tissues within humans. Analysis of copy number variation by high-resolution array-comparative genomic hybridization in diverse tissues from six unrelated subjects reveals a significant number of intraindividual genomic changes between tissues. Many (79%) of these events affect genes. Our results have important consequences for understanding normal genetic and phenotypic variation within individuals, and they have significant implications for both the etiology of genetic diseases such as cancer and for immortalized cell lines that might be used in research and therapeutics.

    View details for DOI 10.1073/pnas.1213736109

    View details for Web of Science ID 000311149900070

    View details for PubMedID 23043118

  • An integrated encyclopedia of DNA elements in the human genome NATURE Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C., Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt, S. G., Lee, B., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N., Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng, C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B., Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis, M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K., Merkel, A., Mortazavi, A., Parker, S. C., Reddy, T. E., Rozowsky, J., Schlesinger, F., Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S., Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A., Adams, L. B., Kelly, C. J., Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney, E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C., Gingeras, T. R., Green, E. D., Guigo, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent, W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A., Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk, B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings, M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen, T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R. C., Eaton, M. L., Kellis, M., Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Roeder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk, B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond, A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigo, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S., Wong, M. C., Barber, G., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D., Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey, T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B., Battenhouse, A., Sheffield, N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C., Schaner, M. R., Kim, S. K., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z., McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter, D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S., Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers, R. M., Pauli, F., Williams, B. A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J., Partridge, E. C., Trout, D., Varley, K. E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein, H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I., Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G., Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter, C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B., Blow, M. J., Visel, A., Pennachio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C., Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A., Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G., Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte, R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski, F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge, J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A., Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M., Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigo, R., Harrow, J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P., Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao, A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming, J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X., Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J., Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X. J., O'Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky, J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K., Yang, X., Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder, M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia, R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen, A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl, F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C., Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C., Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates, D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg, K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson, A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T., Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E., Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B., Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A., Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss, M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A., Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk, M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow, A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A., Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P. J., Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes, J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker, D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S., Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci, A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K., Yip, K. Y., Birney, E. 2012; 489 (7414): 57-74


    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    View details for DOI 10.1038/nature11247

    View details for Web of Science ID 000308347000039

    View details for PubMedID 22955616

    View details for PubMedCentralID PMC3439153

  • Architecture of the human regulatory network derived from ENCODE data NATURE Gerstein, M. B., Kundaje, A., Hariharan, M., Landt, S. G., Yan, K., Cheng, C., Mu, X. J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A., Addleman, N., Bhardwaj, N., Boyle, A. P., Cayting, P., Charos, A., Chen, D. Z., Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J., Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J., Monahan, H., O'Geen, H., Ouyang, Z., Partridge, E. C., Patacsil, D., Pauli, F., Raha, D., Ramirez, L., Reddy, T. E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L., Yang, X., Yip, K. Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham, P. J., Myers, R. M., Weissman, S. M., Snyder, M. 2012; 489 (7414): 91-100


    Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.

    View details for DOI 10.1038/nature11245

    View details for Web of Science ID 000308347000042

    View details for PubMedID 22955619

  • Linking disease associations with regulatory information in the human genome GENOME RESEARCH Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S., Snyder, M. 2012; 22 (9): 1748-1759


    Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

    View details for DOI 10.1101/gr.136127.111

    View details for Web of Science ID 000308272800016

    View details for PubMedID 22955986

    View details for PubMedCentralID PMC3431491

  • Annotation of functional variation in personal genomes using RegulomeDB GENOME RESEARCH Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A., Kasowski, M., Karczewski, K. J., Park, J., Hitz, B. C., Weng, S., Cherry, J. M., Snyder, M. 2012; 22 (9): 1790-1797


    As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.

    View details for DOI 10.1101/gr.137323.112

    View details for Web of Science ID 000308272800019

    View details for PubMedID 22955989

    View details for PubMedCentralID PMC3431494

  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia GENOME RESEARCH Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., Snyder, M. 2012; 22 (9): 1813-1831


    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE and modENCODE portals.

    View details for DOI 10.1101/gr.136184.111

    View details for Web of Science ID 000308272800021

    View details for PubMedID 22955991

    View details for PubMedCentralID PMC3431496

  • Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes CELL Chen, R., Mias, G. I., Li-Pook-Than, J., Jiang, L., Lam, H. Y., Chen, R., Miriami, E., Karczewski, K. J., Hariharan, M., Dewey, F. E., Cheng, Y., Clark, M. J., Im, H., Habegger, L., Balasubramanian, S., O'Huallachain, M., Dudley, J. T., Hillenmeyer, S., Haraksingh, R., Sharon, D., Euskirchen, G., Lacroute, P., Bettinger, K., Boyle, A. P., Kasowski, M., Grubert, F., Seki, S., Garcia, M., Whirl-Carrillo, M., Gallardo, M., Blasco, M. A., Greenberg, P. L., Snyder, P., Klein, T. E., Altman, R. B., Butte, A. J., Ashley, E. A., Gerstein, M., Nadeau, K. C., Tang, H., Snyder, M. 2012; 148 (6): 1293-1307


    Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.

    View details for DOI 10.1016/j.cell.2012.02.009

    View details for Web of Science ID 000301889500023

    View details for PubMedID 22424236

    View details for PubMedCentralID PMC3341616

  • Detecting and annotating genetic variations using the HugeSeq pipeline NATURE BIOTECHNOLOGY Lam, H. Y., Pan, C., Clark, M. J., Lacroute, P., Chen, R., Haraksingh, R., O'Huallachain, M., Gerstein, M. B., Kidd, J. M., Bustamante, C. D., Snyder, M. 2012; 30 (3): 226-229

    View details for Web of Science ID 000301303800013

    View details for PubMedID 22398614

  • Extensive Promoter-Centered Chromatin Interactions Provide a Topological Basis for Transcription Regulation CELL Li, G., Ruan, X., Auerbach, R. K., Sandhu, K. S., Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, J., Sim, H. S., Peh, S. Q., Mulawadi, F. H., Ong, C. T., Orlov, Y. L., Hong, S., Zhang, Z., Landt, S., Raha, D., Euskirchen, G., Wei, C., Ge, W., Wang, H., Davis, C., Fisher-Aylor, K. I., Mortazavi, A., Gerstein, M., Gingeras, T., Wold, B., Sun, Y., Fullwood, M. J., Cheung, E., Liu, E., Sung, W., Snyder, M., Ruan, Y. 2012; 148 (1-2): 84-98


    Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intragenic, extragenic, and intergenic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated noncoding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.

    View details for DOI 10.1016/j.cell.2011.12.014

    View details for Web of Science ID 000299540700016

    View details for PubMedID 22265404

    View details for PubMedCentralID PMC3339270

  • Performance comparison of whole-genome sequencing platforms NATURE BIOTECHNOLOGY Lam, H. Y., Clark, M. J., Chen, R., Chen, R., Natsoulis, G., O'Huallachain, M., Dewey, F. E., Habegger, L., Ashley, E. A., Gerstein, M. B., Butte, A. J., Ji, H. P., Snyder, M. 2012; 30 (1): 78-U118

    View details for DOI 10.1038/nbt.2065

    View details for Web of Science ID 000299110600023

  • Dissecting phosphorylation networks: lessons learned from yeast EXPERT REVIEW OF PROTEOMICS Mok, J., Zhu, X., Snyder, M. 2011; 8 (6): 775-786


    Protein phosphorylation continues to be regarded as one of the most important post-translational modifications found in eukaryotes and has been implicated in key roles in the development of a number of human diseases. In order to elucidate roles for the 518 human kinases, phosphorylation has routinely been studied using the budding yeast Saccharomyces cerevisiae as a model system. In recent years, a number of technologies have emerged to globally map phosphorylation in yeast. In this article, we review these technologies and discuss how these phosphorylation mapping efforts have shed light on our understanding of kinase signaling pathways and eukaryotic proteomic networks in general.

    View details for DOI 10.1586/EPR.11.64

    View details for Web of Science ID 000297299000013

    View details for PubMedID 22087660

  • Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF NATURE Iyer, V. R., Horak, C. E., Scafe, C. S., Botstein, D., Snyder, M., Brown, P. O. 2001; 409 (6819): 533-538


    Proteins interact with genomic DNA to bring the genome to life; and these interactions also define many functional features of the genome. SBF and MBF are sequence-specific transcription factors that activate gene expression during the G1/S transition of the cell cycle in yeast. SBF is a heterodimer of Swi4 and Swi6, and MBF is a heterodimer of Mbp1 and Swi6 (refs 1, 3). The related Swi4 and Mbp1 proteins are the DNA-binding components of the respective factors, and Swi6 may have a regulatory function. A small number of SBF and MBF target genes have been identified. Here we define the genomic binding sites of the SBF and MBF transcription factors in vivo, by using DNA microarrays. In addition to the previously characterized targets, we have identified about 200 new putative targets. Our results support the hypothesis that SBF activated genes are predominantly involved in budding, and in membrane and cell-wall biosynthesis, whereas DNA replication and repair are the dominant functions among MBF activated genes. The functional specialization of these factors may provide a mechanism for independent regulation of distinct molecular processes that normally occur in synchrony during the mitotic cell cycle.

    View details for Web of Science ID 000166570500053

    View details for PubMedID 11206552

  • Isolated Congenital Anosmia and CNGA2 Mutation. Scientific reports Sailani, M. R., Jingga, I., MirMazlomi, S. H., Bitarafan, F., Bernstein, J. A., Snyder, M. P., Garshasbi, M. 2017; 7 (1): 2667-?


    Isolated congenital anosmia (ICA) is a rare condition that is associated with life-long inability to smell. Here we report a genetic characterization of a large Iranian family segregating ICA. Whole exome sequencing in five affected family members and five healthy members revealed a stop gain mutation in CNGA2 (OMIM 300338) (chrX:150,911,102; CNGA2. c.577C > T; p.Arg193*). The mutation segregates in an X-linked pattern, as all the affected family members are hemizygotes, whereas healthy family members are either heterozygote or homozygote for the reference allele. cnga2 knockout mice are congenitally anosmic and have abnormal olfactory system physiology; additionally, Karstensen et al. recently reported two anosmic brothers sharing a CNGA2 truncating variant. Our study, in concert with these findings, provides strong support for the role of the CNGA2 gene in the pathogenicity of ICA in humans. Together, these results indicate that mutations in key olfactory signaling pathway genes are responsible for human disease.

    View details for DOI 10.1038/s41598-017-02947-y

    View details for PubMedID 28572688

  • Succinate and its G-protein-coupled receptor stimulates osteoclastogenesis. Nature communications Guo, Y., Xie, C., Li, X., Yang, J., Yu, T., Zhang, R., Zhang, T., Saxena, D., Snyder, M., Wu, Y., Li, X. 2017; 8: 15621-?


    The mechanism underlying bone impairment in patients with diabetes mellitus, a metabolic disorder characterized by chronic hyperglycaemia and dysregulation in metabolism, is unclear. Here we show the difference in the metabolomics of bone marrow stromal cells (BMSCs) derived from hyperglycaemic (type 2 diabetes mellitus, T2D) and normoglycaemic mice. One hundred and forty-two metabolites are substantially regulated in BMSCs from T2D mice, with the tricarboxylic acid (TCA) cycle being one of the primary metabolic pathways impaired by hyperglycaemia. Importantly, succinate, an intermediate metabolite in the TCA cycle, is increased by 24-fold in BMSCs from T2D mice. Succinate functions as an extracellular ligand through binding to its specific receptor on osteoclastic lineage cells and stimulates osteoclastogenesis in vitro and in vivo. Strategies targeting the receptor activation inhibit osteoclastogenesis. This study reveals a metabolite-mediated mechanism of osteoclastogenesis modulation that contributes to bone dysregulation in metabolic disorders.

    View details for DOI 10.1038/ncomms15621

    View details for PubMedID 28561074

  • Multi-platform analysis reveals a complex transcriptome architecture of a circovirus. Virus research Moldován, N., Balázs, Z., Tombácz, D., Csabai, Z., Szucs, A., Snyder, M., Boldogkoi, Z. 2017; 237: 37-46


    In this study, we used Pacific Biosciences RS II long-read and Illumina HiScanSQ short-read sequencing technologies for the characterization of porcine circovirus type 1 (PCV-1) transcripts. Our aim was to identify novel RNA molecules and transcript isoforms, as well as to determine the exact 5'- and 3'-end sequences of previously described transcripts with single base-pair accuracy. We discovered a novel 3'-UTR length isoform of the Cap transcript, and a non-spliced Cap transcript variant. Additionally, our analysis has revealed a 3'-UTR isoform of Rep and two 5'-UTR isoforms of Rep' transcripts, and a novel splice variant of the longer Rep' transcript. We also explored two novel long transcripts, one with a previously identified splice site, and a formerly undetected mRNA of ORF3. Altogether, our methods have identified nine novel RNA molecules, doubling the size of PCV-1 transcriptome that had been known before. Additionally, our investigations revealed an intricate pattern of transcript overlapping, which might produce transcriptional interference between the transcriptional machineries of adjacent genes, and thereby may potentially play a role in the regulation of gene expression in circoviruses.

    View details for DOI 10.1016/j.virusres.2017.05.010

    View details for PubMedID 28549855

  • Non-equivalence of Wnt and R-spondin ligands during Lgr5(+) intestinal stem-cell self-renewal. Nature Yan, K. S., Janda, C. Y., Chang, J., Zheng, G. X., Larkin, K. A., Luca, V. C., Chia, L. A., Mah, A. T., Han, A., Terry, J. M., Ootani, A., Roelf, K., Lee, M., Yuan, J., Li, X., Bolen, C. R., Wilhelmy, J., Davies, P. S., Ueno, H., von Furstenberg, R. J., Belgrader, P., Ziraldo, S. B., Ordonez, H., Henning, S. J., Wong, M. H., Snyder, M. P., Weissman, I. L., Hsueh, A. J., Mikkelsen, T. S., Garcia, K. C., Kuo, C. J. 2017; 545 (7653): 238-242


    The canonical Wnt/β-catenin signalling pathway governs diverse developmental, homeostatic and pathological processes. Palmitoylated Wnt ligands engage cell-surface frizzled (FZD) receptors and LRP5 and LRP6 co-receptors, enabling β-catenin nuclear translocation and TCF/LEF-dependent gene transactivation. Mutations in Wnt downstream signalling components have revealed diverse functions thought to be carried out by Wnt ligands themselves. However, redundancy between the 19 mammalian Wnt proteins and 10 FZD receptors and Wnt hydrophobicity have made it difficult to attribute these functions directly to Wnt ligands. For example, individual mutations in Wnt ligands have not revealed homeostatic phenotypes in the intestinal epithelium, an archetypal canonical, Wnt pathway-dependent, rapidly self-renewing tissue, the regeneration of which is fueled by proliferative crypt Lgr5(+) intestinal stem cells (ISCs). R-spondin ligands (RSPO1-RSPO4) engage distinct LGR4-LGR6, RNF43 and ZNRF3 receptor classes, markedly potentiate canonical Wnt/β-catenin signalling, and induce intestinal organoid growth in vitro and Lgr5(+) ISCs in vivo. However, the interchangeability, functional cooperation and relative contributions of Wnt versus RSPO ligands to in vivo canonical Wnt signalling and ISC biology remain unknown. Here we identify the functional roles of Wnt and RSPO ligands in the intestinal crypt stem-cell niche. We show that the default fate of Lgr5(+) ISCs is to differentiate, unless both RSPO and Wnt ligands are present. However, gain-of-function studies using RSPO ligands and a new non-lipidated Wnt analogue reveal that these ligands have qualitatively distinct, non-interchangeable roles in ISCs. Wnt proteins are unable to induce Lgr5(+) ISC self-renewal, but instead confer a basal competency by maintaining RSPO receptor expression that enables RSPO ligands to actively drive and specify the extent of stem-cell expansion. This functionally non-equivalent yet cooperative interaction between Wnt and RSPO ligands establishes a molecular precedent for regulation of mammalian stem cells by distinct priming and self-renewal factors, with broad implications for precise control of tissue regeneration.

    View details for DOI 10.1038/nature22313

    View details for PubMedID 28467820

  • Histone variant H2A.J accumulates in senescent cells and promotes inflammatory gene expression NATURE COMMUNICATIONS Contrepois, K., Coudereau, C., Benayoun, B. A., Schuler, N., Roux, P., Bischof, O., Courbeyrette, R., Carvalho, C., Thuret, J., Ma, Z., Derbois, C., Nevers, M., Volland, H., Redon, C. E., Bonner, W. M., Deleuze, J., Wiel, C., Bernard, D., Snyder, M. P., Ruebe, C. E., Olaso, R., Fenaille, F., Mann, C. 2017; 8


    The senescence of mammalian cells is characterized by a proliferative arrest in response to stress and the expression of an inflammatory phenotype. Here we show that histone H2A.J, a poorly studied H2A variant found only in mammals, accumulates in human fibroblasts in senescence with persistent DNA damage. H2A.J also accumulates in mice with aging in a tissue-specific manner and in human skin. Knock-down of H2A.J inhibits the expression of inflammatory genes that contribute to the senescence-associated secretory phenotype (SASP), and overexpression of H2A.J increases the expression of some of these genes in proliferating cells. H2A.J accumulation may thus promote the signalling of senescent cells to the immune system, and it may contribute to chronic inflammation and the development of aging-associated diseases.

    View details for DOI 10.1038/ncomms14995

    View details for Web of Science ID 000400886800001

    View details for PubMedID 28489069

  • Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens NATURE COMMUNICATIONS Morgens, D. W., Wainberg, M., Boyle, E. A., Ursu, O., Araya, C. L., Tsui, C. K., Haney, M. S., Hess, G. T., Han, K., Jeng, E. E., Li, A., Snyder, M. P., Greenleaf, W. J., Kundaje, A., Bassik, M. C. 2017; 8


    CRISPR-Cas9 screens are powerful tools for high-throughput interrogation of genome function, but can be confounded by nuclease-induced toxicity at both on- and off-target sites, likely due to DNA damage. Here, to test potential solutions to this issue, we design and analyse a CRISPR-Cas9 library with 10 variable-length guides per gene and thousands of negative controls targeting non-functional, non-genic regions (termed safe-targeting guides), in addition to non-targeting controls. We find this library has excellent performance in identifying genes affecting growth and sensitivity to the ricin toxin. The safe-targeting guides allow for proper control of toxicity from on-target DNA damage. Using this toxicity as a proxy to measure off-target cutting, we demonstrate with tens of thousands of guides both the nucleotide position-dependent sensitivity to single mismatches and the reduction of off-target cutting using truncated guides. Our results demonstrate a simple strategy for high-throughput evaluation of target specificity and nuclease toxicity in Cas9 screens.

    View details for DOI 10.1038/ncomms15178

    View details for Web of Science ID 000400851300001

    View details for PubMedID 28474669

  • A Case Report of Hypoglycemia and Hypogammaglobulinemia: DAVID syndrome in a patient with a novel NFKB2 mutation. Journal of clinical endocrinology and metabolism Lal, R. A., Bachrach, L. K., Hoffman, A. R., Inlora, J., Rego, S., Snyder, M. P., Lewis, D. B. 2017


    DAVID syndrome (Deficient Anterior pituitary with Variable Immune Deficiency) is a rare disorder in which children present with symptomatic ACTH deficiency preceded by hypogammaglobulinemia from B-cell dysfunction with recurrent infections, termed common variable immunodeficiency (CVID). Subsequent whole exome sequencing studies have revealed germline heterozygous C-terminal mutations of NFKB2 as either a cause of DAVID syndrome or of CVID without clinical hypopituitarism. However, to the best of our knowledge there have been no cases in which the endocrinopathy has presented in the absence of a prior clinical history of CVID. A previously healthy 7 year-old boy with no history of clinical immunodeficiency, presented with profound hypoglycemia and seizures. He was found to have secondary adrenal insufficiency and was started on glucocorticoid replacement. An evaluation for autoimmune disease, including for anti-pituitary antibodies, was negative. Evaluation unexpectedly revealed hypogammaglobulinemia (decreased IgG, IgM, and IgA). He had moderately reduced serotype-specific IgG responses following pneumococcal polysaccharide vaccine. Subsequently, he was found to have growth hormone (GH) deficiency. Six years after initial presentation, whole exome sequencing revealed a novel de novo heterozygous NFKB2 missense mutation c.2596A>C (p.Ser866Arg) in the C-terminal region predicted to abrogate the processing of the p100 NFKB2 protein to its active p52 form. Isolated early-onset ACTH deficiency is rare and C-terminal region NFKB2 mutations should be considered as an etiology even in the absence of a clinical history of CVID. Early immunologic evaluation is indicated in the diagnosis and management of isolated ACTH deficiency.

    View details for DOI 10.1210/jc.2017-00341

    View details for PubMedID 28472507

  • Patient-Specific iPSC-Derived Endothelial Cells Uncover Pathways that Protect against Pulmonary Hypertension in BMPR2 Mutation Carriers CELL STEM CELL Gu, M., Shao, N., Sa, S., Li, D., Termglinchan, V., Ameen, M., Karakikes, I., Sosa, G., Grubert, F., Lee, J., Cao, A., Taylor, S., Ma, Y., Zhao, Z., Chappell, J., Hamid, R., Austin, E. D., Gold, J. D., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2017; 20 (4): 490-?
  • Gpr124 is essential for blood-brain barrier integrity in central nervous system disease NATURE MEDICINE Chang, J., Mancuso, M. R., Maier, C., Liang, X., Yuki, K., Yang, L., Kwong, J. W., Wang, J., Rao, V., Vallon, M., Kosinski, C., Zhang, J. J., Mah, A. T., Xu, L., Li, L., Gholamin, S., Reyes, T. F., Li, R., Kuhnert, F., Han, X., Yuan, J., Chiou, S., Brettman, A. D., Daly, L., Corney, D. C., Cheshier, S. H., Shortliffe, L. D., Wu, X., Snyder, M., Chan, P., Giffard, R. G., Chang, H. Y., Andreasson, K., Kuo, C. J. 2017; 23 (4): 450-?


    Although blood-brain barrier (BBB) compromise is central to the etiology of diverse central nervous system (CNS) disorders, endothelial receptor proteins that control BBB function are poorly defined. The endothelial G-protein-coupled receptor (GPCR) Gpr124 has been reported to be required for normal forebrain angiogenesis and BBB function in mouse embryos, but the role of this receptor in adult animals is unknown. Here Gpr124 conditional knockout (CKO) in the endothelia of adult mice did not affect homeostatic BBB integrity, but resulted in BBB disruption and microvascular hemorrhage in mouse models of both ischemic stroke and glioblastoma, accompanied by reduced cerebrovascular canonical Wnt-β-catenin signaling. Constitutive activation of Wnt-β-catenin signaling fully corrected the BBB disruption and hemorrhage defects of Gpr124-CKO mice, with rescue of the endothelial gene tight junction, pericyte coverage and extracellular-matrix deficits. We thus identify Gpr124 as an endothelial GPCR specifically required for endothelial Wnt signaling and BBB integrity under pathological conditions in adult mice. This finding implicates Gpr124 as a potential therapeutic target for human CNS disorders characterized by BBB disruption.

    View details for DOI 10.1038/nm.4309

    View details for Web of Science ID 000398768100013

    View details for PubMedID 28288111

  • Induced Pluripotent Stem Cell Model of Pulmonary Arterial Hypertension Reveals Novel Gene Expression and Patient Specificity AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE Sa, S., Gu, M., Chappell, J., Shao, N., Ameen, M., Elliott, K. A., Li, D., Grubert, F., Li, C. G., Taylor, S., Cao, A., Ma, Y., Fong, R., Nguyen, L., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2017; 195 (7): 930-941
  • Characterization of the Dynamic Transcriptome of a Herpesvirus with Long-read Single Molecule Real-Time Sequencing. Scientific reports Tombácz, D., Balázs, Z., Csabai, Z., Moldován, N., Szucs, A., Sharon, D., Snyder, M., Boldogkoi, Z. 2017; 7: 43751-?


    Herpesvirus gene expression is co-ordinately regulated and sequentially ordered during productive infection. The viral genes can be classified into three distinct kinetic groups: immediate-early, early, and late classes. In this study, a massively parallel sequencing technique based on the PacBio Single Molecule Real-Time (SMRT) sequencing platform was used to quantify the poly(A) fraction of the lytic transcriptome of pseudorabies virus (PRV) throughout a 12-hour interval of productive infection on PK-15 cells. Other approaches, including microarray, real-time RT-PCR and Illumina sequencing, are capable of detecting only the aggregate transcriptional activity of particular genomic regions, but not individual herpesvirus transcripts. However, SMRT sequencing allows for a distinction between transcript isoforms, including length- and splice variants, as well as between overlapping polycistronic RNA molecules. The non-amplified Isoform Sequencing (Iso-Seq) method was used to analyse the kinetic properties of the lytic PRV transcripts and to then classify them accordingly. Additionally, the present study demonstrates the general utility of long-read sequencing for the time-course analysis of global gene expression in practically any organism.

    View details for DOI 10.1038/srep43751

    View details for PubMedID 28256586

    View details for PubMedCentralID PMC5335617

  • A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N-1-methyladenosine modification RNA Cenik, C., Chua, H. N., Singh, G., Akef, A., Snyder, M. P., Palazzo, A. F., Moore, M. J., Roth, F. P. 2017; 23 (3): 270-283


    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N(1)-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N(1)-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.

    View details for DOI 10.1261/rna.059105.116

    View details for PubMedID 27994090

    View details for PubMedCentralID PMC5311483

  • Association of AHSG with alopecia and mental retardation (APMR) syndrome. Human genetics Reza Sailani, M., Jahanbani, F., Nasiri, J., Behnam, M., Salehi, M., Sedghi, M., Hoseinzadeh, M., Takahashi, S., Zia, A., Gruber, J., Lynch, J. L., Lam, D., Winkelmann, J., Amirkiai, S., Pang, B., Rego, S., Mazroui, S., Bernstein, J. A., Snyder, M. P. 2017; 136 (3): 287-296


    Alopecia with mental retardation syndrome (APMR) is a very rare autosomal recessive condition that is associated with total or partial absence of hair from the scalp and other parts of the body as well as variable intellectual disability. Here we present whole-exome sequencing results of a large consanguineous family segregating APMR syndrome with seven affected family members. Our study revealed a novel predicted pathogenic, homozygous missense mutation in the AHSG (OMIM 138680) gene (AHSG: NM_001622:exon7:c.950G>A:p.Arg317His). The variant is predicted to affect a region of the protein required for protein processing and disrupts a phosphorylation motif. In addition, the altered protein migrates with an aberrant size relative to healthy individuals. Consistent with the phenotype, AHSG maps within APMR linkage region 1 (APMR 1) as reported before, and falls within runs of homozygosity (ROH). Previous families with APMR syndrome have been studied through linkage analyses, but the linkage resolution did not allow a single candidate gene to be pinpointed. Our study is the first report to identify a homozygous missense mutation for APMR syndrome through whole-exome sequencing.

    View details for DOI 10.1007/s00439-016-1756-5

    View details for PubMedID 28054173

  • Single cell transcriptomics reveals unanticipated features of early hematopoietic precursors NUCLEIC ACIDS RESEARCH Yang, J., Tanaka, Y., Seay, M., Li, Z., Jin, J., Garmire, L. X., Zhu, X., Taylor, A., Li, W., Euskirchen, G., Halene, S., Kluger, Y., Snyder, M. P., Park, I., Pan, X., Weissman, S. M. 2017; 45 (3): 1281-1296
  • Pharmacological rescue of diabetic skeletal stem cell niches. Science translational medicine Tevlin, R., Seo, E. Y., Marecic, O., McArdle, A., Tong, X., Zimdahl, B., Malkovskiy, A., Sinha, R., Gulati, G., Li, X., Wearda, T., Morganti, R., Lopez, M., Ransom, R. C., Duldulao, C. R., Rodrigues, M., Nguyen, A., Januszyk, M., Maan, Z., Paik, K., Yapa, K., Rajadas, J., Wan, D. C., Gurtner, G. C., Snyder, M., Beachy, P. A., Yang, F., Goodman, S. B., Weissman, I. L., Chan, C. K., Longaker, M. T. 2017; 9 (372)


    Diabetes mellitus (DM) is a metabolic disease frequently associated with impaired bone healing. Despite its increasing prevalence worldwide, the molecular etiology of DM-linked skeletal complications remains poorly defined. Using advanced stem cell characterization techniques, we analyzed intrinsic and extrinsic determinants of mouse skeletal stem cell (mSSC) function to identify specific mSSC niche-related abnormalities that could impair skeletal repair in diabetic (Db) mice. We discovered that high serum concentrations of tumor necrosis factor-α directly repressed the expression of Indian hedgehog (Ihh) in mSSCs and in their downstream skeletogenic progenitors in Db mice. When hedgehog signaling was inhibited during fracture repair, injury-induced mSSC expansion was suppressed, resulting in impaired healing. We reversed this deficiency by precise delivery of purified Ihh to the fracture site via a specially formulated, slow-release hydrogel. In the presence of exogenous Ihh, the injury-induced expansion and osteogenic potential of mSSCs were restored, culminating in the rescue of Db bone healing. Our results present a feasible strategy for precise treatment of molecular aberrations in stem and progenitor cell populations to correct skeletal manifestations of systemic disease.

    View details for DOI 10.1126/scitranslmed.aag2809

    View details for PubMedID 28077677

  • ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis. Nucleic acids research Li, G., Chen, Y., Snyder, M. P., Zhang, M. Q. 2017; 45 (1)


    ChIA-PET2 is a versatile and flexible pipeline for analyzing different types of ChIA-PET data from raw sequencing reads to chromatin loops. ChIA-PET2 integrates all steps required for ChIA-PET data analysis, including linker trimming, read alignment, duplicate removal, peak calling and chromatin loop calling. It supports different kinds of ChIA-PET data generated from different ChIA-PET protocols and also provides quality controls for different steps of ChIA-PET analysis. In addition, ChIA-PET2 can use phased genotype data to call allele-specific chromatin interactions. We applied ChIA-PET2 to different ChIA-PET datasets, demonstrating its significantly improved performance as well as its ability to easily process ChIA-PET raw data. ChIA-PET2 is available at

    View details for DOI 10.1093/nar/gkw809

    View details for PubMedID 27625391

    View details for PubMedCentralID PMC5224499

  • Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis CELL Ang, Y., Rivas, R. N., Ribeiro, A. J., Srivas, R., Rivera, J., Stone, N. R., Pratt, K., Mohamed, T. M., Fu, J., Spencer, C. I., Tippens, N. D., Li, M., Narasimha, A., Radzinsky, E., Moon-Grady, A. J., Yu, H., Pruitt, B. L., Snyder, M. P., Srivastava, D. 2016; 167 (7): 1734-?


    Mutation of highly conserved residues in transcription factors may affect protein-protein or protein-DNA interactions, leading to gene network dysregulation and human disease. Human mutations in GATA4, a cardiogenic transcription factor, cause cardiac septal defects and cardiomyopathy. Here, iPS-derived cardiomyocytes from subjects with a heterozygous GATA4-G296S missense mutation showed impaired contractility, calcium handling, and metabolic activity. In human cardiomyocytes, GATA4 broadly co-occupied cardiac enhancers with TBX5, another transcription factor that causes septal defects when mutated. The GATA4-G296S mutation disrupted TBX5 recruitment, particularly to cardiac super-enhancers, concomitant with dysregulation of genes related to the phenotypic abnormalities, including cardiac septation. Conversely, the GATA4-G296S mutation led to failure of GATA4 and TBX5-mediated repression at non-cardiac genes and enhanced open chromatin states at endothelial/endocardial promoters. These results reveal how disease-causing missense mutations can disrupt transcriptional cooperativity, leading to aberrant chromatin states and cellular dysfunction, including those related to morphogenetic defects.

    View details for DOI 10.1016/j.cell.2016.11.033

    View details for Web of Science ID 000393114700013

    View details for PubMedID 27984724

    View details for PubMedCentralID PMC5180611

  • Can heavy isotopes increase lifespan? Studies of relative abundance in various organisms reveal chemical perspectives on aging. BioEssays Li, X., Snyder, M. P. 2016; 38 (11): 1093-1101


    Stable heavy isotopes co-exist with their lighter counterparts in all elements commonly found in biology. These heavy isotopes represent a low natural abundance in isotopic composition but impose great retardation effects in chemical reactions because of kinetic isotopic effects (KIEs). Previous isotope analyses have recorded pervasive enrichment or depletion of heavy isotopes in various organisms, strongly supporting the capability of biological systems to distinguish different isotopes. This capability has recently been found to lead to general decline of heavy isotopes in metabolites during yeast aging. Conversely, supplementing heavy isotopes in growth medium promotes longevity. Whether this observation prevails in other organisms is not known, but it potentially bears promise in promoting human longevity.

    View details for DOI 10.1002/bies.201600040

    View details for PubMedID 27554342

  • Nat1 Deficiency Is Associated with Mitochondrial Dysfunction and Exercise Intolerance in Mice CELL REPORTS Chennamsetty, I., Coronado, M., Contrepois, K., Keller, M. P., Carcamo-Orive, I., Sandin, J., Fajardo, G., Whittle, A. J., Fathzadeh, M., Snyder, M., Reaven, G., Attie, A. D., Bernstein, D., Quertermous, T., Knowles, J. W. 2016; 17 (2): 527-540


    We recently identified human N-acetyltransferase 2 (NAT2) as an insulin resistance (IR) gene. Here, we examine the cellular mechanism linking NAT2 to IR and find that Nat1 (mouse ortholog of NAT2) is co-regulated with key mitochondrial genes. RNAi-mediated silencing of Nat1 led to mitochondrial dysfunction characterized by increased intracellular reactive oxygen species and mitochondrial fragmentation as well as decreased mitochondrial membrane potential, biogenesis, mass, cellular respiration, and ATP generation. These effects were consistent in 3T3-L1 adipocytes, C2C12 myoblasts, and in tissues from Nat1-deficient mice, including white adipose tissue, heart, and skeletal muscle. Nat1-deficient mice had changes in plasma metabolites and lipids consistent with a decreased ability to utilize fats for energy and a decrease in basal metabolic rate and exercise capacity without altered thermogenesis. Collectively, our results suggest that Nat1 deficiency results in mitochondrial dysfunction, which may constitute a mechanistic link between this gene and IR.

    View details for DOI 10.1016/j.celrep.2016.09.005

    View details for Web of Science ID 000385850700019

    View details for PubMedID 27705799

    View details for PubMedCentralID PMC5097870

  • Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature genetics Corces, M. R., Buenrostro, J. D., Wu, B., Greenside, P. G., Chan, S. M., Koenig, J. L., Snyder, M. P., Pritchard, J. K., Kundaje, A., Greenleaf, W. J., Majeti, R., Chang, H. Y. 2016; 48 (10): 1193-1203


    We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.

    View details for DOI 10.1038/ng.3646

    View details for PubMedID 27526324

    View details for PubMedCentralID PMC5042844

  • A proposal for validation of antibodies NATURE METHODS Uhlen, M., Bandrowski, A., Carr, S., Edwards, A., Ellenberg, J., Lundberg, E., Rimm, D. L., Rodriguez, H., Hiltke, T., Snyder, M., Yamamoto, T. 2016; 13 (10): 823-?

    View details for DOI 10.1038/NMETH.3995

    View details for Web of Science ID 000385194600015

    View details for PubMedID 27595404

  • Multiple Pairwise Analysis of Non-homologous Centromere Coupling Reveals Preferential Chromosome Size-Dependent Interactions and a Role for Bouquet Formation in Establishing the Interaction Pattern PLOS GENETICS Lefrancois, P., Rockmill, B., Xie, P., Roeder, G. S., Snyder, M. 2016; 12 (10)


    During meiosis, chromosomes undergo a homology search in order to locate their homolog to form stable pairs and exchange genetic material. Early in prophase, chromosomes associate in mostly non-homologous pairs, tethered only at their centromeres. This phenomenon, conserved through higher eukaryotes, is termed centromere coupling in budding yeast. Both initiation of recombination and the presence of homologs are dispensable for centromere coupling (occurring in spo11 mutants and haploids induced to undergo meiosis) but the presence of the synaptonemal complex (SC) protein Zip1 is required. The nature and mechanism of coupling have yet to be elucidated. Here we present the first pairwise analysis of centromere coupling in an effort to uncover underlying rules that may exist within these non-homologous interactions. We designed a novel chromosome conformation capture (3C)-based assay to detect all possible interactions between non-homologous yeast centromeres during early meiosis. Using this variant of 3C-qPCR, we found a size-dependent interaction pattern, in which chromosomes assort preferentially with chromosomes of similar sizes, in haploid and diploid spo11 cells, but not in a coupling-defective mutant (spo11 zip1 haploid and diploid yeast). This pattern is also observed in wild-type diploids early in meiosis but disappears as meiosis progresses and homologous chromosomes pair. We found no evidence to support the notion that ancestral centromere homology plays a role in pattern establishment in S. cerevisiae post-genome duplication. Moreover, we found a role for the meiotic bouquet in establishing the size dependence of centromere coupling, as abolishing bouquet (using the bouquet-defective spo11 ndj1 mutant) reduces it. Coupling in spo11 ndj1 rather follows telomere clustering preferences. We propose that a chromosome size preference for centromere coupling helps establish efficient homolog recognition.

    View details for DOI 10.1371/journal.pgen.1006347

    View details for Web of Science ID 000386683300016

    View details for PubMedID 27768699

  • iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy. Nature cell biology Kodo, K., Ong, S., Jahanbani, F., Termglinchan, V., Hirono, K., Inanloorahatloo, K., Ebert, A. D., Shukla, P., Abilez, O. J., Churko, J. M., Karakikes, I., Jung, G., Ichida, F., Wu, S. M., Snyder, M. P., Bernstein, D., Wu, J. C. 2016; 18 (10): 1031-1042


    Left ventricular non-compaction (LVNC) is the third most prevalent cardiomyopathy in children and its pathogenesis has been associated with the developmental defect of the embryonic myocardium. We show that patient-specific induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) generated from LVNC patients carrying a mutation in the cardiac transcription factor TBX20 recapitulate a key aspect of the pathological phenotype at the single-cell level and this was associated with perturbed transforming growth factor beta (TGF-β) signalling. LVNC iPSC-CMs have decreased proliferative capacity due to abnormal activation of TGF-β signalling. TBX20 regulates the expression of TGF-β signalling modifiers including one known to be a genetic cause of LVNC, PRDM16, and genome editing of PRDM16 caused proliferation defects in iPSC-CMs. Inhibition of TGF-β signalling and genome correction of the TBX20 mutation were sufficient to reverse the disease phenotype. Our study demonstrates that iPSC-CMs are a useful tool for the exploration of pathological mechanisms underlying poorly understood cardiomyopathies including LVNC.

    View details for DOI 10.1038/ncb3411

    View details for PubMedID 27642787

  • Full-Length Isoform Sequencing Reveals Novel Transcripts and Substantial Transcriptional Overlaps in a Herpesvirus PLOS ONE Tombacz, D., Csabai, Z., Olah, P., Balazs, Z., Liko, I., Zsigmond, L., Sharon, D., Snyder, M., Boldogkoi, Z. 2016; 11 (9)


    Whole transcriptome studies have become essential for understanding the complexity of genetic regulation. However, the conventionally applied short-read sequencing platforms cannot be used to reliably distinguish between many transcript isoforms. The Pacific Biosciences (PacBio) RS II platform is capable of reading long nucleic acid stretches in a single sequencing run. The pseudorabies virus (PRV) is an excellent system to study herpesvirus gene expression and potential interactions between the transcriptional units. In this work, non-amplified and amplified isoform sequencing protocols were used to characterize the poly(A+) fraction of the lytic transcriptome of PRV, with the aim of a complete transcriptional annotation of the viral genes. The analyses revealed a previously unrecognized complexity of the PRV transcriptome including the discovery of novel protein-coding and non-coding genes, novel mono- and polycistronic transcription units, as well as extensive transcriptional overlaps between neighboring and distal genes. This study identified non-coding transcripts overlapping all three replication origins of the PRV, which might play a role in the control of DNA synthesis. We additionally established the relative expression levels of gene products. Our investigations revealed that the whole PRV genome is utilized for transcription, including both DNA strands in all coding and intergenic regions. The genome-wide occurrence of transcript overlaps suggests a crosstalk between genes through a network formed by interacting transcriptional machineries with a potential function in the control of gene expression.

    View details for DOI 10.1371/journal.pone.0162868

    View details for Web of Science ID 000384328500015

    View details for PubMedID 27685795

    View details for PubMedCentralID PMC5042381

  • Transcriptome Profiling of Patient-Specific Human iPSC-Cardiomyocytes Predicts Individual Drug Safety and Efficacy Responses In Vitro. Cell stem cell Matsa, E., Burridge, P. W., Yu, K., Ahrens, J. H., Termglinchan, V., Wu, H., Liu, C., Shukla, P., Sayed, N., Churko, J. M., Shao, N., Woo, N. A., Chao, A. S., Gold, J. D., Karakikes, I., Snyder, M. P., Wu, J. C. 2016; 19 (3): 311-325


    Understanding individual susceptibility to drug-induced cardiotoxicity is key to improving patient safety and preventing drug attrition. Human induced pluripotent stem cells (hiPSCs) enable the study of pharmacological and toxicological responses in patient-specific cardiomyocytes (CMs) and may serve as preclinical platforms for precision medicine. Transcriptome profiling in hiPSC-CMs from seven individuals lacking known cardiovascular disease-associated mutations and in three isogenic human heart tissue and hiPSC-CM pairs showed greater inter-patient variation than intra-patient variation, verifying that reprogramming and differentiation preserve patient-specific gene expression, particularly in metabolic and stress-response genes. Transcriptome-based toxicology analysis predicted and risk-stratified patient-specific susceptibility to cardiotoxicity, and functional assays in hiPSC-CMs using tacrolimus and rosiglitazone, drugs targeting pathways predicted to produce cardiotoxicity, validated inter-patient differential responses. CRISPR/Cas9-mediated pathway correction prevented drug-induced cardiotoxicity. Our data suggest that hiPSC-CMs can be used in vitro to predict and validate patient-specific drug safety and efficacy, potentially enabling future clinical approaches to precision medicine.

    View details for DOI 10.1016/j.stem.2016.07.006

    View details for PubMedID 27545504

  • Predicting Ovarian Cancer Patients' Clinical Response to Platinum-Based Chemotherapy by Their Tumor Proteomic Signatures JOURNAL OF PROTEOME RESEARCH Yu, K., Levine, D. A., Zhang, H., Chan, D. W., Zhang, Z., Snyder, M. 2016; 15 (8): 2455-2465


    Ovarian cancer is the deadliest gynecologic malignancy in the United States with most patients diagnosed in the advanced stage of the disease. Platinum-based antineoplastic therapy is indispensable to treating advanced ovarian serous carcinoma. However, patients have heterogeneous responses to platinum drugs, and it is difficult to predict these interindividual differences before administering medication. In this study, we investigated the tumor proteomic profiles and clinical characteristics of 130 ovarian serous carcinoma patients analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), predicted the platinum drug response using supervised machine learning methods, and evaluated our prediction models through leave-one-out cross-validation. Our data-driven feature selection approach indicated that tumor proteomics profiles contain information for predicting binarized platinum response (P < 0.0001). We further built a least absolute shrinkage and selection operator (LASSO)-Cox proportional hazards model that stratified patients into early relapse and late relapse groups (P = 0.00013). The top proteomic features indicative of platinum response were involved in ATP synthesis pathways and Ran GTPase binding. Overall, we demonstrated that proteomic profiles of ovarian serous carcinoma patients predicted platinum drug responses as well as provided insights into the biological processes influencing the efficacy of platinum-based therapeutics. Our analytical approach is also extensible to predicting response to other antineoplastic agents or treatment modalities for both ovarian and other cancers.

    View details for DOI 10.1021/acs.jproteome.5b01129

    View details for Web of Science ID 000381235900010

    View details for PubMedID 27312948

  • EPHB4 kinase-inactivating mutations cause autosomal dominant lymphatic-related hydrops fetalis. journal of clinical investigation Martin-Almedina, S., Martinez-Corral, I., Holdhus, R., Vicente, A., Fotiou, E., Lin, S., Petersen, K., Simpson, M. A., Hoischen, A., Gilissen, C., Jeffery, H., Atton, G., Karapouliou, C., Brice, G., Gordon, K., Wiseman, J. W., Wedin, M., Rockson, S. G., Jeffery, S., Mortimer, P. S., Snyder, M. P., Berland, S., Mansour, S., Makinen, T., Ostergaard, P. 2016; 126 (8): 3080-3088


    Hydrops fetalis describes fluid accumulation in at least 2 fetal compartments, including abdominal cavities, pleura, and pericardium, or in body tissue. The majority of hydrops fetalis cases are nonimmune conditions that present with generalized edema of the fetus, and approximately 15% of these nonimmune cases result from a lymphatic abnormality. Here, we have identified an autosomal dominant, inherited form of lymphatic-related (nonimmune) hydrops fetalis (LRHF). Independent exome sequencing projects on 2 families with a history of in utero and neonatal deaths associated with nonimmune hydrops fetalis uncovered 2 heterozygous missense variants in the gene encoding Eph receptor B4 (EPHB4). Biochemical analysis determined that the mutant EPHB4 proteins are devoid of tyrosine kinase activity, indicating that loss of EPHB4 signaling contributes to LRHF pathogenesis. Further, inactivation of Ephb4 in lymphatic endothelial cells of developing mouse embryos led to defective lymphovenous valve formation and consequent subcutaneous edema. Together, these findings identify EPHB4 as a critical regulator of early lymphatic vascular development and demonstrate that mutations in the gene can cause an autosomal dominant form of LRHF that is associated with a high mortality rate.

    View details for DOI 10.1172/JCI85794

    View details for PubMedID 27400125

  • Omics Profiling in Precision Oncology. Molecular & cellular proteomics Yu, K., Snyder, M. 2016; 15 (8): 2525-2536


    Cancer causes significant morbidity and mortality worldwide, and is the area most targeted in precision medicine. Recent development of high-throughput methods enables detailed omics analysis of the molecular mechanisms underpinning tumor biology. These studies have identified clinically actionable mutations, gene and protein expression patterns associated with prognosis, and provided further insights into the molecular mechanisms indicative of cancer biology and new therapeutics strategies such as immunotherapy. In this review, we summarize the techniques used for tumor omics analysis, recapitulate the key findings in cancer omics studies, and point to areas requiring further research on precision oncology.

    View details for DOI 10.1074/mcp.O116.059253

    View details for PubMedID 27099341

  • Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer. Cell Zhang, H., Liu, T., Zhang, Z., Payne, S. H., Zhang, B., McDermott, J. E., Zhou, J., Petyuk, V. A., Chen, L., Ray, D., Sun, S., Yang, F., Chen, L., Wang, J., Shah, P., Cha, S. W., Aiyetan, P., Woo, S., Tian, Y., Gritsenko, M. A., Clauss, T. R., Choi, C., Monroe, M. E., Thomas, S., Nie, S., Wu, C., Moore, R. J., Yu, K., Tabb, D. L., Fenyö, D., Bafna, V., Wang, Y., Rodriguez, H., Boja, E. S., Hiltke, T., Rivers, R. C., Sokoll, L., Zhu, H., Shih, I., Cope, L., Pandey, A., Zhang, B., Snyder, M. P., Levine, D. A., Smith, R. D., Chan, D. W., Rodland, K. D. 2016; 166 (3): 755-765


    To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

    View details for DOI 10.1016/j.cell.2016.05.069

    View details for PubMedID 27372738

  • Integrated Network Analysis Reveals an Association between Plasma Mannose Levels and Insulin Resistance CELL METABOLISM Lee, S., Zhang, C., Kilicarslan, M., Piening, B. D., Bjornson, E., Hallstrom, B. M., Groen, A. K., Ferrannini, E., Laakso, M., Snyder, M., Bluher, M., Uhlen, M., Nielsen, J., Smith, U., Serlie, M. J., Boren, J., Mardinoglu, A. 2016; 24 (1): 172-184


    To investigate the biological processes that are altered in obese subjects, we generated cell-specific integrated networks (INs) by merging genome-scale metabolic, transcriptional regulatory and protein-protein interaction networks. We performed genome-wide transcriptomics analysis to determine the global gene expression changes in the liver and three adipose tissues from obese subjects undergoing bariatric surgery and integrated these data into the cell-specific INs. We found dysregulations in mannose metabolism in obese subjects and validated our predictions by detecting mannose levels in the plasma of the lean and obese subjects. We observed significant correlations between plasma mannose levels, BMI, and insulin resistance (IR). We also measured plasma mannose levels of the subjects in two additional different cohorts and observed that an increased plasma mannose level was associated with IR and insulin secretion. We finally identified mannose as one of the best plasma metabolites in explaining the variance in obesity-independent IR.

    View details for DOI 10.1016/j.cmet.2016.05.026

    View details for Web of Science ID 000380793400022

    View details for PubMedID 27345421

  • Using Mass Spectrometry to Quantify Rituximab and Perform Individualized Immunoglobulin Phenotyping in ANCA-Associated Vasculitis ANALYTICAL CHEMISTRY Mills, J. R., Cornec, D., Dasari, S., Ladwig, P. M., Hummel, A. M., Cheu, M., Murray, D. L., Willrich, M. A., Snyder, M. R., Hoffman, G. S., Kallenberg, C. G., Langford, C. A., Merkel, P. A., Monach, P. A., Seo, P., Spiera, R. F., St Clair, E. W., Stone, J. H., Specks, U., Barnidge, D. R. 2016; 88 (12): 6317-6325


    Therapeutic monoclonal immunoglobulins (mAbs) are used to treat patients with a wide range of disorders including autoimmune diseases. As pharmaceutical companies bring more fully humanized therapeutic mAb drugs to the healthcare market, analytical platforms that perform therapeutic drug monitoring (TDM) without relying on mAb-specific reagents will be needed. In this study, we demonstrate that liquid-chromatography-mass spectrometry (LC-MS) can be used to perform TDM of mAbs in the same manner as smaller nonbiologic drugs. The assay uses commercially available reagents combined with heavy and light chain disulfide bond reduction followed by light chain analysis by microflow-LC-electrospray ionization-quadrupole-time-of-flight mass spectrometry (ESI-Q-TOF MS). Quantification is performed using the peak areas from multiply charged mAb light chain ions using an in-house software package developed for TDM of mAbs. The data presented here demonstrate the ability of an LC-MS assay to quantify a therapeutic mAb in a large cohort of patients in a clinical trial. The ability to quantify any mAb in serum via the reduced light chain without the need for reagents specific for each mAb demonstrates the unique capabilities of LC-MS. This fact, coupled with the ability to phenotype a patient's polyclonal repertoire in the same analysis, further shows the potential of this approach to mAb analysis.

    View details for DOI 10.1021/acs.analchem.6b00544

    View details for Web of Science ID 000378470200034

    View details for PubMedID 27228216

  • Genome assembly from synthetic long read clouds. Bioinformatics Kuleshov, V., Snyder, M. P., Batzoglou, S. 2016; 32 (12): i216-i224


    Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. Our source code is freely available at

    View details for DOI 10.1093/bioinformatics/btw267

    View details for PubMedID 27307620

  • Concerted genomic targeting of H3K27 demethylase REF6 and chromatin-remodeling ATPase BRM in Arabidopsis NATURE GENETICS Li, C., Gu, L., Gao, L., Chen, C., Wei, C., Qiu, Q., Chien, C., Wang, S., Jiang, L., Ai, L., Chen, C., Yang, S., Nguyen, V., Qi, Y., Snyder, M. P., Burlingame, A. L., Kohalmi, S. E., Huang, S., Cao, X., Wang, Z., Wu, K., Chen, X., Cui, Y. 2016; 48 (6): 687-?


    SWI/SNF-type chromatin remodelers, such as BRAHMA (BRM), and H3K27 demethylases both have active roles in regulating gene expression at the chromatin level, but how they are recruited to specific genomic sites remains largely unknown. Here we show that RELATIVE OF EARLY FLOWERING 6 (REF6), a plant-unique H3K27 demethylase, targets genomic loci containing a CTCTGYTY motif via its zinc-finger (ZnF) domains and facilitates the recruitment of BRM. Genome-wide analyses showed that REF6 colocalizes with BRM at many genomic sites with the CTCTGYTY motif. Loss of REF6 results in decreased BRM occupancy at BRM-REF6 co-targets. Furthermore, REF6 directly binds to the CTCTGYTY motif in vitro, and deletion of the motif from a target gene renders it inaccessible to REF6 in vivo. Finally, we show that, when its ZnF domains are deleted, REF6 loses its genomic targeting ability. Thus, our work identifies a new genomic targeting mechanism for an H3K27 demethylase and demonstrates its key role in recruiting the BRM chromatin remodeler.

    View details for DOI 10.1038/ng.3555

    View details for Web of Science ID 000376744200018

    View details for PubMedID 27111034

  • The genetic predisposition to bronchopulmonary dysplasia CURRENT OPINION IN PEDIATRICS Yu, K., Li, J., Snyder, M., Shaw, G. M., O'Brodovich, H. M. 2016; 28 (3): 318-323


    Bronchopulmonary dysplasia (BPD) is a prevalent chronic lung disease in premature infants. Twin studies have shown strong heritability underlying this disease; however, the genetic architecture of BPD remains unclear. A number of studies employed different approaches to characterize the genetic aberrations associated with BPD, including candidate gene studies, genome-wide association studies, exome sequencing, integrative omics analysis, and pathway analysis. Candidate gene studies identified a number of genes potentially involved with the development of BPD, but the etiological contribution from each gene is not substantial. Copy number variation studies and three independent genome-wide association studies did not identify genetic variations significantly and consistently associated with BPD. A recent exome-sequencing study pointed to rare variants implicated in the disease. In this review, we summarize these studies' methodology and findings, and suggest future research directions to better understand the genetic underpinnings of this potentially life-long lung disease. Genetic factors play a significant role in the development of BPD. Recent studies suggested that rare variants in genes participating in lung development pathways could contribute to BPD susceptibility.

    View details for DOI 10.1097/MOP.0000000000000344

    View details for Web of Science ID 000376387000010

    View details for PubMedID 26963946

  • Age-Dependent Pancreatic Gene Regulation Reveals Mechanisms Governing Human beta Cell Function CELL METABOLISM Arda, H. E., Li, L., Tsai, J., Torre, E. A., Rosli, Y., Peiris, H., Spitale, R. C., Dai, C., Gu, X., Qu, K., Wang, P., Wang, J., Grompe, M., Scharfmann, R., Snyder, M. S., Bottino, R., Powers, A. C., Chang, H. Y., Kim, S. K. 2016; 23 (5): 909-920


    Intensive efforts are focused on identifying regulators of human pancreatic islet cell growth and maturation to accelerate development of therapies for diabetes. After birth, islet cell growth and function are dynamically regulated; however, establishing these age-dependent changes in humans has been challenging. Here, we describe a multimodal strategy for isolating pancreatic endocrine and exocrine cells from children and adults to identify age-dependent gene expression and chromatin changes on a genomic scale. These profiles revealed distinct proliferative and functional states of islet α cells or β cells and histone modifications underlying age-dependent gene expression changes. Expression of SIX2 and SIX3, transcription factors without prior known functions in the pancreas and linked to fasting hyperglycemia risk, increased with age specifically in human islet β cells. SIX2 and SIX3 were sufficient to enhance insulin content or secretion in immature β cells. Our work provides a unique resource to study human-specific regulators of islet cell maturation and function.

    View details for DOI 10.1016/j.cmet.2016.04.002

    View details for Web of Science ID 000375550700021

    View details for PubMedID 27133132

  • Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection BMC BIOINFORMATICS Zhang, Q., Zeng, X., Younkin, S., Kawli, T., Snyder, M. P., Keles, S. 2016; 17


    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data as well as fully simulated data revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

    View details for DOI 10.1186/s12859-016-0957-1

    View details for Web of Science ID 000370775000001

    View details for PubMedID 26908256

  • Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nature genetics Araya, C. L., Cenik, C., Reuter, J. A., Kiss, G., Pande, V. S., Snyder, M. P., Greenleaf, W. J. 2016; 48 (2): 117-125


    Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.

    View details for DOI 10.1038/ng.3471

    View details for PubMedID 26691984

  • Proteome-wide survey of the autoimmune target repertoire in autoimmune polyendocrine syndrome type 1 SCIENTIFIC REPORTS Landegren, N., Sharon, D., Freyhult, E., Hallgren, A., Eriksson, D., Edqvist, P., Bensing, S., Wahlberg, J., Nelson, L. M., Gustafsson, J., Husebye, E. S., Anderson, M. S., Snyder, M., Kampe, O. 2016; 6


    Autoimmune polyendocrine syndrome type 1 (APS1) is a monogenic disorder that features multiple autoimmune disease manifestations. It is caused by mutations in the Autoimmune regulator (AIRE) gene, which promotes thymic display of thousands of peripheral tissue antigens in a process critical for establishing central immune tolerance. Here we used proteome arrays to perform a comprehensive study of autoimmune targets in APS1. Interrogation of established autoantigens revealed highly reliable detection of autoantibodies, and by exploring the full panel of more than 9000 proteins we further identified MAGEB2 and PDILT as novel major autoantigens in APS1. Our proteome-wide assessment revealed a marked enrichment for tissue-specific immune targets, mirroring AIRE's selectiveness for this category of genes. Our findings also suggest that only a very limited portion of the proteome becomes targeted by the immune system in APS1, which contrasts with the broad defect of thymic presentation associated with AIRE-deficiency and raises novel questions about what other factors are needed for a break of tolerance.

    View details for DOI 10.1038/srep20104

    View details for Web of Science ID 000368996700001

    View details for PubMedID 26830021

  • Distance from sub-Saharan Africa predicts mutational load in diverse human genomes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Henn, B. M., Botigue, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K., Martin, A. R., Musharoff, S., Cann, H., Snyder, M. P., Excoffier, L., Kidd, J. M., Bustamante, C. D. 2016; 113 (4): E440-E449
  • Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing G3-GENES GENOMES GENETICS Shoemaker, L. D., Clark, M. J., Patwardhan, A., Chandratillake, G., Garcia, S., Chen, R., Morgan, A. A., Leng, N., Kirk, S., Chen, R., Cook, D. J., Snyder, M., Steinberg, G. K. 2016; 6 (1): 41-49
  • NIH working group report-using genomic information to guide weight management: From universal to precision treatment OBESITY Bray, M. S., Loos, R. J., McCaffery, J. M., Ling, C., Franks, P. W., Weinstock, G. M., Snyder, M. P., Vassy, J. L., Agurs-Collins, T. 2016; 24 (1): 14-22

    View details for DOI 10.1002/oby.21381

    View details for Web of Science ID 000367189800005

    View details for PubMedID 26692578

  • Metformin Improves Diabetic Bone Health by Re-Balancing Catabolism and Nitrogen Disposal PLOS ONE Li, X., Guo, Y., Yan, W., Snyder, M. P., Li, X. 2015; 10 (12)
  • Integrated Proteomic and Genomic Analysis of Gastric Cancer Patient Tissues JOURNAL OF PROTEOME RESEARCH Yan, J. F., Kim, H., Jeong, S., Lee, H., Sethi, M. K., Lee, L. Y., Beavis, R. C., Im, H., Snyder, M. P., Hofree, M., Ideker, T., Wu, S., Paik, Y., Fanayan, S., Hancock, W. S. 2015; 14 (12): 4995-5006
  • Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans GENOME RESEARCH Cenik, C., Cenik, E. S., Byeon, G. W., Grubert, F., Candille, S. I., Spacek, D., Alsallakh, B., Tilgner, H., Araya, C. L., Tang, H., Ricci, E., Snyder, M. P. 2015; 25 (11): 1610-1621


    Elucidating the consequences of genetic differences between humans is essential for understanding phenotypic diversity and personalized medicine. Although variation in RNA levels, transcription factor binding, and chromatin have been explored, little is known about global variation in translation and its genetic determinants. We used ribosome profiling, RNA sequencing, and mass spectrometry to perform an integrated analysis in lymphoblastoid cell lines from a diverse group of individuals. We find significant differences in RNA, translation, and protein levels suggesting diverse mechanisms of personalized gene expression control. Combined analysis of RNA expression and ribosome occupancy improves the identification of individual protein level differences. Finally, we identify genetic differences that specifically modulate ribosome occupancy--many of these differences lie close to start codons and upstream ORFs. Our results reveal a new level of gene expression variation among humans and indicate that genetic variants can cause changes in protein levels through effects on translation.

    View details for DOI 10.1101/gr.193342.115

    View details for Web of Science ID 000364355600003

    View details for PubMedID 26297486

    View details for PubMedCentralID PMC4617958

  • Design and Implementation of the International Genetics and Translational Research in Transplantation Network TRANSPLANTATION Keating, B. J., van Setten, J., Jacobson, P. A., Holmes, M. V., Verma, S. S., Chandrupatla, H. R., Nair, N., Gao, H., Li, Y. R., Chang, B., Wong, C., Phillips, R., Cole, B. S., Mukhtar, E., Zhang, W., Cao, H., Mohebnasab, M., Hou, C., Lee, T., Steel, L., Shaked, O., Garifallou, J., Miller, M. B., Karczewski, K. J., Akdere, A., Gonzalez, A., Lloyd, K. M., McGinn, D., Michaud, Z., Colasacco, A., Lek, M., Fu, Y., Pawashe, M., Guettouche, T., Himes, A., Perez, L., Guan, W., Wu, B., Schladt, D., Menon, M., Zhang, Z., Tragante, V., de Jonge, N., Otten, H. G., de Weger, R. A., van de Graaf, E. A., Baan, C. C., Manintveld, O. C., De Vlaminck, I., Piening, B. D., Strehl, C., Shaw, M., Snieder, H., Klintmalm, G. B., O'Leary, J. G., Amaral, S., Goldfarb, S., Rand, E., Rossano, J. W., Kohli, U., Heeger, P., Stahl, E., Christie, J. D., Fuentes, M. H., Levine, J. E., Aplenc, R., Schadt, E. E., Stranger, B. E., Kluin, J., Potena, L., Zuckermann, A., Khush, K., Alzahrani, A. J., Al-Muhanna, F. A., Al-Ali, A. K., Al-Ali, R., Al-Rubaish, A. M., Al-Mueilo, S., Byrne, E. M., Miller, D., Alexander, S. I., Onengut-Gumuscu, S., Rich, S. S., Suthanthiran, M., Tedesco, H., Saw, C. L., Ragoussis, J., Kfoury, A. G., Horne, B., Carlquist, J., Gerstein, M. B., Reindl-Schwaighofer, R., Oberbauer, R., Wijmenga, C., Palmer, S., Pereira, A. C., Segovia, J., Alonso-Pulpon, L. A., Comez-Bueno, M., Vilches, C., Jaramillo, N., de Borst, M. H., Naesens, M., Hao, K., MacArthur, D., Balasubramanian, S., Conlon, P. J., Lord, G. M., Ritchie, M. D., Snyder, M., Olthoff, K. M., Moore, J. H., Petersdorf, E. W., Kamoun, M., Wang, J., Monos, D. S., de Bakker, P. I., Hakonarson, H., Murphy, B., Lankree, M. B., Garcia-Pavia, P., Oetting, W. S., Birdwell, K. A., Bakker, S. J., Israni, A. K., Shaked, A., Asselbergs, F. W. 2015; 99 (11): 2401-2412


    Genetic association studies of transplantation outcomes have been hampered by small samples and highly complex multifactorial phenotypes, hindering investigations of the genetic architecture of a range of comorbidities which significantly impact graft and recipient life expectancy. We describe here the rationale and design of the International Genetics & Translational Research in Transplantation Network. The network comprises 22 studies to date, including 16494 transplant recipients and 11669 donors, of whom more than 5000 are of non-European ancestry, all of whom have existing genomewide genotype data sets. We describe the rich genetic and phenotypic information available in this consortium comprising heart, kidney, liver, and lung transplant cohorts. We demonstrate significant power in International Genetics & Translational Research in Transplantation Network to detect main effect association signals across regions such as the MHC region as well as genomewide for transplant outcomes that span all solid organs, such as graft survival, acute rejection, new onset of diabetes after transplantation, and for delayed graft function in kidney only. This consortium is designed and statistically powered to deliver pioneering insights into the genetic architecture of transplant-related outcomes across a range of different solid-organ transplant studies. The study design allows a spectrum of analyses to be performed including recipient-only analyses, donor-recipient HLA mismatches with focus on loss-of-function variants and nonsynonymous single nucleotide polymorphisms.

    View details for DOI 10.1097/TP.0000000000000913

    View details for Web of Science ID 000369087800037

    View details for PubMedCentralID PMC4623847

  • Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data PLOS GENETICS Dewey, F. E., Grove, M. E., Priest, J. R., Waggott, D., Batra, P., Miller, C. L., Wheeler, M., Zia, A., Pan, C., Karzcewski, K. J., Miyake, C., Whirl-Carrillo, M., Klein, T. E., Datta, S., Altman, R. B., Snyder, M., Quertermous, T., Ashley, E. A. 2015; 11 (10)


    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

    View details for DOI 10.1371/journal.pgen.1005496

    View details for Web of Science ID 000364401600008

    View details for PubMedID 26448358

    View details for PubMedCentralID PMC4598191

  • Mango: a bias-correcting ChIA-PET analysis pipeline. Bioinformatics Phanstiel, D. H., Boyle, A. P., Heidari, N., Snyder, M. P. 2015; 31 (19): 3092-3098

    View details for DOI 10.1093/bioinformatics/btv336

    View details for PubMedID 26034063

  • Evaluating Common Humoral Responses against Fungal Infections with Yeast Protein Microarrays JOURNAL OF PROTEOME RESEARCH Coelho, P. S., Im, H., Clemons, K. V., Snyder, M. P., Stevens, D. A. 2015; 14 (9): 3924-3931
  • Exome Sequencing of Neonatal Blood Spots and the Identification of Genes Implicated in Bronchopulmonary Dysplasia. American journal of respiratory and critical care medicine Li, J., Yu, K., Oehlert, J., Jeliffe-Pawlowski, L. L., Gould, J. B., Stevenson, D. K., Snyder, M., Shaw, G. M., O'Brodovich, H. M. 2015; 192 (5): 589-596


    Bronchopulmonary dysplasia (BPD), a prevalent severe lung disease of premature infants, has a strong genetic component. Large-scale genome-wide association studies for common variants have not revealed its genetic basis. Given the historical high mortality rate of extremely preterm infants who now survive and develop BPD, we hypothesized that risk loci underlying this disease are under severe purifying selection during evolution; thus, rare variants likely explain greater risk of the disease. We performed exome sequencing on 50 BPD-affected and unaffected twin pairs using DNA isolated from neonatal blood spots and identified genes affected by extremely rare nonsynonymous mutations. Functional genomic approaches were then used to systematically compare these affected genes. We identified 258 genes with rare nonsynonymous mutations in patients with BPD. These genes were highly enriched for processes involved in pulmonary structure and function including collagen fibril organization, morphogenesis of embryonic epithelium, and regulation of Wnt signaling pathway; displayed significantly elevated expression in fetal and adult lungs; and were substantially up-regulated in a murine model of BPD. Analyses of mouse mutants revealed their phenotypic enrichment for embryonic development and the cyanosis phenotype, a clinical manifestation of BPD. Our study supports the role of rare variants in BPD, in contrast with the role of common variants targeted by genome-wide association studies. Overall, our study is the first to sequence BPD exomes from newborn blood spot samples and identify with high confidence genes implicated in BPD, thereby providing important insights into its biology and molecular etiology.

    View details for DOI 10.1164/rccm.201501-0168OC

    View details for PubMedID 26030808

  • RNA Sequencing Analysis Detection of a Novel Pathway of Endothelial Dysfunction in Pulmonary Arterial Hypertension AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE Rhodes, C. J., Im, H., Cao, A., Hennigs, J. K., Wang, L., Sa, S., Chen, P., Nickel, N. P., Miyagawa, K., Hopper, R. K., Tojais, N. F., Li, C. G., Gu, M., Spiekerkoetter, E., Xian, Z., Chen, R., Zhao, M., Kaschwich, M., del Rosario, P. A., Bernstein, D., Zamanian, R. T., Wu, J. C., Snyder, M. P., Rabinovitch, M. 2015; 192 (3): 356-366


    Pulmonary arterial hypertension is characterized by endothelial dysregulation, but global changes in gene expression have not been related to perturbations in function. RNA sequencing was utilized to discriminate changes in transcriptomes of endothelial cells cultured from lungs of patients with idiopathic pulmonary arterial hypertension vs. controls and to assess the functional significance of major differentially expressed transcripts. The endothelial transcriptomes from seven control and six idiopathic pulmonary arterial hypertension patients' lungs were analyzed. Differentially expressed genes were related to BMPR2 signaling. Those downregulated were assessed for function in cultured cells, and in a transgenic mouse. Fold-differences in ten genes were significant (p<0.05), four increased and six decreased in patients vs. controls. No patient was mutant for BMPR2. However, knockdown of BMPR2 by siRNA in control pulmonary arterial endothelial cells recapitulated six/ten patient-related gene changes, including decreased collagen IV (COL4A1, COL4A2) and ephrinA1 (EFNA1). Reduction of BMPR2 regulated transcripts was related to decreased β-catenin. Reducing COL4A1, COL4A2 and EFNA1 by siRNA inhibited pulmonary endothelial adhesion, migration and tube formation. In mice null for the EFNA1 receptor, EphA2, vs. controls, VEGF receptor blockade and hypoxia caused more severe pulmonary hypertension, judged by elevated right ventricular systolic pressure, right ventricular hypertrophy and loss of small arteries. The novel relationship between BMPR2 dysfunction and reduced expression of endothelial COL4 and EFNA1 may underlie vulnerability to injury in pulmonary arterial hypertension.

    View details for DOI 10.1164/rccm.201408-1528OC

    View details for Web of Science ID 000359178500017

    View details for PubMedID 26030479

  • Probing High-density Functional Protein Microarrays to Detect Protein-protein Interactions JOVE-JOURNAL OF VISUALIZED EXPERIMENTS Fasolo, J., Im, H., Snyder, M. P. 2015

    View details for DOI 10.3791/51872

    View details for Web of Science ID 000361537100003

    View details for PubMedID 26274875

  • Single-cell chromatin accessibility reveals principles of regulatory variation. Nature Buenrostro, J. D., Wu, B., Litzenburger, U. M., Ruff, D., Gonzales, M. L., Snyder, M. P., Chang, H. Y., Greenleaf, W. J. 2015; 523 (7561): 486-490


    Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.

    View details for DOI 10.1038/nature14590

    View details for PubMedID 26083756

  • Achieving high-sensitivity for clinical applications using augmented exome sequencing GENOME MEDICINE Patwardhan, A., Harris, J., Leng, N., Bartha, G., Church, D. M., Luo, S., Haudenschild, C., Pratt, M., Zook, J., Salit, M., Tirch, J., Morra, M., Chervitz, S., Li, M., Clark, M., Garcia, S., Chandratillake, G., Kirk, S., Ashley, E., Snyder, M., Altman, R., Bustamante, C., Butte, A. J., West, J., Chen, R. 2015; 7


    Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms). Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding that observed with conventional whole-exome and whole-genome platforms. Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.

    View details for DOI 10.1186/s13073-015-0197-4

    View details for Web of Science ID 000359428300001

    View details for PubMedID 26269718

    View details for PubMedCentralID PMC4534066

  • Recurrent somatic mutations in regulatory regions of human cancer genomes NATURE GENETICS Melton, C., Reuter, J. A., Spacek, D. V., Snyder, M. 2015; 47 (7): 710-?

    View details for DOI 10.1038/ng.3332

    View details for Web of Science ID 000357090300007

    View details for PubMedID 26053494

  • Where Next for Genetics and Genomics? PLOS BIOLOGY Tyler-Smith, C., Yang, H., Landweber, L. F., Dunham, I., Knoppers, B. M., Donnelly, P., Mardis, E. R., Snyder, M., McVean, G. 2015; 13 (7)
  • Metabolome progression during early gut microbial colonization of gnotobiotic mice SCIENTIFIC REPORTS Marcobal, A., Yusufaly, T., Higginbottom, S., Snyder, M., Sonnenburg, J. L., Mias, G. I. 2015; 5

    View details for DOI 10.1038/srep11589

    View details for Web of Science ID 000357041700001

    View details for PubMedID 26118551

  • Transglutaminase 4 as a prostate autoantigen in male subfertility SCIENCE TRANSLATIONAL MEDICINE Landegren, N., Sharon, D., Shum, A. K., Khan, I. S., Fasano, K. J., Hallgren, A., Kampf, C., Freyhult, E., Ardesjo-Lundgren, B., Alimohammadi, M., Rathsman, S., Ludvigsson, J. F., Lundh, D., Motrich, R., Rivero, V., Fong, L., Giwercman, A., Gustafsson, J., Perheentupa, J., Husebye, E. S., Anderson, M. S., Snyder, M., Kampe, O. 2015; 7 (292)


    Autoimmune polyendocrine syndrome type 1 (APS1), a monogenic disorder caused by AIRE gene mutations, features multiple autoimmune disease components. Infertility is common in both males and females with APS1. Although female infertility can be explained by autoimmune ovarian failure, the mechanisms underlying male infertility have remained poorly understood. We performed a proteome-wide autoantibody screen in APS1 patient sera to assess the autoimmune response against the male reproductive organs. By screening human protein arrays with male and female patient sera and by selecting for gender-imbalanced autoantibody signals, we identified transglutaminase 4 (TGM4) as a male-specific autoantigen. Notably, TGM4 is a prostatic secretory molecule with a critical role in male reproduction. TGM4 autoantibodies were detected in most of the adult male APS1 patients but were absent in all the young males. Consecutive serum samples further revealed that TGM4 autoantibodies first presented during pubertal age and subsequent to prostate maturation. We assessed the animal model for APS1, the Aire-deficient mouse, and found spontaneous development of TGM4 autoantibodies specifically in males. Aire-deficient mice failed to present TGM4 in the thymus, consistent with a defect in central tolerance for TGM4. In the mouse, we further link TGM4 immunity with a destructive prostatitis and compromised secretion of TGM4. Collectively, our findings in APS1 patients and Aire-deficient mice reveal prostate autoimmunity as a major manifestation of APS1 with a potential role in male subfertility.

    View details for DOI 10.1126/scitranslmed.aaa9186

    View details for Web of Science ID 000356390500008

    View details for PubMedID 26084804

  • Transcriptome Signature and Regulation in Human Somatic Cell Reprogramming STEM CELL REPORTS Tanaka, Y., Hysolli, E., Su, J., Xiang, Y., Kim, K., Zhong, M., Li, Y., Heydari, K., Euskirchen, G., Snyder, M. P., Pan, X., Weissman, S. M., Park, I. 2015; 4 (6): 1125-1139


    Reprogramming of somatic cells produces induced pluripotent stem cells (iPSCs) that are invaluable resources for biomedical research. Here, we extended the previous transcriptome studies by performing RNA-seq on cells defined by a combination of multiple cellular surface markers. We found that transcriptome changes during early reprogramming occur independently from the opening of closed chromatin by OCT4, SOX2, KLF4, and MYC (OSKM). Furthermore, our data identify multiple spliced forms of genes uniquely expressed at each progressive stage of reprogramming. In particular, we found a pluripotency-specific spliced form of CCNE1 that is specific to human and significantly enhances reprogramming. In addition, single nucleotide polymorphism (SNP) expression analysis reveals that monoallelic gene expression is induced in the intermediate stages of reprogramming, while biallelic expression is recovered upon completion of reprogramming. Our transcriptome data provide unique opportunities in understanding human iPSC reprogramming.

    View details for DOI 10.1016/j.stemcr.2015.04.009

    View details for Web of Science ID 000356068100017

    View details for PubMedID 26004630

  • Optimized Analytical Procedures for the Untargeted Metabolomic Profiling of Human Urine and Plasma by Combining Hydrophilic Interaction (HILIC) and Reverse-Phase Liquid Chromatography (RPLC)-Mass Spectrometry MOLECULAR & CELLULAR PROTEOMICS Contrepois, K., Jiang, L., Snyder, M. 2015; 14 (6): 1684-1695


    Profiling of body fluids is crucial for monitoring and discovering metabolic markers of health and disease and for providing insights into human physiology. Since human urine and plasma each contain an extreme diversity of metabolites, a single liquid chromatographic system when coupled to mass spectrometry (MS) is not sufficient to achieve reasonable metabolome coverage. Hydrophilic interaction liquid chromatography (HILIC) offers complementary information to reverse-phase liquid chromatography (RPLC) by retaining polar metabolites. With the objective of finding the optimal combined chromatographic solution to profile urine and plasma, we systematically investigated the performance of five HILIC columns with different chemistries operated at three different pH (acidic, neutral, basic) and five C18-silica RPLC columns. The zwitterionic column ZIC-HILIC operated at neutral pH provided optimal performance on a large set of hydrophilic metabolites. The RPLC columns Hypersil GOLD and Zorbax SB aq were proven to be best suited for the metabolic profiling of urine and plasma, respectively. Importantly, the optimized HILIC-MS method showed excellent intrabatch peak area reproducibility (CV < 12%) and good long-term interbatch (40 days) peak area reproducibility (CV < 22%) that were similar to those of RPLC-MS procedures. Finally, combining the optimal HILIC- and RPLC-MS approaches greatly expanded metabolome coverage with 44% and 108% new metabolic features detected compared with RPLC-MS alone for urine and plasma, respectively. The proposed combined LC-MS approaches improve the comprehensiveness of global metabolic profiling of body fluids and thus are valuable for monitoring and discovering metabolic changes associated with health and disease in clinical research studies.

    View details for DOI 10.1074/mcp.M114.046508

    View details for Web of Science ID 000355550400019

    View details for PubMedID 25787789

  • High-Throughput Sequencing Technologies MOLECULAR CELL Reuter, J. A., Spacek, D. V., Snyder, M. P. 2015; 58 (4): 586-597


    The human genome sequence has profoundly altered our understanding of biology, human diversity, and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past 10 years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them, as well as the challenges facing current sequencing platforms and their clinical application.

    View details for DOI 10.1016/j.molcel.2015.05.004

    View details for PubMedID 26000844

  • Characterization of Novel Transcripts in Pseudorabies Virus VIRUSES-BASEL Tombacz, D., Csabai, Z., Olah, P., Havelda, Z., Sharon, D., Snyder, M., Boldogkoi, Z. 2015; 7 (5): 2727-2744


    In this study we identified two 3'-coterminal RNA molecules in the pseudorabies virus. The highly abundant short transcript (CTO-S) proved to be encoded between the ul21 and ul22 genes in close vicinity of the replication origin (OriL) of the virus. The less abundant long RNA molecule (CTO-L) is a transcriptional readthrough product of the ul21 gene and overlaps OriL. These polyadenylated RNAs were characterized by ascertaining their nucleotide sequences with the Illumina HiScanSQ and Pacific Biosciences Real-Time (PacBio RSII) sequencing platforms and by analyzing their transcription kinetics through use of multi-time-point Real-Time RT-PCR and the PacBio RSII system. It emerged that transcription of the CTOs is fully dependent on the viral transactivator protein IE180 and CTO-S is not a microRNA precursor. We propose an interaction between the transcription and replication machineries at this genomic location, which might play an important role in the regulation of DNA synthesis.

    View details for DOI 10.3390/v7052727

    View details for Web of Science ID 000356228700027

    View details for PubMedID 26008709

    View details for PubMedCentralID PMC4452928

  • Impact of allele-specific peptides in proteome quantification PROTEOMICS CLINICAL APPLICATIONS Wu, L., Snyder, M. 2015; 9 (3-4): 432-436


    MS-based proteome technologies have greatly improved our ability to detect and quantify proteomes across various biological samples. High-throughput bottom-up proteome profiling in combination with targeted MS methods, e.g. SRM assays, is emerging as a powerful approach in the field of biomarker discovery. In the past few years, an increasing number of studies have attempted to integrate genomic and proteomic data for biomarker discovery. Here, we describe how allele-specific peptides can be applied in biomarker discovery and their impact on protein quantification.

    View details for DOI 10.1002/prca.201400126

    View details for Web of Science ID 000353291000019

    View details for PubMedID 25676416

  • Reassessment of Piwi Binding to the Genome and Piwi Impact on RNA Polymerase II Distribution DEVELOPMENTAL CELL Lin, H., Chen, M., Kundaje, A., Valouev, A., Yin, H., Liu, N., Neuenkirchen, N., Zhong, M., Snyder, M. 2015; 32 (6): 772-774


    Drosophila Piwi was reported by Huang et al. (2013) to be guided by piRNAs to piRNA-complementary sites in the genome, which then recruits heterochromatin protein 1a and histone methyltransferase Su(Var)3-9 to the sites. Among additional findings, Huang et al. (2013) also reported Piwi binding sites in the genome and the reduction of RNA polymerase II in euchromatin but its increase in pericentric regions in piwi mutants. Marinov et al. (2015) disputed the validity of the Huang et al. bioinformatic pipeline that led to the last two claims. Here we report our independent reanalysis of the data using current bioinformatic methods. Our reanalysis agrees with Marinov et al. (2015) that Piwi's genomic targets still remain to be identified but confirms the Huang et al. claim that Piwi influences RNA polymerase II distribution in the genome. This Matters Arising Response addresses the Marinov et al. (2015) Matters Arising, published concurrently in this issue of Developmental Cell.

    View details for DOI 10.1016/j.devcel.2015.03.004

    View details for Web of Science ID 000351841900015

    View details for PubMedID 25805139

  • The conserved histone deacetylase Rpd3 and its DNA binding subunit Ume6 control dynamic transcript architecture during mitotic growth and meiotic development NUCLEIC ACIDS RESEARCH Lardenois, A., Stuparevic, I., Liu, Y., Law, M. J., Becker, E., Smagulova, F., Waern, K., Guilleux, M., Horecka, J., Chu, A., Kervarrec, C., Strich, R., Snyder, M., Davis, R. W., Steinmetz, L. M., Primig, M. 2015; 43 (1): 115-128


    It was recently reported that the sizes of many mRNAs change when budding yeast cells exit mitosis and enter the meiotic differentiation pathway. These differences were attributed to length variations of their untranslated regions. The function of UTRs in protein translation is well established. However, the mechanism controlling the expression of distinct transcript isoforms during mitotic growth and meiotic development is unknown. In this study, we order developmentally regulated transcript isoforms according to their expression at specific stages during meiosis and gametogenesis, as compared to vegetative growth and starvation. We employ regulatory motif prediction, in vivo protein-DNA binding assays, genetic analyses and monitoring of epigenetic amino acid modification patterns to identify a novel role for Rpd3 and Ume6, two components of a histone deacetylase complex already known to repress early meiosis-specific genes in dividing cells, in mitotic repression of meiosis-specific transcript isoforms. Our findings classify developmental stage-specific early, middle and late meiotic transcript isoforms, and they point to a novel HDAC-dependent control mechanism for flexible transcript architecture during cell growth and differentiation. Since Rpd3 is highly conserved and ubiquitously expressed in many tissues, our results are likely relevant for development and disease in higher eukaryotes.

    View details for DOI 10.1093/nar/gku1185

    View details for Web of Science ID 000350207100017

    View details for PubMedID 25477386

    View details for PubMedCentralID PMC4288150

  • Novel mutations in PIEZO1 cause an autosomal recessive generalized lymphatic dysplasia with non-immune hydrops fetalis. Nature communications Fotiou, E., Martin-Almedina, S., Simpson, M. A., Lin, S., Gordon, K., Brice, G., Atton, G., Jeffery, I., Rees, D. C., Mignot, C., Vogt, J., Homfray, T., Snyder, M. P., Rockson, S. G., Jeffery, S., Mortimer, P. S., Mansour, S., Ostergaard, P. 2015; 6: 8085-?


    Generalized lymphatic dysplasia (GLD) is a rare form of primary lymphoedema characterized by a uniform, widespread lymphoedema affecting all segments of the body, with systemic involvement such as intestinal and/or pulmonary lymphangiectasia, pleural effusions, chylothoraces and/or pericardial effusions. This may present prenatally as non-immune hydrops. Here we report homozygous and compound heterozygous mutations in PIEZO1, resulting in an autosomal recessive form of GLD with a high incidence of non-immune hydrops fetalis and childhood onset of facial and four limb lymphoedema. Mutations in PIEZO1, which encodes a mechanically activated ion channel, have been reported with autosomal dominant dehydrated hereditary stomatocytosis and non-immune hydrops of unknown aetiology. Besides its role in red blood cells, our findings indicate that PIEZO1 is also involved in the development of lymphatic structures.

    View details for DOI 10.1038/ncomms9085

    View details for PubMedID 26333996

  • Whole-Exome Enrichment with the Agilent SureSelect Human All Exon Platform. Cold Spring Harbor protocols Chen, R., Im, H., Snyder, M. 2015; 2015 (7): pdb prot083659-?


    There are multiple platforms available for whole-exome enrichment and sequencing (WES). This protocol is based on the Agilent SureSelect Human All Exon platform, which targets ∼50 Mb of the human exonic regions. The SureSelect system uses ∼120-base RNA probes to capture known coding DNA sequences (CDS) from the NCBI Consensus CDS Database as well as other major RNA coding sequence databases, such as Sanger miRBase. The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.

    View details for DOI 10.1101/pdb.prot083659

    View details for PubMedID 25762417

  • Metabolome progression during early gut microbial colonization of gnotobiotic mice. Scientific reports Marcobal, A., Yusufaly, T., Higginbottom, S., Snyder, M., Sonnenburg, J. L., Mias, G. I. 2015; 5: 11589-?


    The microbiome has been implicated directly in host health, especially host metabolic processes and development of immune responses. These are particularly important in infants where the gut first begins being colonized, and such processes may be modeled in mice. In this investigation we follow longitudinally the urine metabolome of ex-germ-free mice, which are colonized with two bacterial species, Bacteroides thetaiotaomicron and Bifidobacterium longum. High-throughput mass spectrometry profiling of urine samples revealed dynamic changes in the metabolome makeup, associated with the gut bacterial colonization, enabled by our adaptation of non-linear time-series analysis to urine metabolomics data. Results demonstrate both gradual and punctuated changes in metabolite production and that early colonization events profoundly impact the nature of small molecules circulating in the host. The identified small molecules are implicated in amino acid and carbohydrate metabolic processes, and offer insights into the dynamic changes occurring during the colonization process, using high-throughput longitudinal methodology.

    View details for DOI 10.1038/srep11589

    View details for PubMedID 26118551

  • Genomic analysis of fibrolamellar hepatocellular carcinoma. Human molecular genetics Xu, L., Hazard, F. K., Zmoos, A., Jahchan, N., Chaib, H., Garfin, P. M., Rangaswami, A., Snyder, M. P., Sage, J. 2015; 24 (1): 50-63


    Pediatric tumors are relatively infrequent but are often associated with significant lethality and lifelong morbidity. A major goal of pediatric cancer research has been to identify key drivers of tumorigenesis to eventually develop targeted therapies to enhance cure rate and minimize acute and long-term toxic effects. Here we used genomics approaches to identify biomarkers and candidate drivers for fibrolamellar hepatocellular carcinoma (FL-HCC), a very rare subtype of pediatric liver cancer for which limited therapeutic options exist. In-depth genomics analyses of one tumor followed by immunohistochemistry validation on seven other tumors showed expression of neuroendocrine markers in FL-HCC. DNA and RNA sequencing data further showed that common cancer pathways are not visibly altered in FL-HCC but identified two novel structural variants, both resulting in fusion transcripts. The first, a 400kb deletion, results in a DNAJ1-PRKCA fusion transcript, which leads to increased PKA activity in the index tumor case and other FL-HCC cases compared to normal liver. This PKA fusion protein is oncogenic in HCC cells. The second gene fusion event, a translocation between the CLPTML1 and GLIS3 genes, generates a transcript whose product also promotes cancer phenotypes in HCC cell lines. These experiments further highlight the tumorigenic role of gene fusions in the etiology of pediatric solid tumors and identify both candidate biomarkers and possible therapeutic targets for this lethal pediatric disease.

    View details for DOI 10.1093/hmg/ddu418

    View details for PubMedID 25122662

  • Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss BMC GENOMICS Haraksingh, R. R., Jahanbani, F., Rodriguez-Paris, J., Gelernter, J., Nadeau, K. C., Oghalai, J. S., Schrijver, I., Snyder, M. P. 2014; 15


    The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 x 10^-7). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.

    View details for DOI 10.1186/1471-2164-15-1155

    View details for Web of Science ID 000209598100001

  • Genomic era diagnosis and management of hereditary and sporadic colon cancer. World journal of clinical oncology Esplin, E. D., Snyder, M. P. 2014; 5 (5): 1036-1047


    The morbidity and mortality attributable to heritable and sporadic carcinomas of the colon are substantial and affect children and adults alike. Despite current colonoscopy screening recommendations colorectal adenocarcinoma (CRC) still accounts for almost 140000 cancer cases yearly. Familial adenomatous polyposis (FAP) is a colon cancer predisposition due to alterations in the adenomatous polyposis coli gene, which is mutated in most CRC. Since the beginning of the genomic era next-generation sequencing analyses of CRC continue to improve our understanding of the genetics of tumorigenesis and promise to expand our ability to identify and treat this disease. Advances in genome sequence analysis have facilitated the molecular diagnosis of individuals with FAP, which enables initiation of appropriate monitoring and timely intervention. Genome sequencing also has potential clinical impact for individuals with sporadic forms of CRC, providing means for molecular diagnosis of CRC tumor type, data guiding selection of tumor targeted therapies, and pharmacogenomic profiles specifying patient specific drug tolerances. There is even a potential role for genomic sequencing in surveillance for recurrence, and early detection, of CRC. We review strategies for diagnostic assessment and management of FAP and sporadic CRC in the current genomic era, with emphasis on the current, and potential for future, impact of genome sequencing on the clinical care of these conditions.

    View details for DOI 10.5306/wjco.v5.i5.1036

    View details for PubMedID 25493239

    View details for PubMedCentralID PMC4259930

  • Genome-wide map of regulatory interactions in the human genome GENOME RESEARCH Heidari, N., Phanstiel, D. H., He, C., Grubert, F., Jahanbani, F., Kasowski, M., Zhang, M. Q., Snyder, M. P. 2014; 24 (12): 1905-1917


    Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) experiments targeting six broadly distributed factors. Bound regions covered 80% of DNase I hypersensitive sites including 99.7% of TSS and 98% of enhancers. Correlating this map with ChIP-seq and RNA-seq data sets revealed cohesin, CTCF, and ZNF143 as key components of three-dimensional chromatin structure and revealed how the distal chromatin state affects gene transcription. Comparison of interactions between cell types revealed that enhancer-promoter interactions were highly cell-type-specific. Construction and comparison of distal and proximal regulatory networks revealed stark differences in structure and biological function. Proximal binding events are enriched at genes with housekeeping functions, while distal binding events interact with genes involved in dynamic biological processes including response to stimulus. This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus.

    View details for DOI 10.1101/gr.176586.114

    View details for Web of Science ID 000345810600001

    View details for PubMedID 25228660

    View details for PubMedCentralID PMC4248309

  • Widespread contribution of transposable elements to the innovation of gene regulatory networks GENOME RESEARCH Sundaram, V., Cheng, Y., Ma, Z., Li, D., Xing, X., Edge, P., Snyder, M. P., Wang, T. 2014; 24 (12): 1963-1976


    Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.

    View details for DOI 10.1101/gr.168872.113

    View details for Web of Science ID 000345810600005

    View details for PubMedID 25319995

    View details for PubMedCentralID PMC4248313

  • A comparative encyclopedia of DNA elements in the mouse genome NATURE Yue, F., Cheng, Y., Breschi, A., Vierstra, J., Wu, W., Ryba, T., Sandstrom, R., Ma, Z., Davis, C., Pope, B. D., Shen, Y., Pervouchine, D. D., Djebali, S., Thurman, R. E., Kaul, R., Rynes, E., Kirilusha, A., Marinov, G. K., Williams, B. A., Trout, D., Amrhein, H., Fisher-Aylor, K., Antoshechkin, I., DeSalvo, G., See, L., Fastuca, M., Drenkow, J., Zaleski, C., Dobin, A., Prieto, P., Lagarde, J., Bussotti, G., Tanzer, A., Denas, O., Li, K., Bender, M. A., Zhang, M., Byron, R., Groudine, M. T., McCleary, D., Pham, L., Ye, Z., Kuan, S., Edsall, L., Wu, Y., Rasmussen, M. D., Bansal, M. S., Kellis, M., Keller, C. A., Morrissey, C. S., Mishra, T., Jain, D., Dogan, N., Harris, R. S., Cayting, P., Kawli, T., Boyle, A. P., Euskirchen, G., Kundaje, A., Lin, S., Lin, Y., Jansen, C., Malladi, V. S., Cline, M. S., Erickson, D. T., Kirkup, V. M., Learned, K., Sloan, C. A., Rosenbloom, K. R., De Sousa, B. L., Beal, K., Pignatelli, M., Flicek, P., Lian, J., Kahveci, T., Lee, D., Kent, W. J., Santos, M. R., Herrero, J., Notredame, C., Johnson, A., Vong, S., Lee, K., Bates, D., Neri, F., Diegel, M., Canfield, T., Sabo, P. J., Wilken, M. S., Reh, T. A., Giste, E., Shafer, A., Kutyavin, T., Haugen, E., Dunn, D., Reynolds, A. P., Neph, S., Humbert, R., Hansen, R. S., de Bruijn, M., Selleri, L., Rudensky, A., Josefowicz, S., Samstein, R., Eichler, E. E., Orkin, S. H., Levasseur, D., Papayannopoulou, T., Chang, K., Skoultchi, A., Gosh, S., Disteche, C., Treuting, P., Wang, Y., Weiss, M. J., Blobel, G. A., Cao, X., Zhong, S., Wang, T., Good, P. J., Lowdon, R. F., Adams, L. B., Zhou, X., Pazin, M. J., Feingold, E. A., Wold, B., Taylor, J., Mortazavi, A., Weissman, S. M., Stamatoyannopoulos, J. A., Snyder, M. P., Guigo, R., Gingeras, T. R., Gilbert, D. M., Hardison, R. C., Beer, M. A., Ren, B. 2014; 515 (7527): 355-?


    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

    View details for DOI 10.1038/nature13992

    View details for Web of Science ID 000345770600034

  • Principles of regulatory information conservation between mouse and human NATURE Cheng, Y., Ma, Z., Kim, B., Wu, W., Cayting, P., Boyle, A. P., Sundaram, V., Xing, X., Dogan, N., Li, J., Euskirchen, G., Lin, S., Lin, Y., Visel, A., Kawli, T., Yang, X., Patacsil, D., Keller, C. A., Giardine, B., Kundaje, A., Wang, T., Pennacchio, L. A., Weng, Z., Hardison, R. C., Snyder, M. P. 2014; 515 (7527): 371-?


    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

    View details for DOI 10.1038/nature13985

    View details for Web of Science ID 000345770600036

    View details for PubMedCentralID PMC4343047

  • Topologically associating domains are stable units of replication-timing regulation NATURE Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Guelsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., Ren, B., Gilbert, D. M. 2014; 515 (7527): 402-?
  • Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway. Genetics in medicine Enns, G. M., Shashi, V., Bainbridge, M., Gambello, M. J., Zahir, F. R., Bast, T., Crimian, R., Schoch, K., Platt, J., Cox, R., Bernstein, J. A., Scavina, M., Walter, R. S., Bibb, A., Jones, M., Hegde, M., Graham, B. H., Need, A. C., Oviedo, A., Schaaf, C. P., Boyle, S., Butte, A. J., Chen, R., Clark, M. J., Haraksingh, R., Cowan, T. M., He, P., Langlois, S., Zoghbi, H. Y., Snyder, M., Gibbs, R. A., Freeze, H. H., Goldstein, D. B. 2014; 16 (10): 751-758


    Purpose: The endoplasmic reticulum-associated degradation pathway is responsible for the translocation of misfolded proteins across the endoplasmic reticulum membrane into the cytosol for subsequent degradation by the proteasome. To define the phenotype associated with a novel inherited disorder of cytosolic endoplasmic reticulum-associated degradation pathway dysfunction, we studied a series of eight patients with deficiency of N-glycanase 1. Methods: Whole-genome, whole-exome, or standard Sanger sequencing techniques were employed. Retrospective chart reviews were performed in order to obtain clinical data. Results: All patients had global developmental delay, a movement disorder, and hypotonia. Other common findings included hypolacrima or alacrima (7/8), elevated liver transaminases (6/7), microcephaly (6/8), diminished reflexes (6/8), hepatocyte cytoplasmic storage material or vacuolization (5/6), and seizures (4/8). The nonsense mutation c.1201A>T (p.R401X) was the most common deleterious allele. Conclusion: NGLY1 deficiency is a novel autosomal recessive disorder of the endoplasmic reticulum-associated degradation pathway associated with neurological dysfunction, abnormal tear production, and liver disease. The majority of patients detected to date carry a specific nonsense mutation that appears to be associated with severe disease. The phenotypic spectrum is likely to enlarge as cases with a broader range of mutations are detected.

    View details for DOI 10.1038/gim.2014.22

    View details for PubMedID 24651605

  • Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics Phanstiel, D. H., Boyle, A. P., Araya, C. L., Snyder, M. P. 2014; 30 (19): 2808-2810


    Motivation: Interpretation and communication of genomic data require flexible and quantitative tools to analyze and visualize diverse data types, and yet, a comprehensive tool to display all common genomic data types in publication quality figures does not exist to date. To address this shortcoming, we present Sushi.R, an R/Bioconductor package that allows flexible integration of genomic visualizations into highly customizable, publication-ready, multi-panel figures from common genomic data formats including Browser Extensible Data (BED), bedGraph and Browser Extensible Data Paired-End (BEDPE). Sushi.R is open source and made publicly available through GitHub and Bioconductor.

    View details for DOI 10.1093/bioinformatics/btu379

    View details for PubMedID 24903420

  • Comparative analysis of regulatory information and circuits across distant species. Nature Boyle, A. P., Araya, C. L., Brdlik, C., Cayting, P., Cheng, C., Cheng, Y., Gardner, K., Hillier, L. W., Janette, J., Jiang, L., Kasper, D., Kawli, T., Kheradpour, P., Kundaje, A., Li, J. J., Ma, L., Niu, W., Rehm, E. J., Rozowsky, J., Slattery, M., Spokony, R., Terrell, R., Vafeados, D., Wang, D., Weisdepp, P., Wu, Y., Xie, D., Yan, K., Feingold, E. A., Good, P. J., Pazin, M. J., Huang, H., Bickel, P. J., Brenner, S. E., Reinke, V., Waterston, R. H., Gerstein, M., White, K. P., Kellis, M., Snyder, M. 2014; 512 (7515): 453-456


    Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.

    View details for DOI 10.1038/nature13668

    View details for PubMedID 25164757

  • Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature Araya, C. L., Kawli, T., Kundaje, A., Jiang, L., Wu, B., Vafeados, D., Terrell, R., Weissdepp, P., Gevirtzman, L., Mace, D., Niu, W., Boyle, A. P., Xie, D., Ma, L., Murray, J. I., Reinke, V., Waterston, R. H., Snyder, M. 2014; 512 (7515): 400-405


    Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.

    View details for DOI 10.1038/nature13497

    View details for PubMedID 25164749

  • Shared functions of plant and mammalian StAR-related lipid transfer (START) domains in modulating transcription factor activity BMC BIOLOGY Schrick, K., Bruno, M., Khosla, A., Cox, P. N., Marlatt, S. A., Roque, R. A., Nguyen, H. C., He, C., Snyder, M. P., Singh, D., Yadav, G. 2014; 12


    Steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains were first identified from mammalian proteins that bind lipid/sterol ligands via a hydrophobic pocket. In plants, predicted START domains are predominantly found in homeodomain leucine zipper (HD-Zip) transcription factors that are master regulators of cell-type differentiation in development. Here we utilized studies of Arabidopsis in parallel with heterologous expression of START domains in yeast to investigate the hypothesis that START domains are versatile ligand-binding motifs that can modulate transcription factor activity. Our results show that deletion of the START domain from Arabidopsis Glabra2 (GL2), a representative HD-Zip transcription factor involved in differentiation of the epidermis, results in a complete loss-of-function phenotype, although the protein is correctly localized to the nucleus. Despite low sequence similarity, the mammalian START domain from StAR can functionally replace the HD-Zip-derived START domain. Embedding the START domain within a synthetic transcription factor in yeast, we found that several mammalian START domains from StAR, MLN64 and PCTP stimulated transcription factor activity, as did START domains from two Arabidopsis HD-Zip transcription factors. Mutation of ligand-binding residues within StAR START reduced this activity, consistent with the yeast assay monitoring ligand-binding. The D182L missense mutation in StAR START was shown to affect GL2 transcription factor activity in maintenance of the leaf trichome cell fate. Analysis of in vivo protein-metabolite interactions by mass spectrometry provided direct evidence for analogous lipid-binding activity in mammalian and plant START domains in the yeast system. Structural modeling predicted similar sized ligand-binding cavities of a subset of plant START domains in comparison to mammalian counterparts. The START domain is required for transcription factor activity in HD-Zip proteins from plants, although it is not strictly necessary for the protein's nuclear localization. START domains from both mammals and plants are modular in that they can bind lipid ligands to regulate transcription factor function in a yeast system. The data provide evidence for an evolutionarily conserved mechanism by which lipid metabolites can orchestrate transcription. We propose a model in which the START domain is used by both plants and mammals to regulate transcription factor activity.

    View details for DOI 10.1186/s12915-014-0070-8

    View details for Web of Science ID 000342371100001

    View details for PubMedID 25159688

  • Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (33): E3366-E3366

    View details for DOI 10.1073/pnas.1410434111

    View details for Web of Science ID 000340438800004

    View details for PubMedID 25275169

    View details for PubMedCentralID PMC4143047

  • Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS genetics Martin, A. R., Costa, H. A., Lappalainen, T., Henn, B. M., Kidd, J. M., Yee, M., Grubert, F., Cann, H. M., Snyder, M., Montgomery, S. B., Bustamante, C. D. 2014; 10 (8)


    Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.

    View details for DOI 10.1371/journal.pgen.1004549

    View details for PubMedID 25121757

  • H3K4me3 Breadth Is Linked to Cell Identity and Transcriptional Consistency. Cell Benayoun, B. A., Pollina, E. A., Ucar, D., Mahmoudi, S., Karra, K., Wong, E. D., Devarajan, K., Daugherty, A. C., Kundaje, A. B., Mancini, E., Hitz, B. C., Gupta, R., Rando, T. A., Baker, J. C., Snyder, M. P., Cherry, J. M., Brunet, A. 2014; 158 (3): 673-688


    Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency and increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.

    View details for DOI 10.1016/j.cell.2014.06.027

    View details for PubMedID 25083876

    View details for PubMedCentralID PMC4137894

  • Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nature biotechnology Buenrostro, J. D., Araya, C. L., Chircus, L. M., Layton, C. J., Chang, H. Y., Snyder, M. P., Greenleaf, W. J. 2014; 32 (6): 562-568

    View details for DOI 10.1038/nbt.2880

    View details for PubMedID 24727714

  • Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature Treutlein, B., Brownfield, D. G., Wu, A. R., Neff, N. F., Mantalas, G. L., Espinoza, F. H., Desai, T. J., Krasnow, M. A., Quake, S. R. 2014; 509 (7500): 371-375

    View details for DOI 10.1038/nature13173

    View details for PubMedID 24739965

  • Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues. PLoS genetics Kukurba, K. R., Zhang, R., Li, X., Smith, K. S., Knowles, D. A., Tan, M. H., Piskol, R., Lek, M., Snyder, M., MacArthur, D. G., Li, J. B., Montgomery, S. B. 2014; 10 (5)


    Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.

    View details for DOI 10.1371/journal.pgen.1004304

    View details for PubMedID 24786518

  • Defining functional DNA elements in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J. A., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (17): 6131-6138


    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

    View details for DOI 10.1073/pnas.1318948111

    View details for Web of Science ID 000335199000025

    View details for PubMedID 24753594

    View details for PubMedCentralID PMC4035993

  • Extended lifespan and reduced adiposity in mice lacking the FAT10 gene. Proceedings of the National Academy of Sciences of the United States of America Canaan, A., DeFuria, J., Perelman, E., Schultz, V., Seay, M., Tuck, D., Flavell, R. A., Snyder, M. P., Obin, M. S., Weissman, S. M. 2014; 111 (14): 5313-5318


    The HLA-F adjacent transcript 10 (FAT10) is a member of the ubiquitin-like gene family that alters protein function/stability through covalent ligation. Although FAT10 is induced by inflammatory mediators and implicated in immunity, the physiological functions of FAT10 are poorly defined. We report the discovery that FAT10 regulates lifespan through pleiotropic actions on metabolism and inflammation. Median and overall lifespan are increased 20% in FAT10ko mice, coincident with elevated metabolic rate, preferential use of fat as fuel, and dramatically reduced adiposity. This phenotype is associated with metabolic reprogramming of skeletal muscle (i.e., increased AMP kinase activity, β-oxidation and -uncoupling, and decreased triglyceride content). Moreover, knockout mice have reduced circulating glucose and insulin levels and enhanced insulin sensitivity in metabolic tissues, consistent with elevated IL-10 in skeletal muscle and serum. These observations suggest novel roles of FAT10 in immune metabolic regulation that impact aging and chronic disease.

    View details for DOI 10.1073/pnas.1323426111

    View details for PubMedID 24706839

  • Haplotype structure and positive selection at TLR1 EUROPEAN JOURNAL OF HUMAN GENETICS Heffelfinger, C., Pakstis, A. J., Speed, W. C., Clark, A. P., Haigh, E., Fang, R., Furtado, M. R., Kidd, K. K., Snyder, M. P. 2014; 22 (4): 551-557


    Toll-like receptor 1, when dimerized with Toll-like receptor 2, is a cell surface receptor that, upon recognition of bacterial lipoproteins, activates the innate immune system. Variants in TLR1 associate with the risk of a variety of medical conditions and diseases, including sepsis, leprosy, tuberculosis, and others. The foremost of these is rs5743618 c.2079T>G(p.(Ile602Ser)), the derived allele of which is associated with reduced risk of sepsis, leprosy, and other diseases. Interestingly, 602Ser, which shows signatures of selection, inhibits TLR1 surface trafficking and subsequent activation of NFκB upon recognition of a ligand. This suggests that reduced TLR1 activity may be beneficial for human health. To better understand TLR1 variation and its link to human health, we have typed all 7 high-frequency missense variants (>5% in at least one population) along with 17 other variants in and around TLR1 in 2548 individuals from 56 populations from around the globe. We have also found additional signatures of selection on missense variants not associated with rs5743618, suggesting that there may be multiple functional alleles under positive selection in this gene.

    View details for DOI 10.1038/ejhg.2013.194

    View details for Web of Science ID 000332938400027

    View details for PubMedID 24002163

  • Clinical interpretation and implications of whole-genome sequencing. JAMA Dewey, F. E., Grove, M. E., Pan, C., Goldstein, B. A., Bernstein, J. A., Chaib, H., Merker, J. D., Goldfeder, R. L., Enns, G. M., David, S. P., Pakdaman, N., Ormond, K. E., Caleshu, C., Kingham, K., Klein, T. E., Whirl-Carrillo, M., Sakamoto, K., Wheeler, M. T., Butte, A. J., Ford, J. M., Boxer, L., Ioannidis, J. P., Yeung, A. C., Altman, R. B., Assimes, T. L., Snyder, M., Ashley, E. A., Quertermous, T. 2014; 311 (10): 1035-1045


    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.

    View details for DOI 10.1001/jama.2014.1717

    View details for PubMedID 24618965

    View details for PubMedCentralID PMC4119063

  • Erratum: A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2014; 32 (3): 291-?

    View details for DOI 10.1038/nbt0314-291b

    View details for PubMedID 24727780

  • Gene-centric Meta-analysis in 87,736 Individuals of European Ancestry Identifies Multiple Blood-Pressure-Related Loci. American journal of human genetics Tragante, V., Barnes, M. R., Ganesh, S. K., Lanktree, M. B., Guo, W., Franceschini, N., Smith, E. N., Johnson, T., Holmes, M. V., Padmanabhan, S., Karczewski, K. J., Almoguera, B., Barnard, J., Baumert, J., Chang, Y. C., Elbers, C. C., Farrall, M., Fischer, M. E., Gaunt, T. R., Gho, J. M., Gieger, C., Goel, A., Gong, Y., Isaacs, A., Kleber, M. E., Mateo Leach, I., McDonough, C. W., Meijs, M. F., Melander, O., Nelson, C. P., Nolte, I. M., Pankratz, N., Price, T. S., Shaffer, J., Shah, S., Tomaszewski, M., van der Most, P. J., van Iperen, E. P., Vonk, J. M., Witkowska, K., Wong, C. O., Zhang, L., Beitelshees, A. L., Berenson, G. S., Bhatt, D. L., Brown, M., Burt, A., Cooper-DeHoff, R. M., Connell, J. M., Cruickshanks, K. J., Curtis, S. P., Davey-Smith, G., Delles, C., Gansevoort, R. T., Guo, X., Haiqing, S., Hastie, C. E., Hofker, M. H., Hovingh, G. K., Kim, D. S., Kirkland, S. A., Klein, B. E., Klein, R., Li, Y. R., Maiwald, S., Newton-Cheh, C., O'Brien, E. T., Onland-Moret, N. C., Palmas, W., Parsa, A., Penninx, B. W., Pettinger, M., Vasan, R. S., Ranchalis, J. E., Ridker, P. M., Rose, L. M., Sever, P., Shimbo, D., Steele, L., Stolk, R. P., Thorand, B., Trip, M. D., van Duijn, C. M., Verschuren, W. M., Wijmenga, C., Wyatt, S., Young, J. H., Zwinderman, A. H., Bezzina, C. R., Boerwinkle, E., Casas, J. P., Caulfield, M. J., Chakravarti, A., Chasman, D. I., Davidson, K. W., Doevendans, P. A., Dominiczak, A. F., FitzGerald, G. A., Gums, J. G., Fornage, M., Hakonarson, H., Halder, I., Hillege, H. L., Illig, T., Jarvik, G. P., Johnson, J. A., Kastelein, J. J., Koenig, W., Kumari, M., März, W., Murray, S. S., O'Connell, J. R., Oldehinkel, A. J., Pankow, J. S., Rader, D. J., Redline, S., Reilly, M. P., Schadt, E. E., Kottke-Marchant, K., Snieder, H., Snyder, M., Stanton, A. V., Tobin, M. D., Uitterlinden, A. G., van der Harst, P., van der Schouw, Y. T., Samani, N. J., Watkins, H., Johnson, A. D., Reiner, A. P., Zhu, X., de Bakker, P. I., Levy, D., Asselbergs, F. W., Munroe, P. B., Keating, B. J. 2014; 94 (3): 349-360


    Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10⁻⁷) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification.

    View details for DOI 10.1016/j.ajhg.2013.12.016

    View details for PubMedID 24560520

    View details for PubMedCentralID PMC3951943

  • Whole-genome haplotyping using long reads and statistical methods NATURE BIOTECHNOLOGY Kuleshov, V., Xie, D., Chen, R., Pushkarev, D., Ma, Z., Blauwkamp, T., Kertesz, M., Snyder, M. 2014; 32 (3): 261-266


    The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.

    View details for DOI 10.1038/nbt.2833

    View details for Web of Science ID 000332819800024

    View details for PubMedID 24561555

  • Ordering and dynamical properties of superbright C-60 molecules on Ag(111) PHYSICAL REVIEW B Li, H. I., Abreu, G. J., Shukla, A. K., Fournee, V., Ledieu, J., Loli, L. N., Rauterkus, S. E., Snyder, M. V., Su, S. Y., Marino, K. E., Diehl, R. D. 2014; 89 (8)
  • Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS genetics Karczewski, K. J., Snyder, M., Altman, R. B., Tatonetti, N. P. 2014; 10 (2)


    Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.

    View details for DOI 10.1371/journal.pgen.1004122

    View details for PubMedID 24516403

    View details for PubMedCentralID PMC3916285

  • iPOP and its role in participatory medicine GENOME MEDICINE Snyder, M. 2014; 6


    Michael Snyder shares his thoughts on participatory medicine and how omics profiling could fit into this new model of healthcare where patients are at the center of medicine.

    View details for DOI 10.1186/gm512

    View details for Web of Science ID 000335597000001

    View details for PubMedID 24479626

  • Landscape and variation of RNA secondary structure across the human transcriptome. Nature Wan, Y., Qu, K., Zhang, Q. C., Flynn, R. A., Manor, O., Ouyang, Z., Zhang, J., Spitale, R. C., Snyder, M. P., Segal, E., Chang, H. Y. 2014; 505 (7485): 706-709

    View details for DOI 10.1038/nature12946

    View details for PubMedID 24476892

  • Path-scan: a reporting tool for identifying clinically actionable variants. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Daneshjou, R., Zappala, Z., Kukurba, K., Boyle, S. M., Ormond, K. E., Klein, T. E., Snyder, M., Bustamante, C. D., Altman, R. B., Montgomery, S. B. 2014; 19: 229-240


    The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of Direct to Consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowd source the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar designated pathogenic variant that required further evaluation.

    View details for PubMedID 24297550

    View details for PubMedCentralID PMC4008882

  • Identification of STAT5A and STAT5B target genes in human T cells. PloS one Kanai, T., Seki, S., Jenks, J. A., Kohli, A., Kawli, T., Martin, D. P., Snyder, M., Bacchetta, R., Nadeau, K. C. 2014; 9 (1)


    Signal transducer and activator of transcription (STAT) comprises a family of universal transcription factors that help cells sense and respond to environmental signals. STAT5 refers to two highly related proteins, STAT5A and STAT5B, with critical function: their complete deficiency is lethal in mice; in humans, STAT5B deficiency alone leads to endocrine and immunological problems, while STAT5A deficiency has not been reported. STAT5A and STAT5B show peptide sequence similarities greater than 90%, but subtle structural differences suggest possible non-redundant roles in gene regulation. However, these roles remain unclear in humans. We applied chromatin immunoprecipitation followed by DNA sequencing using human CD4(+) T cells to detect candidate genes regulated by STAT5A and/or STAT5B, and quantitative-PCR in STAT5A or STAT5B knock-down (KD) human CD4(+) T cells to validate the findings. Our data show STAT5A and STAT5B play redundant roles in cell proliferation and apoptosis via SGK1 interaction. Interestingly, we found a novel, unique role for STAT5A in binding to genes involved in neural development and function (NDRG1, DNAJC6, and SSH2), while STAT5B appears to play a distinct role in T cell development and function via DOCK8, SNX9, FOXP3 and IL2RA binding. Our results also suggest that one or more co-activators for STAT5A and/or STAT5B may play important roles in establishing different binding abilities and gene regulation behaviors. The new identification of these genes regulated by STAT5A and/or STAT5B has major implications for understanding the pathophysiology of cancer progression, neural disorders, and immune abnormalities.

    View details for DOI 10.1371/journal.pone.0086790

    View details for PubMedID 24497979

    View details for PubMedCentralID PMC3907443

  • Exome sequencing and genome-wide copy number variant mapping reveal novel associations with sensorineural hereditary hearing loss. BMC genomics Haraksingh, R. R., Jahanbani, F., Rodriguez-Paris, J., Gelernter, J., Nadeau, K. C., Oghalai, J. S., Schrijver, I., Snyder, M. P. 2014; 15: 1155-?


    The genetic diversity of loci and mutations underlying hereditary hearing loss is an active area of investigation. To identify loci associated with predominantly non-syndromic sensorineural hearing loss, we performed exome sequencing of families and of single probands, as well as copy number variation (CNV) mapping in a case-control cohort. Analysis of three distinct families revealed several candidate loci in two families and a single strong candidate gene, MYH7B, for hearing loss in one family. MYH7B encodes a Type II myosin, consistent with a role for cytoskeletal proteins in hearing. High-resolution genome-wide CNV analysis of 150 cases and 157 controls revealed deletions in genes known to be involved in hearing (e.g. GJB6, OTOA, and STRC, encoding connexin 30, otoancorin, and stereocilin, respectively), supporting CNV contributions to hearing loss phenotypes. Additionally, a novel region on chromosome 16 containing part of the PDXDC1 gene was found to be frequently deleted in hearing loss patients (OR = 3.91, 95% CI: 1.62-9.40, p = 1.45 × 10⁻⁷). We conclude that many known as well as novel loci and distinct types of mutations not typically tested in clinical settings can contribute to the etiology of hearing loss. Our study also demonstrates the challenges of exome sequencing and genome-wide CNV mapping for direct clinical application, and illustrates the need for functional and clinical follow-up as well as curated open-access databases.

    View details for DOI 10.1186/1471-2164-15-1155

    View details for PubMedID 25528277

  • Global analysis of transcription factor-binding sites in yeast using ChIP-Seq. Methods in molecular biology (Clifton, N.J.) Lefrançois, P., Gallagher, J. E., Snyder, M. 2014; 1205: 231-255


    Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28-36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy.

    View details for DOI 10.1007/978-1-4939-1363-3_15

    View details for PubMedID 25213249

  • Strain Kaplan of Pseudorabies Virus Genome Sequenced by PacBio Single-Molecule Real-Time Sequencing Technology. Genome announcements Tombácz, D., Sharon, D., Oláh, P., Csabai, Z., Snyder, M., Boldogkoi, Z. 2014; 2 (4)


    Pseudorabies virus (PRV) is a neurotropic herpesvirus that causes Aujeszky's disease in pigs. PRV strains are widely used as transsynaptic tracers for mapping neural circuits. We present here the complete and fully annotated genome sequence of strain Kaplan of PRV, determined by Pacific Biosciences RSII long-read sequencing technology.

    View details for DOI 10.1128/genomeA.00628-14

    View details for PubMedID 25035325

  • Serum profiling using protein microarrays to identify disease related antigens. Methods in molecular biology (Clifton, N.J.) Sharon, D., Snyder, M. 2014; 1176: 169-178


    Disease related antigens are of great importance in the clinic. They are used as markers to screen patients for various forms of cancer, to monitor response to therapy, or to serve as therapeutic targets (Chapman et al., Ann Oncol 18(5):868-873, 2007; Soussi et al., Cancer Res 60:1777-1788, 2000; Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Levenson, Biochim Biophys Acta 1770:847-856, 2007). In cancer, endogenous levels of protein expression may be disrupted or proteins may be expressed in an aberrant fashion, resulting in an immune response that bypasses self tolerance (Soussi et al., Cancer Res 60:1777-1788, 2000; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Molina et al., Breast Cancer Res Treat 51:109-119, 1998). Protein microarrays, which represent a large fraction of the human proteome, have been used to identify antigens in multiple diseases including cancer (Anderson and LaBaer, J Proteome Res 4:1123-1133, 2005; Disis et al., J Clin Oncol 15(11):3363-3367, 1997; Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499, 2007; Beyer et al., J Neuroimmunol 242:26-32, 2012). Typically, arrays are probed with immunoglobulin (Ig) samples from patients as well as healthy controls, then compared to determine which antigens (Ag's) are more reactive within the patient group (Hudson et al., Proc Natl Acad Sci U S A 104(44):17494-17499, 2007).

    View details for DOI 10.1007/978-1-4939-0992-6_14

    View details for PubMedID 25030927

  • Personalized sequencing and the future of medicine: discovery, diagnosis and defeat of disease PHARMACOGENOMICS Esplin, E. D., Oei, L., Snyder, M. P. 2014; 15 (14): 1771-1790

    View details for DOI 10.2217/pgs.14.117

    View details for Web of Science ID 000346180100006

  • STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud. PloS one Karczewski, K. J., Fernald, G. H., Martin, A. R., Snyder, M., Tatonetti, N. P., Dudley, J. T. 2014; 9 (1)


    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

    View details for DOI 10.1371/journal.pone.0084860

    View details for PubMedID 24454756

  • Chromatin immunoprecipitation and multiplex sequencing (ChIP-Seq) to identify global transcription factor binding sites in the nematode Caenorhabditis elegans. Methods in enzymology Brdlik, C. M., Niu, W., Snyder, M. 2014; 539: 89-111


    The global identification of transcription factor (TF) binding sites is a critical step in the elucidation of the functional elements of the genome. Several methods have been developed that map TF binding in human cells, yeast, and other model organisms. These methods make use of chromatin immunoprecipitation, or ChIP, and take advantage of the fact that formaldehyde fixation of living cells can be used to cross-link DNA sequences to the TFs that bind them in vivo. In ChIP, the cross-linked TF-DNA complexes are sheared by sonication, size fractionated, and incubated with antibody specific to the TF of interest to generate a library of TF-bound DNA sequences. ChIP-chip was the first technology developed to globally identify TF-bound DNA sequences and involves subsequent hybridization of the ChIP DNA to oligonucleotide microarrays. However, ChIP-chip proved to be costly, labor-intensive, and limited by the fixed number of probes available on the microarray chip. ChIP-Seq combines ChIP with massively parallel high-throughput sequencing (see Explanatory Chapter: Next Generation Sequencing) and has demonstrated vast improvement over ChIP-chip with respect to time and cost, signal-to-noise ratio, and resolution. In particular, multiplex sequencing can be used to achieve a higher throughput in ChIP-Seq analyses involving organisms with genomes of lower complexity than that of human (Lefrançois et al., 2009) and thereby reduce the cost and amount of time needed for each result. The multiplex ChIP-Seq method described in this section has been developed for Caenorhabditis elegans, but is easily adaptable for other organisms.

    View details for DOI 10.1016/B978-0-12-420120-0.00007-4

    View details for PubMedID 24581441

  • Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes JOURNAL OF PROTEOME RESEARCH Menon, R., Im, H., Zhang, E. (., Wu, S., Chen, R., Snyder, M., Hancock, W. S., Omenn, G. S. 2014; 13 (1): 212-227


    This study was conducted as a part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes in chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes; it contains many cancer-associated genes, including BRCA1, ERBB2 (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2-expressing breast cancer cell-line models of hormone-receptor-negative breast cancers by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic subtypes: SKBR3 (ERBB2+ (overexpression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (overexpression)/ER-/PR-; inflammatory breast cancer), and SUM149 (ERBB2 (low expression)/ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes that are in the signaling pathways downstream of ERBB2 along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line specific variants pointed to distinct key mechanisms including: amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids in SUM190; cell-to-cell adhesion, integrin, and ERK1/ERK2 signaling; and translational control in SUM149. The analyses indicated an enrichment in the electron transport chain processes in the ERBB2 overexpressed cell line models and an association of nucleotide binding, RNA splicing, and translation processes with the IBC models, SUM190 and SUM149. Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.

    View details for DOI 10.1021/pr400773v

    View details for Web of Science ID 000329472700022

    View details for PubMedID 24111759

  • STAT3 Targets Suggest Mechanisms of Aggressive Tumorigenesis in Diffuse Large B-Cell Lymphoma G3-GENES GENOMES GENETICS Hardee, J., Ouyang, Z., Zhang, Y., Kundaje, A., Lacroute, P., Snyder, M. 2013; 3 (12): 2173-2185


    The signal transducer and activator of transcription 3 (STAT3) is a transcription factor that, when dysregulated, becomes a powerful oncogene found in many human cancers, including diffuse large B-cell lymphoma. Diffuse large B-cell lymphoma is the most common form of non-Hodgkin's lymphoma and has two major subtypes: germinal center B-cell-like and activated B-cell-like. Compared with the germinal center B-cell-like form, activated B-cell-like lymphomas respond much more poorly to current therapies and often exhibit overexpression or overactivation of STAT3. To investigate how STAT3 might contribute to this aggressive phenotype, we have integrated genome-wide studies of STAT3 DNA binding using chromatin immunoprecipitation-sequencing with whole-transcriptome profiling using RNA-sequencing. STAT3 binding sites are present near almost a third of all genes that differ in expression between the two subtypes, and examination of the affected genes identified previously undetected and clinically significant pathways downstream of STAT3 that drive oncogenesis. Novel treatments aimed at these pathways may increase the survivability of activated B-cell-like diffuse large B-cell lymphoma.

    View details for DOI 10.1534/g3.113.007674

    View details for Web of Science ID 000328334500008

    View details for PubMedID 24142927

    View details for PubMedCentralID PMC3852380

  • Metadata Checklist for the Integrated Personal Omics Study: Proteomics and Metabolomics Experiments. Big data Snyder, M., Mias, G., Stanberry, L., Kolker, E. 2013; 1 (4): 202-206


    The integrative personal omics profiling study introduced a novel, integrative approach based on personalized, longitudinal, multi-omics data. The study collected genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. The results revealed various medical risks and extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. The current article is a data publication that provides the checklists for the metadata of the proteomics (see Table 1) and metabolomics (see Table 2) datasets of the study. The proposed checklist was recently developed and endorsed by the Data-Enabled Life Sciences Alliance (DELSA Global). We call for the broader use of data publications using the metadata checklist to make omics data more discoverable, interpretable, and reusable, while enabling appropriate attribution to data generators and infrastructure science builders.

    View details for DOI 10.1089/big.2013.0040

    View details for PubMedID 27447252

  • TOWARD MORE TRANSPARENT AND REPRODUCIBLE OMICS STUDIES THROUGH A COMMON METADATA CHECKLIST AND DATA PUBLICATIONS BIG DATA Kolker, E., Oezdemir, V., Martens, L., Hancock, W., Anderson, G., Anderson, N., Aynacioglu, S., Baranova, A., Campagna, S. R., Chen, R., Choiniere, J., Dearth, S. P., Feng, W., Ferguson, L., Fox, G., Frishman, D., Grossman, R., Heath, A., Higdon, R., Hutz, M. H., Janko, I., Jiang, L., Joshi, S., Kel, A., Kemnitz, J. W., Kohane, I. S., Kolker, N., Lancet, D., Lee, E., Li, W., Lisitsa, A., Llerena, A., Macnealy-Koch, C., Marshall, J., Masuzzo, P., May, A., Mias, G., Monroe, M., Montague, E., Mooney, S., Nesvizhskii, A., Noronha, S., Omenn, G., Rajasimha, H., Ramamoorthy, P., Sheehan, J., Smarr, L., Smith, C. V., Smith, T., Snyder, M., Rapole, S., Srivastava, S., Stanberry, L., Stewart, E., Toppo, S., Uetz, P., Verheggen, K., Voy, B. H., Warnich, L., Wilhelm, S. W., Yandl, G. 2013; 1 (4): BD196-?


    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

    View details for DOI 10.1089/big.2013.0039

    View details for Web of Science ID 000209646300005

  • Impacts of variation in the human genome on gene regulation. Journal of molecular biology Haraksingh, R. R., Snyder, M. P. 2013; 425 (21): 3970-3977


    Recent advances in fast and inexpensive DNA sequencing have enabled the extensive study of genomic and transcriptomic variation in humans. Human genomic variation is composed of sequence and structural changes including single-nucleotide and multinucleotide variants, short insertions or deletions (indels), larger copy number variants, and similarly sized copy neutral inversions and translocations. It is now well established that any two genomes differ extensively and that structural changes constitute the most prominent source of this variation. There have also been major technological advances in RNA sequencing to globally quantify and describe diversity in transcripts. Large consortia such as the 1000 Genomes Project and the Encyclopedia of DNA Elements Project are producing increasingly comprehensive maps outlining the regions of the human genome containing variants and functional elements, respectively. Integration of genetic variation data and extensive annotation of functional genomic elements, along with the ability to measure global transcription, allow the impacts of genetic variants on gene expression to be resolved. There are several well-established models by which genetic variants affect gene regulation depending on the type, nature, and position of the variant with respect to the affected genes. These effects can be manifested in two ways: changes to transcript sequences and isoforms by coding variants, and changes to transcript abundance by dosage or regulatory variants. Here, we review the current state of how genetic variations impact gene regulation locally and globally in the human genome.

    View details for DOI 10.1016/j.jmb.2013.07.015

    View details for PubMedID 23871684

  • Defective sphingosine 1-phosphate receptor 1 (S1P1) phosphorylation exacerbates TH17-mediated autoimmune neuroinflammation. Nature immunology Garris, C. S., Wu, L., Acharya, S., Arac, A., Blaho, V. A., Huang, Y., Moon, B. S., Axtell, R. C., Ho, P. P., Steinberg, G. K., Lewis, D. B., Sobel, R. A., Han, D. K., Steinman, L., Snyder, M. P., Hla, T., Han, M. H. 2013; 14 (11): 1166-1172

    View details for DOI 10.1038/ni.2730

    View details for PubMedID 24076635

  • Comprehensive whole-genome sequencing of an early-stage primary myelofibrosis patient defines low mutational burden and non-recurrent candidate genes. Haematologica Merker, J. D., Roskin, K. M., Ng, D., Pan, C., Fisk, D. G., King, J. J., Hoh, R., Stadler, M., Okumoto, L. M., Abidi, P., Hewitt, R., Jones, C. D., Gojenola, L., Clark, M. J., Zhang, B., Cherry, A. M., George, T. I., Snyder, M., Boyd, S. D., Zehnder, J. L., Fire, A. Z., Gotlib, J. 2013; 98 (11): 1689-1696


    In order to identify novel somatic mutations associated with classic BCR/ABL1-negative myeloproliferative neoplasms, we performed high-coverage genome sequencing of DNA from peripheral blood granulocytes and cultured skin fibroblasts from a patient with MPL W515K-positive primary myelofibrosis. The primary myelofibrosis genome had a low somatic mutation rate, consistent with that observed in similar hematopoietic tumor genomes. Interfacing of whole-genome DNA sequence data with RNA expression data identified three somatic mutations of potential functional significance: a nonsense mutation in CARD6, implicated in modulation of NF-kappaB activation; a 19-base pair deletion involving a potential regulatory region in the 5'-untranslated region of BRD2, implicated in transcriptional regulation and cell cycle control; and a non-synonymous point mutation in KIAA0355, an uncharacterized protein. Additional mutations in three genes (CAP2, SOX30, and MFRP) were also evident, albeit with no support for expression at the RNA level. Re-sequencing of these six genes in 178 patients with polycythemia vera, essential thrombocythemia, and myelofibrosis did not identify recurrent somatic mutations in these genes. Finally, we describe methods for reducing false-positive variant calls in the analysis of hematologic malignancies with a low somatic mutation rate. This trial is registered with ClinicalTrials.gov (NCT01108159).

    View details for DOI 10.3324/haematol.2013.092379

    View details for PubMedID 23872309

  • A single-molecule long-read survey of the human transcriptome. Nature biotechnology Sharon, D., Tilgner, H., Grubert, F., Snyder, M. 2013; 31 (11): 1009-1014

    View details for DOI 10.1038/nbt.2705

    View details for PubMedID 24108091

  • Incorporating Motif Analysis into Gene Co-expression Networks Reveals Novel Modular Expression Pattern and New Signaling Pathways PLOS GENETICS Ma, S., Shah, S., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2013; 9 (10)


    Understanding of gene regulatory networks requires discovery of expression modules within gene co-expression networks and identification of promoter motifs and corresponding transcription factors that regulate their expression. A commonly used method for this purpose is a top-down approach based on clustering the network into a range of densely connected segments, treating these segments as expression modules, and extracting promoter motifs from these modules. Here, we describe a novel bottom-up approach to identify gene expression modules driven by known cis-regulatory motifs in the gene promoters. For a specific motif, genes in the co-expression network are ranked according to their probability of belonging to an expression module regulated by that motif. The ranking is conducted via motif enrichment or motif position bias analysis. Our results indicate that motif position bias analysis is an effective tool for genome-wide motif analysis. Sub-networks containing the top ranked genes are extracted and analyzed for inherent gene expression modules. This approach identified novel expression modules for the G-box, W-box, site II, and MYB motifs from an Arabidopsis thaliana gene co-expression network based on the graphical Gaussian model. The novel expression modules include those involved in house-keeping functions, primary and secondary metabolism, and abiotic and biotic stress responses. In addition to confirmation of previously described modules, we identified modules that include new signaling pathways. To associate transcription factors that regulate genes in these co-expression modules, we developed a novel reporter system. Using this approach, we evaluated MYB transcription factor-promoter interactions within MYB motif modules.

    View details for DOI 10.1371/journal.pgen.1003840

    View details for Web of Science ID 000330367200023

    View details for PubMedID 24098147

    View details for PubMedCentralID PMC3789834

  • Genome-wide Association Analysis of Blood-Pressure Traits in African-Ancestry Individuals Reveals Common Associated Genes in African and Non-African Populations AMERICAN JOURNAL OF HUMAN GENETICS Franceschini, N., Fox, E., Zhang, Z., Edwards, T. L., Nalls, M. A., Sung, Y. J., Tayo, B. O., Sun, Y. V., Gottesman, O., Adeyemo, A., Johnson, A. D., Young, J. H., Rice, K., Duan, Q., Chen, F., Li, Y., Tang, H., Fornage, M., Keene, K. L., Andrews, J. S., Smith, J. A., Fau, J. D., Guangfa, Z., Guo, W., Liu, Y., Murray, S. S., Musani, S. K., Srinivasan, S., Edwards, D. R., Wang, H., Becker, L. C., Bovet, P., Bochud, M., Broecke, U., Burnier, M., Carty, C., Chasman, D. I., Ehret, G., Chen, W., Chen, G., Chen, W., Ding, J., Dreisbach, A. W., Evans, M. K., Guo, X., Garcia, M. E., Jensen, R., Keller, M. E., Lettre, G., Lotay, V., Martin, L. W., Moore, J. H., Morrison, A. C., Mosley, T. H., Ogunniyi, A., Palmas, W., Papanicolaou, G., Penman, A., Polak, J. F., Ridker, P. M., Salako, B., Singleton, A. B., Shriner, D., Taylor, K. D., Vasan, R., Wiggins, K., Williams, S. M., Yanek, L. R., Zhao, W., Zonderman, A. B., Becker, D. M., Berenson, G., Boerwinkle, E., Bottinger, E., Cushman, M., Eaton, C., Nyberg, F., Heiss, G., Hirschhron, J. N., Howard, V. J., Karczewsk, K. J., Lanktree, M. B., Liu, K., Liu, Y., Loos, R., Margolis, K., Snyder, M., Psaty, B. M., Schork, N. J., Weir, D. R., Rotimi, C. N., Sale, M. M., Harris, T., Kardia, S. L., Hunt, S. C., Arnett, D., Redline, S., Cooper, R. S., Risch, N. J., Rao, D. C., Rotter, J. I., Chakravarti, A., Reiner, A. P., Levy, D., Keating, B. J., Zhu, X. 2013; 93 (3): 545-554


    High blood pressure (BP) is more prevalent and contributes to more severe manifestations of cardiovascular disease (CVD) in African Americans than in any other United States ethnic group. Several small African-ancestry (AA) BP genome-wide association studies (GWASs) have been published, but their findings have failed to replicate to date. We report on a large AA BP GWAS meta-analysis that includes 29,378 individuals from 19 discovery cohorts and subsequent replication in additional samples of AA (n = 10,386), European ancestry (EA) (n = 69,395), and East Asian ancestry (n = 19,601). Five loci (EVX1-HOXA, ULK4, RSPO3, PLEKHG1, and SOX6) reached genome-wide significance (p < 1.0 × 10^-8) for either systolic or diastolic BP in a transethnic meta-analysis after correction for multiple testing. Three of these BP loci (EVX1-HOXA, RSPO3, and PLEKHG1) lack previous associations with BP. We also identified one independent signal in a known BP locus (SOX6) and provide evidence for fine mapping in four additional validated BP loci. We also demonstrate that validated EA BP GWAS loci, considered jointly, show significant effects in AA samples. Consequently, these findings suggest that BP loci might have universal effects across studied populations, demonstrating that multiethnic samples are an essential component in identifying, fine mapping, and understanding their trait variability.

    View details for DOI 10.1016/j.ajhg.2013.07.010

    View details for Web of Science ID 000330268900014

    View details for PubMedID 23972371

  • Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females SCIENCE Poznik, G. D., Henn, B. M., Yee, M., Sliwerska, E., Euskirchen, G. M., Lin, A. A., Snyder, M., Quintana-Murci, L., Kidd, J. M., Underhill, P. A., Bustamante, C. D. 2013; 341 (6145): 562-565


    The Y chromosome and the mitochondrial genome have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y-chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y chromosome and the mitochondrial genome, we estimate the time to the most recent common ancestor (T(MRCA)) of the Y chromosome to be 120 to 156 thousand years and the mitochondrial genome T(MRCA) to be 99 to 148 thousand years. Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages.

    View details for DOI 10.1126/science.1237619

    View details for Web of Science ID 000322586700057

    View details for PubMedID 23908239

  • Genome-wide profiling of human cap-independent translation-enhancing elements. Nature methods Wellensiek, B. P., Larsen, A. C., Stephens, B., Kukurba, K., Waern, K., Briones, N., Liu, L., Snyder, M., Jacobs, B. L., Kumar, S., Chaput, J. C. 2013; 10 (8): 747-750


    We report an in vitro selection strategy to identify RNA sequences that mediate cap-independent initiation of translation. This method entails mRNA display of trillions of genomic fragments, selection for initiation of translation and high-throughput deep sequencing. We identified >12,000 translation-enhancing elements (TEEs) in the human genome, generated a high-resolution map of human TEE-bearing regions (TBRs), and validated the function of a subset of sequences in vitro and in cultured cells.

    View details for DOI 10.1038/nmeth.2522

    View details for PubMedID 23770754

    View details for Web of Science ID 000322453600023

  • Functional genomic screen of human stem cell differentiation reveals pathways involved in neurodevelopment and neurodegeneration PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhang, Y., Schulz, V. P., Reed, B. D., Wang, Z., Pan, X., Mariani, J., Euskirchen, G., Snyder, M. P., Vaccarino, F. M., Ivanova, N., Weissman, S. M., Szekely, A. M. 2013; 110 (30): 12361-12366


    Human embryonic stem cells (hESCs) can be induced and differentiated to form a relatively homogeneous population of neuronal precursors in vitro. We have used this system to screen for genes necessary for neural lineage development by using a pooled human short hairpin RNA (shRNA) library screen and massively parallel sequencing. We confirmed known genes and identified several unpredicted genes with interrelated functions that were specifically required for the formation or survival of neuronal progenitor cells without interfering with the self-renewal capacity of undifferentiated hESCs. Among these are several genes that have been implicated in various neurodevelopmental disorders (i.e., brain malformations, mental retardation, and autism). Unexpectedly, a set of genes mutated in late-onset neurodegenerative disorders and with roles in the formation of RNA granules were also found to interfere with neuronal progenitor cell formation, suggesting their functional relevance in early neurogenesis. This study advances the feasibility and utility of using pooled shRNA libraries in combination with next-generation sequencing for a high-throughput, unbiased functional genomic screen. Our approach can also be used with patient-specific human-induced pluripotent stem cell-derived neural models to obtain unparalleled insights into developmental and degenerative processes in neurological or neuropsychiatric disorders with monogenic or complex inheritance.

    View details for DOI 10.1073/pnas.1309725110

    View details for Web of Science ID 000322112300054

    View details for PubMedID 23836664

    View details for PubMedCentralID PMC3725080

  • Variation and genetic control of protein abundance in humans NATURE Wu, L., Candille, S. I., Choi, Y., Xie, D., Jiang, L., Li-Pook-Than, J., Tang, H., Snyder, M. 2013; 499 (7456): 79-82


    Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.

    View details for DOI 10.1038/nature12223

    View details for Web of Science ID 000321285600037

    View details for PubMedID 23676674

  • Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis VIRUSES-BASEL Qian, F., Chung, L., Zheng, W., Bruno, V., Alexander, R. P., Wang, Z., Wang, X., Kurscheid, S., Zhao, H., Fikrig, E., Gerstein, M., Snyder, M., Montgomery, R. R. 2013; 5 (7): 1664-1681


    The West Nile virus (WNV) is an emerging infection of biodefense concern and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes both common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals--and for targeting therapeutics--in multiple biological settings.

    View details for DOI 10.3390/v5071664

    View details for Web of Science ID 000322172200005

    View details for PubMedID 23881275

    View details for PubMedCentralID PMC3738954

  • Genome Wide Proteomics of ERBB2 and EGFR and Other Oncogenic Pathways in Inflammatory Breast Cancer. Journal of proteome research Zhang, E. Y., Cristofanilli, M., Robertson, F., Reuben, J. M., Mu, Z., Beavis, R. C., Im, H., Snyder, M., Hofree, M., Ideker, T., Omenn, G. S., Fanayan, S., Jeong, S., Paik, Y., Zhang, A. F., Wu, S., Hancock, W. S. 2013; 12 (6): 2805-2817


    In this study we selected three breast cancer cell lines (SKBR3, SUM149 and SUM190) with different oncogene expression levels involved in ERBB2 and EGFR signaling pathways as a model system for the evaluation of selective integration of subsets of transcriptomic and proteomic data. We assessed the oncogene status with reads per kilobase per million mapped reads (RPKM) values for ERBB2 (14.4, 400, and 300 for SUM149, SUM190, and SKBR3, respectively) and for EGFR (60.1, not detected, and 1.4 for the same 3 cell lines). We then used RNA-Seq data to identify those oncogenes with significant transcript levels in these cell lines (total 31) and interrogated the corresponding proteomics data sets for proteins with significant interaction values with these oncogenes. The number of observed interactors for each oncogene showed a significant range, e.g., 4.2% (JAK1) to 27.3% (MYC). The percentage is measured as a fraction of the total protein interactions in a given data set vs total interactors for that oncogene in STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 9.0) and I2D (Interologous Interaction Database, version 1.95). This approach allowed us to focus on 4 main oncogenes, ERBB2, EGFR, MYC, and GRB2, for pathway analysis. We used bioinformatics sites GeneGo, PathwayCommons and NCI receptor signaling networks to identify pathways that contained the four main oncogenes and had good coverage in the transcriptomic and proteomic data sets as well as a significant number of oncogene interactors. The four pathways identified were ERBB signaling, EGFR1 signaling, integrin outside-in signaling, and validated targets of C-MYC transcriptional activation. The greater dynamic range of the RNA-Seq values allowed the use of transcript ratios to correlate observed protein values with the relative levels of the ERBB2 and EGFR transcripts in each of the four pathways. 
    This provided us with potential proteomic signatures for the SUM149 and SUM190 cell lines: growth factor receptor-bound protein 7 (GRB7), Crk-like protein (CRKL) and Catenin delta-1 (CTNND1) for ERBB signaling; caveolin 1 (CAV1) and plectin (PLEC) for EGFR signaling; filamin A (FLNA) and actinin alpha1 (ACTN1) (associated with high levels of EGFR transcript) for integrin signaling; branched chain amino-acid transaminase 1 (BCAT1), carbamoyl-phosphate synthetase (CAD), nucleolin (NCL) (high levels of EGFR transcript); transferrin receptor (TFRC), metadherin (MTDH) (high levels of ERBB2 transcript) for MYC signaling; S100-A2 protein (S100A2), caveolin 1 (CAV1), Serpin B5 (SERPINB5), stratifin (SFN), PYD and CARD domain containing (PYCARD), and EPH receptor A2 (EPHA2) for PI3K signaling, p53 subpathway. Future studies of inflammatory breast cancer (IBC), from which the cell lines were derived, will be used to explore the significance of these observations.

    View details for DOI 10.1021/pr4001527

    View details for PubMedID 23647160

  • Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases. Circulation research Churko, J. M., Mantalas, G. L., Snyder, M. P., Wu, J. C. 2013; 112 (12): 1613-1623


    High throughput sequencing technologies have become essential in studies on genomics, epigenomics, and transcriptomics. Although sequencing information has traditionally been elucidated using a low throughput technique called Sanger sequencing, high throughput sequencing technologies are capable of sequencing multiple DNA molecules in parallel, enabling hundreds of millions of DNA molecules to be sequenced at a time. This advantage allows high throughput sequencing to be used to create large data sets, generating more comprehensive insights into the cellular genomic and transcriptomic signatures of various diseases and developmental stages. Within high throughput sequencing technologies, whole exome sequencing can be used to identify novel variants and other mutations that may underlie many genetic cardiac disorders, whereas RNA sequencing can be used to analyze how the transcriptome changes. Chromatin immunoprecipitation sequencing and methylation sequencing can be used to identify epigenetic changes, whereas ribosome sequencing can be used to determine which mRNA transcripts are actively being translated. In this review, we will outline the differences in various sequencing modalities and examine the main sequencing platforms on the market in terms of their relative read depths, speeds, and costs. Finally, we will discuss the development of future sequencing platforms and how these new technologies may improve on current sequencing platforms. Ultimately, these sequencing technologies will be instrumental in further delineating how the cardiovascular system develops and how perturbations in DNA and RNA can lead to cardiovascular disease.

    View details for DOI 10.1161/CIRCRESAHA.113.300939

    View details for PubMedID 23743227

  • Metabolomics as a robust tool in systems biology and personalized medicine: an open letter to the metabolomics community METABOLOMICS Snyder, M., Li, X. 2013; 9 (3): 532-534
  • iPOP Goes the World: Integrated Personalized Omics Profiling and the Road toward Improved Health Care. Chemistry & biology Li-Pook-Than, J., Snyder, M. 2013; 20 (5): 660-666


    The health of an individual depends upon their DNA as well as upon environmental factors (environome or exposome). It is expected that although the genome is the blueprint of an individual, its analysis with that of the other omes such as the DNA methylome, the transcriptome, proteome, and metabolome will further provide a dynamic assessment of the physiology and health state of an individual. This review will help to categorize the current progress of omics analyses and how omics integration can be used for medical research. We believe that integrative personal omics profiling (iPOP) is a stepping stone to a new road to personalized health care and may improve disease risk assessment, accuracy of diagnosis, disease monitoring, targeted treatments, and understanding the biological processes of disease states for their prevention.

    View details for DOI 10.1016/j.chembiol.2013.05.001

    View details for PubMedID 23706632

  • Identification of Potential Glycan Cancer Markers with Sialic Acid Attached to Sialic Acid and Up-regulated Fucosylated Galactose Structures in Epidermal Growth Factor Receptor Secreted from A431 Cell Line. Molecular & cellular proteomics Wu, S., Taylor, A. D., Lu, Q., Hanash, S. M., Im, H., Snyder, M., Hancock, W. S. 2013; 12 (5): 1239-1249


    We have used powerful HPLC-mass spectrometric approaches to characterize the secreted form of epidermal growth factor receptor (sEGFR). We demonstrated that the amino acid sequence lacked the cytoplasmic domain and was consistent with the primary sequence reported for EGFR purified from a human plasma pool. One of the sEGFR forms, attributed to the alternative RNA splicing, was also confirmed by transcriptional analysis (RNA sequencing). Two unusual types of glycan structures were observed in sEGFR as compared with membrane-bound EGFR from the A431 cell line. The unusual glycan structures were di-sialylated glycans (sialic acid attached to sialic acid) at Asn-151 and N-acetylhexosamine attached to a branched fucosylated galactose with N-acetylglucosamine moieties (HexNAc-(Fuc)Gal-GlcNAc) at Asn-420. These unusual glycans at specific sites were either present at a much lower level or were not observable in membrane-bound EGFR present in the A431 cell lysate. The observation of these di-sialylated glycan structures was consistent with the observed expression of the corresponding α-N-acetylneuraminide α-2,8-sialyltransferase 2 (ST8SiA2) and α-N-acetylneuraminide α-2,8-sialyltransferase 4 (ST8SiA4), by quantitative real time RT-PCR. The connectivity present at the branched fucosylated galactose was also confirmed by methylation of the glycans followed by analysis with sequential fragmentation in mass spectrometry. We hypothesize that the presence of such glycan structures could promote secretion via anionic or steric repulsion mechanisms and thus facilitate the observation of these glycan forms in the secreted fractions. We plan to use this model system to facilitate the search for novel glycan structures present at specific sites in sEGFR as well as other secreted oncoproteins such as Erbb2 as markers of disease progression in blood samples from cancer patients.

    View details for DOI 10.1074/mcp.M112.024554

    View details for PubMedID 23371026

  • Comparative annotation of functional regions in the human genome using epigenomic data NUCLEIC ACIDS RESEARCH Won, K., Zhang, X., Wang, T., Ding, B., Raha, D., Snyder, M., Ren, B., Wang, W. 2013; 41 (8): 4423-4432


    Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of the epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance in identifying enhancers compared with state-of-the-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found that some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes.

    View details for DOI 10.1093/nar/gkt143

    View details for Web of Science ID 000318569700014

    View details for PubMedID 23482391

    View details for PubMedCentralID PMC3632130

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405 JOURNAL OF PROTEOME RESEARCH Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013; 12 (4): 1732-1742

    View details for DOI 10.1021/pr3010869

    View details for Web of Science ID 000317327500018

  • Preparation of recombinant protein spotted arrays for proteome-wide identification of kinase targets. Current protocols in protein science / editorial board, John E. Coligan ... [et al.] Im, H., Snyder, M. 2013; Chapter 27: Unit 27.4


    Protein microarrays allow unique approaches for interrogating global protein interaction networks. Protein arrays can be divided into two categories: antibody arrays and functional protein arrays. Antibody arrays consist of various antibodies and are appropriate for profiling protein abundance and modifications. Functional full-length protein arrays employ full-length proteins with various post-translational modifications. A key advantage of the latter is rapid parallel processing of large numbers of proteins for studying highly controlled biochemical activities, protein-protein interactions, protein-nucleic acid interactions, and protein-small molecule interactions. This unit presents a protocol for constructing functional yeast protein microarrays for global kinase substrate identification. This approach enables the rapid determination of protein interaction networks in yeast on a proteome-wide level. The same methodology can be readily applied to higher eukaryotic systems with careful consideration of the overexpression strategy.

    View details for DOI 10.1002/0471140864.ps2704s72

    View details for PubMedID 23546622

  • A Major Epigenetic Programming Mechanism Guided by piRNAs DEVELOPMENTAL CELL Huang, X. A., Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2013; 24 (5): 502-516


    A central enigma in epigenetics is how epigenetic factors are guided to specific genomic sites for their function. Previously, we reported that a Piwi-piRNA complex associates with the piRNA-complementary site in the Drosophila genome and regulates its epigenetic state. Here, we report that Piwi-piRNA complexes bind to numerous piRNA-complementary sequences throughout the genome, implicating piRNAs as a major mechanism that guides Piwi and Piwi-associated epigenetic factors to program the genome. To test this hypothesis, we demonstrate that inserting piRNA-complementary sequences to an ectopic site leads to Piwi, HP1a, and Su(var)3-9 recruitment to the site as well as H3K9me2/3 enrichment and reduced RNA polymerase II association, indicating that piRNA is both necessary and sufficient to recruit Piwi and epigenetic factors to specific genomic sites. Piwi deficiency drastically changed the epigenetic landscape and polymerase II profile throughout the genome, revealing the Piwi-piRNA mechanism as a major epigenetic programming mechanism in Drosophila.

    View details for DOI 10.1016/j.devcel.2013.01.023

    View details for Web of Science ID 000316163000005

    View details for PubMedID 23434410

  • Accurate Identification and Analysis of Human mRNA Isoforms Using Deep Long Read Sequencing G3-GENES GENOMES GENETICS Tilgner, H., Raha, D., Habegger, L., Mohiuddin, M., Gerstein, M., Snyder, M. 2013; 3 (3): 387-397


    Precise identification of RNA-coding regions and transcriptomes of eukaryotes is a significant problem in biology. Currently, eukaryote transcriptomes are analyzed using deep short-read sequencing experiments of complementary DNAs. The resulting short reads are then aligned against a genome and annotated junctions to infer biological meaning. Here we use long-read complementary DNA datasets for the analysis of a eukaryotic transcriptome and generate two large datasets in the human K562 and HeLa S3 cell lines. Both datasets comprised at least 4 million reads and had median read lengths greater than 500 bp. We show that annotation-independent alignments of these reads provide partial gene structures that are very much in line with annotated gene structures, 15% of which have not been obtained in a previous de novo analysis of short reads. For long noncoding RNA (lncRNA) genes, however, we find an increased fraction of novel gene structures among our alignments. Other important aspects of transcriptome analysis, such as the description of cell type-specific splicing, can be performed in an accurate, reliable and completely annotation-free manner, making the approach ideal for the analysis of transcriptomes of newly sequenced genomes. Furthermore, we demonstrate that long-read sequences can be assembled into full-length transcripts with considerable success. Our method is applicable to all long-read sequencing technologies.

    View details for DOI 10.1534/g3.112.004812

    View details for Web of Science ID 000315950000002

    View details for PubMedID 23450794

  • Personal genomes, quantitative dynamic omics and personalized medicine. Quantitative biology (Beijing, China) Mias, G. I., Snyder, M. 2013; 1 (1): 71-90


    The rapid technological developments following the Human Genome Project have made possible the availability of personalized genomes. As the focus now shifts from characterizing genomes to making personalized disease associations, in combination with the availability of other omics technologies, the next big push will be not only to obtain a personalized genome, but to quantitatively follow other omics. This will include transcriptomes, proteomes, metabolomes, antibodyomes, and new emerging technologies, enabling the profiling of thousands of molecular components in individuals. Furthermore, omics profiling performed longitudinally can probe the temporal patterns associated with both molecular changes and associated physiological health and disease states. Such data necessitates the development of computational methodology to not only handle and descriptively assess such data, but also construct quantitative biological models. Here we describe the availability of personal genomes and developing omics technologies that can be brought together for personalized implementations and how these novel integrated approaches may effectively provide a precise personalized medicine that focuses on not only characterization and treatment but ultimately the prevention of disease.

    View details for PubMedID 25798291

  • Extensive Transcript Diversity and Novel Upstream Open Reading Frame Regulation in Yeast G3-GENES GENOMES GENETICS Waern, K., Snyder, M. 2013; 3 (2): 343-352


    To understand the diversity of transcripts in yeast (Saccharomyces cerevisiae) we analyzed the transcriptional landscapes for cells grown under 18 different environmental conditions. Each sample was analyzed using RNA-sequencing, and a total of 670,446,084 uniquely mapped reads and 377,263 poly-adenylated end tags were produced. Consistent with previous studies, we find that the majority of yeast genes are expressed under one or more different conditions. By directly comparing the 5' and 3' ends of the transcribed regions, we find extensive differences in transcript ends across many conditions, especially those of stationary phase, growth in grape juice, and salt stimulation, suggesting differential choice of transcription start and stop sites is pervasive in yeast. Relative to the exponential growth condition (i.e., YPAD), transcripts differing at the 5' ends and 3' ends are predicted to differ in their annotated start codon in 21 genes and their annotated stop codon in 63 genes. Many (431) upstream open reading frames (uORFs) are found in alternate 5' ends and are significantly enriched in transcripts produced during the salt response. Mutational analysis of five genes with uORFs revealed that two sets of uORFs increase the expression of a reporter construct, indicating a role in activation which had not been reported previously, whereas two other uORFs decreased expression. In addition, RNA binding protein motifs are statistically enriched for alternate ends under many conditions. Overall, these results demonstrate enormous diversity of transcript ends, and that this heterogeneity is regulated under different environmental conditions. Moreover, transcript end diversity has important biological implications for the regulation of gene expression. In addition, our data also serve as a valuable resource for the scientific community.

    View details for DOI 10.1534/g3.112.003640

    View details for Web of Science ID 000314881600019

    View details for PubMedID 23390610

  • SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data GENOME RESEARCH Ouyang, Z., Snyder, M. P., Chang, H. Y. 2013; 23 (2): 377-387


    We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a high-dimensional classification framework, SeqFold efficiently matches a given SPP to the most likely cluster of structures sampled from the Boltzmann-weighted ensemble. SeqFold is able to incorporate diverse types of RNA structure profiling data, including parallel analysis of RNA structure (PARS), selective 2'-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), fragmentation sequencing (FragSeq) data generated by deep sequencing, and conventional SHAPE data. Using the known structures of a wide range of mRNAs and noncoding RNAs as benchmarks, we demonstrate that SeqFold outperforms or matches existing approaches in accuracy and is more robust to noise in experimental data. Application of SeqFold to reconstruct the secondary structures of the yeast transcriptome reveals the diverse impact of RNA secondary structure on gene regulation, including translation efficiency, transcription initiation, and protein-RNA interactions. SeqFold can be easily adapted to incorporate any new types of high-throughput RNA structure profiling data and is widely applicable to analyze RNA structures in any transcriptome.

    View details for DOI 10.1101/gr.138545.112

    View details for Web of Science ID 000314323100016

    View details for PubMedID 23064747

  • Two methods for full-length RNA sequencing for low quantities of cells and single cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Durrett, R. E., Zhu, H., Tanaka, Y., Li, Y., Zi, X., Marjani, S. L., Euskirchen, G., Ma, C., LaMotte, R. H., Park, I., Snyder, M. P., Mason, C. E., Weissman, S. M. 2013; 110 (2): 594-599


    The ability to determine the gene expression pattern in low quantities of cells or single cells is important for resolving a variety of problems in many biological disciplines. A robust description of the expression signature of a single cell requires determination of the full-length sequence of the expressed mRNAs in the cell, yet existing methods have either 3' biased or variable transcript representation. Here, we report our protocols for the amplification and high-throughput sequencing of very small amounts of RNA for sequencing using procedures of either semirandom primed PCR or phi29 DNA polymerase-based DNA amplification, for the cDNA generated with oligo-dT and/or random oligonucleotide primers. Unlike existing methods, these protocols produce relatively uniformly distributed sequences covering the full length of almost all transcripts independent of their sizes, from 1,000 to 10 cells, and even with single cells. Both protocols produced satisfactory detection/coverage of the abundant mRNAs from a single K562 erythroleukemic cell or a single dorsal root ganglion neuron. The phi29-based method produces long products with less noise, uses an isothermal reaction, and is simple to practice. The semirandom primed PCR procedure is more sensitive and reproducible at low transcript levels or with low quantities of cells. These methods provide tools for mRNA sequencing or RNA sequencing when only low quantities of cells, a single cell, or even degraded RNA are available for profiling.

    View details for DOI 10.1073/pnas.1217322109

    View details for Web of Science ID 000313906600047

    View details for PubMedID 23267071

  • Multimodal Dynamic Profiling of Healthy and Diseased States for Future Personalized Health Care CLINICAL PHARMACOLOGY & THERAPEUTICS Mias, G. I., Snyder, M. 2013; 93 (1): 29-32

    View details for DOI 10.1038/clpt.2012.204

    View details for Web of Science ID 000312618200021

    View details for PubMedID 23187877

  • Specific plasma autoantibody reactivity in myelodysplastic syndromes. Scientific reports Mias, G. I., Chen, R., Zhang, Y., Sridhar, K., Sharon, D., Xiao, L., Im, H., Snyder, M. P., Greenberg, P. L. 2013; 3: 3311

    View details for DOI 10.1038/srep03311

    View details for PubMedID 24264604

  • Integrative analysis of longitudinal metabolomics data from a personal multi-omics profile. Metabolites Stanberry, L., Mias, G. I., Haynes, W., Higdon, R., Snyder, M., Kolker, E. 2013; 3 (3): 741-760


    The integrative personal omics profile (iPOP) is a pioneering study that combines genomics, transcriptomics, proteomics, metabolomics and autoantibody profiles from a single individual over a 14-month period. The observation period includes two episodes of viral infection: a human rhinovirus and a respiratory syncytial virus. The profile studies give an informative snapshot into the biological functioning of an organism. We hypothesize that pathway expression levels are associated with disease status. To test this hypothesis, we use biological pathways to integrate metabolomics and proteomics iPOP data. The approach computes the pathways' differential expression levels at each time point, while taking into account the pathway structure and the longitudinal design. The resulting pathway levels show strong association with the disease status. Further, we identify temporal patterns in metabolite expression levels. The changes in metabolite expression levels also appear to be consistent with the disease status. The results of the integrative analysis suggest that changes in biological pathways may be used to predict and monitor the disease. The iPOP experimental design, data acquisition and analysis issues are discussed within the broader context of personal profiling.

    View details for DOI 10.3390/metabo3030741

    View details for PubMedID 24958148

    View details for PubMedCentralID PMC3901289

  • High-throughput sequencing for biology and medicine MOLECULAR SYSTEMS BIOLOGY Soon, W. W., Hariharan, M., Snyder, M. P. 2013; 9


    Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches, including ever finer analyses of transcriptome dynamics, genome structure and genomic variation, and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.

    View details for DOI 10.1038/msb.2012.61

    View details for Web of Science ID 000314415800010

    View details for PubMedID 23340846

  • Systematic investigation of protein-small molecule interactions IUBMB LIFE Li, X., Wang, X., Snyder, M. 2013; 65 (1): 2-8


    Cell signaling is extensively wired between cellular components to sustain cell proliferation, differentiation, and adaptation. The interaction network is often manifested in how protein function is regulated through interacting with other cellular components including small molecule metabolites. While many biochemical interactions have been established as reactions between protein enzymes and their substrates and products, much less is known at the system level about how small metabolites regulate protein functions through allosteric binding. In the past decade, study of protein-small molecule interactions has been lagging behind other types of interactions. Recent technological advances have explored several high-throughput platforms to reveal many "unexpected" protein-small molecule interactions that could have profound impact on our understanding of cell signaling. These interactions will help bridge gaps in existing regulatory loops of cell signaling and serve as new targets for medical intervention. In this review, we summarize recent advances in the systematic investigation of protein-metabolite/small molecule interactions, and discuss the potential impact of such studies on both biological research and medicine.

    View details for DOI 10.1002/iub.1111

    View details for Web of Science ID 000312886200002

    View details for PubMedID 23225626

  • Exome sequencing by targeted enrichment. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Clark, M. J., Chen, R., Snyder, M. 2013; Chapter 7: Unit 7.12


    This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.

    View details for DOI 10.1002/0471142727.mb0712s102

    View details for PubMedID 23547016

  • The variable somatic genome. Cell cycle O'Huallachain, M., Weissman, S. M., Snyder, M. P. 2013; 12 (1): 5-6

    View details for DOI 10.4161/cc.23069

    View details for PubMedID 23255102

  • Proteogenomic Analysis of Human Colon Carcinoma Cell Lines LIM1215, LIM1899, and LIM2405. Journal of proteome research Fanayan, S., Smith, J. T., Lee, L. Y., Yan, F., Snyder, M., Hancock, W. S., Nice, E. 2013


    As part of the genome-wide and chromosome-centric human proteomic project (C-HPP), we have integrated a shotgun proteomics approach and a genome-wide transcriptomic approach (RNA-Seq) for a set of human colon cancer cell lines (LIM1215, LIM1899 and LIM2405) that were selected to represent a wide range of pathological states of colorectal cancer. The combination of a standard proteomics approach (1D-gel electrophoresis coupled to LC/ion trap mass spectrometry) and RNA-Seq allowed us to exploit the greater depth of the transcriptomics measurement (∼9800 transcripts per cell line) versus the protein observations (∼1900 protein identifications per cell line). Conversely, the proteomics data were helpful in identifying both cancer-associated proteins with differential expression patterns as well as protein networks and pathways which appear to be deregulated in these cell lines. Examples of potential markers include mortalin, nucleophosmin, ezrin, LASP1, alpha and beta forms of spectrin, exportin, the carcinoembryonic antigen family, EGFR and MET. Interaction analyses identified the large intermediate filament family, the protein folding network and adapter proteins in focal adhesion networks, which included the CDC42 and RHOA signaling pathways that may have potential for identifying phenotypic states representing poorly and moderately differentiated states of CRC, with or without metastases.

    View details for DOI 10.1021/pr3010869

    View details for PubMedID 23458625

  • Tissue-specific direct targets of Caenorhabditis elegans Rb/E2F dictate distinct somatic and germline programs. Genome biology Kudron, M., Niu, W., Lu, Z., Wang, G., Gerstein, M., Snyder, M., Reinke, V. 2013; 14 (1): R5


    BACKGROUND: The tumor suppressor Rb/E2F regulates gene expression to control differentiation in multiple tissues during development, although how it directs tissue-specific gene regulation in vivo is poorly understood. RESULTS: We determined the genome-wide binding profiles for Caenorhabditis elegans Rb/E2F-like components in the germline, in the intestine and broadly throughout the soma, and uncovered highly tissue-specific binding patterns and target genes. Chromatin association by LIN-35, the C. elegans ortholog of Rb, is impaired in the germline but robust in the soma, a characteristic that might govern differential effects on gene expression in the two cell types. In the intestine, LIN-35 and the heterochromatin protein HPL-2, the ortholog of Hp1, coordinately bind at many sites lacking E2F. Finally, selected direct target genes contribute to the soma-to-germline transformation of lin-35 mutants, including mes-4, a soma-specific target that promotes H3K36 methylation, and csr-1, a germline-specific target that functions in a 22G small RNA pathway. CONCLUSIONS: In sum, identification of tissue-specific binding profiles and effector target genes reveals important insights into the mechanisms by which Rb/E2F controls distinct cell fates in vivo.

    View details for DOI 10.1186/gb-2013-14-1-r5

    View details for PubMedID 23347407

  • Promise of personalized omics to precision medicine WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE Chen, R., Snyder, M. 2013; 5 (1): 73-82


    The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.

    View details for DOI 10.1002/wsbm.1198

    View details for Web of Science ID 000312736200005

    View details for PubMedID 23184638

  • A Chromosome-centric Human Proteome Project (C-HPP) to Characterize the Sets of Proteins Encoded in Chromosome 17 JOURNAL OF PROTEOME RESEARCH Liu, S., Im, H., Bairoch, A., Cristofanilli, M., Chen, R., Deutsch, E. W., Dalton, S., Fenyo, D., Fanayan, S., Gates, C., Gaudet, P., Hincapie, M., Hanash, S., Kim, H., Jeong, S., Lundberg, E., Mias, G., Menon, R., Mu, Z., Nice, E., Paik, Y., Uhlen, M., Wells, L., Wu, S., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Omenn, G. S., Beavis, R. C., Hancock, W. S. 2013; 12 (1): 45-57


    We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. Also, we have illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

    View details for DOI 10.1021/pr300985j

    View details for Web of Science ID 000313156300007

    View details for PubMedID 23259914

  • Centromere-Like Regions in the Budding Yeast Genome PLOS GENETICS Lefrancois, P., Auerbach, R. K., Yellman, C. M., Roeder, G. S., Snyder, M. 2013; 9 (1)


    Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres.

    View details for DOI 10.1371/journal.pgen.1003209

    View details for Web of Science ID 000314651500052

    View details for PubMedID 23349633

  • Copy Number Variation detection from 1000 Genomes project exon capture sequencing data BMC BIOINFORMATICS Wu, J., Grzeda, K. R., Stewart, C., Grubert, F., Urban, A. E., Snyder, M. P., Marth, G. T. 2012; 13


    DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.

    View details for DOI 10.1186/1471-2105-13-305

    View details for Web of Science ID 000314688600001

    View details for PubMedID 23157288

    View details for PubMedCentralID PMC3563612
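
The abstract above describes a Bayesian read-depth method but does not give its model. As a rough illustration only, the sketch below compares a heterozygous deletion (half the expected depth) against the normal diploid state for one gene. The Poisson likelihood, the prior value, and the function names are assumptions made for this example and are not taken from the paper.

```python
from math import exp, lgamma, log


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads under a Poisson with mean lam."""
    return k * log(lam) - lam - lgamma(k + 1)


def posterior_het_deletion(observed, expected_diploid, prior_del=0.01):
    """Posterior probability that a gene carries a heterozygous deletion.

    Assumption (not from the paper): read depth is Poisson, with mean
    expected_diploid for two copies and half of that for one copy.
    """
    log_del = poisson_logpmf(observed, 0.5 * expected_diploid) + log(prior_del)
    log_dip = poisson_logpmf(observed, expected_diploid) + log(1.0 - prior_del)
    # Subtract the larger term before exponentiating to avoid underflow.
    m = max(log_del, log_dip)
    w_del, w_dip = exp(log_del - m), exp(log_dip - m)
    return w_del / (w_del + w_dip)


if __name__ == "__main__":
    print(posterior_het_deletion(50, 100))   # depth near half of expected
    print(posterior_het_deletion(100, 100))  # depth near expected
```

In practice the expected depth would have to account for the coverage variability across samples and exons that the abstract mentions; a plain Poisson model ignores that overdispersion.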

  • Whole Genome Sequence Analysis of Primary Myelofibrosis. 54th Annual Meeting and Exposition of the American-Society-of-Hematology (ASH) Merker, J. D., Roskin, K., Ng, D., Pan, C., Fisk, D. G., Jones, C. D., Gojenola, L., Clark, M. J., Zhang, B., Cherry, M., Snyder, M., Boyd, S. D., Zehnder, J. L., Fire, A. Z., Gotlib, J. AMER SOC HEMATOLOGY. 2012
  • Genome interpretation and assembly-recent progress and next steps. Nature biotechnology Baker, S., Joecker, A., Church, G., Snyder, M., West, J., Salzberg, S., Worthey, E., Smith, T., Wang, J., Reid, J. G. 2012; 30 (11): 1081-1083

    View details for DOI 10.1038/nbt.2425

    View details for PubMedID 23138307

  • Michael Snyder. Interview by Asher Mullard. Nature reviews. Drug discovery Snyder, M. 2012; 11 (10): 744-?

    View details for DOI 10.1038/nrd3867

    View details for PubMedID 23023673

  • Systems biology: personalized medicine for the future? CURRENT OPINION IN PHARMACOLOGY Chen, R., Snyder, M. 2012; 12 (5): 623-628


    Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughput sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.

    View details for DOI 10.1016/j.coph.2012.07.011

    View details for Web of Science ID 000310478800017

    View details for PubMedID 22858243

  • SWI/SNF Chromatin-remodeling Factors: Multiscale Analyses and Diverse Functions JOURNAL OF BIOLOGICAL CHEMISTRY Euskirchen, G., Auerbach, R. K., Snyder, M. 2012; 287 (37): 30897-30905


    Chromatin-remodeling enzymes play essential roles in many biological processes, including gene expression, DNA replication and repair, and cell division. Although one such complex, SWI/SNF, has been extensively studied, new discoveries are still being made. Here, we review SWI/SNF biochemistry; highlight recent genomic and proteomic advances; and address the role of SWI/SNF in human diseases, including cancer and viral infections. These studies have greatly increased our understanding of complex nuclear processes.

    View details for DOI 10.1074/jbc.R111.309302

    View details for Web of Science ID 000308791300003

    View details for PubMedID 22952240

  • Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements GENOME RESEARCH Kundaje, A., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M., Smith, C. L., Raha, D., Winters, E. E., Johnson, S. M., Snyder, M., Batzoglou, S., Sidow, A. 2012; 22 (9): 1735-1747


    Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.

    View details for DOI 10.1101/gr.136366.111

    View details for Web of Science ID 000308272800015

    View details for PubMedID 22955985

    View details for PubMedCentralID PMC3431490

  • Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs GENOME RESEARCH Tilgner, H., Knowles, D. G., Johnson, R., Davis, C. A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T. R., Guigo, R. 2012; 22 (9): 1616-1625


    Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.

    View details for DOI 10.1101/gr.134445.111

    View details for Web of Science ID 000308272800004

    View details for PubMedID 22955974
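
The abstract above says only that coSI is based on reads mapping to exon junctions and exon borders. A simplified reading of that idea is sketched below: the share of informative reads around an exon that are already spliced. The exact formula in the paper may weight the read classes differently; the function name and the two-count input are assumptions for illustration.

```python
def splicing_completion(junction_reads, border_reads):
    """Fraction of informative reads around an exon that are spliced.

    junction_reads: reads spanning an exon-exon junction (intron removed).
    border_reads:   reads spanning an exon-intron border (intron still present).
    Returns None when no informative reads are available.
    """
    total = junction_reads + border_reads
    if total == 0:
        return None
    return junction_reads / total


if __name__ == "__main__":
    # 75 spliced reads and 25 unspliced reads give a value of 0.75.
    print(splicing_completion(75, 25))
```

Under this reading, a value of 1 means the surrounding introns are fully removed in the sampled molecules and a value of 0 means no removal has occurred.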

  • A highly integrated and complex PPARGC1A transcription factor binding network in HepG2 cells GENOME RESEARCH Charos, A. E., Reed, B. D., Raha, D., Szekely, A. M., Weissman, S. M., Snyder, M. 2012; 22 (9): 1668-1679


    PPARGC1A is a transcriptional coactivator that binds to and coactivates a variety of transcription factors (TFs) to regulate the expression of target genes. PPARGC1A plays a pivotal role in regulating energy metabolism and has been implicated in several human diseases, most notably type II diabetes. Previous studies have focused on the interplay between PPARGC1A and individual TFs, but little is known about how PPARGC1A combines with all of its partners across the genome to regulate transcriptional dynamics. In this study, we describe a core PPARGC1A transcriptional regulatory network operating in HepG2 cells treated with forskolin. We first mapped the genome-wide binding sites of PPARGC1A using chromatin-IP followed by high-throughput sequencing (ChIP-seq) and uncovered overrepresented DNA sequence motifs corresponding to known and novel PPARGC1A network partners. We then profiled six of these site-specific TF partners using ChIP-seq and examined their network connectivity and combinatorial binding patterns with PPARGC1A. Our analysis revealed extensive overlap of targets including a novel link between PPARGC1A and HSF1, a TF regulating the conserved heat shock response pathway that is misregulated in diabetes. Importantly, we found that different combinations of TFs bound to distinct functional sets of genes, thereby helping to reveal the combinatorial regulatory code for metabolic and other cellular processes. In addition, the different TFs often bound near the promoters and coding regions of each other's genes suggesting an intricate network of interdependent regulation. Overall, our study provides an important framework for understanding the systems-level control of metabolic gene expression in humans.

    View details for DOI 10.1101/gr.127761.111

    View details for Web of Science ID 000308272800009

    View details for PubMedID 22955979

    View details for PubMedCentralID PMC3431484

  • VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment BIOINFORMATICS Habegger, L., Balasubramanian, S., Chen, D. Z., Khurana, E., Sboner, A., Harmanci, A., Rozowsky, J., Clarke, D., Snyder, M., Gerstein, M. 2012; 28 (17): 2267-2269


    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at

    View details for DOI 10.1093/bioinformatics/bts368

    View details for Web of Science ID 000308019200008

    View details for PubMedID 22743228
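
To illustrate why annotation at the transcript level matters, the sketch below performs the "simple intersection" the abstract mentions, but once per transcript, so that the same variant can fall in an exon of one isoform and outside the exons of another. This is not VAT's code or interface; the data layout, names, and coordinates are hypothetical.

```python
def exonic_by_transcript(position, transcripts):
    """Report, per transcript, whether a variant position lies in an exon.

    transcripts maps a transcript name to a list of (start, end) exon
    intervals, 1-based and inclusive (a convention assumed for this example).
    """
    return {
        name: any(start <= position <= end for start, end in exons)
        for name, exons in transcripts.items()
    }


if __name__ == "__main__":
    # Hypothetical gene with two isoforms; the second skips the middle exon.
    isoforms = {
        "isoform_A": [(100, 200), (300, 400), (500, 600)],
        "isoform_B": [(100, 200), (500, 600)],
    }
    print(exonic_by_transcript(350, isoforms))
```

A full annotator would also need reading-frame information to classify coding effects, and separate handling for insertions, deletions, and structural variants, which the abstract identifies as the harder cases.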

  • Understanding transcriptional regulation by integrative analysis of transcription factor binding data GENOME RESEARCH Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K. Y., Rozowsky, J., Yan, K., Dong, X., Djebali, S., Ruan, Y., Davis, C. A., Carninci, P., Lassman, T., Gingeras, T. R., Guigo, R., Birney, E., Weng, Z., Snyder, M., Gerstein, M. 2012; 22 (9): 1658-1667


    Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.

    View details for DOI 10.1101/gr.136838.111

    View details for Web of Science ID 000308272800008

    View details for PubMedID 22955978

  • Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors GENOME RESEARCH Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T. W., Greven, M. C., Pierce, B. G., Dong, X., Kundaje, A., Cheng, Y., Rando, O. J., Birney, E., Myers, R. M., Noble, W. S., Snyder, M., Weng, Z. 2012; 22 (9): 1798-1812


    Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook ( and will continually update this repository as more ENCODE data are generated.

    View details for DOI 10.1101/gr.139105.112

    View details for Web of Science ID 000308272800020

    View details for PubMedID 22955990

    View details for PubMedCentralID PMC3431495

  • A Genome-Scale Resource for In Vivo Tag-Based Protein Function Exploration in C. elegans CELL Sarov, M., Murray, J. I., Schanze, K., Pozniakovski, A., Niu, W., Angermann, K., Hasse, S., Rupprecht, M., Vinis, E., Tinney, M., Preston, E., Zinke, A., Enst, S., Teichgraber, T., Janette, J., Reis, K., Janosch, S., Schloissnig, S., Ejsmont, R. K., Slightam, C., Xu, X., Kim, S. K., Reinke, V., Stewart, A. F., Snyder, M., Waterston, R. H., Hyman, A. A. 2012; 150 (4): 855-866


    Understanding the in vivo dynamics of protein localization and their physical interactions is important for many problems in biology. To enable systematic protein function interrogation in a multicellular context, we built a genome-scale transgenic platform for in vivo expression of fluorescent- and affinity-tagged proteins in Caenorhabditis elegans under endogenous cis regulatory control. The platform combines computer-assisted transgene design, massively parallel DNA engineering, and next-generation sequencing to generate a resource of 14,637 genomic DNA transgenes, which covers 73% of the proteome. The multipurpose tag used allows any protein of interest to be localized in vivo or affinity purified using standard tag-based assays. We illustrate the utility of the resource by systematic chromatin immunopurification and automated 4D imaging, which produced detailed DNA binding and cell/tissue distribution maps for key transcription factor proteins.

    View details for DOI 10.1016/j.cell.2012.08.001

    View details for Web of Science ID 000308002300018

    View details for PubMedID 22901814

  • Discovery of Stress Responsive DNA Regulatory Motifs in Arabidopsis PLOS ONE Ma, S., Bachan, S., Porto, M., Bohnert, H. J., Snyder, M., Dinesh-Kumar, S. P. 2012; 7 (8)


    The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer--a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.

    View details for DOI 10.1371/journal.pone.0043198

    View details for Web of Science ID 000307500100069

    View details for PubMedID 22912824
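
The word-counting and position-bias steps described in the abstract above can be sketched roughly as follows. This is not MotifIndexer itself: it indexes only exact words (no degenerate bases such as r, y, or n), assumes all promoters have the same length, and uses one plausible z-score, the mean start position compared with a discrete uniform distribution. The function names are made up for this example.

```python
from collections import defaultdict
from math import sqrt


def index_kmers(promoters, k=8):
    """Map every exact k-mer to the list of start positions where it occurs."""
    positions = defaultdict(list)
    for seq in promoters:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            positions[seq[i:i + k]].append(i)
    return positions


def position_bias_z(starts, promoter_len, k=8):
    """z-score of the mean start position against a uniform null model.

    Under the null, each start is uniform on 0..promoter_len - k, so the
    mean of n starts has expectation m / 2 and variance ((m + 1)**2 - 1) / (12 * n),
    where m = promoter_len - k.
    """
    n = len(starts)
    m = promoter_len - k
    if n == 0 or m <= 0:
        return None
    expected_mean = m / 2
    variance = ((m + 1) ** 2 - 1) / 12
    observed_mean = sum(starts) / n
    return (observed_mean - expected_mean) / sqrt(variance / n)


if __name__ == "__main__":
    # Toy promoters, written 5' to 3' so the last base is nearest the gene.
    toy = ["AAAAACGT", "TTTTACGT"]
    index = index_kmers(toy, k=4)
    print(index["ACGT"], position_bias_z(index["ACGT"], promoter_len=8, k=4))
```

A large positive z-score here means the word sits closer to the 3' end of the promoter than chance would predict, which corresponds to a bias towards the transcription start site when promoters are oriented as in the toy data.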

  • Investigating metabolite-protein interactions: An overview of available techniques METHODS Yang, G. X., Li, X., Snyder, M. 2012; 57 (4): 459-466


    Metabolites comprise the molar majority of chemical substances in living cells, and metabolite-protein interactions are expected to be quite common. Many interactions have already been identified and have been shown to be involved in the regulation of different types of cellular processes including signaling events, enzyme activities, protein localizations and interactions. Recent technological advances have greatly facilitated the detection of metabolite-protein interactions at high sensitivity and some of these have been applied on a large scale. In this manuscript, we review the available in vitro, in silico and in vivo technologies for mapping small-molecule-protein interactions. Although some of these were developed for drug-protein interactions they can be applied for mapping metabolite-protein interactions. Information gained from the use of these approaches can be applied to the manipulation of cellular processes and therapeutic applications.

    View details for DOI 10.1016/j.ymeth.2012.06.013

    View details for Web of Science ID 000309625600009

    View details for PubMedID 22750303

    View details for PubMedCentralID PMC3448827

  • Patient-Specific Induced Pluripotent Stem Cells as a Model for Familial Dilated Cardiomyopathy SCIENCE TRANSLATIONAL MEDICINE Sun, N., Yazawa, M., Liu, J., Han, L., Sanchez-Freire, V., Abilez, O. J., Navarrete, E. G., Hu, S., Wang, L., Lee, A., Pavlovic, A., Lin, S., Chen, R., Hajjar, R. J., Snyder, M. P., Dolmetsch, R. E., Butte, M. J., Ashley, E. A., Longaker, M. T., Robbins, R. C., Wu, J. C. 2012; 4 (130)


    Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric α-actinin. When stimulated with a β-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric α-actinin distribution. Treatment with β-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.

    View details for DOI 10.1126/scitranslmed.3003552

    View details for Web of Science ID 000303045900004

    View details for PubMedID 22517884

    View details for PubMedCentralID PMC3657516

  • Extensive In vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses Experimental Biology Meeting 2012 Snyder, M., Li, X., Gianoulis, T., Yip, K., Gerstein, M. FEDERATION AMER SOC EXP BIOL. 2012
  • A core erythroid transcriptional network is repressed by a master regulator of myelo-lymphoid differentiation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wontakal, S. N., Guo, X., Smith, C., MacCarthy, T., Bresnick, E. H., Bergman, A., Snyder, M. P., Weissman, S. M., Zheng, D., Skoultchi, A. I. 2012; 109 (10): 3832-3837


    Two mechanisms that play important roles in cell fate decisions are control of a "core transcriptional network" and repression of alternative transcriptional programs by antagonizing transcription factors. Whether these two mechanisms operate together is not known. Here we report that GATA-1, SCL, and Klf1 form an erythroid core transcriptional network by co-occupying >300 genes. Importantly, we find that PU.1, a negative regulator of terminal erythroid differentiation, is a highly integrated component of this network. GATA-1, SCL, and Klf1 act to promote, whereas PU.1 represses expression of many of the core network genes. PU.1 also represses the genes encoding GATA-1, SCL, Klf1, and important GATA-1 cofactors. Conversely, in addition to repressing PU.1 expression, GATA-1 also binds to and represses >100 PU.1 myelo-lymphoid gene targets in erythroid progenitors. Mathematical modeling further supports that this dual mechanism of repressing both the opposing upstream activator and its downstream targets provides a synergistic, robust mechanism for lineage specification. Taken together, these results amalgamate two key developmental principles, namely, regulation of a core transcriptional network and repression of an alternative transcriptional program, thereby enhancing our understanding of the mechanisms that establish cellular identity.

    View details for DOI 10.1073/pnas.1121019109

    View details for Web of Science ID 000301117700049

    View details for PubMedID 22357756

  • The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome NATURE BIOTECHNOLOGY Paik, Y., Jeong, S., Omenn, G. S., Uhlen, M., Hanash, S., Cho, S. Y., Lee, H., Na, K., Choi, E., Yan, F., Zhang, F., Zhang, Y., Snyder, M., Cheng, Y., Chen, R., Marko-Varga, G., Deutsch, E. W., Kim, H., Kwon, J., Aebersold, R., Bairoch, A., Taylor, A. D., Kim, K. Y., Lee, E., Hochstrasser, D., Legrain, P., Hancock, W. S. 2012; 30 (3): 221-223

    View details for Web of Science ID 000301303800011

    View details for PubMedID 22398612

  • Tcf7 Is an Important Regulator of the Switch of Self-Renewal and Differentiation in a Multipotential Hematopoietic Cell Line PLOS GENETICS Wu, J. Q., Seay, M., Schulz, V. P., Hariharan, M., Tuck, D., Lian, J., Du, J., Shi, M., Ye, Z., Gerstein, M., Snyder, M. P., Weissman, S. 2012; 8 (3)


    A critical problem in biology is understanding how cells choose between self-renewal and differentiation. To generate a comprehensive view of the mechanisms controlling early hematopoietic precursor self-renewal and differentiation, we used systems-based approaches and murine EML multipotential hematopoietic precursor cells as a primary model. EML cells give rise to a mixture of self-renewing Lin-SCA+CD34+ cells and partially differentiated non-renewing Lin-SCA-CD34- cells in a cell autonomous fashion. We identified and validated the HMG box protein TCF7 as a regulator in this self-renewal/differentiation switch that operates in the absence of autocrine Wnt signaling. We found that Tcf7 is the most down-regulated transcription factor when CD34+ cells switch into CD34- cells, using RNA-Seq. We subsequently identified the target genes bound by TCF7, using ChIP-Seq. We show that TCF7 and RUNX1 (AML1) bind to each other's promoter regions and that TCF7 is necessary for the production of the short isoforms, but not the long isoforms of RUNX1, suggesting that TCF7 and the short isoforms of RUNX1 function coordinately in regulation. Tcf7 knock-down experiments and Gene Set Enrichment Analyses suggest that TCF7 plays a dual role in promoting the expression of genes characteristic of self-renewing CD34+ cells while repressing genes activated in partially differentiated CD34- state. Finally a network of up-regulated transcription factors of CD34+ cells was constructed. Factors that control hematopoietic stem cell (HSC) establishment and development, cell growth, and multipotency were identified. These studies in EML cells demonstrate fundamental cell-intrinsic properties of the switch between self-renewal and differentiation, and yield valuable insights for manipulating HSCs and other differentiating systems.

    View details for DOI 10.1371/journal.pgen.1002565

    View details for Web of Science ID 000302254800041

    View details for PubMedID 22412390

    View details for PubMedCentralID PMC3297581

  • Correlation of Global MicroRNA Expression With Basal Cell Carcinoma Subtype G3-GENES GENOMES GENETICS Heffelfinger, C., Ouyang, Z., Engberg, A., Leffell, D. J., Hanlon, A. M., Gordon, P. B., Zheng, W., Zhao, H., Snyder, M. P., Bale, A. E. 2012; 2 (2): 279-286


    Basal cell carcinomas (BCCs) are the most common cancers in the United States. The histologic appearance distinguishes several subtypes, each of which can have a different biologic behavior. In this study, global miRNA expression was quantified by high-throughput sequencing in nodular BCCs, a subtype that is slow growing, and infiltrative BCCs, aggressive tumors that extend through the dermis and invade structures such as cutaneous nerves. Principal components analysis correctly classified seven of eight infiltrative tumors on the basis of miRNA expression. The remaining tumor, on pathology review, contained a mixture of nodular and infiltrative elements. Nodular tumors did not cluster tightly, likely reflecting broader histopathologic diversity in this class, but trended toward forming a group separate from infiltrative BCCs. Quantitative polymerase chain reaction assays were developed for six of the miRNAs that showed significant differences between the BCC subtypes, and five of these six were validated in a replication set of four infiltrative and three nodular tumors. The expression level of miR-183, a miRNA that inhibits invasion and metastasis in several types of malignancies, was consistently lower in infiltrative than nodular tumors and could be one element underlying the difference in invasiveness. These results represent the first miRNA profiling study in BCCs and demonstrate that miRNA gene expression may be involved in tumor pathogenesis and particularly in determining the aggressiveness of these malignancies.

    View details for DOI 10.1534/g3.111.001115

    View details for Web of Science ID 000312411000015

    View details for PubMedID 22384406

  • Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors GENOME BIOLOGY Yip, K. Y., Cheng, C., Bhardwaj, N., Brown, J. B., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M., Gerstein, M. 2012; 13 (9)


    Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.

    View details for DOI 10.1186/gb-2012-13-9-r48

    View details for Web of Science ID 000313182600001

    View details for PubMedID 22950945

    View details for PubMedCentralID PMC3491392

  • Q & A: the Snyderome GENOME BIOLOGY Snyder, M. 2012; 13 (3)


    Michael Snyder answers Genome Biology's questions on the human and professional stories underlying his Snyderome integrative omics project.

    View details for DOI 10.1186/gb-2012-13-3-147

    View details for Web of Science ID 000308544200010

    View details for PubMedID 22424393

  • Phosphorylation of Yeast Transcription Factors Correlates with the Evolution of Novel Sequence and Function JOURNAL OF PROTEOME RESEARCH Kaganovich, M., Snyder, M. 2012; 11 (1): 261-268


    Gene duplication is a significant source of novel genes and the dynamics of gene duplicate retention vs loss are poorly understood, particularly in terms of the functional and regulatory specialization of their gene products. We compiled a comprehensive data set of S. cerevisiae phosphosites to study the role of phosphorylation in yeast paralog divergence. We found that proteins coded by duplicated genes created in the Whole Genome Duplication (WGD) event and in a period prior to the WGD are significantly more phosphorylated than other duplicates or singletons. Though the amino acid sequence of each paralog of a given pair tends to diverge fairly similarly from their common ortholog in a related species, the phosphorylated amino acids tend to diverge in sequence from the ortholog at different rates. We observed that transcription factors (TFs) are disproportionately present among the set of duplicate genes and among the set of proteins that are phosphorylated. Interestingly, TFs that occur on higher levels of the transcription network hierarchy (i.e., tend to regulate other TFs) tend to be more phosphorylated than lower-level TFs. We found that TF paralog divergence in expression, binding, and sequence correlates with the abundance of phosphosites. Overall, these studies have important implications for understanding divergence of gene function and regulation in eukaryotes.

    View details for DOI 10.1021/pr201065k

    View details for Web of Science ID 000298827700024

    View details for PubMedID 22141333

  • Characterization of Enhancer Function from Genome-Wide Analyses ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 13 Maston, G. A., Landt, S. G., Snyder, M., Green, M. R. 2012; 13: 29-57


    There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets should become a standard approach to elucidate higher-order regulatory interactions.

    View details for DOI 10.1146/annurev-genom-090711-163723

    View details for Web of Science ID 000310143800002

    View details for PubMedID 22703170

  • An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome biology Stamatoyannopoulos, J. A., Snyder, M., Hardison, R., Ren, B., Gingeras, T., Gilbert, D. M., Groudine, M., Bender, M., Kaul, R., Canfield, T., Giste, E., Johnson, A., Zhang, M., Balasundaram, G., Byron, R., Roach, V., Sabo, P. J., Sandstrom, R., Stehling, A. S., Thurman, R. E., Weissman, S. M., Cayting, P., Hariharan, M., Lian, J., Cheng, Y., Landt, S. G., Ma, Z., Wold, B. J., Dekker, J., Crawford, G. E., Keller, C. A., Wu, W., Morrissey, C., Kumar, S. A., Mishra, T., Jain, D., Byrska-Bishop, M., Blankenberg, D., Lajoie, B. R., Jain, G., Sanyal, A., Chen, K. B., Denas, O., Taylor, J., Blobel, G. A., Weiss, M. J., Pimkin, M., Deng, W., Marinov, G. K., Williams, B. A., Fisher-Aylor, K. I., Desalvo, G., Kiralusha, A., Trout, D., Amrhein, H., Mortazavi, A., Edsall, L., McCleary, D., Kuan, S., Shen, Y., Yue, F., Ye, Z., Davis, C. A., Zaleski, C., Jha, S., Xue, C., Dobin, A., Lin, W., Fastuca, M., Wang, H., Guigo, R., Djebali, S., Lagarde, J., Ryba, T., Sasaki, T., Malladi, V. S., Cline, M. S., Kirkup, V. M., Learned, K., Rosenbloom, K. R., Kent, W. J., Feingold, E. A., Good, P. J., Pazin, M., Lowdon, R. F., Adams, L. B. 2012; 13 (8): 418


    To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.

    View details for DOI 10.1186/gb-2012-13-8-418

    View details for PubMedID 22889292

    View details for PubMedCentralID PMC3491367

  • Interpretome: a freely available, modular, and secure personal genome interpretation engine. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Karczewski, K. J., Tirrell, R. P., Cordero, P., Tatonetti, N. P., Dudley, J. T., Salari, K., Snyder, M., Altman, R. B., Kim, S. K. 2012: 339-350


    The decreasing cost of genotyping and genome sequencing has ushered in an era of genomic personalized medicine. More than 100,000 individuals have been genotyped by direct-to-consumer genetic testing services, which offer a glimpse into the interpretation and exploration of a personal genome. However, these interpretations, which require extensive manual curation, are subject to the preferences of the company and are not customizable by the individual. Academic institutions teaching personalized medicine, as well as genetic hobbyists, may prefer to customize their analysis and have full control over the content and method of interpretation. We present the Interpretome, a system for private genome interpretation, which contains all genotype information in client-side interpretation scripts, supported by server-side databases. We provide state-of-the-art analyses for teaching clinical implications of personal genomics, including disease risk assessment and pharmacogenomics. Additionally, we have implemented client-side algorithms for ancestry inference, demonstrating the power of these methods without excessive computation. Finally, the modular nature of the system allows for plugin capabilities for custom analyses. This system will allow for personal genome exploration without compromising privacy, facilitating hands-on courses in genomics and personalized medicine.

    View details for PubMedID 22174289

  • A High-Resolution Whole-Genome Map of Key Chromatin Modifications in the Adult Drosophila melanogaster PLOS GENETICS Yin, H., Sweeney, S., Raha, D., Snyder, M., Lin, H. 2011; 7 (12)


    Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications.

    View details for DOI 10.1371/journal.pgen.1002380

    View details for Web of Science ID 000299167900003

    View details for PubMedID 22194694

  • Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms PLOS ONE Haraksingh, R. R., Abyzov, A., Gerstein, M., Urban, A. E., Snyder, M. 2011; 6 (11)


    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.

    View details for DOI 10.1371/journal.pone.0027859

    View details for Web of Science ID 000298168100021

    View details for PubMedID 22140474

  • Construction and Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data PLOS COMPUTATIONAL BIOLOGY Cheng, C., Yan, K., Hwang, W., Qian, J., Bhardwaj, N., Rozowsky, J., Lu, Z. J., Niu, W., Alves, P., Kato, M., Snyder, M., Gerstein, M. 2011; 7 (11)


    We present a network framework for analyzing multi-level regulation in higher eukaryotes based on systematic integration of various high-throughput datasets. The network, namely the integrated regulatory network, consists of three major types of regulation: TF→gene, TF→miRNA and miRNA→gene. We identified the target genes and target miRNAs for a set of TFs based on the ChIP-Seq binding profiles, the predicted targets of miRNAs using annotated 3'UTR sequences and conservation information. Making use of the system-wide RNA-Seq profiles, we classified transcription factors into positive and negative regulators and assigned a sign for each regulatory interaction. Other types of edges such as protein-protein interactions and potential intra-regulations between miRNAs based on the embedding of miRNAs in their host genes were further incorporated. We examined the topological structures of the network, including its hierarchical organization and motif enrichment. We found that transcription factors downstream of the hierarchy distinguish themselves by expressing more uniformly across various tissues, having more interacting partners, and being more likely to be essential. We found an over-representation of notable network motifs, including a FFL in which a miRNA cost-effectively shuts down a transcription factor and its target. We used data of C. elegans from the modENCODE project as a primary model to illustrate our framework, but further verified the results using two other data sets. As more and more genome-wide ChIP-Seq and RNA-Seq data become available in the near future, our methods of data integration have various potential applications.

    View details for DOI 10.1371/journal.pcbi.1002190

    View details for Web of Science ID 000297263700001

    View details for PubMedID 22125477

  • Performance comparison of exome DNA sequencing technologies NATURE BIOTECHNOLOGY Clark, M. J., Chen, R., Lam, H. Y., Karczewski, K. J., Chen, R., Euskirchen, G., Butte, A. J., Snyder, M. 2011; 29 (10): 908-U206


    Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

    View details for DOI 10.1038/nbt.1975

    View details for Web of Science ID 000296273000017

    View details for PubMedID 21947028

  • Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence PLOS GENETICS Dewey, F. E., Chen, R., Cordero, S. P., Ormond, K. E., Caleshu, C., Karczewski, K. J., Whirl-Carrillo, M., Wheeler, M. T., Dudley, J. T., Byrnes, J. K., Cornejo, O. E., Knowles, J. W., Woon, M., Sangkuhl, K., Gong, L., Thorn, C. F., Hebert, J. M., Capriotti, E., David, S. P., Pavlovic, A., West, A., Thakuria, J. V., Ball, M. P., Zaranek, A. W., Rehm, H. L., Church, G. M., West, J. S., Bustamante, C. D., Snyder, M., Altman, R. B., Klein, T. E., Butte, A. J., Ashley, E. A. 2011; 7 (9)


    Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

    View details for DOI 10.1371/journal.pgen.1002280

    View details for Web of Science ID 000295419100031

    View details for PubMedID 21935354

    View details for PubMedCentralID PMC3174201

  • Arabidopsis RTNLB1 and RTNLB2 Reticulon-Like Proteins Regulate Intracellular Trafficking and Activity of the FLS2 Immune Receptor PLANT CELL Lee, H. Y., Bowen, C. H., Popescu, G. V., Kang, H., Kato, N., Ma, S., Dinesh-Kumar, S., Snyder, M., Popescu, S. C. 2011; 23 (9): 3374-3391


    Receptors localized at the plasma membrane are critical for the recognition of pathogens. The molecular determinants that regulate receptor transport to the plasma membrane are poorly understood. In a screen for proteins that interact with the FLAGELLIN-SENSITIVE2 (FLS2) receptor using Arabidopsis thaliana protein microarrays, we identified the reticulon-like protein RTNLB1. We showed that FLS2 interacts in vivo with both RTNLB1 and its homolog RTNLB2 and that a Ser-rich region in the N-terminal tail of RTNLB1 is critical for the interaction with FLS2. Transgenic plants that lack RTNLB1 and RTNLB2 (rtnlb1 rtnlb2) or overexpress RTNLB1 (RTNLB1ox) exhibit reduced activation of FLS2-dependent signaling and increased susceptibility to pathogens. In both rtnlb1 rtnlb2 and RTNLB1ox, FLS2 accumulation at the plasma membrane was significantly affected compared with the wild type. Transient overexpression of RTNLB1 led to FLS2 retention in the endoplasmic reticulum (ER) and affected FLS2 glycosylation but not FLS2 stability. Removal of the critical N-terminal Ser-rich region or either of the two Tyr-dependent sorting motifs from RTNLB1 causes partial reversion of the negative effects of excess RTNLB1 on FLS2 transport out of the ER and accumulation at the membrane. The results are consistent with a model whereby RTNLB1 and RTNLB2 regulate the transport of newly synthesized FLS2 to the plasma membrane.

    View details for DOI 10.1105/tpc.111.089656

    View details for Web of Science ID 000296739100025

    View details for PubMedID 21949153

  • Cooperative transcription factor associations discovered using regulatory variation PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Karczewski, K. J., Tatonetti, N. P., Landt, S. G., Yang, X., Slifer, T., Altman, R. B., Snyder, M. 2011; 108 (32): 13353-13358


    Regulation of gene expression at the transcriptional level is achieved by complex interactions of transcription factors operating at their target genes. Dissecting the specific combination of factors that bind each target is a significant challenge. Here, we describe in detail the Allele Binding Cooperativity test, which uses variation in transcription factor binding among individuals to discover combinations of factors and their targets. We developed the ALPHABIT (a large-scale process to hunt for allele binding interacting transcription factors) pipeline, which includes statistical analysis of binding sites followed by experimental validation, and demonstrate that this method predicts transcription factors that associate with NFκB. Our method successfully identifies factors that have been known to work with NFκB (E2A, STAT1, IRF2), but whose global coassociation and sites of cooperative action were not known. In addition, we identify a unique coassociation (EBF1) that had not been reported previously. We present a general approach for discovering combinatorial models of regulation and advance our understanding of the genetic basis of variation in transcription factor binding.

    View details for DOI 10.1073/pnas.1103105108

    View details for Web of Science ID 000293691400076

    View details for PubMedID 21828005

    View details for PubMedCentralID PMC3156166

  • A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans PLOS GENETICS Stewart, C., Kural, D., Stroemberg, M. P., Walker, J. A., Konkel, M. K., Stuetz, A. M., Urban, A. E., Grubert, F., Lam, H. Y., Lee, W., Busby, M., Indap, A. R., Garrison, E., Huff, C., Xing, J., Snyder, M. P., Jorde, L. B., Batzer, M. A., Korbel, J. O., Marth, G. T. 2011; 7 (8)


    As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.

    View details for DOI 10.1371/journal.pgen.1002236

    View details for Web of Science ID 000294297000031

    View details for PubMedID 21876680

  • AlleleSeq: analysis of allele-specific expression and binding in a network framework MOLECULAR SYSTEMS BIOLOGY Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N., Bhardwaj, N., Rubin, M., Snyder, M., Gerstein, M. 2011; 7


    To study allele-specific expression (ASE) and binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). Our pipeline initially constructs a diploid personal genome sequence (and corresponding personalized gene annotation) using genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events with significant differences in the number of mapped reads between maternal and paternal alleles. There are many technical challenges in the construction and alignment of reads to a personal diploid genome sequence that we address, for example, bias of reads mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project as well as matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and ASB from multiple transcription factors events using a regulatory network framework. Correlation analyses and network motifs show mostly coordinated ASB and ASE.

    View details for DOI 10.1038/msb.2011.54

    View details for Web of Science ID 000294537800003

    View details for PubMedID 21811232
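
    The read-count comparison described in the AlleleSeq abstract above can be illustrated with a minimal sketch. It assumes an exact two-sided binomial test against an equal maternal/paternal expectation; the abstract does not name the statistic, and the function name `binomial_two_sided_p` is hypothetical rather than part of the AlleleSeq pipeline.

```python
from math import comb


def binomial_two_sided_p(maternal: int, paternal: int) -> float:
    """Exact two-sided binomial p-value under the null that reads map
    to the maternal and paternal alleles with equal probability (0.5).

    Illustrative assumption only; not the published AlleleSeq code.
    """
    n = maternal + paternal
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    observed = comb(n, maternal)
    # Sum the weights of every outcome that is no more likely than the
    # observed split; with p = 0.5 each outcome k has weight C(n, k) / 2^n.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


if __name__ == "__main__":
    # A heterozygous site with 10 maternal reads and 0 paternal reads.
    print(binomial_two_sided_p(10, 0))
```

    In practice a site would be called allele-specific only after correcting such p-values for the number of sites tested; that step is omitted here.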

  • Identification of genomic indels and structural variations using split reads BMC GENOMICS Zhang, Z. D., Du, J., Lam, H., Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 12


    Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.

    View details for DOI 10.1186/1471-2164-12-375

    View details for Web of Science ID 000294205500001

    View details for PubMedID 21787423
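
    The calibration idea in the SRiC abstract above (sensitivity and positive predictive value measured on simulated data, then used to adjust observed counts) can be sketched as follows. The function names and the correction formula `observed * ppv / sensitivity` are illustrative assumptions, not taken from the paper.

```python
def calibrate(called: set, truth: set) -> tuple:
    """Return (sensitivity, ppv) for calls made on a simulated data set
    whose true events are known. Hypothetical helper for illustration."""
    if not called or not truth:
        raise ValueError("both call set and truth set must be non-empty")
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth)   # fraction of real events found
    ppv = true_positives / len(called)          # fraction of calls that are real
    return sensitivity, ppv


def corrected_count(observed: int, sensitivity: float, ppv: float) -> float:
    """Adjust an observed call count: discount false calls (multiply by PPV)
    and compensate for missed events (divide by sensitivity).
    Assumed correction, shown only as a sketch."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return observed * ppv / sensitivity


if __name__ == "__main__":
    # Toy simulation: four planted events, three calls, two of them correct.
    sens, ppv = calibrate(called={1, 2, 5}, truth={1, 2, 3, 4})
    print(sens, ppv, corrected_count(90, sens, ppv))
```

    Running the calibration separately per event class (for example long deletions versus short insertions) would mirror the per-class treatment the abstract describes.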

  • Metabolites as global regulators: A new view of protein regulation BIOESSAYS Li, X., Snyder, M. 2011; 33 (7): 485-489

    View details for DOI 10.1002/bies.201100026

    View details for Web of Science ID 000292710500002

    View details for PubMedID 21495048

  • The Human Proteome Project: Current State and Future Direction MOLECULAR & CELLULAR PROTEOMICS Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C. H., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Wu, C. H., Yamamoto, T., Paik, Y., Omenn, G. S. 2011; 10 (7)


    After the successful completion of the Human Genome Project, the Human Proteome Organization has recently officially launched a global Human Proteome Project (HPP), which is designed to map the entire human protein set. Given the lack of protein-level evidence for about 30% of the estimated 20,300 protein-coding genes, a systematic global effort will be necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP research groups will use the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge bases. The HPP participants will take advantage of the output and cross-analyses from the ongoing Human Proteome Organization initiatives and a chromosome-centric protein mapping strategy, termed C-HPP, with which many national teams are currently engaged. In addition, numerous biologically driven and disease-oriented projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents, and tools for protein studies and analyses, and a stronger basis for personalized medicine. The Human Proteome Organization urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.M111.009993

    View details for Web of Science ID 000292541500012

    View details for PubMedID 21742803

    View details for PubMedCentralID PMC3134076

  • Landscape of Next-Generation Sequencing Technologies ANALYTICAL CHEMISTRY Niedringhaus, T. P., Milanova, D., Kerby, M. B., Snyder, M. P., Barron, A. E. 2011; 83 (12): 4327-4341

    View details for DOI 10.1021/ac2010857

    View details for Web of Science ID 000291499800001

    View details for PubMedID 21612267

  • A Large Gene Network in Immature Erythroid Cells Is Controlled by the Myeloid and B Cell Transcriptional Regulator PU.1 PLOS GENETICS Wontakal, S. N., Guo, X., Will, B., Shi, M., Raha, D., Mahajan, M. C., Weissman, S., Snyder, M., Steidl, U., Zheng, D., Skoultchi, A. I. 2011; 7 (6)


    PU.1 is a hematopoietic transcription factor that is required for the development of myeloid and B cells. PU.1 is also expressed in erythroid progenitors, where it blocks erythroid differentiation by binding to and inhibiting the main erythroid promoting factor, GATA-1. However, other mechanisms by which PU.1 affects the fate of erythroid progenitors have not been thoroughly explored. Here, we used ChIP-Seq analysis for PU.1 and gene expression profiling in erythroid cells to show that PU.1 regulates an extensive network of genes that constitute major pathways for controlling growth and survival of immature erythroid cells. By analyzing fetal liver erythroid progenitors from mice with low PU.1 expression, we also show that the earliest erythroid committed cells are dramatically reduced in vivo. Furthermore, we find that PU.1 also regulates many of the same genes and pathways in other blood cells, leading us to propose that PU.1 is a multifaceted factor with overlapping, as well as distinct, functions in several hematopoietic lineages.

    View details for DOI 10.1371/journal.pgen.1001392

    View details for Web of Science ID 000292386300004

    View details for PubMedID 21695229

  • CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing GENOME RESEARCH Abyzov, A., Urban, A. E., Snyder, M., Gerstein, M. 2011; 21 (6): 974-984


    Copy number variation (CNV) in the genome is a complex phenomenon, and not completely understood. We have developed a method, CNVnator, for CNV discovery and genotyping from read-depth (RD) analysis of personal genome sequencing. Our method is based on combining the established mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction) to broaden the range of discovered CNVs. We calibrated CNVnator using the extensive validation performed by the 1000 Genomes Project. Because of this, we could use CNVnator for CNV discovery and genotyping in a population and characterization of atypical CNVs, such as de novo and multi-allelic events. Overall, for CNVs accessible by RD, CNVnator has high sensitivity (86%-96%), low false-discovery rate (3%-20%), high genotyping accuracy (93%-95%), and high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage). Furthermore, CNVnator is complementary in a straightforward way to split-read and read-pair approaches: It misses CNVs created by retrotransposable elements, but more than half of the validated CNVs that it identifies are not detected by split-read or read-pair. By genotyping CNVs in the CEPH, Yoruba, and Chinese-Japanese populations, we estimated that at least 11% of all CNV loci involve complex, multi-allelic events, a considerably higher estimate than reported earlier. Moreover, among these events, we observed cases with allele distribution strongly deviating from Hardy-Weinberg equilibrium, possibly implying selection on certain complex loci. Finally, by combining discovery and genotyping, we identified six potential de novo CNVs in two family trios.

    View details for DOI 10.1101/gr.114876.110

    View details for Web of Science ID 000291153400017

    View details for PubMedID 21324876

  • Genome-wide chromatin occupancy analysis reveals a role for ASH2 in transcriptional pausing NUCLEIC ACIDS RESEARCH Perez-Lluch, S., Blanco, E., Carbonell, A., Raha, D., Snyder, M., Serras, F., Corominas, M. 2011; 39 (11): 4628-4639


    An important mechanism for gene regulation involves chromatin changes via histone modification. One such modification is histone H3 lysine 4 trimethylation (H3K4me3), which requires histone methyltransferase complexes (HMT) containing the trithorax-group (trxG) protein ASH2. Mutations in ash2 cause a variety of pattern formation defects in the Drosophila wing. We have identified genome-wide binding of ASH2 in wing imaginal discs using chromatin immunoprecipitation combined with sequencing (ChIP-Seq). Our results show that genes with functions in development and transcriptional regulation are activated by ASH2 via H3K4 trimethylation in nearby nucleosomes. We have characterized the occupancy of phosphorylated forms of RNA Polymerase II and histone marks associated with activation and repression of transcription. ASH2 occupancy correlates with phosphorylated forms of RNA Polymerase II and histone activating marks in expressed genes. Additionally, RNA Polymerase II phosphorylation on serine 5 and H3K4me3 are reduced in ash2 mutants in comparison to wild-type flies. Finally, we have identified specific motifs associated with ASH2 binding in genes that are differentially expressed in ash2 mutants. Our data suggest that recruitment of the ASH2-containing HMT complexes is context specific and point to a function of ASH2 and H3K4me3 in transcriptional pausing control.

    View details for DOI 10.1093/nar/gkq1322

    View details for Web of Science ID 000291755000015

    View details for PubMedID 21310711

  • A User's Guide to the Encyclopedia of DNA Elements (ENCODE) PLOS BIOLOGY Myers, R. M., Stamatoyannopoulos, J., Snyder, M., Dunham, I., Hardison, R. C., Bernstein, B. E., Gingeras, T. R., Kent, W. J., Birney, E., Wold, B., Crawford, G. E., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Mikkelsen, T. S., Kheradpour, P., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Thanh Truong, T., Ward, L. D., Altshuler, R. C., Lin, M. F., Kellis, M., Gingeras, T. R., Davis, C. A., Kapranov, P., Dobin, A., Zaleski, C., Schlesinger, F., Batut, P., Chakrabortty, S., Jha, S., Lin, W., Drenkow, J., Wang, H., Bell, K., Gao, H., Bell, I., Dumais, E., Dumais, J., Antonarakis, S. E., Ucla, C., Borel, C., Guigo, R., Djebali, S., Lagarde, J., Kingswood, C., Ribeca, P., Sammeth, M., Alioto, T., Merkel, A., Tilgner, H., Carninci, P., Hayashizaki, Y., Lassmann, T., Takahashi, H., Abdelhamid, R. F., Hannon, G., Fejes-Toth, K., Preall, J., Gordon, A., Sotirova, V., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Ruan, Y., Ruan, X., Shahab, A., Poh, W. T., Wei, C., Crawford, G. E., Furey, T. S., Boyle, A. P., Sheffield, N. C., Song, L., Shibata, Y., Vales, T., Winter, D., Zhang, Z., London, D., Wang, T., Birney, E., Keefe, D., Iyer, V. R., Lee, B., McDaniell, R. M., Liu, Z., Battenhouse, A., Bhinge, A. A., Lieb, J. D., Grasfeder, L. L., Showers, K. A., Giresi, P. G., Kim, S. K., Shestak, C., Myers, R. M., Pauli, F., Reddy, T. E., Gertz, J., Partridge, E. C., Jain, P., Sprouse, R. O., Bansal, A., Pusey, B., Muratet, M. A., Varley, K. E., Bowling, K. M., Newberry, K. M., Nesmith, A. S., Dilocker, J. A., Parker, S. L., Waite, L. L., Thibeault, K., Roberts, K., Absher, D. M., Wold, B., Mortazavi, A., Williams, B., Marinov, G., Trout, D., Pepke, S., King, B., McCue, K., Kirilusha, A., DeSalvo, G., Fisher-Aylor, K., Amrhein, H., Vielmetter, J., Sherlock, G., Sidow, A., Batzoglou, S., Rauch, R., Kundaje, A., Libbrecht, M., Margulies, E. H., Parker, S. C., Elnitski, L., Green, E. D., Hubbard, T., Harrow, J., Searle, S., Kokocinski, F., Aken, B., Frankish, A., Hunt, T., Despacio-Reyes, G., Kay, M., Mukherjee, G., Bignell, A., Saunders, G., Boychenko, V., Brent, M., van Baren, M. J., Brown, R. H., Gerstein, M., Khurana, E., Balasubramanian, S., Zhang, Z., Lam, H., Cayting, P., Robilotto, R., Lu, Z., Guigo, R., Derrien, T., Tanzer, A., Knowles, D. G., Mariotti, M., Kent, W. J., Haussler, D., Harte, R., Diekhans, M., Kellis, M., Lin, M., Kheradpour, P., Ernst, J., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Valencia, A., Tress, M., Manuel Rodriguez, J., Snyder, M., Landt, S. G., Raha, D., Shi, M., Euskirchen, G., Grubert, F., Kasowski, M., Lian, J., Cayting, P., Lacroute, P., Xu, Y., Monahan, H., Patacsil, D., Slifer, T., Yang, X., Charos, A., Reed, B., Wu, L., Auerbach, R. K., Habegger, L., Hariharan, M., Rozowsky, J., Abyzov, A., Weissman, S. M., Gerstein, M., Struhl, K., Lamarre-Vincent, N., Lindahl-Allen, M., Miotto, B., Moqtaderi, Z., Fleming, J. D., Newburger, P., Farnham, P. J., Frietze, S., O'Geen, H., Xu, X., Blahnik, K. R., Cao, A. R., Iyengar, S., Stamatoyannopoulos, J. A., Kaul, R., Thurman, R. E., Wang, H., Navas, P. A., Sandstrom, R., Sabo, P. J., Weaver, M., Canfield, T., Lee, K., Neph, S., Roach, V., Reynolds, A., Johnson, A., Rynes, E., Giste, E., Vong, S., Neri, J., Frum, T., Johnson, E. M., Nguyen, E. D., Ebersol, A. K., Sanchez, M. E., Sheffer, H. H., Lotakis, D., Haugen, E., Humbert, R., Kutyavin, T., Shafer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Kent, W. J., Rosenbloom, K. R., Dreszer, T. R., Raney, B. J., Barber, G. P., Meyer, L. R., Sloan, C. A., Malladi, V. S., Cline, M. S., Learned, K., Swing, V. K., Zweig, A. S., Rhead, B., Fujita, P. A., Roskin, K., Karolchik, D., Kuhn, R. M., Haussler, D., Birney, E., Dunham, I., Wilder, S. P., Keefe, D., Sobral, D., Herrero, J., Beal, K., Lukk, M., Brazma, A., Vaquerizas, J. M., Luscombe, N. M., Bickel, P. J., Boley, N., Brown, J. B., Li, Q., Huang, H., Gerstein, M., Habegger, L., Sboner, A., Rozowsky, J., Auerbach, R. K., Yip, K. Y., Cheng, C., Yan, K., Bhardwaj, N., Wang, J., Lochovsky, L., Jee, J., Gibson, T., Leng, J., Du, J., Hardison, R. C., Harris, R. S., Song, G., Miller, W., Haussler, D., Roskin, K., Suh, B., Wang, T., Paten, B., Noble, W. S., Hoffman, M. M., Buske, O. J., Weng, Z., Dong, X., Wang, J., Xi, H., Tenenbaum, S. A., Doyle, F., Penalva, L. O., Chittur, S., Tullius, T. D., Parker, S. C., White, K. P., Karmakar, S., Victorsen, A., Jameel, N., Bild, N., Grossman, R. L., Snyder, M., Landt, S. G., Yang, X., Patacsil, D., Slifer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Trinklein, N. D., Partridge, E. C., Myers, R. M., Giddings, M. C., Chen, X., Khatun, J., Maier, C., Yu, Y., Gunawardena, H., Risk, B., Feingold, E. A., Lowdon, R. F., Dillon, L. A., Good, P. J. 2011; 9 (4)


    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    View details for DOI 10.1371/journal.pbio.1001046

    View details for Web of Science ID 000289938900014

  • Diverse protein kinase interactions identified by protein microarrays reveal novel connections between cellular processes GENES & DEVELOPMENT Fasolo, J., Sboner, A., Sun, M. G., Yu, H., Chen, R., Sharon, D., Kim, P. M., Gerstein, M., Snyder, M. 2011; 25 (7): 767-778


    Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.

    View details for DOI 10.1101/gad.1998811

    View details for Web of Science ID 000289062700010

    View details for PubMedID 21460040

  • Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches PLOS GENETICS Euskirchen, G. M., Auerbach, R. K., Davidov, E., Gianoulis, T. A., Zhong, G., Rozowsky, J., Bhardwaj, N., Gerstein, M. B., Snyder, M. 2011; 7 (3)


    A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5' ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly, we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. Taken together, the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.

    View details for DOI 10.1371/journal.pgen.1002008

    View details for Web of Science ID 000288996600042

    View details for PubMedID 21408204

  • Mapping copy number variation by population-scale genome sequencing NATURE Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., Abyzov, A., Yoon, S. C., Ye, K., Cheetham, R. K., Chinwalla, A., Conrad, D. F., Fu, Y., Grubert, F., Hajirasouliha, I., Hormozdiari, F., Iakoucheva, L. M., Iqbal, Z., Kang, S., Kidd, J. M., Konkel, M. K., Korn, J., Khurana, E., Kural, D., Lam, H. Y., Leng, J., Li, R., Li, Y., Lin, C., Luo, R., Mu, X. J., Nemesh, J., Peckham, H. E., Rausch, T., Scally, A., Shi, X., Stromberg, M. P., Stuetz, A. M., Urban, A. E., Walker, J. A., Wu, J., Zhang, Y., Zhang, Z. D., Batzer, M. A., Ding, L., Marth, G. T., McVean, G., Sebat, J., Snyder, M., Wang, J., Ye, K., Eichler, E. E., Gerstein, M. B., Hurles, M. E., Lee, C., McCarroll, S. A., Korbel, J. O. 2011; 470 (7332): 59-65


    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

    View details for DOI 10.1038/nature09708

    View details for Web of Science ID 000286886400033

    View details for PubMedID 21293372

    View details for PubMedCentralID PMC3077050

  • Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data GENOME RESEARCH Lu, Z. J., Yip, K. Y., Wang, G., Shou, C., Hillier, L. W., Khurana, E., Agarwal, A., Auerbach, R., Rozowsky, J., Cheng, C., Kato, M., Miller, D. M., Slack, F., Snyder, M., Waterston, R. H., Reinke, V., Gerstein, M. B. 2011; 21 (2): 276-285


    We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid level. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set), and find more than 7000 novel ncRNA candidates, among which more than 1000 are located in the intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR, detecting the expression for 14. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.

    View details for DOI 10.1101/gr.110189.110

    View details for Web of Science ID 000286804100013

    View details for PubMedID 21177971

  • Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans GENOME RESEARCH Niu, W., Lu, Z. J., Zhong, M., Sarov, M., Murray, J. I., Brdlik, C. M., Janette, J., Chen, C., Alves, P., Preston, E., Slightham, C., Jiang, L., Hyman, A. A., Kim, S. K., Waterston, R. H., Gerstein, M., Snyder, M., Reinke, V. 2011; 21 (2): 245-254


    Regulation of gene expression by sequence-specific transcription factors is central to developmental programs and depends on the binding of transcription factors with target sites in the genome. To date, most such analyses in Caenorhabditis elegans have focused on the interactions between a single transcription factor with one or a few select target genes. As part of the modENCODE Consortium, we have used chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) to determine the genome-wide binding sites of 22 transcription factors (ALR-1, BLMP-1, CEH-14, CEH-30, EGL-27, EGL-5, ELT-3, EOR-1, GEI-11, HLH-1, LIN-11, LIN-13, LIN-15B, LIN-39, MAB-5, MDL-1, MEP-1, PES-1, PHA-4, PQM-1, SKN-1, and UNC-130) at diverse developmental stages. For each factor we determined candidate gene targets, both coding and non-coding. The typical binding sites of almost all factors are within a few hundred nucleotides of the transcript start site. Most factors target a mixture of coding and non-coding target genes, although one factor preferentially binds to non-coding RNA genes. We built a regulatory network among the 22 factors to determine their functional relationships to each other and found that some factors appear to act preferentially as regulators and others as target genes. Examination of the binding targets of three related HOX factors--LIN-39, MAB-5, and EGL-5--indicates that these factors regulate genes involved in cellular migration, neuronal function, and vulval differentiation, consistent with their known roles in these developmental processes. Ultimately, the comprehensive mapping of transcription factor binding sites will identify features of transcriptional networks that regulate C. elegans developmental processes.

    View details for DOI 10.1101/gr.114587.110

    View details for Web of Science ID 000286804100010

    View details for PubMedID 21177963

    View details for PubMedCentralID PMC3032928

  • RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries BIOINFORMATICS Habegger, L., Sboner, A., Gianoulis, T. A., Rozowsky, J., Agarwal, A., Snyder, M., Gerstein, M. 2011; 27 (2): 281-283


    The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. Availability and implementation: RSEQtools is implemented in C and the source code is available at

    View details for DOI 10.1093/bioinformatics/btq643

    View details for Web of Science ID 000286215200025

    View details for PubMedID 21134889

    View details for PubMedCentralID PMC3018817

  • Stat3 is essential for neuronal differentiation through direct transcriptional regulation of the Sox6 gene FEBS LETTERS Snyder, M., Huang, X., Zhang, J. J. 2011; 585 (1): 148-152


    The transcription factor Signal Transducer and Activator of Transcription 3 (Stat3) functions in various cellular processes including neuronal differentiation. We show that the SRY-box containing gene 6 (Sox6) gene, important for neuronal differentiation, is a direct target gene of Stat3. We demonstrate that in response to ligand stimulation, Stat3 binds to the Sox6 promoter and induces its expression. Furthermore, Stat3 is activated and Sox6 is induced during neuronal differentiation of P19 cells in the absence of exogenous ligand treatment. Moreover, using an RNA interference approach, we show that Stat3 is required for Sox6 expression during neuronal differentiation.

    View details for DOI 10.1016/j.febslet.2010.11.030

    View details for Web of Science ID 000285921500025

    View details for PubMedID 21094641

  • The human proteome project: Current state and future direction. Molecular & cellular proteomics : MCP Legrain, P., Aebersold, R., Archakov, A., Bairoch, A., Bala, K., Beretta, L., Bergeron, J., Borchers, C., Corthals, G. L., Costello, C. E., Deutsch, E. W., Domon, B., Hancock, W., He, F., Hochstrasser, D., Marko-Varga, G., Salekdeh, G. H., Sechi, S., Snyder, M., Srivastava, S., Uhlen, M., Hu, C. H., Yamamoto, T., Paik, Y. K., Omenn, G. S. 2011


    After successful completion of the Human Genome Project (HGP), HUPO has recently officially launched a global Human Proteome Project (HPP) which is designed to map the entire human protein set. Given the presence of about 30% undisclosed proteins out of 20,300 protein gene products, a systematic global effort is necessary to achieve this goal with respect to protein abundance, distribution, subcellular localization, interaction with other biomolecules, and functions at specific time points. As a general experimental strategy, HPP groups employ the three working pillars for HPP: mass spectrometry, antibody capture, and bioinformatics tools and knowledge base. The HPP participants will take advantage of the output and cross-analyses from the ongoing HUPO initiatives and a chromosome-based protein mapping strategy, termed C-HPP with many national teams currently engaged. In addition, numerous biologically-driven projects will be stimulated and facilitated by the HPP. Timely planning with proper governance of HPP will deliver a protein parts list, reagents and tools for protein studies and analyses, and a stronger basis for personalized medicine. HUPO urges each national research funding agency and the scientific community at large to identify their preferred pathways to participate in aspects of this highly promising project in a HPP consortium of funders and investigators.

    View details for DOI 10.1074/mcp.O111.009993

    View details for PubMedID 21531903

  • The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics GENOME BIOLOGY Gianoulis, T. A., Agarwal, A., Snyder, M., Gerstein, M. B. 2011; 12 (3)


    Biological data is often tabular but finding statistically valid connections between entities in a sequence of tables can be problematic--for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.

    View details for DOI 10.1186/gb-2011-12-3-r32

    View details for Web of Science ID 000291309200012

    View details for PubMedID 21453526

    View details for PubMedCentralID PMC3129682

  • Regulatory Variation Within and Between Species ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 12 Zheng, W., Gianoulis, T. A., Karczewski, K. J., Zhao, H., Snyder, M. 2011; 12: 327-346


    Understanding how individuals differ from one another and from closely related species is a fundamental problem in biology. Recent evidence suggests that much of the variation both within and between species is due to differential gene regulation. Here we review differential gene regulation focusing on evolutionary-developmental (evo-devo) biology, global comparison of genomic sequences, whole-genome gene expression, and transcription factor (TF) binding profiles. We also explore the relationship between divergence rate of regulatory sequences, coding sequences, and TF binding events using several different measures and discuss their implications in the context of evolution of regulatory networks. Finally, we discuss the current status and future challenges in relating regulatory variation to the divergence across and within species.

    View details for DOI 10.1146/annurev-genom-082908-150139

    View details for Web of Science ID 000295819900014

    View details for PubMedID 21721942

  • Kinase substrate interactions. Methods in molecular biology (Clifton, N.J.) Smith, M. G., Ptacek, J., Snyder, M. 2011; 723: 201-212


    Kinases have become popular therapeutic targets primarily due to their integral role in cell cycle and tumor progression. The efficacy of high-throughput screening efforts is dependent on the development of high quality multiplex tools capable of replacing lower-throughput technologies such as mass spectroscopy or solution-based assays for the study of kinase-substrate interactions. Functional protein microarrays are comprised of thousands of immobilized proteins on glass slides that have been used successfully to identify protein-protein interactions. Here, we describe the application of functional protein microarrays for the identification of the phosphorylation targets of individual protein kinases using highly sensitive radioactive detection and robust informatics algorithms.

    View details for DOI 10.1007/978-1-61779-043-0_13

    View details for PubMedID 21370067

  • Measuring the Evolutionary Rewiring of Biological Networks PLOS COMPUTATIONAL BIOLOGY Shou, C., Bhardwaj, N., Lam, H. Y., Yan, K., Kim, P. M., Snyder, M., Gerstein, M. B. 2011; 7 (1)


    We have accumulated a large amount of biological network data and expect even more to come. Soon, we anticipate being able to compare many different biological networks as we commonly do for molecular sequences. It has long been believed that many of these networks change, or "rewire", at different rates. It is therefore important to develop a framework to quantify the differences between networks in a unified fashion. We developed such a formalism based on analogy to simple models of sequence evolution, and used it to conduct a systematic study of network rewiring on all the currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway network, from fast to slow. This ordering was found in all comparisons we did of matched networks between organisms. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of "commonplace" networks such as family trees, co-authorships and linux-kernel function dependencies.

    View details for DOI 10.1371/journal.pcbi.1001050

    View details for Web of Science ID 000286652100009

    View details for PubMedID 21253555

  • RNA sequencing. Methods in molecular biology (Clifton, N.J.) Waern, K., Nagalakshmi, U., Snyder, M. 2011; 759: 125-132


    This chapter describes the RNA sequencing (RNA-Seq) protocol, whereby RNA from yeast cells is prepared for sequencing on an Illumina Genome Analyzer. The protocol can easily be altered to use RNA from a different organism. This chapter covers RNA extraction, cDNA synthesis, cDNA fragmentation, and Illumina cDNA library generation and contains some brief remarks on bioinformatic analysis.

    View details for DOI 10.1007/978-1-61779-173-4_8

    View details for PubMedID 21863485

  • Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project SCIENCE Gerstein, M. B., Lu, Z. J., Van Nostrand, E. L., Cheng, C., Arshinoff, B. I., Liu, T., Yip, K. Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., Alves, P., Chateigner, A., Perry, M., Morris, M., Auerbach, R. K., Feng, X., Leng, J., Vielle, A., Niu, W., Rhrissorrakrai, K., Agarwal, A., Alexander, R. P., Barber, G., Brdlik, C. M., Brennan, J., Brouillet, J. J., Carr, A., Cheung, M., Clawson, H., Contrino, S., Dannenberg, L. O., Dernburg, A. F., Desai, A., Dick, L., Dose, A. C., Du, J., Egelhofer, T., Ercan, S., Euskirchen, G., Ewing, B., Feingold, E. A., Gassmann, R., Good, P. J., Green, P., Gullier, F., Gutwein, M., Guyer, M. S., Habegger, L., Han, T., Henikoff, J. G., Henz, S. R., Hinrichs, A., Holster, H., Hyman, T., Iniguez, A. L., Janette, J., Jensen, M., Kato, M., Kent, W. J., Kephart, E., Khivansara, V., Khurana, E., Kim, J. K., Kolasinska-Zwierz, P., Lai, E. C., Latorre, I., Leahey, A., Lewis, S., Lloyd, P., Lochovsky, L., Lowdon, R. F., Lubling, Y., Lyne, R., MacCoss, M., Mackowiak, S. D., Mangone, M., McKay, S., Mecenas, D., Merrihew, G., Miller, D. M., Muroyama, A., Murray, J. I., Ooi, S., Pham, H., Phippen, T., Preston, E. A., Rajewsky, N., Raetsch, G., Rosenbaum, H., Rozowsky, J., Rutherford, K., Ruzanov, P., Sarov, M., Sasidharan, R., Sboner, A., Scheid, P., Segal, E., Shin, H., Shou, C., Slack, F. J., Slightam, C., Smith, R., Spencer, W. C., Stinson, E. O., Taing, S., Takasaki, T., Vafeados, D., Voronina, K., Wang, G., Washington, N. L., Whittle, C. M., Wu, B., Yan, K., Zeller, G., Zha, Z., Zhong, M., Zhou, X., Ahringer, J., Strome, S., Gunsalus, K. C., Micklem, G., Liu, X. S., Reinke, V., Kim, S. K., Hillier, L. W., Henikoff, S., Piano, F., Snyder, M., Stein, L., Lieb, J. D., Waterston, R. H. 2010; 330 (6012): 1775-1787


    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

    View details for DOI 10.1126/science.1196914

    View details for Web of Science ID 000285603700031

    View details for PubMedID 21177976

  • Exploring successful community pharmacist-physician collaborative working relationships using mixed methods RESEARCH IN SOCIAL & ADMINISTRATIVE PHARMACY Snyder, M. E., Zillich, A. J., Primack, B. A., Rice, K. R., McGivney, M. A., Pringle, J. L., Smith, R. B. 2010; 6 (4): 307-323


    Collaborative working relationships (CWRs) between community pharmacists and physicians may foster the provision of medication therapy management services, disease state management, and other patient care activities; however, pharmacists have expressed difficulty in developing such relationships. Additional work is needed to understand the specific pharmacist-physician exchanges that effectively contribute to the development of CWR. Data from successful pairs of community pharmacists and physicians may provide further insights into these exchange variables and expand research on models of professional collaboration. The objective was to describe the professional exchanges that occurred between community pharmacists and physicians engaged in successful CWRs, using a published conceptual model and tool for quantifying the extent of collaboration. A national pool of experts in community pharmacy practice identified community pharmacists engaged in CWRs with physicians. Five pairs of community pharmacists and physician colleagues participated in individual semistructured interviews, and 4 of these pairs completed the Pharmacist-Physician Collaborative Index (PPCI). Main outcome measures include quantitative (ie, scores on the PPCI) and qualitative information about professional exchanges within 3 domains found previously to influence relationship development: relationship initiation, trustworthiness, and role specification. On the PPCI, participants scored similarly on trustworthiness; however, physicians scored higher on relationship initiation and role specification. The qualitative interviews revealed that when initiating relationships, it was important for many pharmacists to establish open communication through face-to-face visits with physicians. Furthermore, physicians were able to recognize in these pharmacists a commitment for improved patient care. Trustworthiness was established by pharmacists making consistent contributions to care that improved patient outcomes over time. Open discussions regarding professional roles and an acknowledgment of professional norms (ie, physicians as decision makers) were essential. The findings support and extend the literature on pharmacist-physician CWRs by examining the exchange domains of relationship initiation, trustworthiness, and role specification qualitatively and quantitatively among pairs of practitioners. Relationships appeared to develop in a manner consistent with a published model for CWRs, including the pharmacist as relationship initiator, the importance of communication during early stages of the relationship, and an emphasis on high-quality pharmacist contributions.

    View details for DOI 10.1016/j.sapharm.2009.11.008

    View details for Web of Science ID 000285168400005

    View details for PubMedID 21111388

  • Transformation of Candida albicans with a synthetic hygromycin B resistance gene YEAST Basso, L. R., Bartiss, A., Mao, Y., Gast, C. E., Coelho, P. S., Snyder, M., Wong, B. 2010; 27 (12): 1039-1048


    Synthetic genes that confer resistance to the antibiotic nourseothricin in the pathogenic fungus Candida albicans are available, but genes conferring resistance to other antibiotics are not. We found that multiple C. albicans strains were inhibited by hygromycin B, so we designed a 1026 bp gene (CaHygB) that encodes Escherichia coli hygromycin B phosphotransferase with C. albicans codons. CaHygB conferred hygromycin B resistance in C. albicans transformed with ars2-containing plasmids or single-copy integrating vectors. Since CaHygB did not confer nourseothricin resistance and since the nourseothricin resistance marker SAT-1 did not confer hygromycin B resistance, we reasoned that these two markers could be used for homologous gene disruptions in wild-type C. albicans. We used PCR to fuse CaHygB or SAT-1 to approximately 1 kb of 5' and 3' noncoding DNA from C. albicans ARG4, HIS1 and LEU2, and introduced the resulting amplicons into six wild-type C. albicans strains. Homologous targeting frequencies were approximately 50-70%, and disruption of ARG4, HIS1 and LEU2 alleles was verified by the respective transformants' inabilities to grow without arginine, histidine and leucine. CaHygB should be a useful tool for genetic manipulation of different C. albicans strains, including clinical isolates.

    View details for DOI 10.1002/yea.1813

    View details for Web of Science ID 000285210600006

    View details for PubMedID 20737428

  • Statistical Issues in Mapping QTLs for RNA-seq Data 19th Annual Meeting of the International-Genetic-Epidemiology-Society Zheng, W., Raha, D., Snyder, M., Zhao, H. WILEY-BLACKWELL. 2010: 942–42
  • Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads BMC GENOMICS Martin, J., Bruno, V. M., Fang, Z., Meng, X., Blow, M., Zhang, T., Sherlock, G., Snyder, M., Wang, Z. 2010; 11


    Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

    View details for DOI 10.1186/1471-2164-11-663

    View details for Web of Science ID 000285303000001

    View details for PubMedID 21106091

  • Extensive In Vivo Metabolite-Protein Interactions Revealed by Large-Scale Systematic Analyses CELL Li, X., Gianoulis, T. A., Yip, K. Y., Gerstein, M., Snyder, M. 2010; 143 (4): 639-650


    Natural small compounds comprise most cellular molecules and bind proteins as substrates, products, cofactors, and ligands. However, a large-scale investigation of in vivo protein-small metabolite interactions has not been performed. We developed a mass spectrometry assay for the large-scale identification of in vivo protein-hydrophobic small metabolite interactions in yeast and analyzed compounds that bind ergosterol biosynthetic proteins and protein kinases. Many of these proteins bind small metabolites; a few interactions were previously known, but the vast majority are new. Importantly, many key regulatory proteins such as protein kinases bind metabolites. Ergosterol was found to bind many proteins and may function as a general regulator. It is required for the activity of Ypk1, a mammalian AKT/SGK kinase homolog. Our study defines potential key regulatory steps in lipid biosynthetic pathways and suggests that small metabolites may play a more general role as regulators of protein activity and function than previously appreciated.

    View details for DOI 10.1016/j.cell.2010.09.048

    View details for Web of Science ID 000284149100020

    View details for PubMedID 21035178

  • A map of human genome variation from population-scale sequencing NATURE Altshuler, D., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., Collins, F. S., De La Vega, F. M., Donnelly, P., Egholm, M., Flicek, P., Gabriel, S. B., Gibbs, R. A., Knoppers, B. M., Lander, E. S., Lehrach, H., Mardis, E. R., McVean, G. A., Nickerson, D., Peltonen, L., Schafer, A. J., Sherry, S. T., Wang, J., Wilson, R. K., Gibbs, R. A., Deiros, D., Metzker, M., Muzny, D., Reid, J., Wheeler, D., Wang, J., Li, J., Jian, M., Li, G., Li, R., Liang, H., Tian, G., Wang, B., Wang, J., Wang, W., Yang, H., Zhang, X., Zheng, H., Lander, E. S., Altshuler, D. L., Ambrogio, L., Bloom, T., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Jaffe, D. B., Shefler, E., Sougnez, C. L., Bentley, D. R., Gormley, N., Humphray, S., Kingsbury, Z., Koko-Gonzales, P., Stone, J., McKernan, K. J., Costa, G. L., Ichikawa, J. K., Lee, C. C., Sudbrak, R., Lehrach, H., Borodina, T. A., Dahl, A., Davydov, A. N., Marquardt, P., Mertes, F., Nietfeld, W., Rosenstiel, P., Schreiber, S., Soldatov, A. V., Timmermann, B., Tolzmann, M., Egholm, M., Affourtit, J., Ashworth, D., Attiya, S., Bachorski, M., Buglione, E., Burke, A., Caprio, A., Celone, C., Clark, S., Conners, D., Desany, B., Gu, L., Guccione, L., Kao, K., Kebbel, A., Knowlton, J., Labrecque, M., McDade, L., Mealmaker, C., Minderman, M., Nawrocki, A., Niazi, F., Pareja, K., Ramenani, R., Riches, D., Song, W., Turcotte, C., Wang, S., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Weinstock, G., Durbin, R. M., Burton, J., Carter, D. M., Churcher, C., Coffey, A., Cox, A., Palotie, A., Quail, M., Skelly, T., Stalker, J., Swerdlow, H. P., Turner, D., De Witte, A., Giles, S., Gibbs, R. A., Wheeler, D., Bainbridge, M., Challis, D., Sabo, A., Yu, F., Yu, J., Wang, J., Fang, X., Guo, X., Li, R., Li, Y., Luo, R., Tai, S., Wu, H., Zheng, H., Zheng, X., Zhou, Y., Yang, H., Marth, G. T., Garrison, E. 
P., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Huang, W., Indap, A., Kural, D., Lee, W., Leong, W. F., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., Daly, M. J., DePristo, M. A., Altshuler, D. L., Ball, A. D., Banks, E., Bloom, T., Browning, B. L., Cibulskis, K., Fennell, T. J., Garimella, K. V., Grossman, S. R., Handsaker, R. E., Hanna, M., Hartl, C., Jaffe, D. B., Kernytsky, A. M., Korn, J. M., Li, H., Maguire, J. R., McCarroll, S. A., McKenna, A., Nemesh, J. C., Philippakis, A. A., Poplin, R. E., Price, A., Rivas, M. A., Sabeti, P. C., Schaffner, S. F., Shefler, E., Shlyakhter, I. A., Cooper, D. N., Ball, E. V., Mort, M., Phillips, A. D., Stenson, P. D., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Bustamante, C. D., Clark, A. G., Boyko, A., Degenhardt, J., Gravel, S., Gutenkunst, R. N., Kaganovich, M., Keinan, A., Lacroute, P., Ma, X., Reynolds, A., Clarke, L., Flicek, P., Cunningham, F., Herrero, J., Keenen, S., Kulesha, E., Leinonen, R., McLaren, W., Radhakrishnan, R., Smith, R. E., Zalunin, V., Zheng-Bradley, X., Korbel, J. O., Stuetz, A. M., Humphray, S., Bauer, M., Cheetham, R. K., Cox, T., Eberle, M., James, T., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Hyland, F. C., Manning, J. M., McLaughlin, S. F., Peckham, H. E., Sakarya, O., Sun, Y. A., Tsung, E. F., Batzer, M. A., Konkel, M. K., Walker, J. A., Sudbrak, R., Albrecht, M. W., Amstislavskiy, V. S., Herwig, R., Parkhomchuk, D. V., Sherry, S. T., Agarwala, R., Khouri, H., Morgulis, A. O., Paschall, J. E., Phan, L. D., Rotmistrovsky, K. E., Sanders, R. D., Shumway, M. F., Xiao, C., McVean, G. A., Auton, A., Iqbal, Z., Lunter, G., Marchini, J. L., Moutsianas, L., Myers, S., Tumian, A., Desany, B., Knight, J., Winer, R., Craig, D. W., Beckstrom-Sternberg, S. M., Christoforides, A., Kurdoglu, A. A., Pearson, J., Sinari, S. A., Tembe, W. D., Haussler, D., Hinrichs, A. S., Katzman, S. J., Kern, A., Kuhn, R. 
M., Przeworski, M., Hernandez, R. D., Howie, B., Kelley, J. L., Melton, S. C., Abecasis, G. R., Li, Y., Anderson, P., Blackwell, T., Chen, W., Cookson, W. O., Ding, J., Kang, H. M., Lathrop, M., Liang, L., Moffatt, M. F., Scheet, P., Sidore, C., Snyder, M., Zhan, X., Zoellner, S., Awadalla, P., Casals, F., Idaghdour, Y., Keebler, J., Stone, E. A., Zilversmit, M., Jorde, L., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. M., Sahinalp, S. C., Sudmant, P. H., Mardis, E. R., Chen, K., Chinwalla, A., Ding, L., Koboldt, D. C., McLellan, M. D., Dooling, D., Weinstock, G., Wallis, J. W., Wendl, M. C., Zhang, Q., Durbin, R. M., Albers, C. A., Ayub, Q., Balasubramaniam, S., Barrett, J. C., Carter, D. M., Chen, Y., Conrad, D. F., Danecek, P., Dermitzakis, E. T., Hu, M., Huang, N., Hurles, M. E., Jin, H., Jostins, L., Keane, T. M., Keane, T. M., Le, S. Q., Lindsay, S., Long, Q., MacArthur, D. G., Montgomery, S. B., Parts, L., Stalker, J., Tyler-Smith, C., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Abyzov, A., Balasubramanian, S., Bjornson, R., Du, J., Grubert, F., Habegger, L., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Li, Y., Luo, R., Marth, G. T., Garrison, E. P., Kural, D., Quinlan, A. R., Stewart, C., Stromberg, M. P., Ward, A. N., Wu, J., Lee, C., Mills, R. E., Shi, X., McCarroll, S. A., Banks, E., DePristo, M. A., Handsaker, R. E., Hartl, C., Korn, J. M., Li, H., Nemesh, J. C., Sebat, J., Makarov, V., Ye, K., Yoon, S. C., Degenhardt, J., Kaganovich, M., Clarke, L., Smith, R. E., Zheng-Bradley, X., Korbel, J. O., Humphray, S., Cheetham, R. K., Eberle, M., Kahn, S., Murray, L., Ye, K., De La Vega, F. M., Fu, Y., Peckham, H. E., Sun, Y. A., Batzer, M. A., Konkel, M. K., Xiao, C., Iqbal, Z., Desany, B., Blackwell, T., Snyder, M., Xing, J., Eichler, E. E., Aksay, G., Alkan, C., Hajirasouliha, I., Hormozdiari, F., Kidd, J. 
M., Chen, K., Chinwalla, A., Ding, L., McLellan, M. D., Wallis, J. W., Hurles, M. E., Conrad, D. F., Walter, K., Zhang, Y., Gerstein, M. B., Snyder, M., Abyzov, A., Du, J., Grubert, F., Haraksingh, R., Jee, J., Khurana, E., Lam, H. Y., Leng, J., Mu, X. J., Urban, A. E., Zhang, Z., Gibbs, R. A., Bainbridge, M., Challis, D., Coafra, C., Dinh, H., Kovar, C., Lee, S., Muzny, D., Nazareth, L., Reid, J., Sabo, A., Yu, F., Yu, J., Marth, G. T., Garrison, E. P., Indap, A., Leong, W. F., Quinlan, A. R., Stewart, C., Ward, A. N., Wu, J., Cibulskis, K., Fennell, T. J., Gabriel, S. B., Garimella, K. V., Hartl, C., Shefler, E., Sougnez, C. L., Wilkinson, J., Clark, A. G., Gravel, S., Grubert, F., Clarke, L., Flicek, P., Smith, R. E., Zheng-Bradley, X., Sherry, S. T., Khouri, H. M., Paschall, J. E., Shumway, M. F., Xiao, C., McVean, G. A., Katzman, S. J., Abecasis, G. R., Blackwell, T., Mardis, E. R., Dooling, D., Fulton, L., Fulton, R., Koboldt, D. C., Durbin, R. M., Balasubramaniam, S., Coffey, A., Keane, T. M., MacArthur, D. G., Palotie, A., Scott, C., Stalker, J., Tyler-Smith, C., Gerstein, M. B., Balasubramanian, S., Chakravarti, A., Knoppers, B. M., Peltonen, L., Abecasis, G. R., Bustamante, C. D., Gharani, N., Gibbs, R. A., Jorde, L., Kaye, J. S., Kent, A., Li, T., McGuire, A. L., McVean, G. A., Ossorio, P. N., Rotimi, C. N., Su, Y., Toji, L. H., Tyler-Smith, C., Brooks, L. D., Felsenfeld, A. L., McEwen, J. E., Abdallah, A., Juenger, C. R., Clemm, N. C., Collins, F. S., Duncanson, A., Green, E. D., Guyer, M. S., Peterson, J. L., Schafer, A. J., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Durbin, R. M., Gibbs, R. A., Hurles, M. E., McVean, G. A. 2010; 467 (7319): 1061-1073


    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

    View details for DOI 10.1038/nature09534

    View details for Web of Science ID 000283548600039

    View details for PubMedCentralID PMC3042601

  • Yeast proteomics and protein microarrays JOURNAL OF PROTEOMICS Chen, R., Snyder, M. 2010; 73 (11): 2147-2157


    Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast including protein microarray technology. The protein microarray technology allows the interrogation of protein-protein, protein-DNA, protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which could have been unachievable with traditional approaches. Discovery of these networks has profound impact on explicating biological processes with a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.

    View details for DOI 10.1016/j.jprot.2010.08.003

    View details for Web of Science ID 000283903000008

    View details for PubMedID 20728591

  • Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq GENOME RESEARCH Bruno, V. M., Wang, Z., Marjani, S. L., Euskirchen, G. M., Martin, J., Sherlock, G., Snyder, M. 2010; 20 (10): 1451-1458


    Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.

    View details for DOI 10.1101/gr.109553.110

    View details for Web of Science ID 000282375000015

    View details for PubMedID 20810668

  • Annotating non-coding regions of the genome NATURE REVIEWS GENETICS Alexander, R. P., Fang, G., Rozowsky, J., Snyder, M., Gerstein, M. B. 2010; 11 (8): 559-571


    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.

    View details for DOI 10.1038/nrg2814

    View details for Web of Science ID 000279988800012

    View details for PubMedID 20628352
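
    The sequence of steps named in the abstract above (per-base signal, smoothing, segmentation into small blocks, clustering into larger annotations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the moving-average smoother, the fixed threshold, the gap-based merging, and the toy signal are hypothetical stand-ins, not the methods surveyed in the review.

```python
from statistics import mean


def smooth(signal, window=3):
    """Moving-average smoothing of a per-base signal (window size is an assumed parameter)."""
    half = window // 2
    return [mean(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


def segment(signal, threshold):
    """Return half-open (start, end) blocks where the signal is at or above the threshold."""
    blocks, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                      # a block opens here
        elif value < threshold and start is not None:
            blocks.append((start, i))      # the block closes just before i
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


def merge_blocks(blocks, max_gap):
    """Cluster neighbouring blocks separated by at most max_gap bases into larger regions."""
    merged = []
    for start, end in blocks:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Hypothetical toy signal (e.g. read depth per base); not data from any experiment.
raw = [0, 0, 5, 6, 5, 0, 0, 4, 6, 5, 0, 0, 0, 0, 0, 0]
smoothed = smooth(raw, window=3)
blocks = segment(smoothed, threshold=3)
regions = merge_blocks(blocks, max_gap=2)
print(blocks, regions)
```

    With this toy input, the two small blocks lie two bases apart, so a `max_gap` of 2 merges them into one region while a `max_gap` of 1 keeps them separate.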

  • Initiation of the TORC1-Regulated G(0) Program Requires Igo1/2, which License Specific mRNAs to Evade Degradation via the 5'-3' mRNA Decay Pathway MOLECULAR CELL Talarek, N., Cameroni, E., Jaquenoud, M., Luo, X., Bontron, S., Lippman, S., Devgan, G., Snyder, M., Broach, J. R., De Virgilio, C. 2010; 38 (3): 345-355


    Eukaryotic cell proliferation is controlled by growth factors and essential nutrients, in the absence of which cells may enter into a quiescent (G(0)) state. In yeast, nitrogen and/or carbon limitation causes downregulation of the conserved TORC1 and PKA signaling pathways and, consequently, activation of the PAS kinase Rim15, which orchestrates G(0) program initiation and ensures proper life span by controlling distal readouts, including the expression of specific genes. Here, we report that Rim15 coordinates transcription with posttranscriptional mRNA protection by phosphorylating the paralogous Igo1 and Igo2 proteins. This event, which stimulates Igo proteins to associate with the mRNA decapping activator Dhh1, shelters newly expressed mRNAs from degradation via the 5'-3' mRNA decay pathway, thereby enabling their proper translation during initiation of the G(0) program. These results delineate a likely conserved mechanism by which nutrient limitation leads to stabilization of specific mRNAs that are critical for cell differentiation and life span.

    View details for DOI 10.1016/j.molcel.2010.02.039

    View details for Web of Science ID 000277818400006

    View details for PubMedID 20471941

  • MOTIPS: Automated Motif Analysis for Predicting Targets of Modular Protein Domains BMC BIOINFORMATICS Lam, H. Y., Kim, P. M., Mok, J., Tonikian, R., Sidhu, S. S., Turk, B. E., Snyder, M., Gerstein, M. B. 2010; 11


    Many protein interactions, especially those involved in signaling, involve short linear motifs consisting of 5-10 amino acid residues that interact with modular protein domains such as the SH3 binding domains and the kinase catalytic domains. One straightforward way of identifying these interactions is by scanning for matches to the motif against all the sequences in a target proteome. However, predicting domain targets by motif sequence alone without considering other genomic and structural information has been shown to be lacking in accuracy. We developed an efficient search algorithm to scan the target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder. The integration is performed using naïve Bayes and a training set of validated experiments. By integrating a variety of biologically relevant features to predict domain targets, we demonstrated a notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data of domain specificities, we believe that our approach can assist in the reconstruction of many signaling pathways.

    View details for DOI 10.1186/1471-2105-11-243

    View details for Web of Science ID 000279728900007

    View details for PubMedID 20459839
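
    The general idea in the abstract above, scanning a proteome for motif matches and then weighting each hit by independent features with naïve Bayes, might be sketched as follows. This is not the MOTIPS implementation: the regular-expression scan, the binary features, the likelihood values, and the toy sequences are all hypothetical and chosen only to make the example self-contained.

```python
import math
import re


def scan_motif(proteome, pattern):
    """Yield (protein_id, start, matched_text) for every match, including overlapping ones."""
    regex = re.compile(f"(?=({pattern}))")  # lookahead so overlapping hits are reported
    for protein_id, sequence in proteome.items():
        for match in regex.finditer(sequence):
            yield protein_id, match.start(), match.group(1)


def naive_bayes_log_odds(features, likelihoods, prior=0.5):
    """Log-odds that a hit is a true target, treating binary features as independent.

    likelihoods[name] = (P(feature present | true target), P(feature present | non-target)).
    """
    log_odds = math.log(prior / (1 - prior))
    for name, present in features.items():
        p_true, p_false = likelihoods[name]
        if not present:
            p_true, p_false = 1 - p_true, 1 - p_false
        log_odds += math.log(p_true / p_false)
    return log_odds


# Hypothetical toy inputs; in practice the likelihoods would be estimated from a training set.
proteome = {"protA": "MKPLPPRPSA", "protB": "MGGSAAT"}
likelihoods = {"conserved": (0.8, 0.3), "disordered": (0.7, 0.4)}

hits = list(scan_motif(proteome, "P..P"))
score = naive_bayes_log_odds({"conserved": True, "disordered": True}, likelihoods)
print(hits, round(score, 3))
```

    A positive log-odds score means the assumed features favour a true target over a chance motif match; ranking hits by this score is one simple way to combine the motif scan with the extra evidence.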

  • Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells NATURE STRUCTURAL & MOLECULAR BIOLOGY Moqtaderi, Z., Wang, J., Raha, D., White, R. J., Snyder, M., Weng, Z., Struhl, K. 2010; 17 (5): 635-U139


    Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.

    View details for DOI 10.1038/nsmb.1794

    View details for Web of Science ID 000277330700020

    View details for PubMedID 20418883

  • Genetic analysis of variation in transcription factor binding in yeast NATURE Zheng, W., Zhao, H., Mancera, E., Steinmetz, L. M., Snyder, M. 2010; 464 (7292): 1187-U106


    Variation in transcriptional regulation is thought to be a major cause of phenotypic diversity. Although widespread differences in gene expression among individuals of a species have been observed, studies to examine the variability of transcription factor binding on a global scale have not been performed, and thus the extent and underlying genetic basis of transcription factor binding diversity is unknown. By mapping differences in transcription factor binding among individuals, here we present the genetic basis of such variation on a genome-wide scale. Whole-genome Ste12-binding profiles were determined using chromatin immunoprecipitation coupled with DNA sequencing in pheromone-treated cells of 43 segregants of a cross between two highly diverged yeast strains and their parental lines. We identified extensive Ste12-binding variation among individuals, and mapped underlying cis- and trans-acting loci responsible for such variation. We showed that most transcription factor binding variation is cis-linked, and that many variations are associated with polymorphisms residing in the binding motifs of Ste12 as well as those of several proposed Ste12 cofactors. We also identified two trans-factors, AMN1 and FLO8, that modulate Ste12 binding to promoters of more than ten genes under alpha-factor treatment. Neither of these two genes was previously known to regulate Ste12, and we suggest that they may be mediators of gene activity and phenotypic diversity. Ste12 binding strongly correlates with gene expression for more than 200 genes, indicating that binding variation is functional. Many of the variable-bound genes are involved in cell wall organization and biogenesis. Overall, these studies identified genetic regulators of molecular diversity among individuals and provide new insights into mechanisms of gene regulation.

    View details for DOI 10.1038/nature08934

    View details for Web of Science ID 000276891100036

    View details for PubMedID 20237471

  • Variation in Transcription Factor Binding Among Humans SCIENCE Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., Habegger, L., Rozowsky, J., Shi, M., Urban, A. E., Hong, M., Karczewski, K. J., Huber, W., Weissman, S. M., Gerstein, M. B., Korbel, J. O., Snyder, M. 2010; 328 (5975): 232-235


    Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (PolII) and a key regulator of immune responses, nuclear factor kappaB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing PolII binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.

    View details for DOI 10.1126/science.1183621

    View details for Web of Science ID 000276459600043

    View details for PubMedID 20299548

    View details for PubMedCentralID PMC2938768

  • Molecular Mechanisms of Ethanol-Induced Pathogenesis Revealed by RNA-Sequencing PLOS PATHOGENS Camarena, L., Bruno, V., Euskirchen, G., Poggio, S., Snyder, M. 2010; 6 (4)


    Acinetobacter baumannii is a common pathogen whose recent resistance to drugs has emerged as a major health problem. Ethanol has been found to increase the virulence of A. baumannii in Dictyostelium discoideum and Caenorhabditis elegans models of infection. To better understand the causes of this effect, we examined the transcriptional profile of A. baumannii grown in the presence or absence of ethanol using RNA-Seq. Using the Illumina/Solexa platform, a total of 43,453,960 reads (35 nt) were obtained, of which 3,596,474 mapped uniquely to the genome. Our analysis revealed that ethanol induces the expression of 49 genes that belong to different functional categories. A strong induction was observed for genes encoding metabolic enzymes, indicating that ethanol is efficiently assimilated. In addition, we detected the induction of genes encoding stress proteins, including uspA, hsp90, groEL and lon as well as permeases, efflux pumps and a secreted phospholipase C. In stationary phase, ethanol strongly induced several genes involved with iron assimilation and a high-affinity phosphate transport system, indicating that A. baumannii makes a better use of the iron and phosphate resources in the medium when ethanol is used as a carbon source. To evaluate the role of phospholipase C (Plc1) in virulence, we generated and analyzed a deletion mutant for plc1. This strain exhibits a modest, but reproducible, reduction in the cytotoxic effect caused by A. baumannii on epithelial cells, suggesting that phospholipase C is important for virulence. Overall, our results indicate the power of applying RNA-Seq to identify key modulators of bacterial pathogenesis. We suggest that the effect of ethanol on the virulence of A. baumannii is multifactorial and includes a general stress response and other specific components such as phospholipase C.

    View details for DOI 10.1371/journal.ppat.1000834

    View details for Web of Science ID 000277722400007

    View details for PubMedID 20368969

  • Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Wu, J. Q., Habegger, L., Noisa, P., Szekely, A., Qiu, C., Hutchison, S., Raha, D., Egholm, M., Lin, H., Weissman, S., Cui, W., Gerstein, M., Snyder, M. 2010; 107 (11): 5254-5259


    To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation, N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like), were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.

    View details for DOI 10.1073/pnas.0914114107

    View details for Web of Science ID 000275714300079

    View details for PubMedID 20194744

    View details for PubMedCentralID PMC2841935

  • Personal genome sequencing: current approaches and challenges GENES & DEVELOPMENT Snyder, M., Du, J., Gerstein, M. 2010; 24 (5): 423-431


    The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., "personal genomes." Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences.

    View details for DOI 10.1101/gad.1864110

    View details for Web of Science ID 000275055900001

    View details for PubMedID 20194435

  • X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Yasukochi, Y., Maruyama, O., Mahajan, M. C., Padden, C., Euskirchen, G. M., Schulz, V., Hirakawa, H., Kuhara, S., Pan, X., Newburger, P. E., Snyder, M., Weissman, S. M. 2010; 107 (8): 3704-3709


    The DNA methylation status of human X chromosomes from male and female neutrophils was identified by high-throughput sequencing of HpaII and MspI digested fragments. In the intergenic and intragenic regions on the X chromosome, the sites outside CpG islands were heavily hypermethylated to the same degree in both genders. Nearly half of X chromosome promoters were either hypomethylated or hypermethylated in both females and males. Nearly one third of X chromosome promoters were a mixture of hypomethylated and heterogeneously methylated sites in females and were hypomethylated in males. Thus, a large fraction of genes that are silenced on the inactive X chromosome are hypomethylated in their promoter regions. These genes frequently belong to the evolutionarily younger strata of the X chromosome. The promoters that were hypomethylated at more than two sites contained most of the genes that escaped silencing on the inactive X chromosome. The overall levels of expression of X-linked genes were indistinguishable in females and males, regardless of the methylation state of the inactive X chromosome. Thus, in addition to DNA methylation, other factors are involved in the fine tuning of gene dosage compensation in neutrophils.

    View details for DOI 10.1073/pnas.0914812107

    View details for Web of Science ID 000275130900077

    View details for PubMedID 20133578

  • Close association of RNA polymerase II and many transcription factors with Pol III genes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raha, D., Wang, Z., Moqtaderi, Z., Wu, L., Zhong, G., Gerstein, M., Struhl, K., Snyder, M. 2010; 107 (8): 3639-3644


    Transcription of eukaryotic genomes is carried out by three distinct RNA polymerases (I, II, and III), whereby each polymerase is thought to independently transcribe a distinct set of genes. To investigate a possible relationship of RNA polymerases II and III, we mapped their in vivo binding sites throughout the human genome by using ChIP-Seq in two different cell lines, GM12878 and K562 cells. Pol III was found to bind near many known genes as well as several previously unidentified target genes. RNA-Seq studies indicate that a majority of the bound genes are expressed, although a subset are not, suggestive of stalling by RNA polymerase III. Pol II was found to bind near many known Pol III genes, including tRNA, U6, HVG, hY, 7SK and previously unidentified Pol III target genes. Similarly, in vivo binding studies also reveal that a number of transcription factors normally associated with Pol II transcription, including c-Fos, c-Jun and c-Myc, also tightly associate with most Pol III-transcribed genes. Inhibition of Pol II activity using alpha-amanitin reduced expression of a number of Pol III genes (e.g., U6, hY, HVG), suggesting that Pol II plays an important role in regulating their transcription. These results indicate that, contrary to previous expectations, polymerases can often work with one another to globally coordinate gene expression.

    View details for DOI 10.1073/pnas.0911315106

    View details for Web of Science ID 000275130900066

    View details for PubMedID 20139302

  • Deciphering Protein Kinase Specificity Through Large-Scale Analysis of Yeast Phosphorylation Site Motifs SCIENCE SIGNALING Mok, J., Kim, P. M., Lam, H. Y., Piccirillo, S., Zhou, X., Jeschke, G. R., Sheridan, D. L., Parker, S. A., Desai, V., Jwa, M., Cameroni, E., Niu, H., Good, M., Remenyi, A., Ma, J. N., Sheu, Y., Sassi, H. E., Sopko, R., Chan, C. S., De Virgilio, C., Hollingsworth, N. M., Lim, W. A., Stern, D. F., Stillman, B., Andrews, B. J., Gerstein, M. B., Snyder, M., Turk, B. E. 2010; 3 (109)


    Phosphorylation is a universal mechanism for regulating cell behavior in eukaryotes. Although protein kinases target short linear sequence motifs on their substrates, the rules for kinase substrate recognition are not completely understood. We used a rapid peptide screening approach to determine consensus phosphorylation site motifs targeted by 61 of the 122 kinases in Saccharomyces cerevisiae. By correlating these motifs with kinase primary sequence, we uncovered previously unappreciated rules for determining specificity within the kinase family, including a residue determining P-3 arginine specificity among members of the CMGC [CDK (cyclin-dependent kinase), MAPK (mitogen-activated protein kinase), GSK (glycogen synthase kinase), and CDK-like] group of kinases. Furthermore, computational scanning of the yeast proteome enabled the prediction of thousands of new kinase-substrate relationships. We experimentally verified several candidate substrates of the Prk1 family of kinases in vitro and in vivo and identified a protein substrate of the kinase Vhs1. Together, these results elucidate how kinase catalytic domains recognize their phosphorylation targets and suggest general avenues for the identification of previously unknown kinase substrates across eukaryotes.

    View details for DOI 10.1126/scisignal.2000482

    View details for Web of Science ID 000275647900005

    View details for PubMedID 20159853

  • Genome-Wide Identification of Binding Sites Defines Distinct Functions for Caenorhabditis elegans PHA-4/FOXA in Development and Environmental Response PLOS GENETICS Zhong, M., Niu, W., Lu, Z. J., Sarov, M., Murray, J. I., Janette, J., Raha, D., Sheaffer, K. L., Lam, H. Y., Preston, E., Slightham, C., Hillier, L. W., Brock, T., Agarwal, A., Auerbach, R., Hyman, A. A., Gerstein, M., Mango, S. E., Kim, S. K., Waterston, R. H., Reinke, V., Snyder, M. 2010; 6 (2)


    Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.

    View details for DOI 10.1371/journal.pgen.1000848

    View details for Web of Science ID 000275262700016

    View details for PubMedID 20174564

  • Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library NATURE BIOTECHNOLOGY Lam, H. Y., Mu, X. J., Stuetz, A. M., Tanzer, A., Cayting, P. D., Snyder, M., Kim, P. M., Korbel, J. O., Gerstein, M. B. 2010; 28 (1): 47-U76


    Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.

    View details for DOI 10.1038/nbt.1600

    View details for Web of Science ID 000273430400020

    View details for PubMedID 20037582



    Much of eukaryotic gene regulation is mediated by binding of transcription factors near or within their target genes. Transcription factor binding sites (TFBS) are often identified globally using chromatin immunoprecipitation (ChIP) in which specific protein-DNA interactions are isolated using an antibody against the factor of interest. Coupling ChIP with high-throughput DNA sequencing allows identification of TFBS in a direct, unbiased fashion; this technique is termed ChIP-Sequencing (ChIP-Seq). In this chapter, we describe the yeast ChIP-Seq procedure, including the protocols for ChIP, input DNA preparation, and Illumina DNA sequencing library preparation. Descriptions of Illumina sequencing and data processing and analysis are also included. The use of multiplex short-read sequencing (i.e., barcoding) enables the analysis of many ChIP samples simultaneously, which is especially valuable for organisms with small genomes such as yeast.

    View details for DOI 10.1016/S0076-6879(10)70004-5

    View details for Web of Science ID 000275827900004

    View details for PubMedID 20946807

  • RNA-Seq: a method for comprehensive transcriptome analysis. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Nagalakshmi, U., Waern, K., Snyder, M. 2010; Chapter 4: Unit 4.11.1-13


    A recently developed technique called RNA Sequencing (RNA-Seq) uses massively parallel sequencing to allow transcriptome analyses of genomes at a far higher resolution than is available with Sanger sequencing- and microarray-based methods. In the RNA-Seq method, complementary DNAs (cDNAs) generated from the RNA of interest are directly sequenced using next-generation sequencing technologies. The reads obtained from this can then be aligned to a reference genome in order to construct a whole-genome transcriptome map. RNA-Seq has been used successfully to precisely quantify transcript levels, confirm or revise previously annotated 5' and 3' ends of genes, and map exon/intron boundaries. This unit describes protocols for performing RNA-Seq using the Illumina sequencing platform.

    View details for DOI 10.1002/0471142727.mb0411s89

    View details for PubMedID 20069539

  • Systems biology approaches to disease marker discovery DISEASE MARKERS Sharon, D., Chen, R., Snyder, M. 2010; 28 (4): 209-224


    Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.

    View details for DOI 10.3233/DMA-2010-0707

    View details for Web of Science ID 000279321200003

    View details for PubMedID 20534906

  • EBNA1 regulates cellular gene expression by binding cellular promoters PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Canaan, A., Haviv, I., Urban, A. E., Schulz, V. P., Hartman, S., Zhang, Z., Palejev, D., Deisseroth, A. B., Lacy, J., Snyder, M., Gerstein, M., Weissman, S. M. 2009; 106 (52): 22421-22426


    Epstein-Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV-associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 promoters most enriched revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.

    View details for DOI 10.1073/pnas.0911676106

    View details for Web of Science ID 000273178700069

    View details for PubMedID 20080792

  • Mapping accessible chromatin regions using Sono-Seq PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Auerbach, R. K., Euskirchen, G., Rozowsky, J., Lamarre-Vincent, N., Moqtaderi, Z., Lefrancois, P., Struhl, K., Gerstein, M., Snyder, M. 2009; 106 (35): 14926-14931


    Disruptions in local chromatin structure often indicate features of biological interest such as regulatory regions. We find that sonication of cross-linked chromatin, when combined with a size-selection step and massively parallel short-read sequencing, can be used as a method (Sono-Seq) to map locations of high chromatin accessibility in promoter regions. Sono-Seq sites frequently correspond to actively transcribed promoter regions, as evidenced by their co-association with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation (H3K4me3) marks, and CpG islands; signals over other sites, such as those bound by the CTCF insulator, are also observed. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, that observed for FAIRE and DNase I hypersensitive sites. Our results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure. Furthermore, our results provide insights into the mapping of binding sites by using ChIP-Seq experiments and the value of reference samples that should be used in such experiments.

    View details for DOI 10.1073/pnas.0905443106

    View details for Web of Science ID 000269481000036

    View details for PubMedID 19706456

  • Global analysis of the glycoproteome in Saccharomyces cerevisiae reveals new roles for protein glycosylation in eukaryotes MOLECULAR SYSTEMS BIOLOGY Kung, L. A., Tao, S., Qian, J., Smith, M. G., Snyder, M., Zhu, H. 2009; 5


    To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by mobility shift upon treatment with EndoH and PNGase F, thereby extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, the localization of four mitochondrial proteins was examined in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two proteins, localization to the mitochondria is diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.

    View details for DOI 10.1038/msb.2009.64

    View details for Web of Science ID 000270456400002

    View details for PubMedID 19756047

  • Impact of Chromatin Structures on DNA Processing for Genomic Analyses PLOS ONE Teytelman, L., Oezaydin, B., Zill, O., Lefrancois, P., Snyder, M., Rine, J., Eisen, M. B. 2009; 4 (8)


    Chromatin has an impact on recombination, repair, replication, and evolution of DNA. Here we report that chromatin structure also affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation (ChIP) experiments. We initially discovered this effect at the Saccharomyces cerevisiae HMR locus, where we found that silenced chromatin was refractory to shearing, relative to euchromatin. Using input samples from ChIP-Seq studies, we detected a similar bias throughout the heterochromatic portions of the yeast genome. We also observed significant chromatin-related effects at telomeres, protein binding sites, and genes, reflected in the variation of input-Seq coverage. Experimental tests of candidate regions showed that chromatin influenced shearing at some loci, and that chromatin could also lead to enriched or depleted DNA levels in prepared samples, independently of shearing effects. Our results suggested that assays relying on immunoprecipitation of chromatin will be biased by intrinsic differences between regions packaged into different chromatin structures - biases which have been largely ignored to date. These results established the pervasiveness of this bias genome-wide, and suggested that this bias can be used to detect differences in chromatin structures across the genome.

    View details for DOI 10.1371/journal.pone.0006700

    View details for Web of Science ID 000269267400008

    View details for PubMedID 19693276

  • Intrinsic histone-DNA interactions are not the major determinant of nucleosome positions in vivo NATURE STRUCTURAL & MOLECULAR BIOLOGY Zhang, Y., Moqtaderi, Z., Rattner, B. P., Euskirchen, G., Snyder, M., Kadonaga, J. T., Liu, X. S., Struhl, K. 2009; 16 (8): 847-U70


    We assess the role of intrinsic histone-DNA interactions by mapping nucleosomes assembled in vitro on genomic DNA. Nucleosomes strongly prefer yeast DNA over Escherichia coli DNA, indicating that the yeast genome evolved to favor nucleosome formation. Many yeast promoter and terminator regions intrinsically disfavor nucleosome formation, and nucleosomes assembled in vitro show strong rotational positioning. Nucleosome arrays generated by the ACF assembly factor have fewer nucleosome-free regions, reduced rotational positioning and less translational positioning than obtained by intrinsic histone-DNA interactions. Notably, nucleosomes assembled in vitro have only a limited preference for specific translational positions and do not show the pattern observed in vivo. Our results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.

    View details for DOI 10.1038/nsmb.1636

    View details for Web of Science ID 000268738700012

    View details for PubMedID 19620965

  • The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Tirosh-Wagner, T., Urban, A. E., Chen, X., Kasowski, M., Dai, L., Grubert, F., Erdman, C., Gao, M. C., Lange, K., Sobel, E. M., Barlow, G. M., Aylsworth, A. S., Carpenter, N. J., Clark, R. D., Cohen, M. Y., Doran, E., Falik-Zaccai, T., Lewin, S. O., Lott, I. T., McGillivray, B. C., Moeschler, J. B., Pettenati, M. J., Pueschel, S. M., Rao, K. W., Shaffer, L. G., Shohat, M., Van Riper, A. J., Warburton, D., Weissman, S., Gerstein, M. B., Snyder, M., Korenberg, J. R. 2009; 106 (29): 12031-12036


    Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. By using state-of-the-art genomics technologies we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations, including acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci as well as specific hypotheses that have been put forward in relation to the etiology of DS, i.e., the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.

    View details for DOI 10.1073/pnas.0813248106

    View details for Web of Science ID 000268178400040

    View details for PubMedID 19597142

  • Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants PLOS COMPUTATIONAL BIOLOGY Du, J., Bjornson, R. D., Zhang, Z. D., Kong, Y., Snyder, M., Gerstein, M. B. 2009; 5 (7)


    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

    View details for DOI 10.1371/journal.pcbi.1000432

    View details for Web of Science ID 000269220100023

    View details for PubMedID 19593373

  • Recommendations from the 2008 International Summit on Proteomics Data Release and Sharing Policy: The Amsterdam Principles JOURNAL OF PROTEOME RESEARCH Rodriguez, H., Snyder, M., Uhlen, M., Andrews, P., Beavis, R., Borchers, C., Chalkley, R. J., Cho, S. Y., Cottingham, K., Dunn, M., Dylag, T., Edgar, R., Hare, P., Heck, A. J., Hirsch, R. F., Kennedy, K., Kolar, P., Kraus, H., Mallick, P., Nesvizhskii, A., Ping, P., Ponten, F., Yang, L., Yates, J. R., Stein, S. E., Hermjakob, H., Kinsinger, C. R., Apweiler, R. 2009; 8 (7): 3689-3692


    Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the "International Summit on Proteomics Data Release and Sharing Policy" in Amsterdam, The Netherlands, to identify and address potential roadblocks to rapid and open access to data. The six principles agreed upon by key stakeholders at the summit addressed issues surrounding (1) timing, (2) comprehensiveness, (3) format, (4) deposition to repositories, (5) quality metrics, and (6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.

    View details for DOI 10.1021/pr900023z

    View details for Web of Science ID 000267694600043

    View details for PubMedID 19344107

  • Unlocking the secrets of the genome NATURE Celniker, S. E., Dillon, L. A., Gerstein, M. B., Gunsalus, K. C., Henikoff, S., Karpen, G. H., Kellis, M., Lai, E. C., Lieb, J. D., MacAlpine, D. M., Micklem, G., Piano, F., Snyder, M., Stein, L., White, K. P., Waterston, R. H. 2009; 459 (7249): 927-930

    View details for DOI 10.1038/459927a

    View details for Web of Science ID 000267063500031

    View details for PubMedID 19536255

  • Dynamic and complex transcription factor binding during an inducible response in yeast GENES & DEVELOPMENT Ni, L., Bruce, C., Hart, C., Leigh-Bell, J., Gelperin, D., Umansky, L., Gerstein, M. B., Snyder, M. 2009; 23 (11): 1351-1363


    Complex biological processes are often regulated, at least in part, by the binding of transcription factors to their targets. Recently, considerable effort has been made to analyze the binding of relevant factors to the suite of targets they regulate, thereby generating a regulatory circuit map. However, for most studies the dynamics of binding have not been analyzed, and thus the temporal order of events and mechanisms by which this occurs are poorly understood. We globally analyzed in detail the temporal order of binding of several key factors involved in the salt response of yeast to their target genes. Analysis of Yap4 and Sko1 binding to their target genes revealed multiple temporal classes of binding patterns: (1) constant binding, (2) rapid induction, (3) slow induction, and (4) transient induction. These results demonstrate that individual transcription factors can have multiple binding patterns and help define the different types of temporal binding patterns used in eukaryotic gene regulation. To investigate these binding patterns further, we also analyzed the binding of seven other key transcription factors implicated in osmotic regulation, including Hot1, Msn1, Msn2, Msn4, Skn7, and Yap6, and found significant coassociation among the different factors at their gene targets. Moreover, the binding of several key factors was correlated with distinct classes of Yap4- and Sko1-binding patterns and with distinct types of genes. Gene expression studies revealed association of Yap4, Sko1, and other transcription factor-binding patterns with different gene expression patterns. The integration and analysis of binding and expression information reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.

    View details for DOI 10.1101/gad.1781909

    View details for Web of Science ID 000266524100009

    View details for PubMedID 19487574

  • Integrated analysis of co-expressed MAP kinase substrates in Arabidopsis thaliana. Plant signaling & behavior Popescu, S. C., Popescu, G. V., Snyder, M., Dinesh-Kumar, S. P. 2009; 4 (6): 524-527

    View details for PubMedID 19816141

  • Distinct Genomic Aberrations Associated with ERG Rearranged Prostate Cancer GENES CHROMOSOMES & CANCER Demichelis, F., Setlur, S. R., Beroukhim, R., Perner, S., Korbel, J. O., LaFargue, C. J., Pflueger, D., Pina, C., Hofer, M. D., Sboner, A., Svensson, M. A., Rickman, D. S., Urban, A., Snyder, M., Meyerson, M., Lee, C., Gerstein, M. B., Kuefer, R., Rubin, M. A. 2009; 48 (4): 366-380


    Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered three genomic events associated with ERG rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in nonrearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pin-point breakpoints leading to fusion events. This study provides further support to define a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements.

    View details for DOI 10.1002/gcc.20647

    View details for Web of Science ID 000263572700007

    View details for PubMedID 19156837

  • A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster BLOOD Zhang, X., Lian, Z., Padden, C., Gerstein, M. B., Rozowsky, J., Snyder, M., Gingeras, T. R., Kapranov, P., Weissman, S. M., Newburger, P. E. 2009; 113 (11): 2526-2534


    We have identified an intergenic transcriptional activity that is located between the human HOXA1 and HOXA2 genes, shows myeloid-specific expression, and is up-regulated during granulocytic differentiation. The novel gene, termed HOTAIRM1 (HOX antisense intergenic RNA myeloid 1), is transcribed antisense to the HOXA genes and originates from the same CpG island that embeds the start site of HOXA1. The transcript appears to be a noncoding RNA containing no long open-reading frame; sucrose gradient analysis shows no association with polyribosomal fractions. HOTAIRM1 is the most prominent intergenic transcript expressed and up-regulated during induced granulocytic differentiation of NB4 promyelocytic leukemia and normal human hematopoietic cells; its expression is specific to the myeloid lineage. Its induction during retinoic acid (RA)-driven granulocytic differentiation is through RA receptor and may depend on the expression of myeloid cell development factors targeted by RA signaling. Knockdown of HOTAIRM1 quantitatively blunted RA-induced expression of HOXA1 and HOXA4 during the myeloid differentiation of NB4 cells, and selectively attenuated induction of transcripts for the myeloid differentiation genes CD11b and CD18, but did not noticeably impact the more distal HOXA genes. These findings suggest that HOTAIRM1 plays a role in myelopoiesis through modulation of gene expression in the HOXA cluster.

    View details for DOI 10.1182/blood-2008-06-162164

    View details for Web of Science ID 000264110600021

    View details for PubMedID 19144990

  • A high throughput embryonic stem cell screen identifies Oct-2 as a bifunctional regulator of neuronal differentiation GENES & DEVELOPMENT Theodorou, E., Dalembert, G., Heffelfinger, C., White, E., Weissman, S., Corcoran, L., Snyder, M. 2009; 23 (5): 575-588


    Neuronal differentiation is a complex process that involves a plethora of regulatory steps. To identify transcription factors that influence neuronal differentiation we developed a high throughput screen using embryonic stem (ES) cells. Seven hundred human transcription factor clones were stably introduced into mouse ES (mES) cells and screened for their ability to induce neuronal differentiation of mES cells. Twenty-four factors that are capable of inducing neuronal differentiation were identified, including four known effectors of neuronal differentiation, 11 factors with limited evidence of involvement in regulating neuronal differentiation, and nine novel factors. One transcription factor, Oct-2, was studied in detail and found to be a bifunctional regulator: it can either repress or induce neuronal differentiation, depending on the particular isoform. Ectopic expression experiments demonstrate that isoform Oct-2.4 represses neuronal differentiation, whereas Oct-2.2 activates neuron formation. Consistent with a role in neuronal differentiation, Oct-2.2 expression is induced during differentiation, and cells depleted of Oct-2 and its homolog Oct-1 have a reduced capacity to differentiate into neurons. Our results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.

    View details for DOI 10.1101/gad.1772509

    View details for Web of Science ID 000263918500005

    View details for PubMedID 19270158

  • Quantifying environmental adaptation of metabolic pathways in metagenomics PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gianoulis, T. A., Raes, J., Patel, P. V., Bjornson, R., Korbel, J. O., Letunic, I., Yamada, T., Paccanaro, A., Jensen, L. J., Snyder, M., Bork, P., Gerstein, M. B. 2009; 106 (5): 1374-1379


    Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.

    View details for DOI 10.1073/pnas.0808022106

    View details for Web of Science ID 000263074600018

    View details for PubMedID 19164758

  • Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing BMC GENOMICS Lefrancois, P., Euskirchen, G. M., Auerbach, R. K., Rozowsky, J., Gibson, T., Yellman, C. M., Gerstein, M., Snyder, M. 2009; 10


    Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA Pol II) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for Pol II. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously. We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.

    View details for DOI 10.1186/1471-2164-10-37

    View details for Web of Science ID 000264970100002

    View details for PubMedID 19159457
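
    The multiplexing idea in the abstract above (several barcoded samples sequenced together, then separated for analysis) can be illustrated with a minimal demultiplexing sketch. This is an assumption-laden illustration, not the authors' pipeline: the barcode sequences, the barcode position at the start of each read, and the function name `demultiplex` are all hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; the real adapter sequences
# are not given in the abstract.
BARCODES = {"ACGT": "Ste12", "TGCT": "Cse4", "GATT": "PolII", "CTAT": "input"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by their leading barcode and trim the barcode off.

    Assumes every barcode has the same length and sits at the 5' end
    of the read. Reads with an unknown barcode are kept, untrimmed,
    under the key "unassigned".
    """
    bc_len = len(next(iter(barcodes)))
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:bc_len])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[bc_len:])
    return dict(by_sample)
```

    After this step each sample's reads could be aligned and scored separately, with the input-DNA reads serving as the shared reference sample.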

  • Proteomic-Based Detection of a Protein Cluster Dysregulated during Cardiovascular Development Identifies Biomarkers of Congenital Heart Defects PLOS ONE Nath, A. K., Krauthammer, M., Li, P., Davidov, E., Butler, L. C., Copel, J., Katajamaa, M., Oresic, M., Buhimschi, I., Buhimschi, C., Snyder, M., Madri, J. A. 2009; 4 (1)


    Cardiovascular development is vital for embryonic survival and growth. Early gestation embryo loss or malformation has been linked to yolk sac vasculopathy and congenital heart defects (CHDs). However, the molecular pathways that underlie these structural defects in humans remain largely unknown, hindering the development of molecular-based diagnostic tools and novel therapies. Murine embryos were exposed to high glucose, a condition known to induce cardiovascular defects in both animal models and humans. We further employed a mass spectrometry-based proteomics approach to identify proteins differentially expressed in embryos with defects from those with normal cardiovascular development. The proteins detected by mass spectrometry (WNT16, ST14, Pcsk1, Jumonji, Morca2a, TRPC5, and others) were validated by Western blotting and immunofluorescent staining of the yolk sac and heart. The proteins within the proteomic dataset clustered to adhesion/migration, differentiation, transport, and insulin signaling pathways. A functional role for several proteins (WNT16, ADAM15 and NOGO-A/B) was demonstrated in an ex vivo model of heart development. Additionally, a successful application of a cluster of protein biomarkers (WNT16, ST14 and Pcsk1) as a prenatal screen for CHDs was confirmed in a study of human amniotic fluid (AF) samples from women carrying normal fetuses and those with CHDs. The novel finding that WNT16, ST14 and Pcsk1 protein levels increase in fetuses with CHDs suggests that these proteins may play a role in the etiology of human CHDs. The information gained through this bedside-to-bench translational approach contributes to a more complete understanding of the protein pathways dysregulated during cardiovascular development and provides novel avenues for diagnostic and therapeutic interventions, beneficial to fetuses at risk for CHDs.

    View details for DOI 10.1371/journal.pone.0004221

    View details for Web of Science ID 000265481900004

    View details for PubMedID 19156209

  • Three Distinct Condensin Complexes Control C. elegans Chromosome Dynamics CURRENT BIOLOGY Csankovszki, G., Collette, K., Spahl, K., Carey, J., Snyder, M., Petty, E., Patel, U., Tabuchi, T., Liu, H., Mcleod, I., Thompson, J., Sarkesik, A., Yates, J., Meyer, B. J., Hagstrom, K. 2009; 19 (1): 9-19


    Condensin complexes organize chromosome structure and facilitate chromosome segregation. Higher eukaryotes have two complexes, condensin I and condensin II, each essential for chromosome segregation. The nematode Caenorhabditis elegans was considered an exception, because it has a mitotic condensin II complex but appeared to lack mitotic condensin I. Instead, its condensin I-like complex (here called condensin I(DC)) dampens gene expression along hermaphrodite X chromosomes during dosage compensation. Here we report the discovery of a third condensin complex, condensin I, in C. elegans. We identify new condensin subunits and show that each complex has a conserved five-subunit composition. Condensin I differs from condensin I(DC) by only a single subunit. Yet condensin I binds to autosomes and X chromosomes in both sexes to promote chromosome segregation, whereas condensin I(DC) binds specifically to X chromosomes in hermaphrodites to regulate transcript levels. Both condensin I and II promote chromosome segregation, but associate with different chromosomal regions during mitosis and meiosis. Unexpectedly, condensin I also localizes to regions of cohesion between meiotic chromosomes before their segregation. We demonstrate that condensin subunits in C. elegans form three complexes, one that functions in dosage compensation and two that function in mitosis and meiosis. These results highlight how the duplication and divergence of condensin subunits during evolution may facilitate their adaptation to specialized chromosomal roles and illustrate the versatility of condensins to function in both gene regulation and chromosome segregation.

    View details for DOI 10.1016/j.cub.2008.12.006

    View details for Web of Science ID 000262584100022

    View details for PubMedID 19119011

  • PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data GENOME BIOLOGY Korbel, J. O., Abyzov, A., Mu, X. J., Carriero, N., Cayting, P., Zhang, Z., Snyder, M., Gerstein, M. B. 2009; 10 (2)


    Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.

    View details for DOI 10.1186/gb-2009-10-2-r23

    View details for Web of Science ID 000266345600020

    View details for PubMedID 19236709

  • RNA-Seq: a revolutionary tool for transcriptomics NATURE REVIEWS GENETICS Wang, Z., Gerstein, M., Snyder, M. 2009; 10 (1): 57-63


    RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

    View details for DOI 10.1038/nrg2484

    View details for Web of Science ID 000261866500012

    View details for PubMedID 19015660

  • MAPK target networks in Arabidopsis thaliana revealed using functional protein microarrays GENES & DEVELOPMENT Popescu, S. C., Popescu, G. V., Bachan, S., Zhang, Z., Gerstein, M., Snyder, M., Dinesh-Kumar, S. P. 2009; 23 (1): 80-92


    Signaling through mitogen-activated protein kinase (MPK) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs). However, to date only a limited number of MKK-MPK interactions and MPK phosphorylation substrates have been revealed. We determined which Arabidopsis thaliana MKKs preferentially activate 10 different MPKs in vivo and used the activated MPKs to probe high-density protein microarrays to determine their phosphorylation targets. Our analyses revealed known and novel signaling modules encompassing 570 MPK phosphorylation substrates; these substrates were enriched in transcription factors involved in the regulation of development, defense, and stress responses. Selected MPK substrates were validated by in planta reconstitution experiments. A subset of activated and wild-type MKKs induced cell death, indicating a possible role for these MKKs in the regulation of cell death. Interestingly, MKK7- and MKK9-induced death requires Sgt1, a known regulator of cell death induced during plant innate immunity. Our predicted MKK-MPK phosphorylation network constitutes a valuable resource to understand the function and specificity of MPK signaling systems.

    View details for DOI 10.1101/gad.1740009

    View details for Web of Science ID 000262369700008

    View details for PubMedID 19095804

  • PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls NATURE BIOTECHNOLOGY Rozowsky, J., Euskirchen, G., Auerbach, R. K., Zhang, Z. D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., Gerstein, M. B. 2009; 27 (1): 66-75


    Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal, which is caused by open chromatin and revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.

    View details for DOI 10.1038/nbt.1518

    View details for Web of Science ID 000262471200025

    View details for PubMedID 19122651

  • Protein microarrays. Methods in molecular biology (Clifton, N.J.) Fasolo, J., Snyder, M. 2009; 548: 209-222


    Protein microarrays containing nearly the entire yeast proteome have been constructed. They are typically prepared by overexpression and high-throughput purification and printing onto microscope slides. The arrays can be used to screen nearly the entire proteome in an unbiased fashion and have enormous utility for a variety of applications. These include protein-protein interactions, identification of novel lipid- and nucleic acid-binding proteins, and finding targets of small molecules, protein kinases, and other modification enzymes. Protein microarrays are thus powerful tools for individual studies as well as systematic characterization of proteins and their biochemical activities and regulation.

    View details for DOI 10.1007/978-1-59745-540-4_12

    View details for PubMedID 19521827

  • Global identification of protein kinase substrates by protein microarray analysis NATURE PROTOCOLS Mok, J., Im, H., Snyder, M. 2009; 4 (12): 1820-1827


    Herein, we describe a protocol for the global identification of in vitro substrates targeted by protein kinases using protein microarray technology. Large numbers of fusion proteins tagged at their carboxy-termini are purified in 96-well format and spotted in duplicate onto amino-silane-coated slides in a spatially addressable manner. These arrays are incubated in the presence of purified kinase and radiolabeled ATP, and then washed, dried and analyzed by autoradiography. The extent of phosphorylation of each spot is quantified and normalized, and proteins that are reproducibly phosphorylated in the presence of the active kinase relative to control slides are scored as positive substrates. This approach enables the rapid determination of kinase-substrate relationship on a proteome-wide scale, and although developed using yeast, has since been adapted to higher eukaryotic systems. Expression, purification and printing of the yeast proteome require about 3 weeks. Afterwards, each kinase assay takes approximately 3 h to perform.

    View details for DOI 10.1038/nprot.2009.194

    View details for Web of Science ID 000274226100011

    View details for PubMedID 20010933

  • MSB: A mean-shift-based approach for the analysis of structural variation in the genome GENOME RESEARCH Wang, L., Abyzov, A., Korbel, J. O., Snyder, M., Gerstein, M. 2009; 19 (1): 106-117


    Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining this. Drawing relevant conclusions from array-CGH requires computational methods for partitioning the chromosome into segments of elevated, reduced, or unchanged copy number. Several approaches have been described, most of which attempt to explicitly model the underlying distribution of data based on particular assumptions. Often, they optimize likelihood functions for estimating model parameters, by expectation maximization or related approaches; however, this requires good parameter initialization through prespecifying the number of segments. Moreover, convergence is difficult to achieve, since many parameters are required to characterize an experiment. To overcome these limitations, we propose a nonparametric method without a global criterion to be optimized. Our method involves mean-shift-based (MSB) procedures; it considers the observed array-CGH signal as sampling from a probability-density function, uses a kernel-based approach to estimate local gradients for this function, and iteratively follows them to determine local modes of the signal. Overall, our method achieves robust discontinuity-preserving smoothing, thus accurately segmenting chromosomes into regions of duplication and deletion. It does not require the number of segments as input, nor does its convergence depend on this. We successfully applied our method to both simulated data and array-CGH experiments on glioblastoma and adenocarcinoma. We show that it performs at least as well as, and often better than, 10 previously published algorithms. Finally, we show that our approach can be extended to segmenting the signal resulting from the depth-of-coverage of mapped reads from next-generation sequencing.

    View details for DOI 10.1101/gr.080069.108

    View details for Web of Science ID 000262200000010

    View details for PubMedID 19037015
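
    The mean-shift idea summarized in the abstract above (treat the signal as samples from a density, estimate local gradients with a kernel, and follow them to local modes) can be sketched for a one-dimensional signal. This is a minimal illustration only, not the published MSB implementation: the Gaussian kernel, the bandwidths `h_pos` and `h_val`, the jump threshold `min_jump`, and both function names are assumptions chosen for the example.

```python
import math


def mean_shift_smooth(signal, h_pos=3.0, h_val=0.3, n_iter=50, tol=1e-6):
    """Discontinuity-preserving smoothing via joint-domain mean shift.

    Each probe starts at (index, value) and is moved repeatedly to the
    Gaussian-weighted mean of all probes until it stops moving. Probes
    that differ strongly in value get little weight, so sharp steps
    are preserved. Bandwidths are hypothetical, not from the paper.
    """
    n = len(signal)
    smoothed = []
    for i in range(n):
        x, y = float(i), float(signal[i])
        for _ in range(n_iter):
            wsum = xsum = ysum = 0.0
            for j in range(n):
                dx = (j - x) / h_pos
                dy = (signal[j] - y) / h_val
                w = math.exp(-0.5 * (dx * dx + dy * dy))
                wsum += w
                xsum += w * j
                ysum += w * signal[j]
            new_x, new_y = xsum / wsum, ysum / wsum
            shift = abs(new_x - x) + abs(new_y - y)
            x, y = new_x, new_y
            if shift < tol:
                break
        smoothed.append(y)  # value coordinate of the mode reached
    return smoothed


def segment(smoothed, min_jump=0.2):
    """Return (start, end) index pairs, splitting where adjacent
    smoothed values differ by more than min_jump."""
    bounds = [0]
    for i in range(1, len(smoothed)):
        if abs(smoothed[i] - smoothed[i - 1]) > min_jump:
            bounds.append(i)
    bounds.append(len(smoothed))
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]
```

    Note that, as in the abstract, the number of segments is not supplied as input; it falls out of where the smoothed signal jumps. The sketch is quadratic in the number of probes per iteration, so it is only suitable for small illustrative inputs.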

  • Mismatch oligonucleotides in human and yeast: guidelines for probe design on tiling microarrays BMC GENOMICS Seringhaus, M., Rozowsky, J., Royce, T., Nagalakshmi, U., Jee, J., Snyder, M., Gerstein, M. 2008; 9


    Mismatched oligonucleotides are widely used on microarrays to differentiate specific from nonspecific hybridization. While many experiments rely on such oligos, the hybridization behavior of various degrees of mismatch (MM) structure has not been extensively studied. Here, we present the results of two large-scale microarray experiments on S. cerevisiae and H. sapiens genomic DNA, to explore MM oligonucleotide behavior with real sample mixtures under tiling-array conditions. We examined all possible nucleotide substitutions at the central position of 36-nucleotide probes, and found that nonspecific binding by MM oligos depends upon the individual nucleotide substitutions they incorporate: C-->A, C-->G and T-->A (yielding purine-purine mispairs) are most disruptive, whereas A-->X substitutions were least disruptive. We also quantify a marked GC skew effect: substitutions raising probe GC content exhibit higher intensity (and vice versa). This skew is small in highly-expressed regions (+/- 0.5% of total intensity range) and large (+/- 2% or more) elsewhere. Multiple mismatches per oligo are largely additive in effect: each MM added in a distributed fashion causes an additional 21% intensity drop relative to PM, three-fold more disruptive than adding adjacent mispairs (7% drop per MM). We investigate several parameters for oligonucleotide design, including the effects of each central nucleotide substitution on array signal intensity and of multiple MM per oligo. To avoid GC skew, individual substitutions should not alter probe GC content. RNA sample mixture complexity may increase the amount of nonspecific hybridization, magnify GC skew and boost the intensity of MM oligos at all levels.

    View details for DOI 10.1186/1471-2164-9-635

    View details for Web of Science ID 000264109200001

    View details for PubMedID 19117516

  • Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history GENOME RESEARCH Kim, P. M., Lam, H. Y., Urban, A. E., Korbel, J. O., Affourtit, J., Grubert, F., Chen, X., Weissman, S., Snyder, M., Gerstein, M. B. 2008; 18 (12): 1865-1874


    Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which is decreasing for younger SDs and absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then by the newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.

    View details for DOI 10.1101/gr.081422.108

    View details for Web of Science ID 000261398900002

    View details for PubMedID 18842824

  • Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding GENOME RESEARCH Robertson, A. G., Bilenky, M., Tam, A., Zhao, Y., Zeng, T., Thiessen, N., Cezard, T., Fejes, A. P., Wederell, E. D., Cullum, R., Euskirchen, G., Krzywinski, M., Birol, I., Snyder, M., Hoodless, P. A., Hirst, M., Marra, M. A., Jones, S. J. 2008; 18 (12): 1906-1917


    We characterized the relationship of H3K4me1 and H3K4me3 at distal and proximal regulatory elements by comparing ChIP-seq profiles for these histone modifications and for two functionally different transcription factors: STAT1 in the immortalized HeLa S3 cell line, with and without interferon-gamma (IFNG) stimulation; and FOXA2 in mouse adult liver tissue. In unstimulated and stimulated HeLa cells, respectively, we determined approximately 270,000 and approximately 301,000 H3K4me1-enriched regions, and approximately 54,500 and approximately 76,100 H3K4me3-enriched regions. In mouse adult liver, we determined approximately 227,000 and approximately 34,800 H3K4me1 and H3K4me3 regions. Seventy-five percent of the approximately 70,300 STAT1 binding sites in stimulated HeLa cells and 87% of the approximately 11,000 FOXA2 sites in mouse liver were distal to known gene TSS; in both cell types, approximately 83% of these distal sites were associated with at least one of the two histone modifications, and H3K4me1 was associated with over 96% of marked distal sites. After filtering against predicted transcription start sites, 50% of approximately 26,800 marked distal IFNG-stimulated STAT1 binding sites, but 95% of approximately 5800 marked distal FOXA2 sites, were associated with H3K4me1 only. Results for HeLa cells generated additional insights into transcriptional regulation involving STAT1. STAT1 binding was associated with 25% of all H3K4me1 regions in stimulated HeLa cells, suggesting that a single transcription factor can interact with an unexpectedly large fraction of regulatory regions. Strikingly, for a large majority of the locations of stimulated STAT1 binding, the dominant H3K4me1/me3 combinations were established before activation, suggesting mechanisms independent of IFNG stimulation and high-affinity STAT1 binding.

    View details for DOI 10.1101/gr.078519.108

    View details for Web of Science ID 000261398900006

    View details for PubMedID 18787082

  • High-Resolution Copy-Number Variation Map Reflects Human Olfactory Receptor Diversity and Evolution PLOS GENETICS Hasin, Y., Olender, T., Khen, M., Gonzaga-Jauregui, C., Kim, P. M., Urban, A. E., Snyder, M., Gerstein, M. B., Lancet, D., Korbel, J. O. 2008; 4 (11)


    Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (approximately 55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that approximately 50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.

    View details for DOI 10.1371/journal.pgen.1000249

    View details for Web of Science ID 000261481000004

    View details for PubMedID 18989455

  • A procedure for highly specific, sensitive, and unbiased whole-genome amplification PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Pan, X., Urban, A. E., Palejev, D., Schulz, V., Grubert, F., Hu, Y., Snyder, M., Weissman, S. M. 2008; 105 (40): 15499-15504


    Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a method using phi29 DNA polymerase and trehalose and optimized control of amplification to create micrograms of specific amplicons without TIPs from down to subfemtograms of DNA. With an input of as little as 0.5-2.5 ng of human gDNA or a few cells, the product could be close to native DNA in locus representation. The amplicons from 5 and 0.5 ng of DNA faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22 and even a homozygous deletion smaller than 1 kb with high-resolution chromosome-wide comparative genomic hybridization. With 550k Infinium BeadChip SNP typing, the >99.7% accuracy compared favorably with results on unamplified DNA. Importantly, underrepresentation of chromosome termini that occurred with GenomiPhi v2 was greatly rescued with the present procedure, and the call rate and accuracy of SNP typing were also improved for the amplicons with a 0.5-ng, partially degraded DNA input. In addition, the amplification proceeded logarithmically in terms of total yield before saturation; intact cells were amplified >50 times more efficiently than an equivalent amount of extracted DNA; and the locus imbalance for amplicons with 0.1 ng or lower input of DNA was variable, whereas for higher input it was largely reproducible. This procedure facilitates genomic analysis with single cells or other traces of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.

    View details for DOI 10.1073/pnas.0808028105

    View details for Web of Science ID 000260360500052

    View details for PubMedID 18832167

  • High-quality binary protein interaction map of the yeast interactome network SCIENCE Yu, H., Braun, P., Yildirim, M. A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., Hao, T., Rual, J., Dricot, A., Vazquez, A., Murray, R. R., Simon, C., Tardivo, L., Tam, S., Svrzikapa, N., Fan, C., De Smet, A., Motyl, A., Hudson, M. E., Park, J., Xin, X., Cusick, M. E., Moore, T., Boone, C., Snyder, M., Roth, F. P., Barabasi, A., Tavernier, J., Hill, D. E., Vidal, M. 2008; 322 (5898): 104-110


    Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a "second-generation" high-quality, high-throughput Y2H data set covering approximately 20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.

    View details for DOI 10.1126/science.1158684

    View details for Web of Science ID 000259680200048

    View details for PubMedID 18719252

  • Modeling ChIP Sequencing In Silico with Applications PLOS COMPUTATIONAL BIOLOGY Zhang, Z. D., Rozowsky, J., Snyder, M., Chang, J., Gerstein, M. 2008; 4 (8)


    ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.

    View details for DOI 10.1371/journal.pcbi.1000158

    View details for Web of Science ID 000260041300021

    View details for PubMedID 18725927

  • A genomic analysis of RNA polymerase II modification and chromatin architecture related to 3' end RNA polyadenylation GENOME RESEARCH Lian, Z., Karpikov, A., Lian, J., Mahajan, M. C., Hartman, S., Gerstein, M., Snyder, M., Weissman, S. M. 2008; 18 (8): 1224-1237


    Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3' end formation and transcription termination. We used a novel approach to prepare 3' end fragments from polyadenylated RNA, and mapped the position of the poly(A) addition site using oligonucleotide arrays tiling 1% of the human genome. This approach revealed more 3' ends than had been annotated. The distribution of these ends relative to RNA polymerase II (PolII) and di- and trimethylated lysine 4 and lysine 36 of histone H3 was compared. A substantial fraction of unannotated 3' ends of RNA are intronic and antisense to the embedding gene. Poly(A) ends of annotated messages lie on average 2 kb upstream of the end of PolII binding (termination). Near the termination sites, and in some internal sites, unphosphorylated and C-terminal domain (CTD) serine 2 phosphorylated PolII (POLR2A) accumulate, suggesting pausing of the polymerase and perhaps dephosphorylation prior to release. Lysine 36 trimethylation occurs across transcribed genes, sometimes alternating with stretches of DNA in which lysine 36 dimethylation is more prominent. Lysine 36 methylation decreases at or near the site of polyadenylation, sometimes disappearing before disappearance of phosphorylated RNA PolII or release of PolII from DNA. Our results suggest that transcription termination involves loss of histone 3 lysine 36 methylation and later release of RNA polymerase. The latter is often associated with polymerase pausing. Overall, our study reveals extensive sites of poly(A) addition and provides insights into the events that occur during 3' end formation.

    View details for DOI 10.1101/gr.075804.107

    View details for Web of Science ID 000258116100004

    View details for PubMedID 18487515

  • Genome-Wide Occupancy of SREBP1 and Its Partners NFY and SP1 Reveals Novel Functional Roles and Combinatorial Regulation of Distinct Classes of Genes PLOS GENETICS Reed, B. D., Charos, A. E., Szekely, A. M., Weissman, S. M., Snyder, M. 2008; 4 (7)


    The sterol regulatory element-binding protein (SREBP) family member SREBP1 is a critical transcriptional regulator of cholesterol and fatty acid metabolism and has been implicated in insulin resistance, diabetes, and other diet-related diseases. We globally identified the promoters occupied by SREBP1 and its binding partners NFY and SP1 in a human hepatocyte cell line using chromatin immunoprecipitation combined with genome tiling arrays (ChIP-chip). We find that SREBP1 occupies the promoters of 1,141 target genes involved in diverse biological pathways, including novel targets with roles in lipid metabolism and insulin signaling. We also identify a conserved SREBP1 DNA-binding motif in SREBP1 target promoters, and we demonstrate that many SREBP1 target genes are transcriptionally activated by treatment with insulin and glucose using gene expression microarrays. Finally, we show that SREBP1 cooperates extensively with NFY and SP1 throughout the genome and that unique combinations of these factors target distinct functional pathways. Our results provide insight into the regulatory circuitry in which SREBP1 and its network partners coordinate a complex transcriptional response in the liver with cues from the diet.

    View details for DOI 10.1371/journal.pgen.1000133

    View details for Web of Science ID 000260410600005

    View details for PubMedID 18654640

  • The transcriptional landscape of the yeast genome defined by RNA sequencing SCIENCE Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., Snyder, M. 2008; 320 (5881): 1344-1349


    The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

    View details for DOI 10.1126/science.1158441

    View details for Web of Science ID 000256441100046

    View details for PubMedID 18451266

  • The current excitement about copy-number variation: how it relates to gene duplications and protein families CURRENT OPINION IN STRUCTURAL BIOLOGY Korbel, J. O., Kim, P. M., Chen, X., Urban, A. E., Weissman, S., Snyder, M., Gerstein, M. B. 2008; 18 (3): 366-374


    Following recent technological advances there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs)--large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends are likely reflective of CNV formation biases and natural selection, both of which differentially influence distinct protein families.

    View details for DOI 10.1016/

    View details for Web of Science ID 000257539100013

    View details for PubMedID 18511261

  • Leptin affects endocardial cushion formation by modulating EMT and migration via Akt signaling cascades JOURNAL OF CELL BIOLOGY Nath, A. K., Brown, R. M., Michaud, M., Sierra-Honigmann, M. R., Snyder, M., Madri, J. A. 2008; 181 (2): 367-380


    Blood circulation is dependent on heart valves to direct blood flow through the heart and great vessels. Valve development relies on epithelial to mesenchymal transition (EMT), a central feature of embryonic development and metastatic cancer. Abnormal EMT and remodeling contribute to the etiology of several congenital heart defects. Leptin and its receptor were detected in the mouse embryonic heart. Using an ex vivo model of cardiac EMT, the inhibition of leptin results in a signal transducer and activator of transcription 3 and Snail/vascular endothelial cadherin-independent decrease in EMT and migration. Our data suggest that an Akt signaling pathway underlies the observed phenotype. Furthermore, loss of leptin phenocopied the functional inhibition of alphavbeta3 integrin receptor and resulted in decreased alphavbeta3 integrin and matrix metalloprotease 2, suggesting that the leptin signaling pathway is involved in adhesion and migration processes. This study adds leptin to the repertoire of factors that mediate EMT and, for the first time, demonstrates a role for the interleukin 6 family in embryonic EMT.

    View details for Web of Science ID 000255410300018

    View details for PubMedID 18411306

  • Myo2p, a class V myosin in budding yeast, associates with a large ribonucleic acid-protein complex that contains mRNAs and subunits of the RNA-processing body RNA-A PUBLICATION OF THE RNA SOCIETY Chang, W., Zaarour, R. F., Reck-Peterson, S., Rinn, J., Singer, R. H., Snyder, M., Novick, P., Mooseker, M. S. 2008; 14 (3): 491-502


    Myo2p is an essential class V myosin in budding yeast with several identified functions in organelle trafficking and spindle orientation. The present study demonstrates that Myo2p is a component of a large RNA-containing complex (Myo2p-RNP) that is distinct from polysomes based on sedimentation analysis and lack of ribosomal subunits in the Myo2p-RNP. Microarray analysis of RNAs that coimmunoprecipitate with Myo2p revealed the presence of a large number of mRNAs in this complex. The Myo2p-RNA complex is in part composed of the RNA processing body (P-body) based on coprecipitation with P-body protein subunits and partial colocalization of Myo2p with P-bodies. P-body disassembly is delayed in the motor mutant, myo2-66, indicating that Myo2p may facilitate the release of mRNAs from the P-body.

    View details for DOI 10.1261/rna.665008

    View details for Web of Science ID 000253565400012

    View details for PubMedID 18218704

  • Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome GENOME BIOLOGY QianWu, J., Du, J., Rozowsky, J., Zhang, Z., Urban, A. E., Euskirchen, G., Weissman, S., Gerstein, M., Snyder, M. 2008; 9 (1)


    Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

    View details for DOI 10.1186/gb-2008-9-1-r3

    View details for Web of Science ID 000253779800011

    View details for PubMedID 18173853

  • The development of protein microarrays and their applications in DNA-protein and protein-protein interaction analyses of Arabidopsis transcription factors MOLECULAR PLANT Gong, W., He, K., Covington, M., Dinesh-Kumar, S. P., Snyder, M., Harmer, S. L., Zhu, Y., Deng, X. W. 2008; 1 (1): 27-41


    We used our collection of Arabidopsis transcription factor (TF) ORFeome clones to construct protein microarrays containing as many as 802 TF proteins. These protein microarrays were used for both protein-DNA and protein-protein interaction analyses. For protein-DNA interaction studies, we examined AP2/ERF family TFs and their cognate cis-elements. By careful comparison of the DNA-binding specificity of 13 TFs on the protein microarray with previous non-microarray data, we showed that protein microarrays provide an efficient and high throughput tool for genome-wide analysis of TF-DNA interactions. This microarray protein-DNA interaction analysis allowed us to derive a comprehensive view of DNA-binding profiles of AP2/ERF family proteins in Arabidopsis. It also revealed four TFs that bound the EE (evening element) and had the expected phased gene expression under clock-regulation, thus providing a basis for further functional analysis of their roles in clock regulation of gene expression. We also developed procedures for detecting protein interactions using this TF protein microarray and discovered four novel partners that interact with HY5, which can be validated by yeast two-hybrid assays. Thus, plant TF protein microarrays offer an attractive high-throughput alternative to traditional techniques for TF functional characterization on a global scale.

    View details for DOI 10.1093/mp/ssm009

    View details for Web of Science ID 000259068900005

    View details for PubMedID 19802365

  • RNA polymerase II stalling: loading at the start prepares genes for a sprint GENOME BIOLOGY Wu, J. Q., Snyder, M. 2008; 9 (5)


    Stalling of RNA polymerase II near the promoter has recently been found to be much more common than previously thought. Genome-wide surveys of the phenomenon suggest that it is likely to be a rate-limiting control on gene activation that poises developmental and stimulus-responsive genes for prompt expression when inducing signals are received.

    View details for DOI 10.1186/gb-2008-9-5-220

    View details for Web of Science ID 000257564800002

    View details for PubMedID 18466645

  • Identification of differentially expressed proteins in ovarian cancer using high-density protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Hudson, M. E., Pozdnyakova, I., Haines, K., Mor, G., Snyder, M. 2007; 104 (44): 17494-17499


    Ovarian cancer is a leading cause of death, yet many aspects of the biology of the disease and a routine means of its detection are lacking. We have used protein microarrays and autoantibodies from cancer patients to identify proteins that are aberrantly expressed in ovarian tissue. Sera from 30 cancer patients and 30 healthy individuals were used to probe microarrays containing 5,005 human proteins. Ninety-four antigens were identified that exhibited enhanced reactivity from sera in cancer patients relative to control sera. The differential reactivity of four antigens was tested by using immunoblot analysis and tissue microarrays. Lamin A/C, SSRP1, and RALBP1 were found to exhibit increased expression in the cancer tissue relative to controls. The combined signals from multiple antigens proved to be a robust test to identify cancerous ovarian tissue. These antigens were also reactive with tissue from other types of cancer and thus are not specific to ovarian cancer. Overall our studies identified candidate tissue marker proteins for ovarian cancer and demonstrate that protein microarrays provide a powerful approach to identify proteins aberrantly expressed in disease states.

    View details for DOI 10.1073/pnas.0708572104

    View details for Web of Science ID 000250638400048

    View details for PubMedID 17954908

  • Paired-end mapping reveals extensive structural variation in the human genome SCIENCE Korbel, J. O., Urban, A. E., Affourtit, J. P., Godwin, B., Grubert, F., Simons, J. F., Kim, P. M., Palejev, D., Carriero, N. J., Du, L., Taillon, B. E., Chen, Z., Tanzer, A., Saunders, A. C., Chi, J., Yang, F., Carter, N. P., Hurles, M. E., Weissman, S. M., Harkins, T. T., Gerstein, M. B., Egholm, M., Snyder, M. 2007; 318 (5849): 420-426


    Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

    View details for DOI 10.1126/science.1149504

    View details for Web of Science ID 000250230400038

    View details for PubMedID 17901297

  • Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms FUNCTIONAL & INTEGRATIVE GENOMICS Borneman, A. R., Zhang, Z. D., Rozowsky, J., Seringhaus, M. R., Gerstein, M., Snyder, M. 2007; 7 (4): 335-345


    In recent years, techniques have been developed to map transcription factor binding sites using chromatin immunoprecipitation combined with DNA microarrays (chIP chip). Initially, polymerase chain reaction (PCR)-based DNA arrays were used for the chIP chip procedure; however, high-density oligonucleotide (HDO) arrays, which allow for the production of thousands more features per array, have emerged as a competing array platform. To compare the two platforms, data from chIP chip analysis performed for three factors (Tec1, Ste12, and Sok2) using both HDO and PCR arrays under identical experimental conditions were compared. HDO arrays provided increased reproducibility and sensitivity, detecting approximately three times more binding events than the PCR arrays while also showing increased accuracy. The increased resolution provided by the HDO arrays also allowed for the identification of multiple binding peaks in close proximity and of novel binding events such as binding within ORFs. The HDO array platform provides a far more robust array system by all measures than PCR-based arrays, all of which is directly attributable to the large number of probes available.

    View details for DOI 10.1007/s10142-007-0054-7

    View details for Web of Science ID 000249808300006

    View details for PubMedID 17638031

  • Arabidopsis protein microarrays for the high-throughput identification of protein-protein interactions. Plant signaling & behavior Popescu, S. C., Snyder, M., Dinesh-Kumar, S. 2007; 2 (5): 416-420


    Protein microarray technology has emerged as a powerful new approach for the study of thousands of proteins simultaneously. Protein microarrays have been used for a wide variety of applications for the human and yeast systems. In a recent study, we demonstrated that Arabidopsis functional protein microarrays can be generated and employed to characterize the function of plant proteins. The arrayed proteins were produced using an optimized large-scale plant-based expression system. In a proof-of-concept study, 173 known and novel potential substrates of calmodulin (CaM) and calmodulin-like proteins (CML) were identified in an unbiased and high-throughput manner. The information documented here on novel potential CaM targets provides new testable hypotheses in the area of CaM/Ca(2+)-regulated processes and represents a resource of functional information for the scientific community.

    View details for PubMedID 19704619

  • Divergence of transcription factor binding sites across related yeast species SCIENCE Borneman, A. R., Gianoulis, T. A., Zhang, Z. D., Yu, H., Rozowsky, J., Seringhaus, M. R., Wang, L. Y., Gerstein, M., Snyder, M. 2007; 317 (5839): 815-819


    Characterization of interspecies differences in gene regulation is crucial for understanding the molecular basis of both phenotypic diversity and evolution. By means of chromatin immunoprecipitation and DNA microarray analysis, the divergence in the binding sites of the pseudohyphal regulators Ste12 and Tec1 was determined in the yeasts Saccharomyces cerevisiae, S. mikatae, and S. bayanus under pseudohyphal conditions. We have shown that most of these sites have diverged across these species, far exceeding the interspecies variation in orthologous genes. A group of Ste12 targets was shown to be bound only in S. mikatae and S. bayanus under pseudohyphal conditions. Many of these genes are targets of Ste12 during mating in S. cerevisiae, indicating that specialization between the two pathways has occurred in this species. Transcription factor binding sites have therefore diverged substantially faster than ortholog content. Thus, gene regulation resulting from transcription factor binding is likely to be a major cause of divergence between related species.

    View details for DOI 10.1126/science.1140748

    View details for Web of Science ID 000248624500044

    View details for PubMedID 17690298

  • Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project NATURE Birney, E., Stamatoyannopoulos, J. A., Dutta, A., Guigo, R., Gingeras, T. R., Margulies, E. H., Weng, Z., Snyder, M., Dermitzakis, E. T., Stamatoyannopoulos, J. A., Thurman, R. E., Kuehn, M. S., Taylor, C. M., Neph, S., Koch, C. M., Asthana, S., Malhotra, A., Adzhubei, I., Greenbaum, J. A., Andrews, R. M., Flicek, P., Boyle, P. J., Cao, H., Carter, N. P., Clelland, G. K., Davis, S., Day, N., Dhami, P., Dillon, S. C., Dorschner, M. O., Fiegler, H., Giresi, P. G., Goldy, J., Hawrylycz, M., Haydock, A., Humbert, R., James, K. D., Johnson, B. E., Johnson, E. M., Frum, T. T., Rosenzweig, E. R., Karnani, N., Lee, K., Lefebvre, G. C., Navas, P. A., Neri, F., Parker, S. C., Sabo, P. J., Sandstrom, R., Shafer, A., Vetrie, D., Weaver, M., Wilcox, S., Yu, M., Collins, F. S., Dekker, J., Lieb, J. D., Tullius, T. D., Crawford, G. E., Sunyaev, S., Noble, W. S., Dunham, I., Dutta, A., Guigo, R., Denoeud, F., Reymond, A., Kapranov, P., Rozowsky, J., Zheng, D., Castelo, R., Frankish, A., Harrow, J., Ghosh, S., Sandelin, A., Hofacker, I. L., Baertsch, R., Keefe, D., Flicek, P., Dike, S., Cheng, J., Hirsch, H. A., Sekinger, E. A., Lagarde, J., Abril, J. F., Shahab, A., Flamm, C., Fried, C., Hackermueller, J., Hertel, J., Lindemeyer, M., Missal, K., Tanzer, A., Washietl, S., Korbel, J., Emanuelsson, O., Pedersen, J. S., Holroyd, N., Taylor, R., Swarbreck, D., Matthews, N., Dickson, M. C., Thomas, D. J., Weirauch, M. T., Gilbert, J., Drenkow, J., Bell, I., Zhao, X., Srinivasan, K. G., Sung, W., Ooi, H. S., Chiu, K. P., Foissac, S., Alioto, T., Brent, M., Pachter, L., Tress, M. L., Valencia, A., Choo, S. W., Choo, C. Y., Ucla, C., Manzano, C., Wyss, C., Cheung, E., Clark, T. G., Brown, J. B., Ganesh, M., Patel, S., Tammana, H., Chrast, J., Henrichsen, C. N., Kai, C., Kawai, J., Nagalakshmi, U., Wu, J., Lian, Z., Lian, J., Newburger, P., Zhang, X., Bickel, P., Mattick, J. 
S., Carninci, P., Hayashizaki, Y., Weissman, S., Dermitzakis, E. T., Margulies, E. H., Hubbard, T., Myers, R. M., Rogers, J., Stadler, P. F., Lowe, T. M., Wei, C., Ruan, Y., Snyder, M., Birney, E., Struhl, K., Gerstein, M., Antonarakis, S. E., Gingeras, T. R., Brown, J. B., Flicek, P., Fu, Y., Keefe, D., Birney, E., Denoeud, F., Gerstein, M., Green, E. D., Kapranov, P., Karaoez, U., Myers, R. M., Noble, W. S., Reymond, A., Rozowsky, J., Struhl, K., Siepel, A., Stamatoyannopoulos, J. A., Taylor, C. M., Taylor, J., Thurman, R. E., Tullius, T. D., Washietl, S., Zheng, D., Liefer, L. A., Wetterstrand, K. A., Good, P. J., Feingold, E. A., Guyer, M. S., Collins, F. S., Margulies, E. H., Cooper, G. M., Asimenos, G., Thomas, D. J., Dewey, C. N., Siepel, A., Birney, E., Keefe, D., Hou, M., Taylor, J., Nikolaev, S., Montoya-Burgos, J. I., Loeytynoja, A., Whelan, S., Pardi, F., Massingham, T., Brown, J. B., Huang, H., Zhang, N. R., Bickel, P., Holmes, I., Mullikin, J. C., Ureta-Vidal, A., Paten, B., Seringhaus, M., Church, D., Rosenbloom, K., Kent, W. J., Stone, E. A., Gerstein, M., Antonarakis, S. E., Batzoglou, S., Goldman, N., Hardison, R. C., Haussler, D., Miller, W., Pachter, L., Green, E. D., Sidow, A., Weng, Z., Trinklein, N. D., Fu, Y., Zhang, Z. D., Karaoez, U., Barrera, L., Stuart, R., Zheng, D., Ghosh, S., Flicek, P., King, D. C., Taylor, J., Ameur, A., Enroth, S., Bieda, M. C., Koch, C. M., Hirsch, H. A., Wei, C., Cheng, J., Kim, J., Bhinge, A. A., Giresi, P. G., Jiang, N., Liu, J., Yao, F., Sung, W., Chiu, K. P., Vega, V. B., Lee, C. W., Ng, P., Shahab, A., Sekinger, E. A., Yang, A., Moqtaderi, Z., Zhu, Z., Xu, X., Squazzo, S., Oberley, M. J., Inman, D., Singer, M. A., Richmond, T. A., Munn, K. J., Rada-Iglesias, A., Wallerman, O., Komorowski, J., Clelland, G. K., Wilcox, S., Dillon, S. C., Andrews, R. M., Fowler, J. C., Couttet, P., James, K. D., Lefebvre, G. C., Bruce, A. W., Dovey, O. M., Ellis, P. D., Dhami, P., Langford, C. F., Carter, N. 
P., Vetrie, D., Kapranov, P., Nix, D. A., Bell, I., Patel, S., Rozowsky, J., Euskirchen, G., Hartman, S., Lian, J., Wu, J., Urban, A. E., Kraus, P., Van Calcar, S., Heintzman, N., Kim, T. H., Wang, K., Qu, C., Hon, G., Luna, R., Glass, C. K., Rosenfeld, M. G., Force Aldred, S., Cooper, S. J., Halees, A., Lin, J. M., Shulha, H. P., Zhang, X., Xu, M., Haidar, J. N., Yu, Y., Birney, E., Weissman, S., Ruan, Y., Lieb, J. D., Iyer, V. R., Green, R. D., Gingeras, T. R., Wadelius, C., Dunham, I., Struhl, K., Hardison, R. C., Gerstein, M., Farnham, P. J., Myers, R. M., Ren, B., Snyder, M., Thomas, D. J., Rosenbloom, K., Harte, R. A., Hinrichs, A. S., Trumbower, H., Clawson, H., Hillman-Jackson, J., Zweig, A. S., Smith, K., Thakkapallayil, A., Barber, G., Kuhn, R. M., Karolchik, D., Haussler, D., Kent, W. J., Dermitzakis, E. T., Armengol, L., Bird, C. P., Clark, T. G., Cooper, G. M., de Bakker, P. I., Kern, A. D., Lopez-Bigas, N., Martin, J. D., Stranger, B. E., Thomas, D. J., Woodroffe, A., Batzoglou, S., Davydov, E., Dimas, A., Eyras, E., Hallgrimsdottir, I. B., Hardison, R. C., Huppert, J., Sidow, A., Taylor, J., Trumbower, H., Zody, M. C., Guigo, R., Mullikin, J. C., Abecasis, G. R., Estivill, X., Birney, E., Bouffard, G. G., Guan, X., Hansen, N. F., Idol, J. R., Maduro, V. V., Maskeri, B., McDowell, J. C., Park, M., Thomas, P. J., Young, A. C., Blakesley, R. W., Muzny, D. M., Sodergren, E., Wheeler, D. A., Worley, K. C., Jiang, H., Weinstock, G. M., Gibbs, R. A., Graves, T., Fulton, R., Mardis, E. R., Wilson, R. K., Clamp, M., Cuff, J., Gnerre, S., Jaffe, D. B., Chang, J. L., Lindblad-Toh, K., Lander, E. S., Koriabine, M., Nefedov, M., Osoegawa, K., Yoshinaga, Y., Zhu, B., de Jong, P. J. 2007; 447 (7146): 799-816


    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

    View details for DOI 10.1038/nature05874

    View details for Web of Science ID 000247207500034

    View details for PubMedID 17571346

    View details for PubMedCentralID PMC2212820

  • Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Korbel, J. O., Urban, A. E., Grubert, F., Du, J., Royce, T. E., Starr, P., Zhong, G., Emanuel, B. S., Weissman, S. M., Snyder, M., Gerstein, M. B. 2007; 104 (24): 10110-10115


    Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs. We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.

    View details for DOI 10.1073/pnas.0703834104

    View details for Web of Science ID 000247363000036

    View details for PubMedID 17551006

  • Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome GENOME RESEARCH Emanuelsson, O., Nagalakshmi, U., Zheng, D., Rozowsky, J. S., Urban, A. E., Du, J., Lian, Z., Stolc, V., Weissman, S., Snyder, M., Gerstein, M. B. 2007; 17 (6): 886-897


    Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.

    View details for DOI 10.1101/gr.5014606

    View details for Web of Science ID 000247226900020

    View details for PubMedID 17119069

  • Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE) GENOME RESEARCH Bhinge, A. A., Kim, J., Euskirchen, G. M., Snyder, M., Iyer, V. R. 2007; 17 (6): 910-916


    Identifying the genome-wide binding sites of transcription factors is important in deciphering transcriptional regulatory networks. ChIP-chip (Chromatin immunoprecipitation combined with microarrays) has been widely used to map transcription factor binding sites in the human genome. However, whole genome ChIP-chip analysis is still technically challenging in vertebrates. We recently developed STAGE as an unbiased method for identifying transcription factor binding sites in the genome. STAGE is conceptually based on SAGE, except that the input is ChIP-enriched DNA. In this study, we implemented an improved sequencing strategy and analysis methods and applied STAGE to map the genomic binding profile of the transcription factor STAT1 after interferon treatment. STAT1 is mainly responsible for mediating the cellular responses to interferons, such as cell proliferation, apoptosis, immune surveillance, and immune responses. We present novel algorithms for STAGE tag analysis to identify enriched loci with high specificity, as verified by quantitative ChIP. STAGE identified several previously unknown STAT1 target genes, many of which are involved in mediating the response to interferon-gamma signaling. STAGE is thus a viable method for identifying the chromosomal targets of transcription factors and generating meaningful biological hypotheses that further our understanding of transcriptional regulatory networks.

    View details for DOI 10.1101/gr.5574907

    View details for Web of Science ID 000247226900022

    View details for PubMedID 17568006

  • Structured RNAs in the ENCODE selected regions of the human genome GENOME RESEARCH Washietl, S., Pedersen, J. S., Korbel, J. O., Stocsits, C., Gruber, A. R., Hackermueller, J., Hertel, J., Lindemeyer, M., Reiche, K., Tanzer, A., Ucla, C., Wyss, C., Antonarakis, S. E., Denoeud, F., Lagarde, J., Drenkow, J., Kapranov, P., Gingeras, T. R., Guigo, R., Snyder, M., Gerstein, M. B., Reymond, A., Hofacker, I. L., Stadler, P. F. 2007; 17 (6): 852-864


    Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).

    View details for DOI 10.1101/gr.5650707

    View details for Web of Science ID 000247226900017

    View details for PubMedID 17568003

  • Mapping of transcription factor binding regions in mammalian cells by ChIP: Comparison of array- and sequencing-based technologies GENOME RESEARCH Euskirchen, G. M., Rozowsky, J. S., Wei, C., Lee, W. H., Zhang, Z. D., Hartman, S., Emanuelsson, O., Stolc, V., Weissman, S., Gerstein, M. B., Ruan, Y., Snyder, M. 2007; 17 (6): 898-909


    Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.

    View details for DOI 10.1101/gr.5583007

    View details for Web of Science ID 000247226900021

    View details for PubMedID 17568005

  • Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome GENOME RESEARCH Trinklein, N. D., Karaoz, U., Wu, J., Halees, A., Aldred, S. F., Collins, P. J., Zheng, D., Zhang, Z. D., Gerstein, M. B., Snyder, M., Myers, R. M., Weng, Z. 2007; 17 (6): 720-731


    The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3'-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5'-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5'-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5'-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.

    View details for DOI 10.1101/gr.5716607

    View details for Web of Science ID 000247226900006

    View details for PubMedID 17567992

  • Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions GENOME RESEARCH Zhang, Z. D., Paccanaro, A., Fu, Y., Weissman, S., Weng, Z., Chang, J., Snyder, M., Gerstein, M. B. 2007; 17 (6): 787-797


    The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10-100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory rich "islands" and poor "deserts." Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform on all the factors a multivariate analysis in the framework of a biplot, which enhances biological signals in the experiments. This groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.

    View details for DOI 10.1101/gr.5573107

    View details for Web of Science ID 000247226900011

    View details for PubMedID 17567997

  • The DART classification of unannotated transcription within the ENCODE regions: Associating transcription with known and novel loci GENOME RESEARCH Rozowsky, J. S., Newburger, D., Sayward, F., Wu, J., Jordan, G., Korbel, J. O., Nagalakshmi, U., Yang, J., Zheng, D., Guigo, R., Gingeras, T. R., Weissman, S., Miller, P., Snyder, M., Gerstein, M. B. 2007; 17 (6): 732-745


    For the approximately 1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of "unannotated transcription." We use a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that approximately 14% of the novel TARs can be associated with known genes, while approximately 21% can be clustered into approximately 200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.

    View details for DOI 10.1101/gr.5696007

    View details for Web of Science ID 000247226900007

    View details for PubMedID 17567993

  • What is a gene, post-ENCODE? History and updated definition GENOME RESEARCH Gerstein, M. B., Bruce, C., Rozowsky, J. S., Zheng, D., Du, J., Korbel, J. O., Emanuelsson, O., Zhang, Z. D., Weissman, S., Snyder, M. 2007; 17 (6): 669-681


    While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century--from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.

    View details for DOI 10.1101/gr.6339607

    View details for Web of Science ID 000247226900002

    View details for PubMedID 17567988

  • Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution GENOME RESEARCH Zheng, D., Frankish, A., Baertsch, R., Kapranov, P., Reymond, A., Choo, S. W., Lu, Y., Denoeud, F., Antonarakis, S. E., Snyder, M., Ruan, Y., Wei, C., Gingeras, T. R., Guigo, R., Harrow, J., Gerstein, M. B. 2007; 17 (6): 839-851


    Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (approximately 80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

    View details for DOI 10.1101/gr.5586307

    View details for Web of Science ID 000247226900016

    View details for PubMedID 17568002

  • Getting connected: analysis and principles of biological networks GENES & DEVELOPMENT Zhu, X., Gerstein, M., Snyder, M. 2007; 21 (9): 1010-1024


    The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.

    View details for DOI 10.1101/gad.1528707

    View details for Web of Science ID 000246154100002

    View details for PubMedID 17473168

  • Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Popescu, S. C., Popescu, G. V., Bachan, S., Zhang, Z., Seay, M., Gerstein, M., Snyder, M., Dinesh-Kumar, S. P. 2007; 104 (11): 4730-4735


    Calmodulins (CaMs) are the most ubiquitous calcium sensors in eukaryotes. A number of CaM-binding proteins have been identified through classical methods, and many proteins have been predicted to bind CaMs based on their structural homology with known targets. However, multicellular organisms typically contain many CaM-like (CML) proteins, and a global identification of their targets and specificity of interaction is lacking. In an effort to develop a platform for large-scale analysis of proteins in plants we have developed a protein microarray and used it to study the global analysis of CaM/CML interactions. An Arabidopsis thaliana expression collection containing 1,133 ORFs was generated and used to produce proteins with an optimized medium-throughput plant-based expression system. Protein microarrays were prepared and screened with several CaMs/CMLs. A large number of previously known and novel CaM/CML targets were identified, including transcription factors, receptor and intracellular protein kinases, F-box proteins, RNA-binding proteins, and proteins of unknown function. Multiple CaM/CML proteins bound many binding partners, but the majority of targets were specific to one or a few CaMs/CMLs indicating that different CaM family members function through different targets. Based on our analyses, the emergent CaM/CML interactome is more extensive than previously predicted. Our results suggest that calcium functions through distinct CaM/CML proteins to regulate a wide range of targets and cellular activities.

    View details for DOI 10.1073/pnas.0611615104

    View details for Web of Science ID 000244972700086

    View details for PubMedID 17360592

  • New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis GENES & DEVELOPMENT Smith, M. G., Gianoulis, T. A., Pukatzki, S., Mekalanos, J. J., Ornston, L. N., Gerstein, M., Snyder, M. 2007; 21 (5): 601-614


    Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary tract infections. We explored the pathogenic content of this harmful pathogen using a combination of DNA sequencing and insertional mutagenesis. The genome of this organism was sequenced using a strategy involving high-density pyrosequencing, a novel, rapid method of high-throughput sequencing. Excluding the rDNA repeats, the assembled genome is 3,976,746 base pairs (bp) and has 3830 ORFs. A significant fraction of ORFs (17.2%) are located in 28 putative alien islands, indicating that the genome has acquired a large amount of foreign DNA. Consistent with its role in pathogenesis, a remarkable number of the islands (16) contain genes implicated in virulence, indicating the organism devotes a considerable portion of its genes to pathogenesis. The largest island contains elements homologous to the Legionella/Coxiella Type IV secretion apparatus. Type IV secretion systems have been demonstrated to be important for virulence in other organisms and thus are likely to help mediate pathogenesis of A. baumannii. Insertional mutagenesis generated avirulent isolates of A. baumannii and verified that six of the islands contain virulence genes, including two novel islands containing genes that lacked homology with others in the databases. The DNA sequencing approach described in this study allows the rapid elucidation of the DNA sequence of any microbe and, when combined with genetic screens, can identify many novel genes important for microbial pathogenesis.

    View details for DOI 10.1101/gad.1510307

    View details for Web of Science ID 000244760600011

    View details for PubMedID 17344419

  • Positional artifacts in microarrays: experimental verification and construction of COP, an automated detection tool NUCLEIC ACIDS RESEARCH Yu, H., Nguyen, K., Royce, T., Qian, J., Nelson, K., Snyder, M., Gerstein, M. 2007; 35 (2)


    Microarray technology is currently one of the most widely-used technologies in biology. Many studies focus on inferring the function of an unknown gene from its co-expressed genes. Here, we are able to show that there are two types of positional artifacts in microarray data introducing spurious correlations between genes. First, we find that genes that are close on the microarray chips tend to have higher correlations between their expression profiles. We call this the 'chip artifact'. Our calculations suggest that the carry-over during the printing process is one of the major sources of this type of artifact, which is later confirmed by our experiments. Based on our experiments, the measured intensity of a microarray spot contains 0.1% (for fully-hybridized spots) to 93% (for un-hybridized ones) of noise resulting from this artifact. Secondly, we, for the first time, show that genes that are close on the microtiter plates in microarray experiments also tend to have higher correlations. We call this the 'plate artifact'. Both types of artifacts exist with different severity in all cDNA microarray experiments that we analyzed. Therefore, we develop an automated web tool, COP (COrrelations by Positional artifacts), to detect these artifacts in microarray experiments. COP has been integrated with the microarray data normalization tool, ExpressYourself, which is available online. Together, the two can eliminate most of the common noise in microarray data.

    View details for DOI 10.1093/nar/gkl871

    View details for Web of Science ID 000243993600001

    View details for PubMedID 17158151
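The 'chip artifact' described in this abstract can be illustrated with a small sketch: compare the mean correlation of expression profiles for spot pairs that are neighbours on the array against pairs that are far apart. This is not the COP implementation; the function names, the `(row, column)` layout and the `max_dist` neighbourhood rule are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / denom if denom > 0 else 0.0


def mean_correlation_by_adjacency(profiles, positions, max_dist=1):
    """Return (mean r for neighbouring spots, mean r for distant spots).

    profiles:  hypothetical dict, gene -> list of expression values
    positions: hypothetical dict, gene -> (row, column) on the array
    Two spots count as neighbours when they are at most max_dist apart
    in both row and column (an assumed definition of 'close').
    """
    near, far = [], []
    for g1, g2 in combinations(sorted(profiles), 2):
        (r1, c1), (r2, c2) = positions[g1], positions[g2]
        r = pearson(profiles[g1], profiles[g2])
        if max(abs(r1 - r2), abs(c1 - c2)) <= max_dist:
            near.append(r)
        else:
            far.append(r)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(near), mean(far)
```

A markedly higher mean correlation among neighbouring spots than among distant ones would be consistent with a positional artifact, although a real analysis would also need a significance test.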

  • Yeast protein microarrays YEAST GENE ANALYSIS, SECOND EDITION Ptacek, J., Snyder, M. 2007; 36: 303-?
  • Protein microarray technology 3rd International Conference on Functional Genomics of Ageing Hall, D. A., Ptacek, J., Snyder, M. ELSEVIER IRELAND LTD. 2007: 161–67


    Protein chips have emerged as a promising approach for a wide variety of applications including the identification of protein-protein interactions, protein-phospholipid interactions, small molecule targets, and substrates of protein kinases. They can also be used for clinical diagnostics and monitoring disease states. This article reviews current methods in the generation and applications of protein microarrays.

    View details for DOI 10.1016/j.mad.2006.11.021

    View details for Web of Science ID 000244301700024

    View details for PubMedID 17126887

  • Tilescope: online analysis pipeline for high-density tiling microarray data GENOME BIOLOGY Zhang, Z. D., Rozowsky, J., Lam, H. Y., Du, J., Snyder, M., Gerstein, M. 2007; 8 (5)


    We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data. In a completely automated fashion, Tilescope will normalize signals between channels and across arrays, combine replicate experiments, score each array element, and identify genomic features. The program is designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface, presenting results in an organized web page, downloadable for further analysis.

    View details for DOI 10.1186/gb-2007-8-5-r81

    View details for Web of Science ID 000246983100034

    View details for PubMedID 17501994

  • A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge BIOINFORMATICS Du, J., Rozowsky, J. S., Korbel, J. O., Zhang, Z. D., Royce, T. E., Schultz, M. H., Snyder, M., Gerstein, M. 2006; 22 (24): 3016-3024


    Large-scale tiling array experiments are becoming increasingly common in genomics. In particular, the ENCODE project requires the consistent segmentation of many different tiling array datasets into 'active regions' (e.g. finding transfrags from transcriptional data and putative binding sites from ChIP-chip experiments). Previously, such segmentation was done in an unsupervised fashion mainly based on characteristics of the signal distribution in the tiling array data itself. Here we propose a supervised framework for doing this. It has the advantage of explicitly incorporating validated biological knowledge into the model and allowing for formal training and testing. In particular, we use a hidden Markov model (HMM) framework, which is capable of explicitly modeling the dependency between neighboring probes and whose extended version (the generalized HMM) also allows explicit description of state duration density. We introduce a formal definition of the tiling-array analysis problem, and explain how we can use this to describe sampling small genomic regions for experimental validation to build up a gold-standard set for training and testing. We then describe various ideal and practical sampling strategies (e.g. maximizing signal entropy within a selected region versus using gene annotation or known promoters as positives for transcription or ChIP-chip data, respectively). For the practical sampling and training strategies, we show how the size and noise in the validated training data affects the performance of an HMM applied to the ENCODE transcriptional and ChIP-chip experiments. In particular, we show that the HMM framework is able to efficiently process tiling array data as well as or better than previous approaches. For the idealized sampling strategies, we show how we can assess their performance in a simulation framework and how a maximum entropy approach, which samples sub-regions with very different signal intensities, gives the maximally performing gold-standard. This latter result has strong implications for the optimum way medium-scale validation experiments should be carried out to verify the results of the genome-scale tiling array experiments.

    View details for DOI 10.1093/bioinformatics/btl515

    View details for Web of Science ID 000242715200008

    View details for PubMedID 17038339
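The idea of segmenting probe signal into 'active' and background regions with an HMM can be sketched with a two-state Viterbi decoder. This is a minimal illustration, not the authors' model: the Gaussian emissions, the default `means`, `sds` and `stay` values, and the function names are all assumptions. In a supervised setting such as the one the abstract describes, these parameters would be estimated from experimentally validated regions rather than set by hand.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(signal, means=(0.0, 2.0), sds=(1.0, 1.0), stay=0.9):
    """Most likely state path for a probe signal (0 = background, 1 = active).

    stay is the assumed probability that neighbouring probes share a state,
    which is how the HMM captures dependency between adjacent probes.
    """
    if not signal:
        return []
    log_stay = math.log(stay)
    log_switch = math.log(1.0 - stay)

    # Uniform prior over the two states for the first probe.
    score = [math.log(0.5) + gaussian_logpdf(signal[0], means[s], sds[s])
             for s in range(2)]
    back = []  # back[t][s] = best previous state when probe t+1 is in state s

    for x in signal[1:]:
        new_score, pointers = [], []
        for s in range(2):
            candidates = [score[p] + (log_stay if p == s else log_switch)
                          for p in range(2)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            new_score.append(candidates[best_prev]
                             + gaussian_logpdf(x, means[s], sds[s]))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Runs of consecutive 1s in the returned path correspond to candidate active regions; the transition penalty keeps isolated noisy probes from being called on their own.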

  • High-throughput methods of regulatory element discovery BIOTECHNIQUES Hudson, M. E., Snyder, M. 2006; 41 (6): 673-?


    With the growing number of organisms whose genomes have been sequenced, a vast amount of information concerning the genetic structure of an organism's genome has been collected. However, effective experimental means to study how this information is accessed have only recently been developed. In this review, three basic methods for identifying regions of protein-DNA interaction will be introduced. The first two, chromatin immunoprecipitation (ChIP)-chip and ChIP-PET (for paired-end ditag), rely on the enrichment provided by chromosomal immunoprecipitation to interrogate the genomic sequence for the interaction sites of a protein of interest. In contrast, protein microarrays allow the identification of DNA-binding proteins that interact with a DNA sequence of interest. These complementary methods of exploring protein-DNA interactions will increase our fundamental knowledge of how the information contained within the genome sequence is accessed and processed.

    View details for Web of Science ID 000242737100019

    View details for PubMedID 17191608

  • HTRA1 promoter polymorphism in wet age-related macular degeneration SCIENCE DeWan, A., Liu, M., Hartman, S., Zhang, S. S., Liu, D. T., Zhao, C., Tam, P. O., Chan, W. M., Lam, D. S., Snyder, M., Barnstable, C., Pang, C. P., Hoh, J. 2006; 314 (5801): 989-992


    Age-related macular degeneration (AMD), the most common cause of irreversible vision loss in individuals aged older than 50 years, is classified as either wet (neovascular) or dry (nonneovascular). Inherited variation in the complement factor H gene is a major risk factor for drusen in dry AMD. Here we report that a single-nucleotide polymorphism in the promoter region of HTRA1, a serine protease gene on chromosome 10q26, is a major genetic risk factor for wet AMD. A whole-genome association mapping strategy was applied to a Chinese population, yielding a P value of <10^-11. Individuals with the risk-associated genotype were estimated to have a likelihood of developing wet AMD 10 times that of individuals with the wild-type genotype.

    View details for DOI 10.1126/science.1133807

    View details for Web of Science ID 000241896000052

    View details for PubMedID 17053108

  • Charging it up: global analysis of protein phosphorylation TRENDS IN GENETICS Ptacek, J., Snyder, M. 2006; 22 (10): 545-554


    Protein phosphorylation affects most, if not all, cellular activities in eukaryotes and is essential for cell proliferation and development. An estimated 30% of cellular proteins are phosphorylated, representing the phosphoproteome, and phosphorylation can alter a protein's function, activity, localization and stability. Recent studies for large-scale identification of phosphosites using mass spectrometry are revealing the components of the phosphoproteome. The development of new tools, such as kinase assays using modified kinases or protein microarrays, enables rapid kinase substrate identification. The dynamics of specific phosphorylation events can now be monitored using mass spectrometry, single-cell analysis of flow cytometry, or fluorescent reporters. Together, these techniques are beginning to elucidate cellular processes and pathways regulated by phosphorylation, in addition to global regulatory networks.

    View details for DOI 10.1016/j.tig.2006.08.005

    View details for Web of Science ID 000241268400006

    View details for PubMedID 16908088

  • TOS9 regulates white-opaque switching in Candida albicans EUKARYOTIC CELL Srikantha, T., Borneman, A. R., Daniels, K. J., Pujol, C., Wu, W., Seringhaus, M. R., Gerstein, M., Yi, S., Snyder, M., Soll, D. R. 2006; 5 (10): 1674-1687


    In Candida albicans, the a1-alpha2 complex represses white-opaque switching, as well as mating. Based upon the assumption that the a1-alpha2 corepressor complex binds to the gene that regulates white-opaque switching, a chromatin immunoprecipitation-microarray analysis strategy was used to identify 52 genes that bound to the complex. One of these genes, TOS9, exhibited an expression pattern consistent with a "master switch gene." TOS9 was only expressed in opaque cells, and its gene product, Tos9p, localized to the nucleus. Deletion of the gene blocked cells in the white phase, misexpression in the white phase caused stable mass conversion of cells to the opaque state, and misexpression blocked temperature-induced mass conversion from the opaque state to the white state. A model was developed for the regulation of spontaneous switching between the opaque state and the white state that includes stochastic changes of Tos9p levels above and below a threshold that induce changes in the chromatin state of an as-yet-unidentified switching locus. TOS9 has also been referred to as EAP2 and WOR1.

    View details for DOI 10.1128/EC.00252-06

    View details for Web of Science ID 000241344300010

    View details for PubMedID 16950924

  • Predicting essential genes in fungal genomes GENOME RESEARCH Seringhaus, M., Paccanaro, A., Borneman, A., Snyder, M., Gerstein, M. 2006; 16 (9): 1126-1135


    Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through computational methods is appealing because it circumvents expensive and difficult experimental screens. Most such prediction is based on homology mapping to experimentally verified essential genes in model organisms. We present here a different approach, one that relies exclusively on sequence features of a gene to estimate essentiality and offers a promising way to identify essential genes in unstudied or uncultured organisms. We identified 14 characteristic sequence features potentially associated with essentiality, such as localization signals, codon adaptation, GC content, and overall hydrophobicity. Using the well-characterized baker's yeast Saccharomyces cerevisiae, we employed a simple Bayesian framework to measure the correlation of each of these features with essentiality. We then employed the 14 features to learn the parameters of a machine learning classifier capable of predicting essential genes. We trained our classifier on known essential genes in S. cerevisiae and applied it to the closely related and relatively unstudied yeast Saccharomyces mikatae. We assessed predictive success in two ways: First, we compared all of our predictions with those generated by homology mapping between these two species. Second, we verified a subset of our predictions with eight in vivo knockouts in S. mikatae, and we present here the first experimentally confirmed essential genes in this species.

    View details for DOI 10.1101/gr.5144106

    View details for Web of Science ID 000240238600007

    View details for PubMedID 16899653

  • Proteome chips for whole-organism assays NATURE REVIEWS MOLECULAR CELL BIOLOGY Kung, L. A., Snyder, M. 2006; 7 (8): 617-622


    Over the past 5 years, protein-chip technology has emerged as a useful tool for the study of many kinds of protein interactions and biochemical activities. The construction of Saccharomyces cerevisiae whole-proteome arrays has enabled further studies of such interactions in a proteome-wide context. Here, we explore some of the recent advances that have been made at the '-omic' level using protein microarrays.

    View details for DOI 10.1038/nrm1941

    View details for Web of Science ID 000239240000019

    View details for PubMedID 16723973

  • Linking DNA-binding proteins to their recognition sequences by using protein microarrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Ho, S., Jona, G., Chen, C. T., Johnston, M., Snyder, M. 2006; 103 (26): 9940-9945


    Analyses of whole-genome sequences and experimental data sets have revealed a large number of DNA sequence motifs that are conserved in many species and may be functional. However, methods of sufficient scale to explore the roles of these elements are lacking. We describe the use of protein arrays to identify proteins that bind to DNA sequences of interest. A microarray of 282 known and potential yeast transcription factors was produced and probed with oligonucleotides of evolutionarily conserved sequences that are potentially functional. Transcription factors that bound to specific DNA sequences were identified. One previously uncharacterized DNA-binding protein, Yjl103, was characterized in detail. We defined the binding site for this protein and identified a number of its target genes, many of which are involved in stress response and oxidative phosphorylation. Protein microarrays offer a high-throughput method for determining DNA-protein interactions.

    View details for DOI 10.1073/pnas.0509185103

    View details for Web of Science ID 000238872900036

    View details for PubMedID 16785442

  • Defined culture conditions of human embryonic stem cells PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lu, J., Hou, R. H., Booth, C. J., Yang, S. H., Snyder, M. 2006; 103 (15): 5688-5693


    Human embryonic stem cells (hESCs) are pluripotent cells that have the potential to differentiate into any tissue in the human body; therefore, they are a valuable resource for regenerative medicine, drug screening, and developmental studies. However, the clinical application of hESCs is hampered by the difficulties of eliminating animal products in the culture medium and/or the complexity of conditions required to support hESC growth. We have developed a simple medium [termed hESC Cocktail (HESCO)] containing basic fibroblast growth factor, Wnt3a, April (a proliferation-inducing ligand)/BAFF (B cell-activating factor belonging to TNF), albumin, cholesterol, insulin, and transferrin, which is sufficient for hESC self-renewal and proliferation. Cells grown in HESCO were maintained in an undifferentiated state as determined by using six different stem cell markers, and their genomic integrity was confirmed by karyotyping. Cells cultured in HESCO readily form embryoid bodies in tissue culture and teratomas in mice. In both cases, the cells differentiated into each of the three cell lineages, ectoderm, endoderm, and mesoderm, indicating that they maintained their pluripotency. The use of a minimal medium sufficient for hESC growth is expected to greatly facilitate clinical application and developmental studies of hESCs.

    View details for DOI 10.1073/pnas.0601383103

    View details for Web of Science ID 000236896200012

    View details for PubMedID 16595624

  • High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Urban, A. E., Korbel, J. O., Selzer, R., Richmond, T., Hacker, A., Popescu, G. V., Clubells, J. F., Green, R., Emanuel, B. S., Gerstein, M. B., Weissman, S. M., Snyder, M. 2006; 103 (12): 4534-4539


    Deletions and amplifications of the human genomic sequence (copy number polymorphisms) are the cause of numerous diseases and a potential cause of phenotypic variation in the normal population. Comparative genomic hybridization (CGH) has been developed as a useful tool for detecting alterations in DNA copy number that involve blocks of DNA several kilobases or larger in size. We have developed high-resolution CGH (HR-CGH) to detect accurately and with relatively little bias the presence and extent of chromosomal aberrations in human DNA. Maskless array synthesis was used to construct arrays containing 385,000 oligonucleotides with isothermal probes of 45-85 bp in length; arrays tiling the beta-globin locus and chromosome 22q were prepared. Arrays with a 9-bp tiling path were used to map a 622-bp heterozygous deletion in the beta-globin locus. Arrays with an 85-bp tiling path were used to analyze DNA from patients with copy number changes in the pericentromeric region of chromosome 22q. Heterozygous deletions and duplications as well as partial triploidies and partial tetraploidies of portions of chromosome 22q were mapped with high resolution (typically up to 200 bp) in each patient, and the precise breakpoints of two deletions were confirmed by DNA sequencing. Additional peaks potentially corresponding to known and novel additional CNPs were also observed. Our results demonstrate that HR-CGH allows the detection of copy number changes in the human genome at an unprecedented level of resolution.

    View details for DOI 10.1073/pnas.0511340103

    View details for Web of Science ID 000236362600039

    View details for PubMedID 16537408

  • Severe acute respiratory syndrome diagnostics using a coronavirus protein microarray PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhu, H., Hu, S. H., Jona, G., Zhu, X. W., Kreiswirth, N., Willey, B. M., Mazzulli, T., Liu, G. Z., Song, Q. F., Chen, P., Cameron, M., Tyler, A., Wang, J., Wen, J., Chen, W. J., Compton, S., Snyder, M. 2006; 103 (11): 4011-4016


    To monitor severe acute respiratory syndrome (SARS) infection, a coronavirus protein microarray that harbors proteins from SARS coronavirus (SARS-CoV) and five additional coronaviruses was constructed. These microarrays were used to screen approximately 400 Canadian sera from the SARS outbreak, including samples from confirmed SARS-CoV cases, respiratory illness patients, and healthcare professionals. A computer algorithm that uses multiple classifiers to predict samples from SARS patients was developed and used to predict 206 sera from Chinese fever patients. The test assigned patients into two distinct groups: those with antibodies to SARS-CoV and those without. The microarray also identified patients with sera reactive against other coronavirus proteins. Our results correlated well with an indirect immunofluorescence test and demonstrated that viral infection can be monitored for many months after infection. We show that protein microarrays can serve as a rapid, sensitive, and simple tool for large-scale identification of viral-specific antibodies in sera.

    View details for Web of Science ID 000236429300016

    View details for PubMedID 16537477

  • Target hub proteins serve as master regulators of development in yeast GENES & DEVELOPMENT Borneman, A. R., Leigh-Bell, J. A., Yu, H. Y., Bertone, P., Gerstein, M., Snyder, M. 2006; 20 (4): 435-448


    To understand the organization of the transcriptional networks that govern cell differentiation, we have investigated the transcriptional circuitry controlling pseudohyphal development in Saccharomyces cerevisiae. The binding targets of Ste12, Tec1, Sok2, Phd1, Mga1, and Flo8 were globally mapped across the yeast genome. The factors and their targets form a complex binding network, containing patterns characteristic of autoregulation, feedback and feed-forward loops, and cross-talk. Combinatorial binding to intergenic regions was commonly observed, which allowed for the identification of a novel binding association between Mga1 and Flo8, in which Mga1 requires Flo8 for binding to promoter regions. Further analysis of the network showed that the promoters of MGA1 and PHD1 were bound by all of the factors used in this study, identifying them as key target hubs. Overexpression of either of these two proteins specifically induced pseudohyphal growth under noninducing conditions, highlighting them as master regulators of the system. Our results indicate that target hubs can serve as master regulators whose activity is sufficient for the induction of complex developmental responses and therefore represent important regulatory nodes in biological networks.

    View details for DOI 10.1101/gad.1389306

    View details for Web of Science ID 000235428600007

    View details for PubMedID 16449570

  • Mapping pathways and phenotypes by systematic gene overexpression MOLECULAR CELL Sopko, R., Huang, D. Q., Preston, N., Chua, G., Papp, B., Kafadar, K., Snyder, M., Oliver, S. G., Cyert, M., Hughes, T. R., Boone, C., Andrews, B. 2006; 21 (3): 319-330


    Many disease states result from gene overexpression, often in a specific genetic context. To explore gene overexpression phenotypes systematically, we assembled an array of 5280 yeast strains, each containing an inducible copy of an S. cerevisiae gene, covering >80% of the genome. Approximately 15% of the overexpressed genes (769) reduced growth rate. This gene set was enriched for cell cycle-regulated genes, signaling molecules, and transcription factors. Overexpression of most toxic genes resulted in phenotypes different from known deletion mutant phenotypes, suggesting that overexpression phenotypes usually reflect a specific regulatory imbalance rather than disruption of protein complex stoichiometry. Global overexpression effects were also assayed in the context of a cyclin-dependent kinase mutant (pho85Δ). The resultant gene set was enriched for Pho85p targets and identified the yeast calcineurin-responsive transcription factor Crz1p as a substrate. Large-scale application of this approach should provide a strategy for identifying target molecules regulated by specific signaling pathways.

    View details for DOI 10.1016/j.molcel.2005.12.011

    View details for Web of Science ID 000235436100003

    View details for PubMedID 16455487

  • Yeast as a model for human disease. Current protocols in human genetics / editorial board, Jonathan L. Haines ... [et al.] Smith, M. G., Snyder, M. 2006; Chapter 15: Unit 15 6-?


    The sequencing of the human genome promised the identification of disease-causing genes and, subsequently, therapies for those diseases. However, when identifying the genetic basis of a disease, it is not uncommon to discover an abnormal protein whose normal function is unknown. The genetic manipulations required to assign function to genes is often extremely difficult, if not impossible, in human cells. Model organisms have been used to facilitate understanding of gene function because of the ease of genetic manipulations and because many features of eukaryotic physiology have been conserved across phyla. Yeast is a simple eukaryote with a tractable genome, a short generation time, and a large network of researchers who have generated a vast arsenal of research tools. These traits make yeast ideally suited to help reveal the function of genes implicated in human disease.

    View details for DOI 10.1002/0471142905.hg1506s48

    View details for PubMedID 18428391

  • Design optimization methods for genomic DNA tiling arrays GENOME RESEARCH Bertone, P., Trifonov, V., Rozowsky, J. S., Schubert, F., Emanuelsson, O., Karro, J., Kao, M. Y., Snyder, M., Gerstein, M. 2006; 16 (2): 271-281


    A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed to generate optimal tile paths from user-provided DNA sequences.

    View details for DOI 10.1101/gr.4455906

    View details for Web of Science ID 000235122000015

    View details for PubMedID 16365382
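The tiling problem framed in this abstract (cover as much non-repetitive sequence as possible with tiles whose lengths fall in a prescribed range) lends itself to a short dynamic-programming sketch. This is an illustrative formulation rather than the published algorithm: the objective (maximise covered bases), the boolean repeat mask and the function name `optimal_tiling` are assumptions. Its running time is proportional to the sequence length times the number of allowed tile sizes.

```python
def optimal_tiling(is_repeat, min_len, max_len):
    """Choose non-overlapping, repeat-free tiles maximising covered bases.

    is_repeat: list of booleans, True where a base lies in a repeat
    Returns (covered_bases, tiles) with tiles as half-open (start, end) pairs.
    """
    n = len(is_repeat)

    # run[i] = length of the repeat-free stretch ending at base i - 1.
    run = [0] * (n + 1)
    for i in range(1, n + 1):
        run[i] = 0 if is_repeat[i - 1] else run[i - 1] + 1

    best = [0] * (n + 1)    # best[i] = max coverage within the first i bases
    choice = [0] * (n + 1)  # length of the tile ending at i, or 0 for none

    for i in range(1, n + 1):
        best[i] = best[i - 1]  # option: leave base i - 1 uncovered
        # A tile ending at i must fit inside the current repeat-free run.
        for length in range(min_len, min(max_len, run[i]) + 1):
            candidate = best[i - length] + length
            if candidate > best[i]:
                best[i] = candidate
                choice[i] = length

    # Walk backwards to recover the chosen tiles.
    tiles = []
    i = n
    while i > 0:
        if choice[i]:
            tiles.append((i - choice[i], i))
            i -= choice[i]
        else:
            i -= 1
    tiles.reverse()
    return best[n], tiles
```

Bases in repeat-free runs shorter than `min_len` stay uncovered, which mirrors the abstract's point that coverage becomes harder to obtain as the required tile size grows.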

  • ProCAT: a data analysis approach for protein microarrays GENOME BIOLOGY Zhu, X., Gerstein, M., Snyder, M. 2006; 7 (11)


    Protein microarrays provide a versatile method for the analysis of many protein biochemical activities. Existing DNA microarray analytical methods do not translate to protein microarrays due to differences between the technologies. Here we report a new approach, ProCAT, which corrects for background bias and spatial artifacts, identifies significant signals, filters nonspecific spots, and normalizes the resulting signal to protein abundance. ProCAT provides a powerful and flexible new approach for analyzing many types of protein microarrays.

    View details for DOI 10.1186/gb-2006-7-11-r110

    View details for Web of Science ID 000243967000014

    View details for PubMedID 17109749

  • BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments GENOME BIOLOGY Wang, L., Snyder, M., Gerstein, M. 2006; 7 (11)


    Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles).

    View details for DOI 10.1186/gb-2006-7-11-r102

    View details for Web of Science ID 000243967000006

    View details for PubMedID 17078876

  • Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies DNA MICROARRAYS, PART B: DATABASES AND STATISTICS Royce, T. E., Rozowsky, J. S., Luscombe, N. M., Emanuelsson, O., Yu, H., Zhu, X., Snyder, M., Gerstein, M. B. 2006; 411: 282-311


    A credit to microarray technology is its broad application. Two experiments--the tiling microarray experiment and the protein microarray experiment--are exemplars of the versatility of the microarrays. With the technology's expanding list of uses, the corresponding bioinformatics must evolve in step. There currently exists a rich literature developing statistical techniques for analyzing traditional gene-centric DNA microarrays, so the first challenge in analyzing the advanced technologies is to identify which of the existing statistical protocols are relevant and where and when revised methods are needed. A second challenge is making these often very technical ideas accessible to the broader microarray community. The aim of this chapter is to present some of the most widely used statistical techniques for normalizing and scoring traditional microarray data and indicate their potential utility for analyzing the newer protein and tiling microarray experiments. In so doing, we will assume little or no prior training in statistics of the reader. Areas covered include background correction, intensity normalization, spatial normalization, and the testing of statistical significance.

    View details for DOI 10.1016/S0076-6879(06)11015-0

    View details for Web of Science ID 000244506300015

    View details for PubMedID 16939796

  • Genomic analysis of insertion behavior and target specificity of mini-Tn7 and Tn3 transposons in Saccharomyces cerevisiae NUCLEIC ACIDS RESEARCH Seringhaus, M., Kumar, A., Hartigan, J., Snyder, M., Gerstein, M. 2006; 34 (8)


    Transposons are widely employed as tools for gene disruption. Ideally, they should display unbiased insertion behavior, and incorporate readily into any genomic DNA to which they are exposed. However, many transposons preferentially insert at specific nucleotide sequences. It is unclear to what extent such bias affects their usefulness as mutagenesis tools. Here, we examine insertion site specificity and global insertion behavior of two mini-transposons previously used for large-scale gene disruption in Saccharomyces cerevisiae: Tn3 and Tn7. Using an expanded set of insertion data, we confirm that Tn3 displays marked preference for the AT-rich 5 bp consensus site TA[A/T]TA, whereas Tn7 displays negligible target site preference. On a genome level, both transposons display marked non-uniform insertion behavior: certain sites are targeted far more often than expected, and both distributions depart drastically from Poisson. Thus, to compare their insertion behavior on a genome level, we developed a windowed Kolmogorov-Smirnov (K-S) test to analyze transposon insertion distributions in sequence windows of various sizes. We find that when scored in large windows (>300 bp), both Tn3 and Tn7 distributions appear uniform, whereas in smaller windows, Tn7 appears uniform while Tn3 does not. Thus, both transposons are effective tools for gene disruption, but Tn7 does so with less duplication and a more uniform distribution, better approximating the behavior of the ideal transposon.

    View details for DOI 10.1093/nar/gkl184

    View details for Web of Science ID 000237697000001

    View details for PubMedID 16648358

  • Novel transcribed regions in the human genome 71st Cold Spring Harbor Symposium on Quantitative Biology Rozowsky, J., Wu, J., Lian, Z., Nagalakshmi, U., Korbel, J. O., Kapranov, P., Zheng, D., Dyke, S., Newburger, P., Miller, P., Gingeras, T. R., Weissman, S., Gerstein, M., Snyder, M. COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. 2006: 111–116


    We have used genomic tiling arrays to identify transcribed regions throughout the human genome. Analysis of the mapping results of RNA isolated from five cell/tissue types, NB4 cells, NB4 cells treated with retinoic acid (RA), NB4 cells treated with 12-O-tetradecanoylphorbol-13 acetate (TPA), neutrophils, and placenta, throughout the ENCODE region reveals a large number of novel transcribed regions. Interestingly, neutrophils exhibit a great deal of novel expression in several intronic regions. Comparison of the hybridization results of NB4 cells treated with different stimuli relative to untreated cells reveals that many new regions are expressed upon cell differentiation. One such region is the Hox locus, which contains a large number of novel regions expressed in a number of cell types. Analysis of the trinucleotide composition of the novel transcribed regions reveals that it is similar to that of known exons. These results suggest that many of the novel transcribed regions may have a functional role.

    View details for Web of Science ID 000245962800015

    View details for PubMedID 17381286

  • Global changes in STAT target selection and transcription regulation upon interferon treatments GENES & DEVELOPMENT Hartman, S. E., Bertone, P., Nath, A. K., Royce, T. E., Gerstein, M., Weissman, S., Snyder, M. 2005; 19 (24): 2953-2968


    The STAT (signal transducer and activator of transcription) proteins play a crucial role in the regulation of gene expression, but their targets and the manner in which they select them remain largely unknown. Using chromatin immunoprecipitation and DNA microarray analysis (ChIP-chip), we have identified the regions of human chromosome 22 bound by STAT1 and STAT2 in interferon-treated cells. Analysis of the genomic loci proximal to these binding sites introduced new candidate STAT1 and STAT2 target genes, several of which are affiliated with proliferation and apoptosis. The genes on chromosome 22 that exhibited interferon-induced up- or down-regulated expression were determined and correlated with the STAT-binding site information, revealing the potential regulatory effects of STAT1 and STAT2 on their target genes. Importantly, the comparison of STAT1-binding sites upon interferon (IFN)-gamma and IFN-alpha treatments revealed dramatic changes in binding locations between the two treatments. The IFN-alpha induction revealed nonconserved STAT1 occupancy at IFN-gamma-induced sites, as well as novel sites of STAT1 binding not evident in IFN-gamma-treated cells. Many of these correlated with binding by STAT2, but others were STAT2 independent, suggesting that multiple mechanisms direct STAT1 binding to its targets under different activation conditions. Overall, our results reveal a wealth of new information regarding IFN/STAT-binding targets and also fundamental insights into mechanisms of regulation of gene expression in different cell states.

    View details for DOI 10.1101/gad.1371305

    View details for Web of Science ID 000234095500004

    View details for PubMedID 16319195

  • Global analysis of protein phosphorylation in yeast NATURE Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X. W., Fasolo, J., Guo, H., Jona, G., Breitkreutz, A., Sopko, R., McCartney, R. R., Schmidt, M. C., Rachidi, N., Lee, S. J., Mah, A. S., Meng, L., Stark, M. J., Stern, D. F., De Virgilio, C., Tyers, M., Andrews, B., Gerstein, M., Schweitzer, B., Predki, P. F., Snyder, M. 2005; 438 (7068): 679-684


    Protein phosphorylation is estimated to affect 30% of the proteome and is a major regulatory mechanism that controls many basic cellular processes. Until recently, our biochemical understanding of protein phosphorylation on a global scale has been extremely limited; only one half of the yeast kinases have known in vivo substrates and the phosphorylating kinase is known for less than 160 phosphoproteins. Here we describe, with the use of proteome chip technology, the in vitro substrates recognized by most yeast protein kinases: we identified over 4,000 phosphorylation events involving 1,325 different proteins. These substrates represent a broad spectrum of different biochemical functions and cellular roles. Distinct sets of substrates were recognized by each protein kinase, including closely related kinases of the protein kinase A family and four cyclin-dependent kinases that vary only in their cyclin subunits. Although many substrates reside in the same cellular compartment or belong to the same functional category as their phosphorylating kinase, many others do not, indicating possible new roles for several kinases. Furthermore, integration of the phosphorylation results with protein-protein interaction and transcription factor binding data revealed novel regulatory modules. Our phosphorylation results have been assembled into a first-generation phosphorylation map for yeast. Because many yeast proteins and pathways are conserved, these results will provide insights into the mechanisms and roles of protein phosphorylation in many eukaryotes.

    View details for DOI 10.1038/nature04187

    View details for Web of Science ID 000233593100053

    View details for PubMedID 16319894

  • Biochemical and genetic analysis of the yeast proteome with a movable ORF collection GENES & DEVELOPMENT Gelperin, D. M., White, M. A., Wilkinson, M. L., Kon, Y., Kung, L. A., Wise, K. J., Lopez-Hoyo, N., Jiang, L. X., Piccirillo, S., Yu, H. Y., Gerstein, M., Dumont, M. E., Phizicky, E. M., Snyder, M., Grayhack, E. J. 2005; 19 (23): 2816-2826


    Functional analysis of the proteome is an essential part of genomic research. To facilitate different proteomic approaches, a MORF (moveable ORF) library of 5854 yeast expression plasmids was constructed, each expressing a sequence-verified ORF as a C-terminal ORF fusion protein, under regulated control. Analysis of 5573 MORFs demonstrates that nearly all verified ORFs are expressed, suggests the authenticity of 48 ORFs characterized as dubious, and implicates specific processes including cytoskeletal organization and transcriptional control in growth inhibition caused by overexpression. Global analysis of glycosylated proteins identifies 109 new confirmed N-linked and 345 candidate glycoproteins, nearly doubling the known yeast glycome.

    View details for Web of Science ID 000233765900003

    View details for PubMedID 16322557

  • Advances in functional protein microarray technology 15th Biennial Conference on Methods in Protein Structure Analysis Bertone, P., Snyder, M. WILEY-BLACKWELL. 2005: 5400–5411


    Numerous innovations in high-throughput protein production and microarray surface technologies have enabled the development of addressable formats for proteins ordered at high spatial density. Protein array implementations have largely focused on antibody arrays for high-throughput protein profiling. However, it is also possible to construct arrays of full-length, functional proteins from a library of expression clones. The advent of protein-based microarrays allows the global observation of biochemical activities on an unprecedented scale, where hundreds or thousands of proteins can be simultaneously screened for protein-protein, protein-nucleic acid, and small molecule interactions. This technology holds great potential for basic molecular biology research, disease marker identification, toxicological response profiling and pharmaceutical target screening.

    View details for DOI 10.1111/j.1742-4658.2005.04970.x

    View details for Web of Science ID 000232772200003

    View details for PubMedID 16262682

  • Checkpoint maintenance requires Ame1 and Okp1 CELL CYCLE Pot, I., Knockleby, J., Aneliunas, V., Nguyen, T., Ah-Kye, S., Liszt, G., Snyder, M., Hieter, P., Vogel, J. 2005; 4 (10): 1448-1456


    Kinetochore proteins are required for high fidelity chromosome segregation and as a platform for checkpoint signaling. Ame1 is an essential component of the COMA (Ctf19, Okp1, Mcm21, Ame1) sub-complex of the central kinetochore of budding yeast. In this study, we describe the isolation and characterization of an Ame1 conditional mutant, ame1-4. ame1-4 cells exhibit chromosome segregation defects and Mad2-dependent cell cycle delay similar to okp1-5 cells. However, the viability of ame1-4 cells is markedly reduced relative to wild type and okp1-5 cells after three hours at restrictive temperature. To determine if ame1-4 cells enter anaphase with mis-segregated chromosomes, we monitored the localization of Bub3:VFP as a marker for anaphase onset. ame1-4 cells containing mis-segregated sister chromatids initially accumulate Bub3:VFP at kinetochores, indicating checkpoint activation and a metaphase arrest. Subsequently, Bub3:VFP de-localizes and cells reinitiate DNA duplication and budding without cytokinesis in the presence of un-segregated chromosomes. Overexpression of OKP1 in ame1-4 cells restores ame1-4 protein localization and a stable arrest. Based on our results, we propose that Ame1 and Okp1 are required for a sustained checkpoint arrest in the presence of mis-segregated chromosomes. Our results suggest that checkpoint response might be controlled not only at the level of activation but also via signals that ensure maintenance of the response.

    View details for Web of Science ID 000233751500030

    View details for PubMedID 16177574

  • A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray PLANT MOLECULAR BIOLOGY Stolc, V., Li, L., Wang, X. F., Li, X. Y., Su, N., Tongprasit, W., Han, B., Xue, Y. B., Li, J. Y., Snyder, M., Gerstein, M., Wang, J., Deng, X. W. 2005; 59 (1): 137-149


    As the international efforts to sequence the rice genome are completed, an immediate challenge and opportunity is to comprehensively and accurately define all transcription units in the rice genome. Here we describe a strategy of using high-density oligonucleotide tiling-path microarrays to map transcription of the japonica rice genome. In a pilot experiment to test this approach, one array representing the reverse strand of the last 11.2 Mb sequence of chromosome 10 was analyzed in detail based on a mathematical model developed in this study. Analysis of the array data detected 77% of the reference gene models in a mixture of four RNA populations. Moreover, significant transcriptional activities were found in many of the previously annotated intergenic regions. These preliminary results demonstrate the utility of genome tiling microarrays in evaluating annotated rice gene models and in identifying novel transcription units that will facilitate rice genome annotation.

    View details for DOI 10.1007/s11103-005-6164-5

    View details for Web of Science ID 000232498000012

    View details for PubMedID 16217608

  • Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping TRENDS IN GENETICS Royce, T. E., Rozowsky, J. S., Bertone, P., Samanta, M., Stolc, V., Weissman, S., Snyder, M., Gerstein, M. 2005; 21 (8): 466-475


    Traditional microarrays use probes complementary to known genes to quantitate the differential gene expression between two or more conditions. Genomic tiling microarray experiments differ in that probes that span a genomic region at regular intervals are used to detect the presence or absence of transcription. This difference means the same sets of biases and the methods for addressing them are unlikely to be relevant to both types of experiment. We introduce the informatics challenges arising in the analysis of tiling microarray experiments as open problems to the scientific community and present initial approaches for the analysis of this nascent technology.

    View details for DOI 10.1016/j.tig.2005.06.007

    View details for Web of Science ID 000231209200010

    View details for PubMedID 15979196

  • Prospects and challenges in proteomics PLANT PHYSIOLOGY Bertone, P., Snyder, M. 2005; 138 (2): 560-562

    View details for DOI 10.1104/pp.104.900154

    View details for Web of Science ID 000229774200009

    View details for PubMedID 15955915

  • Sexual dimorphism in mammalian gene expression TRENDS IN GENETICS Rinn, J. L., Snyder, M. 2005; 21 (5): 298-305


    Males and females have obvious phenotypic differences; they also exhibit differences related to health, life span, cognitive abilities and have different responses to diseases such as anemia, coronary heart disease, hypertension and renal dysfunction. Although the anatomical, hormonal and chemical differences between the sexes are well known, there are few molecular descriptors for gender-specific physiological traits and health risks. Recent studies using microarrays and other methods have made significant progress towards elucidating the molecular differences between mammalian sexes in a variety of tissues and towards identifying the transcription factors that regulate sex-biased gene expression. These findings are providing new insights into the molecular and genetic differences that dictate the different behaviors and physiologies of mammalian sexes.

    View details for DOI 10.1016/j.tig.2005.03.005

    View details for Web of Science ID 000229143800012

    View details for PubMedID 15851067

  • Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery CHROMOSOME RESEARCH Bertone, P., Gerstein, M., Snyder, M. 2005; 13 (3): 259-274


    Microarrays have become a popular and important technology for surveying global patterns in gene expression and regulation. A number of innovative experiments have extended microarray applications beyond the measurement of mRNA expression levels, in order to uncover aspects of large-scale chromosome function and dynamics. This has been made possible due to the recent development of tiling arrays, where all non-repetitive DNA comprising a chromosome or locus is represented at various sequence resolutions. Since tiling arrays are designed to contain the entire DNA sequence without prior consultation of existing gene annotation, they enable the discovery of novel transcribed sequences and regulatory elements through the unbiased interrogation of genomic loci. The implementation of such methods for the global analysis of large eukaryotic genomes presents significant technical challenges. Nonetheless, tiling arrays are expected to become instrumental for the genome-wide identification and characterization of functional elements. Combined with computational methods to relate these data and map the complex interactions of transcriptional regulators, tiling array experiments can provide insight toward a more comprehensive understanding of fundamental molecular and cellular processes.

    View details for DOI 10.1007/s10577-005-2165-0

    View details for Web of Science ID 000228868500005

    View details for PubMedID 15868420

  • Substrate specificity analysis of protein kinase complex Dbf2-Mob1 by peptide library and proteome array screening BMC BIOCHEMISTRY Mah, A. S., Elia, A. E., Devgan, G., Ptacek, J., Schutkowski, M., Snyder, M., Yaffe, M. B., Deshaies, R. J. 2005; 6: 22


    The mitotic exit network (MEN) is a group of proteins that form a signaling cascade that is essential for cells to exit mitosis in Saccharomyces cerevisiae. The MEN has also been implicated in playing a role in cytokinesis. Two components of this signaling pathway are the protein kinase Dbf2 and its binding partner essential for its kinase activity, Mob1. The components of MEN that act upstream of Dbf2-Mob1 have been characterized, but physiological substrates for Dbf2-Mob1 have yet to be identified. Using a combination of peptide library selection, phosphorylation of optimal peptide variants, and screening of a phosphosite array, we found that Dbf2-Mob1 preferentially phosphorylated serine over threonine and required an arginine three residues upstream of the phosphorylated serine in its substrate. This requirement for arginine in peptide substrates could not be substituted with the similarly charged lysine. This specificity determined for peptide substrates was also evident in many of the proteins phosphorylated by Dbf2-Mob1 in a proteome chip analysis. We have determined by peptide library selection and phosphosite array screening that the protein kinase Dbf2-Mob1 preferentially phosphorylated substrates that contain an RXXS motif. A subsequent proteome microarray screen revealed proteins that can be phosphorylated by Dbf2-Mob1 in vitro. These proteins are enriched for RXXS motifs, and may include substrates that mediate the function of Dbf2-Mob1 in mitotic exit and cytokinesis. The relatively low degree of sequence restriction at the site of phosphorylation suggests that Dbf2 achieves specificity by docking its substrates at a site that is distinct from the phosphorylation site.

    View details for PubMedID 16242037

  • Global analysis of protein function using protein microarrays 2nd International Conference on Functional Genomics of Ageing Smith, M. G., Jona, G., Ptacek, J., Devgan, G., Zhu, H., Zhu, X. W., Snyder, M. ELSEVIER IRELAND LTD. 2005: 171–75


    Protein microarrays containing thousands of proteins arrayed at high density can be prepared and probed for a wide variety of activities, thereby allowing the large scale analysis of many proteins simultaneously. In addition to identifying the activities of many previously uncharacterized proteins, protein microarrays can reveal new activities of well-characterized proteins, thus providing new insights about the functions of these proteins. Below, we describe the construction and use of protein microarrays and their applications using yeast as a model system.

    View details for DOI 10.1016/j.mad.2004.09.019

    View details for Web of Science ID 000226564000022

    View details for PubMedID 15610776

  • Global identification of human transcribed sequences with genome tiling arrays SCIENCE Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X. W., Rinn, J. L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M., Snyder, M. 2004; 306 (5705): 2242-2246


    Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.

    View details for DOI 10.1126/science.1103388

    View details for Web of Science ID 000225950000042

    View details for PubMedID 15539566

  • DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA White, E. J., Emanuelsson, O., Scalzo, D., Royce, T., Kosak, S., Oakeley, E. J., Weissman, S., Gerstein, M., Groudine, M., Snyder, M., Schubeler, D. 2004; 101 (51): 17771-17776


    Duplication of the genome during the S phase of the cell cycle does not occur simultaneously; rather, different sequences are replicated at different times. The replication timing of specific sequences can change during development; however, the determinants of this dynamic process are poorly understood. To gain insights into the contribution of developmental state, genomic sequence, and transcriptional activity to replication timing, we investigated the timing of DNA replication at high resolution along an entire human chromosome (chromosome 22) in two different cell types. The pattern of replication timing was correlated with respect to annotated genes, gene expression, novel transcribed regions of unknown function, sequence composition, and cytological features. We observed that chromosome 22 contains regions of early- and late-replicating domains of 100 kb to 2 Mb, many (but not all) of which are associated with previously described chromosomal bands. In both cell types, expressed sequences are replicated earlier than nontranscribed regions. However, several highly transcribed regions replicate late. Overall, the DNA replication-timing profiles of the two different cell types are remarkably similar, with only nine regions of difference observed. In one case, this difference reflects the differential expression of an annotated gene that resides in this region. Novel transcribed regions with low coding potential exhibit a strong propensity for early DNA replication. Although the cellular function of such transcripts is poorly understood, our results suggest that their activity is linked to the replication-timing program.

    View details for DOI 10.1073/pnas.0408170101

    View details for Web of Science ID 000225951500038

    View details for PubMedID 15591350

  • Regulation of gene expression by a metabolic enzyme SCIENCE Hall, D. A., Zhu, H., Zhu, X. W., Royce, T., Gerstein, M., Snyder, M. 2004; 306 (5695): 482-484


    Gene expression in eukaryotes is normally believed to be controlled by transcriptional regulators that activate genes encoding structural proteins and enzymes. To identify previously unrecognized DNA binding activities, a yeast proteome microarray was screened with DNA probes; Arg5,6, a well-characterized mitochondrial enzyme involved in arginine biosynthesis, was identified. Chromatin immunoprecipitation experiments revealed that Arg5,6 is associated with specific nuclear and mitochondrial loci in vivo, and Arg5,6 binds to specific fragments in vitro. Deletion of Arg5,6 causes altered transcript levels of both nuclear and mitochondrial target genes. These results indicate that metabolic enzymes can directly regulate eukaryotic gene expression.

    View details for Web of Science ID 000224626500052

    View details for PubMedID 15486299

  • Large-scale mutagenesis of the yeast genome using a Tn7-derived multipurpose transposon GENOME RESEARCH Kumar, A., Seringhaus, M., Biery, M. C., Sarnovsky, R. J., Umansky, L., Piccirillo, S., Heidtman, M., Cheung, K. H., Dobry, C. J., Gerstein, M. B., Craig, N. L., Snyder, M. 2004; 14 (10A): 1975-1986


    We present here an unbiased and extremely versatile insertional library of yeast genomic DNA generated by in vitro mutagenesis with a multipurpose element derived from the bacterial transposon Tn7. This mini-Tn7 element has been engineered such that a single insertion can be used to generate a lacZ fusion, gene disruption, and epitope-tagged gene product. Using this transposon, we generated a plasmid-based library of approximately 300,000 mutant alleles; by high-throughput screening in yeast, we identified and sequenced 9032 insertions affecting 2613 genes (45% of the genome). From analysis of 7176 insertions, we found little bias in Tn7 target-site selection in vitro. In contrast, we also sequenced 10,174 Tn3 insertions and found a markedly stronger preference for an AT-rich 5-base pair target sequence. We further screened 1327 insertion alleles in yeast for hypersensitivity to the chemotherapeutic cisplatin. Fifty-one genes were identified, including four functionally uncharacterized genes and 25 genes involved in DNA repair, replication, transcription, and chromatin structure. In total, the collection reported here constitutes the largest plasmid-based set of sequenced yeast mutant alleles to date and, as such, should be singularly useful for gene and genome-wide functional analysis.

    View details for DOI 10.1101/gr.2875304

    View details for Web of Science ID 000224405900017

    View details for PubMedID 15466296

  • Genomic analysis of regulatory network dynamics reveals large topological changes NATURE Luscombe, N. M., Babu, M. M., Yu, H. Y., Snyder, M., Teichmann, S. A., Gerstein, M. 2004; 431 (7006): 308-312


    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here--particularly the large-scale topological changes and hub transience--will apply to other biological networks, including complex sub-systems in higher eukaryotes.

    View details for DOI 10.1038/nature02782

    View details for Web of Science ID 000223864000041

    View details for PubMedID 15372033

  • Major molecular differences between mammalian sexes are involved in drug metabolism and renal function DEVELOPMENTAL CELL Rinn, J. L., Rozowsky, J. S., Laurenzi, I. J., Petersen, P. H., Zou, K. Y., Zhong, W. M., Gerstein, M., Snyder, M. 2004; 6 (6): 791-800


    Many anatomical differences exist between males and females; these are manifested on a molecular level by different hormonal environments. Although several molecular differences in adult tissues have been identified, a comprehensive investigation of the gene expression differences between males and females has not been performed. We surveyed the expression patterns of 13,977 mouse genes in male and female hypothalamus, kidney, liver, and reproductive tissues. Extensive differential gene expression was observed not only in the reproductive tissues, but also in the kidney and liver. The differentially expressed genes are involved in drug and steroid metabolism, osmotic regulation, or as yet unresolved cellular roles. In contrast, very few molecular differences were observed between the male and female hypothalamus in both mice and humans. We conclude that there are persistent differences in gene expression between adult males and females. These molecular differences have important implications for the physiological differences between males and females.

    View details for Web of Science ID 000222443200012

    View details for PubMedID 15177028

  • CREB binds to multiple loci on human chromosome 22 MOLECULAR AND CELLULAR BIOLOGY Euskirchen, G., Royce, T. E., Bertone, P., Martone, R., Rinn, J. L., Nelson, F. K., Sayward, F., Luscombe, N. M., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2004; 24 (9): 3804-3814


    The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An unbiased, global analysis of where CREB binds has not been performed. We have mapped for the first time the binding distribution of CREB along an entire human chromosome. Chromatin immunoprecipitation of CREB-associated DNA and subsequent hybridization of the associated DNA to a genomic DNA microarray containing all of the nonrepetitive DNA of human chromosome 22 revealed 215 binding sites corresponding to 192 different loci and 100 annotated potential gene targets. We found binding near or within many genes involved in signal transduction and neuronal function. We also found that only a small fraction of CREB binding sites lay near well-defined 5' ends of genes; the majority of sites were found elsewhere, including introns and unannotated regions. Several of the latter lay near novel unannotated transcriptionally active regions. Few CREB targets were found near full-length cyclic AMP response element sites; the majority contained shorter versions or close matches to this sequence. Several of the CREB targets were altered in their expression by treatment with forskolin; interestingly, both induced and repressed genes were found. Our results provide novel molecular insights into how CREB mediates its functions in humans.

    View details for DOI 10.1128/MCB.24.9.3804-3814.2004

    View details for Web of Science ID 000220898100021

    View details for PubMedID 15082775

  • Microbial synergy via an ethanol-triggered pathway MOLECULAR AND CELLULAR BIOLOGY Smith, M. G., Des Etages, S. G., Snyder, M. 2004; 24 (9): 3874-3884


    We have discovered a microbial interaction between yeast, bacteria, and nematodes. Upon coculturing, Saccharomyces cerevisiae stimulated the growth of several species of Acinetobacter, including A. baumannii, A. haemolyticus, A. johnsonii, and A. radioresistens, as well as several natural isolates of Acinetobacter. This enhanced growth was due to a diffusible factor that was shown to be ethanol by chemical assays and evaluation of strains lacking ADH1, ADH3, and ADH5, as all three genes are involved in ethanol production by yeast. This effect is specific to ethanol: methanol, butanol, and dimethyl sulfoxide were unable to stimulate growth to any appreciable level. Low doses of ethanol not only stimulated growth to a higher cell density but also served as a signaling molecule: in the presence of ethanol, Acinetobacter species were able to withstand the toxic effects of salt, indicating that ethanol alters cell physiology. Furthermore, ethanol-fed A. baumannii displayed increased pathogenicity when confronted with a predator, Caenorhabditis elegans. Our results are consistent with the concept that ethanol can serve as a signaling molecule which can affect bacterial physiology and survival.

    View details for DOI 10.1128/MCB.24.9.3874-3884.2004

    View details for Web of Science ID 000220898100027

    View details for PubMedID 15082781

  • Regulation of polarized growth initiation and termination cycles by the polarisome and Cdc42 regulators JOURNAL OF CELL BIOLOGY Bidlingmaier, S., Snyder, M. 2004; 164 (2): 207-218


    The dynamic regulation of polarized cell growth allows cells to form structures of defined size and shape. We have studied the regulation of polarized growth using mating yeast as a model. Haploid yeast cells treated with a high concentration of pheromone form successive mating projections that initiate and terminate growth with regular periodicity. The mechanisms that control the frequency of growth initiation and termination under these conditions are not well understood. We found that the polarisome components Spa2, Pea2, and Bni1 and the Cdc42 regulators Cdc24 and Bem3 control the timing and frequency of projection formation. Loss of polarisome components and mutation of Cdc24 decrease the frequency of projection formation, while loss of Bem3 increases the frequency of projection formation. We found that polarisome components and the cell fusion proteins Fus1 and Fus2 are important for the termination of projection growth. Our results define the first molecular regulators that control the timing of growth initiation and termination during eukaryotic cell differentiation.

    View details for DOI 10.1083/jcb.200307065

    View details for Web of Science ID 000188370500006

    View details for PubMedID 14734532

  • Fast optimal genome tiling with applications to microarray design and homology search JOURNAL OF COMPUTATIONAL BIOLOGY Berman, P., Bertone, P., DasGupta, B., Gerstein, M., Kao, M. Y., Snyder, M. 2004; 11 (4): 766-785


    In this paper, we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size-bound parameters, we want to find a set of tiles of maximum total weight such that each tile satisfies the size bounds. A solution to this problem is important to a number of computational biology applications such as selecting genomic DNA fragments for PCR-based amplicon microarrays and performing homology searches with long sequence queries. Our goal is to design efficient algorithms with linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we first discuss the solution to a basic online interval maximum problem via a sliding-window approach and show how to use this solution in a nontrivial manner for many of the tiling problems introduced. We also discuss NP-hardness results and approximation algorithms for generalizing our basic tiling problem to higher dimensions. Finally, computational results from applying our tiling algorithms to genomic sequences of five model eukaryotes are reported.

    View details for Web of Science ID 000223974700015

    View details for PubMedID 15579244

  • Analyzing antibody specificity with whole proteome microarrays NATURE BIOTECHNOLOGY Michaud, G. A., Salcius, M., Zhou, F., Bangham, R., Bonin, J., Guo, H., Snyder, M., Predki, P. F., Schweitzer, B. I. 2003; 21 (12): 1509-1512


    Although approximately 10,000 antibodies are available from commercial sources, antibody reagents are still unavailable for most proteins. Furthermore, new applications such as antibody arrays and monoclonal antibody therapeutics have increased the demand for more specific antibodies to reduce cross-reactivity and side effects. An array containing every protein for the relevant organism represents the ideal format for an assay to test antibody specificity, because it allows the simultaneous screening of thousands of proteins for possible cross-reactivity. As an initial test of this approach, we screened 11 polyclonal and monoclonal antibodies to approximately 5,000 different yeast proteins deposited on a glass slide and found that, in addition to recognizing their cognate proteins, the antibodies cross-reacted with other yeast proteins to varying degrees. Some of the interactions of the antibodies with noncognate proteins could be deduced by alignment of the primary amino acid sequences of the antigens and cross-reactive proteins; however, these interactions could not be predicted a priori. Our findings show that proteome array technology has potential to improve antibody design and selection for applications in both medicine and research.

    View details for DOI 10.1038/nbt910

    View details for Web of Science ID 000186845200031

    View details for PubMedID 14608365

  • Changes in the nutrient content of school lunches: results from the Pathways study PREVENTIVE MEDICINE Story, M., Snyder, M. P., Anliker, J., Weber, J. L., Cunningham-Sabo, L., Stone, E. J., Chamberlain, A., Ethelbah, B., Suchindran, C., Ring, K. 2003; 37 (6): S35-S45


    Pathways, a randomized trial, evaluated the effectiveness of a school-based multicomponent intervention to reduce fatness in American-Indian schoolchildren. The goal of the Pathways food service intervention component was to reduce the fat in school lunches to no more than 30% of energy from fat while maintaining recommended levels of calories and key nutrients. The intervention was implemented by school food service staff in intervention schools over a 3-year period. Five consecutive days of school lunch menu items were collected from 20 control and 21 intervention schools at four time periods, and nutrient content was analyzed. There was a significantly greater mean reduction in percent energy from fat and saturated fat in the intervention schools compared to the control schools. Mean percentages of energy from fat decreased from 33.1% at baseline to 28.3% at the end of the study in intervention schools compared to 33.2% at baseline and 32.2% at follow-up in the control schools (P<0.003). There were no statistically significant differences for calories or nutrients between intervention and control schools. The Pathways school food lunch intervention documented the feasibility of successfully lowering the percent of energy from fat, as part of a coordinated obesity prevention program for American-Indian children.

    View details for DOI 10.1016/j.ypmed.2003.08.009

    View details for Web of Science ID 000187114300005

    View details for PubMedID 14636807

  • Negative regulation of calcineurin signaling by Hrr25p, a yeast homolog of casein kinase I GENES & DEVELOPMENT Kafadar, K. A., Zhu, H., Snyder, M., Cyert, M. S. 2003; 17 (21): 2698-2708


    Calcineurin is a Ca2+/calmodulin-regulated protein phosphatase required for Saccharomyces cerevisiae to respond to a variety of environmental stresses. Calcineurin promotes cell survival during stress by dephosphorylating and activating the Zn-finger transcription factor Crz1p/Tcn1p. Using a high-throughput assay, we screened 119 yeast kinases for their ability to phosphorylate Crz1p in vitro and identified the casein kinase I homolog Hrr25p. Here we show that Hrr25p negatively regulates Crz1p activity and nuclear localization in vivo. Hrr25p binds to and phosphorylates Crz1p in vitro and in vivo. Overexpression of Hrr25p decreases Crz1p-dependent transcription and antagonizes its Ca2+-induced nuclear accumulation. In the absence of Hrr25p, activation of Crz1p by Ca2+/calcineurin is potentiated. These findings represent the first identification of a negative regulator for Crz1p, and establish a novel physiological role for Hrr25p in antagonizing calcineurin signaling.

    View details for DOI 10.1101/gad.1140603

    View details for Web of Science ID 000186299700011

    View details for PubMedID 14597664

  • Microarrays to characterize protein interactions on a whole-proteome scale PROTEOMICS Schweitzer, B., Predki, P., Snyder, M. 2003; 3 (11): 2190-2199


    Protein microarrays contain a defined set of proteins spotted and analyzed at high density, and can be generally classified into two categories: protein profiling arrays and functional protein arrays. Functional protein arrays can be made up of any type of protein, and therefore have a diverse set of useful applications. Advantages of these arrays include low reagent consumption, rapid interpretation of results, and the ability to easily control experimental conditions. The ultimate form of a functional protein array consists of all of the proteins encoded by the genome of an organism; such an array would be the whole proteome equivalent of the whole genome DNA arrays that are now available. While proteome microarrays may not have reached the stage of maturity of DNA microarrays, recent developments have shown that many of the barriers holding back the technology can be overcome. Arrays of this type have already been used to rapidly screen large numbers of proteins simultaneously for biochemical activities, protein-protein interactions, protein-lipid interactions, protein-nucleic acid interactions, and protein-small molecule interactions. Eventually, functional protein arrays will be used to facilitate various steps in the drug discovery and early development processes that are currently bottlenecks in the drug development continuum.

    View details for DOI 10.1002/pmic.200300610

    View details for Web of Science ID 000186582500015

    View details for PubMedID 14595818

  • A Bayesian networks approach for predicting protein-protein interactions from genomic data SCIENCE Jansen, R., Yu, H. Y., Greenbaum, D., Kluger, Y., Krogan, N. J., Chung, S. B., Emili, A., Snyder, M., Greenblatt, J. F., Gerstein, M. 2003; 302 (5644): 449-453


    We have developed an approach using Bayesian networks to predict protein-protein interactions genome-wide in yeast. Our method naturally weights and combines into reliable predictions genomic features only weakly associated with interaction (e.g., messenger RNA coexpression, coessentiality, and colocalization). In addition to de novo predictions, it can integrate often noisy, experimental interaction data sets. We observe that at given levels of sensitivity, our predictions are more accurate than the existing high-throughput experimental data sets. We validate our predictions with TAP (tandem affinity purification) tagging experiments. Our analysis, which gives a comprehensive view of yeast interactions, is available at

    View details for Web of Science ID 000185963200044

    View details for PubMedID 14564010

  • Distribution of NF-kappa B-binding sites across human chromosome 22 PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T. E., Luscombe, N. M., Rinn, J. L., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2003; 100 (21): 12247-12252


    We have mapped the chromosomal binding site distribution of a transcription factor in human cells. The NF-kappaB family of transcription factors plays an essential role in regulating the induction of genes involved in several physiological processes, including apoptosis, immunity, and inflammation. The binding sites of the NF-kappaB family member p65 were determined by using chromatin immunoprecipitation and a genomic microarray of human chromosome 22 DNA. Sites of binding were observed along the entire chromosome in both coding and noncoding regions, with an enrichment at the 5' end of genes. Strikingly, a significant proportion of binding was seen in intronic regions, demonstrating that transcription factor binding is not restricted to promoter regions. NF-kappaB binding was also found at genes whose expression was regulated by tumor necrosis factor alpha, a known inducer of NF-kappaB-dependent gene expression, as well as adjacent to genes whose expression is not affected by tumor necrosis factor alpha. Many of these latter genes are either known to be activated by NF-kappaB under other conditions or are consistent with NF-kappaB's role in the immune and apoptotic responses. Our results suggest that binding is not restricted to promoter regions and that NF-kappaB binding occurs at a significant number of genes whose expression is not altered, thereby suggesting that binding alone is not sufficient for gene activation.

    View details for DOI 10.1073/pnas.2135255100

    View details for Web of Science ID 000186024300058

    View details for PubMedID 14527995

  • Cytoskeletal activation of a checkpoint kinase MOLECULAR CELL Hanrahan, J., Snyder, M. 2003; 12 (3): 663-673


    The assembly of cytoskeletal structures is coupled to other cellular processes. We have studied the molecular mechanism by which assembly of the yeast septin cytoskeleton is monitored and coordinated with cell cycle progression by analyzing a key regulatory protein kinase, Hsl1, that becomes activated only when the septin cytoskeleton is properly assembled. We first identified a regulatory region of Hsl1 that physically associates with the kinase domain and found that it performs an autoinhibitory function both in vivo and in vitro. Several septin binding domains lie near and overlap the inhibitory domain; these are important for Hsl1 function, and binding of two septins, Cdc11 and Cdc12, relieves the autoinhibition imposed by the kinase inhibitory domain in vitro. Our results suggest that binding to multiple septins activates Hsl1 kinase activity, thereby promoting cell cycle progression. The high conservation of Hsl1 indicates that similar mechanisms may monitor cytoskeletal organization in other eukaryotes.

    View details for Web of Science ID 000185613800015

    View details for PubMedID 14527412

  • Specific protein targeting during cell differentiation: Polarized localization of Fus1p during mating depends on Chs5p in Saccharomyces cerevisiae EUKARYOTIC CELL Santos, B., Snyder, M. 2003; 2 (4): 821-825


    In budding yeast, chs5 mutants are defective in chitin synthesis and cell fusion during mating. Chs5p is a late-Golgi protein required for the polarized transport of the chitin synthase Chs3p to the membrane. Here we show that Chs5p is also essential for the polarized targeting of Fus1p, but not of other cell fusion proteins, to the membrane during mating.

    View details for DOI 10.1128/EC.2.4.821-825.2003

    View details for Web of Science ID 000184803000018

    View details for PubMedID 12912901

  • ExpressYourself: a modular platform for processing and visualizing microarray data NUCLEIC ACIDS RESEARCH Luscombe, N. M., Royce, T. E., Bertone, P., Echols, N., Horak, C. E., Chang, J. T., Snyder, M., Gerstein, M. 2003; 31 (13): 3477-3482


    DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Having performed the experiments, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In completely automated fashion, it will correct the background array signal, normalize the Cy5 and Cy3 signals, score levels of differential hybridization, combine the results of replicate experiments, filter problematic regions of the array and assess the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at

    View details for DOI 10.1093/nar/gkg628

    View details for Web of Science ID 000183832900040

    View details for PubMedID 12824348

  • Recent developments in analytical and functional protein microarrays CURRENT OPINION IN MOLECULAR THERAPEUTICS Jona, G., Snyder, M. 2003; 5 (3): 271-277


    In recent years, the genomes of many different organisms have been fully sequenced and annotated. As a consequence of this information, a number of methods have emerged to study the function of many genes and proteins in parallel. One recent approach for the large-scale analysis of proteins is the use of protein microarrays in which hundreds to thousands of proteins are arrayed and assayed simultaneously. Protein arrays can be used for assessing protein levels and following disease markers, identifying biochemical activities, analyzing post-translational modifications, building interaction networks, and for drug discovery and development. In this review, we discuss the construction of different types of protein arrays, and their numerous and diverse applications.

    View details for Web of Science ID 000184024600009

    View details for PubMedID 12870437

  • Genomics - Defining genes in the genomics era SCIENCE Snyder, M., Gerstein, M. 2003; 300 (5617): 258-260

    View details for Web of Science ID 000182135400032

    View details for PubMedID 12690176

  • Molecular dissection of a yeast septin: Distinct domains are required for septin interaction, localization, and function MOLECULAR AND CELLULAR BIOLOGY Casamayor, A., Snyder, M. 2003; 23 (8): 2762-2777


    The septins are a family of cytoskeletal proteins present in animal and fungal cells. They were first identified for their essential role in cytokinesis, but more recently, they have been found to play an important role in many cellular processes, including bud site selection, chitin deposition, cell compartmentalization, and exocytosis. Septin proteins self-associate into filamentous structures that, in yeast cells, form a cortical ring at the mother bud neck. Members of the septin family share common structural domains: a GTPase domain in the central region of the protein, a stretch of basic residues at the amino terminus, and a predicted coiled-coil domain at the carboxy terminus. We have studied the role of each domain in the Saccharomyces cerevisiae septin Cdc11 and found that the three domains are responsible for distinct and sometimes overlapping functions. All three domains are important for proper localization and function in cytokinesis and morphogenesis. The basic region was found to bind the phosphoinositides phosphatidylinositol 4-phosphate and phosphatidylinositol 5-phosphate. The coiled-coil domain is important for interaction with Cdc3 and Bem4. The GTPase domain is involved in Cdc11-septin interaction and targeting to the mother bud neck. Surprisingly, GTP binding appears to be dispensable for Cdc11 function, localization, and lipid binding. Thus, we find that septins are multifunctional proteins with specific domains involved in distinct molecular interactions required for assembly, localization, and function within the cell.

    View details for DOI 10.1128/MCB.23.8.2762-2777.2003

    View details for Web of Science ID 000182049900012

    View details for PubMedID 12665577

  • Protein analysis on a proteomic scale NATURE Phizicky, E., Bastiaens, P. I., Zhu, H., Snyder, M., Fields, S. 2003; 422 (6928): 208-215


    The long-term challenge of proteomics is enormous: to define the identities, quantities, structures and functions of complete complements of proteins, and to characterize how these properties vary in different cellular contexts. One critical step in tackling this goal is the generation of sets of clones that express a representative of each protein of a proteome in a useful format, followed by the analysis of these sets on a genome-wide basis. Such studies enable genetic, biochemical and cell biological technologies to be applied on a systematic level, leading to the assignment of biochemical activities, the construction of protein arrays, the identification of interactions, and the localization of proteins within cellular compartments.

    View details for DOI 10.1038/nature01512

    View details for Web of Science ID 000181488900055

    View details for PubMedID 12634794

  • The transcriptional activity of human Chromosome 22 GENES & DEVELOPMENT Rinn, J. L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N. M., Hartman, S., Harrison, P. M., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., Snyder, M. 2003; 17 (4): 529-540


    A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)(+) RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at

    View details for DOI 10.1101/gad.1055203

    View details for Web of Science ID 000181276200011

    View details for PubMedID 12600945

  • Protein chip technology CURRENT OPINION IN CHEMICAL BIOLOGY Zhu, H., Snyder, M. 2003; 7 (1): 55-63


    Microarray technology has become a crucial tool for large-scale and high-throughput biology. It allows fast, easy and parallel detection of thousands of addressable elements in a single experiment. In the past few years, protein microarray technology has shown its great potential in basic research, diagnostics and drug discovery. It has been applied to analyse antibody-antigen, protein-protein, protein-nucleic-acid, protein-lipid and protein-small-molecule interactions, as well as enzyme-substrate interactions. Recent progress in the field of protein chips includes surface chemistry, capture molecule attachment, protein labeling and detection methods, high-throughput protein/antibody production, and applications to analyse entire proteomes.

    View details for DOI 10.1016/S1367-5931(02)00005-4

    View details for Web of Science ID 000180868900009

    View details for PubMedID 12547427

  • Identification of novel functional elements in the human genome 67th Cold Spring Harbor Symposium on Quantitative Biology Lian, Z., Euskirchen, G., Rinn, J., Martone, R., Bertone, P., Hartman, S., Royce, T., Nelson, K., Sayward, F., Luscombe, N., Yang, J., Li, J. L., Miller, P., Urban, A. E., Gerstein, M., Weissman, S., Snyder, M. COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT. 2003: 317–322

    View details for Web of Science ID 000222969300037

    View details for PubMedID 15338632

  • Proteomics ANNUAL REVIEW OF BIOCHEMISTRY Zhu, H., Bilgin, M., Snyder, M. 2003; 72: 783-812


    Fueled by ever-growing DNA sequence information, proteomics, the large-scale analysis of proteins, has become one of the most important disciplines for characterizing gene function, for building functional linkages between protein molecules, and for providing insight into the mechanisms of biological processes in a high-throughput mode. It is now possible to examine the expression of more than 1000 proteins using mass spectrometry technology coupled with various separation methods. High-throughput yeast two-hybrid approaches and analysis of protein complexes using affinity tag purification have yielded valuable protein-protein interaction maps. Large-scale protein tagging and subcellular localization projects have provided considerable information about protein function. Finally, recent developments in protein microarray technology provide a versatile tool to study protein-protein, protein-nucleic acid, protein-lipid, enzyme-substrate, and protein-drug interactions. Other types of microarrays, though not fully developed, also show great potential in diagnostics, protein profiling, and drug identification and validation. This review discusses high-throughput technologies for proteome analysis and their applications. Also discussed are the approaches used for the integrated analysis of the voluminous sets of data generated by proteome analysis conducted on a global scale.

    View details for DOI 10.1146/annurev.biochem.72.121801.161511

    View details for Web of Science ID 000185092500024

    View details for PubMedID 14527327

  • The alpha-factor receptor C-terminus is important for mating projection formation and orientation in Saccharomyces cerevisiae CELL MOTILITY AND THE CYTOSKELETON Vallier, L. G., Segall, J. E., Snyder, M. 2002; 53 (4): 251-266


    Successful mating of MATa Saccharomyces cerevisiae cells is dependent on Ste2p, the alpha-factor receptor. Besides receiving the pheromone signal and transducing it through the G-protein coupled MAP kinase pathway, Ste2p is active in the establishment and orientation of the mating projection. We investigated the role of the carboxyl terminus of the receptor in mating projection formation and orientation using a spatial gradient assay. Cells carrying the ste2-T326 mutation, truncating 105 of the 135 amino acids in the receptor tail including a motif necessary for its ligand-mediated internalization, display slow onset of projection formation, abnormal shmoo morphology, and reduced ability to orient the mating projection toward a pheromone source. This reduction was due to the increased loss of mating projection orientation in a pheromone gradient. Cells with a mutated endocytosis motif were defective in reorientation in a pheromone gradient. ste2-Delta296 cells, which carry a complete truncation of the Ste2p tail, exhibit a severe defect in projection formation, and those projections that do form are unable to orient in a pheromone gradient. These results suggest a complex role for the Ste2p carboxy-terminal tail in the formation, orientation, and directional adjustment of the mating projection, and that endocytosis of the receptor is important for this process. In addition, mutations in RSR1/BUD1 and SPA2, genes necessary for budding polarity, exhibited little or no defect in formation or orientation of mating projections. We conclude that mating projection orientation depends upon the carboxyl terminus of the pheromone receptor and not the directional machinery used in budding.

    View details for DOI 10.1002/cm.10073

    View details for Web of Science ID 000179314000001

    View details for PubMedID 12378535

  • Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae GENES & DEVELOPMENT Horak, C. E., Luscombe, N. M., Qian, J. A., Bertone, P., Piccirrillo, S., Gerstein, M., Snyder, M. 2002; 16 (23): 3017-3033


    In the yeast Saccharomyces cerevisiae, SBF (Swi4-Swi6 cell cycle box binding factor) and MBF (MluI binding factor) are the major transcription factors regulating the START of the cell cycle, a time just before DNA replication, bud growth initiation, and spindle pole body (SPB) duplication. These two factors bind to the promoters of 235 genes, but bind less than a quarter of the promoters upstream of genes with peak transcript levels at the G1 phase of the cell cycle. Several functional categories, which are known to be crucial for G1/S events, such as SPB duplication/migration and DNA synthesis, are under-represented in the list of SBF and MBF gene targets. SBF binds the promoters of several other transcription factors, including HCM1, PLM2, POG1, TOS4, TOS8, TYE7, YAP5, YHP1, and YOX1. Here, we demonstrate that these factors are targets of SBF using an independent assay. To further elucidate the transcriptional circuitry that regulates the G1-to-S-phase progression, these factors were epitope-tagged and their binding targets were identified by ChIP-chip analysis. These factors bind the promoters of genes with roles in G1/S events including DNA replication, bud growth, and spindle pole complex formation, as well as the general activities of mitochondrial function, transcription, and protein synthesis. Although functional overlap exists between these factors and MBF and SBF, each of these factors has distinct functional roles. Most of these factors bind the promoters of other transcription factors known to be cell cycle regulated or known to be important for cell cycle progression and differentiation processes, indicating that a complex network of transcription factors coordinates the diverse activities that initiate a new cell cycle.

    View details for DOI 10.1101/gad.1039602

    View details for Web of Science ID 000179649300005

    View details for PubMedID 12464632

  • Proteomic approaches for the global analysis of proteins BIOTECHNIQUES Michaud, G. A., Snyder, M. 2002; 33 (6): 1308-1316


    Improvements in technology that allow miniaturization and high-throughput analyses of thousands of genes and gene products have changed the focus and scope of research and development in both academia and industry. It is now possible to study entire proteomes with the goals of elucidating protein expression, subcellular localization, biochemical activities, and their regulation. Alterations in different cell types and conditions and in normal and disease states can be revealed. This wealth of information not only has facilitated our basic understanding of many biological processes but also has enormous potential for drug discovery and development.

    View details for Web of Science ID 000179996500022

    View details for PubMedID 12503317

  • A novel mitochondrial protein, Tar1p, is encoded on the antisense strand of the nuclear 25S rDNA GENES & DEVELOPMENT Coelho, P. S., Bryan, A. C., Kumar, A., Shadel, G. S., Snyder, M. 2002; 16 (21): 2755-2760


    In eukaryotes, it is widely assumed that genes coding for proteins and structural RNAs do not overlap. Using a transposon-tagging strategy to globally analyze the Saccharomyces cerevisiae genome for expressed genes, we identified multiple insertions in an open reading frame that is contained fully within and transcribed antisense to the 25S rRNA gene in the nuclear rDNA repeat region on Chromosome XII. Expression of this gene, TAR1 (Transcript Antisense to Ribosomal RNA), can be detected at the RNA and protein levels, and the primary sequence of the corresponding 124-amino-acid protein is conserved in several yeast species. Tar1p was found to localize to mitochondria, and overexpression of the protein suppresses the respiration-deficient petite phenotype of a point mutation in mitochondrial RNA polymerase that affects mitochondrial gene expression and mtDNA stability. These findings indicate that coding information for protein and structural RNAs can overlap, raising issues regarding the coevolution of such complex genes, and also suggest that rDNA transcription and mitochondrial function are coordinately regulated in eukaryotic cells.

    View details for DOI 10.1101/gad.1035002

    View details for Web of Science ID 000179027900004

    View details for PubMedID 12414727

  • A dynamic approach to mapping coordinates between microplates and microarrays JOURNAL OF BIOMEDICAL INFORMATICS Cheung, K. H., Hager, J., Nelson, K., White, K., Li, Y. L., Snyder, M., Williams, K., Miller, P. 2002; 35 (5-6): 306-312


    The retrieval of useful data from spotted microarray slides requires keeping track of which microplate well and DNA sample correspond to each spot on each array slide. Existing approaches are closely coupled with the type of arrayer in use and are computer operating-system-specific. To support the microarray researcher community at large who use different arrayers and computer platforms, increased flexibility, generality, and portability of these approaches are required. In this paper, we describe a general algorithm that correlates the well positions of DNA samples in each microplate to the positions of the spots on each array slide. Based on this algorithm, we have implemented a flexible and platform-independent program named MicroArray Convolutor (MAC) that provides a Web solution allowing the user to: (a) import a text file that identifies the DNA samples and their well locations, (b) select a transformation method that converts data in 96-well plate format into 384-well plate format, and (c) specify the output format of the array lists dependent on the configuration of the array platform as well as the downstream analysis software chosen for the array. MAC and its source code can be accessed via the following Web address:

    View details for DOI 10.1016/S1532-0464(03)00033-9

    View details for Web of Science ID 000184879000004

    View details for PubMedID 12968779

  • Global analysis of gene expression in yeast. Functional & integrative genomics Horak, C. E., Snyder, M. 2002; 2 (4-5): 171-180


    In the past decade, there has been an intense effort to comprehensively catalogue the expressed genes in the yeast Saccharomyces cerevisiae and to determine the absolute and relative abundance of transcript and protein levels under different cellular conditions. Several methods have been developed to monitor gene expression: DNA microarray analysis, Serial Analysis of Gene Expression (SAGE), kinetic RT-PCR and monitoring expression of beta-galactosidase fusion proteins. These techniques have been used to measure transcript and protein abundance in different developmental states and under different environmental stimuli. A wealth of expression data for yeast is now publicly available through several web sites. The expression information that exists has the obvious benefits of providing a better understanding of the gene expression patterns that accompany changes in a yeast cell's environmental and developmental states. This data has also, however, provided clues to unraveling the complicated questions surrounding gene regulation: why and how is gene expression controlled?

    View details for PubMedID 12192590

  • Functional profiling of the Saccharomyces cerevisiae genome NATURE Giaever, G., Chu, A. M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., Arkin, A. P., Astromoff, A., El Bakkoury, M., Bangham, R., Benito, R., Brachat, S., Campanaro, S., Curtiss, M., Davis, K., Deutschbauer, A., Entian, K. D., Flaherty, P., Foury, F., Garfinkel, D. J., Gerstein, M., Gotte, D., Guldener, U., Hegemann, J. H., Hempel, S., Herman, Z., Jaramillo, D. F., Kelly, D. E., Kelly, S. L., Kotter, P., LaBonte, D., Lamb, D. C., Lan, N., Liang, H., Liao, H., Liu, L., Luo, C. Y., Lussier, M., Mao, R., Menard, P., Ooi, S. L., Revuelta, J. L., Roberts, C. J., Rose, M., Ross-Macdonald, P., Scherens, B., Schimmack, G., Shafer, B., Shoemaker, D. D., Sookhai-Mahadeo, S., Storms, R. K., Strathern, J. N., Valle, G., Voet, M., Volckaert, G., Wang, C. Y., Ward, T. R., Wilhelmy, J., Winzeler, E. A., Yang, Y. H., Yen, G., Youngman, E., Yu, K. X., Bussey, H., Boeke, J. D., Snyder, M., Philippsen, P., Davis, R. W., Johnston, M. 2002; 418 (6896): 387-391


    Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.

    View details for DOI 10.1038/nature00935

    View details for Web of Science ID 000177009700029

    View details for PubMedID 12140549

  • Microtubule capture by the cleavage apparatus is required for proper spindle positioning in yeast GENES & DEVELOPMENT Kusch, J., Meyer, A., Snyder, M. P., Barral, Y. 2002; 16 (13): 1627-1639


    Cell division is the result of two major cytoskeletal events: partition of the chromatids by the mitotic spindle and cleavage of the cell by the cytokinetic apparatus. Spatial coordination of these events ensures that each daughter cell inherits a nucleus. Here we show that, in budding yeast, capture and shrinkage of astral microtubules at the bud neck is required to position the spindle relative to the cleavage apparatus. Capture required the septins and the microtubule-associated protein Kar9. Like Kar9-defective cells, cells lacking the septin ring failed to position their spindle correctly and showed an increased frequency of nuclear missegregation. Microtubule attachment at the bud neck was followed by shrinkage and a pulling action on the spindle. Enhancement of microtubule shrinkage at the bud neck required the Par-1-related, septin-dependent kinases (SDK) Hsl1 and Gin4. Neither the formin Bnr1 nor the actomyosin contractile ring was required for either microtubule capture or microtubule shrinkage. Together, our results indicate that septins and septin-dependent kinases may coordinate microtubule and actin functions in cell division.

    View details for DOI 10.1101/gad.222602

    View details for Web of Science ID 000176679100004

    View details for PubMedID 12101122

  • Large-scale identification of genes important for apical growth in Saccharomyces cerevisiae by directed allele replacement technology (DART) screening. Functional & integrative genomics Bidlingmaier, S., Snyder, M. 2002; 1 (6): 345-356


    In Saccharomyces cerevisiae, apical bud growth occurs for a brief period in G1 when the deposition of membrane and cell wall is restricted to the tip of the growing bud. To identify genes important for apical bud growth, we have utilized a novel transposon-based mutagenesis system termed DART (Directed Allele Replacement Technology) that allows the rapid transfer of defined insertion alleles into any strain background. A total of 4,810 insertion alleles affecting 1,392 different yeast genes were transferred into a cdc34-2 mutant strain that arrests in the apical growth phase when grown at the restrictive temperature of 37 degrees C. We identified 29 insertion alleles, containing mutations in 17 different genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, SEC22, FAB1, VPS36, VID22, RAS2, ECM33, OPI3, API1/YDR372c, API2/YDR525w, API3/YKR020w, and API4/YNL051w), which alter the elongated bud morphology of cdc34-2 cells arrested in the apical growth phase. Upon treatment with mating pheromone at 25 degrees C, cells containing insertion alleles affecting ten of these genes (SMY1, SPA2, PAN1, SLA1, SLA2, CBK1, FAB1, VPS36, VID22, and API2/YDR525w) form abnormal mating projections. Additionally, cells containing insertion alleles affecting SEC22, RAS2, API1/YDR372c, API3/YKR020w, and API4/YNL051w display severe mating projection formation defects at the elevated temperature of 37 degrees C. DART mutagenesis has many advantages over traditional mutagenesis methods and will be a useful tool for dissecting gene networks important for biological processes.

    View details for PubMedID 11957109

  • Bud-site selection and cell polarity in budding yeast CURRENT OPINION IN MICROBIOLOGY Casamayor, A., Snyder, M. 2002; 5 (2): 179-186


    Polarized growth involves a hierarchy of events such as selection of the growth site, polarization of the cytoskeleton to the selected growth site, and transport of secretory vesicles containing components required for growth. The budding yeast Saccharomyces cerevisiae is an excellent model system for the study of polarized cell growth. A large number of proteins have been found to be involved in these processes, although their mechanisms of action are not yet well-understood. Recent discoveries have helped elucidate many of the processes involved in cell polarity and bud-site selection in yeast and have modified the traditional view of cellular structures involved in these processes. This review focuses on recent advances on the roles of cortical tags, GTPases and the cytoskeleton in the generation and maintenance of cell polarity in yeast.

    View details for Web of Science ID 000175460500009

    View details for PubMedID 11934615

  • 'Omic' approaches for unraveling signaling networks CURRENT OPINION IN CELL BIOLOGY Zhu, H., Snyder, M. 2002; 14 (2): 173-179


    Signaling pathways are crucial for cell differentiation and response to cellular environments. Recently, a large number of approaches for the global analysis of genes and proteins have been described. These have provided important new insights into the components of different pathways and the molecular and cellular responses of these pathways. This review covers genomic and proteomic (collectively referred to as "omic") approaches for the global analysis of cell signaling, including gene expression profiling and analysis, protein-protein interaction methods, protein microarrays, mass spectroscopy and gene-disruption and engineering approaches.

    View details for DOI 10.1016/S0955-0674(02)00315-0

    View details for Web of Science ID 000174193300007

    View details for PubMedID 11891116

  • Carbohydrate analysis prepares to enter the "omics" era CHEMISTRY & BIOLOGY Bidlingmaier, S., Snyder, M. 2002; 9 (4): 400-401


    In this issue, Houseman and Mrksich describe a carbohydrate array preparation method that can be used to analyze protein-carbohydrate interactions and to characterize the substrate specificity of a carbohydrate-modifying enzyme. Carbohydrate chips were prepared by a novel procedure that allows the covalent attachment of carbohydrate-diene conjugates to a specially engineered monolayer surface. The surface presents a precisely controllable ratio of reactive benzoquinone and inert ethylene glycol groups. Nonspecific adsorption of proteins to the surface is extremely low, and the surface is compatible with popular detection techniques. The immobilization technique was demonstrated to be compatible with recently developed automated solid phase carbohydrate synthesis methods, paving the way for the development of highly complex carbohydrate arrays.

    View details for Web of Science ID 000175379100002

    View details for PubMedID 11983329

  • Subcellular localization of the yeast proteome GENES & DEVELOPMENT Kumar, A., Agarwal, S., Heyman, J. A., Matson, S., Heidtman, M., Piccirillo, S., Umansky, L., Drawid, A., Jansen, R., Liu, Y., Cheung, K. H., Miller, P., Gerstein, M., Roeder, G. S., Snyder, M. 2002; 16 (6): 707-719


    Protein localization data are a valuable information resource helpful in elucidating eukaryotic protein function. Here, we report the first proteome-scale analysis of protein localization within any eukaryote. Using directed topoisomerase I-mediated cloning strategies and genome-wide transposon mutagenesis, we have epitope-tagged 60% of the Saccharomyces cerevisiae proteome. By high-throughput immunolocalization of tagged gene products, we have determined the subcellular localization of 2744 yeast proteins. Extrapolating these data through a computational algorithm employing Bayesian formalism, we define the yeast localizome (the subcellular distribution of all 6100 yeast proteins). We estimate the yeast proteome to encompass approximately 5100 soluble proteins and >1000 transmembrane proteins. Our results indicate that 47% of yeast proteins are cytoplasmic, 13% mitochondrial, 13% exocytic (including proteins of the endoplasmic reticulum and secretory vesicles), and 27% nuclear/nucleolar. A subset of nuclear proteins was further analyzed by immunolocalization using surface-spread preparations of meiotic chromosomes. Of these proteins, 38% were found associated with chromosomal DNA. As determined from phenotypic analyses of nuclear proteins, 34% are essential for spore viability--a percentage nearly twice as great as that observed for the proteome as a whole. In total, this study presents experimentally derived localization data for 955 proteins of previously unknown function: nearly half of all functionally uncharacterized proteins in yeast. To facilitate access to these data, we provide a searchable database featuring 2900 fluorescent micrographs at

    View details for DOI 10.1101/gad.970902

    View details for Web of Science ID 000174516500007

    View details for PubMedID 11914276

  • GATA-1 binding sites mapped in the beta-globin locus by using mammalian chIp-chip analysis PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Horak, C. E., Mahajan, M. C., Luscombe, N. M., Gerstein, M., Weissman, S. M., Snyder, M. 2002; 99 (5): 2924-2929


    The expression of the beta-like globin genes is intricately regulated by a series of both general and tissue-restricted transcription factors. The hematopoietic lineage-specific transcription factor GATA-1 is important for erythroid differentiation and has been implicated in regulating the expression of the erythroid-specific genes including the genes of the beta-globin locus. In the human erythroleukemic K562 cell line, only one DNA region has been identified previously as a putative site of GATA-1 interaction by in vivo footprinting studies. We mapped GATA-1 binding throughout the beta-globin locus by using chIp-chip analysis of K562 cells. We found that GATA-1 binds in a region encompassing the HS2 core element, as was previously identified, and an additional region of GATA-1 binding upstream of the gammaG gene. This approach will be of general utility for mapping transcription factor binding sites within the beta-globin locus and throughout the genome.

    View details for DOI 10.1073/pnas.052706999

    View details for Web of Science ID 000174284600061

    View details for PubMedID 11867748

  • A question of size: the eukaryotic proteome and the problems in defining it NUCLEIC ACIDS RESEARCH Harrison, P. M., Kumar, A., Lang, N., Snyder, M., Gerstein, M. 2002; 30 (5): 1083-1090


    We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the 'current' proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences ('the orfome'). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes ('dead' genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at

    View details for Web of Science ID 000174229900001

    View details for PubMedID 11861898

  • A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution JOURNAL OF MOLECULAR BIOLOGY Harrison, P., Kumar, A., Lan, N., Echols, N., Snyder, M., Gerstein, M. 2002; 316 (3): 409-419


    We surveyed the sequenced Saccharomyces cerevisiae genome (strain S288C) comprehensively for open reading frames (ORFs) that could encode full-length proteins but contain obvious mid-sequence disablements (frameshifts or premature stop codons). These pseudogenic features are termed disabled ORFs (dORFs). Using homology to annotated yeast ORFs and non-yeast proteins plus a simple region extension procedure, we have found 183 dORFs. Combined with the 38 existing annotations for potential dORFs, we have a total pool of up to 221 dORFs, corresponding to less than approximately 3% of the proteome. Additionally, we found 20 pairs of annotated ORFs for yeast that could be merged into a single ORF (termed a mORF) by read-through of the intervening stop codon, and may comprise a complete ORF in other yeast strains. Focussing on a core pool of 98 dORFs with a verifying protein homology, we find that most dORFs are substantially decayed, with approximately 90% having two or more disablements, and approximately 60% having four or more. dORFs are much more yeast-proteome specific than live yeast genes (having about half the chance that they are related to a non-yeast protein). They show a dramatically increased density at the telomeres of chromosomes, relative to genes. A microarray study shows that some dORFs are expressed even though they carry multiple disablements, and thus may be more resistant to nonsense-mediated decay. Many of the dORFs may be involved in responding to environmental stresses, as the largest functional groups include growth inhibition, flocculation, and the SRP/TIP1 family. Our results have important implications for proteome evolution. The characteristics of the dORF population suggest the sorts of genes that are likely to fall in and out of usage (and vary in copy number) in a strain-specific way and highlight the role of subtelomeric regions in engendering this diversity. 
    Our results also have important implications for the effects of the [PSI+] prion. The dORFs disabled by only a single stop and the mORFs (together totalling 35) provide an estimate for the extent of the sequence population that can be resurrected readily through the demonstrated ability of the [PSI+] prion to cause nonsense-codon read-through. Also, the dORFs and mORFs that we find have properties (e.g. growth inhibition, flocculation, vanadate resistance, stress response) that are potentially related to the ability of [PSI+] to engender substantial phenotypic variation in yeast strains under different environmental conditions. (See for further information.)

    View details for DOI 10.1006/jmbi.2001.5343

    View details for Web of Science ID 000174216400001

    View details for PubMedID 11866506

  • An integrated approach for finding overlooked genes in yeast NATURE BIOTECHNOLOGY Kumar, A., Harrison, P. M., Cheung, K. H., Lan, N., Echols, N., Bertone, P., Miller, P., Gerstein, M. B., Snyder, M. 2002; 20 (1): 58-63


    We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to beta-galactosidase (beta-gal); non-annotated open reading frames (ORFs) translated as beta-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. In complement to this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.

    View details for Web of Science ID 000173031600037

    View details for PubMedID 11753363

  • Insertional mutagenesis: Transposon-insertion libraries as mutagens in yeast GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Kumar, A., Vidan, S., Snyder, M. 2002; 350: 219-229

    View details for Web of Science ID 000176466300012

    View details for PubMedID 12073314

  • ChIP-chip: A genomic approach for identifying transcription factor binding sites GUIDE TO YEAST GENETICS AND MOLECULAR AND CELL BIOLOGY, PT B Horak, C. E., Snyder, M. 2002; 350: 469-483

    View details for Web of Science ID 000176466300026

    View details for PubMedID 12073330

  • The TRIPLES database: a community resource for yeast molecular biology NUCLEIC ACIDS RESEARCH Kumar, A., Cheung, K. H., Tosches, N., Masiar, P., Liu, Y., Miller, P., Snyder, M. 2002; 30 (1): 73-75


    TRIPLES is a web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces cerevisiae, a relational database housing nearly half a million data points generated from an ongoing study using large-scale transposon mutagenesis to characterize gene function in yeast. At present, TRIPLES contains three principal data sets (i.e. phenotypic data, protein localization data and expression data) for over 3500 annotated yeast genes as well as several hundred non-annotated open reading frames. In addition, the TRIPLES web site provides online order forms linked to each data set so that users may request any strain or reagent generated from this project free of charge. In response to user requests, the TRIPLES web site has undergone several recent modifications. Our localization data have been supplemented with approximately 500 fluorescent micrographs depicting actual staining patterns observed upon indirect immunofluorescence analysis of indicated epitope-tagged proteins. These localization data, as well as all other data sets within TRIPLES, are now available in full as tab-delimited text. To accommodate increased reagent requests, all orders are now cataloged in a separate database, and users are notified immediately of order receipt and shipment. Also, TRIPLES is one of five sites incorporated into the new functional analysis tool Function Junction provided by the Saccharomyces Genome Database. TRIPLES may be accessed from the Yale Genome Analysis Center (YGAC) homepage at

    View details for Web of Science ID 000173077100018

    View details for PubMedID 11752258

  • YMD: A microarray database for large-scale gene expression analysis Annual Symposium of the American-Medical-Informatics-Association Cheung, K. H., White, K., Hager, J., Gerstein, M., Reinke, V., Nelson, K., Masiar, P., Srivastava, R., Li, Y. L., Li, J., Zhao, H. Y., Li, J. M., Allison, D. B., Snyder, M., Miller, P., Williams, K. HANLEY & BELFUS INC MED PUBLISHERS. 2002: 140–144


    The use of microarray technology to perform parallel analysis of the expression pattern of a large number of genes in a single experiment has created a new frontier of medical research. The vast amount of gene expression data generated from multiple microarray experiments requires a robust database system that allows efficient data storage, retrieval, secure access, data dissemination, and integrated data analyses. To address the growing needs of microarray researchers at Yale and their collaborators, we have built the Yale Microarray Database (YMD). YMD is Web-accessible with the following features: (i) a Web program that tracks DNA samples between source plates and arrays, (ii) the capability of finding common genes/clones across different array platforms, (iii) an image file server, (iv) laboratory-based user management and access privileges, (v) project management, (vi) template data entry, (vii) linking gene expression data to annotation databases for functional analysis. YMD is currently being used on a pilot basis by several laboratories for different organisms and array platforms.

    View details for Web of Science ID 000189418100029

    View details for PubMedID 12463803

  • Phosphorylation of gamma-tubulin regulates microtubule organization in budding yeast DEVELOPMENTAL CELL Vogel, J., Drapkin, B., Oomen, J., Beach, D., Bloom, K., Snyder, M. 2001; 1 (5): 621-631


    gamma-Tubulin is essential for microtubule nucleation in yeast and other organisms; whether this protein is regulated in vivo has not been explored. We show that the budding yeast gamma-tubulin (Tub4p) is phosphorylated in vivo. Hyperphosphorylated Tub4p isoforms are restricted to G1. A conserved tyrosine near the carboxy terminus (Tyr445) is required for phosphorylation in vivo. A point mutation, Tyr445 to Asp, causes cells to arrest prior to anaphase. The frequency of new microtubules appearing in the SPB region and the number of microtubules are increased in tub4-Y445D cells, suggesting this mutation promotes microtubule assembly. These data suggest that modification of gamma-tubulin is important for controlling microtubule number, thereby influencing microtubule organization and function during the yeast cell cycle.

    View details for Web of Science ID 000175301700008

    View details for PubMedID 11709183

  • A filamentous growth response mediated by the yeast mating pathway GENETICS Erdman, S., Snyder, M. 2001; 159 (3): 919-928


    Haploid cells of the budding yeast Saccharomyces cerevisiae respond to mating pheromones by arresting their cell-division cycle in G1 and differentiating into a cell type capable of locating and fusing with mating partners. Yeast cells undergo chemotactic cell surface growth when pheromones are present above a threshold level for morphogenesis; however, the morphogenetic responses of cells to levels of pheromone below this threshold have not been systematically explored. Here we show that MATa haploid cells exposed to low levels of the alpha-factor mating pheromone undergo a novel cellular response: cells modulate their division patterns and cell shape, forming colonies composed of filamentous chains of cells. Time-lapse analysis of filament formation shows that its dynamics are distinct from that of pseudohyphal growth; during pheromone-induced filament formation, daughter cells are delayed relative to mother cells with respect to the timing of bud emergence. Filament formation requires the RSR1(BUD1), BUD8, SLK1/BCK1, and SPA2 genes and many elements of the STE11/STE7 MAP kinase pathway; this response is also independent of FAR1, a gene involved in orienting cell polarization during the mating response. We suggest that mating yeast cells undergo a complex response to low levels of pheromone that may enhance the ability of cells to search for mating partners through the modification of cell shape and alteration of cell-division patterns.

    View details for Web of Science ID 000172665800002

    View details for PubMedID 11729141

  • Global analysis of protein activities using proteome chips SCIENCE Zhu, H., Bilgin, M., Bangham, R., Hall, D., Casamayor, A., Bertone, P., Lan, N., Jansen, R., Bidlingmaier, S., Houfek, T., Mitchell, T., Miller, P., Dean, R. A., Gerstein, M., Snyder, M. 2001; 293 (5537): 2101-2105


    To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.

    View details for Web of Science ID 000171028700077

    View details for PubMedID 11474067

  • A genomic study of the bipolar bud site selection pattern in Saccharomyces cerevisiae MOLECULAR BIOLOGY OF THE CELL Ni, L., Snyder, M. 2001; 12 (7): 2147-2170


    A genome-wide screen of 4168 homozygous diploid yeast deletion strains has been performed to identify nonessential genes that participate in the bipolar budding pattern. By examining bud scar patterns representing the sites of previous cell divisions, 127 mutants representing three different phenotypes were found: unipolar, axial-like, and random. From this screen, 11 functional classes of known genes were identified, including those involved in actin-cytoskeleton organization, general bud site selection, cell polarity, vesicular transport, cell wall synthesis, protein modification, transcription, nuclear function, translation, and other functions. Four characterized genes that were not known previously to participate in bud site selection were also found to be important for the haploid axial budding pattern. In addition to known genes, we found 22 novel genes (20 are designated BUD13-BUD32) important for bud site selection. Deletion of one resulted in unipolar budding exclusively from the proximal pole, suggesting that this gene plays an important role in diploid distal budding. Mutations in 20 other novel BUD genes produced a random budding phenotype and one produced an axial-like budding defect. Several of the novel Bud proteins were fused to green fluorescent protein; two proteins were found to localize to sites of polarized cell growth (i.e., the bud tip in small budded cells and the neck in cells undergoing cytokinesis), similar to that postulated for the bipolar signals and proteins that target cell division site tags to their proper location in the cell. Four others localized to the nucleus, suggesting that they play a role in gene expression. The bipolar distal marker Bud8 was localized in a number of mutants; many showed an altered Bud8-green fluorescent protein localization pattern.
    Through the genome-wide identification and analysis of different mutants involved in bipolar bud site selection, an integrated pathway for this process is presented in which proximal and distal bud site selection tags are synthesized and localized at their appropriate poles, thereby directing growth at those sites. Genome-wide screens of defined collections of mutants hold significant promise for dissecting many biological processes in yeast.

    View details for Web of Science ID 000170350300019

    View details for PubMedID 11452010

  • Genome-wide transposon mutagenesis in yeast. Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] Kumar, A., Snyder, M. 2001; Chapter 13: Unit13 3-?


    This unit provides comprehensive protocols for the use of insertional libraries generated by shuttle mutagenesis. From the basic protocol, a small aliquot of insertional library DNA may be used to mutagenize yeast, producing strains containing a single transposon insertion within a transcribed and translated region of the genome. This transposon-mutagenized bank of yeast strains may be screened for any desired mutant phenotype. Alternatively, since the transposon contains a reporter gene lacking its start codon and promoter, transposon-tagged strains may also be screened for specific patterns of gene expression. Strains of interest may be characterized by vectorette PCR (protocol provided) in order to locate the precise genomic site of transposon insertion within each mutant. A method by which Cre/lox recombination may be used to reduce the transposon in yeast to a small insertion element encoding an epitope tag is described. This tag serves as a tool by which transposon-mutagenized gene products may be analyzed further (e.g., localized to a discrete subcellular site).

    View details for DOI 10.1002/0471142727.mb1303s51

    View details for PubMedID 18265099

  • The Cbk1p pathway is important for polarized cell growth and cell separation in Saccharomyces cerevisiae MOLECULAR AND CELLULAR BIOLOGY Bidlingmaier, S., Weiss, E. L., Seidel, C., Drubin, D. G., Snyder, M. 2001; 21 (7): 2449-2462


    During the early stages of budding, cell wall remodeling and polarized secretion are concentrated at the bud tip (apical growth). The CBK1 gene, encoding a putative serine/threonine protein kinase, was identified in a screen designed to isolate mutations that affect apical growth. Analysis of cbk1Δ cells reveals that Cbk1p is required for efficient apical growth, proper mating projection morphology, bipolar bud site selection in diploid cells, and cell separation. Epitope-tagged Cbk1p localizes to both sides of the bud neck in late anaphase, just prior to cell separation. CBK1