Is it possible to understand the molecular structure and function of proteins and nucleic acids in enough detail to make accurate predictions about structure and function? We are mounting a two-pronged attack on this problem using both molecular dynamics simulation and molecular modeling. (i) Simulation attempts to reproduce the structural, thermodynamic and dynamic properties of a macromolecule as accurately as possible. Starting with simple but realistic expressions for the interactions between atoms and classical laws of motion, we calculate a trajectory that specifies the position and velocity of every atom as a function of time. The time-step between calculated structures is small, at 10⁻¹⁵ seconds, so we need to reduce hundreds of thousands of sets of atomic coordinates into a simple coherent description. We have simulated with reasonable fidelity the measurable static and dynamic properties of several different proteins surrounded by thousands of water molecules. Simulation at different temperatures has allowed exploration of the denaturation pathways of entire proteins and of small fragments of protein secondary structure (α-helices and β-hairpins). Companion studies of DNA double-helix segments in solution preserve the classical double helix while still showing a wide repertoire of interesting motions. (ii) Molecular modeling attempts to build a model of a macromolecule using known three-dimensional structures and energy minimization as complementary guidelines. Specific examples of this work include the automatic modeling of antibody variable domains, the general modeling of homologous proteins and studies of DNA base-pair mismatches. Questions we are trying to answer include: How can a protein be stabilized by a single amino acid change? How does the sequence of DNA cause local variations of double-helix conformation and stability?
Extensive use is made of sophisticated programming, sequence and structural data bases, and computer graphics.
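
As a rough illustration of what "calculating a trajectory" from interatomic forces and classical laws of motion means, the sketch below integrates Newton's equations for a single particle in a harmonic well using the velocity Verlet scheme. It is a toy under stated assumptions, not the simulation code referred to above: the integrator choice, the function names (velocity_verlet, harmonic_force, total_energy), the reduced units and the one-dimensional potential are all illustrative.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the velocity Verlet scheme.

    Returns the trajectory as a list of (position, velocity) pairs,
    one per time step, including the starting point.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Force from the toy potential U(x) = 0.5 * k * x**2 (an assumption)."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the toy harmonic system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# 1000 steps of dt = 0.01 in reduced units (hypothetical values).
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=1.0, dt=0.01, n_steps=1000)
```

A real simulation applies the same update to every atom in three dimensions with a full force field; the small time-step is what keeps the total energy nearly constant along the trajectory.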

Academic Appointments

Administrative Appointments

  • Chair, Department of Structural Biology (1993 - 2004)
  • Associate Chair, Department of Structural Biology (2005 - 2010)

Honors & Awards

  • Member, European Molecular Biology Organization (1981)
  • Fellow, The Royal Society (2001)
  • Member, The US National Academy of Sciences (2002)
  • Member, American Academy of Arts & Sciences (2010)
  • Anniversary Prize, Federation of European Biochemical Societies (1986)
  • Editorial Board, Structure and Current Opinion in Structural Biology (1992)
  • Editor, Journal of Molecular Biology (2001)
  • Co-director of Program in Mathematics and Molecular Biology, Mathematics and Molecular Biology (1997-2002)
  • Member, Editorial Board Proc. Natl. Acad. Sci. USA (2002)
  • Blaise Pascal Professor of Research, Fondation de l'Ecole Normale Superieure, Paris, France (2003-2004)

Boards, Advisory Committees, Professional Organizations

  • Member, National Academy of Sciences (2013 - Present)

Professional Education

  • PhD, Gonville and Caius College, Cambridge, Structural Biology (1971)

Research & Scholarship

Current Research and Scholarly Interests

I pioneered computational biology, setting up the conceptual and theoretical framework for a field that I am still actively involved in at all levels. More specifically, I still write and maintain computer programs of all types, including large simulation packages and molecular graphics interfaces. I have also developed a high level of expertise in Perl scripting, as well as in the advanced use of the Office Suite of programs (Word, Excel and PowerPoint), which is more important and rare than it may seem. My research focuses on three different but inter-related areas. First, we are interested in predicting the folding of a polypeptide chain into a protein with a unique native structure, with particular emphasis on how hydrophobic forces affect the pathway. We expect hydrophobic interactions to energetically favor structures that are more native-like. In this way, the same stabilizing interactions that exist in the final folded state make the search tractable. Second, we are interested in predicting protein structure from sequence without regard for the process of folding. Such prediction relies on the well-established paradigm that similar protein sequences imply similar three-dimensional structures. We have focused on the hardest problem in homology modeling: the refinement of a near-native structure to make it more precisely like the actual native structure of the protein. We have also focused on how the general similarity of all protein sequences, resulting from their evolution from a common ancestor sequence, affects the nature of the protein universe. Third, we are focusing on mesoscale modeling of large macromolecular complexes such as RNA polymerase and the mammalian chaperonin. In this work, done in close collaboration with experimentalists, we use new morphing strategies combined with normal mode analysis in torsion angle space to overcome problems caused by the size and complexity of these critical, biomedically important systems.
All this work depends on the way a molecular structure is represented in terms of the force-field that allows calculation of the potential energy of the system. We employ a very wide variety of such energy functions that extend from knowledge-based statistical potentials for a single interaction center per residue to quantum-mechanical force-fields that include inductive effects as well as polarization.


All Publications

  • The language of the protein universe CURRENT OPINION IN GENETICS & DEVELOPMENT Scaiewicz, A., Levitt, M. 2015; 35: 50-56

    View details for DOI 10.1016/j.gde.2015.08.010

    View details for Web of Science ID 000366900600008

    View details for PubMedID 26451980

  • Birth and Future of Multiscale Modeling for Macromolecular Systems (Nobel Lecture) ANGEWANDTE CHEMIE-INTERNATIONAL EDITION Levitt, M. 2014; 53 (38): 10006-10018

    View details for DOI 10.1002/anie.201403691

    View details for Web of Science ID 000342761700002

    View details for PubMedID 25100216

  • WeFold: A coopetition for protein structure prediction PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Khoury, G. A., Liwo, A., Khatib, F., Zhou, H., Chopra, G., Bacardit, J., Bortot, L. O., Faccioli, R. A., Deng, X., He, Y., Krupa, P., Li, J., Mozolewska, M. A., Sieradzan, A. K., Smadbeck, J., Wirecki, T., Cooper, S., Flatten, J., Xu, K., Baker, D., Cheng, J., Delbem, A. C., Floudas, C. A., Keasar, C., Levitt, M., Popovic, Z., Scheraga, H. A., Skolnick, J., Crivelli, S. N., Players, F. 2014; 82 (9): 1850-1868


    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at

    View details for DOI 10.1002/prot.24538

    View details for Web of Science ID 000340940300014

    View details for PubMedID 24677212

  • Deformable elastic network refinement for low-resolution macromolecular crystallography. Acta crystallographica. Section D, Biological crystallography Schröder, G. F., Levitt, M., Brunger, A. T. 2014; 70: 2241-2255


    Crystals of membrane proteins and protein complexes often diffract to low resolution owing to their intrinsic molecular flexibility, heterogeneity or the mosaic spread of micro-domains. At low resolution, the building and refinement of atomic models is a more challenging task. The deformable elastic network (DEN) refinement method developed previously has been instrumental in the determination of several structures at low resolution. Here, DEN refinement is reviewed, recommendations for its optimal usage are provided and its limitations are discussed. Representative examples of the application of DEN refinement to challenging cases of refinement at low resolution are presented. These cases include soluble as well as membrane proteins determined at limiting resolutions ranging from 3 to 7 Å. Potential extensions of the DEN refinement technique and future perspectives for the interpretation of low-resolution crystal structures are also discussed.

    View details for DOI 10.1107/S1399004714016496

    View details for PubMedID 25195739

  • Redundancy-weighting for better inference of protein structural features. Bioinformatics Yanover, C., Vanetik, N., Levitt, M., Kolodny, R., Keasar, C. 2014; 30 (16): 2295-2301


    Structural knowledge, extracted from the Protein Data Bank (PDB), underlies numerous potential functions and prediction methods. The PDB, however, is highly biased: many proteins have more than one entry, while entire protein families are represented by a single structure, or even not at all. The standard solution to this problem is to limit the studies to non-redundant subsets of the PDB. While alleviating biases, this solution hides the many-to-many relations between sequences and structures. That is, non-redundant datasets conceal the diversity of sequences that share the same fold and the existence of multiple conformations for the same protein. A particularly disturbing aspect of non-redundant subsets is that they hardly benefit from the rapid pace of protein structure determination, as most newly solved structures fall within existing families. In this study we explore the concept of redundancy-weighted datasets, originally suggested by Miyazawa and Jernigan. Redundancy-weighted datasets include all available structures and associate them (or features thereof) with weights that are inversely proportional to the number of their homologs. Here, we provide the first systematic comparison of redundancy-weighted datasets with non-redundant ones. We test three weighting schemes and show that the distributions of structural features that they produce are smoother (having higher entropy) compared with the distributions inferred from non-redundant datasets. We further show that these smoothed distributions are both more robust and more correct than their non-redundant counterparts. We suggest that the better distributions, inferred using redundancy-weighting, may improve the accuracy of knowledge-based potentials and increase the power of protein structure prediction methods. Consequently, they may enhance model-driven molecular or

    View details for DOI 10.1093/bioinformatics/btu242

    View details for PubMedID 24771517
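
The idea of weights "inversely proportional to the number of their homologs" can be sketched in a few lines. This is only an illustration under assumptions: the abstract does not specify its three weighting schemes, so the sketch uses the simplest reading (each structure is weighted by one over the size of its family), and the structure names, family labels and feature labels are made up.

```python
from collections import Counter, defaultdict


def redundancy_weights(family_of):
    """Weight each structure by 1 / (number of structures in its family).

    family_of maps a structure id to a family label; counting family
    members stands in for counting homologs (an assumption of this sketch).
    """
    sizes = Counter(family_of.values())
    return {s: 1.0 / sizes[fam] for s, fam in family_of.items()}


def weighted_distribution(feature_of, weights):
    """Normalized, weighted frequency of each feature value."""
    totals = defaultdict(float)
    for structure, feature in feature_of.items():
        totals[feature] += weights[structure]
    norm = sum(totals.values())
    return {feature: t / norm for feature, t in totals.items()}


# Hypothetical toy data: family A is over-represented with three entries.
family_of = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB"}
feature_of = {"s1": "helix", "s2": "helix", "s3": "strand", "s4": "strand"}

weights = redundancy_weights(family_of)
dist = weighted_distribution(feature_of, weights)
```

With these toy inputs every family contributes a total weight of 1, so all four structures are used but the three entries of family A together count no more than the single entry of family B.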

  • Millisecond dynamics of RNA polymerase II translocation at atomic resolution PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Silva, D., Weiss, D. R., Avila, F. P., Da, L., Levitt, M., Wang, D., Huang, X. 2014; 111 (21): 7665-7670


    Transcription is a central step in gene expression, in which the DNA template is processively read by RNA polymerase II (Pol II), synthesizing a complementary messenger RNA transcript. At each cycle, Pol II moves exactly one register along the DNA, a process known as translocation. Although X-ray crystal structures have greatly enhanced our understanding of the transcription process, the underlying molecular mechanisms of translocation remain unclear. Here we use sophisticated simulation techniques to observe Pol II translocation on a millisecond timescale and at atomistic resolution. We observe multiple cycles of forward and backward translocation and identify two previously unidentified intermediate states. We show that the bridge helix (BH) plays a key role accelerating the translocation of both the RNA:DNA hybrid and transition nucleotide by directly interacting with them. The conserved BH residues, Thr831 and Tyr836, mediate these interactions. To date, this study delivers the most detailed picture of the mechanism of Pol II translocation at atomic level.

    View details for DOI 10.1073/pnas.1315751111

    View details for Web of Science ID 000336411300044

    View details for PubMedID 24753580

  • Training-free atomistic prediction of nucleosome occupancy PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Minary, P., Levitt, M. 2014; 111 (17): 6293-6298


    Nucleosomes alter gene expression by preventing transcription factors from occupying binding sites along DNA. DNA methylation can affect nucleosome positioning and so alter gene expression epigenetically (without changing DNA sequence). Conventional methods to predict nucleosome occupancy are trained on observed DNA sequence patterns or known DNA oligonucleotide structures. They are statistical and lack the physics needed to predict subtle epigenetic changes due to DNA methylation. The training-free method presented here uses physical principles and state-of-the-art all-atom force fields to predict both nucleosome occupancy along genomic sequences as well as binding to known positioning sequences. Our method calculates the energy of both nucleosomal and linear DNA of the given sequence. Based on the DNA deformation energy, we accurately predict the in vitro occupancy profile observed experimentally for a 20,000-bp genomic region as well as the experimental locations of nucleosomes along 13 well-established positioning sequence elements. DNA with all C bases methylated at the 5 position shows less variation of nucleosome binding: Strong binding is weakened and weak binding is strengthened compared with normal DNA. Methylation also alters the preference of nucleosomes for some positioning sequences but not others.

    View details for DOI 10.1073/pnas.1404475111

    View details for Web of Science ID 000335199000052

    View details for PubMedID 24733939

  • Architecture of an RNA polymerase II transcription pre-initiation complex. Science Murakami, K., Elmlund, H., Kalisman, N., Bushnell, D. A., Adams, C. M., Azubel, M., Elmlund, D., Levi-Kalisman, Y., Liu, X., Gibbons, B. J., Levitt, M., Kornberg, R. D. 2013; 342 (6159): 1238724-?


    The protein density and arrangement of subunits of a complete, 32-protein, RNA polymerase II (pol II) transcription pre-initiation complex (PIC) were determined by means of cryogenic electron microscopy and a combination of chemical cross-linking and mass spectrometry. The PIC showed a marked division in two parts, one containing all the general transcription factors (GTFs) and the other pol II. Promoter DNA was associated only with the GTFs, suspended above the pol II cleft and not in contact with pol II. This structural principle of the PIC underlies its conversion to a transcriptionally active state; the PIC is poised for the formation of a transcription bubble and descent of the DNA into the pol II cleft.

    View details for DOI 10.1126/science.1238724

    View details for PubMedID 24072820

  • The crystal structures of the eukaryotic chaperonin CCT reveal its functional partitioning. Structure Kalisman, N., Schröder, G. F., Levitt, M. 2013; 21 (4): 540-549


    In eukaryotes, CCT is essential for the correct and efficient folding of many cytosolic proteins, most notably actin and tubulin. Structural studies of CCT have been hindered by the failure of standard crystallographic analysis to resolve its eight different subunit types at low resolutions. Here, we exhaustively assess the R value fit of all possible CCT models to available crystallographic data of the closed and open forms with resolutions of 3.8 Å and 5.5 Å, respectively. This unbiased analysis finds the native subunit arrangements with overwhelming significance. The resulting structures provide independent crystallographic proof of the subunit arrangement of CCT and map major asymmetrical features of the particle onto specific subunits. The actin and tubulin substrates both bind around subunit CCT6, which shows other structural anomalies. CCT is thus clearly partitioned, both functionally and evolutionary, into a substrate-binding side that is opposite to the ATP-hydrolyzing side.

    View details for DOI 10.1016/j.str.2013.01.017

    View details for PubMedID 23478063

    View details for Web of Science ID 000317800100004

  • On the Universe of Protein Folds ANNUAL REVIEW OF BIOPHYSICS, VOL 42 Kolodny, R., Pereyaslavets, L., Samson, A. O., Levitt, M. 2013; 42: 559-582


    In the fifty years since the first atomic structure of a protein was revealed, tens of thousands of additional structures have been solved. Like all objects in biology, protein structures show common patterns that seem to define family relationships. Classification of protein structures, which started in the 1970s with about a dozen structures, has continued with increasing enthusiasm, leading to two main fold classifications, SCOP and CATH, as well as many additional databases. Classification is complicated by deciding what constitutes a domain, the fundamental unit of structure. Also difficult is deciding when two given structures are similar. Like all of biology, fold classification is beset by exceptions to all rules. Thus, the perspectives of protein fold space that the fold classifications offer differ from each other. In spite of these ambiguities, fold classifications are useful for prediction of structure and function. Studying the characteristics of fold space can shed light on protein evolution and the physical laws that govern protein behavior.

    View details for DOI 10.1146/annurev-biophys-083012-130432

    View details for Web of Science ID 000321695700025

    View details for PubMedID 23527781

  • Evolutionarily consistent families in SCOP: sequence, structure and function BMC STRUCTURAL BIOLOGY Pethica, R. B., Levitt, M., Gough, J. 2012; 12


    SCOP is a hierarchical domain classification system for proteins of known structure. The superfamily level has a clear definition: Protein domains belong to the same superfamily if there is structural, functional and sequence evidence for a common evolutionary ancestor. Superfamilies are sub-classified into families; however, there is not such a clear basis for the family level groupings. Do SCOP families group together domains with sequence similarity, do they group domains with similar structure or by common function? It is these questions we answer, but most importantly, whether each family represents a distinct phylogenetic group within a superfamily. Several phylogenetic trees were generated for each superfamily: one derived from a multiple sequence alignment, one based on structural distances, and the final two from presence/absence of GO terms or EC numbers assigned to domains. The topologies of the resulting trees and confidence values were compared to the SCOP family classification. We show that SCOP family groupings are evolutionarily consistent to a very high degree with respect to classical sequence phylogenetics. The trees built from (automatically generated) structural distances correlate well, but are not always consistent with SCOP (hand annotated) groupings. Trees derived from functional data are less consistent with the family level than those from structure or sequence, though the majority still agree. Much of GO and EC annotation applies directly to one family or subset of the family; relatively few terms apply at the superfamily level. Maximum sequence diversity within a family is on average 22% but close to zero for superfamilies.

    View details for DOI 10.1186/1472-6807-12-27

    View details for Web of Science ID 000311335600001

    View details for PubMedID 23078280

  • KoBaMIN: a knowledge-based minimization web server for protein structure refinement NUCLEIC ACIDS RESEARCH Rodrigues, J. P., Levitt, M., Chopra, G. 2012; 40 (W1): W323-W328

    View details for DOI 10.1093/nar/gks376

    View details for Web of Science ID 000306670900053


    The KoBaMIN web server provides an online interface to a simple, consistent and computationally efficient protein structure refinement protocol based on minimization of a knowledge-based potential of mean force. The server can be used to refine either a single protein structure or an ensemble of proteins starting from their unrefined coordinates in PDB format. The refinement method is particularly fast and accurate due to the underlying knowledge-based potential derived from structures deposited in the PDB; as such, the energy function implicitly includes the effects of solvent and the crystal environment. Our server allows for an optional but recommended step that optimizes stereochemistry using the MESHI software. The KoBaMIN server also allows comparison of the refined structures with a provided reference structure to assess the changes brought about by the refinement protocol. The performance of KoBaMIN has been benchmarked widely on a large set of decoys, all models generated at the seventh worldwide experiments on critical assessment of techniques for protein structure prediction (CASP7) and it was also shown to produce top-ranking predictions in the refinement category at both CASP8 and CASP9, yielding consistently good results across a broad range of model quality values. The web server is fully functional and freely available at

    View details for PubMedID 22564897

  • Multiscale natural moves refine macromolecules using single-particle electron microscopy projection images PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Zhang, J., Minary, P., Levitt, M. 2012; 109 (25): 9845-9850


    The method presented here refines molecular conformations directly against projections of single particles measured by electron microscopy. By optimizing the orientation of the projection at the same time as the conformation, the method is well-suited to two-dimensional class averages from cryoelectron microscopy. Such direct use of two-dimensional images circumvents the need for a three-dimensional density map, which may be difficult to reconstruct from projections due to structural heterogeneity or preferred orientations of the sample on the grid. Our refinement protocol exploits Natural Move Monte Carlo to model a macromolecule as a small number of segments connected by flexible loops, on multiple scales. After tests on artificial data from lysozyme, we applied the method to the Methanococcus maripaludis chaperonin. We successfully refined its conformation from a closed-state initial model to an open-state final model using just one class-averaged projection. We also used Natural Moves to iteratively refine against heterogeneous projection images of Methanococcus maripaludis chaperonin in a mix of open and closed states. Our results suggest a general method for electron microscopy refinement specially suited to macromolecules with significant conformational flexibility. The algorithm is available in the program Methodologies for Optimization and Sampling In Computational Studies.

    View details for DOI 10.1073/pnas.1205945109

    View details for Web of Science ID 000306061400043

    View details for PubMedID 22665770

  • Improving the accuracy of macromolecular structure refinement at 7 Å resolution. Structure Brunger, A. T., Adams, P. D., Fromme, P., Fromme, R., Levitt, M., Schröder, G. F. 2012; 20 (6): 957-966


    In X-ray crystallography, molecular replacement and subsequent refinement is challenging at low resolution. We compared refinement methods using synchrotron diffraction data of photosystem I at 7.4 Å resolution, starting from different initial models with increasing deviations from the known high-resolution structure. Standard refinement spoiled the initial models, moving them further away from the true structure and leading to high R(free)-values. In contrast, DEN refinement improved even the most distant starting model as judged by R(free), atomic root-mean-square differences to the true structure, significance of features not included in the initial model, and connectivity of electron density. The best protocol was DEN refinement with initial segmented rigid-body refinement. For the most distant initial model, the fraction of atoms within 2 Å of the true structure improved from 24% to 60%. We also found a significant correlation between R(free) values and the accuracy of the model, suggesting that R(free) is useful even at low resolution.

    View details for DOI 10.1016/j.str.2012.04.020

    View details for PubMedID 22681901

  • Comparative modeling and protein-like features of hydrophobic-polar models on a two-dimensional lattice PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Moreno-Hernandez, S., Levitt, M. 2012; 80 (6): 1683-1693


    Lattice models of proteins have been extensively used to study protein thermodynamics, folding dynamics, and evolution. Our study considers two different hydrophobic-polar (HP) models on the 2D square lattice: the purely HP model and a model where a compactness-favoring term is added. We exhaustively enumerate all the possible structures in our models and perform the study of their corresponding folds, HP arrangements in space and shapes. The two models considered differ greatly in their numbers of structures, folds, arrangements, and shapes. Despite their differences, both lattice models have distinctive protein-like features: (1) Shapes are compact in both models, especially when a compactness-favoring energy term is added. (2) The residue composition is independent of the chain length and is very close to 50% hydrophobic in both models, as we observe in real proteins. (3) Comparative modeling works well in both models, particularly in the more compact one. The fact that our models show protein-like features suggests that lattice models incorporate the fundamental physical principles of proteins. Our study supports the use of lattice models to study questions about proteins that require exactness and extensive calculations, such as protein design and evolution, which are often too complex and computationally demanding to be addressed with more detailed models.

    View details for DOI 10.1002/prot.24067

    View details for Web of Science ID 000303759000014

    View details for PubMedID 22411636
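
To make "exhaustively enumerate all the possible structures" concrete, the sketch below enumerates every self-avoiding walk of a short chain on the 2D square lattice and scores each one with a pure hydrophobic-polar contact energy. It is a minimal illustration, not the paper's code: the energy of -1 per non-bonded H-H contact is the conventional HP choice and is assumed here, the compactness-favoring term is omitted, and the function names are hypothetical. Only the first bond direction is fixed, so mirror-image walks are counted separately.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Yield every self-avoiding walk of n residues on the square lattice.

    The walk starts at the origin and its first bond points along +x,
    which removes the four-fold rotational redundancy.
    """
    if n == 1:
        yield [(0, 0)]
        return

    def extend(path, occupied):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hp_energy(sequence, path):
    """Return -1 for each pair of H residues adjacent on the lattice
    but not consecutive along the chain (assumed pure HP energy)."""
    index_at = {pos: i for i, pos in enumerate(path)}
    energy = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each non-bonded contact exactly once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def ground_states(sequence):
    """Exhaustively find the lowest energy and all walks that reach it."""
    best_energy, best_paths = 0, []
    for path in self_avoiding_walks(len(sequence)):
        energy = hp_energy(sequence, path)
        if energy < best_energy:
            best_energy, best_paths = energy, [path]
        elif energy == best_energy:
            best_paths.append(path)
    return best_energy, best_paths


energy, paths = ground_states("HPPH")
```

Exhaustive enumeration is only feasible for short chains because the number of walks grows exponentially with length, which is why such exact studies are tied to simplified lattice models.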

  • Modeling nucleic acids CURRENT OPINION IN STRUCTURAL BIOLOGY Sim, A. Y., Minary, P., Levitt, M. 2012; 22 (3): 273-278


    Nucleic acids are an important class of biological macromolecules that carry out a variety of cellular roles. For many functions, naturally occurring DNA and RNA molecules need to fold into precise three-dimensional structures. Due to their self-assembling characteristics, nucleic acids have also been widely studied in the field of nanotechnology, and a diverse range of intricate three-dimensional nanostructures have been designed and synthesized. Different physical terms such as base-pairing and stacking interactions, tertiary contacts, electrostatic interactions and entropy all affect nucleic acid folding and structure. Here we review general computational approaches developed to model nucleic acid systems. We focus on four key areas of nucleic acid modeling: molecular representation, potential energy function, degrees of freedom and sampling algorithm. Appropriate choices in each of these key areas in nucleic acid modeling can effectively combine to aid interpretation of experimental data and facilitate prediction of nucleic acid structure.

    View details for DOI 10.1016/

    View details for Web of Science ID 000306347800004

    View details for PubMedID 22538125



    Ribonucleic acid (RNA) molecules play important roles in a variety of biological processes. To properly function, RNA molecules usually have to fold to specific structures, and therefore understanding RNA structure is vital in comprehending how RNA functions. One approach to understanding and predicting biomolecular structure is to use knowledge-based potentials built from experimentally determined structures. These types of potentials have been shown to be effective for predicting both protein and RNA structures, but their utility is limited by their significantly rugged nature. This ruggedness (and hence the potential's usefulness) depends heavily on the choice of bin width to sort structural information (e.g. distances) but the appropriate bin width is not known a priori. To circumvent the binning problem, we compared knowledge-based potentials built from inter-atomic distances in RNA structures using different mixture models (Kernel Density Estimation, Expectation Minimization and Dirichlet Process). We show that the smooth knowledge-based potential built from Dirichlet process is successful in selecting native-like RNA models from different sets of structural decoys with comparable efficacy to a potential developed by spline-fitting - a commonly taken approach - to binned distance histograms. The less rugged nature of our potential suggests its applicability in diverse types of structural modeling.

    View details for DOI 10.1142/S0219720012410107

    View details for Web of Science ID 000302951300009

    View details for PubMedID 22809345

  • Application of DEN refinement and automated model building to a difficult case of molecular-replacement phasing: the structure of a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY Brunger, A. T., Das, D., Deacon, A. M., Grant, J., Terwilliger, T. C., Read, R. J., Adams, P. D., Levitt, M., Schroeder, G. F. 2012; 68: 391-403


    Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.

    View details for DOI 10.1107/S090744491104978X

    View details for Web of Science ID 000302138400008

    View details for PubMedID 22505259

  • Modeling and design by hierarchical natural moves PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Sim, A. Y., Levitt, M., Minary, P. 2012; 109 (8): 2890-2895


    We develop a unique algorithm implemented in the program MOSAICS (Methodologies for Optimization and Sampling in Computational Studies) that is capable of nanoscale modeling without compromising the resolution of interest. This is achieved by modeling with customizable hierarchical degrees of freedom, thereby circumventing major limitations of conventional molecular modeling. With the emergence of RNA-based nanotechnology, large RNAs in all-atom representation are used here to benchmark our algorithm. Our method locates all favorable structural states of a model RNA of significant complexity while improving sampling accuracy and increasing speed many fold over existing all-atom RNA modeling methods. We also modeled the effects of sequence mutations on the structural building blocks of tRNA-based nanotechnology. With its flexibility in choosing arbitrary degrees of freedom as well as in allowing different all-atom energy functions, MOSAICS is an ideal tool to model and design biomolecules of the nanoscale.

    View details for DOI 10.1073/pnas.1119918109

    View details for Web of Science ID 000300495100048

    View details for PubMedID 22308445

  • Subunit order of eukaryotic TRiC/CCT chaperonin by cross-linking, mass spectrometry, and combinatorial homology modeling PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kalisman, N., Adams, C. M., Levitt, M. 2012; 109 (8): 2884-2889


    The TRiC/CCT chaperonin is a 1-MDa hetero-oligomer of 16 subunits that assists the folding of proteins in eukaryotes. Low-resolution structural studies confirmed the TRiC particle to be composed of two stacked octameric rings enclosing a folding cavity. The exact arrangement of the different proteins in the rings underlies the functionality of TRiC and is likely to be conserved across all eukaryotes. Yet despite its importance it has not been determined conclusively, mainly because the different subunits appear nearly identical under low resolution. This work successfully addresses the arrangement problem by the emerging technique of cross-linking, mass spectrometry, and modeling. We cross-linked TRiC under native conditions with a cross-linker that is primarily reactive toward exposed lysine side chains that are spatially close in the context of the particle. Following digestion and mass spectrometry we were able to identify over 60 lysine pairs that underwent cross-linking, thus providing distance restraints between specific residues in the complex. Independently of the cross-link set, we constructed 40,320 (= 8 factorial) computational models of the TRiC particle, which exhaustively enumerate all the possible arrangements of the different subunits. When we assessed the compatibility of each model with the cross-link set, we discovered that one specific model is significantly more compatible than any other model. Furthermore, bootstrapping analysis confirmed that this model is 10 times more likely to result from this cross-link set than the next best-fitting model. Our subunit arrangement is very different from any of the previously reported models and changes the context of existing and future findings on TRiC.

    View details for DOI 10.1073/pnas.1119472109

    View details for Web of Science ID 000300495100047

    View details for PubMedID 22308438
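
    The exhaustive enumeration mentioned in this abstract (8! = 40,320 candidate arrangements scored against a restraint set) can be sketched on a toy problem. Everything below is an illustrative assumption: a single ring of eight labelled subunits, restraints that simply say "these two subunits are neighbours", and a score that counts satisfied restraints. The published work scored three-dimensional models against measured lysine cross-links.

```python
from itertools import permutations

N_SUBUNITS = 8

# Hypothetical "true" ring order, used only to generate toy restraints.
true_order = (0, 3, 6, 1, 4, 7, 2, 5)
restraints = [
    (true_order[i], true_order[(i + 1) % N_SUBUNITS]) for i in range(N_SUBUNITS)
]

def score(arrangement):
    """Count restraints whose two subunits are adjacent in the ring."""
    position = {subunit: idx for idx, subunit in enumerate(arrangement)}
    satisfied = 0
    for a, b in restraints:
        gap = (position[a] - position[b]) % N_SUBUNITS
        if gap in (1, N_SUBUNITS - 1):  # neighbours on a closed ring
            satisfied += 1
    return satisfied

# Enumerate every assignment of the 8 labels to the 8 ring positions.
models = list(permutations(range(N_SUBUNITS)))
scores = [score(m) for m in models]

best_score = max(scores)
best_models = [m for m, s in zip(models, scores) if s == best_score]
```

    In this toy version the top score is shared by the rotations and mirror images of the true order, because a bare ring cannot distinguish them; richer restraints are what break such ties in practice.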

  • Symmetry-free cryo-EM structures of the chaperonin TRiC along its ATPase-driven conformational cycle EMBO JOURNAL Cong, Y., Schroeder, G. F., Meyer, A. S., Jakana, J., Ma, B., Dougherty, M. T., Schmid, M. F., Reissmann, S., Levitt, M., Ludtke, S. J., Frydman, J., Chiu, W. 2012; 31 (3): 720-730


    The eukaryotic group II chaperonin TRiC/CCT is a 16-subunit complex with eight distinct but similar subunits arranged in two stacked rings. Substrate folding inside the central chamber is triggered by ATP hydrolysis. We present five cryo-EM structures of TRiC in apo and nucleotide-induced states without imposing symmetry during the 3D reconstruction. These structures reveal how the intra- and inter-ring subunit interaction pattern changes during the ATPase cycle. In the apo state, the subunit arrangement in each ring is highly asymmetric, whereas all nucleotide-containing states tend to be more symmetrical. We identify and structurally characterize a one-ring closed intermediate induced by ATP hydrolysis wherein the closed TRiC ring exhibits an observable chamber expansion. This likely represents the physiological substrate folding state. Our structural results suggest mechanisms for inter-ring negative cooperativity, intra-ring positive cooperativity, and protein-folding chamber closure of TRiC. Intriguingly, these mechanisms are different from those of other group I and II chaperonins despite their similar architecture.

    View details for DOI 10.1038/emboj.2011.366

    View details for Web of Science ID 000300871700019

    View details for PubMedID 22045336

    View details for PubMedCentralID PMC3273382

  • Optimized Torsion-Angle Normal Modes Reproduce Conformational Changes More Accurately Than Cartesian Modes BIOPHYSICAL JOURNAL Bray, J. K., Weiss, D. R., Levitt, M. 2011; 101 (12): 2966-2969


    We present what to our knowledge is a new method of optimized torsion-angle normal-mode analysis, in which the normal modes move along curved paths in Cartesian space. We show that optimized torsion-angle normal modes reproduce protein conformational changes more accurately than Cartesian normal modes. We also show that orthogonalizing the displacement vectors from torsion-angle normal-mode analysis and projecting them as straight lines in Cartesian space does not lead to better performance than Cartesian normal modes. Clearly, protein motion is more naturally described by curved paths in Cartesian space.

    View details for DOI 10.1016/j.bpj.2011.10.054

    View details for Web of Science ID 000298445500012

    View details for PubMedID 22208195
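
    The contrast drawn in this abstract between curved torsion-angle paths and straight Cartesian paths can be seen in a minimal geometric sketch. The single atom, the rotation axis and the 60° amplitude are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rotate_about_z(point, angle):
    """Rotate `point` about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return rotation @ point

pivot = np.zeros(3)                # atom sitting on the rotation axis
atom = np.array([1.5, 0.0, 0.0])   # atom 1.5 Å from the pivot (assumed)
axis = np.array([0.0, 0.0, 1.0])

# Instantaneous Cartesian direction of the torsional motion at the start.
tangent = np.cross(axis, atom)

angle = np.radians(60.0)

# Torsion-angle move: the atom follows a circular arc.
curved = rotate_about_z(atom, angle)

# Cartesian move: the atom is pushed along the initial tangent as a straight line.
straight = atom + angle * tangent

curved_length = np.linalg.norm(curved - pivot)
straight_length = np.linalg.norm(straight - pivot)
```

    The arc keeps the atom 1.5 Å from the pivot, whereas the straight-line displacement stretches that distance to about 2.17 Å, which is why large-amplitude Cartesian modes distort bond geometry.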

  • Remarkable patterns of surface water ordering around polarized buckminsterfullerene PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Chopra, G., Levitt, M. 2011; 108 (35): 14455-14460


    Accurate description of water structure affects simulation of protein folding, substrate binding, macromolecular recognition, and complex formation. We study the hydration of buckminsterfullerene, the smallest hydrophobic nanosphere, by molecular dynamics simulations using a state-of-the-art quantum mechanical polarizable force field (QMPFF3), derived from quantum mechanical data at the MP2/aug-cc-pVTZ(-hp) level augmented by CCSD(T). QMPFF3 calculation of the hydrophobic effect is compared to that obtained with empirical force fields. Using a novel and highly sensitive method, we see that polarization increases ordered water structure so that the imprint of the hydrophobic surface atoms on the surrounding waters is stronger and extends to long range. We see less water order for empirical force fields. The greater order seen with QMPFF3 will affect biological processes through a stronger hydrophobic effect.

    View details for DOI 10.1073/pnas.1110626108

    View details for Web of Science ID 000294425900024

    View details for PubMedID 21844369

  • Fully differentiable coarse-grained and all-atom knowledge-based potentials for RNA structure evaluation RNA-A PUBLICATION OF THE RNA SOCIETY Bernauer, J., Huang, X., Sim, A. Y., Levitt, M. 2011; 17 (6): 1066-1075


    RNA molecules play integral roles in gene regulation, and understanding their structures gives us important insights into their biological functions. Despite recent developments in template-based and parameterized energy functions, the structure of RNA, in particular the nonhelical regions, is still difficult to predict. Knowledge-based potentials have proven efficient in protein structure prediction. In this work, we describe two differentiable knowledge-based potentials derived from a curated data set of RNA structures, with all-atom or coarse-grained representation, respectively. We focus on one aspect of the prediction problem: the identification of native-like RNA conformations from a set of near-native models. Using a variety of near-native RNA models generated from three independent methods, we show that our potential is able to distinguish the native structure and identify native-like conformations, even at the coarse-grained level. The all-atom version of our knowledge-based potential performs better and appears to be more effective at discriminating near-native RNA conformations than one of the most highly regarded parameterized potentials. The fully differentiable form of our potentials will additionally likely be useful for structure refinement and/or molecular dynamics simulations.

    View details for DOI 10.1261/rna.2543711

    View details for Web of Science ID 000290666300008

    View details for PubMedID 21521828

  • Cryo-EM Structure of a Group II Chaperonin in the Prehydrolysis ATP-Bound State Leading to Lid Closure STRUCTURE Zhang, J., Ma, B., DiMaio, F., Douglas, N. R., Joachimiak, L. A., Baker, D., Frydman, J., Levitt, M., Chiu, W. 2011; 19 (5): 633-639


    Chaperonins are large ATP-driven molecular machines that mediate cellular protein folding. Group II chaperonins use their "built-in lid" to close their central folding chamber. Here we report the structure of an archaeal group II chaperonin in its prehydrolysis ATP-bound state at subnanometer resolution using single particle cryo-electron microscopy (cryo-EM). Structural comparison of Mm-cpn in ATP-free, ATP-bound, and ATP-hydrolysis states reveals that ATP binding alone causes the chaperonin to close slightly with a ∼45° counterclockwise rotation of the apical domain. The subsequent ATP hydrolysis drives each subunit to rock toward the folding chamber and to close the lid completely. These motions are attributable to the local interactions of specific active site residues with the nucleotide, the tight couplings between the apical and intermediate domains within the subunit, and the aligned interactions between two subunits across the rings. This mechanism of structural changes in response to ATP is entirely different from those found in group I chaperonins.

    View details for DOI 10.1016/j.str.2011.03.005

    View details for Web of Science ID 000290815500006

    View details for PubMedID 21565698

  • Normal Modes of Prion Proteins: From Native to Infectious Particle BIOCHEMISTRY Samson, A. O., Levitt, M. 2011; 50 (12): 2243-2248


    Prion proteins (PrP) are the infectious agents in transmissible spongiform encephalopathies (e.g., mad cow disease). To be infectious, prion proteins must undergo a conformational change involving a decrease in α-helical content along with an increase in β-strand content. This conformational change was evaluated by means of elastic normal modes. Elastic normal modes show a diminution of two α-helices by one and two residues, as well as an extension of two β-strands by three residues each, which could instigate the conformational change. The conformational change occurs in a region that is compatible with immunological studies, and it is observed more frequently in mutant prions that are prone to conversion than in wild-type prions because of differences in their starting structures, which are amplified through normal modes. These findings are valuable for our comprehension of the conversion mechanism associated with the conformational change in prion proteins.

    View details for DOI 10.1021/bi1010514

    View details for Web of Science ID 000288573500027

    View details for PubMedID 21338080

  • Clustering to identify RNA conformations constrained by secondary structure PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Sim, A. Y., Levitt, M. 2011; 108 (9): 3590-3595


    RNA often folds hierarchically, so that its sequence defines its secondary structure (helical base-paired regions connected by single-stranded junctions), which subsequently defines its tertiary fold. To preserve base-pairing and chain connectivity, the three-dimensional conformations that RNA can explore are strongly confined compared to when secondary structure constraints are not enforced. Using three examples, we studied how secondary structure confines and dictates an RNA's preferred conformations. We made use of Macromolecular Conformations by SYMbolic programming (MC-Sym) fragment assembly to generate RNA conformations constrained by secondary structure. Then, to understand the correlations between different helix placements and orientations, we robustly clustered all RNA conformations by employing unique methods to remove outliers and estimate the best number of conformational clusters. We observed that the preferred conformation (as judged by largest cluster size) for each type of RNA junction molecule tested is consistent with its biological function. Further, the improved quality of models in our pruned datasets facilitates subsequent discrimination using scoring functions based either on statistical analysis (knowledge based) or experimental data.

    View details for DOI 10.1073/pnas.1018653108

    View details for Web of Science ID 000287844400031

    View details for PubMedID 21317361

  • To what extent does the citation advantage of collaboration depend on the citation counting system? 13th Conference of the International-Society-for-Scientometrics-and-Informetrics (ISSI) Levitt, J. M., Thelwall, M., Levitt, M. INT SOC SCIENTOMETRICS & INFORMETRICS-ISSI. 2011: 398–408
  • RNA polymerase II trigger loop residues stabilize and position the incoming nucleotide triphosphate in transcription PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Huang, X., Wang, D., Weiss, D. R., Bushnell, D. A., Kornberg, R. D., Levitt, M. 2010; 107 (36): 15745-15750


    A structurally conserved element, the trigger loop, has been suggested to play a key role in substrate selection and catalysis of RNA polymerase II (pol II) transcription elongation. Recently resolved X-ray structures showed that the trigger loop forms direct interactions with the beta-phosphate and base of the matched nucleotide triphosphate (NTP) through residues His1085 and Leu1081, respectively. In order to understand the role of these two critical residues in stabilizing active site conformation in the dynamic complex, we performed all-atom molecular dynamics simulations of the wild-type pol II elongation complex and its mutants in explicit solvent. In the wild-type complex, we found that the trigger loop is stabilized in the "closed" conformation, and His1085 forms a stable interaction with the NTP. Simulations of point mutations of His1085 are shown to affect this interaction; simulations of alternative protonation states, which are inaccessible through experiment, indicate that only the protonated form is able to stabilize the His1085-NTP interaction. Another trigger loop residue, Leu1081, stabilizes the incoming nucleotide position through interaction with the nucleotide base. Our simulations of this Leu mutant suggest a three-component mechanism for correctly positioning the incoming NTP in which (i) hydrophobic contact through Leu1081, (ii) base stacking, and (iii) base pairing work together to minimize the motion of the incoming NTP base. These results complement experimental observations and provide insight into the role of the trigger loop on transcription fidelity.

    View details for DOI 10.1073/pnas.1009898107

    View details for Web of Science ID 000281637800024

    View details for PubMedID 20798057

  • Consistent refinement of submitted models at CASP using a knowledge-based potential PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Chopra, G., Kalisman, N., Levitt, M. 2010; 78 (12): 2668-2678


    Protein structure refinement is an important but unsolved problem; it must be solved if we are to predict biological function that is very sensitive to structural details. Specifically, critical assessment of techniques for protein structure prediction (CASP) shows that the accuracy of predictions in the comparative modeling category is often worse than that of the template on which the homology model is based. Here we describe a refinement protocol that is able to consistently refine submitted predictions for all categories at CASP7. The protocol uses direct energy minimization of the knowledge-based potential of mean force that is based on the interaction statistics of 167 atom types (Summa and Levitt, Proc Natl Acad Sci USA 2007; 104:3177-3182). Our protocol is thus computationally very efficient; it only takes a few minutes of CPU time to run typical protein models (300 residues). We observe an average structural improvement of 1% in GDT_TS, for predictions that have low and medium homology to known PDB structures (Global Distance Test score or GDT_TS between 50 and 80%). We also observe a marked improvement in the stereochemistry of the models. The level of improvement varies amongst the various participants at CASP, but we see large improvements (>10% increase in GDT_TS) even for models predicted by the best performing groups at CASP7. In addition, our protocol consistently improved the best predicted models in the refinement category at CASP7 and CASP8. These improvements in structure and stereochemistry prove the usefulness of our computationally inexpensive, powerful and automatic refinement protocol.

    View details for DOI 10.1002/prot.22781

    View details for Web of Science ID 000280822000008

    View details for PubMedID 20589633
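
    The GDT_TS score quoted in this abstract is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within the cutoff of its position in the reference structure. The sketch below assumes the two coordinate sets are already superimposed and uses a single fixed superposition, whereas the real measure searches over superpositions for each cutoff. The coordinates are made up for illustration.

```python
import numpy as np

def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for pre-superimposed coordinates, in percent."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    distances = np.linalg.norm(model - reference, axis=1)
    fractions = [(distances <= cutoff).mean() for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))

# Toy four-residue example: deviations of 0.5, 1.5, 3.0 and 9.0 Å.
reference = np.zeros((4, 3))
model = reference.copy()
model[:, 0] = [0.5, 1.5, 3.0, 9.0]

score = gdt_ts(model, reference)          # (25 + 50 + 75 + 75) / 4
perfect = gdt_ts(reference, reference)
```

    On this scale, the 1% average improvement reported above corresponds to a small shift in the fraction of residues falling inside the cutoffs, while a 10% gain is a substantial one.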

  • Conformational Optimization with Natural Degrees of Freedom: A Novel Stochastic Chain Closure Algorithm JOURNAL OF COMPUTATIONAL BIOLOGY Minary, P., Levitt, M. 2010; 17 (8): 993-1010


    The present article introduces a set of novel methods that facilitate the use of "natural moves" or arbitrary degrees of freedom that can give rise to collective rearrangements in the structure of biological macromolecules. While such "natural moves" may spoil the stereochemistry and even break the bonded chain at multiple locations, our new method restores the correct chain geometry by adjusting bond and torsion angles in an arbitrarily defined molten zone. This is done by successive stages of partial closure that propagate the location of the chain break backwards along the chain. At the end of these stages, the size of the chain break is generally reduced so much that it can be repaired by adjusting the position of a single atom. Our chain closure method is efficient, with a computational complexity of O(N_d), where N_d is the number of degrees of freedom used to repair the chain break. The new method facilitates the use of arbitrary degrees of freedom including the "natural" degrees of freedom inferred from analyzing experimental (X-ray crystallography and nuclear magnetic resonance [NMR]) structures of nucleic acids and proteins. In terms of its ability to generate large conformational moves and its effectiveness in locating low energy states, the new method is robust and computationally efficient.

    View details for DOI 10.1089/cmb.2010.0016

    View details for Web of Science ID 000281199700003

    View details for PubMedID 20726792

  • MOTIF-EM: an automated computational tool for identifying conserved regions in CryoEM structures BIOINFORMATICS Saha, M., Levitt, M., Chiu, W. 2010; 26 (12): i301-i309


    We present a new, first-of-its-kind, fully automated computational tool MOTIF-EM for identifying regions or domains or motifs in cryoEM maps of large macromolecular assemblies (such as chaperonins, viruses, etc.) that remain conformationally conserved. As a by-product, regions in structures that are not conserved are revealed: this can indicate local molecular flexibility related to biological activity. MOTIF-EM takes cryoEM volumetric maps as inputs. The technique used by MOTIF-EM to detect conserved sub-structures is inspired by a recent breakthrough in 2D object recognition. The technique works by constructing rotationally invariant, low-dimensional representations of local regions in the input cryoEM maps. Correspondences are established between the reduced representations (by comparing them using a simple metric) across the input maps. The correspondences are clustered using hash tables and graph theory is used to retrieve conserved structural domains or motifs. MOTIF-EM has been used to extract conserved domains occurring in large macromolecular assembly maps, including those of viruses P22 and epsilon 15, Ribosome 70S, and GroEL, that remain structurally conserved in different functional states. Our method can also be used to build atomic models for some maps. We also used MOTIF-EM to identify the conserved folds shared among dsDNA bacteriophages HK97, Epsilon 15, and φ29, though they have low sequence similarity. Supplementary information: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btq195

    View details for Web of Science ID 000278689000037

    View details for PubMedID 20529921

    View details for PubMedCentralID PMC2881380

  • Super-resolution biomolecular crystallography with low-resolution data NATURE Schroeder, G. F., Levitt, M., Brunger, A. T. 2010; 464 (7292): 1218-U146


    X-ray diffraction plays a pivotal role in the understanding of biological systems by revealing atomic structures of proteins, nucleic acids and their complexes, with much recent interest in very large assemblies like the ribosome. As crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, whereas others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex. Determining the structure of such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution better than 5 Å generally exceeds the number of degrees of freedom. Here we introduce a method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with R_free (the free R-factor) determines the optimum deformation and influence of the homology model. For test cases at 3.5-5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model as monitored by coordinate accuracy, the definition of secondary structure and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the Protein Data Bank, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to the study of weakly diffracting crystals using X-ray micro-diffraction as well as data from new X-ray light sources. Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to subnanometre resolution, it can use similar tools.

    View details for DOI 10.1038/nature08892

    View details for Web of Science ID 000276891100043

    View details for PubMedID 20376006

  • Mechanism of folding chamber closure in a group II chaperonin NATURE Zhang, J., Baker, M. L., Schroeder, G. F., Douglas, N. R., Reissmann, S., Jakana, J., Dougherty, M., Fu, C. J., Levitt, M., Ludtke, S. J., Frydman, J., Chiu, W. 2010; 463 (7279): 379-U130


    Group II chaperonins are essential mediators of cellular protein folding in eukaryotes and archaea. These oligomeric protein machines, approximately 1 megadalton, consist of two back-to-back rings encompassing a central cavity that accommodates polypeptide substrates. Chaperonin-mediated protein folding is critically dependent on the closure of a built-in lid, which is triggered by ATP hydrolysis. The structural rearrangements and molecular events leading to lid closure are still unknown. Here we report four single particle cryo-electron microscopy (cryo-EM) structures of Mm-cpn, an archaeal group II chaperonin, in the nucleotide-free (open) and nucleotide-induced (closed) states. The 4.3 Å resolution of the closed conformation allowed building of the first ever atomic model directly from the single particle cryo-EM density map, in which we were able to visualize the nucleotide and more than 70% of the side chains. The model of the open conformation was obtained by using the deformable elastic network modelling with the 8 Å resolution open-state cryo-EM density restraints. Together, the open and closed structures show how local conformational changes triggered by ATP hydrolysis lead to an alteration of intersubunit contacts within and across the rings, ultimately causing a rocking motion that closes the ring. Our analyses show that there is an intricate and unforeseen set of interactions controlling allosteric communication and inter-ring signalling, driving the conformational cycle of group II chaperonins. Beyond this, we anticipate that our methodology of combining single particle cryo-EM and computational modelling will become a powerful tool in the determination of atomic details involved in the dynamic processes of macromolecular machines in solution.

    View details for DOI 10.1038/nature08701

    View details for Web of Science ID 000273748100049

    View details for PubMedID 20090755

    View details for PubMedCentralID PMC2834796

  • Insights into the intra-ring subunit order of TRiC/CCT: a structural and evolutionary analysis. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Kalisman, N., Levitt, M. 2010: 252-259


    TRiC is an important group II chaperonin that facilitates the folding of many eukaryotic proteins. The TRiC complex consists of two stacked rings, each composed of eight paralogous subunits with a mutual sequence identity of 30-35%. Each subunit has unique functional roles that are manifested by corresponding sequence conservation. It is generally assumed that the subunit order within each ring is fixed, but this order is still uncertain. Here we address the problem of the intra-ring subunit order by combining two sources of information: evolutionary conservation and a structural hypothesis. Specifically, we identify residues in the TRiC subunits that are likely to be part of the intra-unit interface, based on homology modeling to the solved thermosome structure. Within this set of residues, we search for a subset that shows an evolutionary conservation pattern that is indicative of the subunit order key. This pattern shows considerable conservation across species, but large variation across the eight subunits. By this approach we were able to locate two parts of the interface where complementary interactions seem to favor certain pairings of subunits. This knowledge leads to restrictions on the 5,040 (=7!) possible subunit arrangements in the ring, and limits them to just 72. Although our findings give only partial understanding of the inter-subunit interactions that determine their order, we conclude that they consist of complementary charged, polar and hydrophobic interactions that occur in both the equatorial and middle domains of each subunit.

    View details for PubMedID 19908377

  • Nature of the protein universe PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Levitt, M. 2009; 106 (27): 11079-11084


    The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by approximately 15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families.

    View details for DOI 10.1073/pnas.0905029106

    View details for Web of Science ID 000267796100040

    View details for PubMedID 19541617

  • Structural Basis of Transcription: Backtracked RNA Polymerase II at 3.4 Angstrom Resolution SCIENCE Wang, D., Bushnell, D. A., Huang, X., Westover, K. D., Levitt, M., Kornberg, R. D. 2009; 324 (5931): 1203-1206


    Transcribing RNA polymerases oscillate between three stable states, two of which, pre- and posttranslocated, were previously subjected to x-ray crystal structure determination. We report here the crystal structure of RNA polymerase II in the third state, the reverse translocated, or "backtracked" state. The defining feature of the backtracked structure is a binding site for the first backtracked nucleotide. This binding site is occupied in case of nucleotide misincorporation in the RNA or damage to the DNA, and is termed the "P" site because it supports proofreading. The predominant mechanism of proofreading is the excision of a dinucleotide in the presence of the elongation factor SII (TFIIS). Structure determination of a cocrystal with TFIIS reveals a rearrangement whereby cleavage of the RNA may take place.

    View details for DOI 10.1126/science.1168729

    View details for Web of Science ID 000266410100047

    View details for PubMedID 19478184

  • Outcome of a Workshop on Applications of Protein Models in Biomedical Research STRUCTURE Schwede, T., Sali, A., Honig, B., Levitt, M., Berman, H. M., Jones, D., Brenner, S. E., Burley, S. K., Das, R., Dokholyan, N. V., Dunbrack, R. L., Fidelis, K., Fiser, A., Godzik, A., Huang, Y. J., Humblet, C., Jacobson, M. P., Joachimiak, A., Krystek, S. R., Kortemme, T., Kryshtafovych, A., Montelione, G. T., Moult, J., Murray, D., Sanchez, R., Sosnick, T. R., Standley, D. M., Stouch, T., Vajda, S., Vasquez, M., Westbrook, J. D., Wilson, I. A. 2009; 17 (2): 151-159


    We describe the proceedings and conclusions from the "Workshop on Applications of Protein Models in Biomedical Research" (the Workshop) that was held at the University of California, San Francisco on 11 and 12 July, 2008. At the Workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) the requirements and challenges for different applications, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.

    View details for DOI 10.1016/j.str.2008.12.014

    View details for Web of Science ID 000263384800003

    View details for PubMedID 19217386

  • Generalized ensemble methods for de novo structure prediction PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Shmygelska, A., Levitt, M. 2009; 106 (5): 1415-1420


    Current methods for predicting protein structure depend on two interrelated components: (i) an energy function that should have a low value near the correct structure and (ii) a method for searching through different conformations of the polypeptide chain. Identification of the most efficient search methods is essential if we are to be able to apply such methods broadly and with confidence. In addition, efficient search methods provide a rigorous test of existing energy functions, which are generally knowledge-based and contain different terms added together with arbitrary weights. Here, we test different search methods with one of the most accurate and predictive energy functions, namely Rosetta, the knowledge-based force field from Baker's group [Simons K, Kooperberg C, Huang E, Baker D (1997) J Mol Biol 268:209-225]. We use an implementation of a generalized ensemble search method to scale relevant parts of the energy function. This method, known as Hamiltonian Replica Exchange Monte Carlo, outperforms the original Monte Carlo Simulated Annealing used in the Rosetta package in terms of sampling low-energy states. It also outperforms another widely used generalized ensemble search method known as Temperature Replica Exchange Monte Carlo. Our results reveal clear deficiencies in the low-resolution Rosetta energy function in that the lowest energy structures are not necessarily the most native-like. By using a set of nonnative low-energy structures found by our extensive sampling, we discovered that the long-range and short-range backbone hydrogen-bonding energy terms of the Rosetta energy discriminate between the nonnative and native-like structures significantly better than the low-resolution score used in Rosetta.

    View details for DOI 10.1073/pnas.0812510106

    View details for Web of Science ID 000263074600025

    View details for PubMedID 19171891
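The replica exchange idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use Rosetta: the one-dimensional `toy_energy`, the temperature ladder, and the function names are all assumptions made for illustration. The sketch shows the temperature variant; the Hamiltonian variant described in the abstract exchanges replicas that differ in how parts of the energy function are scaled rather than in temperature.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D energy with many local minima (not Rosetta)."""
    return 0.1 * x * x + math.sin(5.0 * x)


def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance for exchanging the configurations of two
    replicas held at temperatures t_i and t_j (Boltzmann constant = 1)."""
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def run_temperature_rem(temps, n_sweeps=2000, step=0.5, seed=0):
    """Run a toy temperature replica exchange search and return the
    lowest energy seen in any replica."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in temps]
    best = min(toy_energy(x) for x in xs)
    for _ in range(n_sweeps):
        # One ordinary Metropolis move per replica at its own temperature.
        for k, t in enumerate(temps):
            trial = xs[k] + rng.uniform(-step, step)
            d_e = toy_energy(trial) - toy_energy(xs[k])
            if d_e <= 0 or rng.random() < math.exp(-d_e / t):
                xs[k] = trial
        # Attempt one exchange between a random pair of neighbouring replicas.
        k = rng.randrange(len(temps) - 1)
        p = swap_probability(toy_energy(xs[k]), toy_energy(xs[k + 1]),
                             temps[k], temps[k + 1])
        if rng.random() < p:
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        best = min(best, min(toy_energy(x) for x in xs))
    return best


if __name__ == "__main__":
    print(run_temperature_rem([0.2, 0.5, 1.0, 2.0]))
```

The exchange step lets a configuration found by a hot, freely moving replica migrate down to a cold replica, which is what helps such methods escape local minima that trap a single simulated annealing run.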

  • Can Morphing Methods Predict Intermediate Structures? JOURNAL OF MOLECULAR BIOLOGY Weiss, D. R., Levitt, M. 2009; 385 (2): 665-674


    Movement is crucial to the biological function of many proteins, yet crystallographic structures of proteins can give us only a static snapshot. The protein dynamics that are important to biological function often happen on a timescale that is unattainable through detailed simulation methods such as molecular dynamics as they often involve crossing high-energy barriers. To address this coarse-grained motion, several methods have been implemented as web servers in which a set of coordinates is usually linearly interpolated from an initial crystallographic structure to a final crystallographic structure. We present a new morphing method that does not extrapolate linearly and can therefore go around high-energy barriers and which can produce different trajectories between the same two starting points. In this work, we evaluate our method and other established coarse-grained methods according to an objective measure: how close a coarse-grained dynamics method comes to a crystallographically determined intermediate structure when calculating a trajectory between the initial and final crystal protein structure. We test this with a set of five proteins with at least three crystallographically determined on-pathway high-resolution intermediate structures from the Protein Data Bank. For simple hinging motions involving a small conformational change, segmentation of the protein into two rigid sections outperforms other more computationally involved methods. However, large-scale conformational change is best addressed using a nonlinear approach and we suggest that there is merit in further developing such methods.

    View details for DOI 10.1016/j.jmb.2008.10.064

    View details for Web of Science ID 000262916900027

    View details for PubMedID 18996395
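The linear interpolation baseline and the evaluation measure described in the abstract above can be sketched as follows. This is an illustrative assumption, not the authors' code: coordinates are plain lists of (x, y, z) tuples, the function names are hypothetical, and the RMSD is computed without the optimal superposition that a real comparison of protein structures would normally apply first.

```python
import math


def linear_morph(start, end, n_frames):
    """Return n_frames coordinate sets linearly interpolated from start
    to end. Both inputs list (x, y, z) tuples in the same atom order."""
    if len(start) != len(end):
        raise ValueError("start and end must have the same number of atoms")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for f in range(n_frames):
        t = f / (n_frames - 1)  # 0.0 at the start structure, 1.0 at the end
        frames.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ])
    return frames


def rmsd(a, b):
    """Root mean square deviation between two coordinate sets
    (no superposition is performed in this sketch)."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


def closest_frame(frames, intermediate):
    """Index of the trajectory frame that comes closest to a known
    intermediate structure."""
    return min(range(len(frames)), key=lambda i: rmsd(frames[i], intermediate))
```

A nonlinear method would replace `linear_morph` while `closest_frame` stays the same, so different trajectory generators can be scored against the same intermediate.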

  • Protein segment finder: an online search engine for segment motifs in the PDB NUCLEIC ACIDS RESEARCH Samson, A. O., Levitt, M. 2009; 37: D224-D228


    Finding related conformations in the Protein Data Bank (PDB) is essential in many areas of bioscience. To assist this task, we designed a search engine that uses a compact database to quickly identify protein segments obeying a set of primary, secondary and tertiary structure constraints. The database contains information such as amino acid sequence, secondary structure, disulfide bonds, hydrogen bonds and atoms in contact as calculated from all protein structures in the PDB. The search engine parses the database and returns hits that match the queried parameters. The conformation search engine, which is notable for its high speed and interactive feedback, is expected to assist scientists in discovering conformation homologs and predicting protein structure. The engine is publicly available at and it will also be used in-house in an automatic mode aimed at discovering new protein motifs.

    View details for DOI 10.1093/nar/gkn833

    View details for Web of Science ID 000261906200041

    View details for PubMedID 18974183

  • Solvent dramatically affects protein structure refinement PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Chopra, G., Summa, C. M., Levitt, M. 2008; 105 (51): 20239-20244


    One of the most challenging problems in protein structure prediction is improvement of homology models (structures within 1-3 Å C(alpha) rmsd of the native structure), also known as the protein structure refinement problem. It has been shown that improvement could be achieved using in vacuo energy minimization with molecular mechanics and statistically derived continuously differentiable hybrid knowledge-based (KB) potential functions. Globular proteins, however, fold and function in aqueous solution in vivo and in vitro. In this work, we study the role of solvent in protein structure refinement. Molecular dynamics in explicit solvent and energy minimization in both explicit and implicit solvent were performed on a set of 75 native proteins to test the various energy potentials. A more stringent test for refinement was performed on 729 near-native decoys for each native protein. We use a powerfully convergent energy minimization method to show that implicit solvent (GBSA) provides greater improvement for some proteins than the KB potential: 24 of 75 proteins showing an average improvement of >20% in C(alpha) rmsd from the native structure with GBSA, compared to just 7 proteins with KB. Molecular dynamics in explicit solvent moved the structures further away from their native conformation than the initial, unrefined decoys. Implicit solvent gives rise to a deep, smooth potential energy attractor basin that pulls toward the native structure.

    View details for DOI 10.1073/pnas.0810818105

    View details for Web of Science ID 000261995600042

    View details for PubMedID 19073921

  • Inhibition mechanism of the acetylcholine receptor by alpha-neurotoxins as revealed by normal-mode dynamics BIOCHEMISTRY Samson, A. O., Levitt, M. 2008; 47 (13): 4065-4070


    The nicotinic acetylcholine receptor (AChR) is the prototype of ligand-gated ion channels. Here, we calculate the dynamics of the muscle AChR using normal modes. The calculations reveal a twist-like gating motion responsible for channel opening. The ion channel diameter is shown to increase with this twist motion. Strikingly, the twist motion and the increase in channel diameter are not observed for the AChR in complex with two alpha-bungarotoxin (alphaBTX) molecules. The toxins seem to lock together neighboring receptor subunits, thereby inhibiting channel opening. Interestingly, one alphaBTX molecule suffices to prevent the twist motion. These results shed light on the gating mechanism of the AChR and present a complementary inhibition mechanism by snake-venom-derived alpha-neurotoxins.

    View details for DOI 10.1021/bi702272j

    View details for Web of Science ID 000254408200010

    View details for PubMedID 18327915

  • How hydrophobic Buckminsterfullerene affects surrounding water structure JOURNAL OF PHYSICAL CHEMISTRY B Weiss, D. R., Raschke, T. M., Levitt, M. 2008; 112 (10): 2981-2990


    The hydrophobic hydration of fullerenes in water is of significant interest as the most common Buckminsterfullerene (C60) is a mesoscale sphere; C60 also has potential in pharmaceutical and nanomaterial applications. We use an all-atom molecular dynamics simulation lasting hundreds of nanoseconds to determine the behavior of a single molecule of C60 in a periodic box of water, and compare this to methane. A C60 molecule does not induce drying at the surface; however, unlike a hard sphere methane, a hard sphere C60 solute does. This is due to a larger number of attractive Lennard-Jones interactions between the carbon atom centers in C60 and the surrounding waters. In these simulations, water is not uniformly arranged but rather adopts a range of orientations in the first hydration shell despite the spherical symmetry of both solutes. There is a clear effect of solute size on the orientation of the first hydration shell waters. There is a large increase in hydrogen-bonding contacts between waters in the C60 first hydration shell. There is also a disruption of hydrogen bonds between waters in the first and second hydration shells. Water molecules in the first hydration shell preferentially create triangular structures that minimize the net water dipole near both the methane and C60 surfaces, reducing the total energy of the system. Additionally, in the first and second hydration shells, the water dipoles are ordered to a distance of 8 Å from the solute surface. We conclude that, with a diameter of approximately 1 nm, C60 behaves as a large hydrophobic solute.

    View details for DOI 10.1021/jp076416h

    View details for Web of Science ID 000253784700031

    View details for PubMedID 18275178

  • Probing protein fold space with a simplified model JOURNAL OF MOLECULAR BIOLOGY Minary, P., Levitt, M. 2008; 375 (4): 920-933


    We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (33-300 residues), amino acid composition and number of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point per residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of parallel tempering and equi-energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (all alpha: 4.77 Å, all beta: 2.93 Å, alpha/beta: 3.09 Å, alpha+beta: 4.89 Å on average and within 6 Å for 71.41%, 92.85%, 94.29% and 64.28% for all-alpha, all-beta, alpha/beta and alpha+beta classes, respectively). Denatured structures also occur and these have interesting structural properties that shed light on the different landscape characteristics of alpha and beta folds. We find that alpha/beta proteins with alternating alpha and beta segments (such as the beta-barrel) are more stable than proteins in other fold classes.

    View details for DOI 10.1016/j.jmb.2007.10.087

    View details for Web of Science ID 000253098000004

    View details for PubMedID 18054792

  • Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution STRUCTURE Schroeder, G. F., Brunger, A. T., Levitt, M. 2007; 15 (12): 1630-1641


    Structural studies of large proteins and protein assemblies are a difficult and pressing challenge in molecular biology. Experiments often yield only low-resolution or sparse data that are not sufficient to fully determine atomistic structures. We have developed a general geometry-based algorithm that efficiently samples conformational space under constraints imposed by low-resolution density maps obtained from electron microscopy or X-ray crystallography experiments. A deformable elastic network (DEN) is used to restrain the sampling to prior knowledge of an approximate structure. The DEN restraints dramatically reduce over-fitting, especially at low resolution. Cross-validation is used to optimally weight the structural information and experimental data. Our algorithm is robust even for noise-added density maps and has a large radius of convergence for our test case. The DEN restraints can also be used to enhance reciprocal space simulated annealing refinement.

    View details for DOI 10.1016/j.str.2007.09.021

    View details for Web of Science ID 000251655400015

    View details for PubMedID 18073112
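The role of an elastic network as a restraint toward an approximate structure, mentioned in the abstract above, can be sketched with a plain harmonic network. This is an assumption-laden illustration rather than the published DEN algorithm: the cutoff, the force constant `k`, and the function names are hypothetical, and the "deformable" part (allowing the network's equilibrium distances to change during refinement) is not shown.

```python
import math


def build_network(ref_coords, cutoff):
    """List (i, j, d0) for every atom pair whose distance d0 in the
    reference structure is within the cutoff."""
    pairs = []
    n = len(ref_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d0 = math.dist(ref_coords[i], ref_coords[j])
            if d0 <= cutoff:
                pairs.append((i, j, d0))
    return pairs


def network_energy(coords, pairs, k=1.0):
    """Harmonic restraint energy: penalises deviation of each networked
    pair distance from its reference value d0."""
    return sum(
        0.5 * k * (math.dist(coords[i], coords[j]) - d0) ** 2
        for i, j, d0 in pairs
    )
```

In a refinement setting, a term like `network_energy` would be added to a data-fit term with a relative weight; the abstract states that cross-validation is used to choose that weighting.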

  • Simulations of RNA base pairs in a nanodroplet reveal solvation-dependent stability PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Sykes, M. T., Levitt, M. 2007; 104 (30): 12336-12340


    We show that RNA base pairs have variable stability depending on their degree of solvation. This finding has far-reaching biological implications for nucleic acid structure in a partially solvated cellular environment such as inside RNA-protein complexes. Molecular dynamics simulations of partially solvated Watson-Crick RNA base pairs show that whereas water serves to destabilize a base pair by competing for and disrupting base-base hydrogen bonds, when sufficient water molecules are present, fewer hydrogen bonds are available to disrupt the base pairs and the destabilization effect is reduced. The result is that base pairs exist at a stability minimum when solvated in between 20 and 100 water molecules, the upper limit of which corresponds to the approximate number of water molecules contained in the first hydration shell.

    View details for DOI 10.1073/pnas.0705573104

    View details for Web of Science ID 000248472100020

    View details for PubMedID 17636124

  • Growth of novel protein structural data PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Levitt, M. 2007; 104 (9): 3183-3188


    Contrary to popular assumption, the rate of growth of structural data has slowed, and the Protein Data Bank (PDB) has not been growing exponentially since 1995. Reaching such a dramatic conclusion requires careful measurement of growth of novel structures, which can be achieved by clustering entry sequences, or by using a novel index to down-weight entries with a higher number of sequence neighbors. These measures agree, and growth rates are very similar for entire PDB files, clusters, and weighted chains. The overall sizes of Structural Classification of Proteins (SCOP) categories (number of families, superfamilies, and folds) appear to be directly proportional to the number of deposited PDB files. Using our weighted chain count, which is most correlated to the change in the size of each SCOP category in any time period, shows that the rate of increase of SCOP categories is actually slowing down. This enables the final size of each of these SCOP categories to be predicted without examining or comparing protein structures. In the last 3 years, structures solved by structural genomics (SG) initiatives, especially the United States National Institutes of Health Protein Structure Initiative, have begun to redress the slowing growth of the PDB. Structures solved by SG are 3.8 times less sequence-redundant than typical PDB structures. Since mid-2004, SG programs have contributed half the novel structures measured by weighted chain counts. Our analysis does not rely on visual inspection of coordinate sets: it is done automatically, providing an accurate, up-to-date measure of the growth of novel protein structural data.

    View details for DOI 10.1073/pnas.0611678104

    View details for Web of Science ID 000244661400031

    View details for PubMedID 17360626

  • Near-native structure refinement using in vacuo energy minimization PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Summa, C. M., Levitt, M. 2007; 104 (9): 3177-3182


    One of the greatest shortcomings of macromolecular energy minimization and molecular dynamics techniques is that they generally do not preserve the native structure of proteins as observed by x-ray crystallography. This deformation of the native structure means that these methods are not generally used to refine structures produced by homology-modeling techniques. Here, we use a database of 75 proteins to test the ability of a variety of popular molecular mechanics force fields to maintain the native structure. Minimization from the native structure is a weak test of potential energy functions: It is complemented by a much stronger test in which the same methods are compared for their ability to attract a near-native decoy protein structure toward the native structure. We use a powerfully convergent energy-minimization method and show that, of the traditional molecular mechanics potentials tested, only one showed a modest net improvement over a large data set of structurally diverse proteins. A smooth, differentiable knowledge-based pairwise atomic potential performs better on this test than traditional potential functions. This work is expected to have important implications for protein structure refinement, homology modeling, and structure prediction.

    View details for DOI 10.1073/pnas.0611593104

    View details for Web of Science ID 000244661400030

    View details for PubMedID 17360625

  • Discussion of "equi-energy sampler" by Kou, Zhou and Wong ANNALS OF STATISTICS Minary, P., Levitt, M. 2006; 34 (4): 1636-1641
  • Spatial regulation and the rate of signal transduction activation PLOS COMPUTATIONAL BIOLOGY Batada, N. N., Shepp, L. A., Siegmund, D. O., Levitt, M. 2006; 2 (5): 343-349


    Of the many important signaling events that take place on the surface of a mammalian cell, activation of signal transduction pathways via interactions of cell surface receptors is one of the most important. Evidence suggests that cell surface proteins are not as freely diffusible as implied by the classic fluid mosaic model and that their confinement to membrane domains is regulated. It is unknown whether these dynamic localization mechanisms function to enhance signal transduction activation rate or to minimize cross talk among pathways that share common intermediates. To determine which of these two possibilities is more likely, we derive an explicit equation for the rate at which cell surface membrane proteins interact based on a Brownian motion model in the presence of endocytosis and exocytosis. We find that in the absence of any diffusion constraints, cell surface protein interaction rate is extremely high relative to cytoplasmic protein interaction rate even in a large mammalian cell with a receptor abundance of a mere two hundred molecules. Since a larger number of downstream signaling events needs to take place, each occurring at a much slower rate than the initial activation via association of cell surface proteins, we conclude that the role of co-localization is most likely that of cross-talk reduction rather than coupling efficiency enhancement.

    View details for DOI 10.1371/journal.pcbi.0020044

    View details for Web of Science ID 000239493900003

    View details for PubMedID 16699596

  • Theory and simulation - Accuracy and reliability in modelling proteins and complexes CURRENT OPINION IN STRUCTURAL BIOLOGY Janin, J., Levitt, M. 2006; 16 (2): 139-141
  • An atomic environment potential for use in protein structure prediction JOURNAL OF MOLECULAR BIOLOGY Summa, C. M., Levitt, M., DeGrado, W. F. 2005; 352 (4): 986-1001


    We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.

    View details for DOI 10.1016/j.jmb.2005.07.054

    View details for Web of Science ID 000232188100018

    View details for PubMedID 16126228

  • Describing RNA structure by libraries of clustered nucleotide doublets JOURNAL OF MOLECULAR BIOLOGY Sykes, M. T., Levitt, M. 2005; 351 (1): 26-38


    The rapidly increasing wealth of structural information on RNA and knowledge of its varying roles in biology have facilitated the study of RNA structure using computational methods. Here, we present a new method to describe RNA structure based on nucleotide doublets, where a doublet is any two nucleotides in a structure. We restrict our search to doublets that are close together in space, but not necessarily in sequence, and obtain doublet libraries of various sizes by clustering a large set of doublets taken from a data set of high-resolution RNA structures. We demonstrate that these libraries are able to both capture structural features present in RNA and fit local RNA structure with a high level of accuracy. Libraries ranging in size from ten to 100 doublets are examined, and a detailed analysis shows that a library with as few as 30 doublets is sufficient to capture the most common structural features, while larger libraries would be more appropriate for accurate modeling. We anticipate many uses for these libraries, from annotation to structure refinement and prediction.

    View details for DOI 10.1016/j.jmb.2005.06.024

    View details for Web of Science ID 000230803100004

    View details for PubMedID 15993894

  • Nonpolar solutes enhance water structure within hydration shells while reducing interactions between them PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raschke, T. M., Levitt, M. 2005; 102 (19): 6777-6782


    The origins of the hydrophobic effect are widely thought to lie in structural changes of the water molecules surrounding a nonpolar solute. The spatial distribution functions of the water molecules surrounding benzene and cyclohexane computed previously from molecular dynamics simulations show a high density first hydration shell surrounding both solutes. In addition, benzene showed a strong preference for hydrogen bonding with two water molecules, one to each face of the benzene ring. The position data alone, however, do not describe the majority of orientational changes in the water molecules in the first hydration shells surrounding these solutes. In this paper, we measure the changes in orientation of the water molecules with respect to the solute through spatial orientation functions as well as radial/angular distribution functions. These data show that the water molecules hydrogen bonded to benzene have a strong orientation preference, whereas those around cyclohexane show a weaker tendency. In addition, the water-water interactions within and between the first two hydration shells were measured as a function of distance and "best" hydrogen bonding angle. Water molecules within the first hydration shell have increased hydrogen bonding structure; water molecules interacting across shell 1 and shell 2 have reduced hydrogen bonding structure.

    View details for DOI 10.1073/pnas.0500225102

    View details for Web of Science ID 000229048500026

    View details for PubMedID 15867152

  • Comprehensive evaluation of protein structure alignment methods: Scoring by geometric measures JOURNAL OF MOLECULAR BIOLOGY Kolodny, R., Koehl, P., Levitt, M. 2005; 346 (4): 1173-1188


    We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity. We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists. We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called "Best-of-All" that combines the best results of all methods. Many commonly used methods miss 10-50% of the good Best-of-All alignments. By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution.

    View details for DOI 10.1016/j.jmb.2004.12.032

    View details for Web of Science ID 000227187800018

    View details for PubMedID 15701525

  • Inverse kinematics in biology: The protein loop closure problem INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH Kolodny, R., Guibas, L., Levitt, M., Koehl, P. 2005; 24 (2-3): 151-163
  • Diffusion of nucleoside triphosphates and role of the entry site to the RNA polymerase II active center PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Batada, N. N., Westover, K. D., Bushnell, D. A., Levitt, M., Kornberg, R. D. 2004; 101 (50): 17361-17364


    Nucleoside triphosphates (NTPs) diffuse to the active center of RNA polymerase II through a funnel-shaped opening that narrows to a negatively charged pore. Computer simulation shows that the funnel and pore reduce the rate of diffusion by a factor of approximately 2 × 10⁻⁷. The resulting limitation on the rate of RNA synthesis under conditions of low NTP concentration may be overcome by NTP binding to an entry site adjacent to the active center. Binding to the entry site greatly enhances the lifetime of an NTP in the active center region, and it prevents "backtracking" and the consequent occlusion of the active site.

    View details for DOI 10.1073/pnas.0408168101

    View details for Web of Science ID 000225803400011

    View details for PubMedID 15574497

  • The area derivative of a space-filling diagram DISCRETE & COMPUTATIONAL GEOMETRY Bryant, R., Edelsbrunner, H., Koehl, P., Levitt, M. 2004; 32 (3): 293-308
  • Detailed hydration maps of benzene and cyclohexane reveal distinct water structures JOURNAL OF PHYSICAL CHEMISTRY B Raschke, T. M., Levitt, M. 2004; 108 (35): 13492-13500

    View details for DOI 10.1021/jp049481p

    View details for Web of Science ID 000223600800057

  • Improved protein structure selection using decoy-dependent discriminatory functions BMC STRUCTURAL BIOLOGY Wang, K., Fain, B., Levitt, M., Samudrala, R. 2004; 4: 8-?


    A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations. We present a simple and nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Calpha RMSD (Root Mean Square Deviation) calculation in a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Calpha RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, where we compiled the atom-atom contact probabilities from all the conformations in a decoy set instead of using an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Calpha RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement. Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.

    View details for PubMedID 15207004
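The all-against-all RMSD calculation underlying the density score in the abstract above can be sketched as follows. The abstract does not give the exact formula, so this sketch substitutes the mean pairwise RMSD of each decoy to all others as a stand-in; that choice, the function names, and the omission of optimal superposition before each RMSD are assumptions made for illustration only.

```python
import math


def rmsd(a, b):
    """Root mean square deviation between two equally sized coordinate
    sets (no superposition is performed in this sketch)."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def mean_pairwise_rmsd(decoys):
    """For each decoy, the average RMSD to every other decoy in the set.
    A low value means the decoy sits in a densely populated region."""
    n = len(decoys)
    if n < 2:
        raise ValueError("need at least two decoys")
    scores = []
    for i in range(n):
        total = sum(rmsd(decoys[i], decoys[j]) for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores
```

A decoy would then be selected with something like `min(range(len(decoys)), key=scores.__getitem__)`, relying only on the decoy set itself and not on any native structure.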

  • Simulating protein evolution in sequence and structure space CURRENT OPINION IN STRUCTURAL BIOLOGY Xia, Y., Levitt, M. 2004; 14 (2): 202-207


    Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

    View details for DOI 10.1016/

    View details for Web of Science ID 000221340700012

    View details for PubMedID 15093835

  • Funnel-like organization in sequence space determines the distributions of protein stability and folding rate preferred by evolution PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Xia, Y., Levitt, M. 2004; 55 (1): 107-114


    To understand the physical and evolutionary determinants of protein folding, we map out the complete organization of thermodynamic and kinetic properties for protein sequences that share the same fold. The exhaustive nature of our study necessitates using simplified models of protein folding. We obtain a stability map and a folding rate map in sequence space. Comparison of the two maps reveals a common organizational principle: optimality decreases more or less uniformly with distance from the optimal sequence in the sequence space. This gives a funnel-shaped optimality surface. Evolutionary dynamics of a sequence population on these two maps reveal how the simple organization of sequence space affects the distributions of stability and folding rate preferred by evolution.

    View details for DOI 10.1002/prot.10563

    View details for Web of Science ID 000220980300011

    View details for PubMedID 14997545

  • The ASTRAL Compendium in 2004 NUCLEIC ACIDS RESEARCH Chandonia, J. M., Hon, G., Walker, N. S., Lo Conte, L., Koehl, P., Levitt, M., Brenner, S. E. 2004; 32: D189-D192


    The ASTRAL Compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release 4 years ago. ASTRAL has undergone major transformations in the past 2 years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods. ASTRAL may be accessed at http://astral.stanford.edu/.

    View details for DOI 10.1093/nar/gkh034

    View details for Web of Science ID 000188079000043

    View details for PubMedID 14681391

  • Funnel sculpting for in silico assembly of secondary structure elements of proteins PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Fain, B., Levitt, M. 2003; 100 (19): 10700-10705


    We present a method for designing a funnel-shaped free-energy surface that reproducibly assembles secondary structure elements of proteins into their native conformations from a random extended configuration. Assuming a priori knowledge of secondary structure, our method can design a funnel-shaped surface for folding of alpha, beta, and alpha-beta structures individually. We design energy surfaces that fold up to five unrelated sequences with the same energy parameters. We develop a measure of the foldability of an energy landscape in silico and present an alternative way to view energy landscapes.

    View details for DOI 10.1073/pnas.1732312100

    View details for Web of Science ID 000185415300025

    View details for PubMedID 12925740

  • A novel approach to decoy set generation: Designing a physical energy function having local minima with native structure characteristics JOURNAL OF MOLECULAR BIOLOGY Keasar, C., Levitt, M. 2003; 329 (1): 159-174


    We suggest a new approach to the generation of candidate structures (decoys) for ab initio prediction of protein structures. Our method is based on random sampling of conformation space and subsequent local energy minimization. At the core of this approach lies the design of a novel type of energy function. This energy function has local minima with native structure characteristics and wide basins of attraction. The current work presents our motivation for deriving such an energy function and also tests the derived energy function. Our approach is novel in that it takes advantage of the inherently rough energy landscape of proteins, which is generally considered a major obstacle for protein structure prediction. When local minima have wide basins of attraction, the protein's conformation space can be greatly reduced by the convergence of large regions of the space into single points, namely the local minima corresponding to these funnels. We have implemented this concept by an iterative process. The potential is first used to generate decoy sets and then we study these sets of decoys to guide further development of the potential. A key feature of our potential is the use of cooperative multi-body interactions that mimic the role of the entropic and solvent contributions to the free energy. The validity and value of our approach is demonstrated by applying it to 14 diverse, small proteins. We show that, for these proteins, the size of conformation space is considerably reduced by the new energy function. In fact, the reduction is so substantial as to allow efficient conformational sampling. As a result we are able to find a significant number of near-native conformations in random searches performed with limited computational resources.

    View details for DOI 10.1016/S0022-2836(03)00323-1

    View details for Web of Science ID 000183067000013

    View details for PubMedID 12742025

  • Protein decoy assembly using short fragments under geometric constraints BIOPOLYMERS Kolodny, R., Levitt, M. 2003; 68 (3): 278-285


    A small set of protein fragments can represent adequately all known local protein structure. This set of fragments, along with a construction scheme that assembles these fragments into structures, defines a discrete (relatively small) conformation space, which approximates protein structures accurately. We generate protein decoys by sampling geometrically valid structures from this conformation space, biased by the secondary structure prediction for the protein. Unlike other methods, secondary structure prediction is the only protein-specific information used for generating the decoys. Nevertheless, these decoys are qualitatively similar to those found by others. The method works well for all-alpha proteins, and shows promising results for alpha and beta proteins.

    View details for DOI 10.1002/bip.10262

    View details for Web of Science ID 000181363000006

    View details for PubMedID 12601789

  • Evidence of turn and salt bridge contributions to beta-hairpin stability: MD simulations of C-terminal fragment from the B1 domain of protein G BIOPHYSICAL CHEMISTRY Tsai, J., Levitt, M. 2002; 101: 187-201


    We ran and analyzed a total of eighteen 10 ns molecular dynamics simulations of two C-terminal beta-hairpins from the B1 domain of Protein G: twelve runs for the last 16 residues and six runs for the last 15 residues, G41-E56 and E42-E56, respectively. Based on their Calpha RMS deviation from the starting structure and the pattern of stabilizing interactions (hydrogen bonds, hydrophobic contacts, and salt bridges), we were able to classify the twelve runs on G41-E56 into one of three general states of the beta-hairpin ensemble: 'Stable', 'Unstable', and 'Unfolded'. Comparing the specific interactions between these states, we find that on average the stable beta-hairpin buries 287 Å² of hydrophobic surface area, makes 13 hydrogen bonds, and forms 3 salt bridges. We find that the hydrophobic core prefers to make some specific contacts; however, this core does not require optimal packing. Side-chain hydrogen bonds stabilize the beta-hairpin turn with strong stabilizing interactions primarily due to the carboxyl of D46 with contributions from T49 hydroxyl. Buoyed by the strength of the hydrophobic core, other hydrogen bonds, primarily main-chain, guide the beta-hairpin into registration by forming a loose network of interactions, making an approximately constant number of hydrogen bonds from a pool of possible candidates. In simulations on E42-E56, where the salt bridge closing the termini is not favored, we observe that all the simulations show no 'Stable' behavior, but are 'Unstable' or 'Unfolded'. We can estimate that the salt bridge between the termini provides approximately 1.3 kcal/mol. Altogether, the results suggest that the beta-hairpin folds beginning at the turn, followed by hydrophobic collapse, and then hydrogen bond formation. Salt bridges help to stabilize the folded conformations by inhibiting unfolded states.

    View details for Web of Science ID 000180165100020

  • Sequence variations within protein families are linearly related to structural variations JOURNAL OF MOLECULAR BIOLOGY Koehl, P., Levitt, M. 2002; 323 (3): 551-562


    It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins of similar structure provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is known then about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

    View details for DOI 10.1016/S0022-2836(02)00971-3

    View details for Web of Science ID 000179083800012

    View details for PubMedID 12381308

  • Small libraries of protein fragments model native protein structures accurately JOURNAL OF MOLECULAR BIOLOGY Kolodny, R., Koehl, P., Guibas, L., Levitt, M. 2002; 323 (2): 297-307


    Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model on the basis of fragments of length 7 to 0.76 Å for a 15-state model on the basis of fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.

    View details for DOI 10.1016/S0022-2836(02)00942-7

    View details for Web of Science ID 000178976500011

    View details for PubMedID 12381322

  • Roles of mutation and recombination in the evolution of protein thermodynamics PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Xia, Y., Levitt, M. 2002; 99 (16): 10382-10387


    We present a comprehensive study of the evolutionary origin of the thermodynamic behavior of proteins. With the use of a simplified model, we exhaustively enumerate the space of all sequences and the space of all structures, simulate the evolutionary relationship between sequences and structures, and characterize the steady-state sequence distribution for all structures in terms of several thermodynamic variables. We assess the effects of two major forces of evolution: mutation and recombination. Three simplifications are made. First, a two-dimensional lattice model is used to represent protein sequences and structures. Second, proteins undergo neutral evolution so that the fitness landscape has a flat allowed region inside of which all sequences are equally fit. Third, we ignore otherwise important factors such as finite population size and evolutionary time. Two scenarios emerge from our study. The first occurs when evolution is dominated by mutation events. Even though the prototype sequence that is most mutationally robust is preferred by evolution, the preference is not strong enough to offset the huge size of sequence space. Most native sequences are located near the boundary of the fitness region and are marginally compatible with the native structure. The second scenario occurs when evolution is dominated by recombination events. Now evolutionary preference for prototype sequence is strong enough to overcome the size of sequence space so that most native sequences are located near the center of sequence-structure compatibility. We conclude that the relative frequency of mutation and recombination events is a major determinant of how optimal protein sequences are for their structures.
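
    The exhaustive enumeration of structures on a two-dimensional lattice, as described above, can be sketched as follows. This is a minimal illustration, not the authors' code: the square lattice, the function names enumerate_walks and contacts, and the contact definition are assumptions made here for clarity.

```python
def enumerate_walks(n_residues):
    """Return every self-avoiding walk of n_residues sites on the 2D square
    lattice, starting at the origin (lattice choice is an assumption)."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    walks = []

    def extend(path, occupied):
        if len(path) == n_residues:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # self-avoidance: one residue per site
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                occupied.remove(nxt)
                path.pop()

    extend([(0, 0)], {(0, 0)})
    return walks


def contacts(walk):
    """Count lattice neighbours that are not adjacent along the chain."""
    index = {site: i for i, site in enumerate(walk)}
    count = 0
    for i, (x, y) in enumerate(walk):
        # Look only in +x and +y so each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                count += 1
    return count


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, len(enumerate_walks(n)))
```

    The walks returned here still include rotations and reflections of the same shape; a real study would usually remove those symmetry copies before assigning sequences to structures.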

    View details for DOI 10.1073/pnas.162097799

    View details for Web of Science ID 000177343200032

    View details for PubMedID 12149452

  • Design of an optimal Chebyshev-expanded discrimination function for globular proteins PROTEIN SCIENCE Fain, B., Xia, Y., Levitt, M. 2002; 11 (8): 2010-2021


    We describe the construction of a scoring function designed to model the free energy of protein folding. An optimization technique is used to determine the best functional forms of the hydrophobic, residue-residue and hydrogen-bonding components of the potential. The scoring function is expanded by use of Chebyshev polynomials, the coefficients of which are determined by minimizing the score, in units of standard deviation, of native structures in the ensembles of alternate decoy conformations. The derived effective potential is then tested on decoy sets used conventionally in such studies. Using our scoring function, we achieve a high level of discrimination between correct and incorrect folds. In addition, our method is able to represent functions of arbitrary shape with fewer parameters than the usual histogram potentials of similar resolution. Finally, our representation can be combined easily with many optimization methods, because the total energy is a linear function of the parameters. Our results show that the techniques of Z-score optimization and Chebyshev expansion work well.
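
    The two ingredients named above, a Chebyshev expansion that is linear in its coefficients and a Z-score of the native structure against decoys, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the use of pre-scaled features in [-1, 1], and the population standard deviation are choices made here, not details taken from the paper.

```python
import statistics


def chebyshev_terms(x, order):
    """Return [T_0(x), ..., T_order(x)] using T_{k+1} = 2x*T_k - T_{k-1}."""
    terms = [1.0, x]
    for _ in range(2, order + 1):
        terms.append(2.0 * x * terms[-1] - terms[-2])
    return terms[: order + 1]


def score(features, coeffs):
    """Score one conformation as a sum over its features of the expansion
    sum_k coeffs[k] * T_k(x). Features are assumed to be scaled to [-1, 1]
    (for example, rescaled residue-residue distances). The result is a
    linear function of coeffs."""
    order = len(coeffs) - 1
    total = 0.0
    for x in features:
        total += sum(c * t for c, t in zip(coeffs, chebyshev_terms(x, order)))
    return total


def z_score(native_score, decoy_scores):
    """Native score in units of the decoy standard deviation; more negative
    means the native structure is better separated from the decoys."""
    mean = statistics.mean(decoy_scores)
    sd = statistics.pstdev(decoy_scores)
    return (native_score - mean) / sd
```

    Because the score is linear in the coefficients, scaling all coefficients scales every score by the same factor and leaves the Z-score unchanged, so an optimizer would typically fix the overall scale.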

    View details for DOI 10.1110/ps.0200702

    View details for Web of Science ID 000177036500015

    View details for PubMedID 12142455

  • A comprehensive analysis of 40 blind protein structure predictions. BMC structural biology Samudrala, R., Levitt, M. 2002; 2: 3-?


    We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the Calpha atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å Calpha RMSD for 60-100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å Calpha RMSD for residues 1-80 for T110/rbfa. The areas of protein structure prediction that work well, and areas that need improvement, are discernible by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.

    View details for PubMedID 12150712

  • Protein topology and stability define the space of allowed sequences PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Koehl, P., Levitt, M. 2002; 99 (3): 1280-1285


    We describe a new approach to explore and quantify the sequence space associated with a given protein structure. A set of sequences are optimized for a given target structure, using all-atom models and a physical energy function. Specificity of the sequence for its target is ensured by using the random energy model, which keeps the amino acid composition of the sequence constant. The designed sequences provide a multiple sequence alignment that describes the sequence space compatible with the structure of interest; here the size of this space is estimated by using an information entropy measure. In parallel, multiple alignments of naturally occurring sequences can be derived by using either sequence or structure alignments. We compared these 3 independent multiple sequence alignments for 10 different proteins, ranging in size from 56 to 310 residues. We observed that the subset of the sequence space derived by using our design procedure is similar in size to the sequence spaces observed in nature. These results suggest that the volume of sequence space compatible with a given protein fold is defined by the length of the protein as well as by the topology (i.e., geometry of the polypeptide chain) and the stability (i.e., free energy of denaturation) of the fold.
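
    An information entropy estimate of the size of a sequence space can be illustrated with per-column Shannon entropy over a multiple sequence alignment. The abstract does not give the exact formula used, so this sketch assumes independent alignment columns and natural logarithms; the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the residue frequencies in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def alignment_entropy(sequences):
    """Sum of column entropies for equal-length aligned sequences.
    Under the independent-column assumption, exp() of this value is an
    effective number of distinct sequences covered by the alignment."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    return sum(
        column_entropy([s[i] for s in sequences]) for i in range(length)
    )


if __name__ == "__main__":
    designed = ["MKVLA", "MKILA", "MRVLG", "MKVIA"]  # toy alignment
    print(alignment_entropy(designed))
```

    Comparing this number for a designed alignment and for an alignment of natural sequences of the same length gives a rough, like-for-like measure of how much sequence space each one covers.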

    View details for Web of Science ID 000173752500035

    View details for PubMedID 11805293

  • Within the twilight zone: A sensitive profile-profile comparison tool based on information theory JOURNAL OF MOLECULAR BIOLOGY Yona, G., Levitt, M. 2002; 315 (5): 1257-1275


    This paper presents a novel approach to profile-profile comparison. The method compares two input profiles (like those that are generated by PSI-BLAST) and assigns a similarity score to assess their statistical similarity. Our profile-profile comparison tool, which allows for gaps, can be used to detect weak similarities between protein families. It has also been optimized to produce alignments that are in very good agreement with structural alignments. Tests show that the profile-profile alignments are indeed highly correlated with similarities between secondary structure elements and tertiary structure. Exhaustive evaluations show that our method is significantly more sensitive in detecting distant homologies than the popular profile-based search programs PSI-BLAST and IMPALA. The relative improvement is the same order of magnitude as the improvement of PSI-BLAST relative to BLAST. Our new tool often detects similarities that fall within the twilight zone of sequence similarity.

    View details for DOI 10.1006/jmbi.2001.5293

    View details for Web of Science ID 000173867900026

    View details for PubMedID 11827492

  • Improved recognition of native-like protein structures using a family of designed sequences PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Koehl, P., Levitt, M. 2002; 99 (2): 691-696


    The goal of the inverse protein folding problem is to identify amino acid sequences that stabilize a given target protein conformation. Methods that attempt to solve this problem have proven useful for protein sequence design. Here we show that the same methods can provide valuable information for protein fold recognition and for ab initio protein structure prediction. We present a measure of the compatibility of a test sequence with a target model structure, based on computational protein design. The model structure is used as input to design a family of low free energy sequences, and these sequences are compared with the test sequence by using a metric in sequence space based on nearest-neighbor connectivity. We find that this measure is able to recognize the native fold of a myoglobin sequence among different globin folds. It is also powerful enough to recognize near-native protein structures among non-native models.

    View details for Web of Science ID 000173450100030

    View details for PubMedID 11782533

  • ASTRAL compendium enhancements NUCLEIC ACIDS RESEARCH Chandonia, J. M., Walker, N. S., Lo Conte, L., Koehl, P., Levitt, M., Brenner, S. E. 2002; 30 (1): 260-263


    The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. It is partially derived from the SCOP database of protein domains, and it includes sequences for each domain as well as other resources useful for studying these sequences and domain structures. Several major improvements have been made to the ASTRAL compendium since its initial release 2 years ago. The number of protein domain sequences included has doubled from 15 190 to 30 867, and additional databases have been added. The Rapid Access Format (RAF) database contains manually curated mappings linking the biological amino acid sequences described in the SEQRES records of PDB entries to the amino acid sequences structurally observed (provided in the ATOM records) in a format designed for rapid access by automated tools. This information is used to derive sequences for protein domains in the SCOP database. In cases where a SCOP domain spans several protein chains, all of which can be traced back to a single genetic source, a 'genetic domain' sequence is created by concatenating the sequences of each chain in the order found in the original gene sequence. Both the original-style library of SCOP sequences and a new library including genetic domain sequences are available. Selected representative subsets of each of these libraries, based on multiple criteria and degrees of similarity, are also included. ASTRAL may be accessed at

    View details for Web of Science ID 000173077100070

    View details for PubMedID 11752310

  • Peter Kollman - Obituary NATURE STRUCTURAL BIOLOGY Levitt, M., Daggett, V. 2001; 8 (8): 662-662
  • Quantification of the hydrophobic interaction by simulations of the aggregation of small hydrophobic solutes in water PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raschke, T. M., Tsai, J., Levitt, M. 2001; 98 (11): 5965-5969


    The hydrophobic interaction, the tendency for nonpolar molecules to aggregate in solution, is a major driving force in biology. In a direct approach to the physical basis of the hydrophobic effect, nanosecond molecular dynamics simulations were performed on increasing numbers of hydrocarbon solute molecules in water-filled boxes of different sizes. The intermittent formation of solute clusters gives a free energy that is proportional to the loss in exposed molecular surface area with a constant of proportionality of 45 ± 6 cal/mol·Å². The molecular surface area is the envelope of the solute cluster that is impenetrable by solvent and is somewhat smaller than the more traditional solvent-accessible surface area, which is the area transcribed by the radius of a solvent molecule rolled over the surface of the cluster. When we apply a factor relating molecular surface area to solvent-accessible surface area, we obtain 24 cal/mol·Å². Ours is the first direct calculation, to our knowledge, of the hydrophobic interaction from molecular dynamics simulations; the excellent qualitative and quantitative agreement with experiment proves that simple van der Waals interactions and atomic point-charge electrostatics account for the most important driving force in biology.
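
    The proportionality between free energy and buried surface area reported above amounts to a one-line calculation. The sketch below uses the two constants from the abstract; the function name, the unit conversion to kcal/mol, and the sign convention (negative for favorable aggregation) are assumptions made here.

```python
# Constants of proportionality quoted in the abstract, in cal/(mol*A^2).
GAMMA_MOLECULAR = 45.0  # per unit of buried molecular surface area
GAMMA_SASA = 24.0       # per unit of buried solvent-accessible surface area


def aggregation_free_energy(buried_area, gamma=GAMMA_MOLECULAR):
    """Free energy change in kcal/mol for burying buried_area (in A^2).
    Negative values mean aggregation is favorable (assumed convention)."""
    return -gamma * buried_area / 1000.0


if __name__ == "__main__":
    # Burying 100 A^2 of molecular surface.
    print(aggregation_free_energy(100.0))
    # The same calculation on a solvent-accessible area basis.
    print(aggregation_free_energy(100.0, gamma=GAMMA_SASA))
```

    The two constants must not be mixed: each applies only to areas measured with the matching surface definition.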

    View details for Web of Science ID 000168883700008

    View details for PubMedID 11353861

  • Determination of optimal Chebyshev-expanded hydrophobic discrimination function for globular proteins IBM JOURNAL OF RESEARCH AND DEVELOPMENT Fain, B., Xia, Y., Levitt, M. 2001; 45 (3-4): 525-532
  • The birth of computational structural biology NATURE STRUCTURAL BIOLOGY Levitt, M. 2001; 8 (5): 392-393

    View details for Web of Science ID 000168315300008

    View details for PubMedID 11323711

  • A novel method for sampling alpha-helical protein backbones JOURNAL OF MOLECULAR BIOLOGY Fain, B., Levitt, M. 2001; 305 (2): 191-201


    We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database-driven exhaustive enumeration our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.

    View details for DOI 10.1006/jmbi.2000.4290

    View details for Web of Science ID 000166413600002

    View details for PubMedID 11124899

  • De novo protein design 4th NATO Advanced-Study-Institute on Dynamics, Structure and Function of Biological Macromolecules Koehl, P., Levitt, M. I O S PRESS. 2001: 57–75
  • Extracting knowledge-based energy functions from protein structures by error rate minimization: Comparison of methods using lattice model JOURNAL OF CHEMICAL PHYSICS Xia, Y., Levitt, M. 2000; 113 (20): 9318-9330
  • Constructing side chains on near-native main chains for ab initio protein structure prediction PROTEIN ENGINEERING Samudrala, R., Huang, E. S., Koehl, P., Levitt, M. 2000; 13 (7): 453-457


    Is there value in constructing side chains while searching protein conformational space during an ab initio simulation? If so, what is the most computationally efficient method for constructing these side chains? To answer these questions, four published approaches were used to construct side chain conformations on a range of near-native main chains generated by ab initio protein structure prediction methods. The accuracy of these approaches was compared with a naive approach that selects the most frequently observed rotamer for a given amino acid to construct side chains. An all-atom conditional probability discriminatory function is useful for selecting conformations with overall low all-atom root mean square deviation (r.m.s.d.), and the discrimination improves on sets that are closer to the native conformation. In addition, the naive approach performs as well as more sophisticated methods in terms of the percentage of chi(1) angles built accurately and the all-atom r.m.s.d. between the native and near-native conformations. The results suggest that the naive method would be extremely useful for fast and efficient side chain construction on vast numbers of conformations for ab initio prediction of protein structure.

    View details for Web of Science ID 000088743100001

    View details for PubMedID 10906341

  • Decoys 'R' Us: A database of incorrect conformations to improve protein structure prediction PROTEIN SCIENCE Samudrala, R., Levitt, M. 2000; 9 (7): 1399-1401


    The development of an energy or scoring function for protein structure prediction is greatly enhanced by testing the function on a set of computer-generated conformations (decoys) to determine whether it can readily distinguish native-like conformations from nonnative ones. We have created "Decoys 'R' Us," a database containing many such sets of conformations, to provide a resource that allows scoring functions to be improved.

    View details for Web of Science ID 000088376100017

    View details for PubMedID 10933507

  • Ab initio construction of protein tertiary structures using a hierarchical approach JOURNAL OF MOLECULAR BIOLOGY Xia, Y., Huang, E. S., Levitt, M., Samudrala, R. 2000; 300 (1): 171-185


    We present a hierarchical method to predict protein tertiary structure models from sequence. We start with complete enumeration of conformations using a simple tetrahedral lattice model. We then build conformations with increasing detail, and at each step select a subset of conformations using empirical energy functions with increasing complexity. After enumeration on lattice, we select a subset of low energy conformations using a statistical residue-residue contact energy function, and generate all-atom models using predicted secondary structure. A combined knowledge-based atomic level energy function is then used to select subsets of the all-atom models. The final predictions are generated using a consensus distance geometry procedure. We test the feasibility of the procedure on a set of 12 small proteins covering a wide range of protein topologies. A rigorous double-blind test of our method was made under the auspices of the CASP3 experiment, where we did ab initio structure predictions for 12 proteins using this approach. The performance of our methodology at CASP3 is reasonably good and completely consistent with our initial tests.

    View details for DOI 10.1006/jmbi.2000.3835

    View details for Web of Science ID 000087980600015

    View details for PubMedID 10864507

  • Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins. Proceedings. International Conference on Intelligent Systems for Molecular Biology Yona, G., Levitt, M. 2000; 8: 395-406


    In search of global principles that may explain the organization of the space of all possible proteins, we study all known protein sequences and structures. In this paper we present a global map of the protein space based on our analysis. Our protein space contains all protein sequences in a non-redundant (NR) database, which includes all major sequence databases. Using the PSI-BLAST procedure we defined 4,670 clusters of related sequences in this space. Of these clusters, 1,421 are centered on a sequence of known structure. All 4,670 clusters were then compared using either a structure metric (when 3D structures are known) or a novel sequence profile metric. These scores were used to define a unified and consistent metric between all clusters. Two schemes were employed to organize these clusters in a meta-organization. The first uses a graph theory method and clusters the clusters in a hierarchical organization. This organization extends our ability to predict the structure and function of many proteins beyond what is possible with existing tools for sequence analysis. The second uses a variation on a multidimensional scaling technique to embed the clusters in a low-dimensional real space. This last approach resulted in a projection of the protein space onto a 2D plane that provides us with a bird's-eye view of the protein space. Based on this map we suggest a list of possible target sequences with unknown structure that are likely to adopt new, unknown folds.

    View details for PubMedID 10977100

  • Expectations from structural genomics PROTEIN SCIENCE Brenner, S. E., Levitt, M. 2000; 9 (1): 197-200


    Structural genomics projects aim to provide an experimental structure or a good model for every protein in all completed genomes. Most of the experimental work for these projects will be directed toward proteins whose fold cannot be readily recognized by simple sequence comparison with proteins of known structure. Based on the history of proteins classified in the SCOP structure database, we expect that only about a quarter of the early structural genomics targets will have a new fold. Among the remaining ones, about half are likely to be evolutionarily related to proteins of known structure, even though the homology could not be readily detected by sequence analysis.

    View details for Web of Science ID 000085042500023

    View details for PubMedID 10739263

  • The ASTRAL compendium for protein structure and sequence analysis NUCLEIC ACIDS RESEARCH Brenner, S. E., Koehl, P., Levitt, M. 2000; 28 (1): 254-256


    The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. The SPACI scores included in the system summarize the overall characteristics of a protein structure. A structural alignments database indicates residue equivalencies in superimposed protein domain structures. The PDB sequence-map files provide a linkage between the amino acid sequence of the molecule studied (SEQRES records in a database entry) and the sequence of the atoms experimentally observed in the structure (ATOM records). These maps are combined with information in the SCOP database to provide sequences of protein domains. Selected subsets of the domain database, with varying degrees of similarity measured in several different ways, are also available. ASTRAL may be accessed at

    View details for Web of Science ID 000084896300073

    View details for PubMedID 10592239

  • Probing structure-function relationships of the DNA polymerase alpha-associated zinc-finger protein using computational approaches. Pacific Symposium on Biocomputing Samudrala, R., Xia, Y., Levitt, M., Cotton, N. J., Huang, E. S., Davis, R. 2000: 179-190


    We present the application of a method for protein structure prediction to aid the determination of structure-function relationships by experiment. The structure prediction method was rigorously tested by making blind predictions at the third meeting on the Critical Assessment of Protein Structure methods (CASP3). The method is a combined hierarchical approach involving exhaustive enumeration of all possible folds of a small protein sequence on a tetrahedral lattice. A set of filters, primarily in the form of discriminatory functions, is applied to these conformations. As the filters are applied, greater detail is added to the models, resulting in a handful of all-atom "final" conformations. Encouraged by the results at CASP3, we used our approach to help solve a practical biological problem: the prediction of the structure and function of the 67-residue C-terminal zinc-finger region of the DNA polymerase alpha-associated zinc-finger (PAZ) protein. We discuss how the prediction points to a novel function relative to the sequence homologs, in conjunction with evidence from experiment, and how the predicted structure is guiding further experimental studies. This work represents a move from the theoretical realm to actual application of structure prediction methods for gaining unique insight to guide experimental biologists.

    View details for PubMedID 10902167

  • De novo protein design. I. In search of stability and specificity JOURNAL OF MOLECULAR BIOLOGY Koehl, P., Levitt, M. 1999; 293 (5): 1161-1181


    We have developed a fully automated protein design strategy that works on the entire sequence of the protein and uses a full atom representation. At each step of the procedure, an all-atom model of the protein is built using the template protein structure and the current designed sequence. The energy of the model is used to drive a Monte Carlo optimization in sequence space: random moves are either accepted or rejected based on the Metropolis criterion. We rely on the physical forces that stabilize native protein structures to choose the optimum sequence. Our energy function includes van der Waals interactions, electrostatics and an environment free energy. Successful protein design should be specific and generate a sequence compatible with the template fold and incompatible with competing folds. We impose specificity by maintaining the amino acid composition constant, based on the random energy model. The specificity of the optimized sequence is tested by fold recognition techniques. Successful sequence designs for the B1 domain of protein G, for the lambda repressor and for sperm whale myoglobin are presented. We show that each additional term of the energy function improves the performance of our design procedure: the van der Waals term ensures correct packing, the electrostatics term increases the specificity for the correct native fold, and the environment solvation term ensures a correct pattern of buried hydrophobic and exposed hydrophilic residues. For the globin family, we show that we can design a protein sequence that is stable in the myoglobin fold, yet incompatible with the very similar hemoglobin fold.
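
    The sequence-space search described in this abstract can be illustrated with a minimal sketch. It assumes a toy energy (`toy_energy`, a hypothetical burial-matching score) in place of the all-atom function with van der Waals, electrostatic and environment terms, and it assumes that residue swaps are the move set, which is one simple way to hold amino acid composition constant. The function names and parameters are illustrative, not taken from the paper.

```python
import math
import random

# Hypothetical toy energy, NOT the paper's all-atom function: it rewards
# hydrophobic residues at buried positions and polar residues at exposed ones.
HYDROPHOBIC = set("AVILMFW")


def toy_energy(sequence, buried):
    energy = 0.0
    for residue, is_buried in zip(sequence, buried):
        matches = (residue in HYDROPHOBIC) == is_buried
        energy += -1.0 if matches else 1.0
    return energy


def metropolis_design(sequence, buried, steps=2000, temperature=0.5, seed=0):
    """Metropolis search over sequences; swap moves keep composition fixed."""
    rng = random.Random(seed)
    current = list(sequence)
    e_current = toy_energy(current, buried)
    best, e_best = current[:], e_current
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # trial swap
        delta = toy_energy(current, buried) - e_current
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current += delta
            if e_current < e_best:
                best, e_best = current[:], e_current
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo swap
    return "".join(best), e_best
```

    Because every move is a swap, the designed sequence is always a permutation of the starting one, which mirrors the fixed-composition constraint the abstract uses to impose specificity.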

    View details for Web of Science ID 000083798400015

    View details for PubMedID 10547293

  • De novo protein design. II. Plasticity in sequence space JOURNAL OF MOLECULAR BIOLOGY Koehl, P., Levitt, M. 1999; 293 (5): 1183-1193


    It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.

    View details for Web of Science ID 000083798400016

    View details for PubMedID 10547294

  • Structure-based conformational preferences of amino acids PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Koehl, P., Levitt, M. 1999; 96 (22): 12524-12529


    Proteins can be very tolerant to amino acid substitution, even within their core. Understanding the factors responsible for this behavior is of critical importance for protein engineering and design. Mutations in proteins have been quantified in terms of the changes in stability they induce. For example, guest residues in specific secondary structures have been used as probes of conformational preferences of amino acids, yielding propensity scales. Predicting these amino acid propensities would be a good test of any new potential energy functions used to mimic protein stability. We have recently developed a protein design procedure that optimizes whole sequences for a given target conformation based on the knowledge of the template backbone and on a semiempirical potential energy function. This energy function is purely physical, including steric interactions based on a Lennard-Jones potential, electrostatics based on a Coulomb potential, and hydrophobicity in the form of an environment free energy based on accessible surface area and interatomic contact areas. Sequences designed by this procedure for 10 different proteins were analyzed to extract conformational preferences for amino acids. The resulting structure-based propensity scales show significant agreement with experimental propensity scale values, both for alpha-helices and beta-sheets. These results indicate that amino acid conformational preferences are a natural consequence of the potential energy we use. This confirms the accuracy of our potential and indicates that such preferences should not be added as a design criterion.

    View details for Web of Science ID 000083373000060

    View details for PubMedID 10535955

  • Hierarchy of structure loss in MD simulations of src SH3 domain unfolding JOURNAL OF MOLECULAR BIOLOGY Tsai, J., Levitt, M., Baker, D. 1999; 291 (1): 215-225


    To complement experimental studies of the src SH3 domain folding, we studied 30 independent, high-temperature, molecular dynamics simulations of src SH3 domain unfolding. These trajectories were observed to differ widely from each other. Thus, rather than analyzing individual trajectories, we sought to identify the recurrent features of the high-temperature unfolding process. The conformations from all simulations were combined and then divided into groups based on the number of native contacts. Average occupancies of each side-chain hydrophobic contact and hydrogen bond in the protein were then determined. In the symmetric funnel limit, the occupancies of all contacts should decrease in concert with the loss in total number of native contacts. If there is a lack of symmetry or hierarchy to the unfolding process, the occupancies of some contacts should decrease more slowly, and others more rapidly. Despite the heterogeneity of the individual trajectories, the ensemble averaging revealed an order to the unfolding process: contacts between the N and C-terminal strands are the first to disappear, whereas contacts within the distal beta-hairpin and a hydrogen-bonding network involving the distal loop beta-turn and the diverging turn persist well after the majority of the native contacts are lost. This hierarchy of events resembles but is somewhat less pronounced than that observed in our experimental studies of the folding of src SH3 domain.
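
    The ensemble analysis in this abstract (pool all conformations, group them by number of native contacts, then average the occupancy of each contact within a group) can be sketched as follows. The data layout is an assumption made for illustration: each conformation is a set of contact pairs, and `bin_width` is a hypothetical parameter controlling how conformations are grouped.

```python
from collections import defaultdict


def occupancy_by_native_count(frames, native_contacts, bin_width=2):
    """Group frames by how many native contacts they retain, then report the
    fraction of frames in each group that contain each native contact.

    frames: iterable of collections of contacts, e.g. {(i, j), ...}
    Returns {bin_start: {contact: occupancy between 0 and 1}}.
    """
    native = frozenset(native_contacts)
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for frame in frames:
        present = native.intersection(frame)  # non-native contacts are ignored
        bin_start = (len(present) // bin_width) * bin_width
        totals[bin_start] += 1
        for contact in present:
            counts[bin_start][contact] += 1
    return {
        b: {c: counts[b][c] / totals[b] for c in sorted(native)}
        for b in sorted(totals)
    }
```

    In the symmetric funnel limit mentioned in the abstract, every contact's occupancy would fall in step with the bin index; contacts whose occupancy stays high in low-contact bins are the persistent ones.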

    View details for Web of Science ID 000081903500015

    View details for PubMedID 10438616

  • Theory and simulation - Can theory challenge experiment? Editorial overview CURRENT OPINION IN STRUCTURAL BIOLOGY Koehl, P., Levitt, M. 1999; 9 (2): 155-156

    View details for Web of Science ID 000085219800001

    View details for PubMedID 10465610

  • A brighter future for protein structure prediction. Nature structural biology Koehl, P., Levitt, M. 1999; 6 (2): 108-111

    View details for PubMedID 10048917

  • A combined approach for ab initio construction of low resolution protein tertiary structures from sequence. Pacific Symposium on Biocomputing Samudrala, R., Xia, Y., Levitt, M., Huang, E. S. 1999: 505-516


    An approach to construct low resolution models of protein structure from sequence information using a combination of different methodologies is described. All possible compact self-avoiding C alpha conformations (approximately 10 million) of a small protein chain were exhaustively enumerated on a tetrahedral lattice. The best scoring 10,000 conformations were selected using a lattice-based scoring function. All-atom structures were then generated by fitting an off-lattice four-state phi/psi model to the lattice conformations, using idealised helix and sheet values based on predicted secondary structure. The all-atom conformations were minimised using ENCAD and scored using a second hybrid scoring function. The best scoring 50, 100, and 500 conformations were input to a consensus-based distance geometry routine that used constraints from each of the conformation sets and produced a single structure for each set (total of three). Secondary structures were again fitted to the three structures, and the resulting structures were minimised and scored. The lowest scoring conformation was taken to be the "correct" answer. The results of application of this method to twelve proteins are presented.
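
    The first stage of this pipeline, exhaustive enumeration of self-avoiding chains on a tetrahedral lattice, can be sketched with a depth-first search. This is a simplified illustration under stated assumptions: it uses the standard diamond-lattice bond vectors, enumerates every self-avoiding walk from the origin, and omits the compactness restriction and the scoring function used in the paper.

```python
# Bond vectors of the tetrahedral (diamond) lattice. Sites alternate between
# two sublattices, so the allowed bonds flip sign at every step.
EVEN_BONDS = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
ODD_BONDS = tuple((-x, -y, -z) for x, y, z in EVEN_BONDS)


def enumerate_walks(n_steps):
    """Yield every self-avoiding walk of n_steps bonds starting at the origin."""

    def extend(path, occupied):
        if len(path) == n_steps + 1:
            yield tuple(path)  # copy, because path is mutated during the search
            return
        # The index of the last site decides which sublattice we are on.
        bonds = EVEN_BONDS if (len(path) - 1) % 2 == 0 else ODD_BONDS
        x, y, z = path[-1]
        for dx, dy, dz in bonds:
            site = (x + dx, y + dy, z + dz)
            if site not in occupied:  # self-avoidance check
                path.append(site)
                occupied.add(site)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(site)

    origin = (0, 0, 0)
    yield from extend([origin], {origin})
```

    The number of walks grows by roughly a factor of three per added bond, which is why a compactness filter and a cheap lattice score are needed before any all-atom model is built.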

    View details for PubMedID 10380223

  • Ab initio protein structure prediction using a combined hierarchical approach 3rd Meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP3) Samudrala, R., Xia, Y., Huang, E., Levitt, M. WILEY-BLACKWELL. 1999: 194–198


    As part of the third Critical Assessment of Structure Prediction meeting (CASP3), we predict the three-dimensional structures for 13 proteins using a hierarchical approach. First, all possible compact conformations of a protein sequence are enumerated using a highly simplified tetrahedral lattice model. We select a large subset of these conformations using a lattice-based scoring function and build detailed all-atom models incorporating predicted secondary structure. A combined all-atom knowledge-based scoring function is then used to select three smaller subsets from these all-atom models. Finally, a consensus-based distance geometry procedure is used to generate the best conformations from each of the all-atom subsets. With this method, we are able to predict the global topology/shape for all or a large part of the sequence for six out of the thirteen proteins. For two other proteins, the topology/shape for shorter fragments is predicted. This represents a marked improvement in ab initio prediction since CASP was first instigated in 1994.

    View details for Web of Science ID 000082804100024

    View details for PubMedID 10526368

  • The PRESAGE database for structural genomics NUCLEIC ACIDS RESEARCH Brenner, S. E., Barken, D., Levitt, M. 1999; 27 (1): 251-253


    The PRESAGE database is a collaborative resource for structural genomics. It provides a database of proteins to which researchers add annotations indicating current experimental status, structural predictions and suggestions. The database is intended to enhance communication among structural genomics researchers and aid dissemination of their results. The PRESAGE database may be accessed at

    View details for Web of Science ID 000077983000064

    View details for PubMedID 9847193

  • Accuracy of side-chain prediction upon near-native protein backbones generated by ab initio folding methods PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Huang, E. S., Koehl, P., Levitt, M., Pappu, R. V., Ponder, J. W. 1998; 33 (2): 204-217


    The ab initio folding problem can be divided into two sequential tasks of approximately equal computational complexity: the generation of native-like backbone folds and the positioning of side chains upon these backbones. The prediction of side-chain conformation in this context is challenging, because at best only the near-native global fold of the protein is known. To test the effect of displacements in the protein backbones on side-chain prediction for folds generated ab initio, sets of near-native backbones (≤ 4 Å Cα RMS error) for four small proteins were generated by two methods. The steric environment surrounding each residue was probed by placing the side chains in the native conformation on each of these decoys, followed by torsion-space optimization to remove steric clashes on a rigid backbone. We observe that on average 40% of the chi1 angles were displaced by 40 degrees or more, effectively setting the limits in accuracy for side-chain modeling under these conditions. Three different algorithms were subsequently used for prediction of side-chain conformation. The average prediction accuracy for the three methods was remarkably similar: 49% to 51% of the chi1 angles were predicted correctly overall (33% to 36% of the chi1+2 angles). Interestingly, when the inter-side-chain interactions were disregarded, the mean accuracy increased. A consensus approach is described, in which side-chain conformations are defined based on the most frequently predicted chi angles for a given method upon each set of near-native backbones. We find that consensus modeling, which de facto includes backbone flexibility, improves side-chain prediction: chi1 accuracy improved to 51-54% (36-42% of chi1+2). Implications of a consensus method for ab initio protein structure prediction are discussed.
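
    The consensus idea in this abstract, taking the most frequently predicted chi angle across a set of near-native backbones, can be sketched as a per-residue majority vote. Binning chi1 into the three staggered wells near 60, 180 and 300 degrees is an assumption made here for illustration; the abstract does not state how angles were grouped, and the function names are hypothetical.

```python
from collections import Counter

WELLS = (60, 180, 300)  # assumed staggered chi1 wells, in degrees


def rotamer_well(chi_degrees):
    """Map an angle to the nearest well, measuring distance around the circle."""
    angle = chi_degrees % 360.0

    def circular_distance(center):
        diff = abs(angle - center)
        return min(diff, 360.0 - diff)

    return min(WELLS, key=circular_distance)


def consensus_chi1(predictions):
    """predictions: one dict per decoy backbone, mapping residue -> chi1 degrees.
    Returns residue -> most frequently predicted well."""
    votes = {}
    for per_backbone in predictions:
        for residue, chi in per_backbone.items():
            votes.setdefault(residue, Counter())[rotamer_well(chi)] += 1
    return {res: counter.most_common(1)[0][0] for res, counter in votes.items()}
```

    Because each backbone in the set differs slightly, the vote implicitly samples backbone flexibility, which is the property the abstract credits for the improved accuracy.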

    View details for Web of Science ID 000076257100005

    View details for PubMedID 9779788

  • Simulating water and the molecules of life SCIENTIFIC AMERICAN Gerstein, M., Levitt, M. 1998; 279 (5): 100-105

    View details for Web of Science ID 000076499900029

    View details for PubMedID 9841379

  • A unified statistical framework for sequence comparison and structure comparison Colloquium on Computational Biomolecular Science Levitt, M., Gerstein, M. NATL ACAD SCIENCES. 1998: 5913–20


    We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., BLAST and FASTA) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.
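
    The P value described in this abstract, the probability that a better score would occur by chance under an extreme-value distribution, can be sketched for the Gumbel form P(S ≥ x) = 1 − exp(−exp(−(x − μ)/β)). The method-of-moments fit below is an assumption chosen for brevity; the abstract says only that a simple distribution function was fitted to the observed scores, and it does not describe the fitting procedure or any dependence of the parameters on sequence length.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Estimate Gumbel location mu and scale beta from chance scores, using
    mean = mu + gamma * beta and variance = (pi * beta) ** 2 / 6."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.fmean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """Probability that a chance comparison scores at least `score`."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for tiny P values;
    # the exponent is clamped to avoid overflow for very low scores.
    return -math.expm1(-math.exp(min(-z, 700.0)))
```

    For high scores the expression approaches exp(−z), so each increase of β in the score lowers the P value by about a factor of e.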

    View details for Web of Science ID 000073852600013

    View details for PubMedID 9600892

  • Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins PROTEIN SCIENCE Gerstein, M., Levitt, M. 1998; 7 (2): 445-456


    We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.

    View details for Web of Science ID 000072056400026

    View details for PubMedID 9521122

  • Simulating the minimum core for hydrophobic collapse in globular proteins PROTEIN SCIENCE Tsai, J., Gerstein, M., Levitt, M. 1997; 6 (12): 2606-2616


    To investigate the nature of hydrophobic collapse considered to be the driving force in protein folding, we have simulated aqueous solutions of two model hydrophobic solutes, methane and isobutylene. Using a novel methodology for determining contacts, we can precisely follow hydrophobic aggregation as it proceeds through three stages: dispersed, transition, and collapsed. Theoretical modeling of the cluster formation observed by simulation indicates that this aggregation is cooperative and that the simulations favor the formation of a single cluster midway through the transition stage. This defines a minimum solute hydrophobic core volume. We compare this with protein hydrophobic core volumes determined from solved crystal structures. Our analysis shows that the solute core volume roughly estimates the minimum core size required for independent hydrophobic stabilization of a protein and defines a limiting concentration of nonpolar residues that can cause hydrophobic collapse. These results suggest that the physical forces driving aggregation of hydrophobic molecules in water are indeed responsible for protein folding.

    View details for Web of Science ID A1997YK91200012

    View details for PubMedID 9416609

  • A structural census of the current population of protein sequences PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Gerstein, M., Levitt, M. 1997; 94 (22): 11911-11916


    We examine the occurrence of the approximately 300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (approximately 140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.

    View details for Web of Science ID A1997YD50600031

    View details for PubMedID 9342336

  • Calibration and testing of a water model for simulation of the molecular dynamics of proteins and nucleic acids in solution JOURNAL OF PHYSICAL CHEMISTRY B Levitt, M., Hirshberg, M., Sharon, R., Laidig, K. E., Daggett, V. 1997; 101 (25): 5051-5061
  • Factors affecting the ability of energy functions to discriminate correct from incorrect folds JOURNAL OF MOLECULAR BIOLOGY Park, B. H., Huang, E. S., Levitt, M. 1997; 266 (4): 831-846


    Eighteen low and medium resolution empirical energy functions were tested for their ability to distinguish correct from incorrect folds from three test sets of decoy protein conformations. The energy functions included 13 pairwise potentials of mean force, covering a wide range of functional forms and methods of parameterization, four potentials that attempt to detect properly formed hydrophobic cores, and one environment-based potential. The first of the three test sets consists of large ensembles of plausible conformations for eight small proteins, all of which have correct native secondary structure and are reasonably compact. The second is the set of all subconformations in a database of known protein structures applied to the sequences in that database (ungapped threading). The third is a set of ensembles of 1000 conformations each for seven small proteins taken from molecular dynamics simulations at 298 K and 498 K. Our results show that there are functions effective for each challenge set; moreover, success in one test is no guarantee of success in another. We examine the factors that seem to be important for accurate discrimination of correct structures in each of the test sets, and note that extremely simple functions are often as effective as more complex functions.

    View details for Web of Science ID A1997WL95400017

    View details for PubMedID 9102472

  • Competitive assessment of protein fold recognition and alignment accuracy PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Levitt, M. 1997: 92-104


    The predictions made for fold recognition and modeling accuracy at the 1996 Critical Assessment of Structure Prediction meeting (CASP2) were assessed to discover which groups did best. With 32 groups making a total of 369 predictions, it was necessary to use simple criteria for distinguishing between the entries. By focusing on the predictors' ability to use the sequence of the unknown target structure to recognize the target fold from a database of known folds and also on the quality of the model judged by the accuracy of the predicted alignment, it is easy to determine the best predictions for a given target. Assessing overall performance of the predictors on all the targets is much more difficult and use was made of weighted averages of fold recognition and alignment accuracy with and without normalization for target difficulty. By plotting these results in two dimensions the winning groups stand out, allowing readers to focus their attention on the most promising methods. When the present results are compared with the results of the earlier CASP1 meeting, held in 1994, it is clear that threading predictions have progressed dramatically. For this assessor, the strongest lesson learned is that subjectivity is pervasive and affects us all. It is abundantly clear that the blind predictions made at CASP are essential if progress is to be made in predicting protein structure.

    View details for Web of Science ID 000071920700013

    View details for PubMedID 9485500

  • Packing as a structural basis of protein stability: understanding mutant properties from wildtype structure. Pacific Symposium on Biocomputing Lee, C., Levitt, M. 1997: 245-255


    Modeling of protein core mutations using sidechain packing can forecast their effects on stability. We have assessed the structural basis of this approach, by evaluating the accuracy of our 1991 model of a three-site mutant of lambda repressor (V36L/M40L/V47I), against the recently reported crystal structure. The three mutated residues matched the crystal structure to within 0.89 Å (1.11 Å for sidechain atoms), giving fairly accurate sidechain placement and packing (81-99th percentile rank in coordinate accuracy). However, the model used different sidechain torsional angles than seen in the crystal structure at residues 36 and 40, apparently to compensate for the backbone shifts present in the actual mutant structure, but not accounted for in our modeling method. To understand the structural basis of stability across a set of lambda repressor core mutants, we have analyzed the mutant models, revealing several simple packing effects: V36I, predicted to be stabilized by filling a hydrophobic cavity; M40V, destabilized by a steric clash with the unusual structural demands of a helix-turn transition. These effects illustrate how mutant stability can often be understood directly from scrutiny of wildtype structure. Simply adding the calculated energies of neighboring point mutations predicts the stability effect of the combined mutant relatively well, with little apparent cooperativity, yielding simple rules for each site's amino acid preferences. Our treatment of core packing indicates that it can permit a large fraction of sequences to fit the native fold, as observed experimentally, far more than indicated by rotamer hard-sphere models.

    View details for PubMedID 9390296

  • A retrospective analysis of CASP2 threading predictions PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Marchler-Bauer, A., Levitt, M., Bryant, S. H. 1997: 83-91


    Analysis of CASP2 protein threading results shows that the success rate of structure predictions varies widely among prediction targets. We set "critical" thresholds in fold recognition specificity and threading model accuracy at the points where "incorrect" CASP2 predictions just outnumber "correct" predictions. Using these thresholds we find that correct predictions were made for all of those targets and for only those targets where more than 50% of target residues may be superimposed on previously known structures. Three-fourths of these correct predictions were furthermore made for targets with greater than 12% residue identity in structural alignment, where characteristic sequence motifs are also present. Based on these observations we suggest that the sustained performance of threading methods is best characterized by counting the numbers of correct predictions for targets of increasing "difficulty." We suggest that target difficulty may be assigned, once the true structure of the target is known, according to the fraction of residues superimposable onto previously known structures and the fraction of identical residues in those structural alignments.

    View details for Web of Science ID 000071920700012

    View details for PubMedID 9485499

  • Protein folding: The endgame ANNUAL REVIEW OF BIOCHEMISTRY Levitt, M., Gerstein, M., Huang, E., Subbiah, S., Tsai, J. 1997; 66: 549-579


    The last stage of protein folding, the "endgame," involves the ordering of amino acid side-chains into a well defined and closely packed configuration. We review a number of topics related to this process. We first describe how the observed packing in protein crystal structures is measured. Such measurements show that the protein interior is packed exceptionally tightly, more so than the protein surface or surrounding solvent and even more efficiently than crystals of simple organic molecules. In vitro protein folding experiments also show that the protein is close-packed in solution and that the tight packing and intercalation of side-chains is a final and essential step in the folding pathway. These experimental observations, in turn, suggest that a folded protein structure can be described as a kind of three-dimensional jigsaw puzzle and that predicting side-chain packing is possible in the sense of solving this puzzle. The major difficulty that must be overcome in predicting side-chain packing is a combinatorial "explosion" in the number of possible configurations. There has been much recent progress towards overcoming this problem, and we survey a variety of the approaches. These approaches differ principally in whether they use ab initio (physical) or more knowledge-based methods, how they divide up and search conformational space, and how they evaluate candidate configurations (using scoring functions). The accuracy of side-chain prediction depends crucially on the (assumed) positioning of the main-chain. Methods for predicting main-chain conformation are, in a sense, not as developed as those for side-chains. We conclude by surveying these methods. As with side-chain prediction, there are a great variety of approaches, which differ in how they divide up and search space and in how they score candidate conformations.

    View details for Web of Science ID A1997XH20100019

    View details for PubMedID 9242917

  • Finite-difference solution of the Poisson-Boltzmann equation: Complete elimination of self-energy JOURNAL OF COMPUTATIONAL CHEMISTRY Zhou, Z. X., Payne, P., Vasquez, M., Kuhn, N., Levitt, M. 1996; 17 (11): 1344-1351
  • Keeping the shape but changing the charges: A simulation study of urea and its iso-steric analogs JOURNAL OF CHEMICAL PHYSICS Tsai, J., Gerstein, M., Levitt, M. 1996; 104 (23): 9417-9430
  • Energy functions that discriminate X-ray and near-native folds from well-constructed decoys JOURNAL OF MOLECULAR BIOLOGY Park, B., Levitt, M. 1996; 258 (2): 367-392


    This study generates ensembles of decoy or test structures for eight small proteins with a variety of different folds. Between 35,000 and 200,000 decoys were generated for each protein using our four-state off-lattice model together with a novel relaxation method. These give compact self-avoiding conformations each constrained to have native secondary structure. Ensembles of these decoy conformations were used to test the ability of several types of empirical contact, surface area and distance-dependent energy functions to distinguish between correct and incorrect conformations. These tests have shown that none of the functions is able to distinguish consistently either the X-ray conformation or the near-native conformations from others which are incorrect. Certain combinations of two of these energy functions were able, however, consistently to identify X-ray structures from amongst the decoy conformations. These same combinations are better also at identifying near-native conformations, consistently finding them with a hundred-fold higher frequency than chance. The fact that these combination energy functions perform better than generally accepted energy functions suggests their future use in folding simulations and perhaps threading predictions.

    View details for Web of Science ID A1996UH33000013

    View details for PubMedID 8627632

  • From structure to sequence and back again JOURNAL OF MOLECULAR BIOLOGY Hinds, D. A., Levitt, M. 1996; 258 (1): 201-209


    With a simple lattice model and sequence design algorithm, we can design sequences to fit arbitrary compact globular structures. We judged the success of the design algorithm by performing exhaustive conformational searches to determine if a designed sequence's lowest energy conformation matched the target for which it was designed. Designed sequences tend to be much better optimized for their targets than a natural sequence is optimized for its lowest energy model conformation. We examined the effect of varying the number of available amino acid types on the success of the design method. It was more difficult but not impossible to successfully design discriminating sequences using fewer amino acid types.

    View details for Web of Science ID A1996UG25600016

    View details for PubMedID 8613988

  • Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations JOURNAL OF MOLECULAR BIOLOGY Huang, E. S., Subbiah, S., Tsai, J., Levitt, M. 1996; 257 (3): 716-725


    There are several knowledge-based energy functions that can distinguish the native fold from a pool of grossly misfolded decoys for a given sequence of amino acids. These decoys, which are typically generated by mounting, or "threading", the sequence onto the backbones of unrelated protein structures, tend to be non-compact and quite different from the native structure: the root-mean-squared (RMS) deviations from the native are commonly in the range of 15 to 20 angstroms. Effective energy functions should also demonstrate a similar recognition capability when presented with compact decoys that depart only slightly in conformation from the correct structure (i.e. those with RMS deviations of approximately 5 angstroms or less). Recently, we developed a simple yet powerful method for native fold recognition based on the tendency for native folds to form hydrophobic cores. Our energy measure, which we call the hydrophobic fitness score, is challenged to recognize the native fold from 2000 near-native structures generated for each of five small monomeric proteins. First, 1000 conformations for each protein were generated by molecular dynamics simulation at room temperature. The average RMS deviation of this set of 5000 was 1.5 angstroms. A total of 323 decoys had energies lower than native; however, none of these had RMS deviations greater than 2 angstroms. Another 1000 structures were generated for each at high temperature, in which a greater range of conformational space was explored (4.3 angstroms RMS deviation). Out of this set, only seven decoys were misrecognized. The hydrophobic fitness energy of a conformation is strongly dependent upon the RMS deviation. On average our potential yields energy values which are lowest for the population of structures generated at room temperature, intermediate for those produced at high temperature and highest for those constructed by threading methods. In general, the lowest energy decoy conformations have backbones very close to native structure. The possible utility of our method for screening backbone candidates for the purpose of modelling by side-chain packing optimization is discussed.

    View details for Web of Science ID A1996UC77300021

    View details for PubMedID 8648635

  • Theory and simulation through the breach CURRENT OPINION IN STRUCTURAL BIOLOGY Levitt, M. 1996; 6 (2): 193-194
  • Through the breach. Current opinion in structural biology LEVITT 1996; 6 (2): 193-194

    View details for PubMedID 8794145

  • Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures. Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology Gerstein, M., Levitt, M. 1996; 4: 59-67


    We show how a basic pairwise alignment procedure can be improved to more accurately align conserved structural regions, by using variable, position-dependent gap penalties that depend on secondary structure and by taking the consensus of a number of suboptimal alignments. These improvements, which are novel for structural alignment, are direct analogs of what is possible with normal sequence alignment. They are feasible for us since our basic structural alignment procedure, unlike others, is so similar to normal sequence alignment. We further present preliminary results that show how our procedure can be generalized to produce a multiple alignment of a family of structures. Our approach is based on finding a "median" structure from doing all possible pairwise alignments and then aligning everything to it.
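
    The idea of position-dependent gap penalties can be illustrated with a minimal dynamic-programming sketch. This is not the authors' program: the function name `align`, the linear (non-affine) per-position penalties and the precomputed `score` matrix are all assumptions made for illustration, and the iterative superposition step and the consensus over suboptimal alignments are not shown.

```python
def align(score, gap_a, gap_b):
    """Global alignment of two chains A and B by dynamic programming.

    score[i][j] : similarity of position i in A and position j in B
                  (assumed to be precomputed, e.g. from a superposition).
    gap_a[i]    : penalty for leaving position i of A unmatched.
    gap_b[j]    : penalty for leaving position j of B unmatched.
    Returns (total score, list of matched (i, j) index pairs).
    """
    n, m = len(gap_a), len(gap_b)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] - gap_a[i - 1]
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] - gap_b[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # match i with j
                best[i - 1][j] - gap_a[i - 1],             # skip position in A
                best[i][j - 1] - gap_b[j - 1],             # skip position in B
            )
    # Trace back from the corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] - gap_a[i - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs
```

    Making the penalty small at loop positions and large inside helices or strands steers gaps away from conserved secondary structure, which is the effect the abstract describes.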

    View details for PubMedID 8877505

  • Packing as a structural basis of protein stability: Understanding mutant properties from wildtype structure 2nd Pacific Symposium on Biocomputing (PSB) Lee, C., Levitt, M. WORLD SCIENTIFIC PUBL CO PTE LTD. 1996: 245–255
  • Simulating the dynamics of the DNA double helix in solution NATO Advanced Study Institute / International-School-of-Biological-Magnetic-Resonances 2nd Course on Dynamics and the Problem of Recognition in Biological Macromolecules Hirshberg, M., Levitt, M. PLENUM PRESS DIV PLENUM PUBLISHING CORP. 1996: 173–191


    Central to the ab initio protein folding problem is the development of an energy function for which the correct native structure has a lower energy than all other conformations. Existing potentials of mean force typically rely extensively on database-derived contact frequencies or knowledge of three-dimensional structural information in order to be successful in the problem of recognizing the native fold for a given sequence from a set of decoy backbone conformations. Is the detailed statistical information or sophisticated analysis used by these knowledge-based potentials needed to achieve the observed degree of success in fold recognition? Here we introduce a novel pairwise energy function that enumerates contacts between hydrophobic residues while weighting their sum by the total number of residues surrounding these hydrophobic residues. Thus it effectively selects compact folds with the desired structural feature of a buried, intact core. This approach represents an advance over using pairwise terms whose energies of interaction are independent of the position in the protein and greatly improves the discrimination capability of an energy function. Our results show that 85% of a set of 195 representative native folds were recognized correctly. The 29 exceptions were lipophilic proteins, small proteins with prosthetic groups or disulfide bonds, and oligomeric proteins. Overall, our method separates the native fold from incorrect folds by a larger margin (measured in standard deviation units) than has been previously demonstrated by more sophisticated methods. The arrangement of hydrophobic and polar residues alone, as evaluated by our novel scoring scheme, is unexpectedly effective at recognizing native folds in general. It is surprising that a simple binary pattern of hydrophobic and polar residues apparently selects a given unique fold topology.
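
    A rough sketch of a score in this spirit is given below. It is only an illustration of "hydrophobic contacts weighted by burial": the residue set `HYDROPHOBIC`, the 7.0 distance cutoff, the normalization and the function name `contact_score` are assumptions, not the published functional form or parameters.

```python
import math

# Assumption: an illustrative choice of hydrophobic residue types.
HYDROPHOBIC = set("AVLIMFWC")


def contact_score(sequence, coords, cutoff=7.0):
    """Lower (more negative) values indicate a more buried hydrophobic core.

    sequence : one-letter amino acid codes, one per residue.
    coords   : one (x, y, z) point per residue (e.g. C-alpha positions).
    cutoff   : contact distance in the same units as coords (assumed value).
    """
    n = len(sequence)
    hydrophobic = [i for i in range(n) if sequence[i] in HYDROPHOBIC]
    if not hydrophobic:
        return 0.0
    hh_contacts = 0  # hydrophobic-hydrophobic contacts, each pair counted once
    neighbours = 0   # residues of any type surrounding hydrophobic residues
    for i in hydrophobic:
        for j in range(n):
            if abs(i - j) < 2:
                continue  # skip the residue itself and its chain neighbours
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbours += 1
                if j > i and sequence[j] in HYDROPHOBIC:
                    hh_contacts += 1
    # Weight the contact count by how crowded the hydrophobic residues are.
    return -hh_contacts * neighbours / len(hydrophobic)
```

    Because the contact count is multiplied by a burial term, an extended chain with few contacts scores near zero while a compact fold with a crowded hydrophobic core scores strongly negative.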

    View details for Web of Science ID A1995RY58400015

    View details for PubMedID 7563083



    We analyze the volume of atoms on the protein surface during a molecular-dynamics simulation of a small protein (pancreatic trypsin inhibitor). To calculate volumes, we use a particular geometric construction, called Voronoi polyhedra, that divides the total volume of the simulation box amongst the atoms, rendering them relatively larger or smaller depending on how tightly they are packed. We find that most of the atoms on the protein surface are larger than those buried in the core (by approximately 6%), except for the charged atoms, which decrease in size, presumably due to electroconstriction. We also find that water molecules are larger near apolar atoms on the protein surface and smaller near charged atoms, in comparison to "bulk" water molecules far from the protein. Taken together, these findings necessarily imply that apolar atoms on the protein surface and their associated water molecules are less tightly packed (than corresponding atoms in the protein core and bulk water) and the opposite is the case for charged atoms. This looser apolar packing and tighter charged packing fundamentally reflects protein-water distances that are larger or smaller than those expected from van der Waals radii. In addition to the calculation of mean volumes, simulations allow us to investigate the volume fluctuations and hence compressibilities of the protein and solvent atoms. The relatively large volume fluctuations of atoms at the protein-water interface indicate that they have a more variable packing than corresponding atoms in the protein core or in bulk water. We try to adhere to traditional conventions throughout our calculations. Nevertheless, we are aware of and discuss three complexities that significantly qualify our calculations: the positioning of the dividing plane between atoms, the problem of vertex error, and the choice of atom radii. In particular, our results highlight how poor a "compromise" the commonly accepted value of 1.4 Å is for the radius of a water molecule.

    View details for Web of Science ID A1995RF04700011

    View details for PubMedID 7540695



    The prediction of protein structure depends on the quality of the models used. In this paper, we examine the relationship between the complexity and accuracy of representation of various models of protein alpha-carbon backbone structure. First, we develop an efficient algorithm for the near optimal fitting of arbitrary lattice and off-lattice models of polypeptide chains to their true X-ray structures. Using this, we show that the relationship between the complexity of a model, taken as the number of possible conformational states per residue, and the simplest measure of accuracy, the root-mean-square deviation from the X-ray structure, is approximately (Accuracy) ∝ (Complexity)^(-1/2). This relationship is insensitive to the particularities of individual models, i.e. lattice and off-lattice models of the same complexity tend to have similar average root-mean-square deviations, and this also implies that improvements in model accuracy with increasing complexity are very small. However, other measures of model accuracy, such as the preservation of X-ray residue-residue contacts and the alpha-helix, do distinguish among models. In addition, we show that low complexity models, which take into account the uneven distribution of residue conformations in real proteins, can represent X-ray structures as accurately as more complex models, which do not: a selected 6-state model can represent protein structures almost as accurately (1.7 Å root-mean-square) as a 17-state lattice model (1.6 Å root-mean-square). Finally, we use a novel optimization procedure to generate eight 4-state models, which fit native proteins to an average of 2.4 Å, and preserve 85% of native residue-residue contacts. We discuss the implications of these findings for protein folding and the prediction of protein conformation.
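
    The inverse-square-root relationship can be made concrete with a small calculation. The helper `rms_scale_factor` is hypothetical, and the worked number assumes, purely for illustration, that the 17-state lattice model lies exactly on the trend line.

```python
def rms_scale_factor(states_before, states_after):
    """Factor by which the RMS deviation changes when the number of states
    per residue changes, assuming rms is proportional to states ** -0.5."""
    return (states_after / states_before) ** -0.5


# Quadrupling the number of states per residue only halves the RMS deviation.
halving = rms_scale_factor(1, 4)

# Illustrative assumption: if the 17-state lattice model (1.6 angstroms)
# sits on the trend, a generic 6-state model on the same trend would be
# expected near 2.7 angstroms.
generic_six_state = 1.6 * rms_scale_factor(17, 6)
```

    Under that assumption, the 1.7 Å reported for the selected 6-state model is well below the roughly 2.7 Å expected of a generic model with the same number of states, which is the sense in which exploiting the uneven distribution of residue conformations pays off.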

    View details for Web of Science ID A1995RB61600021

    View details for PubMedID 7783205



    Cholera is a widespread disease for which there is no efficient vaccine. A better understanding of the conformational rearrangements at the epitope might be very helpful for the development of a good vaccine. Cholera toxin (CT) as well as the closely related heat-labile toxin from Escherichia coli (LT) are composed of two subunits, A and B, which form an oligomeric assembly AB5. Residues 50-64 on the surface of the B subunits comprise a conserved loop (CTP3), which is involved in saccharide binding to the receptor on epithelial cells. This loop exhibits remarkable conformational plasticity induced by environmental constraints. The crystal structure of this loop is compared in the free and receptor-bound toxins as well as in the crystal and solution structures of a complex with TE33, a monoclonal antibody elicited against CTP3. In the toxins this loop forms an irregular structure connecting a beta-strand to the central alpha-helix. Ser 55 and Gln 56 exhibit considerable conformational variability in the five subunits of the unliganded toxins. Saccharide binding induces a change primarily in Ser 55 and Gln 56 to a conformation identical in all five copies. Thus, saccharide binding confers rigidity upon the loop. The conformation of CTP3 in complex with TE33 is quite different. The amino-terminal part of CTP3 forms a beta-turn that fits snugly into a deep binding pocket on TE33, in both the crystal and NMR-derived solution structure. Only 8 and 12 residues out of 15 are seen in the NMR and crystal structures, respectively. Despite these conformational differences, TE33 is cross-reactive with intact CT, albeit with a thousandfold decrease in affinity. This suggests a different interaction of TE33 with intact CT.

    View details for Web of Science ID A1995QW98100003

    View details for PubMedID 7545048



    We present a low resolution lattice model for which we can exhaustively generate all possible compact backbone conformations for small proteins. Using simple structural and energetic criteria, for a variety of proteins, we can select for lattice structures that have significant similarities with their known native structures. Our energetic parameters are based on pairwise amino acid contact frequencies in a database of experimentally determined structures. A key step in our method involves the threading of a sequence onto every lattice model, such that a locally optimal pattern of tertiary interactions is formed. We evaluate our results against statistics collected for structures covering all of conformational space, and against statistics collected for permuted sequences. Despite the low resolution of the model, our low energy structures contain many native features. These results indicate that the overall pattern of hydrophobicity of a sequence significantly constrains the range of folds that sequence is likely to adopt.

    View details for Web of Science ID A1994PQ66300012

    View details for PubMedID 7966290



    We report an interesting case of structural similarity between 2 small, nonhomologous proteins, the third domain of ovomucoid (ovomucoid) and the C-terminal fragment of ribosomal L7/L12 protein (CTF). The region of similarity consists of a 3-stranded beta-sheet and an alpha-helix. This region is highly similar; the corresponding elements of secondary structure share a common topology, and the RMS difference for "equivalent" C alpha atoms is 1.6 Å. Surprisingly, this common structure arises from completely different sequences. For the common core, the sequence identity is less than 3%, and there is neither significant sequence similarity nor similarity in the position or orientation of conserved hydrophobic residues. This superposition raises the question of how 2 entirely different sequences can produce an identical structure. Analyzing this common region in ovomucoid revealed that it is stabilized by disulfide bonds. In contrast, the corresponding structure in CTF is stabilized in the alpha-helix by a composition of residues with high helix-forming propensities. This result suggests that different sequences and different stabilizing interactions can produce an identical structure.

    View details for Web of Science ID A1994PZ82400005

    View details for PubMedID 7703840

  • WATER - NOW YOU SEE IT, NOW YOU DON'T STRUCTURE Levitt, M., Park, B. H. 1993; 1 (4): 223-226

    View details for Web of Science ID A1993NC28000002

    View details for PubMedID 8081736



    Herein we describe the results of molecular dynamics simulations of the bovine pancreatic trypsin inhibitor (BPTI) in solution at a variety of temperatures both with and without disulfide bonds. The reduced form of the protein unfolded at high temperature to an ensemble of conformations with all the properties of the molten globule state. In this account we outline the structural details of the actual unfolding process between the native and molten globule states. The first steps of unfolding involved expansion of the protein, which disrupted packing interactions. The solvent-accessible surface area also quickly increased. The unfolding was localized mostly to the turn and loop regions of the molecule, while leaving the secondary structure intact. Then, there was more gradual unfolding of the secondary structure and non-native turns became prevalent. This same trajectory was continued and more drastic unfolding occurred that resulted in a relatively compact state devoid of stable secondary structure.

    View details for Web of Science ID A1993LQ98500023

    View details for PubMedID 7688428



    In recent years, the determination of large numbers of protein structures has created a need for automatic and objective methods for the comparison of structures or conformations. Many protein structures show similarities of conformation that are undetectable by comparing their sequences. Comparison of structures can reveal similarities between proteins thought to be unrelated, providing new insight into the interrelationships of sequence, structure and function. Using a new tool that we have developed to perform rapid structural alignment, we present the highlights of an exhaustive comparison of all pairs of protein structures in the Brookhaven protein database. Notably, we find that the DNA-binding domain of the bacteriophage repressor family is almost completely embedded in the larger eight-helix fold of the globin family of proteins. The significant match of specific residues is correlated with functional, structural and evolutionary information. Our method can help to identify structurally similar folds rapidly and with high sensitivity, providing a powerful tool for analyzing the ever-increasing number of protein structures being elucidated.

    View details for Web of Science ID A1993LL95500002

    View details for PubMedID 15335781


    View details for Web of Science ID A1993LH45400014

    View details for PubMedID 8347994



    Intramolecular interactions in bound cholera toxin peptide (CTP3) in three antibody complexes were studied by two-dimensional transferred NOE spectroscopy. These measurements together with previously recorded spectra that show intermolecular interactions in these complexes were used to obtain restraints on interproton distances in two of these complexes (TE32 and TE33). The NMR-derived distance restraints were used to dock the peptide into calculated models for the three-dimensional structure of the antibody combining site. It was found that TE32 and TE33 recognize a loop comprising the sequence VPGSQHID and a beta-turn formed by the sequence VPGS. The third antibody, TE34, recognizes a different epitope within the same peptide and a beta-turn formed by the sequence IDSQ. Neither of these two turns was observed in the free peptide. The formation of a beta-turn in the bound peptide gives a compact conformation that maximizes the contact with the antibody and that has greater conformational freedom than alpha-helix or beta-sheet secondary structure. A total of 15 antibody residues are involved in peptide contacts in the TE33 complex, and 73% of the contact area in the antibody combining site consists of the side chains of aromatic amino acids. A comparison of the NMR-derived models for CTP3 interacting with TE32 and TE33 with the previously derived model for TE34 reveals a relationship between amino acid sequence and combining site structure and function. (a) The three aromatic residues that interact with the peptide in TE32 and TE33 complexes, Tyr 32L, Tyr 32H, and Trp 50H, are invariant in all light chains sharing at least 65% identity with TE33 and TE32 and in all heavy chains sharing at least 75% identity with TE33. Although TE34 differs from TE32 and TE33 in its fine specificity, these aromatic residues are conserved in TE34 and interact with its antigen. Therefore, we conclude that the role of these three aromatic residues is to participate in nonspecific hydrophobic interactions with the antigen. (b) Residues 31, 31c, and 31e of CDR1 of the light chain interact with the antigen in all three antibodies that we have studied. The amino acids in these positions in TE34 differ from those in TE32 and TE33, and they are involved in specific polar interactions with the antigen. (c) CDR3 of the heavy chain varies considerably both in length and in sequence between TE34 and the two other anti-CTP3 antibodies. These changes modify the shape of the combining site and the hydrophobic and polar interactions of CDR3 with the peptide antigen.

    View details for Web of Science ID A1992JG66700004

    View details for PubMedID 1379072



    Segment match modeling uses a data base of highly refined known protein X-ray structures to build an unknown target structure from its amino acid sequence and the atomic coordinates of a few of its atoms (generally only the C alpha atoms). The target structure is first broken into a set of short segments. The data base is then searched for matching segments, which are fitted onto the framework of the target structure. Three criteria are used for choosing a matching data base segment: amino acid sequence similarity, conformational similarity (atomic co-ordinates), and compatibility with the target structure (van der Waals' interactions). The new method works surprisingly well: for eight test proteins ranging in size from 46 to 323 residues, the all-atom root-mean-square deviation of the modeled structures is between 0.93 Å and 1.73 Å (the average is 1.26 Å). Deviations of this magnitude are comparable with those found for protein co-ordinates before and after refinement against X-ray data or for co-ordinates of the same protein in different crystal packings. These results are insensitive to errors in the C alpha positions or to missing C alpha atoms: accurate models can be built with C alpha errors of up to 1 Å or by using only half the C alpha atoms. The fit to the X-ray structures is improved significantly by building several independent models based on different random choices and then averaging co-ordinates; this novel concept has general implications for other modeling tasks. The segment match modeling method is fully automatic, yields a complete set of atomic co-ordinates without any human intervention and is efficient (14 s/residue on the Silicon Graphics 4D/25 Personal Iris workstation).

    View details for Web of Science ID A1992JF96600020

    View details for PubMedID 1640463



    It is generally accepted that a protein's primary sequence determines its three-dimensional structure. It has proved difficult, however, to obtain detailed structural information about the actual protein folding process and intermediate states. We present the results of molecular dynamics simulations of the unfolding of reduced bovine pancreatic trypsin inhibitor. The resulting partially "denatured" state was compact but expanded relative to the native state (11-25%); the expansion was not caused by an influx of water molecules. The structures were mobile, with overall secondary structure contents comparable to those of the native protein. The protein experienced relatively local unfolding, with the largest changes in the structure occurring in the loop regions. A hydrophobic core was maintained although packing of the side chains was compromised. The properties displayed in the simulation are consistent with unfolding to a molten globule state. Our simulations provide an in-depth view of this state and details of water-protein interactions that cannot yet be obtained experimentally.

    View details for Web of Science ID A1992HX16800075

    View details for PubMedID 1594623



    The prediction of the folded structure of a protein from its sequence has proven to be a very difficult computational problem. We have developed an exceptionally simple representation of a polypeptide chain, with which we can enumerate all possible backbone conformations of small proteins. A protein is represented by a self-avoiding path of connected vertices on a tetrahedral lattice, with several amino acid residues assigned to each lattice vertex. For five small structurally dissimilar proteins, we find that we can separate native-like structures from the vast majority of non-native folds by using only simple structural and energetic criteria. This method demonstrates significant generality and predictive power without requiring foreknowledge of any native structural details.
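
    To give a feel for exhaustive enumeration on a tetrahedral lattice, the sketch below counts self-avoiding paths on a diamond lattice by depth-first search. It is a simplified illustration only: the coordinate convention, the function name `count_walks` and the absence of any compactness or energy filter are assumptions, and it does not reproduce the residue-to-vertex assignment described in the abstract.

```python
# Bond vectors leaving a site on the "even" sublattice of a diamond lattice;
# sites on the "odd" sublattice use the negated vectors.
EVEN_BONDS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def count_walks(steps):
    """Count self-avoiding walks of `steps` bonds starting at the origin."""

    def extend(site, visited, remaining, on_even):
        if remaining == 0:
            return 1
        sign = 1 if on_even else -1
        total = 0
        for dx, dy, dz in EVEN_BONDS:
            nxt = (site[0] + sign * dx, site[1] + sign * dy, site[2] + sign * dz)
            if nxt in visited:
                continue  # the chain may not revisit a vertex
            visited.add(nxt)
            total += extend(nxt, visited, remaining - 1, not on_even)
            visited.remove(nxt)
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, steps, True)
```

    The count grows roughly threefold with every added bond, which is why a coarse representation with several residues per vertex is needed to keep exhaustive enumeration tractable for whole proteins.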

    View details for Web of Science ID A1992HL81600006

    View details for PubMedID 1557356



    An understanding of the structural transitions that an alpha-helix undergoes will help to elucidate such motions in proteins and their role in protein folding. We present the results of molecular dynamics simulations to investigate these transitions in a short polyalanine peptide (13 residues) both in vacuo and in the presence of solvent. The denaturation of this peptide was monitored as a function of temperature (ranging from 5 to 200 degrees C). In vacuo, the helical state predominated at all temperatures, whereas in solution the helix melted with increasing temperature. The peptide was predominantly helical at low temperature in solution, while at intermediate temperatures the peptide spent the bulk of the time fluctuating between different conformations with intermediate amounts of helix, i.e. neither completely helical nor entirely non-helical. Many of these conformations consisted of short helical segments with intervening non-helical residues. At high temperature the peptide unfolded and adopted various collapsed unstructured states. The intrahelical hydrogen bonds that break at high temperature were not fully compensated by hydrogen bonds with water molecules in the partially unfolded forms of the peptide. Increases in temperature disrupted both the helical structure and the peptide-water interactions. Water played a major but indirect role in facilitating unfolding, as opposed to specifically competing for the intrapeptide hydrogen bonds. The implications of our results for protein folding are discussed.

    View details for Web of Science ID A1992HG60100023

    View details for PubMedID 1538392



    Nuclear magnetic resonance has been used to study the structure of the anti-spin label antibody AN02 combining site and kinetic rates for the hapten-antibody reaction. The association reaction for the hapten dinitrophenyl-diglycine (DNP-diGly) is diffusion-limited. The activation enthalpy for association, 5.1 kcal/mol, is close to the activation enthalpy for diffusion in water. Several reliable resonance assignments have been made with the aid of the recently reported crystal structure. Structural data deduced from the nuclear magnetic resonance (n.m.r.) spectra compare favorably with the crystal structure in terms of the combining site amino acid composition, distances of tyrosine residues from the unpaired electron of the hapten, and residues in direct contact with the hapten. Evidence is presented that a single binding site region tyrosine residue can assume two distinct conformations on binding of DNP-diGly. The AN02 antibody is an autoantibody. Dimerization of the Fab fragments is blocked by the hapten DNP-diGly. The n.m.r. spectra suggest that some of the amino acid residues involved in the binding of the DNP-hapten are also involved in the Fab dimerization.

    View details for Web of Science ID A1991GG37200025

    View details for PubMedID 1920409



    Theoretical prediction of the structure, stability and activity of proteins, an important unsolved problem in molecular biology, would be of use for guiding site-directed mutagenesis and other protein-engineering techniques. X-ray diffraction studies have provided extensive structural information for many proteins, challenging theorists to develop reliable techniques able to use such knowledge as a base for prediction of mutants' characteristics. Here we report theoretical calculation of stabilization energies for 78 triple-site sequence variants of lambda repressor characterized experimentally by Lim and Sauer. The calculated energies correlate with the mutants' measured activities; active and inactive mutations are discriminated with 92% reliability. They correlate even more directly with the mutants' thermostabilities, correctly identifying two of the mutants to be more stable than the wild type.

    View details for Web of Science ID A1991FZ34600071

    View details for PubMedID 1861725



    Molecular dynamics simulations of atomic motion in protein and nucleic acid molecules must be done on a femtosecond time-scale. Much of this rapid motion is unimportant for the slower changes that are most relevant to biological function (conformational changes, substrate binding, protein folding). The high-frequency motion makes simulations computationally expensive. More importantly, the high frequencies obscure visualization of the relevant dynamic processes. Sessions, Dauber-Osguthorpe and Osguthorpe presented a method for removing high-frequency motions from atomic co-ordinates of trajectories generated by simulation. While that study used fast Fourier methods and emphasized the use of filtering for analysis of trajectories, this communication describes a new method that makes it much easier to use frequency filtering in programs that display trajectories as a sequence of moving images. Tests of the method on systems extending from pure water to proteins and nucleic acid molecules in vacuo and in solution have demonstrated its general utility. Impressed with the power and simplicity of the new method, we wish to present it in sufficient detail to allow others to implement it themselves.
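
    The abstract does not spell out the filter itself, so the sketch below uses a plain running mean over time as a stand-in to show what removing high-frequency motion from a trajectory looks like. The function name `smooth_trajectory`, the flat list-of-frames layout and the truncated window at the ends are assumptions, and this generic low-pass filter should not be read as the published method.

```python
def smooth_trajectory(frames, half_window):
    """Running-mean low-pass filter applied along the time axis.

    frames      : list of frames; each frame is a flat list of coordinates.
    half_window : number of frames averaged on each side of the current one
                  (the window is truncated at the start and end).
    Returns a new list of frames with the same shape.
    """
    n = len(frames)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        width = hi - lo
        smoothed.append([
            sum(frames[s][k] for s in range(lo, hi)) / width
            for k in range(len(frames[t]))
        ])
    return smoothed
```

    A wider window suppresses faster motions; a coordinate that oscillates from frame to frame is damped towards its mean, while slow drifts pass through almost unchanged.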

    View details for Web of Science ID A1991FW10200001

    View details for PubMedID 2067008

  • ENHANCED STABILITY OF SUBTILISIN BY 3 POINT MUTATIONS BIOTECHNOLOGY AND APPLIED BIOCHEMISTRY Narhi, L. O., Stabinsky, Y., Levitt, M., Miller, L., Sachdev, R., Finley, S., Park, S., Kolvenbach, C., Arakawa, T., Zukowski, M. 1991; 13 (1): 12-24


    This study was undertaken to characterize the effect of three point mutations made on aprA-subtilisin on the stability of the protein to both heat- and detergent-induced denaturation. Asparagine residues at positions 109 and 218 were replaced with serine residues to prevent the possible cyclization between these asparagines and the adjacent glycine residues and hence to increase the long-term stability. The effect of these substitutions on conformational stability was examined by thermal denaturation. At high calcium concentrations, the Ser109-substituted analog showed a 3 degrees C higher transition temperature than that of aprA-subtilisin, while the Ser218 substituted analog had a 4 degrees C higher transition temperature. The analog with both changes had a 7 degrees C higher transition temperature than that of the original aprA-subtilisin, indicating that the contributions of the individual mutations were additive. The analog with both mutations also exhibited increased stability in the presence of sodium dodecyl sulfate (SDS) when compared to aprA-subtilisin. In addition to the above two mutations, the asparagine at position 76, located in the high affinity Ca(2+) binding loop of subtilisin, was changed to aspartic acid. The effect of this mutation on the thermal stability of the protein was examined at different calcium concentrations. The analog with all three mutations exhibited little dependence on calcium concentration below 1 mM levels, while the proteins without the mutation at asparagine-76 displayed a strong dependence of melting temperature on Ca(2+) concentration in this range. At much higher calcium concentrations, the analog with three mutations showed an increase in stability similar to that observed with aprA-subtilisin. The analog with three mutations also exhibited greater stability to SDS-induced denaturation than both aprA-subtilisin and the Ser109- and Ser218-substituted analogs. 
The activation energy barrier for loss of structure in 1% SDS for the analog with all three mutations was increased over that for aprA-subtilisin by 16 kcal/mol. These results suggest that the mutation of asparagine-76 to aspartic acid increases the affinity of the primary Ca(2+) binding site.

    View details for Web of Science ID A1991EW55900002

    View details for PubMedID 2054102

  • Protein Folding Curr. Opinions Struct. Biol. Levitt M 1991; 1: 224-229
  • NMR-DERIVED MODEL FOR A PEPTIDE-ANTIBODY COMPLEX BIOCHEMISTRY Zilber, B., Scherf, T., Levitt, M., Anglister, J. 1990; 29 (43): 10032-10041


    The TE34 monoclonal antibody against cholera toxin peptide 3 (CTP3; VEVPGSQHIDSQKKA) was sequenced and investigated by two-dimensional transferred NOE difference spectroscopy and molecular modeling. The VH sequence of TE34, which does not bind cholera toxin, shares remarkable homology to that of TE32 and TE33, which are both anti-CTP3 antibodies that bind the toxin. However, due to a shortened heavy chain CDR3, TE34 assumes a radically different combining site structure. The assignment of the combining site interactions to specific peptide residues was completed by use of AcIDSQRKA, a truncated peptide analogue in which lysine-13 was substituted by arginine, specific deuteration of individual polypeptide chains of the antibody, and a computer model for the Fv fragment of TE34. NMR-derived distance restraints were then applied to the calculated model of the Fv to generate a three-dimensional structure of the TE34/CTP3 complex. The combining site was found to be a very hydrophobic cavity composed of seven aromatic residues. Charged residues are found in the periphery of the combining site. The peptide residues HIDSQKKA form a beta-turn inside the combining site. The contact area between the peptide and the TE34 antibody is 388 Å², about half of the contact area observed in protein-antibody complexes.

    View details for Web of Science ID A1990EG39900004

    View details for PubMedID 2271636

  • CONFORMATIONS OF IMMUNOGLOBULIN HYPERVARIABLE REGIONS NATURE Chothia, C., Lesk, A. M., Tramontano, A., Levitt, M., SMITHGILL, S. J., Air, G., Sheriff, S., Padlan, E. A., Davies, D., TULIP, W. R., Colman, P. M., Spinelli, S., Alzari, P. M., Poljak, R. J. 1989; 342 (6252): 877-883


    On the basis of comparative studies of known antibody structures and sequences it has been argued that there is a small repertoire of main-chain conformations for at least five of the six hypervariable regions of antibodies, and that the particular conformation adopted is determined by a few key conserved residues. These hypotheses are now supported by reasonably successful predictions of the structures of most hypervariable regions of various antibodies, as revealed by comparison with their subsequently determined structures.

    View details for Web of Science ID A1989CF63700048

    View details for PubMedID 2687698

  • A HUMANIZED ANTIBODY THAT BINDS TO THE INTERLEUKIN-2 RECEPTOR PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Queen, C., Schneider, W. P., Selick, H. E., PAYNE, P. W., Landolfi, N. F., Duncan, J. F., AVDALOVIC, N. M., Levitt, M., Junghans, R. P., Waldmann, T. A. 1989; 86 (24): 10029-10033


    The anti-Tac monoclonal antibody is known to bind to the p55 chain of the human interleukin 2 receptor and to inhibit proliferation of T cells by blocking interleukin 2 binding. However, use of anti-Tac as an immunosuppressant drug would be impaired by the human immune response against this murine antibody. We have therefore constructed a "humanized" antibody by combining the complementarity-determining regions (CDRs) of the anti-Tac antibody with human framework and constant regions. The human framework regions were chosen to maximize homology with the anti-Tac antibody sequence. In addition, a computer model of murine anti-Tac was used to identify several amino acids which, while outside the CDRs, are likely to interact with the CDRs or antigen. These mouse amino acids were also retained in the humanized antibody. The humanized anti-Tac antibody has an affinity for p55 of 3 × 10^9 M^-1, about 1/3 that of murine anti-Tac.

    View details for Web of Science ID A1989CE97600082

    View details for PubMedID 2513570



    The interactions between the aromatic amino acids of two monoclonal antibodies (TE32 and TE33) with specific amino acid residues of a peptide of cholera toxin (CTP3) have been determined by two-dimensional (2D) transferred NOE difference spectroscopy. Aromatic amino acids are found to play an important role in peptide binding. In both antibodies two tryptophan and two tyrosine residues and one histidine residue interact with the peptide. In TE33 there is an additional phenylalanine residue that also interacts with the peptide. The residues of the CTP3 peptide that have been found to interact with the antibody are val 3, pro 4, gly 5, gln 7, his 8, and asp 10. We have determined the amino acid sequences of the two antibodies by direct mRNA sequencing. Computerized molecular modeling has been used to build detailed all-atom models of both antibodies from the known conformations of other antibodies. These models allow unambiguous assignment of most of the antibody residues that interact with the peptide. A comparison of the amino acid sequences of the two anti-CTP3 antibodies with other antibodies from the same gene family reveals that the majority of the aromatic residues involved in the binding of CTP3 are conserved although these antibodies have different specificities. This similarity suggests that these aromatic residues create a general hydrophobic pocket and that other residues in the complementarity-determining regions (CDRs) modulate the shape and the polarity of the combining site to fit the specific antigens.

    View details for Web of Science ID A1989AQ25000006

    View details for PubMedID 2819059



    Four different disulfide bridges (linking positions 9-164, 21-142, 90-122, and 127-154) were introduced into a cysteine-free phage T4 lysozyme at sites suggested by theoretical calculations and computer modeling. The new cysteines spontaneously formed disulfide bonds on exposure to air in vitro. In all cases the oxidized (crosslinked) lysozyme was more stable than the corresponding reduced (noncrosslinked) enzyme toward thermal denaturation. Relative to wild-type lysozyme, the melting temperatures of the 9-164 and 21-142 disulfide mutants were increased by 6.4 degrees C and 11.0 degrees C, whereas the other two mutants were either less stable or equally stable. Measurement of the equilibrium constants for the reduction of the engineered disulfide bonds by dithiothreitol indicates that the less thermostable mutants tend to have a less favorable crosslink in the native structure. The two disulfide bridges that are most effective in increasing the stability of T4 lysozyme have, in common, a large loop size and a location that includes a flexible part of the molecule. The results suggest that stabilization due to the effect of the crosslink on the entropy of the unfolded polypeptide is offset by the strain energy associated with formation of the disulfide bond in the folded protein. The design of disulfide bridges is discussed in terms of protein flexibility.

    View details for Web of Science ID A1989AP19100026

    View details for PubMedID 2671995



    Simulation of the molecular dynamics of a small protein, bovine pancreatic trypsin inhibitor, was found to be more realistic when water molecules were included than when in vacuo: the time-averaged structure was much more like that observed in high-resolution x-ray studies, the amplitudes of atomic vibration in solution were smaller, and fewer incorrect hydrogen bonds were formed. Our approach, which provides a sound basis for reliable simulation of diverse properties of biological macromolecules in solution, uses atom-centered forces and classical mechanics.

    View details for Web of Science ID A1988Q580700030

    View details for PubMedID 2459709



    Simple energy calculations show that there is a significant interaction between a hydrogen bond donor (like the >NH group) and the centre of a benzene ring, which acts as a hydrogen bond acceptor. This interaction, which is about half as strong as a normal hydrogen bond, contributes approximately 3 kcal/mol (1 cal = 4.184 J) of stabilizing enthalpy and is expected to play a significant role in molecular associations. It is of interest that the aromatic hydrogen bond arises from small partial charges centred on the ring carbon and hydrogen atoms: there is no need to consider delocalized electrons. Although some energy calculations have included such partial charges, their role in forming such a strong interaction was not appreciated until after aromatic hydrogen bonds had been observed in protein-drug complexes.

    View details for Web of Science ID A1988N947200007

    View details for PubMedID 3172202
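
    The abstract above quotes the stabilizing enthalpy in kcal/mol and gives the conversion factor 1 cal = 4.184 J. As a small worked example, the snippet below converts the quoted value of approximately 3 kcal/mol to kJ/mol. The function name `kcal_to_kj` is a hypothetical helper introduced only for this illustration.

```python
# Conversion factor stated in the abstract: 1 cal = 4.184 J,
# so 1 kcal/mol = 4.184 kJ/mol.
JOULES_PER_CALORIE = 4.184


def kcal_to_kj(energy_kcal_per_mol):
    """Convert an energy from kcal/mol to kJ/mol (hypothetical helper)."""
    return energy_kcal_per_mol * JOULES_PER_CALORIE


# Approximate aromatic hydrogen bond enthalpy quoted in the abstract.
aromatic_hbond_kj = kcal_to_kj(3.0)
print(f"{aromatic_hbond_kj:.2f} kJ/mol")
```

    The result is roughly 12.6 kJ/mol, which is the same approximate figure expressed in SI-based units.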

  • CONTRIBUTION OF TRYPTOPHAN RESIDUES TO THE COMBINING SITE OF A MONOCLONAL ANTI DINITROPHENYL SPIN-LABEL ANTIBODY BIOCHEMISTRY Anglister, J., Bond, M. W., Frey, T., Leahy, D., Levitt, M., McConnell, H. M., Rule, G. S., Tomasello, J., Whittaker, M. 1987; 26 (19): 6058-6064


    Two Fab fragments of the monoclonal anti dinitrophenyl (DNP) spin-label antibody AN02 were prepared by recombination of specifically deuterated heavy and light chains. In the recombinant H(I)L(II) all the tyrosines and phenylalanines were perdeuterated as were the tryptophan residues of the heavy chain. In the recombinant H(II)L(I) all the tyrosines and phenylalanines were perdeuterated as were the tryptophan residues of the light chain. Saturation of three resonances of H(I)L(II), assigned to tryptophan protons of the light chain, resulted in magnetization transfer to the aromatic proton at position 6 of the DNP ring and to the CH2 protons of the glycines linked to the DNP in a diamagnetic hapten (DNP-DG). Saturation of three resonances of H(II)L(I) assigned to tryptophan protons of the heavy chain resulted in magnetization transfer to the CH2 protons of the glycines in DNP-DG. From the dependence of the magnetization transfer on the irradiation time, the cross relaxation rates between the involved protons were estimated. The inferred distances between these protons of the hapten and certain tryptophan protons are 3-4 Å. It is concluded that in the combining site of AN02 there is one tryptophan from the light chain and one tryptophan from the heavy chain that are very near the hapten. When all tyrosines and phenylalanines were perdeuterated and all tryptophan aromatic protons were deuterated except for the protons at positions 2 and 5, titration of the Fab fragments with variable amounts of paramagnetic hapten showed that one proton from the light chain tryptophan is near (less than 7 Å) the unpaired electron and that three other protons are significantly closer than 15 Å. (ABSTRACT TRUNCATED AT 250 WORDS)

    View details for Web of Science ID A1987K210700017

    View details for PubMedID 3120771

  • THE PREDICTED STRUCTURE OF IMMUNOGLOBULIN-D1.3 AND ITS COMPARISON WITH THE CRYSTAL-STRUCTURE SCIENCE Chothia, C., Lesk, A. M., Levitt, M., AMIT, A. G., Mariuzza, R. A., Phillips, S. E., Poljak, R. J. 1986; 233 (4765): 755-758


    Predictions of the structures of the antigen-binding domains of an antibody, recorded before its experimental structure determination and tested subsequently, were based on comparative analysis of known antibody structures or on conformational energy calculations. The framework, the relative positions of the hypervariable regions, and the folds of four of the hypervariable loops were predicted correctly. This portion includes all residues in contact with the antigen, in this case hen egg white lysozyme, implying that the main chain conformation of the antibody combining site does not change upon ligation. The conformations of three residues in each of the other two hypervariable loops are different in the predicted models and the experimental structure.

    View details for Web of Science ID A1986D523400028

    View details for PubMedID 3090684

  • HELIX TO HELIX PACKING IN PROTEINS JOURNAL OF MOLECULAR BIOLOGY Chothia, C., Levitt, M., Richardson, D. 1981; 145 (1): 215-250

    View details for Web of Science ID A1981KX76800010

    View details for PubMedID 7265198

  • PERIODICITY OF DEOXYRIBONUCLEASE-I DIGESTION OF CHROMATIN SCIENCE Prunell, A., Kornberg, R. D., Lutter, L., Klug, A., Levitt, M., CRICK, F. H. 1979; 204 (4395): 855-858


    Two methods have been used to measure the single-strand lengths of the DNA fragments produced by deoxyribonuclease I digestion of chromatin. The average lengths obtained are multiples of about 10.4 bases, significantly different from the value of 10 previously reported. This periodicity in fragment lengths is closely related to the periodicity of the DNA double helix in chromatin, but the two values need not be exactly the same.

    View details for Web of Science ID A1979GV41300034

    View details for PubMedID 441739


    View details for Web of Science ID A1978FY32300007

    View details for PubMedID 731698



    Simple models are presented that describe the rules for almost all the packing that occurs between and among alpha-helices and pleated sheets. These packing rules, together with the primary and secondary structures, are the major determinants of the three-dimensional structure of proteins.

    View details for Web of Science ID A1977DZ33900005

    View details for PubMedID 270659

  • STRUCTURE OF NUCLEOSOME CORE PARTICLES OF CHROMATIN NATURE Finch, J. T., Lutter, L. C., Rhodes, D., Brown, R. S., Rushton, B., Levitt, M., Klug, A. 1977; 269 (5623): 29-36


    Crystals have been obtained of nucleosome cores and analysed by X-ray diffraction and electron microscopy. The core is a flat particle of dimensions about 110 × 110 × 57 Å, somewhat wedge shaped, and strongly divided into two 'layers', consistent with the DNA being wound into about 1 3/4 turns of a flat superhelix of pitch about 28 Å. The organisation of the DNA can be correlated with the results of enzyme digestion studies. A change in the screw of the DNA double helix on nucleosome formation can be deduced.

    View details for Web of Science ID A1977DS90100026

    View details for PubMedID 895884

  • STRUCTURAL PATTERNS IN GLOBULAR PROTEINS NATURE Levitt, M., Chothia, C. 1976; 261 (5561): 552-558


    A simple diagrammatic representation has been used to show the arrangement of alpha helices and beta sheets in 31 globular proteins, which are classified into four clearly separated classes. The observed arrangements are significantly non-random in that pieces of secondary structure adjacent in sequence along the polypeptide chain are also often in contact in three dimensions.

    View details for Web of Science ID A1976BU52600022

    View details for PubMedID 934293