BMC Bioinformatics. 2017 Jun 14;18(1):302. doi: 10.1186/s12859-017-1702-0.

3D deep convolutional neural networks for amino acid environment similarity analysis.

Author information

1
Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.
2
Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA. russ.altman@stanford.edu.
3
Department of Genetics, Stanford University, Stanford, CA, 94305, USA. russ.altman@stanford.edu.

Abstract

BACKGROUND:

Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures.
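The idea of feeding a "raw atom distribution" to a 3D convolutional network can be illustrated with a minimal voxelization sketch. The code below is only an illustration, not the authors' implementation: the function name `voxelize`, the element channels (C, N, O, S), the 20 Å box, and the 1 Å voxel size are all assumptions chosen for the example, since the abstract does not specify them.

```python
from collections import defaultdict

# Assumed element channels; the abstract does not list the channels used.
CHANNELS = ("C", "N", "O", "S")


def voxelize(atoms, center, box_size=20.0, voxel_size=1.0):
    """Bin atoms into a sparse, multi-channel occupancy grid.

    atoms:  iterable of (element, x, y, z) tuples, coordinates in angstroms
    center: (x, y, z) of the microenvironment center
    Returns (grid, n), where grid maps (channel, i, j, k) -> atom count
    and n is the number of voxels along each side of the box.
    """
    n = int(round(box_size / voxel_size))
    half = box_size / 2.0
    grid = defaultdict(int)
    for element, x, y, z in atoms:
        if element not in CHANNELS:
            continue  # skip elements without a channel (e.g. hydrogen)
        # Shift coordinates so the box spans [0, box_size) on each axis.
        idx = [int((coord - c + half) // voxel_size)
               for coord, c in zip((x, y, z), center)]
        if all(0 <= i < n for i in idx):  # drop atoms outside the box
            grid[(CHANNELS.index(element), *idx)] += 1
    return dict(grid), n


# Hypothetical atoms around an environment centered at the origin.
example_atoms = [
    ("C", 0.5, 0.5, 0.5),
    ("N", -9.5, 0.0, 0.0),
    ("H", 0.0, 0.0, 0.0),   # ignored: no channel for hydrogen
    ("O", 50.0, 0.0, 0.0),  # ignored: outside the box
]
example_grid, example_n = voxelize(example_atoms, center=(0.0, 0.0, 0.0))
```

A dense tensor of shape (channels, n, n, n) built from such a grid is the kind of input a 3D convolutional layer would consume; the sparse dictionary is used here only to keep the sketch dependency-free.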

RESULTS:

Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve 85% accuracy in predicting outcomes of the T4 lysozyme mutation variants. Compared to well-established substitution matrices, our substitution matrices contain rich information relevant to mutation analysis. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions.
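One plausible way to turn prediction statistics into a substitution-style score is a log-odds ratio over the classifier's confusion counts. The sketch below is an assumption for illustration only: the abstract does not state how the two matrices were constructed, and the function name `log_odds_matrix`, the pseudocount, and the toy counts are hypothetical.

```python
import math


def log_odds_matrix(counts, pseudocount=1.0):
    """Score residue pairs from confusion counts (illustrative only).

    counts[a][b]: number of environments whose true residue is a
    and whose predicted residue is b. Every predicted label is
    assumed to also appear as a key of `counts`.
    Returns a dict mapping (a, b) -> log2(P(pred=b | true=a) / P(pred=b)).
    """
    labels = sorted(counts)
    # Add a pseudocount so unseen pairs do not produce log(0).
    smoothed = {a: {b: counts[a].get(b, 0) + pseudocount for b in labels}
                for a in labels}
    total = sum(sum(row.values()) for row in smoothed.values())
    # Background frequency of each predicted label.
    pred_freq = {b: sum(smoothed[a][b] for a in labels) / total
                 for b in labels}
    scores = {}
    for a in labels:
        row_total = sum(smoothed[a].values())
        for b in labels:
            conditional = smoothed[a][b] / row_total
            scores[(a, b)] = math.log2(conditional / pred_freq[b])
    return scores


# Toy two-residue example with made-up counts.
toy_counts = {"L": {"L": 8, "I": 2}, "I": {"L": 3, "I": 7}}
toy_scores = log_odds_matrix(toy_counts)
```

Positive scores mark predictions that occur more often for a given true residue than expected from the background, and negative scores mark those that occur less often. Note that this score is not symmetric in general, unlike conventional substitution matrices.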

CONCLUSIONS:

End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses.

KEYWORDS:

Amino acid similarities; Convolutional neural network; Deep learning; Mutation analysis; Protein structural analysis; Structural bioinformatics

PMID: 28615003
PMCID: PMC5472009
DOI: 10.1186/s12859-017-1702-0
[Indexed for MEDLINE]
Free PMC Article
