Article resources
AGRICOLA serves as the catalog and index to the collections of the National Agricultural Library. The records describe publications and resources encompassing all aspects of agriculture and allied disciplines, including animal and veterinary sciences, entomology, plant sciences, forestry, aquaculture and fisheries, farming and farming systems, agricultural economics, extension and education, food and human nutrition, and earth and environmental sciences.
BIOBASE Knowledge Library (BKL), combining the PROTEOME and TRANSFAC suites of databases, offers rich, high-quality content and integrated analysis tools spanning gene regulation and pathways to fully annotated genomes.
BIOSIS Citation Index indexes the worldwide literature of research in the biological and biomedical sciences. The database covers the entire field of the life sciences, including original research reports and reviews in field, laboratory, clinical, experimental, and theoretical work. BIOSIS indexes journals, technical reports, meeting proceedings, United States patents, and books in biology, biomedicine, and related areas.
Developed by the National Center for Biotechnology Information (NCBI), PubMed provides free access to MEDLINE and offers links to GenBank records, other molecular sequence data, and other resources. MEDLINE, produced by the National Library of Medicine, is an index to journal articles in medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences, including both basic biomedical sciences and clinical practice. MEDLINE indexes over 4,000 journals.
The Web of Science service, available via ISI Web of Knowledge, includes three core component databases: the expanded version of Science Citation Index (SciSearch), Social Sciences Citation Index, and Arts & Humanities Citation Index. The Science Citation Index provides access to current and retrospective bibliographic information, author abstracts, and cited references found in the world's leading scholarly science and technical journals covering over 100 disciplines from 1900 to the present.
xSearch, developed through the Stanford University Libraries' partnership with Deep Web Technologies, provides Stanford researchers and students with a single search option for multiple online resources. Searches may be limited to specific databases, or all available sources may be searched simultaneously. Search results are merged into one relevance-ranked list and clustered by topic, author, source publication, publisher, and date.
Web resources
The official website of the American Society of Human Genetics (ASHG) provides insight into the world of human genetics. It has extensive information about annual meetings, publications, educational resources, projects and policies.
The Arabidopsis Information Resource (TAIR) offers genomic information about Arabidopsis, along with analysis tools, links and downloads. It can also be searched for information on genes, DNA, genetic markers, proteins and polymorphisms.
ArrayExpress is a public archive for functional genomics data compliant with the MIAME and MINSEQE requirements, in accordance with MGED recommendations. The Gene Expression Atlas uses a curated, re-annotated subset of data from the Archive to provide information about gene expression under various biological conditions.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
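To illustrate what "regions of local similarity" means, here is a minimal sketch of Smith-Waterman local alignment in Python. BLAST itself does not run this exact algorithm: it uses a much faster seed-and-extend heuristic and also reports statistical significance (E-values), which this sketch omits. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not BLAST defaults.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Uses a linear gap penalty; the scores are illustrative, not BLAST's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local region
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTACGTAA", "GGACGTGG")
print(score, top, bottom)  # 8 ACGT ACGT
```

Only the shared core "ACGT" is reported; the dissimilar flanks are ignored, which is the behavior that distinguishes local from global alignment.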
BIOBASE Knowledge Library (BKL), combining the PROTEOME and TRANSFAC suites of databases, offers rich, high-quality content and integrated analysis tools spanning gene regulation and pathways to fully annotated genomes.
BiosciRegister.com is a manually compiled directory of biotechnology and life sciences suppliers that helps buyers identify manufacturers of biotechnology equipment and reagents for cell biology, cell culture, genomics, proteomics and more. The site also includes new products, free magazines and trade publications, jobs, news and events.
The aim of the CAMERA project is to serve the needs of the microbial ecology research community by creating a rich, distinctive data repository and a bioinformatics tools resource that will address many of the unique challenges of metagenomic analysis.
CiteXplore combines literature searches of MEDLINE/PubMed and full-text databases such as PubMed Central and patents with text-mining tools for biology. Search results are cross-referenced to EBI applications based on publication identifiers. Links to full-text versions are provided where available.
Cold Spring Harbor Protocols is an interdisciplinary digital journal providing a definitive source of research methods in cell, developmental and molecular biology, genetics, bioinformatics, protein science, computational biology, immunology, neuroscience and imaging. Each monthly issue details multiple essential methods—a mix of cutting-edge and well-established techniques. Newly commissioned protocols and unsolicited submissions are supplemented with articles based on Cold Spring Harbor Laboratory’s courses and manuals.
The Comprehensive Microbial Resource (CMR) is a free website that displays information on all of the publicly available, complete prokaryotic genomes. In addition to the convenience of having all of the organisms on a single website, the CMR uses common data types across all genomes, which makes searches more meaningful and lets cross-genome analyses highlight differences and similarities between the genomes.
DAVID, maintained by the National Institute of Allergy and Infectious Diseases (NIAID), provides a comprehensive set of functional annotation tools to help investigators understand the biological meaning behind large lists of genes.
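Functional enrichment asks whether an annotation term covers more genes in a submitted list than chance would predict. DAVID reports a modified Fisher's exact test (the EASE score); the sketch below computes the plain one-sided hypergeometric tail, which equals an unmodified one-sided Fisher's exact test, so its numbers will not match DAVID's exactly. The function name, argument names, and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric probability P(X >= k).

    N: genes in the background (e.g. the whole genome)
    K: background genes annotated with the term
    n: genes in the submitted list
    k: list genes annotated with the term
    """
    # Sum the probabilities of drawing k, k+1, ... annotated genes into the list;
    # math.comb returns 0 for impossible draws, so no extra bounds checks are needed
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 12 of 200 list genes carry a term that annotates
# 150 of 20,000 background genes (about 1.5 hits expected by chance).
print(enrichment_p_value(k=12, n=200, K=150, N=20000))
```

Exact integer arithmetic via math.comb avoids the rounding problems of multiplying many small probabilities, at the cost of speed on very large backgrounds.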
This E. coli database from the Coli Genetic Stock Center (CGSC) has information on the genotypes of specific E. coli strains held in the CGSC collection. Gene names, properties, linkage maps, gene product information, and mutation information are also available.
EcoGene is a compilation of information about the genes, proteins, and intergenic regions of the E. coli K-12 genome. Genomic maps, databases and files are downloadable from this site.
The ECR Browser is a dynamic graphical interface that allows users to visualize and analyze Evolutionary Conserved Regions (ECRs) in genomes of sequenced species.
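One simple way to flag candidate conserved regions in a pairwise alignment is a sliding-window identity scan, sketched below. The window length and identity cutoff are hypothetical parameters chosen for illustration; this is not necessarily how the ECR Browser defines its ECRs, so its own documentation should be consulted for the actual criteria.

```python
def conserved_regions(aln_a: str, aln_b: str, window: int = 100, min_identity: float = 0.70):
    """Return merged (start, end) column ranges, end-exclusive, covered by
    alignment windows whose fraction of identical columns reaches min_identity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base; gap columns never count
    same = [int(x == y and x != "-") for x, y in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    matches = sum(same[:window])
    for start in range(len(same) - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one
            matches += same[start + window - 1] - same[start - 1]
        if matches / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region, so extend it
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy alignment: identical for 10 columns, then fully divergent
print(conserved_regions("A" * 10 + "C" * 10, "A" * 10 + "G" * 10, window=5, min_identity=0.8))
```

Because whole windows are reported, a region can extend slightly past the last identical column; real tools typically trim or refine such boundaries.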
The European Genome-phenome Archive (EGA), maintained by EMBL-EBI, is designed to be a repository for all types of genotype experiments, including case-control, population, and family studies. It will include SNP and CNV genotypes from array-based methods and genotyping done with re-sequencing methods. These data may be either publicly available or access-limited, depending on the design of the study.
The European Nucleotide Archive (also known as EMBL-Bank), maintained by EMBL-EBI, is Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. The database is produced in an international collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups on a daily basis.
Ensembl is a joint project between EMBL-EBI and the Wellcome Trust Sanger Institute that aims to develop a system that maintains automatic annotation of large eukaryotic genomes. It is a comprehensive source of stable annotation with confirmed gene predictions that have been integrated from external data sources. Ensembl annotates known genes and predicts new ones, with functional annotation from InterPro, OMIM, SAGE and gene families.
The Entrez Cross-database Search Page lists, describes, and provides a link to each NCBI database. On this page, all NCBI databases can be searched simultaneously.
The European Molecular Biology Laboratory (EMBL) is dedicated to basic research in the molecular life sciences.
The Experimental Factor Ontology (EFO) is an application-focused ontology modeling the experimental factors in ArrayExpress. It was developed to increase the richness of the annotations currently made in the ArrayExpress repository, to promote consistent annotation, to facilitate automatic annotation and to integrate external data.
The Gene Expression Atlas (ArrayExpress Atlas), maintained by EMBL-EBI, is a semantically enriched database of meta-analysis based summary statistics which serves queries for condition-specific gene expression patterns (e.g. genes over-expressed in a particular tissue or disease state) as well as broader exploratory searches for biologically interesting genes/samples. It is based on a subset of the ArrayExpress data.
The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process this data.
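Because GO terms form a directed acyclic graph in which a child term is a more specific case of its parents, an annotation to one term implies annotation to all of that term's ancestors (GO's "true path rule"). The sketch below propagates direct annotations upward with a breadth-first walk. The function name and the tiny ontology fragment are illustrative only; the edges are simplified and should not be read as the current GO graph.

```python
from collections import deque


def propagate(direct_terms: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Return the directly annotated terms plus every ancestor reachable via parent links."""
    seen = set(direct_terms)
    queue = deque(direct_terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, set()):
            if parent not in seen:  # visit each term once, even if reached by two paths
                seen.add(parent)
                queue.append(parent)
    return seen


# Illustrative fragment: child term -> set of parent terms
PARENTS = {
    "carbohydrate catabolic process": {"carbohydrate metabolic process", "catabolic process"},
    "carbohydrate metabolic process": {"metabolic process"},
    "catabolic process": {"metabolic process"},
    "metabolic process": {"biological_process"},
}

print(sorted(propagate({"carbohydrate catabolic process"}, PARENTS)))
```

A term can have several parents, which is why a graph walk with a visited set is used instead of following a single chain.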
The GOA project aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB) and International Protein Index (IPI), and is a central dataset for other major multi-species databases, such as Ensembl and NCBI.
This browser has genome information on 13 species, including widely studied ones such as human, mouse and zebrafish, as well as less commonly studied species such as chimpanzee, honeybee and dog. Within each species, users can search by parameters such as disease, domain, family, gene, peptide, protein and sequence.
The first complete genomes of viruses, phages and organelles were deposited in the EMBL Database in the early 1980s. Since then, molecular biology's shift toward obtaining the complete sequences of as many genomes as possible, combined with major developments in sequencing technology, has resulted in hundreds of complete genome sequences being added to the database, including genomes from Archaea, Bacteria and Eukaryota. These web pages give access to a large number of complete genomes.
The Genomic Science Program of the U.S. Department of Energy provides information for scientists and the public on progress in genomics and a description of the Human Genome Project.
Gramene is a curated, open-source data resource for comparative genome analysis in the grasses.
HOVERGEN is a database of homologous vertebrate genes, structured under the ACNUC sequence database management system. It allows one to select sets of homologous genes among vertebrate species, and to visualize multiple alignments and phylogenetic trees.
The HUGO Gene Nomenclature Committee (HGNC) website is hosted at the European Bioinformatics Institute. The HGNC aims to assign a unique approved gene symbol and name to every gene in the human genome. Standardized gene nomenclature is now widely recognized as an invaluable resource for all researchers, and approved HGNC symbols are prominently listed in numerous databases such as Ensembl, Entrez Gene, GeneCards, the UCSC Genome Browser and UniProt.
The Human Genome Project (HGP) site has the most current information about the HGP, including research and education resources, the history of the project, and publications pertaining to the HGP.
InterPro, maintained by EMBL-EBI, is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database of biological systems, consisting of genetic building blocks of genes and proteins (KEGG GENES), chemical building blocks of both endogenous and exogenous substances (KEGG LIGAND), molecular wiring diagrams of interaction and reaction networks (KEGG PATHWAY), and hierarchies and relationships of various biological objects (KEGG BRITE). KEGG provides a reference knowledge base for linking genomes to biological systems and also to environments by the processes of PATHWAY mapping and BRITE mapping.
MaizeGDB, funded by USDA Agricultural Research Service, is the community database for biological information about the crop plant Zea mays ssp. mays. The following datatypes are accessible through this site: genetic, genomic, sequence, gene product, functional characterization, literature reference, and person/organization contact information.
The goal of the Mammalian Gene Collection (MGC), a trans-NIH initiative, is to provide researchers with unrestricted access to sequence-validated full-length protein-coding (FL-CDS) cDNA clones for human, mouse, and rat genes. In 2005, the project added the cow cDNAs generated by Genome Canada.
Mouse Genome Informatics (MGI) provides access to data on laboratory mouse genetics, genomics and biology.
The NHGRI website provides information about the institute’s programs, research, and research funding. The site also provides information about the Human Genome Project in a series of pages for students, educators, and others.
Nucleic Acids Research, published by Oxford Journals Online, gives access to full-text articles from 1996 to the current issue, with abstracts available as far back as 1975, as well as to the annual database issue.
This register is an electronic publication for articles describing the isolation and DNA sequencing of plant genes. The site is searchable with full text reports from 1995-2000.
The Plant Genome Mapping Laboratory at the University of Georgia provides this page of links to web resources for plant genomics and molecular biology. Resources are listed by organism and by topic. Sections of the list cover genetic and physical mapping databases, motif and pattern finding, and gene-finding software.
PlantGDB is an NSF-funded project to develop plant species-specific EST and GSS databases, to provide web-accessible tools and inter-species query capabilities, and to provide genome browsing and annotation capabilities.
Plants and Crops is a list of web resources for plant genomics and other topics related to agriculturally important plants. The site is maintained by the National Agricultural Library.
PDBe (Protein Data Bank in Europe), based at the EBI, is a project for the collection, management and distribution of data about macromolecular structures, derived from the Protein Data Bank (PDB).
The Rat Genome Database is a collaborative effort between leading research institutions involved in rat genetic and genomic research.
The Yeast Genome site provides a variety of tools, such as BLAST, PatMatch, gene sequencing resources and a chromosomal features search, which can be applied to the S. cerevisiae genome.
The S. pombe genome project allows searching of the genome, including all protein-coding genes, pseudogenes, transposons, and tRNAs. Also included are sequence searches and cloning resources.
Scitable is a free science library and personal learning tool from Nature Publishing Group. Scitable currently concentrates on genetics and cell biology, including the topics of evolution, gene expression, and cellular processes. The site also offers resources on effective science communication and career paths.
This page, maintained by the European Bioinformatics Institute, lists various sequence database similarity search tools such as FASTA, BLAST, MPsrch and ScanPS. Interactive as well as email submissions are available for each of these services.
The Stanford bioinformatics resource provides workshops, consultations, and access to hardware and software on various computer systems for use in biomedical research.
The Stanford Genome Technology Center provides information on the cost-effectiveness of DNA sequencing and on the genome of S. cerevisiae. The site also offers publications and project information.
Stanford Genomics Resources provides links to various systematic analysis projects, resources, laboratories, and departments at Stanford University. Departments involved in genomic research include genetics, biochemistry, biological sciences and developmental biology.
TGD is a web-accessible database of information about the Tetrahymena thermophila genome sequence determined at The Institute for Genomic Research (TIGR). TGD provides information on the genome, genes, and proteins of Tetrahymena collected from the scientific literature, research community, and many other sources.
The Universal Protein Resource (UniProt), maintained by EMBL-EBI, is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data.
This Genetics Virtual Library has information on a variety of topics, including organisms, countries, human chromosomes and genetic disorders. Each topic provides a list of reputable Internet links for that topic. A list of the most recent additions to the database is also available.
WormBase is dedicated to the biology and genome of C. elegans. It offers BLAST searches, batch gene and batch sequence queries, markers and genetic maps.
The Zebrafish Information Network (ZFIN) is a comprehensive database of information about zebrafish, including transgenics, wild-type lines, genes, gene expression, genetic maps and publications.
Encyclopedias and handbooks
Supports the development and use of statistical methods for addressing the problems and critical issues that confront scientists, practitioners and policy makers engaged in the life and medical sciences.
Brings together all four fields of genetics, genomics, proteomics and bioinformatics in an online format for faster delivery of content to meet your dynamic research requirements.
Features over 4,900 specially commissioned, peer-reviewed and citable articles spanning the entire spectrum of the life sciences.
Provides a clearly written, user-friendly tool for navigating terminology, concepts, theories, applications, and technology in these dynamic disciplines.
Provides extensive coverage of the field including: articles on plant genetic modification, applications of plant biotechnology to crop improvement in agriculture, production systems for pharmaceutical proteins, ethics and safety issues, public perception and public relations, scale-up, and testing and legislation.
NCBI Entrez resources
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Entrez searches the National Center for Biotechnology Information (NCBI) databases to provide integrated access to nucleotide and protein sequence data from over 160,000 species, along with three-dimensional protein structures, genomic mapping information, PubMed MEDLINE, and more. Sequence data are combined from various sources, including GenBank, EMBL, DDBJ, RefSeq, PIR-International, PRF, Swiss-Prot, and PDB. Entrez can be searched with a wide variety of text terms such as author name, journal name, gene or protein name, organism, unique identifier (e.g., accession number, sequence ID, PubMed ID), and other terms, depending on the database being searched.
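Entrez searches can also be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL with field-tagged terms; it does not send the request. The helper name is hypothetical, and before querying NCBI it is worth checking the E-utilities usage guidelines (request rate limits and the recommended tool and email parameters).

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for one Entrez database."""
    # urlencode percent-escapes the brackets and turns spaces into '+'
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Gene name plus organism, using the Gene database's [gene] and [orgn] field tags
print(esearch_url("gene", "BRCA1[gene] AND human[orgn]"))
```

The same query string typed into the Entrez search box should behave the same way; the URL form is simply easier to script and to record in a methods section.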
The BioSystems database contains records that group together molecules that interact in biological systems. One type of biosystem is a biological pathway, which can consist of interacting genes, proteins, and small molecules. Another type of biosystem is a disease, which can involve components such as genes, biomarkers, and drugs.
In collaboration with authors and publishers, the National Center for Biotechnology Information (NCBI) is adapting biomedical books for the web.
The Conserved Domain Database (CDD) currently contains domains derived from two popular collections, SMART and Pfam, plus contributions from colleagues at NCBI, such as COG. The source databases also provide descriptions and links to citations. Because conserved domains correspond to compact structural units, CDD records link to 3D structures via Cn3D whenever possible.
The database of Genotypes and Phenotypes (dbGaP) stores phenotype and genotype data, as well as the associations between them. Studies generating data for dbGaP will include genome-wide association studies, medical sequencing, and molecular diagnostic assays. Summaries of phenotype and genotype data as well as study documents and association analyses (when available) will be found on the public site. Authorized access may be required for downloading coded individual-level phenotypes, genotypes and pedigrees.
The EST database contains all records found within the Expressed Sequence Tag (EST) division of GenBank. EST records contain first-pass single-read cDNA sequences and include no annotated biological features.
Gene organizes information about the characteristics and defining sequences of genes from species in Genome, RefSeq, and other model organisms.
The whole genomes of over 1,000 viruses and over 100 microbes can be found in Entrez Genome. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life (bacteria, archaea, and eukaryota) are represented, as well as many viruses and organelles.
Genome Project is a collection of complete and incomplete large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. The database is organized into organism-specific overviews that function as portals from which all projects in the database pertaining to that organism can be browsed and retrieved.
Comparable experimental sample sets assembled from the Gene Expression Omnibus (GEO) repository. Entrez GDS queries all GEO DataSet annotation, allowing identification of experiments of interest.
Individual gene expression and molecular abundance profiles assembled from the Gene Expression Omnibus (GEO) repository. Entrez GEO Profiles queries annotation and pre-computed profile characteristics, allowing identification of specific genes, and molecular abundance profiles of interest.
The GSS database contains all records found within the Genome Survey Sequence (GSS) division of GenBank. GSS records contain first-pass single-read genomic sequences and rarely include annotated biological features.
HomoloGene is an automated system for detecting homologs among the annotated genes of several completely sequenced eukaryotic genomes.
The Entrez Journals database lets you search for a journal and links to records for that journal in the Entrez databases. The Journals database can be searched using the journal title, MEDLINE abbreviation, NLM ID, ISO abbreviation, or ISSN. The database includes the journals in all Entrez databases, e.g., PubMed, Nucleotide, Protein.
MeSH is NLM's controlled vocabulary used for indexing articles in PubMed. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Use the MeSH database to build a PubMed search strategy.
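A common PubMed search strategy pairs a MeSH heading with free-text synonyms for each concept, ORs the terms within a concept, and ANDs the concepts together. The helper below is a hypothetical sketch of that pattern. [mh] (MeSH Terms) and [tiab] (Title/Abstract) are standard PubMed field tags, and by default a [mh] search also retrieves articles indexed with narrower MeSH terms.

```python
def build_pubmed_query(concepts: list[list[str]]) -> str:
    """OR the terms within each concept, then AND the concepts together.

    Empty concepts are skipped so they do not produce "()" in the query.
    """
    groups = ["(" + " OR ".join(terms) + ")" for terms in concepts if terms]
    return " AND ".join(groups)


query = build_pubmed_query([
    ['"Neoplasms"[mh]', "cancer[tiab]", "tumor[tiab]"],
    ['"Genetic Testing"[mh]', '"genetic screening"[tiab]'],
])
print(query)
```

Keeping the MeSH heading and the free-text synonyms in the same parenthesized group catches recent citations that have not yet been indexed with MeSH.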
The Nucleotide database contains records for all Entrez Nucleotide sequences that are not found within the Expressed Sequence Tag (EST) or Genome Survey Sequence (GSS) divisions of GenBank. These include sequences from all remaining divisions of GenBank, NCBI Reference Sequences (RefSeqs), Whole Genome Shotgun (WGS) sequences, Third Party Annotation (TPA) sequences, and sequences imported from the Entrez Structure database.
Online Mendelian Inheritance in Animals (OMIA) is a database of genes, inherited disorders and traits in animal species (other than human and mouse) authored by Professor Frank Nicholas of the University of Sydney, Australia, with help from many people over the years. The database contains textual information and references, as well as links to other relevant records.
Online Mendelian Inheritance in Man (OMIM) is a catalog of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases. It is based on the book, Mendelian Inheritance in Man. The online version is updated daily. The OMIM FAQs provide additional information about the book and the online database.
PopSet is a set of DNA sequences that have been collected to analyse the evolutionary relatedness of a population. The population could originate from different members of the same species, or from organisms from different species. They are submitted to GenBank via Sequin, often as a sequence alignment.
The Probe Database is a public registry of sequence-specific reagents designed for use in a wide variety of biomedical research applications, together with information on reagent availability, experimental protocols, probe effectiveness, and computed sequence similarities.
The Protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq.
Protein Clusters is a collection of related protein sequences (clusters). Currently it consists of Reference Sequence proteins encoded by complete prokaryotic and chloroplast genomes and plasmids. This database contains both curated and non-curated clusters.
SNP is NCBI’s database of Single Nucleotide Polymorphisms.
SRA is a database of raw sequence data from sequencing instruments.
Structure: The Molecular Modeling Database (MMDB) contains 3-D macromolecular structures, including proteins and polynucleotides. MMDB contains over 20,000 structures and is linked to the rest of the NCBI databases, including sequences, bibliographic citations, taxonomic classifications, and sequence and structure neighbors.
UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.
The Map Viewer provides special browsing capabilities for a subset of organisms in Entrez Genomes. Map Viewer allows you to view and search an organism's complete genome, display chromosome maps, and zoom into progressively greater levels of detail, down to the sequence data for a region of interest. The number and types of available maps vary by organism. If multiple maps are available for a chromosome, Map Viewer displays them aligned to each other based on shared marker and gene names, and, for the sequence maps, based on a common sequence coordinate system.
The NCBI Taxonomy database contains the names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence.
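Each Taxonomy node records its parent, so an organism's full lineage can be recovered by following parent links up to the root. The sketch below assumes lines in the tab-and-pipe-delimited layout of the nodes.dmp file from NCBI's taxonomy dump, whose first three fields are tax_id, parent tax_id, and rank, with the root (tax_id 1) listed as its own parent. The function names and every sample ID other than the root are hypothetical.

```python
def parse_nodes(lines):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        # Fields are separated by "\t|\t" and each line ends with "\t|\n"
        fields = line.rstrip("\n").removesuffix("\t|").split("\t|\t")
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(tax_id, nodes):
    """Follow parent links from tax_id up to the root, which is its own parent."""
    path = [tax_id]
    while nodes[tax_id][0] != tax_id:
        tax_id = nodes[tax_id][0]
        path.append(tax_id)
    return path


# Hypothetical three-field sample lines (real nodes.dmp lines carry more fields)
SAMPLE = [
    "1\t|\t1\t|\tno rank\t|\n",
    "100\t|\t1\t|\tfamily\t|\n",
    "200\t|\t100\t|\tgenus\t|\n",
    "300\t|\t200\t|\tspecies\t|\n",
]
nodes = parse_nodes(SAMPLE)
print(lineage(300, nodes))
```

A dictionary keyed by tax_id keeps each parent lookup constant-time, so a lineage costs one step per rank.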
PubChem BioAssay contains the results of biological activity screening from a variety of public sources. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure. PubChem BioAssay results are linked to PubChem Substance, and in turn to PubChem Compound, whenever chemical structures are known. Screening results may be browsed via a web interface and also downloaded for further cheminformatics analysis.
PubChem Compound contains chemical structure information drawn from a variety of public sources. Compounds may be searched by chemical properties and are pre-clustered into identity and similarity groups by structure comparison. Whenever possible, compounds are linked via PubChem Substance to information on their biological activities. Available links include PubMed citations, protein 3D structures and links to biological screening results available in PubChem BioAssay.
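Structure-similarity grouping of this kind is commonly scored with the Tanimoto coefficient between binary structural fingerprints. The sketch below represents each fingerprint as the set of its "on" bit positions and lists library compounds whose similarity to a query meets a cutoff. The function names, fingerprints, compound names, and 0.9 cutoff are all hypothetical; PubChem computes its own fingerprints and applies its own thresholds.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similar_compounds(query: set[int], library: dict[str, set[int]], cutoff: float) -> list[str]:
    """Names of library compounds with Tanimoto similarity >= cutoff, most similar first."""
    scored = [(tanimoto(query, fp), name) for name, fp in library.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= cutoff]


LIBRARY = {
    "compound_a": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "compound_b": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "compound_c": {20, 21, 22},
}
print(similar_compounds({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, LIBRARY, cutoff=0.9))
```

Storing only the on-bits keeps sparse fingerprints compact; production tools usually use fixed-length bit vectors with popcount for speed, but the coefficient is the same.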
PubChem Substance contains descriptions of chemical samples, from a variety of public sources, and links to information on their biological activities. The description includes links to PubChem Compound in cases where the chemical structures of compounds in the sample are known. Links providing information on biological activity include links to PubMed citations, protein 3D structures, and to biological screening results available in PubChem BioAssay.
Developed by the National Center for Biotechnology Information (NCBI), PubMed provides free access to MEDLINE and offers links to GenBank records, other molecular sequence data, and other resources. MEDLINE, produced by the National Library of Medicine, is an index to journal articles in medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences, including both basic biomedical sciences and clinical practice. MEDLINE indexes over 4,000 journals.
PubMed Central (PMC) is the U.S. National Library of Medicine's digital archive of life sciences journal literature. Access to the full text of articles in PMC is free, except where a journal requires a subscription for access to recent articles.