Format

Send to

Choose Destination
BMC Bioinformatics. 2009 Feb 5;10 Suppl 2:S6. doi: 10.1186/1471-2105-10-S2-S6.

Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text.

Author information

1
Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA. ygarten@stanford.edu

Abstract

BACKGROUND:

Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities--particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations.

RESULTS:

Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively.

CONCLUSION:

Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.

PMID:
19208194
PMCID:
PMC2646239
DOI:
10.1186/1471-2105-10-S2-S6
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for BioMed Central Icon for PubMed Central
Loading ...
Support Center