Todd Ferris
Chief Technology Officer, SoM - Information Resources & Technology
Publications
-
Cohort Discovery Query Optimization via Computable Controlled Vocabulary Versioning.
Studies in health technology and informatics
2015; 216: 1084-?
Abstract
Self-service cohort discovery tools strive to provide intuitive interfaces to large Clinical Data Warehouses that contain extensive historic information. In those tools, controlled vocabulary (e.g., ICD-9-CM, CPT) coded clinical information is often the main search criteria used because of its ubiquity in billing processes. These tools generally require a researcher to pick specific terms from the controlled vocabulary. However, controlled vocabularies evolve over time as medical knowledge changes and can even be replaced with new versions (e.g., ICD-9 to ICD-10). These tools generally only display the current version of the controlled vocabulary. Researchers should not be expected to understand the underlying controlled vocabulary versioning issues. We propose a computable controlled vocabulary versioning system that allows cohort discovery tools to automatically expand queries to account for terminology changes.
View details for PubMedID 26262383
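The query expansion described in this abstract can be illustrated with a minimal sketch. It assumes version links between codes are available as a simple lookup table; the function name `expand_codes`, the table layout, and the placeholder codes are hypothetical and are not taken from the paper.

```python
from collections import deque


def expand_codes(selected, version_links):
    """Return the selected codes plus every code reachable through version links.

    `version_links` is a hypothetical table mapping a code to the codes that
    preceded or replaced it in other vocabulary versions.
    """
    seen = set(selected)
    queue = deque(selected)
    while queue:
        code = queue.popleft()
        for linked in version_links.get(code, ()):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return seen


# Placeholder codes for illustration only; these are not real vocabulary entries.
links = {
    "V3:A10": ["V2:A1"],
    "V2:A1": ["V1:A"],
}
expanded = sorted(expand_codes({"V3:A10"}, links))
```

Under these assumptions a researcher picks only the current term, and the tool adds the historical codes to the query before it runs against the warehouse.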
-
Pharmacovigilance using clinical notes.
Clinical pharmacology & therapeutics
2013; 93 (6): 547-555
Abstract
With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient-feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug-adverse event associations and adverse events associated with drug-drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.
View details for DOI 10.1038/clpt.2013.47
View details for PubMedID 23571773
-
A simple heuristic for blindfolded record linkage
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
2012; 19 (E1): E157-E161
Abstract
To address the challenge of balancing privacy with the need to create cross-site research registry records on individual patients, while matching the data for a given patient as he or she moves between participating sites. To evaluate the strategy of generating anonymous identifiers based on real identifiers in such a way that the chances of a shared patient being accurately identified were maximized, and the chances of incorrectly joining two records belonging to different people were minimized. Our hypothesis was that most variation in names occurs after the first two letters, and that date of birth is highly reliable, so a single match variable consisting of a hashed string built from the first two letters of the patient's first and last names plus their date of birth would have the desired characteristics. We compared and contrasted the match algorithm characteristics (rate of false positive v. rate of false negative) for our chosen variable against both Social Security Numbers and full names. In a data set of 19 000 records, a derived match variable consisting of a 2-character prefix from both first and last names combined with date of birth has a 97% sensitivity; by contrast, an anonymized identifier based on the patient's full names and date of birth has a sensitivity of only 87% and SSN has sensitivity 86%. The approach we describe is most useful in situations where privacy policies preclude the full exchange of the identifiers required by more sophisticated and sensitive linkage algorithms. For data sets of sufficiently high quality this effective approach, while producing a lower rate of matching than more complex algorithms, has the merit of being easy to explain to institutional review boards, adheres to the minimum necessary rule of the HIPAA privacy rule, and is faster and less cumbersome to implement than a full probabilistic linkage.
View details for DOI 10.1136/amiajnl-2011-000329
View details for Web of Science ID 000314151400026
View details for PubMedID 22298567
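The match variable described in this abstract (a hash of the first two letters of the first and last names plus date of birth) might be sketched as follows. The abstract does not name a hash function or normalization rules, so SHA-256, lowercasing, whitespace trimming, the optional shared `salt`, and the function name `match_key` are all assumptions made for illustration.

```python
import hashlib
from datetime import date


def match_key(first_name: str, last_name: str, dob: date, salt: str = "") -> str:
    """Build an anonymous match key from name prefixes and date of birth.

    Assumptions (not from the paper): names are trimmed and lowercased,
    the date is ISO formatted, and SHA-256 is the hash function.
    """
    prefix = first_name.strip().lower()[:2] + last_name.strip().lower()[:2]
    raw = f"{salt}|{prefix}|{dob.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Spelling variants after the second letter yield the same key.
key_a = match_key("Jonathan", "Smith", date(1970, 1, 2))
key_b = match_key("Jon", "Smithe", date(1970, 1, 2))
# A different date of birth yields a different key.
key_c = match_key("Jonathan", "Smith", date(1970, 1, 3))
```

Participating sites would need to agree on the same normalization and salt for keys to be comparable across sites.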
-
Managing Medical Vocabulary Updates in a Clinical Data Warehouse: An RxNorm Case Study.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
2010; 2010: 477-481
Abstract
Use of terminology standards facilitates aggregating data from multiple sources for information retrieval, exchange and analysis. However, medical vocabularies are continuously updated and incorporating those changes consistently into clinical data warehouses requires rigorous methodology. To integrate pharmacy data from two hospital pharmacy information systems the Stanford Translational Research Integrated Database Environment (STRIDE) project mapped medication orders to RxNorm content using the RxNorm drug model. In order to keep the data relevant and up-to-date, we developed a strategy for updating to RxNorm, while preserving the original meaning and mapping of the legacy data. This case study discusses managing the vocabulary update by following the RxNorm content maintenance strategy and supplementing it with operations to retain access to its drug model information.
View details for PubMedID 21347024
View details for PubMedCentralID PMC3041287
-
Automated mapping of pharmacy orders from two electronic health record systems to RxNorm within the STRIDE clinical data warehouse.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
2009; 2009: 244-248
Abstract
The Stanford Translational Research Integrated Database Environment (STRIDE) clinical data warehouse integrates medication information from two Stanford hospitals that use different drug representation systems. To merge this pharmacy data into a single, standards-based model supporting research we developed an algorithm to map HL7 pharmacy orders to RxNorm concepts. A formal evaluation of this algorithm on 1.5 million pharmacy orders showed that the system could accurately assign pharmacy orders in over 96% of cases. This paper describes the algorithm and discusses some of the causes of failures in mapping to RxNorm.
View details for PubMedID 20351858
View details for PubMedCentralID PMC2815471
-
STRIDE--An integrated standards-based translational research informatics platform.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
2009; 2009: 391-395
Abstract
STRIDE (Stanford Translational Research Integrated Database Environment) is a research and development project at Stanford University to create a standards-based informatics platform supporting clinical and translational research. STRIDE consists of three integrated components: a clinical data warehouse, based on the HL7 Reference Information Model (RIM), containing clinical information on over 1.3 million pediatric and adult patients cared for at Stanford University Medical Center since 1995; an application development framework for building research data management applications on the STRIDE platform and a biospecimen data management system. STRIDE's semantic model uses standardized terminologies, such as SNOMED, RxNorm, ICD and CPT, to represent important biomedical concepts and their relationships. The system is in daily use at Stanford and is an important component of Stanford University's CTSA (Clinical and Translational Science Award) Informatics Program.
View details for PubMedID 20351886
View details for PubMedCentralID PMC2815452
-
Novel integration of hospital electronic medical records and gene expression measurements to identify genetic markers of maturation.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
2008: 243-254
Abstract
Traditionally, the elucidation of genes involved in maturation and aging has been studied in a temporal fashion by examining gene expression at different time points in an organism's life as well as by knocking out, knocking in, and mutating genes thought to be involved. Here, we propose an in silico method to combine clinical electronic medical record (EMR) data and gene expression measurements in the context of disease to identify genes that may be involved in the process of human maturation and aging. First we show that absolute lymphocyte count may serve as a biomarker for maturation by using statistical methods to compare trends among different clinical laboratory tests in response to an increase in age. We then propose using the rate of decay for absolute lymphocyte count across 12 diseases as a proxy for differences in aging. We correlate the differing rates with gene expression across the same diseases to find maturation/aging related genes. Among the 53 genes with strongest correlations between expression profile and change in rate of decay, we found genes previously implicated in the process of aging, including MGMT (DNA repair), TERF2 (telomere stability), POLD1 (DNA replication and repair), and POLG (mtDNA replication).
View details for PubMedID 18229690
-
Clinical arrays of laboratory measures, or "clinarrays", built from an electronic health record enable disease subtyping by severity.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
2007: 115-119
Abstract
The severity of diseases has often been assigned by direct observation of a patient and by pathological examination after symptoms have appeared. As we move into the genomic era, the ability to predict disease severity prior to manifestation has improved dramatically due to genomic sequencing and analysis of gene expression microarrays. However, as the severity of diseases can be exacerbated by non genetic factors, the ability to predict disease severity by examining gene expression alone may be inadequate. We propose the creation of a "clinarray" to examine phenotypic expression in the form of clinical laboratory measurements. We demonstrate that the clinarray can be used to distinguish between the severities of patients with cystic fibrosis and those with Crohn's disease by applying unsupervised clustering methods that have been previously applied to microarrays.
View details for PubMedID 18693809
-
A proposed key escrow system for secure patient information disclosure in biomedical research databases
Annual Symposium of the American-Medical-Informatics-Association
HANLEY & BELFUS INC MED PUBLISHERS. 2002: 245–249
Abstract
Access to clinical data is of increasing importance to biomedical research. The pending HIPAA privacy regulations provide specific requirements for the release of protected health information. Under the regulations, biomedical researchers may utilize anonymized data, or adhere to HIPAA requirements regarding protected health information. In order to provide researchers with anonymized data from a clinical research database, we reviewed several published strategies for de-identification of protected health information. Critical analysis with respect to this project suggests that de-identification alone is problematic when applied to clinical research databases. We propose a hybrid system; utilizing secure key escrow, de-identification, and role-based access for IRB approved researchers.
View details for Web of Science ID 000189418100050
View details for PubMedID 12463824