CS547 Human-Computer Interaction Seminar  (Seminar on People, Computers, and Design)

Fridays 12:30-1:50 · Gates B01 · Open to the public
Diana MacLean
Stanford University
Insights from Patient Authored Text: From Close Reading to Automated Extraction
October 24, 2014

Millions of people collaborate online with others who share their health concerns. In the process, these users perform complex health-related tasks, such as differential diagnosis and treatment comparison. The result is a massive, growing, and readily accessible corpus of Patient Authored Text (PAT) that documents patients' behavior outside of the clinical environment. PAT can therefore provide insights into otherwise obscure topics, such as why patients follow only certain parts of a treatment protocol, or how people self-treat stigmatized conditions such as prescription drug addiction.

Despite the potential value of PAT, attempts to extract medically-relevant insights from it have been limited. PAT is notoriously noisy and challenging to work with, and there is a dearth of methods and tools for processing and analyzing it. Moreover, the specific research questions that PAT can support are not obvious: determining what data PAT encodes, and how these data are encoded, is a challenge in and of itself.

In this thesis, I develop methods for automatically extracting medically-relevant data from PAT. I focus specifically on the topic of Addiction: a stigmatized and prevalent medical condition. Building on close readings of source text to inform schema induction, data annotation, and feature engineering, I train classifiers that accurately identify (1) medically-relevant terms in PAT; (2) users' motivations for participating in an Addiction online health community; (3) users' drugs of choice; and (4) users' transitions through relapse and recovery. Using our classifiers to scale our analyses to large PAT corpora, we derive novel insights into the process of Addiction, as well as the role that online health communities play in giving users informational and emotional support and, ultimately, in enabling recovery.

In concert, our contributions both underscore PAT's latent value for illuminating poorly understood or clandestine medical topics and offer viable methods that dramatically improve our ability to realize this value.


Diana MacLean is a PhD candidate in the Computer Science Department at Stanford University, where she is advised by Jeffrey Heer. Her thesis comprises cross-disciplinary research focused on extracting medically-relevant data from patient-authored text on the topic of substance abuse. Out of the office, Diana is an avid rock climber and Tomb Raider fan. Diana received her BA in Computer Science from Harvard University in 2009.