J Vasc Surg. 2016 Nov;64(5):1515-1522.e3. doi: 10.1016/j.jvs.2016.04.026. Epub 2016 Jun 3.

The use of machine learning for the identification of peripheral artery disease and future mortality risk.

Ross EG¹, Shah NH², Dalman RL¹, Nead KT³, Cooke JP⁴, Leeper NJ⁵.

Author information

1: Division of Vascular Surgery, Stanford Health Care, Stanford, Calif.
2: Center for Biomedical Informatics Research, Stanford University, Stanford, Calif.
3: Department of Radiation Oncology, University of Pennsylvania, Philadelphia, Pa.
4: Department of Cardiovascular Sciences, Houston Methodist Research Institute, Houston, Tex; Center for Cardiovascular Regeneration, Houston Methodist DeBakey Heart and Vascular Center, Houston, Tex.
5: Division of Vascular Surgery, Stanford Health Care, Stanford, Calif. Electronic address: nleeper@stanford.edu.

Abstract

OBJECTIVE:

A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses.

METHODS:

Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise logistic regression models.

RESULTS:

Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates.
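The results compare models on two properties: discrimination, measured as the area under the receiver operating characteristic curve, and calibration. The abstract does not say which calibration measure was used, so the following is only an illustrative sketch and not the authors' method. It computes the AUC as the probability that a randomly chosen positive case scores above a randomly chosen negative case, with ties counted as one half. It summarizes calibration with an equal-width binned table of mean predicted risk against observed event rate, plus the Brier score. The function names, the binning scheme, and the toy data are all hypothetical. The sketch does not reproduce the test used to obtain the reported P values.

```python
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Per non-empty equal-width bin: (mean predicted risk, observed event rate, count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy outcomes and predicted risks, not study data.
    y = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, risk))  # 3 of 4 positive/negative pairs ordered correctly
    print(brier_score(y, risk))
    print(calibration_table(y, risk))
```

In a well-calibrated model, the mean predicted risk and the observed event rate agree in every bin. A model can discriminate well and still be poorly calibrated, which is why the abstract reports the two properties separately.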

CONCLUSIONS:

Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes.