Matei Zaharia

Associate Professor of Computer Science and, by courtesy, of Electrical Engineering

On Leave from 10/01/2022 To 12/31/2022

Bio

Homepage: https://cs.stanford.edu/~matei/

Academic Appointments

Associate Professor, Computer Science
Associate Professor (By courtesy), Electrical Engineering

Contact

Academic
matei@cs.stanford.edu

University - Faculty
Department: Computer Science
Position: Associate Professor

Additional Info

Mail Code: 9040

2022-23 Courses

- Principles of Data-Intensive Systems
  CS 245 (Win)
Independent Studies (12)
- Advanced Reading and Research
  CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research
  CS 499P (Aut, Win, Spr, Sum)
- Curricular Practical Training
  CS 390A (Aut, Win, Spr, Sum)
- Curricular Practical Training
  CS 390B (Win, Sum)
- Curricular Practical Training
  CS 390C (Sum)
- Independent Project
  CS 399 (Aut, Win, Spr)
- Independent Work
  CS 199 (Aut, Win, Spr)
- Independent Work
  CS 199P (Win, Spr)
- Part-time Curricular Practical Training
  CS 390D (Aut, Win)
- Senior Project
  CS 191 (Aut, Win, Spr)
- Supervised Undergraduate Research
  CS 195 (Win)
- Writing Intensive Senior Research Project
  CS 191W (Win, Spr)
Prior Year Courses
2021-22 Courses
- Machine Learning Systems Seminar
  CS 528 (Aut, Win, Spr)
- Principles of Data-Intensive Systems
  CS 245 (Win)
- Value of Data and AI
  CS 320 (Win)
2020-21 Courses
- Principles of Data-Intensive Systems
  CS 245 (Win)
- Value of Data and AI
  CS 320 (Win)
2019-20 Courses
- Principles of Data-Intensive Systems
  CS 245 (Win)
- Value of Data and AI
  CS 320 (Win)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Qian Li, Yilong Li, Yawen Wang
Orals Evaluator
Daniel Kang, Yilong Li
Doctoral Dissertation Co-Advisor (AC)
Lingjiao Chen, Daniel Kang, Omar Khattab
Master's Program Advisor
Victoria DiMelis, Rishab Gargeya, Kai Kato, Max Sobol Mark, Megan Worrel, Emily Yang, Eric Zhang
Doctoral (Program)
Jared Davis, Trevor Gale, Peter Kraft, Liana Patel, Deepti Raghavan, Keshav Santhanam, Gina Yuan

All Publications

Machine Learned Cellular Phenotypes Predict Outcome in Ischemic Cardiomyopathy. Circulation research Rogers, A. J., Selvalingam, A., Alhusseini, M. I., Krummen, D. E., Corrado, C., Abuzaid, F., Baykaner, T., Meyer, C., Clopton, P., Giles, W. R., Bailis, P., Niederer, S. A., Wang, P. J., Rappel, W., Zaharia, M., Narayan, S. M. 2020

Abstract

RATIONALE: Susceptibility to ventricular arrhythmias (VT/VF) is difficult to predict in patients with ischemic cardiomyopathy either by clinical tools or by attempting to translate cellular mechanisms to the bedside. OBJECTIVE: To develop computational phenotypes of patients with ischemic cardiomyopathy, by training then interpreting machine learning (ML) of ventricular monophasic action potentials (MAPs) to reveal phenotypes that predict long-term outcomes. METHODS AND RESULTS: We recorded 5706 ventricular MAPs in 42 patients with coronary disease (CAD) and left ventricular ejection fraction (LVEF) ≤40% during steady-state pacing. Patients were randomly allocated to independent training and testing cohorts in a 70:30 ratio, repeated K=10 fold. Support vector machines (SVM) and convolutional neural networks (CNN) were trained to 2 endpoints: (i) sustained VT/VF or (ii) mortality at 3 years. SVM provided superior classification. For patient-level predictions, we computed personalized MAP scores as the proportion of MAP beats predicting each endpoint. Patient-level predictions in independent test cohorts yielded c-statistics of 0.90 for sustained VT/VF (95% CI: 0.76-1.00) and 0.91 for mortality (95% CI: 0.83-1.00) and were the most significant multivariate predictors. Interpreting trained SVM revealed MAP morphologies that, using in silico modeling, revealed higher L-type calcium current or sodium calcium exchanger as predominant phenotypes for VT/VF. CONCLUSIONS: Machine learning of action potential recordings in patients revealed novel phenotypes for long-term outcomes in ischemic cardiomyopathy. Such computational phenotypes provide an approach which may reveal cellular mechanisms for clinical outcomes and could be applied to other conditions.

View details for DOI 10.1161/CIRCRESAHA.120.317345

View details for PubMedID 33167779
DIFF: a relational interface for large-scale data explanation VLDB JOURNAL Abuzaid, F., Kraft, P., Suri, S., Gan, E., Xu, E., Shenoy, A., Ananthanarayan, A., Sheu, J., Meijer, E., Wu, X., Naughton, J., Bailis, P., Zaharia, M. 2020

View details for DOI 10.1007/s00778-020-00633-6

View details for Web of Science ID 000574078100002
Machine Learning to Classify Intracardiac Electrical Patterns during Atrial Fibrillation. Circulation. Arrhythmia and electrophysiology Alhusseini, M. I., Abuzaid, F., Rogers, A. J., Zaman, J. A., Baykaner, T., Clopton, P., Bailis, P., Zaharia, M., Wang, P. J., Rappel, W., Narayan, S. M. 2020

Abstract

Background - Advances in ablation for atrial fibrillation (AF) continue to be hindered by ambiguities in mapping, even between experts. We hypothesized that convolutional neural networks (CNN) may enable objective analysis of intracardiac activation in AF, which could be applied clinically if CNN classifications could also be explained. Methods - We performed panoramic recording of bi-atrial electrical signals in AF. We used the Hilbert transform to produce 175,000 image grids in 35 patients, labeled for rotational activation by experts who showed consistency but with variability (kappa=0.79). In each patient, ablation terminated AF. A CNN was developed and trained on 100,000 AF image grids, validated on 25,000 grids, then tested on a separate 50,000 grids. Results - In the separate test cohort (50,000 grids), CNN reproducibly classified AF image grids into those with/without rotational sites with 95.0% accuracy (CI 94.8-95.2%). This accuracy exceeded that of support vector machines, traditional linear discriminant and k-nearest neighbor statistical analyses. To probe the CNN, we applied Gradient-weighted Class Activation Mapping which revealed that the decision logic closely mimicked rules used by experts (C-statistic 0.96). Conclusions - Convolutional neural networks improved the classification of intracardiac AF maps compared to other analyses, and agreed with expert evaluation. Novel explainability analyses revealed that the CNN operated using a decision logic similar to rules used by experts, even though these rules were not provided in training. We thus describe a scalable platform for robust comparisons of complex AF data from multiple systems, which may provide immediate clinical utility to guide ablation.

View details for DOI 10.1161/CIRCEP.119.008160

View details for PubMedID 32631100
Approximate Selection with Guarantees using Proxies PROCEEDINGS OF THE VLDB ENDOWMENT Kang, D., Gan, E., Bailis, P., Hashimoto, T., Zaharia, M. 2020; 13 (11): 1990–2003

View details for DOI 10.14778/3407790.3407804

View details for Web of Science ID 000573965600014
PREDICTING SUDDEN CARDIAC DEATH BY MACHINE LEARNING OF VENTRICULAR ACTION POTENTIALS Selvalingam, A., Alhusseini, M., Rogers, A. J., Krummen, D., Abuzaid, F. M., Baykaner, T., Clopton, P., Bailis, P., Zaharia, M., Wang, P., Narayan, S. ELSEVIER SCIENCE INC. 2020: 427

View details for Web of Science ID 000522979100416
Fleet: A Framework for Massively Parallel Streaming on FPGAs Thomas, J., Hanrahan, P., Zaharia, M., ACM ASSOC COMPUTING MACHINERY. 2020: 639–51

View details for DOI 10.1145/3373376.3378495

View details for Web of Science ID 000541369300041
BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics PROCEEDINGS OF THE VLDB ENDOWMENT Kang, D., Bailis, P., Zaharia, M. 2019; 13 (4): 533–46

View details for DOI 10.14778/3372716.3372725

View details for Web of Science ID 000573950100009
To Index or Not to Index: Optimizing Exact Maximum Inner Product Search Abuzaid, F., Sethi, G., Bailis, P., Zaharia, M., IEEE IEEE. 2019: 1250–61

View details for DOI 10.1109/ICDE.2019.00114

View details for Web of Science ID 000477731600107
Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations Palkar, S., Zaharia, M., ACM ASSOC COMPUTING MACHINERY. 2019: 291–305

View details for DOI 10.1145/3341301.3359652

View details for Web of Science ID 000524218600019
PipeDream: Generalized Pipeline Parallelism for DNN Training Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., Gibbons, P. B., Zaharia, M., ACM ASSOC COMPUTING MACHINERY. 2019: 1–15

View details for DOI 10.1145/3341301.3359646

View details for Web of Science ID 000524218600001
TASO: Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions Jia, Z., Padon, O., Thomas, J., Warszawski, T., Zaharia, M., Aiken, A., ACM ASSOC COMPUTING MACHINERY. 2019: 47–62

View details for DOI 10.1145/3341301.3359630

View details for Web of Science ID 000524218600004
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers Fouladi, S., Romero, F., Iter, D., Li, Q., Chatterjee, S., Kozyrakis, C., Zaharia, M., Winstein, K., USENIX Assoc USENIX ASSOC. 2019: 475–88

View details for Web of Science ID 000489756800033
DIFF: A Relational Interface for Large-Scale Data Explanation PROCEEDINGS OF THE VLDB ENDOWMENT Abuzaid, F., Kraft, P., Suri, S., Gan, E., Xu, E., Shenoy, A., Ananthanarayan, A., Sheu, J., Meijer, E., Wu, X., Naughton, J., Bailis, P., Zaharia, M. 2018; 12 (4): 419–32

View details for DOI 10.14778/3297753.3297761

View details for Web of Science ID 000497516500009
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark Armbrust, M., Das, T., Torres, J., Yavuz, B., Zhu, S., Xin, R., Ghodsi, A., Stoica, I., Zaharia, M., Das, G., Jermaine, C., Bernstein, P., Eldawy, A. ASSOC COMPUTING MACHINERY. 2018: 601–13

View details for DOI 10.1145/3183713.3190664

View details for Web of Science ID 000460373700041
MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis Vartak, M., da Trindade, J. F., Madden, S., Zaharia, M., Das, G., Jermaine, C., Bernstein, P., Eldawy, A. ASSOC COMPUTING MACHINERY. 2018: 1285–1300

View details for DOI 10.1145/3183713.3196934

View details for Web of Science ID 000460373700086
NoScope: Optimizing Neural Network Queries over Video at Scale PROCEEDINGS OF THE VLDB ENDOWMENT Kang, D., Emmons, J., Abuzaid, F., Bailis, P., Zaharia, M. 2017; 10 (11): 1586–97

View details for Web of Science ID 000416492900036
Splinter: Practical Private Queries on Public Data Wang, F., Yun, C., Goldwasser, S., Vaikuntanathan, V., Zaharia, M., USENIX Assoc USENIX ASSOC. 2017: 299–313

View details for Web of Science ID 000427296400019
DIY Hosting for Online Privacy Palkar, S., Zaharia, M., Assoc Comp Machinery ASSOC COMPUTING MACHINERY. 2017: 1–7

View details for DOI 10.1145/3152434.3152459

View details for Web of Science ID 000440700800001
Making Caches Work for Graph Analytics Zhang, Y., Kiriansky, V., Mendis, C., Amarasinghe, S., Zaharia, M., Nie, J. Y., Obradovic, Z., Suzumura, T., Ghosh, R., Nambiar, R., Wang, C., Zang, H., BaezaYates, R., Hu, Kepner, J., Cuzzocrea, A., Tang, J., Toyoda, M. IEEE. 2017: 293–302

View details for Web of Science ID 000428073700037
Apache Spark: A Unified Engine for Big Data Processing COMMUNICATIONS OF THE ACM Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., Stoica, I. 2016; 59 (11): 56–65

View details for DOI 10.1145/2934664

View details for Web of Science ID 000387897700022
Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware PROCEEDINGS OF THE VLDB ENDOWMENT Pirk, H., Moll, O., Zaharia, M., Madden, S. 2016; 9 (14): 1707–18

View details for DOI 10.14778/3007328.3007336

View details for Web of Science ID 000386431500008
MLlib: Machine Learning in Apache Spark JOURNAL OF MACHINE LEARNING RESEARCH Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D. B., Amde, M., Owen, S., Xin, D., Xin, R., Franklin, M. J., Zadeh, R., Zaharia, M., Talwalkar, A. 2016; 17

View details for Web of Science ID 000391480800001
GraphFrames: An Integrated API for Mixing Graph and Relational Queries Dave, A., Jindal, A., Li, L., Xin, R., Gonzalez, J., Zaharia, M., ACM ASSOC COMPUTING MACHINERY. 2016

View details for DOI 10.1145/2960414.2960416

View details for Web of Science ID 000383740900002
FairRide: Near-Optimal, Fair Cache Sharing Pu, Q., Li, H., Zaharia, M., Ghodsi, A., Stoica, I., USENIX Assoc USENIX ASSOC. 2016: 393–406

View details for Web of Science ID 000385264700026
SparkR: Scaling R Programs with Spark Venkataraman, S., Yang, Z., Liu, D., Liang, E., Falaki, H., Meng, X., Xin, R., Ghodsi, A., Franklin, M., Stoica, I., Zaharia, M., ACM SIGMOD ASSOC COMPUTING MACHINERY. 2016: 1099–1104

View details for DOI 10.1145/2882903.2903740

View details for Web of Science ID 000452538600074
Introduction to Spark 2.0 for Database Researchers Armbrust, M., Bateman, D., Xin, R., Zaharia, M., ACM SIGMOD ASSOC COMPUTING MACHINERY. 2016: 2193–94

View details for DOI 10.1145/2882903.2912565

View details for Web of Science ID 000452538600169
Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale Abuzaid, F., Bradley, J., Liang, F., Feng, A., Yang, L., Zaharia, M., Talwalkar, A., Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2016

View details for Web of Science ID 000458973703002
Matrix Computations and Optimization in Apache Spark Zadeh, R., Meng, X., Ulanov, A., Yavuz, B., Pu, L., Venkataraman, S., Sparks, E., Staple, A., Zaharia, M., Assoc Comp Machinery ASSOC COMPUTING MACHINERY. 2016: 31–38

View details for DOI 10.1145/2939672.2939675

View details for Web of Science ID 000485529800009
Scaling Spark in the Real World: Performance and Usability PROCEEDINGS OF THE VLDB ENDOWMENT Armbrust, M., Das, T., Davidson, A., Ghodsi, A., Or, A., Rosen, J., Stoica, I., Wendell, P., Xin, R., Zaharia, M. 2015; 8 (12): 1840–43

View details for Web of Science ID 000386424800046
Vuvuzela: Scalable Private Messaging Resistant to Traffic Analysis van den Hooff, J., Lazar, D., Zaharia, M., Zeldovich, N., Assoc Comp Machinery ASSOC COMPUTING MACHINERY. 2015: 137–52

View details for DOI 10.1145/2815400.2815417

View details for Web of Science ID 000494968800009
Spark SQL: Relational Data Processing in Spark Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M., ACM SIGMOD ASSOC COMPUTING MACHINERY. 2015: 1383–94

View details for DOI 10.1145/2723372.2742797

View details for Web of Science ID 000452535700109
Optimally designing games for behavioural research PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES Rafferty, A. N., Zaharia, M., Griffiths, T. L. 2014; 470 (2167): 20130828

Abstract

Computer games can be motivating and engaging experiences that facilitate learning, leading to their increasing use in education and behavioural experiments. For these applications, it is often important to make inferences about the knowledge and cognitive processes of players based on their behaviour. However, designing games that provide useful behavioural data is a difficult task that typically requires significant trial and error. We address this issue by creating a new formal framework that extends optimal experiment design, used in statistics, to apply to game design. In this framework, we use Markov decision processes to model players' actions within a game, and then make inferences about the parameters of a cognitive model from these actions. Using a variety of concept learning games, we show that in practice, this method can predict which games will result in better estimates of the parameters of interest. The best games require only half as many players to attain the same level of precision.

View details for DOI 10.1098/rspa.2013.0828

View details for Web of Science ID 000336184600004

View details for PubMedID 25002821

View details for PubMedCentralID PMC4032552
A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples GENOME RESEARCH Naccache, S. N., Federman, S., Veeraraghavan, N., Zaharia, M., Lee, D., Samayoa, E., Bouquet, J., Greninger, A. L., Luk, K., Enge, B., Wadford, D. A., Messenger, S. L., Genrich, G. L., Pellegrino, K., Grard, G., Leroy, E., Schneider, B. S., Fair, J. N., Martinez, M. A., Isa, P., Crump, J. A., DeRisi, J. L., Sittler, T., Hackett, J., Miller, S., Chiu, C. Y. 2014; 24 (7): 1180–92

Abstract

Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.

View details for DOI 10.1101/gr.171934.113

View details for Web of Science ID 000338185000012

View details for PubMedID 24899342

View details for PubMedCentralID PMC4079973
Multi-Resource Fair Queueing for Packet Processing ACM SIGCOMM COMPUTER COMMUNICATION REVIEW Ghodsi, A., Sekar, V., Zaharia, M., Stoica, I. 2012; 42 (4): 1–12

View details for DOI 10.1145/2377677.2377679

View details for Web of Science ID 000309217600001
Managing Data Transfers in Computer Clusters with Orchestra ACM SIGCOMM COMPUTER COMMUNICATION REVIEW Chowdhury, M., Zaharia, M., Ma, J., Jordan, M. I., Stoica, I. 2011; 41 (4): 98–109

View details for DOI 10.1145/2043164.2018448

View details for Web of Science ID 000302124800009
