
Gregory Valiant
Assistant Professor of Computer Science
Professional Education
-
PhD, UC Berkeley, Computer Science (2012)
-
BA, Harvard University, Mathematics (2006)
Current Research and Scholarly Interests
My primary research interests lie at the intersection of algorithms, learning, applied probability, and statistics. I am particularly interested in understanding the algorithmic and information theoretic possibilities and limitations for many fundamental information extraction tasks that underly real-world machine learning and data-centric applications.
2018-19 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Aut) - The Modern Algorithmic Toolbox
CS 168 (Spr) -
Independent Studies (18)
- Advanced Reading and Research
CS 499 (Aut, Win, Spr) - Advanced Reading and Research
CS 499P (Aut, Win, Spr, Sum) - Curricular Practical Training
CS 390A (Aut, Win, Spr, Sum) - Curricular Practical Training
CS 390B (Win, Spr) - Curricular Practical Training
CS 390C (Spr) - Independent Project
CS 399 (Aut, Win, Spr, Sum) - Independent Project
CS 399P (Win, Spr, Sum) - Independent Work
CS 199 (Aut, Win, Spr) - Independent Work
CS 199P (Spr) - Part-Time CPT
CS 390S (Aut) - Part-Time CPT
CS 390T (Win) - Part-Time Curricular Practical Training
CS 390Q (Win, Spr) - Part-Time Curricular Practical Training
CS 390U (Spr) - Part-time Curricular Practical Training
CS 390D (Aut) - Part-time Curricular Practical Training
CS 390P (Win, Spr) - Senior Project
CS 191 (Aut, Win, Spr, Sum) - Supervised Undergraduate Research
CS 195 (Aut) - Writing Intensive Senior Project (WIM)
CS 191W (Aut, Win, Spr)
- Advanced Reading and Research
-
Prior Year Courses
2017-18 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Aut) - The Modern Algorithmic Toolbox
CS 168 (Spr)
2016-17 Courses
- Design and Analysis of Algorithms
CS 161 (Win) - Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Aut) - The Modern Algorithmic Toolbox
CS 168 (Spr)
2015-16 Courses
- Randomized Algorithms and Probabilistic Analysis
CME 309, CS 265 (Aut) - The Modern Algorithmic Toolbox
CS 168 (Spr)
- Randomized Algorithms and Probabilistic Analysis
All Publications
- Learning from Untrusted Data Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC)
-
SPECTRUM ESTIMATION FROM SAMPLES
ANNALS OF STATISTICS
2017; 45 (5): 2218–47
View details for DOI 10.1214/16-AOS1525
View details for Web of Science ID 000416455300014
-
AN AUTOMATIC INEQUALITY PROVER AND INSTANCE OPTIMAL IDENTITY TESTING
SIAM JOURNAL ON COMPUTING
2017; 46 (1): 429-455
View details for DOI 10.1137/151002526
View details for Web of Science ID 000396677400016
-
Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects.
Nature communications
2016; 7: 13293-?
Abstract
As new proposals aim to sequence ever larger collection of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.
View details for DOI 10.1038/ncomms13293
View details for PubMedID 27796292
View details for PubMedCentralID PMC5095512
-
Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem
JOURNAL OF THE ACM
2015; 62 (2)
View details for DOI 10.1145/2728167
View details for Web of Science ID 000354799200004
- Learning Sparse Polynomial Functions. 2014
- Optimal Algorithms for Testing Closeness of Discrete Distributions. 2014
- Computation in anonymous networks. CoRR abs/1306.4151 2013
- Instance-by-instance optimal identity testing. Electronic Colloquium on Computational Complexity (ECCC) 2013; 20: 111
- Testing k-Modal Distributions: Optimal Algorithms via Reductions. 2013
- Optimal Algorithms for Testing Closeness of Discrete Distributions. CoRR abs/1308.3946 2013
- Estimating the Unseen: Improved Estimators for Entropy and other Properties. 2013
- Least Squares Revisited: Scalable Approaches for Multi-class Prediction. CoRR abs/1310.1949 2013
- Size and Treewidth Bounds for Conjunctive Queries. J. ACM 2012; 3 (59): 16
- Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas. 2012
- Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas with Noise. Electronic Colloquium on Computational Complexity (ECCC) 2012; 19: 6
- Disentangling Gaussians. Commun. ACM 2012; 2 (55): 113-120
- Beating brute-force: Improved algorithms for finding correlations, and related problems. TinyToCS 2012; 1
- Testing $k$-Modal Distributions: Optimal Algorithms via Reductions. CoRR abs/1112.5659 2011
- Best-Response Mechanisms. 2011
- The Power of Linear Estimators. 2011
- Best-response auctions. 2011
- When is it best to best-respond? SIGecom Exchanges 2011; 2 (10): 16-18
- Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. 2011
- Incentive-compatible distributed greedy protocols. 2011
-
Braess's Paradox in Large Random Graphs
RANDOM STRUCTURES & ALGORITHMS
2010; 37 (4): 495-515
View details for DOI 10.1002/rsa.20325
View details for Web of Science ID 000284038400006
-
DESIGNING NETWORK PROTOCOLS FOR GOOD EQUILIBRIA
SIAM JOURNAL ON COMPUTING
2010; 39 (5): 1799-1832
View details for DOI 10.1137/08072721X
View details for Web of Science ID 000277584700005
- Efficiently learning mixtures of two Gaussians. 2010
- A New Look at Selfish Routing. 2010
- Estimating the unseen: A sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity (ECCC) 2010; 17: 180
- Settling the Polynomial Learnability of Mixtures of Gaussians. CoRR abs/1004.4223 2010
- Settling the Polynomial Learnability of Mixtures of Gaussians. 2010
- A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC) 2010; 17: 183
- On Learning Algorithms for Nash Equilibria. 2010
- Size Bounds for Conjunctive Queries with General Functional Dependencies. CoRR abs/0909.2030 2009
- On the complexity of Nash equilibria of action-graph games. 2009
- Size and treewidth bounds for conjunctive queries. 2009
-
Designing Networks with Good Equilibria
19th ACM-SIAM Symposium on Discrete Algorithms
SIAM. 2008: 854–863
View details for Web of Science ID 000281596900094
- On the Complexity of Nash Equilibria of Action-Graph Games. CoRR abs/0802.1604 2008
- Braess's paradox in large random graphs. 2006