Bio


Benjamin Van Roy is a Professor of Electrical Engineering, Management Science and Engineering, and, by courtesy, Computer Science, at Stanford University, where he has served on the faculty since 1998. His research focuses on understanding how an agent interacting with a poorly understood environment can learn over time to make effective decisions. He is interested in questions concerning what is possible or impossible as well as how to design efficient learning algorithms that achieve the possible. His research contributes to the fields of reinforcement learning, online optimization, and approximate dynamic programming, and offers means to addressing central problems of artificial intelligence.

He has graduated fifteen doctoral students, published over forty articles in peer-reviewed journals, and been listed as an inventor in over a dozen patents. He has served on the editorial boards of Machine Learning, Mathematics of Operations Research, and Operations Research, for which he has also served as editor of the Financial Engineering Area. He has also founded and/or led research programs at several technology companies, including Unica (acquired by IBM), Enuvis (acquired by SiRF), and Morgan Stanley.

He received the SB in Computer Science and Engineering and the SM and PhD in Electrical Engineering and Computer Science, all from MIT. He has been a recipient of the MIT George C. Newton Undergraduate Laboratory Project Award, the MIT Morris J. Levin Memorial Master's Thesis Award, the MIT George M. Sprowls Doctoral Dissertation Award, the National Science Foundation CAREER Award, the Stanford Tau Beta Pi Award for Excellence in Undergraduate Teaching, and the Management Science and Engineering Department's Graduate Teaching Award. He is an INFORMS Fellow and has been a Frederick E. Terman Fellow and a David Morgenthaler II Faculty Scholar. He has held visiting positions as the Wolfgang and Helga Gaul Visiting Professor at the University of Karlsruhe and as the Chin Sophonpanich Foundation Professor and the InTouch Professor at Chulalongkorn University.

Academic Appointments


Program Affiliations


  • Institute for Computational and Mathematical Engineering (ICME)

Professional Education


  • BS, Massachusetts Institute of Technology, Computer Science and Engineering (1993)
  • MS, Massachusetts Institute of Technology, Electrical Engineering and Computer Science (1995)
  • PhD, Massachusetts Institute of Technology, Electrical Engineering and Computer Science (1998)

2018-19 Courses


Stanford Advisees


All Publications


  • Learning to Optimize Via Posterior Sampling. Russo, D., Roy, B., Van
  • Reputation Markets Yan, X., Van Roy, B.
  • Directed Principal Component Analysis. Kao, Y., H., Roy, B., Van
  • Adaptive Execution: Exploration and Learning of Price Impact. Park, B., Roy, B., Van
  • Adaptive Execution: Exploration and Learning of Price Impact OPERATIONS RESEARCH Park, B., Van Roy, B. 2015; 63 (5): 1058-1076
  • Learning to Optimize via Posterior Sampling MATHEMATICS OF OPERATIONS RESEARCH Russo, D., Van Roy, B. 2014; 39 (4): 1221-1243
  • Directed Principal Component Analysis OPERATIONS RESEARCH Kao, Y., Van Roy, B. 2014; 62 (4): 957-972
  • Learning a factor model via regularized PCA MACHINE LEARNING Kao, Y., Van Roy, B. 2013; 91 (3): 279-303
  • Eluder Dimension and the Sample Complexity of Optimistic Exploration to appear in NIPS. Russo, D., Roy, B., Van 2013
  • Efficient Exploration and Value Function Generalization in Deterministic Systems abridged version to appear in NIPS. Wen, Z., Roy, B., Van 2013
  • (More) Efficient Reinforcement Learning via Posterior Sampling to appear in NIPS. Osband, I., Russo, D., Roy, B., Van 2013
  • Strategic execution in the presence of an uninformed arbitrageur JOURNAL OF FINANCIAL MARKETS Moallemi, C. C., Park, B., Van Roy, B. 2012; 15 (4): 361-391
  • Intermediated Blind Portfolio Auctions MANAGEMENT SCIENCE Padilla, M., Van Roy, B. 2012; 58 (9): 1747-1760
  • Portfolio selection with qualitative input JOURNAL OF BANKING & FINANCE Chiarawongse, A., Kiatsupaibul, S., Tirapat, S., Van Roy, B. 2012; 36 (2): 489-496
  • Efficient Reinforcement Learning for High Dimensional Linear Systems Advances in Neural Information Processing Systems 25 Ibrahimi, M., Javanmard, A., Roy, B., Van MIT Press. 2012
  • Approximate Dynamic Programming for Optimizing Oil Production Chapter 25 in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control Wen, Z., Durlofsky, L., J., Roy, B., Van, Aziz, K. edited by Lewis, F., L., Liu, D. Wiley-IEEE Press. 2012
  • Industry dynamics: Foundations for models with an infinite number of firms JOURNAL OF ECONOMIC THEORY Weintraub, G. Y., Benkard, C. L., Van Roy, B. 2011; 146 (5): 1965-1994
  • Resource Allocation via Message Passing INFORMS JOURNAL ON COMPUTING Moallemi, C. C., Van Roy, B. 2011; 23 (2): 205-219
  • Control of Diffusions via Linear Programming in Stochastic Programming: The State of the Art, in Honor of George B. Dantzig Han, J., Van Roy, B. Springer. 2011: 329–354
  • Use of Approximate Dynamic Programming for Production Optimization Wen, Z., Durlofsky, L., J., Roy, B., Van, Aziz, K. 2011
  • Manipulation Robustness of Collaborative Filtering MANAGEMENT SCIENCE Van Roy, B., Yan, X. 2010; 56 (11): 1911-1929
  • Investment and Market Structure in Industries with Congestion OPERATIONS RESEARCH Johari, R., Weintraub, G. Y., Van Roy, B. 2010; 58 (5): 1303-1317
  • On Regression-Based Stopping Times DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS Van Roy, B. 2010; 20 (3): 307-324
  • Computational Methods for Oblivious Equilibrium OPERATIONS RESEARCH Weintraub, G. Y., Benkard, C. L., Van Roy, B. 2010; 58 (4): 1247-1265
  • Universal Reinforcement Learning IEEE TRANSACTIONS ON INFORMATION THEORY Farias, V. F., Moallemi, C. C., Van Roy, B., Weissman, T. 2010; 56 (5): 2441-2454
  • Convergence of Min-Sum Message-Passing for Convex Optimization IEEE TRANSACTIONS ON INFORMATION THEORY Moallemi, C. C., Van Roy, B. 2010; 56 (4): 2041-2050
  • Dynamic Pricing with a Prior on Market Response OPERATIONS RESEARCH Farias, V. F., Van Roy, B. 2010; 58 (1): 16-29
  • Convergence of the Min-Sum Algorithm for Convex Optimization IEEE Transactions on Information Theory Moallemi, C., C., Roy, B., Van 2010; 56 (4): 2041-2050
  • Convergence of Min-Sum Message Passing for Quadratic Optimization IEEE TRANSACTIONS ON INFORMATION THEORY Moallemi, C. C., Van Roy, B. 2009; 55 (5): 2413-2423
  • Directed Regression Advances in Neural Information Processing Systems 22 Kao, Y., H., Van Roy, B. MIT Press. 2009: 889–897
  • Manipulation-Resistant Collaborative Filtering Systems Roy, B., Van, Yan, X. 2009
  • Markov Perfect Industry Dynamics With Many Firms ECONOMETRICA Weintraub, G. Y., Benkard, C. L., Van Roy, B. 2008; 76 (6): 1375-1411

    View details for DOI 10.3982/ECTA6158

    View details for Web of Science ID 000261055400005

  • Capacity of the trapdoor channel with feedback IEEE International Symposium on Information Theory Permuter, H., Cuff, P., Van Roy, B., Weissman, T. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2008: 3150–65
  • Approximate and Data-Driven Dynamic Programming for Queueing Networks Moallemi, C., C., Kumar, S., Van Roy, B. 2008
  • A short proof of optimality for the MIN cache replacement algorithm INFORMATION PROCESSING LETTERS Van Roy, B. 2007; 102 (2-3): 72-73
  • Managing the quality of a resource with stock and flow controls JOURNAL OF PUBLIC ECONOMICS Keohane, N., Van Roy, B., Zeckhauser, R. 2007; 91 (3-4): 541-569
  • Capacity and zero-error capacity of the chemical channel with feedback IEEE International Symposium on Information Theory Permuter, H., Cuff, P., Van Roy, B., Weissman, T. IEEE. 2007: 1866–1870
  • An Approximate Dynamic Programming Approach to Network Revenue Management Farias, V., F., Van Roy, B. 2007
  • Consensus propagation 19th Annual Conference on Neural Information Processing Systems (NIPS 05) Moallemi, C. C., Van Roy, B. IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. 2006: 4753–66
  • A cost-shaping linear program for average-cost approximate dynamic programming with performance guarantees MATHEMATICS OF OPERATIONS RESEARCH de Farias, D. P., Van Roy, B. 2006; 31 (3): 597-620
  • Performance loss bounds for approximate value iteration with state aggregation MATHEMATICS OF OPERATIONS RESEARCH Van Roy, B. 2006; 31 (2): 234-244
  • A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS Choi, D., Van Roy, B. 2006; 16 (2): 207-239
  • Approximation algorithms for dynamic resource allocation OPERATIONS RESEARCH LETTERS Farias, V. F., Van Roy, B. 2006; 34 (2): 180-190
  • A nonparametric approach to multiproduct pricing OPERATIONS RESEARCH Rusmevichientong, P., Van Roy, B., Glynn, P. W. 2006; 54 (1): 82-98
  • Tetris: A Study of Randomized Constraint Sampling in Probabilistic and Randomized Methods for Design Under Uncertainty Farias, V., F., Van Roy, B. edited by Calafiore, G., Dabbene, F. Springer-Verlag. 2006
  • Opportunities and Challenges in Using Online Preference Data for Vehicle Pricing: A Case Study at General Motors Journal of Revenue and Pricing Management Rusmevichientong, P., Salisbury, J., A., Truss, L., T., Roy, B., Van, Glynn, P., W. 2006; 5 (1): 45-61
  • Consensus Propagation Advances in Neural Information Processing Systems 18 Moallemi, C., C., Van Roy, B. MIT Press. 2006
  • TD(0) Leads to Better Policies than Approximate Value Iteration Advances in Neural Information Processing Systems 18 Roy, B., Van MIT Press. 2006
  • Oblivious Equilibrium: A Mean Field Approximation for Large Scale Dynamic Games Advances in Neural Information Processing Systems 18 Weintraub, G., Y., Benkard, C., L., Roy, B., Van 2006
  • A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning Discrete Event Dynamic Systems Choi, D., S., Van Roy, B. 2006; 16 (2)
  • A Non-Parametric Approach to Multi-Product Pricing Operations Research Rusmevichientong, P., Van Roy, B., Glynn, P., W. 2006; 54 (1): 82-98
  • An approximate dynamic programming approach to decentralized control of stochastic systems Workshop on Control of Uncertain Systems Cogill, R., Rotkowitz, M., Van Roy, B., Lall, S. SPRINGER-VERLAG BERLIN. 2006: 243–256
  • A universal scheme for learning IEEE International Symposium on Information Theory and Its Applications Farias, V. F., Moallemi, C. C., Van Roy, B., Weissman, T. IEEE. 2005: 1158–1162
  • Solitaire: Man Versus Machine Advances in Neural Information Processing Systems 17 Yan, X., Diaconis, P., Rusmevichientong, P., Roy, B., Van MIT Press. 2005
  • A Linear Program for Bellman Error Minimization with Performance Guarantees Advances in Neural Information Processing Systems 17 de Farias, D., P., Van Roy, B. MIT Press. 2005
  • On constraint sampling in the linear programming approach to approximate dynamic programming MATHEMATICS OF OPERATIONS RESEARCH de Farias, D. P., Van Roy, B. 2004; 29 (3): 462-478
  • Making eigenvector-based reputation systems robust to collusion 3rd International Workshop on Algorithms and Models for the Web-Graph Zhang, H., Goel, A., Govindan, R., Mason, K., Van Roy, B. SPRINGER-VERLAG BERLIN. 2004: 92–104
  • Approximate Dynamic Programming for High-Dimensional Dynamic Resource Allocation Problems in Handbook of Learning and Approximate Dynamic Programming Powell, W., B., Van Roy, B. edited by Si, J., Barto, A., G., Powell, W., B. Wiley-IEEE Press, Hoboken, NJ. 2004: 261–279
  • An Approximate Dynamic Programming Approach to Decentralized Control of Stochastic Systems Cogill, R., Rotkowitz, M., Roy, B., Van, Lall, S. 2004
  • Distributed optimization in adaptive networks 17th Annual Conference on Neural Information Processing Systems (NIPS) Moallemi, C. C., Van Roy, B. M I T PRESS. 2004: 887–894
  • The linear programming approach to approximate dynamic programming OPERATIONS RESEARCH de Farias, D. P., Van Roy, B. 2003; 51 (6): 850-865
  • Decentralized decision-making in a large team with local information GAMES AND ECONOMIC BEHAVIOR Rusmevichientong, P., Van Roy, B. 2003; 43 (2): 266-295
  • On constraint sampling in the linear programming approach to approximate linear programming 42nd IEEE Conference on Decision and Control de Farias, D. P., Van Roy, B. IEEE. 2003: 2441–2446
  • Decentralized Protocols for Optimization of Sensor Networks Moallemi, C., C., Van Roy, B. 2003
  • Approximate Linear Programming for Average-Cost Dynamic Programming Advances in Neural Information Processing Systems 15 de Farias, D., P., Van Roy, B. MIT Press. 2003
  • Improving Eigenvector-Based Reputation Systems Against Collusion Workshop on Algorithms and Models for the Web Graph Zhang, H., Goel, A., Govindan, R., Mason, K., Roy, B., Van 2003
  • Book Review: Self-Learning Control of Finite Markov Chains, by A. S. Poznyak, K. Najim, and E. Gomez-Ramirez Automatica Roy, B., Van 2003; 39 (2): 373-376
  • On average versus discounted reward temporal-difference learning MACHINE LEARNING Tsitsiklis, J. N., Van Roy, B. 2002; 49 (2-3): 179-191
  • Approximate dynamic programming via linear programming 15th Annual Conference on Neural Information Processing Systems (NIPS) de Farias, D. P., Van Roy, B. M I T PRESS. 2002: 689–695
  • Algorithms for GPS Operation Indoors and Downtown GPS Solutions Agarwal, N., Basch, J., Beckmann, P., Bharti, P., Bloebaum, S., Casadei, S., Van Roy, B. 2002; 6 (3): 149-160
  • Regression methods for pricing complex American-Style options IEEE TRANSACTIONS ON NEURAL NETWORKS Tsitsiklis, J. N., Van Roy, B. 2001; 12 (4): 694-703

    Abstract

    We introduce and analyze a simulation-based approximate dynamic programming method for pricing complex American-style options, with a possibly high-dimensional underlying state space. We work within a finitely parameterized family of approximate value functions, and introduce a variant of value iteration, adapted to this parametric setting. We also introduce a related method which uses a single (parameterized) value function, which is a function of the time-state pair, as opposed to using a separate (independently parameterized) value function for each time. Our methods involve the evaluation of value functions at a finite set, consisting of "representative" elements of the state space. We show that with an arbitrary choice of this set, the approximation error can grow exponentially with the time horizon (time to expiration). On the other hand, if representative states are chosen by simulating the state process using the underlying risk-neutral probability distribution, then the approximation error remains bounded.

    View details for Web of Science ID 000169875300005

    View details for PubMedID 18249905

  • An Analysis of Belief Propagation on the Turbo Decoding Graph with Gaussian Densities IEEE Transactions on Information Theory Rusmevichientong, P., Van Roy, B. 2001; 47 (2): 745-765
  • Neuro-Dynamic Programming: Overview and Recent Trends in Handbook of Markov Decision Processes: Methods and Applications Van Roy, B. edited by Feinberg, E., Shwartz, A. Kluwer. 2001
  • A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning Choi, D., S., Van Roy, B. 2001
  • A Tractable POMDP for a Class of Sequencing Problems Rusmevichientong, P., Van Roy, B. 2001
  • On the existence of fixed points for approximate value iteration and temporal-difference learning JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS de Farias, D. P., Van Roy, B. 2000; 105 (3): 589-608
  • The optimal harvesting of environmental bads 39th IEEE Conference on Decision and Control Keohane, N., Van Roy, B., Zeckhauser, R. IEEE. 2000: 234–239
  • Fixed Points for Approximate Value Iteration and Temporal-Difference Learning de Farias, D., P., Van Roy, B. 2000
  • Approximate value iteration with randomized policies 39th IEEE Conference on Decision and Control de Farias, D. P., Van Roy, B. IEEE. 2000: 3421–3426
  • Temporal-difference learning and applications in finance Computational Finance 1999 Conference Van Roy, B. M I T PRESS. 2000: 447–461
  • Approximate value iteration and temporal-difference learning Symposium on Adaptive Systems for Signal Processing, Communications, and Control (AS-SPCC) de Farias, D. P., Van Roy, B. IEEE. 2000: 48–51
  • An analysis of turbo decoding with Gaussian densities 13th Annual Conference on Neural Information Processing Systems (NIPS) Rusmevichientong, P., Van Roy, B. M I T PRESS. 2000: 575–581
  • Average cost temporal-difference learning AUTOMATICA Tsitsiklis, J. N., Van Roy, B. 1999; 35 (11): 1799-1808
  • Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives IEEE TRANSACTIONS ON AUTOMATIC CONTROL Tsitsiklis, J. N., Van Roy, B. 1999; 44 (10): 1840-1851
  • Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives IEEE Transactions on Automatic Control Tsitsiklis, J., N., Van Roy, B. 1999; 44 (10): 1840-1851
  • An analysis of temporal-difference learning with function approximation IEEE TRANSACTIONS ON AUTOMATIC CONTROL Tsitsiklis, J. N., VANROY, B. 1997; 42 (5): 674-690
  • Neuro-dynamic programming overview and a case study in optimal stopping 36th IEEE Conference on Decision and Control Tsitsiklis, J. N., Van Roy, B. IEEE. 1997: 1181–1186
  • A neuro-dynamic programming approach to retailer inventory management 36th IEEE Conference on Decision and Control Van Roy, B., Bertsekas, D. P., Lee, Y., Tsitsiklis, J. N. IEEE. 1997: 4052–4057
  • Overview of Neuro-Dynamic Programming and a Case Study in Optimal Stopping Tsitsiklis, J., N., Van Roy, B. 1997
  • A Neuro-Dynamic Programming Approach to Retailer Inventory Management Van Roy, B., Bertsekas, D., P., Lee, Y., Tsitsiklis, J., N. 1997
  • Solving Data Mining Problems Through Pattern Recognition Prentice-Hall Kennedy, R., Lee, Y., Roy, B., Van, Reed, C., Lippman, R. 1997
  • Analysis of temporal-difference learning with function approximation 10th Annual Conference on Neural Information Processing Systems (NIPS) Tsitsiklis, J. N., VANROY, B. M I T PRESS. 1997: 1075–1081
  • Average cost temporal-difference learning 36th IEEE Conference on Decision and Control Tsitsiklis, J. N., Van Roy, B. IEEE. 1997: 498–502
  • Approximate solutions to optimal stopping problems 10th Annual Conference on Neural Information Processing Systems (NIPS) Tsitsiklis, J. N., VANROY, B. M I T PRESS. 1997: 1082–1088
  • Feature-based methods for large scale dynamic programming MACHINE LEARNING Tsitsiklis, J. N., VANROY, B. 1996; 22 (1-3): 59-94
  • Stable linear approximations to dynamic programming for stochastic control problems with local transitions 9th Annual Conference on Neural Information Processing Systems (NIPS) VANROY, B., Tsitsiklis, J. N. M I T PRESS. 1996: 1045–1051
  • Feature-based methods for large scale dynamic programming 34th IEEE Conference on Decision and Control Tsitsiklis, J. N., VANROY, B. IEEE. 1995: 565–567
  • Solving Pattern Recognition Problems Unica Kennedy, R., Lee, Y., Reed, C., Van Roy, B. 1995