Summer Workshops

ICME offers a variety of summer workshops open to students, external partners, and the wider community. Registration is required. Room locations will be announced to those who register. Please note that these are not Stanford for-credit courses. Our 2015 program includes:

  • A set of workshops offered as a part of the SIAM Conference on Mathematical and Computational Issues in the Geosciences on Sunday June 28. Registration is now closed.

  • A series of week-long workshops on the Fundamentals of Data Science in July and August, detailed below. Registration is on a first-come, first-served basis for students and ICME External Partners. Other community members are welcome to attend, space permitting.

Please email icme-summer-workshop@lists.stanford.edu if you have questions.

2015 Summer Workshop Series: Fundamentals of Data Science

  • Stats Week: July 27 (Monday) - July 31 (Friday)
  • Week: August 18 (Tuesday) - August 21 (Friday)
  • Week: August 24 (Monday) - August 28 (Friday)

Class times

  • AM: 9am - 12pm
  • PM: 1:30pm - 4:30pm
  • Workshops are organized into 75-minute sessions with breaks

Registration

  • Registration fee: $100 per class
  • Student/faculty rate: $50 per class

Workshop Descriptions

Introduction to Scientific Python

Instructor(s): Nick Henderson

Registration link

Description

This workshop is recommended for those who want to learn the basics of Python for use in math, science, or engineering applications. The goal of the workshop is to familiarize participants with Python’s tools for scientific computing and data science. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. Some prior programming experience is highly recommended. Topics covered include control flow, basic data structures, file I/O, and an introduction to NumPy/SciPy.

Topics

  • Variables
  • Functions
  • Data types: Strings, Lists, Tuples, Dictionaries, Sets
  • File input and output (I/O)
  • Classes
  • NumPy, SciPy, and Matplotlib
  • IPython, pandas, and statsmodels
  • Exception handling
  • Unit tests
  • Recursion
  • Other useful packages
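As a small, hedged taste of the material in the topic list above (functions, dictionaries, and file I/O), here is a sketch in plain Python; it is illustrative only, not actual workshop code:

```python
def word_counts(lines):
    """Count word occurrences across an iterable of lines using a dict."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

# File I/O: write a small file, then read it back line by line.
with open("sample.txt", "w") as f:
    f.write("to be or not to be\n")

with open("sample.txt") as f:
    counts = word_counts(f)

print(counts["to"])  # 2
```

The same idea extends naturally to the NumPy/pandas material later in the week, where loops like this give way to vectorized operations.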

Introduction to Programming in R

Instructor(s): Xiaotong Suo

Registration link

Description

This workshop is recommended for those who want to learn the basics of R programming for use in statistics, science, or engineering applications. The goal of the workshop is to familiarize participants with R's tools for scientific computing and data analysis. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. No prior programming experience is needed.

Topics

  • Introduction to R: definitions of variables, functions, and special values in R
  • Five basic objects: vector, matrix, factor, data frame, and list
  • Data input/output
  • Graphics (including ggplot2)
  • Statistical applications: summarizing data, fitting linear regressions, and interpreting the output
  • Metacharacters in R

Introduction to Statistical Data Analysis

Instructor(s): James Lambers

Registration link

Description

This workshop introduces participants to the use of statistical techniques for analysis of data sets. Upon completion, participants will be able to use statistics to synthesize useful information from raw data, make statistically reasonable inferences from data, and determine the validity of statistical analysis of data. The workshop is recommended especially for those in the humanities, social sciences, and life sciences who do not have a background in statistics, but may need to use statistics to work with data gathered as part of their coursework or research. Exercises on programming in R are included to give participants hands-on experience with applying concepts for data analysis. Participants will also be given an introduction to areas of current interest in statistics, such as "Big Data" and biostatistics. Topics covered include: measures of central tendency, dispersion and position within data sets, probability, random variables, distributions, sampling, confidence intervals, hypothesis testing, correlation, regression, analysis of variance, and time series.

Topics

  • Working with data sets, probability
  • Probability distributions, sampling
  • Confidence intervals, hypothesis testing
  • Chi-square distribution, correlation, regression, ANOVA
  • Time series, “Big Data”, biostatistics
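To make the confidence-interval topic concrete, here is a minimal sketch with invented sample data, using the normal approximation (z = 1.96 for 95% coverage); the workshop also covers t-based intervals, and its own exercises are in R:

```python
import math
import statistics

# Hypothetical measurements of some quantity.
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)            # sample standard deviation
half_width = 1.96 * sd / math.sqrt(n)  # normal-approximation half-width
ci = (mean - half_width, mean + half_width)
print(mean, ci)
```

The interval says: under repeated sampling, about 95% of intervals built this way would contain the true mean.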

Introduction to Matrix Computations for Data Scientists and Computational Engineers

Instructor(s): Margot Gerritsen, Anil Damle

Registration link

Description

This workshop briefly introduces participants to concepts in basic linear algebra and proceeds to discuss matrix computations and algorithms that underlie data science and computational engineering. Upon completion, participants will have an understanding of what is behind black box software packages and be able to make more informed decisions about what type of algorithm may be best for a given application. After familiarization with basic matrix and vector operations, the workshop discusses the foundational concepts on which many algorithms used in data mining, machine learning and deep learning are built. Examples include solving linear systems, eigenvalues and eigenvectors, and factorizations. Exercises help participants apply concepts to problems in optimization, machine learning and statistics. Recommended background: knowledge of vector calculus. Familiarity with MATLAB or Python is desirable.

Topics

  • Basics: matrices, vectors and fundamental operations: products, norms
  • Solving linear systems
  • Least squares
  • Eigenvalues and eigenvectors
  • Fundamental factorizations: QR, tall-skinny QR, SVD
  • Basic applications in optimization, machine learning, statistics
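As a sketch of how the "least squares" and "fundamental factorizations" topics connect, here is a least-squares problem solved two ways in NumPy, on synthetic data with a known solution:

```python
import numpy as np

# Synthetic overdetermined system with a known exact solution.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

# Via the QR factorization: A = QR, then solve the triangular system
# R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Via the SVD, which is what np.linalg.lstsq uses internally.
x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_qr, x_true), np.allclose(x_svd, x_true))
```

Knowing which factorization sits behind a black-box call like `lstsq` is exactly the kind of informed decision the workshop aims to enable.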

Introduction to Machine Learning

Instructor(s): Kari Bergen, Alex Ioannidis

Registration link

Description

This workshop presents the basics behind the application of modern machine learning algorithms. We will discuss a framework for reasoning about when to apply various machine learning techniques, emphasizing questions of over-fitting/under-fitting, regularization, interpretability, supervised/unsupervised methods, and handling of missing data. The principles behind the algorithms (the why and how of using them) will be discussed; the mathematical details underlying them, including proofs, will not. Unsupervised machine learning algorithms presented will include k-means clustering, principal component analysis (PCA), and independent component analysis (ICA). Supervised machine learning algorithms presented will include support vector machines (SVM), classification and regression trees (CART), boosting, bagging, and random forests. Imputation, the lasso, and cross-validation concepts will also be covered. The R programming language will be used for examples, though participants need not have prior exposure to R. Prerequisites: undergraduate-level linear algebra and statistics, and basic programming experience (R/MATLAB/Python).

Topics

  • Basic Concepts and Intro to Supervised Learning: linear and logistic regression
  • Penalties, regularization, sparsity (lasso, ridge, and elastic net)
  • Unsupervised learning: clustering (k-means and hierarchical) and dimensionality reduction (Principal Component Analysis, Independent Component Analysis, Self-Organizing Maps)
  • Unsupervised Learning: NMF and text classification
  • Supervised Learning: Loss functions, cross-validation (bias variance trade-off and learning curves), imputation (K-nearest neighbors and SVD), imbalanced data
  • Classification and regression trees (CART)
  • Ensemble methods (boosting, bagging, and random forests)
  • Support vector machines (SVM) and neural nets
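The workshop's examples are in R; as a language-agnostic illustration of one of the unsupervised methods above, here is a bare-bones k-means written in NumPy (Lloyd's algorithm, with a simple guard against empty clusters), applied to two invented, well-separated blobs:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: seed centers with k distinct data points,
    then alternate assignment and center-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared distance).
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its points; keep it if empty.
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

# Two well-separated blobs should each map to a single cluster.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels, centers = kmeans(X, k=2)
print(sorted(np.round(centers[:, 0])))
```

The supervised methods in the list (CART, SVM, ensembles) follow the same fit/predict pattern but use labeled training data.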

Data Visualization

Instructor(s): Dave Deriso

Registration link

Description

Bring your data to life with beautiful and interactive visualizations. This workshop is designed to provide practical experience in combining data science and graphic design to effectively communicate the knowledge buried inside complex data. Each lecture will explore a different set of free, industry-standard tools, for example d3.js, three.js, ggplot2, and Processing, enabling participants to think critically about how to architect their own interactive visualizations for data exploration, the web, presentations, and publications. Geared toward scientists and engineers, and with a particular emphasis on the web, this workshop assumes a background in programming in multiple languages (particularly R and JavaScript). Assignments are short and focus on visual experimentation with interesting data sets or the participants' own data. Topics: data, visualization, web. Prerequisites: some experience with general programming is required to understand the lectures and assignments.

Topics

  • Theory: Grammar of Graphics, Beautiful Evidence
  • Methods: Crash course in JavaScript and web development
  • Top-down Frameworks: D3.js, C3.js
  • Practical: Industry group project with real data
  • Bottom-up Frameworks: Processing.js, Three.js
  • Engines: Designing interactions, integrating physics, and code optimization
  • Practical: Creative project with Three.js and advanced physics
  • Example Portfolio: http://gabepoon.herokuapp.com/

Introduction to Optimization

Instructor(s): AJ Friend, Nick Henderson

Registration link

Description

This course introduces mathematical optimization and modeling, with a focus on convex optimization. We will use convexity as a starting point from which to consider some nonconvex problem types. The course will have a practical focus, with participants formulating and solving optimization problems early and often using standard modeling languages and solvers. By introducing common models from machine learning and other fields, the course aims to make students comfortable with optimization modeling so that they may use it for rapid prototyping and experimentation in their own work.

We'll cover simple but useful optimization algorithms, such as gradient descent, for when problems become too large for black-box solvers, and we'll present majorization-minimization as a useful framework for viewing optimization algorithms for both convex and non-convex problems. Parallel and stochastic gradient descent will be covered in the context of distributed and large-scale optimization.

Time permitting, we will also discuss methods to solve nonconvex models for nonnegative matrix factorization, matrix completion, and neural networks. Students should be comfortable with linear algebra, differential multivariable calculus, and basic probability and statistics. Experience with Python will be helpful, but not required.

Topics

  • Varieties of mathematical optimization
  • Convexity of functions and sets
  • Convex optimization modeling with CVXPY
  • (Stochastic) gradient descent and basic distributed optimization
  • In-depth examples from machine learning, statistics, and other fields
  • Applications of bi-convexity and non-convex gradient descent
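As a sketch of the gradient-descent topic, here is plain gradient descent on a least-squares objective f(x) = (1/2)||Ax - b||², whose gradient is Aᵀ(Ax - b), on synthetic data; in the workshop, a problem like this would typically be modeled and solved with CVXPY instead:

```python
import numpy as np

# Synthetic consistent least-squares problem with a known solution.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
x_star = rng.normal(size=5)
b = A @ x_star

x = np.zeros(5)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # step 1/L, L = ||A||_2^2
for _ in range(2000):
    x = x - step * (A.T @ (A @ x - b))  # gradient step: x -= t * grad f(x)

print(np.allclose(x, x_star, atol=1e-6))
```

The fixed step 1/L, with L the Lipschitz constant of the gradient, guarantees convergence here; stochastic and distributed variants of this same update are the large-scale topics listed above.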