Computational Statistics (Stat G6104)

Spring 2012


Professor: Liam Paninski; Office: 1255 Amsterdam Ave, Rm 1028. Email: liam at stat dot columbia dot edu. Hours by appointment.
TA: Vince Dorie. Email: vjd2106 at columbia dot edu. Hours: Th 1-5, Rm 1022 SSW.
Time: Tu+Th, 10:35am-11:50am
Place: 337 Mudd

Course goals (partially adapted from the preface of Givens and Hoeting's book): Computation plays a central role in modern statistics and machine learning. This course aims to cover topics needed to develop a broad working knowledge of modern computational statistics. We seek to develop a practical understanding of how and why existing methods work, enabling effective use of modern statistical methods. Achieving these goals requires familiarity with diverse topics in statistical computing, computational statistics, computer science, and numerical analysis. Our choice of topics reflects our view of what is central to this evolving field, and what will be interesting and useful. A key theme is scalability to high-dimensional problems, which arise in many recent applications.
Some important topics will be omitted because high-quality solutions are already available in most software. For example, the generation of pseudo-random numbers is a classic topic, but existing methods built in to standard software packages will suffice for our needs. On the other hand, we will spend a bit of time on some classical numerical linear algebra ideas, because choosing the right method for solving a linear equation (for example) can have a huge impact on the time it takes to solve a problem in practice, particularly if there is some special structure that we can exploit.
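As a small illustration of the point about special structure, the sketch below solves the same tridiagonal linear system twice: once with a generic dense Gaussian elimination, which costs O(n^3) operations, and once with the Thomas algorithm, which exploits the banded structure and costs O(n). The function names, the test matrix (4 on the diagonal, -1 on the off-diagonals), and the problem sizes are illustrative assumptions, not course material, and the banded solver skips pivoting on the assumption that the matrix is diagonally dominant.

```python
import time


def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) time (Thomas algorithm).

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    No pivoting is done, so this assumes a well-behaved matrix,
    e.g. a diagonally dominant one.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting, O(n^3)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def build_example(n):
    """Hypothetical test problem: 4 on the diagonal, -1 on the off-diagonals."""
    lower = [0.0] + [-1.0] * (n - 1)
    diag = [4.0] * n
    upper = [-1.0] * (n - 1) + [0.0]
    rhs = [1.0] * n
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = diag[i]
        if i > 0:
            A[i][i - 1] = lower[i]
        if i < n - 1:
            A[i][i + 1] = upper[i]
    return lower, diag, upper, rhs, A


if __name__ == "__main__":
    for n in (50, 100, 200):
        lower, diag, upper, rhs, A = build_example(n)
        t0 = time.perf_counter()
        x_dense = solve_dense(A, rhs)
        t1 = time.perf_counter()
        x_band = solve_tridiagonal(lower, diag, upper, rhs)
        t2 = time.perf_counter()
        err = max(abs(u - v) for u, v in zip(x_dense, x_band))
        print(f"n={n}: dense {t1 - t0:.4f}s, banded {t2 - t1:.6f}s, max diff {err:.2e}")
```

Running the script prints the two timings for each size; the dense time should grow roughly eightfold each time n doubles, while the banded time only doubles. In practice one would call a library routine for banded or sparse systems rather than hand-written loops, but the scaling argument is the same.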

Audience: The course will be aimed at first- and second-year students in the Statistics Ph.D. program. Students from other departments or programs are welcome, space permitting; instructor permission required.

Background: The level of mathematics expected does not extend much beyond standard calculus and linear algebra. Breadth of mathematical training is more helpful than depth; we prefer to focus on the big picture of how algorithms work and to sweep under the rug some of the nitty-gritty numerical details. The expected level of statistics is equivalent to that obtained by a graduate student in his or her first year of study of the theory of statistics and probability. An understanding of maximum likelihood methods, Bayesian methods, elementary asymptotic theory, Markov chains, and linear models is most important.

Programming: With respect to computer programming, good students can learn as they go. We'll largely forgo language-specific examples, algorithms, and coding; I won't be teaching much programming per se, but rather will focus on the overarching ideas and techniques. For the exercises and projects, I recommend you choose a high-level, interactive package that permits the flexible design of graphical displays and includes supporting statistics and probability functions, e.g., R or MATLAB.

Evaluation: Final grades will be based on class participation, a few short exercises, and a student project.


Topics:
Deterministic optimization
- Newton-Raphson, conjugate gradients, preconditioning, quasi-Newton methods, Fisher scoring, EM and its various derivatives
- Numerical recipes for linear algebra: matrix inverse, LU, Cholesky decompositions, low-rank updates, SVD, banded matrices, Toeplitz matrices and the FFT, Kronecker products (separable matrices), sparse matrix solvers
- Convex analysis: convex functions, duality, KKT conditions, interior point methods, projected gradients, augmented Lagrangian methods, convex relaxations
- Applications: support vector machines, splines, kriging, isotonic regression, LASSO and LARS regression

Dynamic programming: hidden Markov models, forward-backward algorithm, Kalman filter, Markov random fields

Stochastic optimization: Robbins-Monro and Kiefer-Wolfowitz algorithms, simulated annealing, stochastic gradient methods

Deterministic integration: Gaussian quadrature, quasi-Monte Carlo. Application: expectation propagation

Monte Carlo methods
- Rejection sampling, importance sampling, variance reduction methods (Rao-Blackwellization, stratified sampling)
- MCMC methods: Gibbs sampling, Metropolis-Hastings, Langevin methods, Hamiltonian Monte Carlo, slice sampling. Implementation issues: burn-in, monitoring convergence
- Sequential Monte Carlo (particle filtering)

References:
Givens and Hoeting (2005), Computational Statistics.
Robert and Casella (2004), Monte Carlo Statistical Methods.
Boyd and Vandenberghe (2004), Convex Optimization.
Press et al, Numerical Recipes.
Sun and Yuan (2006), Optimization Theory and Methods.
Fletcher (2000), Practical Methods of Optimization.
Searle (2006), Matrix Algebra Useful for Statistics.
Shewchuk (1994), An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.
Boyd et al (2011), Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.


Schedule

Date | Topic | Reading | Notes
Jan 17 | Introduction; linesearch | Ch. 1-2 of Givens+Hoeting | See Sun and Yuan (2006) for further details on convergence analysis
Jan 19 | Choosing search directions: Newton, generalized linear models, inexact Newton, quasi-Newton, Fisher scoring, BFGS | | See Vandenberghe's notes for some further background
Jan 24 | No class | |
Jan 26 | Exploiting special structure to solve Newton linear equations more efficiently: banded, sparse, low-rank (etc.) matrices | |
Jan 31 | Conjugate gradients | Shewchuk (1994) |
Feb 2 | Preconditioning. Toeplitz and circulant matrices. Gaussian process regression | | See Rasmussen and Williams (2006) for more background on GP regression; Chan and Ng (1996) on PCG for Toeplitz systems
Feb 7 | Expectation maximization | | First HW due: exploiting banded matrices in Poisson regression
Feb 9 | Constrained and non-smooth optimization: convex functions; interior point methods | Boyd and Vandenberghe, ch. 3-4 |
Feb 14 | Linear, quadratic, and semidefinite programs; LASSO methods | Efron et al (2004), Friedman et al (2010) |
Feb 16 | Convex duality, KKT conditions | Boyd and Vandenberghe, ch. 5 |
Feb 21 | A brief tour of some advanced topics: proximal methods, dual decomposition, and convex relaxation | | Background: Bach et al (2011), Boyd et al (2011), Luo et al (2010)
Feb 23 | No class | |
Feb 28-Mar 1 | Graphical models; dynamic programming | Rabiner tutorial, Jordan (2004) | Background: Wainwright and Jordan (2008), Smith et al (2012)
Mar 6 | Kalman filter. Monte Carlo basics | Ch. 1-4 of Robert and Casella | Background: Devroye (1986). HW 2 due: coordinate descent vs. interior point methods
Mar 8 | Rejection and importance sampling; Metropolis-Hastings | Ch. 7 of Robert and Casella | Background: ch. 6 of Robert and Casella
Mar 13-15 | No class | | Spring break
Mar 20 | Short project presentations | |
Mar 22 | No class | |
Mar 27 | Gibbs sampling: slice sampling, Bayesian lasso, hit and run | | Background: Park and Casella (2008), Neal (2003), Papandreou and Yuille (2010)
Mar 29 | MCMC diagnostics | Gelman and Shirley (2011); see courseworks for additional reading | Guest lecture by Prof. Gelman; slides here.
Apr 3, 10 | Rao-Blackwellization; sequential Monte Carlo; auxiliary particle filter | Doucet and Johansen (2011), Pitt and Shephard (1999) | Further reading collected by A. Doucet
Apr 5 | Hamiltonian Monte Carlo | Neal (2010) | Guest lecture by Matt Hoffman. Further background: Hoffman and Gelman (2012)
Apr 10 | Sequential Monte Carlo, cont'd | | HW 3 due: Gibbs sampling vs. random walk MCMC
Apr 17 | A case study in large-scale Bayesian computation | Madigan et al. (2010) | Guest lecture by Prof. Madigan and I. Zorych
Apr 19 | Some advanced MCMC topics: RJMCMC, bridge sampling, annealed importance sampling, adaptive simulated tempering | Ch. 8 in Givens+Hoeting | Further reading: Gelman and Meng (1998), Neal (2001), Salakhutdinov (2010)
Apr 24 | Online methods; expectation propagation; deterministic integration methods; Gaussian quadrature | Ch. 5 in Givens+Hoeting | Further reading: Bottou (2011); Sudderth (2002)
May 1,3 | Project presentations | | Send me your report as a .pdf by May 8.