Some important topics will be omitted because high-quality solutions are already available in most software. For example, the generation of pseudo-random numbers is a classic topic, but the methods built into standard software packages will suffice for our needs. On the other hand, we will spend some time on classical numerical linear algebra, because choosing the right method for solving a linear system (for example) can have a huge impact on how long it takes to solve a problem in practice, particularly if the matrix has special structure (banded, sparse, Toeplitz, low-rank) that we can exploit.
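As a small illustration of why exploiting structure matters, here is a sketch in plain Python (written without NumPy/SciPy so it runs anywhere; all function names are hypothetical, chosen for this example). Generic Gaussian elimination ignores structure and costs O(n^3) operations, while the Thomas algorithm for a tridiagonal (banded) system costs only O(n). The Thomas variant does no pivoting, so it assumes a well-behaved matrix, for example a diagonally dominant one.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting: O(n^3) work, ignores structure."""
    n = len(b)
    M = [row[:] for row in A]  # copy so the caller's matrix is untouched
    x = b[:]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |pivot| to position k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate entries below the pivot.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
            x[i] -= f * x[k]
    # Back substitution on the upper-triangular system.
    for i in range(n - 1, -1, -1):
        s = x[i] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def solve_tridiagonal(lower, diag, upper, b):
    """Thomas algorithm: O(n) work for a tridiagonal system (no pivoting).

    Row i reads: lower[i-1]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = b[i],
    so lower and upper each have length n - 1.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = b[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (b[i] - lower[i - 1] * d[i - 1]) / denom
    # Backward sweep.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiag_to_dense(lower, diag, upper):
    """Expand the three diagonals into a full n x n matrix (for comparison only)."""
    n = len(diag)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = diag[i]
        if i > 0:
            A[i][i - 1] = lower[i - 1]
        if i < n - 1:
            A[i][i + 1] = upper[i]
    return A


# A diagonally dominant tridiagonal system, solved both ways.
n = 6
lower = [-1.0] * (n - 1)
diag = [4.0] * n
upper = [-1.0] * (n - 1)
b = [float(i + 1) for i in range(n)]

x_fast = solve_tridiagonal(lower, diag, upper, b)
x_dense = solve_dense(tridiag_to_dense(lower, diag, upper), b)
print(max(abs(u - v) for u, v in zip(x_fast, x_dense)))
```

The two solutions should agree up to floating-point round-off; the difference is in cost, which grows linearly in n for the banded solver and cubically for the dense one. In practice one would call a library routine such as `scipy.linalg.solve_banded` rather than hand-rolling either solver.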

Deterministic optimization

- Newton-Raphson, conjugate gradients, preconditioning, quasi-Newton methods, Fisher scoring, the EM algorithm and its variants

- Numerical recipes for linear algebra: matrix inverses, LU and Cholesky decompositions, low-rank updates, SVD, banded matrices, Toeplitz matrices and the FFT, Kronecker products (separable matrices), sparse matrix solvers

- Convex analysis: convex functions, duality, KKT conditions, interior point methods, projected gradients, augmented Lagrangian methods, convex relaxations

- Applications: support vector machines, splines, Gaussian processes, isotonic regression, LASSO and LARS regression

Graphical models: dynamic programming, hidden Markov models, forward-backward algorithm, Kalman filter, Markov random fields

Stochastic optimization: Robbins-Monro and Kiefer-Wolfowitz algorithms, simulated annealing, stochastic gradient methods

Deterministic integration: Gaussian quadrature, quasi-Monte Carlo. Application: expectation propagation

Monte Carlo methods

- Rejection sampling, importance sampling, variance reduction methods (Rao-Blackwellization, stratified sampling)

- MCMC methods: Gibbs sampling, Metropolis-Hastings, Langevin methods, Hamiltonian Monte Carlo, slice sampling. Implementation issues: burn-in, monitoring convergence

- Sequential Monte Carlo (particle filtering)

- Variational and stochastic variational inference

Givens and Hoeting (2005), Computational Statistics.

Robert and Casella (2004), Monte Carlo Statistical Methods.

Boyd and Vandenberghe (2004), Convex Optimization.

Press et al., Numerical Recipes.

Sun and Yuan (2006), Optimization Theory and Methods.

Fletcher (2000), Practical Methods of Optimization.

Searle (2006), Matrix Algebra Useful for Statistics.

Spall (2003), Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control.

Shewchuk (1994), An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.

Boyd et al. (2011), Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.

Date | Topic | Reading | Notes |
---|---|---|---|
Jan 16 | Introduction | | |
Jan 23 | Gaussian processes and Bayesian optimization | Loper et al. (2020) for fast 1-D GP inference; Mahsereci and Hennig (2016) on Bayesian line search; Frazier (2018) on Bayesian optimization. | See Rasmussen and Williams (2006) for more background on GP regression. Also notes by John Cunningham, Gardner et al. (2019), and some nice demos by Goertler et al. (2019) and Agnihotri and Batra (2020). |
Jan 30 | LASSO methods | Efron et al. (2004), Zou et al. (2007), Friedman et al. (2010), Bradley et al. (2011), Tibshirani et al. (2012) | More reading: Bach et al. (2011), Kim et al. (2009), Tseng (2001), Osborne et al. (2000) |
Feb 6 | Optimal transport | Peyre and Cuturi (2020), Nutz (2022), Arjovsky et al. (2017) | |
Feb 13 | Graphical models; dynamic programming; message passing | Rabiner tutorial, Wainwright lecture notes | Background: Wainwright and Jordan (2008); MP and AMP notes by A. Maleki; Sarkka and Garcia-Fernandez (2019) on parallelizing HMM inference; Schniter et al. (2016) and Rush and Venkataramanan (2018) on VAMP and AMP |
Feb 13 | Graph neural networks | Sanchez-Lengeling et al. (2021) | |
Feb 20 | No class | | |
Feb 27 | Monte Carlo and Sequential Monte Carlo | Doucet and Johansen (2011), Pitt and Shephard (1999), Naesseth et al. (2017) | Further reading collected by A. Doucet; Kantas et al. (2014), Gabrie et al. (2022), Wu et al. (2023) |
Mar 5 | 2-minute project idea presentations | | |
Mar 12 | Spring break | | |
Mar 19 | ADMM | Boyd et al. (2011) | |
Mar 19, 26 | Data augmentation: Bayesian Lasso and Polya-gamma trick | Park and Casella (2008), Polson et al. (2013) | |
Apr 2 | Dirichlet processes | Neal (2000), Teh (2010), Orbanz (2014) | |
Apr 2 | Pareto smoothed importance sampling | Vehtari et al. (2015) | |
Apr 9 | Diffusion and transformer models | Sohl-Dickstein et al. (2015), Ho et al. (2020), Rombach et al. (2021), Vaswani et al. (2017) | Additional applications: Gong et al. (2022), Li et al. (2022) |
Apr 16 | Generalized linear models | Huggins et al. (2018), Trippe et al. (2019) | |
Apr 16 | Non-negative matrix factorization | Gillis et al. (2014) | |
Apr 23 | No class | | |
Apr 30 | Project presentations | | Send me your report as a PDF by May 7. |