# Statistics 4109: Probability and Statistics

### Fall 2010

This is a master's / advanced undergraduate level, double-credit course in probability and mathematical statistics.

Course goals: Statistics is about drawing inferences from stochastic data. Probability theory provides a mathematical structure for calculations that involve stochastic quantities. This course covers basic probability theory and basic statistical theory for students seeking to build a foundation for further study in stochastic processes or statistical methods.
The aim of the first half of the course is for students to master the concepts of probability theory needed to understand two results that are fundamental to statistics, the law of large numbers and the central limit theorem. Along the way, students will also obtain a foundation sufficient for STAT 6501.
The aim of the second portion of the course is for students to master the standard mathematical formulations of the goals of inference and some of the elementary theory for evaluating the statistical methods that achieve these goals. Students will also gain some familiarity with a few of the classical statistical methods. Practical aspects of data analysis, however, will not be covered. Along the way, students will also obtain a foundation sufficient for courses such as STAT 4315, STAT 4220, and STAT 4201.
NOTE: The course covers the material of two other courses, STAT 4105 and STAT 4107, in a single semester. The pace is therefore fast, and not all students will be able to keep up. Furthermore, the material is cumulative: almost every lecture builds on previously discussed concepts, and students unable to keep up will find themselves in a very uncomfortable position. Students who doubt their preparation or who are concerned that they will not be able to consistently devote time to the course would be well advised to consider taking STAT 4105 this semester followed by STAT 4107 the next. However, if you're thinking of taking 4105 and 4107 in the same semester, I strongly recommend you take 4109 instead; 4109 offers the big advantage of covering the material in the proper sequence.

Time: MW 1:10-3:55
Place: 903 School of Social Work building (1255 Amsterdam Ave)
Professor: Liam Paninski; Office: 1255 Amsterdam Ave, Rm 1028. Email: liam at stat dot columbia dot edu; be sure to put "4109" at the beginning of the subject line, or I might miss it.
Office Hours: W 11:30-1 (but note that these are subject to change, so check the website before stopping by).
Textbook: Probability and Statistics, 3rd Ed., by DeGroot and Schervish (ISBN 0-201-52488-0, on order at Labyrinth Books). Available on reserve in the Mathematics library.
Additional reading: Many students find it helpful to see this material from a few different points of view. In addition to the required textbook and the lecture notes, the following two additional texts might be useful: A First Course in Probability (by S. Ross) and Statistical Inference (by Casella and Berger).
Teaching assistants: 1) Gongjun Xu; office: Room 1023; email: gongjun at stat dot columbia dot edu; office hours: Tu 4-6. 2) Tim Teravainen; email: tkt2103 at columbia dot edu; office hours: Mudd 327 (M 3-3:50 pm) and Hamilton 503 (F 7-8 pm).
Prerequisite: A good working knowledge of single-variable calculus is necessary: differentiation, integration, infinite sums, Taylor expansions, limits. You should also have some experience with linear algebra: matrices, eigenvectors, quadratic forms. No previous experience with statistics or probability (or gambling) is necessary.

Evaluation: There will be a problem set due each lecture with the exception of the first lecture, the midterm date, and the lecture following the midterm. The midterm examination will be an in-class exam and will cover the material in chapters 1-5 and 11 of the text (probability theory). There will be a final examination that will cover the material in the sixth chapter through the middle of the tenth chapter (statistics). The midterm and the final will have equal weight in the assignment of final grades, and will be more important than the problem sets: top grades will be given to students who show mastery of the course material on the midterm and final examinations regardless of their performance on the problem sets, but performance on the problem sets can, to an extent, offset a poor performance on one or both of the exams. Exam problems will be similar to those given in the problem sets and worked out in the lectures. The exams will be closed-book and closed-note.

Old homeworks will be deposited in room 904 in the stat dept building.

No makeup midterm or final will be given.
Homework will be due at the beginning of the following class. No late homework will be accepted.
Students are encouraged to work together on the homework assignments but should write up solutions on their own. Of course, all work on the exams absolutely must be each student's alone.
Solutions to the homework assignments will be posted on Courseworks each week.

### Part 1: Probability

The first half of the course will cover most of chapters 1-5 and 11 from the textbook, with a bit of extra material thrown in (e.g., Stirling's approximation; Chernoff's inequality).

| Date | Topic | Notes |
| --- | --- | --- |
| W, Sept 8 | Introduction, sample spaces, probability axioms | Read chapter 1 in the book. Due for M 9/13: Problems 1.4.2, 1.4.4, 1.4.6, 1.5.2, 1.5.4, 1.5.6, 1.5.10, 1.5.12 from the book. Lecture notes for chapter one here (pdf). |
| M, Sept 13 | Combinatorics, Stirling's approximation. | Read chapter 2. Due W 9/15: Problems 1.6.2, 1.6.6, 1.6.8, 1.7.2, 1.7.4, 1.7.6, 1.7.8, 1.7.10, 1.8.2, 1.8.8, 1.8.14, 1.9.4, 1.9.8. |
| W, Sept 15 | Conditional probability, Bayes rule. Independent events. | Due M 9/20: Problems 1.12.2, 1.12.10, 2.1.2, 2.1.6, 2.1.8, 2.2.2, 2.2.4, 2.2.6, 2.2.10, 2.3.8, 2.3.10, 2.3.20. Lecture notes for chapter two here (pdf). |
| M, Sept 20 | Markov chains. Random variables and distributions; pmf's, pdf's, and cdf's | Read chapter 3 in the book. Due W 9/22: Problems 2.2.12, 2.2.14, 2.3.12, 2.3.18, 2.4.6, 2.4.12, 3.1.2, 3.1.4, 3.1.6. |
| W, Sept 22 | More on continuous r.v.'s; multivariate distributions | Due M 9/27: Problems 3.1.8, 3.2.2, 3.2.4, 3.2.8, 3.2.10, 3.3.2, 3.3.14, 3.4.2, 3.4.4. Lecture notes for chapter three here (pdf). |
| M, Sept 27 | Functions of a random variable; convolution; expectations, variance | Read chapter 4 in the book. Due W 9/29: Problems 3.5.2, 3.5.4, 3.5.6, 3.6.2, 3.6.4, 3.6.10, 3.7.2, 3.7.8. |
| W, Sept 29 | Moment-generating functions; Covariance and correlation; sample means | Due M 10/4: Problems 3.8.2, 3.8.6, 3.8.8, 3.9.2, 3.9.4, 3.9.8, 3.9.16, 3.10.20. Lecture notes for chapter four here (pdf). |
| M, Oct 4 | Inequalities: Markov, Chebyshev, Chernoff, and Jensen; law of large numbers; special discrete distributions | Due for W 10/6: Problems 4.1.2, 4.1.12, 4.2.2, 4.2.8, 4.2.10, 4.3.2, 4.3.4, 4.3.6, 4.4.2, 4.4.4, 4.4.10. Lecture notes for inequalities here. |
| W, Oct 6 | Special continuous distributions | Read chapter 5 in the book. Due M 10/11: Problems 4.5.2, 4.5.6, 4.5.12, 4.6.4, 4.6.8, 4.6.10, 4.7.2, 4.7.6, 4.7.12, 4.8.2, 4.8.8, 4.8.12. Lecture notes for chapter five here. |
| M, Oct 11 | More on special distributions; central limit theorem | Due W 10/13: Problems 5.2.4, 5.2.6, 5.2.8, 5.3.2, 5.3.6, 5.3.8, 5.4.2, 5.4.8, 5.4.14, 5.5.2, 5.5.6. Notes on CLT here. |
| W, Oct 13 | Order statistics; basic simulation theory: Monte Carlo integration, importance sampling | Read chapter 11 in the book. Due M 10/18: Problems 5.6.2(a)-(d), 5.6.6, 5.6.12, 5.6.14, 5.6.18, 5.7.2, 5.7.4, 5.7.6, 5.7.10. Some related notes are here. |
| M, Oct 18 | More on Monte Carlo; introduction to MCMC | Due W 10/20: Problems 5.8.2, 5.8.6, 5.9.6, 5.9.16, 5.9.18, 5.9.22, 5.10.2, 5.10.8, 5.11.4, 11.2.2, 11.2.4. |
| W, Oct 20 | Midterm review | No HW due M 10/25: midterm (covers material in chapters 1-5 and 11). E-mail me problems you'd like me to do in class. |
| M, Oct 25 | Midterm exam | No HW due W 10/27. |

### Part 2: Statistics

The second half of the course will cover most of chapters 6-9 from the textbook, with a bit of extra material thrown in (e.g., a bit of chapter 10 on regression, and a bit on the EM algorithm).

| Date | Topic | Notes |
| --- | --- | --- |
| W, Oct 27 | Decision theory: admissibility; minimax and Bayes decision rules; Bias/variance of estimators | Due W 11/3: Problems 4.9.4, 4.9.6, 4.9.8, 6.2.4, 6.2.6, 6.2.10. Read chapter 6.1-6.4 in the book. Notes on estimation theory here. |
| M, Nov 1 | No class - University holiday | Don't forget to vote! |
| W, Nov 3 | Maximum likelihood estimation; sufficiency | Read rest of chapter 6 in the book. Due M 11/8: Problems 6.3.6, 6.3.8, 6.3.10, 6.4.2, 6.4.4, 6.4.8, 6.5.2, 6.5.6, 6.5.10, 6.6.2, 6.6.4. |
| M, Nov 8 | Exponential families; conjugate priors. Method of moments. Asymptotic ideas: consistency, asymptotic efficiency | Read chapter 7 in the book. Due W 11/10: Problems 6.6.6, 6.6.12, 6.7.2, 6.7.6, 6.8.4, 6.8.14, 6.9.2, 6.9.8, 6.9.14. |
| W, Nov 10 | Consistency of the MLE; Kullback-Leibler divergence. Asymptotic normality of the MLE; Fisher information | Due M 11/15: Problems 7.1.2, 7.1.4, 7.1.6, 7.1.8, 7.2.2, 7.2.4, 7.2.10. |
| M, Nov 15 | Cramer-Rao bound; Simple hypothesis testing; likelihood ratio tests; Neyman-Pearson lemma | Read chapter 8 in the book. Notes on hypothesis testing here. Due W 11/17: Problems 7.5.2, 7.5.6, 7.5.10, 7.6.2, 7.7.2, 7.7.4, 7.7.6. |
| W, Nov 17 | Hypothesis testing with compound alternates; uniformly most powerful tests | Due M 11/22: Problems 7.7.8, 7.7.14, 7.8.2, 7.8.4, 7.8.8, 7.9.6, 7.9.16. |
| M, Nov 22 | Testing with compound null hypotheses; t-tests, F-tests | Due M 11/29: Problems 8.1.2, 8.1.4, 8.1.8, 8.1.14, 8.2.4, 8.2.7, 8.3.2, 8.3.6, 8.3.12. Read chapter 9 in the book. |
| W, Nov 24 | No class | Happy thanksgiving. |
| M, Nov 29 | Goodness of fit tests: chi-square and Kolmogorov-Smirnov | Due W 12/1: Problems 7.3.6a, 7.3.8, 8.4.4, 8.4.6, 8.4.12, 8.5.2, 8.5.6, 8.5.8, 8.6.2, 8.6.8. |
| W, Dec 1 | Basic nonparametrics: sign and rank tests. Resampling: the bootstrap, jackknife, and permutation tests. | Due M 12/6: Problems 8.7.2, 8.7.8, 9.1.4, 9.1.6, 9.1.8, 9.2.2, 9.2.4, 9.3.4, 9.3.6. |
| M, Dec 6 | Robust estimation. Nonparametric density estimation. | Due W 12/8: Problems 9.4.4, 9.6.3, 9.6.4, 9.8.2, 9.8.12, 9.9.4, 9.9.6, 9.9.14. |
| W, Dec 8 | The expectation-maximization (EM) algorithm; fitting mixture models. Basics of linear regression. | No HW due for M, Dec 13 - think of questions for the review session. |
| M, Dec 13 | Review session - last class. | E-mail me questions before class. |
| M, Dec 20 | Final exam | Usual time and place. |