Statistics 4109: Probability and Statistics

Fall 2009


This is a master's / advanced undergraduate level, double-credit course in probability and mathematical statistics.

If you would like to register for the course but have been denied access, please fill in the form here.

Course goals: Statistics is about drawing inferences from data, in particular data that involve randomness. Probability theory provides a mathematical structure for calculations that involve randomness. This course covers basic probability theory and basic statistical theory for students seeking to build a foundation for further study in stochastic processes or statistical methods.
The aim of the first half of the course is for students to master the concepts of probability theory needed to understand two results that are fundamental to statistics: the law of large numbers and the central limit theorem. Along the way, students will also obtain a foundation sufficient for STAT 6501.
The aim of the second half of the course is for students to master the standard mathematical formulations of the goals of inference, along with some of the elementary theory for evaluating the statistical methods that achieve these goals. Students will also gain some familiarity with a few of the classical statistical methods. Practical aspects of data analysis, however, will not be covered. Along the way, students will also obtain a foundation sufficient for courses such as STAT 4315, STAT 4220, and STAT 4201.
NOTE: The course covers the material of two other courses, STAT 4105 and STAT 4107, in a single semester. The pace, therefore, is fast, and not all students will be able to keep up. Furthermore, the material is cumulative: almost every lecture builds on previously discussed concepts, and students who fall behind will find themselves in a very uncomfortable position. Students who doubt their preparation, or who are concerned that they will not be able to devote time to the course consistently, would be well advised to take STAT 4105 this semester followed by STAT 4107 the next. However, if you're thinking of taking 4105 and 4107 in the same semester, I strongly recommend you take 4109 instead; 4109 has the big advantage of covering the material in the proper sequence.

Time: MW 1:10-3:55
Place: Hamilton 703
Professor: Liam Paninski; Office: 1255 Amsterdam Ave, Rm 1028. Email: liam at stat dot columbia dot edu; be sure to put "4109" at the beginning of the subject line, or I might miss it.
Office Hours: W 4-6 (but note that these are subject to change, so check the website before stopping by).
Textbook: Probability and Statistics, 3rd Ed., by DeGroot and Schervish (ISBN 0-201-52488-0, on order at Labyrinth Books). Available on reserve in the Mathematics library.
Teaching assistants: 1) Wenwen Hu; office: 402 Hamilton; email: wh2198 at columbia dot edu; hour: M 4:10-5. 2) Yongbum Cho; office: rm 901 in 1255 Amsterdam Ave; hours: T,Th 10:30-11:30; email: yong at stat dot columbia dot edu.
Prerequisite: A good working knowledge of single-variable calculus is necessary: differentiation, integration, infinite sums, Taylor expansions, limits. You should also have some experience with linear algebra: matrices, eigenvectors, quadratic forms. No previous experience with statistics or probability (or gambling) is necessary.

Evaluation: There will be a problem set due each lecture, except for the first lecture, the midterm date (tentatively Oct. 28), and the lecture following the midterm. The midterm will be an in-class exam covering the material in chapters 1-5 and 11 of the text (probability theory). The final examination will cover the material from chapter 6 through the middle of chapter 10 (statistics). The midterm and the final will carry equal weight in the final grade, and both will count more than the problem sets. Top grades will be given to students who show mastery of the course material on the midterm and final examinations, regardless of their performance on the problem sets; however, performance on the problem sets can, to an extent, offset a poor performance on one or both exams. Exam problems will be similar to those given in the problem sets and worked out in the lectures. The exams will be closed-book and closed-note.

Old homework assignments will be left in room 904 of the Statistics Department building.

No makeup midterm or final will be given.
Homework will be due at the beginning of the following class. No late homework will be accepted.
Students are encouraged to work together on the homework assignments but should write up solutions on their own. Of course, all work on the exams absolutely must be each student's alone.
Solutions to the homework assignments will be posted on Courseworks each week.


Part 1: Probability

The first half of the course will cover most of chapters 1-5 and 11 from the textbook, with a bit of extra material thrown in (e.g., Stirling's approximation; Chernoff's inequality).
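Since Stirling's approximation is listed as extra material, a quick numerical check of the standard form of the formula, n! ≈ √(2πn)(n/e)^n, may be useful. This is a minimal illustrative sketch, not part of the assigned course materials; the helper name `stirling` is hypothetical and chosen only for this example.

```python
import math


def stirling(n: int) -> float:
    """Stirling's approximation to n!: sqrt(2*pi*n) * (n/e)**n (hypothetical helper)."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


# Compare the exact factorial with the approximation for a few values of n.
for n in (1, 5, 10, 50):
    ratio = math.factorial(n) / stirling(n)
    print(f"n = {n:2d}   n!/stirling(n) = {ratio:.6f}")
```

The printed ratio is slightly above 1 for every n (the formula underestimates n!) and shrinks toward 1 roughly like 1 + 1/(12n), so the relative error is already under 1% at n = 10.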

Date Topic Notes
W, Sept 9 Introduction, sample spaces, probability axioms Read chapter 1 in the book. Due for M 9/14: Problems 1.4.2, 1.4.4, 1.4.6, 1.5.2, 1.5.4, 1.5.6, 1.5.10, 1.5.12 from the book. Lecture notes for chapter one here (pdf).
M, Sept 14 Combinatorics, Stirling's approximation. Read chapter 2. Due W 9/16: Problems 1.6.2, 1.6.6, 1.6.8, 1.7.2, 1.7.4, 1.7.6, 1.7.8, 1.7.10, 1.8.2, 1.8.8, 1.8.14, 1.9.4, 1.9.8.
W, Sept 16 Conditional probability, Bayes' rule. Independent events. Due M 9/21: Problems 1.12.2, 1.12.10, 2.1.2, 2.1.6, 2.1.8, 2.2.2, 2.2.4, 2.2.6, 2.2.10, 2.3.8, 2.3.10, 2.3.20. Lecture notes for chapter two here (pdf); scanned hw3 problems here.
M, Sept 21 Markov chains. Random variables and distributions; pmf's, pdf's, and cdf's Read chapter 3 in the book. Due W 9/23: Problems 2.2.12, 2.2.14, 2.3.12, 2.3.18, 2.4.6, 2.4.12, 3.1.2, 3.1.4, 3.1.6.
W, Sept 23 Multivariate distributions; functions of a random variable; convolution Due M 9/28: Problems 3.1.8, 3.2.2, 3.2.4, 3.2.8, 3.2.10, 3.3.2, 3.3.14, 3.4.2, 3.4.4. Lecture notes for chapter three here (pdf).
M, Sept 28 Expectations, variance Read chapter 4 in the book. Due W 9/30: Problems 3.5.2, 3.5.4, 3.5.6, 3.6.2, 3.6.4, 3.6.10, 3.7.2, 3.7.8.
W, Sept 30 Moment-generating functions; covariance and correlation; sample means Due M 10/5: Problems 3.8.2, 3.8.6, 3.8.8, 3.9.2, 3.9.4, 3.9.8, 3.9.16, 3.10.20. Lecture notes for chapter four here (pdf).
M, Oct 5 Inequalities: Markov, Chebyshev, Chernoff, and Jensen; law of large numbers; special discrete distributions Due for W 10/7: Problems 4.1.2, 4.1.12, 4.2.2, 4.2.8, 4.2.10, 4.3.2, 4.3.4, 4.3.6, 4.4.2, 4.4.4, 4.4.10. Lecture notes for inequalities here.
W, Oct 7 Special continuous distributions Read chapter 5 in the book. Due M 10/12: Problems 4.5.2, 4.5.6, 4.5.12, 4.6.4, 4.6.8, 4.6.10, 4.7.2, 4.7.6, 4.7.12, 4.8.2, 4.8.8, 4.8.12. Lecture notes for chapter five here.
M, Oct 12 Order statistics; central limit theorem; convergence in distribution Due W 10/14: Problems 5.2.4, 5.2.6, 5.2.8, 5.3.2, 5.3.6, 5.3.8, 5.4.2, 5.4.8, 5.4.14, 5.5.2, 5.5.6. Notes on CLT here.
W, Oct 14 Basic simulation theory: Monte Carlo integration, importance sampling Read chapter 11 in the book. Due M 10/19: Problems 5.6.2(a)-(d), 5.6.6, 5.6.12, 5.6.14, 5.6.18, 5.7.2, 5.7.4, 5.7.6, 5.7.10. Some related notes are here.
M, Oct 19 Decision theory Due W 10/21: Problems 5.8.2, 5.8.6, 5.9.6, 5.9.16, 5.9.18, 5.9.22, 5.10.2, 5.10.8, 5.11.4, 11.2.2, 11.2.4. Notes on decision theory here.
W, Oct 21 Bayes estimation Read sections 6.1-6.4 in the book. Due M 10/26: Problems 4.9.4, 4.9.6, 4.9.8, 6.2.4, 6.2.6, 6.2.10. Notes on estimation theory here.
M, Oct 26 Midterm review No HW due W 10/28 (midterm exam; covers material in chapters 1-5 and 11). E-mail me problems you'd like me to do in class.
W, Oct 28 Midterm exam No HW due M 11/2.
M, Nov 2 No class - University holiday

Part 2: Statistics

The second half of the course will cover most of chapters 6-9 from the textbook, with a bit of extra material thrown in (e.g., part of chapter 10 on regression).

W, Nov 4 Bias/variance of estimators; maximum likelihood estimation; sufficiency Read rest of chapter 6 in the book. Due M 11/9: Problems 6.3.6, 6.3.8, 6.3.10, 6.4.2, 6.4.4, 6.4.8, 6.5.2, 6.5.6, 6.5.10, 6.6.2, 6.6.4.
M, Nov 9 Exponential families; conjugate priors. Method of moments. Asymptotic ideas: consistency, asymptotic efficiency Read chapter 7 in the book. Due W 11/11: Problems 6.6.6, 6.6.12, 6.7.2, 6.7.6, 6.8.4, 6.8.14, 6.9.2, 6.9.8, 6.9.14.
W, Nov 11 Consistency of the MLE; Kullback-Leibler divergence. Asymptotic normality of the MLE; Fisher information; Cramér-Rao bound Due M 11/16: Problems 7.1.2, 7.1.4, 7.1.6, 7.1.8, 7.2.2, 7.2.4, 7.2.10.
M, Nov 16 Simple hypothesis testing; likelihood ratio tests; Neyman-Pearson lemma Read chapter 8 in the book. Notes on hypothesis testing here. Due W 11/18: Problems 7.5.2, 7.5.6, 7.5.10, 7.6.2, 7.7.2, 7.7.4, 7.7.6.
W, Nov 18 Hypothesis testing with compound alternatives; uniformly most powerful tests Due M 11/23: Problems 7.7.8, 7.7.14, 7.8.2, 7.8.4, 7.8.8, 7.9.6, 7.9.16.
M, Nov 23 Testing with compound null hypotheses; t-tests, F-tests Due M 11/30: Problems 8.1.2, 8.1.4, 8.1.8, 8.1.14, 8.2.4, 8.2.7, 8.3.2, 8.3.6, 8.3.12. Read chapter 9 in the book.
W, Nov 25 No class Happy Thanksgiving.
M, Nov 30 Goodness of fit tests: chi-square and Kolmogorov-Smirnov Due W 12/2: Problems 7.3.6a, 7.3.8, 8.4.4, 8.4.6, 8.4.12, 8.5.2, 8.5.6, 8.5.8, 8.6.2, 8.6.8.
W, Dec 2 Basic nonparametrics: sign and rank tests. Basics of linear regression. Due M 12/7: Problems 8.7.2, 8.7.8, 9.1.4, 9.1.6, 9.1.8, 9.2.2, 9.2.4, 9.3.4, 9.3.6.
M, Dec 7 The EM algorithm; fitting mixture models Due W 12/9: Problems 9.4.4, 9.6.3, 9.6.4, 9.8.2, 9.8.12, 9.9.4, 9.9.6, 9.9.14.
W, Dec 9 Review session - last class. E-mail me questions.
M, Dec 21 Final exam Usual time and place.