Statistics 4109: Probability and Statistics

Fall 2006


This is a master's / advanced undergraduate level, double-credit course in probability and mathematical statistics.

Course goals: Statistics is about drawing inferences from data, in particular, data that involve randomness. Probability theory provides a mathematical structure for calculations that involve randomness. This course covers basic probability theory and basic statistical theory for students seeking to build a foundation for further study in stochastic processes or statistical methods.
The aim of the first half of the course is for students to master the concepts of probability theory needed to understand two results that are fundamental to statistics: the law of large numbers and the central limit theorem. Along the way, students will also obtain a foundation sufficient for STAT 6501.
The aim of the second half of the course is for students to master the standard mathematical formulations of the goals of inference and some of the elementary theory for evaluating the statistical methods that achieve these goals. Students will also gain some familiarity with a few of the classical statistical methods. Practical aspects of data analysis, however, will not be covered. Along the way, students will also obtain a foundation sufficient for courses such as STAT 4315, STAT 4220, and STAT 4201.
NOTE: The course covers the material of two other courses, STAT 4105 and STAT 4107, in a single semester. The pace, therefore, is fast, and not all students will be able to keep up. Furthermore, the material is cumulative: almost every lecture builds on previously discussed concepts, and students unable to keep up will find themselves in a very uncomfortable position. Students who doubt their preparation or who are concerned that they will not be able to consistently devote time to the course would be well advised to consider taking STAT 4105 this semester followed by STAT 4107 the next. However, if you're thinking of taking 4105 and 4107 in the same semester, I strongly recommend you take 4109 instead; 4109 offers the big advantage of covering the material in the proper sequence.

Time: M,W 6:10-8:55
Place: 312 Math
Professor: Liam Paninski; Office: 1255 Amsterdam Ave, Rm 1028. Email: liam at stat dot columbia dot edu; be sure to put "4109" at the beginning of the subject line, or I might miss it.
Office Hours: M 4:30-5:30 and Tu 5-6 (but note that these are subject to change, so check the website before stopping by).
Textbook: Probability and Statistics, 3rd Ed., by DeGroot and Schervish (ISBN 0-201-52488-0, on order at Labyrinth Books). Available on reserve in the Mathematics library.
Teaching assistants: 1) Chang Ha; office: 1255 Amsterdam Ave, Rm 901; email: ch35 at columbia dot edu; hours: Th 5:00-7:00. 2) Chi-Tsun Chiu; office: 1255 Amsterdam Ave, Rm 905; email: cc2569 at columbia dot edu; hours: F 9-11.
Prerequisite: A good working knowledge of calculus is necessary: differentiation, integration, infinite sums, Taylor expansions, limits. No previous experience with statistics or probability (or gambling) is necessary.

Evaluation: There will be a problem set due each lecture with the exception of the first lecture, the midterm date (October 16), and the lecture following the midterm. The midterm examination will be an in-class exam and will cover the material in chapters 1-5 and 11 of the text (probability theory). There will be a final examination that will cover the material in the sixth chapter through the middle of the tenth chapter (statistics). The midterm and the final will have equal weight in the assignment of final grades, and will be more important than the problem sets: top grades will be given to students who show mastery of the course material on the midterm and final examinations regardless of their performance on the problem sets, but performance on the problem sets can, to an extent, offset a poor performance on one or both of the exams. Exam problems will be similar to those given in the problem sets and worked out in the lectures. The exams will be closed-book and closed-note.
Final Exam: Monday 12/18/06, Math 312, 7:10-10:00 (note time change).
No makeup midterm or final will be given.
Homework will be due at the beginning of the following class. No late homework will be accepted.
Students are encouraged to work together on the homework assignments but should write up solutions on their own. Of course, all work on the exams absolutely must be each student's alone.
Solutions to the homework assignments will be posted on Courseworks each week.


Part 1: Probability

The first half of the course will cover most of chapters 1-5 and 11 from the textbook, with a bit of extra material thrown in (e.g., Stirling's approximation; Chernoff's inequality). A brief overview of the topics we will cover is available here (pdf, 1.6Mb).

Date | Topic | Notes
W, Sept 6 | Introduction, sample spaces, probability axioms, combinatorics | Read chapter 1 in the book. Due for M 9/11: Problems 1.4.2, 1.4.4, 1.4.6, 1.5.2, 1.5.4, 1.5.6, 1.5.10, 1.5.12 from the book. Lecture notes for chapter one here (pdf).
M, Sept 11 | More combinatorics, Stirling's approximation; conditional probability | Read chapter 2.1-2.3 in the book. Due W 9/13: Problems 1.6.2, 1.6.6, 1.6.8, 1.7.2, 1.7.4, 1.7.6, 1.7.8, 1.7.10, 1.8.2, 1.8.8, 1.8.14, 1.9.4, 1.9.8.
W, Sept 13 | More on conditional probabilities, Bayes rule | Due M 9/18: Problems 1.12.2, 1.12.10, 2.1.2, 2.1.6, 2.1.8, 2.2.2, 2.2.4, 2.2.6, 2.2.10, 2.3.8, 2.3.10, 2.3.20. Lecture notes for chapter two here (pdf).
M, Sept 18 | Independent events | Read chapter 3 in the book. Due W 9/20: Problems 2.2.12, 2.2.14, 2.3.12, 2.3.18, 3.1.2, 3.1.4, 3.1.6.
W, Sept 20 | Random variables and distributions; pmf's, pdf's, and cdf's | Due M 9/25: Problems 3.1.8, 3.2.2, 3.2.4, 3.2.8, 3.2.10, 3.3.2, 3.3.14, 3.4.2, 3.4.4.
M, Sept 25 | Multivariate distributions; functions of a random variable; convolution | Due W 9/27: Problems 3.5.2, 3.5.4, 3.5.6, 3.6.2, 3.6.4, 3.6.10, 3.7.2, 3.7.8. Lecture notes for chapter three here (pdf).
W, Sept 27 | Expectations, variance, moment-generating functions | Read chapter 4 in the book. Due M 10/2: Problems 3.8.2, 3.8.6, 3.8.8, 3.9.2, 3.9.4, 3.9.8, 3.9.16, 3.10.20.
M, Oct 2 | Covariance and correlation; sample means | Due W 10/4: Problems 4.1.2, 4.1.12, 4.2.2, 4.2.8, 4.2.10, 4.3.2, 4.3.4, 4.3.6, 4.4.2, 4.4.4, 4.4.10.
W, Oct 4 | Inequalities: Markov, Chebyshev, Chernoff, and Jensen; law of large numbers; special discrete distributions | Read chapter 5 in the book. Due M 10/9: Problems 4.5.2, 4.5.6, 4.5.12, 4.6.4, 4.6.8, 4.6.10, 4.7.2, 4.7.6, 4.7.12, 4.8.2, 4.8.8, 4.8.12. Lecture notes for chapter four here and part 2 here.
M, Oct 9 | Special continuous distributions | Due W 10/11: Problems 5.2.4, 5.2.6, 5.2.8, 5.3.2, 5.3.6, 5.3.8, 5.4.2, 5.4.8, 5.4.14, 5.5.2, 5.5.6.
W, Oct 11 | Central limit theorem; convergence in distribution; order statistics; midterm review | No HW due M 10/16: Midterm exam. Lecture notes for chapter five here. Notes on CLT here and order statistics here.
M, Oct 16 | Midterm exam | No HW due W 10/18.

Part 2: Statistics

The second half of the course will cover most of chapters 6-9 from the textbook, with a bit of extra material thrown in (e.g., a bit of chapter 10 on regression).

W, Oct 18 | Midterm review; basic simulation theory: Monte Carlo integration, importance sampling | Read chapter 11.1-11.3 in the book. Due 10/23: Problems 5.6.2(a)-(d), 5.6.6, 5.6.12, 5.6.14, 5.6.18, 5.7.2, 5.7.4, 5.7.6, 5.7.10.
M, Oct 23 | Decision theory | Read chapter 6 in the book, and reread section 4.9. Due W 10/25: Problems 5.8.2, 5.8.6, 5.9.6, 5.9.16, 5.9.18, 5.9.22, 5.10.2, 5.10.8, 5.11.4, 11.2.2, 11.2.4. Notes on decision theory here.
W, Oct 25 | Introduction to estimation theory; Bayes estimation; maximum likelihood | Due M 10/30: Problems 4.9.4, 4.9.6, 4.9.8, 6.2.4, 6.2.6, 6.2.10, 6.3.6, 6.3.10, 6.3.12. Notes on estimation theory here.
M, Oct 30 | Sufficiency; exponential families | Due W 11/1: Problems 6.3.8, 6.4.2, 6.4.4, 6.4.8, 6.5.2, 6.5.6, 6.5.10, 6.6.2, 6.6.4.
W, Nov 1 | Sampling distributions; bias and variance; delta method | Read chapter 7 in the book. Due W 11/8: Problems 6.6.6, 6.6.12, 6.7.2, 6.7.6, 6.8.4, 6.8.14, 6.9.2, 6.9.8, 6.9.14.
M, Nov 6 | No class - University holiday | Remember to vote...
W, Nov 8 | Asymptotic ideas: consistency, asymptotic efficiency. Consistency of the MLE; Kullback-Leibler divergence | Due M 11/13: Problems 7.1.2, 7.1.4, 7.1.6, 7.1.8, 7.2.2, 7.2.4, 7.2.10, 7.3.6a, 7.3.8.
M, Nov 13 | Asymptotic normality of the MLE; Fisher information; Cramer-Rao bound | Due W 11/15: Problems 7.5.2, 7.5.6, 7.5.10, 7.6.2, 7.7.2, 7.7.4, 7.7.6.
W, Nov 15 | Simple hypothesis testing | Due M 11/20: Read chapter 8 in the book. Problems 7.7.8, 7.7.14, 7.8.2, 7.8.4, 7.8.8, 7.9.6, 7.9.16. Notes on hypothesis testing here.
M, Nov 20 | Hypothesis testing with compound alternates; uniformly most powerful tests | Due M 11/27: Problems 8.1.2, 8.1.4, 8.1.8, 8.1.14, 8.2.4, 8.2.7, 8.3.2, 8.3.6, 8.3.12.
W, Nov 22 | Optional review session | E-mail me questions.
M, Nov 27 | Testing with compound null hypotheses; t-tests, F-tests | Due W 11/29: Problems 8.4.4, 8.4.6, 8.4.12, 8.5.2, 8.5.6, 8.5.8, 8.6.2, 8.6.8, 8.7.2, 8.7.8.
W, Nov 29 | Chi-square tests for goodness of fit, homogeneity, and independence | Due M 12/4: Read chapter 9 in the book. Problems 9.1.4, 9.1.6, 9.1.8, 9.2.2, 9.2.4, 9.3.4, 9.3.6, 9.4.4.
M, Dec 4 | Nonparametrics: Kolmogorov-Smirnov / sign and rank tests | Due W 12/6: Problems 9.6.3, 9.6.4, 9.8.2, 9.8.12, 9.9.4, 9.9.6, 9.9.14.
W, Dec 6 | Basics of linear regression | Due M 12/11: E-mail me questions for the review.
M, Dec 11 | Review session - last class. | E-mail me questions. See here for some extra course notes to help study. Also see the Courseworks page (look under Assignments) for some sample final exam questions.
M, Dec 18 | Final exam | Note change in time: 7:10-10:00 pm (usual place).