# Statistics 4109: Probability and Statistics

### Fall 2007

This is a master's / advanced undergraduate level, double-credit course in probability and mathematical statistics.

Course goals: Statistics is about drawing inferences from data, in particular, data that involve randomness. Probability theory provides a mathematical structure for calculations that involve randomness. This course covers basic probability theory and basic statistical theory for students seeking to build a foundation for further study in stochastic processes or statistical methods.
The aim of the first half of the course is for students to master the concepts of probability theory needed to understand two results that are fundamental to statistics, the law of large numbers and the central limit theorem. Along the way, students will also obtain a foundation sufficient for STAT 6501.
The aim of the second half of the course is for students to master the standard mathematical formulations of the goals of inference and some of the elementary theory for evaluating the statistical methods that achieve these goals. Students will also gain some familiarity with a few of the classical statistical methods. Practical aspects of data analysis, however, will not be covered. Along the way, students will also obtain a foundation sufficient for, e.g., STAT 4315, STAT 4220, and STAT 4201.
NOTE: The course covers the material of two other courses, STAT4105 and STAT4107, in a single semester. The pace, therefore, is fast, and not all students will be able to keep up. Furthermore, the material is cumulative: almost every lecture builds on previously discussed concepts, and students who fall behind will find themselves in a very uncomfortable position. Students who doubt their preparation, or who are concerned that they will not be able to devote time to the course consistently, would be well advised to consider taking STAT4105 this semester followed by STAT4107 the next. However, if you are thinking of taking 4105 and 4107 in the same semester, I strongly recommend you take 4109 instead; 4109 has the big advantage of covering the material in the proper sequence.

Time: Tu, Th 6:10-8:55
Place: Pupin 428
Professor: Liam Paninski; Office: 1255 Amsterdam Ave, Rm 1028. Email: liam at stat dot columbia dot edu; be sure to put "4109" at the beginning of the subject line, or I might miss it.
Office Hours: Th 4-6 (but note that these are subject to change, so check the website before stopping by).
Textbook: Probability and Statistics, 3rd Ed., by DeGroot and Schervish (ISBN 0-201-52488-0, on order at Labyrinth Books). Available on reserve in the Mathematics library.
Teaching assistants: 1) Chia-Hui Huang; office: rm 905 in 1255 Amsterdam Ave; email: ch2342 at columbia dot edu; hours: M 6-8 pm. 2) Subhankar Sadhukhan; office: rm 903 in 1255 Amsterdam Ave; email: ss3240 at columbia dot edu; hours: Tu 8:30-10:30 am.
Prerequisite: A good working knowledge of calculus is necessary: differentiation, integration, infinite sums, Taylor expansions, limits. No previous experience with statistics or probability (or gambling) is necessary.

Evaluation: There will be a problem set due at each lecture except the first lecture, the midterm (October 23), and the lecture following the midterm. The midterm examination will be an in-class exam covering the material in chapters 1-5 and 11 of the text (probability theory). The final examination will cover the material from chapter 6 through the middle of chapter 10 (statistics). The midterm and the final will have equal weight in the assignment of final grades and will count for more than the problem sets: top grades will be given to students who show mastery of the course material on the midterm and final examinations regardless of their performance on the problem sets, but performance on the problem sets can, to an extent, offset a poor performance on one or both of the exams. Exam problems will be similar to those given in the problem sets and worked out in the lectures. The exams will be closed-book and closed-note.

Old homeworks will be deposited in room 904 in the stat dept building.

Final Exam: Tu, Dec 18, 7:10-10 pm (NOTE CHANGE IN TIME), Pupin 428 (normal place).
No makeup midterm or final will be given.
Homework will be due at the beginning of the following class. No late homework will be accepted.
Students are encouraged to work together on the homework assignments but should write up solutions on their own. Of course, all work on the exams absolutely must be each student's alone.
Solutions to the homework assignments will be posted on Courseworks each week.

### Part 1: Probability

The first half of the course will cover most of chapters 1-5 and 11 from the textbook, with a bit of extra material thrown in (e.g., Stirling's approximation; Chernoff's inequality). A brief overview of the topics we will cover is available here (pdf, 1.6Mb).

| Date | Topic | Notes |
|---|---|---|
| Tu, Sept 4 | Introduction, sample spaces, probability axioms | Read chapter 1 in the book. Due Th 9/6: Problems 1.4.2, 1.4.4, 1.4.6, 1.5.2, 1.5.4, 1.5.6, 1.5.10, 1.5.12 from the book. Lecture notes for chapter one here (pdf). |
| Th, Sept 6 | Combinatorics, Stirling's approximation; conditional probability | Read sections 2.1-2.3 in the book. Due Tu 9/11: Problems 1.6.2, 1.6.6, 1.6.8, 1.7.2, 1.7.4, 1.7.6, 1.7.8, 1.7.10, 1.8.2, 1.8.8, 1.8.14, 1.9.4, 1.9.8. |
| Tu, Sept 11 | More on conditional probabilities; Bayes' rule | Due Th 9/13: Problems 1.12.2, 1.12.10, 2.1.2, 2.1.6, 2.1.8, 2.2.2, 2.2.4, 2.2.6, 2.2.10, 2.3.8, 2.3.10, 2.3.20. Lecture notes for chapter two here (pdf). |
| Th, Sept 13 | Independent events; random variables and distributions: pmfs, pdfs, and cdfs | Read chapter 3 in the book. Due Tu 9/18: Problems 2.2.12, 2.2.14, 2.3.12, 2.3.18, 3.1.2, 3.1.4, 3.1.6. |
| Tu, Sept 18 | Multivariate distributions; functions of a random variable; convolution | Due Th 9/20: Problems 3.1.8, 3.2.2, 3.2.4, 3.2.8, 3.2.10, 3.3.2, 3.3.14, 3.4.2, 3.4.4. Lecture notes for chapter three here (pdf). |
| Th, Sept 20 | Expectations, variance | Read chapter 4 in the book. Due Tu 9/25: Problems 3.8.2, 3.8.6, 3.8.8, 3.9.2, 3.9.4, 3.9.8, 3.9.16, 3.10.20. Lecture notes for chapter four here. |
| Tu, Sept 25 | Moment-generating functions; covariance and correlation; sample means | Due Th 9/27: Problems 4.1.2, 4.1.12, 4.2.2, 4.2.8, 4.2.10, 4.3.2, 4.3.4, 4.3.6, 4.4.2, 4.4.4, 4.4.10. |
| Th, Sept 27 | Inequalities: Markov, Chebyshev, Chernoff, and Jensen; law of large numbers; special discrete distributions | Due Tu 10/2: Problems 3.5.2, 3.5.4, 3.5.6, 3.6.2, 3.6.4, 3.6.10, 3.7.2, 3.7.8. Lecture notes for inequalities here. |
| Tu, Oct 2 | Special continuous distributions; order statistics | Read chapter 5 in the book. Due Th 10/4: Problems 4.5.2, 4.5.6, 4.5.12, 4.6.4, 4.6.8, 4.6.10, 4.7.2, 4.7.6, 4.7.12, 4.8.2, 4.8.8, 4.8.12. Lecture notes for chapter five here. |
| Th, Oct 4 | Central limit theorem; convergence in distribution | Due Tu 10/9: Problems 5.2.4, 5.2.6, 5.2.8, 5.3.2, 5.3.6, 5.3.8, 5.4.2, 5.4.8, 5.4.14, 5.5.2, 5.5.6. Notes on the CLT here. |
| Tu, Oct 9 | Basic simulation theory: Monte Carlo integration, importance sampling | Read sections 11.1-11.3 in the book. Due Th 10/11: Problems 5.6.2(a)-(d), 5.6.6, 5.6.12, 5.6.14, 5.6.18, 5.7.2, 5.7.4, 5.7.6, 5.7.10. |

### Part 2: Statistics

The second half of the course will cover most of chapters 6-9 from the textbook, with a bit of extra material thrown in (e.g., a bit of chapter 10 on regression).

| Date | Topic | Notes |
|---|---|---|
| Th, Oct 11 | Decision theory | Read chapter 6 in the book, and reread section 4.9. Due Tu 10/16: Problems 5.8.2, 5.8.6, 5.9.6, 5.9.16, 5.9.18, 5.9.22, 5.10.2, 5.10.8, 5.11.4, 11.2.2, 11.2.4. Notes on decision theory here. |
| Tu, Oct 16 | Introduction to estimation theory; Bayes estimation; maximum likelihood | Due Th 10/18: Problems 4.9.4, 4.9.6, 4.9.8, 6.2.4, 6.2.6, 6.2.10, 6.3.6, 6.3.10, 6.3.12. Notes on estimation theory here. |
| Th, Oct 18 | Midterm review | No HW. Tu 10/23: midterm (covers material in chapters 1-5 and 11). |
| Tu, Oct 23 | Midterm exam | No HW due Th 10/25. |
| Th, Oct 25 | Bias and variance; more on maximum likelihood | Due Tu 10/30: Problems 6.3.8, 6.4.2, 6.4.4, 6.4.8, 6.5.2, 6.5.6, 6.5.10, 6.6.2, 6.6.4. |
| Tu, Oct 30 | Sufficiency; exponential families | Read chapter 7 in the book. Due Th 11/1: Problems 6.6.6, 6.6.12, 6.7.2, 6.7.6, 6.8.4, 6.8.14, 6.9.2, 6.9.8, 6.9.14. |
| Th, Nov 1 | More on exponential families: conjugate priors, method of moments. Asymptotic ideas: consistency, asymptotic efficiency | Due Th 11/8: Problems 7.1.2, 7.1.4, 7.1.6, 7.1.8, 7.2.2, 7.2.4, 7.2.10, 7.3.6(a), 7.3.8. |
| Tu, Nov 6 | No class (University holiday) | Remember to vote... |
| Th, Nov 8 | Consistency of the MLE; Kullback-Leibler divergence. Asymptotic normality of the MLE; Fisher information; Cramér-Rao bound | Due Tu 11/13: Problems 7.5.2, 7.5.6, 7.5.10, 7.6.2, 7.7.2, 7.7.4, 7.7.6. |
| Tu, Nov 13 | Simple hypothesis testing; likelihood ratio tests; Neyman-Pearson lemma | Read chapter 8 in the book. Notes on hypothesis testing here. Due Th 11/15: Problems 7.7.8, 7.7.14, 7.8.2, 7.8.4, 7.8.8, 7.9.6, 7.9.16. |
| Th, Nov 15 | Hypothesis testing with composite alternatives; uniformly most powerful tests | Due Tu 11/20: Problems 8.1.2, 8.1.4, 8.1.8, 8.1.14, 8.2.4, 8.2.7, 8.3.2, 8.3.6, 8.3.12. |
| Tu, Nov 20 | Testing with composite null hypotheses; t-tests, F-tests | Read chapter 9 in the book. Due Tu 11/27: Problems 8.4.4, 8.4.6, 8.4.12, 8.5.2, 8.5.6, 8.5.8, 8.6.2, 8.6.8, 8.7.2, 8.7.8. |
| Th, Nov 22 | No class | Happy Thanksgiving. |
| Tu, Nov 27 | Chi-square tests for goodness of fit, homogeneity, and independence | Due Th 11/29: Problems 9.1.4, 9.1.6, 9.1.8, 9.2.2, 9.2.4, 9.3.4, 9.3.6, 9.4.4. |
| Th, Nov 29 | Nonparametrics: Kolmogorov-Smirnov, sign, and rank tests. Basics of linear regression | Due Tu 12/4: Problems 9.6.3, 9.6.4, 9.8.2, 9.8.12, 9.9.4, 9.9.6, 9.9.14. |
| Tu, Dec 4 | The EM algorithm; fitting mixture models | Due Th 12/6: E-mail me questions for the review. |
| Th, Dec 6 | Review session (last class) | E-mail me questions. See here for some extra course notes to help study. |
| Tu, Dec 18 | Final exam | Note change in time: 7:10-10:00 pm (usual place). |