Statistics 4315: Linear Regression Models

Spring 2006

This is a master's / advanced undergraduate level course in linear regression methods.

Time: M, W 7:40-8:55 pm
Place: Math 203
Instructor: Liam Paninski; Office: 1255 Amsterdam Ave, Rm 1028. Email: liam at stat dot columbia dot edu; be sure to put "4315" at the beginning of the subject line, or I might miss it.
Office Hours: W 5:30-7:30 (but note that these are subject to change, so check the website before stopping by).
Text: Applied Linear Regression Models, 4th Ed., by Kutner, Nachtsheim, and Neter. McGraw-Hill, 2004. (Available at Labyrinth Bookstore.)
TA: Man Jin; Office: 1255 Amsterdam Ave, Rm 1021. Email: manj at stat dot columbia dot edu.
TA Hours: Th 7-8 in Rm 901, 1255 Amsterdam Ave.
Recitation: Tu 5:30-6:30 in Rm 424 Pupin. (Not required; just for extra help.)
Prerequisite: Calculus; probability and statistics at the level of W4150, or W4105 and W4107 taken concurrently.
Corequisite: Linear algebra.
Grading: Grades will be assigned on a curve, using the following weights: 30% Homework, 30% Midterm, 40% Final. No makeup midterm will be given. (If you miss the midterm, the final will count for 70% of your grade.)
Homework will include a mix of paper and computer problems, and will be assigned in class as we go along; the assignments and due dates will be posted on this webpage. No late homework will be accepted; to compensate for this, we will drop the lowest score.
Midterm: The midterm will be on Wednesday, March 8, during class hours.
Final exam: The final will be on Wednesday, May 10, during normal class hours (7:40-8:55 pm), in the usual place (Math 203).
Computing: You are free to use the software of your choice to complete the homework; we can provide help with Matlab, R, S, or SAS. See the ACIS page for information on software and computing labs. See this online book for help with R.
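As a rough illustration of the grading weights above, here is a minimal Python sketch of the raw (pre-curve) course score. It assumes that all scores are on a 0-100 scale, that homework assignments are weighted equally, and that a missed homework counts as 0; the function name course_score and its arguments are hypothetical and not part of any course software. The curve itself is not modeled.

```python
def course_score(homework, final, midterm=None):
    """Raw weighted course score (0-100), before any curve is applied.

    homework: list of homework scores (at least one), each on a 0-100 scale
    final:    final exam score on a 0-100 scale
    midterm:  midterm score on a 0-100 scale, or None if the midterm was missed
    """
    # Drop the single lowest homework score (only possible with two or more).
    kept = sorted(homework)[1:] if len(homework) > 1 else list(homework)
    hw_avg = sum(kept) / len(kept)

    if midterm is None:
        # Missed midterm: its 30% weight moves to the final (40% + 30% = 70%).
        return 0.30 * hw_avg + 0.70 * final
    return 0.30 * hw_avg + 0.30 * midterm + 0.40 * final


if __name__ == "__main__":
    # Three homeworks; the lowest (60) is dropped before averaging.
    print(course_score([100, 80, 60], final=90, midterm=70))
    # Same student without a midterm score.
    print(course_score([100, 80, 60], final=90))
```

Because final grades are curved, this number only gives a student's standing before the curve, not a letter grade.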

Scope: This class is about the theory and practice of regression analysis. We will emphasize a geometric approach to the theory and the use of the computer to analyze data. The first part of the course will focus on the basic techniques with one-dimensional data, and will assume familiarity with the following topics from statistics (see appendix A in the book for a quick review, or e.g. Rice or a similar textbook for more details):
- Gaussian distributions
- Joint, conditional distributions
- Law of large numbers, central limit theorem
- Estimation
- Bias, variance, covariance
- Maximum likelihood
- Hypothesis testing
- Confidence intervals

The second part of the course will look specifically at the challenges posed by multivariate data. We will do a very brief linear algebra review, but it will be essential to be familiar with the following topics from linear algebra:
- Vectors, matrices
- Linear transformations, bases
- Matrix inverse
- Eigenvalues, eigenvectors
- Quadratic forms
- Determinants

If you haven't taken linear algebra before, don't despair. Some information to help you get started is here.

Part 1: Univariate methods

Date | Topic | Notes
W, Jan 18 | Introduction |
M, Jan 23 | Simple linear regression model; least squares; residuals | Read chapter 1 in book.
W, Jan 25 | Normal error regression model; maximum likelihood | HW due Feb 1: Exercises 5, 7, 13, 18, 19, 23, 34-36, 40, and 41 from Chapter 1 in the book. (The data sets referred to can be downloaded for free here.) Solutions here.
M, Jan 30 | Convex optimization: least-squares, least-absolute deviation, least-maximal deviation. Inference in simple normal regression model | Read chapter 2.1-2.6 for this week.
W, Feb 1 | Proof of Gauss-Markov thm; more on inference in normal regression model | HW due Feb 8: Exercises 1, 4, 13, 50-52, and 54 from Chapter 2 in the book, and problems 1-3 here. Solutions here and some sample contour plot code here.
M, Feb 6 | Prediction of new observations |
W, Feb 8 | Analysis of variance (ANOVA); F-test | Read chapter 2.7-2.8. HW due Feb 15: 11, 12, 16, 18, 55-57 from Chapter 2. Solutions here.
M, Feb 13 | General linear test; coefficient of determination | Read chapter 2.9-2.11.
W, Feb 15 | Normal correlation model | HW due Feb 22: 53, 59-61, and 66 from Chapter 2, and problems 1-2 here. Solutions here.
M, Feb 20 | Rank correlation; model diagnostics | Read 3.1-3.7 for this week.
W, Feb 22 | Goodness of fit | HW due Mar 1: 6, 14, and 19-23 from Chapter 3. Solutions here.
M, Feb 27 | Remedial measures: weighted least-squares and transformations | Read the rest of Chapter 3 and take a look at Chapter 4 for this week.
W, Mar 1 | Nonparametric estimation of the regression function; regression through the origin | No more HW due until after the break.
M, Mar 6 | Midterm review | Bring questions!
W, Mar 8 | Midterm |
Mar 13-17 | Spring break |

Part 2: Multivariate methods

Date | Topic | Notes
M, Mar 20 | Midterm rehash. Linear algebra review: matrix version of simple linear regression. | Read chapter 5.
W, Mar 22 | Linear algebra review: geometry of quadratic forms, multivariate Gaussians | HW due Mar 29: 17, 20, 24, 26, and 29 from Chapter 5.
M, Mar 27 | PCA, change of basis, Cochran's theorem and chi-square degrees of freedom | Read chapter 6. (Some more info on PCA and related topics is available here.)
W, Mar 29 | Multiple linear regression, regression with nonlinear terms | HW due Apr 5: 3, 4, 5, 22, 24, 25 from Chapter 6 and problems 1 and 2 here. Solutions here; code sample here.
M, Apr 3 | Geometry of normal equations, joint inferences | Read chapter 7.
W, Apr 5 | More generalized linear tests, standardized variables, and introduction to multicollinearity | HW due Apr 12: 1, 8, 16, 20, 22, 27, 31, and 35 from Chapter 7. Solutions here; code sample here.
M, Apr 10 | Handling quantitative vs. qualitative predictors | Read chapter 8.
W, Apr 12 | Model selection: prediction error, cross-validation, BIC | Read chapter 9. HW due Apr 19: 2, 6, 20, 24, and 42 from Chapter 8; 12, 13, and 23 from Chapter 9. Solutions here.
M, Apr 17 | Outlier detection and handling | Read chapter 10.
W, Apr 19 | Regularization: ridge regression, robust regression | Read chapter 11. HW due Apr 26: problems 10.12, 10.23, 10.24, 11.21, 11.22, and 14.12 from the book. Solutions here.
M, Apr 24 | Intro to logistic regression | Read chapter 14.
W, Apr 26 | More on logistic regression; classification; support vector machines |
M, May 1 | Last day of class: review | Bring questions!
W, May 10 | Final exam | During usual class hours, in usual place