Homework 2

This assignment is due in class Tuesday October 7th.

Here are some data pertaining to 233 branches of a particular retail bank. For the last 33 of the branches the variable "newAccounts" is missing ("NA"). Your job is to build a model using the first 200 branches to predict these 33 missing numbers. Specifically, I want you to:

  1. Fit a standard linear regression model to the first 200 branches and use this to predict the missing values (send me your estimated regression coefficients and your 33 predictions).
  2. Use 10-fold cross-validation to estimate the predictive accuracy of your model (tell me how you did this and what your estimated accuracy is; you can use squared error to measure accuracy).
  3. Use ridge regression to perform the same task, using cross-validation to select the shrinkage parameter (again send me your estimated regression coefficients and your 33 predictions).
  4. (optional) Feel free to try something else if you think you can do better.
Use any software you like to accomplish this task.
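
For orientation only, here is a minimal sketch of how tasks 1 to 3 might be organized in Python with NumPy. It is one possible approach, not a required one. The data in it are synthetic placeholders (233 rows, 5 made-up predictors), since the real predictor columns are not listed here; the function names fit_ridge, predict and cv_mse and the grid of shrinkage values are likewise assumptions made for illustration. The sketch centers the predictors so the intercept is not penalized; in practice you would probably also standardize them, because the ridge penalty depends on the scale of each predictor.

```python
import numpy as np

def fit_ridge(X, y, lam=0.0):
    """Fit least squares with an unpenalized intercept; lam=0 gives ordinary regression."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering removes the intercept from the penalty
    if lam == 0.0:
        beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    else:
        p = X.shape[1]
        beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def predict(model, X):
    intercept, beta = model
    return intercept + X @ beta

def cv_mse(X, y, lam=0.0, k=10, seed=0):
    """k-fold cross-validated mean squared prediction error."""
    rng = np.random.default_rng(seed)        # same seed => same folds for every lam
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False              # hold this fold out of the fit
        model = fit_ridge(X[train], y[train], lam)
        sq_err += np.sum((y[test_idx] - predict(model, X[test_idx])) ** 2)
    return sq_err / len(y)

# Synthetic stand-in for the branch data (NOT the real data set).
rng = np.random.default_rng(1)
X_all = rng.normal(size=(233, 5))
true_beta = np.array([1.0, -0.5, 0.0, 0.3, 0.0])
y_all = 2.0 + X_all @ true_beta + rng.normal(scale=0.5, size=233)
X_train, y_train = X_all[:200], y_all[:200]  # first 200 branches: response observed
X_new = X_all[200:]                          # last 33 branches: response to be predicted

# Task 1: ordinary linear regression and its 33 predictions.
ols = fit_ridge(X_train, y_train)
ols_pred = predict(ols, X_new)

# Task 2: 10-fold cross-validated squared error of the ordinary regression.
ols_cv = cv_mse(X_train, y_train)

# Task 3: pick the shrinkage parameter by cross-validation, then refit on all 200.
lambdas = np.logspace(-3, 3, 25)             # hypothetical grid of candidate values
cv_scores = [cv_mse(X_train, y_train, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(cv_scores))]
ridge = fit_ridge(X_train, y_train, best_lam)
ridge_pred = predict(ridge, X_new)
```

Reusing the same fold assignment for every candidate shrinkage value keeps the comparison between values from being blurred by fold-to-fold noise. Whatever software you use, report the fold scheme along with the estimated error.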