This assignment is due in class Tuesday October 7th.
Here are some data pertaining to 233
branches of a particular retail bank. For the last 33 of the branches
the variable "newAccounts" is missing ("NA"). Your job is to build a model
using the first 200 branches to predict these 33 missing numbers. Use any
software you like to accomplish this task. Specifically, I want you to:
- Fit a standard linear regression model to the first 200 branches
and use this to predict the missing values (send me your estimated regression
co-efficients and your 33 predictions).
- Use 10-fold cross-validation to estimate the predictive accuracy of your
model (Tell me how you did this and what your estimated accuracy is. You
can use squared error to measure accuracy.)
- Use ridge regression to perform the same task using cross-validation
to select the shrinkage parameter (again send me your
estimated regression coefficients and your 33 predictions).
- (optional) Feel free to try something else if you think you can do better.
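
To make the three required steps concrete, here is a minimal sketch in Python with NumPy. It is one possible approach, not a required one. The data below are synthetic stand-ins, since the layout of the real file is not described here, and the helper names (fit_ridge, predict, cv_mse), the grid of shrinkage values, and the random seeds are all assumptions chosen for illustration. The sketch does not standardize the predictors, which you may want to do before ridge regression when the variables are on very different scales.

```python
import numpy as np

def fit_ridge(X, y, lam=0.0):
    """Minimize ||y - b0 - X b||^2 + lam * ||b||^2; lam=0 gives ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering leaves the intercept unpenalized
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    coefs = np.linalg.solve(A, Xc.T @ yc)  # assumes A is nonsingular
    return y_mean - x_mean @ coefs, coefs

def predict(model, X):
    intercept, coefs = model
    return intercept + X @ coefs

def cv_mse(X, y, lam=0.0, k=10, seed=0):
    """k-fold cross-validated mean squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        model = fit_ridge(X[train], y[train], lam)
        sq_err += np.sum((y[held_out] - predict(model, X[held_out])) ** 2)
    return sq_err / len(y)

# Synthetic stand-in for the bank data (assumption): 233 branches, 5 predictors.
rng = np.random.default_rng(1)
X_all = rng.normal(size=(233, 5))
true_coefs = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y_all = 3.0 + X_all @ true_coefs + rng.normal(scale=0.5, size=233)

# First 200 branches have the response; the last 33 are to be predicted.
X_train, y_train, X_missing = X_all[:200], y_all[:200], X_all[200:]

# Step 1: standard linear regression and its 33 predictions.
ols = fit_ridge(X_train, y_train)
ols_predictions = predict(ols, X_missing)

# Step 2: 10-fold cross-validated squared error for that model.
ols_cv = cv_mse(X_train, y_train)

# Step 3: ridge regression, with the shrinkage parameter chosen by the same CV.
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # illustrative grid only
cv_scores = [cv_mse(X_train, y_train, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(cv_scores))]
ridge = fit_ridge(X_train, y_train, best_lambda)
ridge_predictions = predict(ridge, X_missing)

print("OLS intercept and coefficients:", ols[0], ols[1])
print("OLS 10-fold CV mean squared error:", ols_cv)
print("Selected shrinkage parameter:", best_lambda)
print("Ridge intercept and coefficients:", ridge[0], ridge[1])
```

Using the same fold assignment (the same seed) for every candidate shrinkage value keeps the comparison between values fair, since each one is scored on identical held-out sets.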