A question about standardizing regression predictors

Posted on August 20, 2009 8:59 PM by Andrew

Shane Murphy writes:

I have a question about normalizing variables (this is based on your recommendation of subtracting the mean and dividing by two standard deviations).

I have a panel that I am using to learn about multiple outcomes, but not every observation has information about every outcome. For instance, I might have some continuous testscore data for congresspeople and binary scores for their votes, but not all congresspeople take part in every vote. If I want to fit model1 <- vote1 ~ testscore and model2 <- vote2 ~ testscore, and want to normalize testscore, should I normalize testscore to the mean and sd of the congresspeople who voted in vote1 for model1 and to the mean and sd of the congresspeople who voted in vote2 for model2, or should I normalize testscore based on all congresspeople? If I suspect selection bias in who votes in vote1 and vote2, I think that I should normalize against the subsets for each model. Is this correct? Can I ignore this fear? But if I want to compare coefficients between the two models, am I right that normalizing differently for the different models will affect the comparison? Will it affect a comparison of intercepts? Do you have a recommendation here?

My reply: Yes, I think it would make sense to normalize the same way for both models. If you have different normalizations for different models, you’re asking for trouble. But maybe you should be fitting a latent-parameter (“ideal-point”) model in any case.
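To make the point about mixed normalizations concrete, here is a minimal sketch in Python (standard library only) rather than the R of the question. Everything in it is an assumption for illustration: the data are made up, the names standardize, ols_slope and fit are hypothetical helpers, and a least-squares slope stands in for whatever model would actually be fit to the binary votes. It rescales testscore by subtracting a reference mean and dividing by two reference standard deviations, once with all congresspeople as the reference for both models and once with each model's own voters as the reference.

```python
from statistics import mean, stdev

def standardize(values, reference):
    """Center on the reference mean and divide by two reference sds."""
    center = mean(reference)
    scale = 2 * stdev(reference)
    return [(v - center) / scale for v in values]

def ols_slope(x, y):
    """Least-squares slope of y on x (a simple stand-in for the real model)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: None marks a member who did not take part in that vote.
testscore = [52.0, 61.0, 47.0, 70.0, 58.0, 66.0, 43.0, 75.0]
vote1 = [0, 1, 0, 1, None, 1, 0, None]
vote2 = [None, 1, 0, 1, 0, None, 0, 1]

def fit(vote, reference_for):
    """Slope of vote on standardized testscore among members who voted."""
    rows = [(s, v) for s, v in zip(testscore, vote) if v is not None]
    scores = [s for s, _ in rows]
    votes = [v for _, v in rows]
    z = standardize(scores, reference_for(scores))
    return ols_slope(z, votes)

# Same normalization for both models: the reference is every member.
common = [fit(v, lambda subset: testscore) for v in (vote1, vote2)]

# Different normalization per model: the reference is that model's voters.
per_model = [fit(v, lambda subset: subset) for v in (vote1, vote2)]

print("common reference:   ", common)
print("per-model reference:", per_model)
```

With the common reference, both slopes are in the same units (two standard deviations of testscore across all members), so they can be compared directly. With per-model references, each slope is multiplied by a different scale factor, so a difference between them mixes a real difference in the relationship with a difference in the spread of testscore among the two sets of voters.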

2 thoughts on “A question about standardizing regression predictors”

Daniel Lakeland on August 20, 2009 at 5:31 pm said:

I don't think it makes sense to standardize on a different subset for the different models. You won't be able to interpret the differences between the two models without reference to the different standardizations.

The point of standardization is to make things MORE interpretable.

Here, I think a standardization that makes sense is to subtract the mean of the Democrats and divide by the difference in means between Republicans and Democrats. A unit change from 0 to 1 is then the difference in averages between the two parties and is therefore interpretable; test values further to the "right" are also more Republican…

This plays into my general theme in my responses to standardization-related topics here: think of a "natural" unit that makes sense for the problem and standardize by that. The natural unit here is the difference in averages between the two major parties.
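The party-gap rescaling described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores and party labels are made up, party_gap_standardize is a hypothetical helper name, and the sketch assumes the two party means differ (otherwise the divisor is zero).

```python
from statistics import mean

def party_gap_standardize(scores, parties):
    """Rescale so the Democratic mean maps to 0 and the Republican mean to 1."""
    dem_mean = mean([s for s, p in zip(scores, parties) if p == "D"])
    rep_mean = mean([s for s, p in zip(scores, parties) if p == "R"])
    gap = rep_mean - dem_mean  # the "natural" unit; assumed to be nonzero
    return [(s - dem_mean) / gap for s in scores]

# Hypothetical scores and party labels, for illustration only.
scores = [40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
parties = ["D", "D", "D", "R", "R", "R"]

rescaled = party_gap_standardize(scores, parties)
print(rescaled)
```

Because the center and the unit come from all members rather than from whoever took part in a given vote, the same rescaled predictor can be used in every model, and a coefficient reads as the change associated with moving from the average Democrat to the average Republican.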

shane on August 21, 2009 at 6:50 am said:

Thanks a lot for the advice. I had to look up ideal point models, and you were one of the top results, so for completeness, I thought I'd add a link to your previous discussion of these models.
