More on scaling regression inputs

Tom Knapp writes:

I have four questions and one correction about your article about scaling regression inputs in Statistics in Medicine:

1. In your party identification example you show that division by two standard deviations reversed the relative magnitudes of some regression coefficients. Near the end of your paper, with respect to Itani et al., you say “dividing by one (rather than two) standard deviation will lead the reader to understate the importance of these continuous inputs”. Is that always the case?

2. How did you get your paper published in SIM, given that the only reference to medicine is in two of those last three examples?

3. In the text accompanying Figure 2 you say “the coefficient for the interaction of income and ideology is now higher than the coefficient for race [black]”. If I’m reading the data in that figure correctly I think you meant to say that the coefficient for parents.party is now higher.

4. On page 2866 you say that log transformations are not appropriate for Likert scales. Do you have a reference for that claim? I think Likert scales are inappropriate for linear regression analysis in general and require the use of ordinal regression analysis.

5. On page 2868 you have a brief paragraph regarding the ability of experienced practitioners to interpret the regression coefficients in the top half of Figure 2. I guess I qualify (I taught statistics for 41 years), and I usually interpret regression coefficients by eyeballing the associated t’s or p’s. Why didn’t you provide same? I calculated all of the t’s for the unscaled coefficients; for black and for parents.party I got -5.76 and 16.33, respectively, so parents.party is the stronger predictor. [Incidentally, you probably should have reported another place or two for the data in Figure 2, since the coefficient and the standard error for age squared are both 0.00]

My reply: First off, it’s a thrill to get a comment from someone who taught statistics for 41 years! I’ve been doing it for barely half as long. To get to specifics:

1. Dividing by 1 sd is roughly comparable to a binary predictor being coded as +/- 1. Dividing by 2 sd is roughly comparable to a binary predictor being coded as 0/1. The 0/1 coding is much more common (at least, in the examples that I’ve seen), which is why I chose the 2 sd scaling.

2. I think it was rejected by two other places; I can’t quite remember where. But each time I made major improvements.

3. Yes, that’s right. D’oh!

4. I’m not so bothered by treating a 1-5 or 1-10 scale linearly, on the assumption that the difference between 1 and 2 is approximately the same as the difference between 3 and 4, or whatever. I’m working on a research project to use Bayesian methods to bridge between the extremes of pure linearity and pure ordered-categorical models.

5. That’s a good point. Ordering by statistical significance is not the same as ordering by importance, but it would’ve been a good idea to discuss this in the article.
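
To make point 1 of the reply concrete, here is a minimal sketch in Python (standard library only). The data are hypothetical and chosen purely for illustration: a balanced binary predictor (half 0s, half 1s) and an evenly spaced continuous input. The helper names `rescale_1sd` and `rescale_2sd` are made up for this example and are not taken from the paper. The point is just that a balanced 0/1 variable has standard deviation 0.5, the same as a continuous input divided by two standard deviations, while a ±1 variable has standard deviation 1, the same as a continuous input divided by one standard deviation.

```python
import statistics


def rescale_1sd(x):
    """Center x and divide by one (population) standard deviation."""
    m = statistics.fmean(x)
    s = statistics.pstdev(x)
    return [(v - m) / s for v in x]


def rescale_2sd(x):
    """Center x and divide by two (population) standard deviations."""
    m = statistics.fmean(x)
    s = statistics.pstdev(x)
    return [(v - m) / (2 * s) for v in x]


# Hypothetical balanced binary predictor under the two codings.
binary_01 = [0, 1] * 50
binary_pm1 = [-1, 1] * 50

# Hypothetical continuous input (illustrative values only).
continuous = [float(i) for i in range(100)]

sd_binary_01 = statistics.pstdev(binary_01)
sd_binary_pm1 = statistics.pstdev(binary_pm1)
sd_cont_2sd = statistics.pstdev(rescale_2sd(continuous))
sd_cont_1sd = statistics.pstdev(rescale_1sd(continuous))

print("sd of 0/1 binary:         ", sd_binary_01)
print("sd of continuous / (2 sd):", sd_cont_2sd)
print("sd of +/-1 binary:        ", sd_binary_pm1)
print("sd of continuous / (1 sd):", sd_cont_1sd)
```

The match is exact only when the binary predictor is split 50/50; for an unbalanced binary variable the standard deviation is smaller than 0.5, which is one reason the comparison in the reply is described as rough.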