A question about transformation in regression

Alban Z writes,

I am seeking your view on a concept: transforming a dependent variable to make it normally distributed before running a regression. For situations where common strategies such as a logarithm transformation, taking the square root, etc., do not make a variable (close to) normally distributed, some of the literature suggests using the so-called *inverse normal transformation: The transformation involves ranking the observations of the dependent variable and then matching the percentile of each observation to the corresponding percentile of the standard normal distribution. Using the resulting percentiles, each observation is replaced with the corresponding z-score from the standard normal distribution. When there are ties, percentiles are averaged across all ties.*

What are your thoughts about the above procedure? Do you recommend using it?
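[For concreteness, the procedure described in the letter can be sketched in Python with SciPy. The function name and the (rank − 0.5)/n percentile convention are illustrative choices; other offsets, such as Blom's (rank − 3/8)/(n + 1/4), appear in the literature.]

```python
import numpy as np
from scipy.stats import rankdata, norm

def inverse_normal_transform(y):
    """Rank-based inverse normal transformation (illustrative sketch).

    Ties receive averaged ranks, as in the letter; the 0.5 offset
    keeps percentiles strictly inside (0, 1) so norm.ppf is finite.
    """
    ranks = rankdata(y, method="average")   # ties -> averaged ranks
    n = len(y)
    percentiles = (ranks - 0.5) / n         # map ranks to (0, 1)
    return norm.ppf(percentiles)            # matching standard-normal z-scores

y = np.array([3.1, 0.2, 0.2, 7.8, 1.5])
z = inverse_normal_transform(y)
```

Note that the tied observations (the two 0.2 values) map to the same z-score, and the middle observation maps to exactly 0, the median of the standard normal.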

My reply: I do not recommend transforming to make a variable have a particular distribution. Additivity and linearity of the model are more important. We discuss the issue further in chapter 4 of our new book. See also here.

1 thought on “A question about transformation in regression”

  1. Keep in mind that the assumption is not that your dependent variable is normally distributed, but that it is normally distributed conditional on your explanatory variables; in other words, it is your residuals that are assumed to be normally distributed.

    Also, the procedure you propose is a non-linear transformation, which means you lose information about the distances between the values. The only information you retain is the order.
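    [The loss of distance information is easy to demonstrate: two variables with the same ordering but very different spacing map to identical transformed values. A minimal sketch, assuming SciPy; the helper `int_transform` and the (rank − 0.5)/n convention are illustrative choices:]

    ```python
    import numpy as np
    from scipy.stats import rankdata, norm

    def int_transform(y):
        # Rank-based inverse normal transformation (illustrative sketch)
        ranks = rankdata(y, method="average")
        return norm.ppf((ranks - 0.5) / len(y))

    a = np.array([1.0, 2.0, 3.0])     # evenly spaced
    b = np.array([1.0, 2.0, 300.0])   # huge gap before the last value

    # Same order, so identical transformed values: the gap is invisible.
    same = np.allclose(int_transform(a), int_transform(b))
    ```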
