Log transformations and generalized linear models

Gregor writes,

I would like to hear your opinion on Paul Johnson's comments here, where this link is provided.

In this note Paul states that:

The GLM really is different from OLS, even with a Normally distributed dependent variable, when the link function g is not the identity.

Using OLS with manually transformed data leads to horribly wrong parameter estimates. Let y_i be the dependent variable with mean μ. OLS estimates:

E(g(y_i)) = b_0 + b_1x_i

but the GLM estimates

g(E(y_i)) = b_0 + b_1x_i

This also applies to log transformation. So the following two approaches are not the same:

glm(log(y) ~ x, family = gaussian(link = "identity"))

glm(y ~ x, family = gaussian(link = "log"))

The difference is that the first approach log-transforms the observed values, while the second log-transforms the expected value.
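
To see the difference concretely, here is a minimal R sketch (the simulated data and coefficient values are purely illustrative, not from Paul's note) fitting both models to the same all-positive outcome:

# Simulate an all-positive outcome with multiplicative (log-normal) errors
set.seed(123)
n <- 200
x <- runif(n)
y <- exp(1 + 2 * x + rnorm(n, sd = 0.5))

# Approach 1: log-transform the data, identity link
# (this estimates E(log(y_i)) = b_0 + b_1 x_i)
fit1 <- glm(log(y) ~ x, family = gaussian(link = "identity"))

# Approach 2: leave y on the original scale, log link
# (this estimates log(E(y_i)) = b_0 + b_1 x_i)
fit2 <- glm(y ~ x, family = gaussian(link = "log"))

# By Jensen's inequality, E(log y) < log E(y), so the intercepts
# (and in general the slopes) will not agree:
coef(fit1)
coef(fit2)

Here the data were generated with multiplicative errors, so the first model matches the data-generating process; under constant variance on the original scale, the comparison would favor the second.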

My reply: Yeah, that's right. Usually I'd just take the log of the data, because, for all-positive outcomes, it typically makes sense to consider effects and errors as multiplicative (that is, additive on the log scale). And on the log scale you won't get negative predictions. But another way to look at it is that the two models are very similar, with the key difference being the relation between the predicted value and the variance: taking logs implies that the standard deviation of the response scales with its mean, while the log link with a Gaussian family implies constant variance on the original scale. In some problems, you won't want to pick either model; instead you can model the variance as a power law, with the power estimated from the data. This is done in serial dilution assays; see here, for example.
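
If you want that in-between option, one way to fit a power-law variance model in R is gnls() with varPower() from the nlme package, which estimates the variance power from the data. This sketch reuses the simulated data above; the model form and start values are illustrative guesses, not anything from the original discussion:

library(nlme)

# Mean model with a log link, but with variance modeled as
# Var(y_i) = sigma^2 * |E(y_i)|^(2 * delta), delta estimated from the data.
# delta = 0 recovers constant variance on the original scale;
# delta = 1 corresponds to a constant coefficient of variation,
# roughly what taking logs assumes.
d <- data.frame(x = x, y = y)
fit3 <- gnls(y ~ exp(b0 + b1 * x), data = d,
             start = c(b0 = 1, b1 = 2),
             weights = varPower())  # variance covariate defaults to the fitted values
summary(fit3)  # the estimated "power" is delta

The point of estimating delta rather than fixing it is that the data, not the analyst, decide where on the spectrum between the two glm() calls above the variance structure falls.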

P.S. I answer all of Gregor’s questions because they are interesting. Also, he gave us literally zillions of comments on our forthcoming book.