Log transformations and generalized linear models


Gregor writes,

I would like to hear your opinion on Paul Johnson's comments here, where this link is provided.
In this note Paul states that:
The GLM really is different from OLS, even with a normally distributed dependent variable, when the link function g is not the identity.

Using OLS with manually transformed data leads to horribly wrong parameter estimates. Let $y_i$ be the dependent variable with mean $\mu_i$. OLS estimates:

$$E(g(y_i)) = b_0 + b_1 x_i$$

but the GLM estimates

$$g(E(y_i)) = b_0 + b_1 x_i$$
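As a rough guide to how far apart these two targets are (a standard second-order Taylor approximation, not something from Paul's note), expand g around $\mu_i = E(y_i)$ and take expectations:

$$E(g(y_i)) \approx g(\mu_i) + \tfrac{1}{2}\, g''(\mu_i)\, \mathrm{Var}(y_i)$$

The correction term vanishes when $g'' = 0$, that is, when g is linear as with the identity link, and it grows with the variance of $y_i$, so the gap between the two sets of estimates depends on how noisy the data are.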

This also applies to the log transformation. So the following two approaches are not the same:

glm(log(y) ~ x, family = gaussian(link = "identity"))  # models E(log y)

glm(y ~ x, family = gaussian(link = "log"))  # models log E(y)

The difference is that the first approach log-transforms the observed values, while the second log-transforms the expected value.
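A small simulation makes the difference concrete. This is a sketch with made-up settings (n = 5000, true coefficients 1 and 0.5, log-scale error sd 0.5): y gets multiplicative lognormal errors, and both calls above are fit using R's lowercase `gaussian` family. For lognormal data the targets differ exactly in the intercept, since $\log E(y_i) = E(\log y_i) + \sigma^2/2$, so the log-link intercept should come out roughly 0.125 higher while the two slopes roughly agree.

```r
# Sketch: log-transformed response vs. log link, on simulated lognormal data.
set.seed(123)
n <- 5000
x <- runif(n, 0, 2)
sigma <- 0.5                                   # assumed error sd on the log scale
y <- exp(1 + 0.5 * x + rnorm(n, sd = sigma))   # multiplicative (lognormal) errors

# Approach 1: log the observed values, then fit an ordinary linear model.
fit_transform <- glm(log(y) ~ x, family = gaussian(link = "identity"))

# Approach 2: keep y on its original scale and log the expected value.
fit_loglink <- glm(y ~ x, family = gaussian(link = "log"))

# Approach 1 targets E(log y) = 1 + 0.5 x;
# approach 2 targets log E(y) = 1 + sigma^2 / 2 + 0.5 x.
print(rbind(transform = coef(fit_transform), log_link = coef(fit_loglink)))
```

The slopes agree here because the lognormal gap $\sigma^2/2$ is the same at every x; only the intercept absorbs the difference between the two targets.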

My reply: Yeah, that's right. Usually I'd just take the log of the data, because, for all-positive outcomes, it typically makes sense to consider effects and errors as multiplicative (that is, additive on the log scale). And on the log scale you won't get negative predictions. But another way to look at it is that the two models are very similar, with the key difference being the relation between the predicted value and the variance: fitting log(y) with constant error variance implies that the standard deviation of y is proportional to its predicted value, whereas the Gaussian model with a log link keeps the variance of y constant. In some problems, you won't want to pick either model; instead you can model the variance as a power law, with the power estimated from the data. This is done in serial dilution assays; see here, for example.
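Here is a minimal sketch of the power-law idea. It is not the serial dilution model itself: the normal errors, the form sd = sigma * mu^alpha, the simulated data, and the starting values are all assumptions for illustration. The mean uses a log link, and alpha is estimated by maximum likelihood with base R's `optim()`. Setting alpha = 0 gives the constant-variance Gaussian log-link model, while alpha = 1 gives a constant coefficient of variation, close to what logging the data assumes.

```r
# Sketch: log-link mean with power-law variance, sd_i = sigma * mu_i^alpha,
# fit by direct maximum likelihood (assumed normal errors).
set.seed(456)
n <- 5000
x <- runif(n, 0, 2)
mu_true <- exp(1 + 0.5 * x)
y <- rnorm(n, mean = mu_true, sd = 0.2 * mu_true)  # simulated with alpha = 1

negloglik <- function(par, x, y) {
  mu <- exp(par[1] + par[2] * x)      # log link for the mean
  sdev <- exp(par[3]) * mu^par[4]     # par[3] = log(sigma), par[4] = alpha
  -sum(dnorm(y, mean = mu, sd = sdev, log = TRUE))
}

# Crude starting values: constant mean, sigma = 1, alpha halfway between 0 and 1.
start <- c(log(mean(y)), 0, 0, 0.5)
fit <- optim(start, negloglik, x = x, y = y, method = "BFGS",
             control = list(maxit = 500))
est <- setNames(fit$par, c("b0", "b1", "log_sigma", "alpha"))
print(round(est, 3))
```

With these simulated data alpha should come out near 1; with real data it could land anywhere between (or beyond) the two special cases, which is the reason to estimate it rather than fix it in advance.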

P.S. I answer all of Gregor's questions because they are interesting. Also, he gave us literally zillions of comments on our forthcoming book.


About this Entry

This page contains a single entry by Andrew Gelman published on April 10, 2006 12:20 AM.
