When to worry about heavy-tailed error distributions

Hal Daume writes,

I hope you don’t mind an unsolicited question about your book. I’m working with someone in our chemistry department right now on a regression problem, and he’s a bit worried about the “normality of errors” assumption. You state (p.46):

The regression assumption that is generally the least important is that the errors are normally distributed… Thus, in contrast to many regression textbooks, we do not recommend diagnostics of the normality of regression residuals.

Can you elaborate on this? In particular, what if the true error distribution is heavy-tailed? Could this not cause (significant?) problems for the regression? Do you have any references that support this claim?

My response: It depends what you want to do with the model. If you’re making predictions, the error model is certainly important. If you’re estimating regression coefficients, the error distribution is less important since it averages out in the least-squares estimate. The larger point is that nonadditivity and nonlinearity are big deals because they change how we think about the model, whereas the error term is usually not so important. At least that’s the way it’s gone in the examples I’ve worked on.

To get slightly more specific, when modeling elections I’ll occasionally see large outliers, perhaps explained by scandals, midterm redistrictings, or other events not included in my model. I recognize that my intervals for point predictions from the normal regression model will be a little too narrow, but this hasn’t been a big concern for me. I’d have to know more about your chemistry example to see how I’d think about the error distribution there.
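To make those two points concrete, here is a minimal simulation sketch (the coefficients, sample size, and t(3) error distribution below are invented purely for illustration): the least-squares slope averages out to the true value even with heavy-tailed errors, while a normal error model understates how often very large residuals occur, which is why normal-theory prediction intervals end up a little too narrow.

```python
# Minimal sketch: heavy-tailed errors vs. least-squares estimates.
# Everything here (coefficients, sample size, t(3) errors) is invented
# for illustration, not taken from any real example in the post.
import numpy as np

rng = np.random.default_rng(0)
n, sims = 200, 2000
beta0, beta1 = 1.0, 2.0             # invented "true" coefficients

slopes = []
big_errors = 0                       # residuals beyond the normal-theory 3-sigma bound
total = 0

for _ in range(sims):
    x = rng.uniform(0, 1, n)
    eps = rng.standard_t(df=3, size=n)            # heavy-tailed errors
    y = beta0 + beta1 * x + eps

    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    slopes.append(coef[1])

    resid = y - X @ coef
    sigma_hat = np.sqrt(resid @ resid / (n - 2))  # error scale under the normal model
    big_errors += np.sum(np.abs(resid) > 3 * sigma_hat)
    total += n

print("mean slope estimate:", round(float(np.mean(slopes)), 3))  # close to the true 2.0
print("share of |residual| > 3 sigma:", big_errors / total)
# A normal error model implies roughly 0.3% of residuals beyond 3 sigma;
# with t(3) errors the observed share is several times larger, so
# normal-theory prediction intervals are too narrow in the tails.
```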

4 thoughts on “When to worry about heavy-tailed error distributions”

  1. Andrew wrote:

    "If you're estimating regression coefficients, the error distribution is less important since it averages out in the least-squares estimate."

    By "estimating regression coefficients" do you mean checking if they are significantly different from 0? (Sorry if this is a naive question, I am essentially self-taught and so I don't always know the standard vocabulary).

    If so, I really don't know what to make of this advice. By way of example, I sent Andrew (coincidentally, before I saw this post) some data in which I get different conclusions based on whether or not I transform the dependent measure so that the errors are normal. It really can change the basic results when the errors are not normal. I was never too deeply moved by the "trimming" heuristics people use (3 × SE or whatever); trimming does reduce the skewness but is completely arbitrary.

    Interesting side note: journal reviewers get really upset when I transform my dependent variable (usually reading times in eyetracking during reading sentences) to get the errors to be normal. They want the "pure" dependent variable, because "the normality of errors assumption is not so important since ANOVAs are robust to violations of normality" (a statement that many books assert without any proof).

    So now I've taken to checking whether I get matching results in log-transformed versus raw reading times, and reporting only the raw reading times. But then I hit this problem that my raw and log reading times do not give identical results in this one case I sent Andrew. This is the precise problem the original post (that I'm commenting on) addresses.

  2. But there's a difference between non-normal symmetric errors (which is what the original post's question seemed to ask about) and non-normal asymmetric errors. It's true that the mean difference will show up in the constant, but whether errors are multiplicative or additive (the difference between a log-transformed model and the raw one) is really a different way of thinking about the problem. Just a note: if you didn't log-transform the data before doing something on, say, stock market returns, you'd probably never manage to publish in finance. My advice, vasishth, is, if you can, explain how the errors are multiplicative and point out that the log-transformed model is then the only reasonable alternative.

  3. Jonathan, that is very helpful. So, is Andrew talking about non-normal symmetric errors?

    Also, about your comment that I can explain how the errors are multiplicative, you are saying that I have to explain that the relationship between the dependent variable (reading time) and the predictor X (some manipulation in the experiment) is something like (sorry for the LaTeX formatting)

    (1) Y_i = \beta_0 X_i^{\beta_1} \times \epsilon_i

    a log transform would then give the right additive assumption:

    (2) \log(Y_i) = \log(\beta_0) + \beta_1 \log(X_i) + \log(\epsilon_i)

    But how would one show that the errors are multiplicative? Perhaps this is a stupid question that I can figure out the answer to in a less tired state, but nothing springs to mind.

    Or is it simply that if the dependent variable is a skewed distribution, this leads us to suspect that the errors are multiplicative? Is there a book out there that discusses this?

  4. First off, I should have said that symmetric errors are not required for OLS estimates to be unbiased. That said, in real-world samples, I think it matters a fair bit. As for showing that errors are multiplicative, there are several ways one might go about it, including nesting the specifications (nonlinearly) so that the logarithmic and linear forms can be tested against each other, but I frankly think it's more an exercise in rhetoric than anything else. How you think the independent variables ought to affect the dependent variable is an exercise in persuasion. A Box-Cox regression model or its variants http://ideas.repec.org/a/ier/iecrev/v33y1992i4p93… may well be persuasive to some (a rough sketch of the Box-Cox idea follows below this thread), but thinking about how the process ought to work, combined with persuasive evidence that your model, however parameterized, fits the data pretty well, does the trick for me.
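Following up on the Box-Cox suggestion in comment 4, and on the earlier question of how one might show that errors are multiplicative, here is a rough sketch with invented data (boxcox_profile_loglik is just an illustrative helper written here, not a library routine). Data are generated from the multiplicative model in equation (1); profiling the Box-Cox parameter lambda then favors values near 0, i.e. the log scale, whereas additive errors would favor lambda near 1.

```python
# Rough sketch of the Box-Cox idea with simulated data (all parameters invented).
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.5, 2.0, n)
# multiplicative-error model, as in equation (1): Y_i = beta_0 * X_i^beta_1 * eps_i
y = 2.0 * x**1.5 * rng.lognormal(mean=0.0, sigma=0.4, size=n)

X = np.column_stack([np.ones(n), np.log(x)])   # regress on log(x) throughout
sum_log_y = np.sum(np.log(y))

def boxcox_profile_loglik(lam):
    """Profile log-likelihood of the Box-Cox parameter for a linear model."""
    y_lam = np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / lam
    coef, *_ = np.linalg.lstsq(X, y_lam, rcond=None)
    rss = np.sum((y_lam - X @ coef) ** 2)
    # standard Box-Cox profile log-likelihood (up to a constant): normal
    # likelihood of the transformed response plus the Jacobian term
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * sum_log_y

grid = np.linspace(-1.0, 2.0, 61)
loglik = [boxcox_profile_loglik(lam) for lam in grid]
best = grid[int(np.argmax(loglik))]
print("lambda maximizing the profile likelihood:", round(float(best), 2))
# With multiplicative (lognormal) errors the maximizing lambda sits near 0,
# which points toward working on the log scale; with additive errors it
# would sit near 1.
```

As comment 4 says, this is at best soft evidence: substantive reasoning about how the errors ought to enter the model, plus checks that the chosen model fits reasonably well, still has to carry most of the weight.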
