Using multilevel regression to predict individual responses?

Jacob Felson writes,

I have a question relevant to multilevel modeling about something written by Lawrence Iannaccone, an innovator in applying economic principles to the study of religion. Iannaccone argues that the use of regression to predict individual religiosity by group means of religiosity is fundamentally flawed. The criticism is generalizable to any variable and its group means. He writes,

[models that predict Y with Y_bar have a] potentially serious specification problem because the right-hand averages … depend upon the left-hand levels. The averages will be influenced by the same unobserved “error” terms that influence individual-Rs, thereby violating the OLS independence assumption, and thus biasing estimated coefficients, significance levels, goodness of fit, and so forth.

He cites Moulton (1990) and C. Manski (1993, 1995).

Then he makes the following claim which seems highly questionable to me.

The problem also persists in the face of larger data sets. As the sample approaches the entire population, the average [Y] values exactly equal the average of the corresponding individual Ys, and estimation becomes impossible due to perfect collinearity.

I [Felson] don’t see how the sampling fraction would affect the determinacy between Y and Y-bar.
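One way to check the point about sample size is a small simulation. The sketch below is an illustration added here, not something taken from Iannaccone's or Manski's papers, and the function names are hypothetical. It generates pure-noise outcomes with no social effect at all, computes each group's mean from those same observations, and regresses Y on that mean. Algebraically the OLS slope is exactly 1 whenever the group mean includes the individual's own Y, so the result should not change with the number or size of groups.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_on_own_group_mean(n_groups, group_size, seed=0):
    """Regress pure-noise y on the mean of y within each unit's own group."""
    rng = random.Random(seed)
    y, ybar = [], []
    for _ in range(n_groups):
        # Assumption for illustration: no group effect, no social effect.
        group = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        mean = sum(group) / group_size  # Y-bar built from the same Ys
        y.extend(group)
        ybar.extend([mean] * group_size)
    return ols_slope(ybar, y)


# Small and large samples give the same slope, up to floating-point error.
for n_groups, group_size in [(20, 5), (200, 50)]:
    print(n_groups, group_size, slope_on_own_group_mean(n_groups, group_size))
```

In this setup the dependence between Y and Y-bar comes from how Y-bar is constructed, not from the sampling fraction.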

He goes on to write:

When average Y for an entire region is replaced by average R in the respondent’s neighborhood the statistical bias grows. The smaller the social circle, the more likely it is that any unobserved effect influencing average religiosity tends also to directly affect the individual’s religiosity. The error term becomes increasingly correlated with the average Y, and hence the regression results increasingly overstate the size and significance of the average Y social effect while possibly invalidating all other coefficient estimates as well. Something is seriously wrong when methods of inquiry become less valid as the data become more complete and detailed.
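The mechanism described in this passage, an unobserved factor that moves both the individual and the group average, can be sketched with a simulation. This is a minimal illustration under stated assumptions, not Iannaccone's analysis: each outcome is a shared group shock plus individual noise, there is no true social effect, and the regressor is the mean of the other group members (leaving out the individual's own Y). The names `spurious_peer_slope` and `shock_sd` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def spurious_peer_slope(shock_sd, n_groups=2000, group_size=10, seed=1):
    """Regress y on the mean of the other group members' y.

    Assumed model: y = group shock + individual noise, with no effect
    of other members' outcomes on the individual's outcome.
    """
    rng = random.Random(seed)
    y, others_mean = [], []
    for _ in range(n_groups):
        shock = rng.gauss(0.0, shock_sd)  # unobserved factor shared by the group
        group = [shock + rng.gauss(0.0, 1.0) for _ in range(group_size)]
        total = sum(group)
        for value in group:
            y.append(value)
            others_mean.append((total - value) / (group_size - 1))
    return ols_slope(others_mean, y)


print("no shared shock:  ", spurious_peer_slope(shock_sd=0.0))
print("with shared shock:", spurious_peer_slope(shock_sd=1.0))
```

Under these assumptions the population slope is var(shock) / (var(shock) + var(noise) / (group_size - 1)), which is 0 with no shared shock and 0.9 with the settings above, even though the simulated social effect is zero by construction.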

Many studies predict Y with Y-bar in some fashion. Y-bar is often constructed directly from Y, no? This seems to be a common practice in HLM analyses, where, for example, math achievement of students would be predicted by an estimate of school math achievement. Does HLM solve the problem to which he is referring, or is there no problem at all?

My reply: this is going to sound like a cop-out, but . . . I’m not quite sure what you’re trying to do. It sounds like you have survey data on individuals, including a religiosity measure which you can average to get a group-level (“contextual”) measure. But if that’s the case, why do you need to predict individual religiosity? You’ve already observed it. Also, I don’t understand the comment about an “effect influencing average religiosity” that can also “affect the individual’s religiosity.” How do you influence average religiosity without affecting individuals’ religiosity? I guess you can do that by having people move, but I don’t think that’s what he’s talking about.

On the more technical issues, I’d suggest reading Section 21.7 of our book (although I admit we don’t go into this particular issue very deeply).

Also this paper on multilevel modeling: what it can and can’t do.

1 thought on “Using multilevel regression to predict individual responses?”

  1. It may be worth looking at the Manski cite in the text (I forget which one is which, but it's titled "The Reflection Problem"). It lays out in some detail where problems occur when group averages are used as explanatory variables for outcomes of individuals in the group. It's been a while, so the details are hazy, but I recall he's mostly discussing this in the context of OLS; I'm not sure what the analogous issue in a multilevel context would look like.
