Fitting Multilevel Models When Predictors and Group Effects Correlate

Andy and I (along with David Park, Boris Shor, and others) have been working on various projects using multilevel models. We find these models are often optimal, particularly when dealing with small sample sizes in groups (individuals in states, students in schools, states in years, etc.). Many social scientists who come from an econometric background are skeptical of multilevel models because they model varying intercepts with error (often called random effects). With modeled varying intercepts, there is the possibility that the predictors will correlate with the varying intercepts, which causes problems. Andy and I wrote a paper discussing how this can be solved (Download file).

Take a simple equation where some outcome is predicted by varying intercepts for groups and a covariate. Also assume that the covariate and the group effects correlate. A problem emerges because this correlation falls into the error term of the level 2 (varying intercept) regression equation. That error becomes part of the error in the level 1 model (as is evident when one substitutes the level 2 equation into the level 1 equation), violating the Gauss-Markov assumption that predictors and errors are uncorrelated. The violation produces problematic estimates for the predictor in the level 1 equation. But there is a straightforward solution: we can solve this problem of modeling with more modeling. One can take the correlating predictor, calculate its mean for each group, and include that mean as a predictor in the level 2 modeled varying intercepts equation.
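In generic notation (my symbols here, not necessarily those used in the paper), the setup looks like this:

$$y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i, \qquad \alpha_j = \mu_\alpha + \eta_j,$$

where $j[i]$ is the group of observation $i$. If $x$ correlates with the group errors $\eta_j$, then substituting the level 2 equation into the level 1 equation gives a combined error $\eta_{j[i]} + \epsilon_i$ that correlates with the predictor. The fix adds the group mean of the predictor to the level 2 equation,

$$\alpha_j = \mu_\alpha + \gamma \bar{x}_j + \eta_j,$$

so that the remaining group-level error is uncorrelated with $x$.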

This new predictor will capture the problematic correlation before it falls into the level 2 error. It also offers a substantive result that may be useful in the research. The method can be applied in any software that allows for modeled varying parameters (BUGS, R, STATA). I say varying parameters because all of the above generalizes to modeled varying coefficients as well.
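As a concrete illustration (not code from the paper), here is how the corrected model might be specified in R with lmer from the lme4 package; the data frame dat and the variables y, x, and group are placeholder names:

```r
library(lme4)

## mean of the predictor within each group, to be used as a level 2 predictor
dat$x_bar <- ave(dat$x, dat$group)

## varying-intercept model with the group-mean covariate added
fit <- lmer(y ~ x + x_bar + (1 | group), data = dat)
summary(fit)
```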

Simulations can illustrate the problem and the solution further. First, we generate a random normal predictor of length 100 with a mean of 0 and a standard deviation of 2. Then we generate an outcome (often called a dependent variable) equal to the predictor plus random normal noise with a mean of 0 and a standard deviation of 7. This ensures a strong, but not perfect, correlation between the two variables. Unit effects are added to the outcome by adding a random normal component with a nonzero mean to each quarter of the data. For example, a set of random normal values with a mean of 1 and a tight standard deviation of .001 is added to the first 25 observations of the outcome. The means for the next three quarters are -1, -3, and 2. The standard deviations remain tight at .001.
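A sketch of that data-generating step in R (my reconstruction of the setup described above, not the original simulation code):

```r
set.seed(1)                                  # arbitrary seed, for reproducibility

n <- 100
unit <- rep(1:4, each = 25)                  # four units of 25 observations each
x <- rnorm(n, mean = 0, sd = 2)              # predictor
y <- x + rnorm(n, mean = 0, sd = 7)          # outcome = predictor + noise

## unit effects: tight random normal shifts with means 1, -1, -3, 2
unit_means <- c(1, -1, -3, 2)
y <- y + rnorm(n, mean = unit_means[unit], sd = 0.001)
```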

We start by predicting the outcome from the varying unit effects and an explanatory variable that is free of unit effects. These results should not be problematic, since the unit effects and the predictor do not correlate. Next, we estimate the same equation but with a predictor that, like the outcome, varies across the units.
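Continuing the sketch, the first two fits might look like this (again using lmer and reusing the simulated data from above):

```r
library(lme4)

## model 1: the predictor carries no unit effects, so it does not
## correlate with the varying intercepts
m1 <- lmer(y ~ x + (1 | unit))

## model 2: give the predictor its own unit-level shifts so that,
## like the outcome, it varies across the units and therefore
## correlates with the varying intercepts
x2 <- x + rnorm(n, mean = unit_means[unit], sd = 0.001)
m2 <- lmer(y ~ x2 + (1 | unit))
```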

To see if the solution highlighted above works as promised, we run a third simulation where the correlation exists but the mean of the predictor per unit is included as a group-level predictor.
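The corresponding third fit, with the group mean of the correlated predictor added as a level 2 covariate:

```r
## model 3: include the mean of the correlated predictor per unit
x2_bar <- ave(x2, unit)
m3 <- lmer(y ~ x2 + x2_bar + (1 | unit))
```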

We estimate each equation 1000 times and record the coefficient and standard error of the key predictor. We plot a histogram of the t statistics (the coefficient divided by the standard error) calculated in each of the 1000 simulations.
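One way to code that loop, continuing the same sketch (the original simulation code may differ in its details):

```r
n_sims <- 1000
t_stats <- matrix(NA, n_sims, 3)             # one column of t statistics per model

for (s in 1:n_sims) {
  x  <- rnorm(n, 0, 2)
  y  <- x + rnorm(n, 0, 7) + rnorm(n, unit_means[unit], 0.001)
  x2 <- x + rnorm(n, unit_means[unit], 0.001)
  x2_bar <- ave(x2, unit)

  fits <- list(lmer(y ~ x + (1 | unit)),
               lmer(y ~ x2 + (1 | unit)),
               lmer(y ~ x2 + x2_bar + (1 | unit)))

  ## t statistic of the key predictor (second row of the fixed-effects table)
  t_stats[s, ] <- sapply(fits, function(f) {
    co <- coef(summary(f))
    co[2, "Estimate"] / co[2, "Std. Error"]
  })
}

par(mfrow = c(1, 3))
for (k in 1:3) hist(t_stats[, k], main = paste("Model", k), xlab = "t statistic")
```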

[Figure (forblog.png): histograms of the t statistic for the key predictor under the three models]

The figure shows that the t statistic of beta for the model where the predictor does not correlate with the unit effects tends to be smaller than for the model where the correlation exists. The larger t statistic in the second plot reflects an inflated sense of statistical significance and a greater tendency to falsely reject the null hypothesis, as mentioned earlier. This is the problem researchers are cautioned about when estimating modeled varying intercepts. The problem is thought to be so severe that this kind of model is often cast aside as unviable.

The third plot shows the same model but with the added level 2 predictor (the correlating predictor averaged within each group). The t statistic for the key predictor looks virtually identical to that in the model with no correlation between the group effects and the predictor. The additional group-level covariate successfully accounted for the correlation before it fell into the group-level error term and caused a problem.