They call me Dear Abby, or, This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward

Daniel Corsi writes:

I am a PhD student in epidemiology at McMaster University and I am interested in exploring how characteristics of communities are related to child health in developing countries.

I have been using multilevel models to relate physical characteristics of communities such as the number of schools, health clinics, sanitation facilities etc to child height for age and weight for age using observational/survey data.

I have several questions regarding the group-level (community-level) predictors in these models.

1. My first question is about interpretation of the group-level coefficients. I have found some modest coefficients on the order of .13 (se .05) on several community-level variables (e.g., number of schools) predicting child height for age in standard deviation units. I know from your ARM book that we should be interpreting these coefficients cautiously, especially in observational studies. My question is: does this apply to the interpretation of all variables, or just variables created by aggregating an individually-measured variable to the group level?

2. The second and related point: do you have any suggestions on combining several predictors together at the group level? I am wondering if it is more useful to look at the effects for several variables related to schools, health clinics, and other services separately, or to combine these variables into some form of an index to include in one model. These variables are typically highly correlated and therefore it doesn’t seem to make sense to me to include several individual variables in the same model without combining them into some form of a ‘total’ community facility index – but I haven’t found much in the literature about this point.

3. And the last question I have for you is – Is it even reasonable to be looking at group-level influences on child health in this way? And would you suggest controlling for other individual-level predictors of child health, for instance household socioeconomic status or mother’s education? As community facilities are likely related to these intermediating variables, which are stronger predictors of child health, any potential effect of the community environment could be masked by, for instance, the household SES. It is also likely that it is the high-SES areas that will have access to better facilities, so I am having difficulty with this issue.

I am not necessarily looking for causal effects, although it is helpful to think this way. What I am really interested in is what can be learned about community-level characteristics and their influence on child health parameters by using multilevel models, and is there a way to try and understand this that doesn’t require causal interpretation of the group-level coefficients?

Thank you for your help with this. I realize that it is potentially a complex issue, but I haven’t found many references on this point. If you have any advice or references, that would be very helpful.

My reply:

1. It’s always a good idea to be careful. When I’m stuck on causal interpretations, I go back to descriptive language, for example: Compare two kids of the same race, birth order, socioeconomic status, etc., where one kid lives in neighborhood X (which is 1 sd above the mean on #schools but at the mean level on all other neighborhood-level characteristics) and the other kid lives in neighborhood Y (which is 1 sd below the mean on #schools but at the mean level on all other neighborhood-level characteristics). Based on the model, you’d expect the kid in neighborhood Y to differ by ** much from the kid in neighborhood X.

This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward. First, it gets you thinking about what it means for a neighborhood to be 1 sd above or 1 sd below the mean on a given characteristic. Second, it pushes you to think about the individual neighborhoods, to give you a sense of what these statistical results are really saying. Third, it gets you thinking about correlations between the predictors: Does it really make sense to compare two neighborhoods that differ in #schools but are identical in all other ways, or would it be better to compare neighborhoods that, more realistically, differ in many dimensions?
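Here is a rough sketch of that descriptive comparison as arithmetic. It assumes a linear model, so two neighborhoods that sit 1 sd above and 1 sd below the mean are 2 sd apart on the predictor. The coefficient and standard error are the values quoted in the question; `predictor_sd` is a hypothetical value (the letter does not give the scale of the predictor), and the function name is made up for illustration.

```python
def expected_difference(coef, se, predictor_sd, n_sd_apart=2.0):
    """Expected outcome gap between two neighborhoods that differ by
    n_sd_apart standard deviations on one predictor, all else equal."""
    shift = predictor_sd * n_sd_apart   # gap in raw predictor units
    gap = coef * shift                  # gap in outcome units
    gap_se = abs(se * shift)            # se scales the same way
    return gap, gap_se

# coef and se come from the question; predictor_sd=1.0 is an assumption
# (it treats the predictor as already standardized).
gap, gap_se = expected_difference(coef=0.13, se=0.05, predictor_sd=1.0)
print(f"expected difference: {gap:.2f} (se {gap_se:.2f}) sd of height for age")
```

Plugging in the actual sd of #schools across communities forces you to ask whether a 2 sd gap is a comparison that occurs in your data at all.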

2. When combining predictors, I’m a big fan of simple averages, as discussed in chapter 4 of ARM. The same reasoning goes for group-level predictors. You just want to think a bit about the scaling.
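A minimal sketch of the simple-average idea, with the scaling handled by standardizing each predictor before averaging. The data are hypothetical counts, not anything from the study, and standardizing is only one reasonable choice of scaling; `average_index` is an illustrative name.

```python
import numpy as np

def average_index(X):
    """Simple average of column-standardized predictors (rows = communities).

    Assumes every column varies; a constant column would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # put columns on one scale
    return z.mean(axis=1)                             # equal-weight average

# Hypothetical counts for five communities:
# columns are schools, health clinics, sanitation facilities.
facilities = np.array([
    [1, 0, 2],
    [3, 1, 5],
    [2, 1, 4],
    [5, 2, 9],
    [4, 1, 6],
])
index = average_index(facilities)
print(np.round(index, 2))
```

The resulting index can then go into the multilevel model as a single group-level predictor, with its coefficient read as the expected difference per unit of the averaged, standardized scale.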

3. I don’t have any great answers here, but if you’re thinking causally–and you should be, I’m sure–it helps to visualize some potential interventions and think about how they’d trickle down through the predictors in your models. Also think of some hypothetical experiments or observational studies you’d ideally like to do, then see if you can do some modeling to do your best job to fill in the gaps needed to make the inferences you’re interested in.

P.S. Hey, maybe that should be my statistical motto: “This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward”!

2 thoughts on “They call me Dear Abby, or, This might at first seem like a pointless tautological exercise, but actually I think it can lead you forward”

  1. The issue (2.) of whether to combine a bunch of correlated indicators into one is a tricky one. It really depends on what you are interested in. Psychologists are adept at using principal components analysis and factor analysis to come up with such synthetic measures. If you have some concept (say, depression) and a set of indicators X1..Xn that you think reflect it, then PCA is quite handy, especially if n is big. It is used a bit more in economics now, as the field branches into dealing with more non-economics stuff and we just want some indicator where we have no particular theory about it. The interpretation can be a bit tricky: say you have a composite measure of deprivation; then you can say that moving it by one standard deviation changes Y by so much. But that's not as clean as saying "if there were 20% more single parents …then..".

  2. There is a large literature on combining many variables into just a few in macroeconomics. Sargent and Sims (1977) is a classic reference, but there has been much work published recently. I don't know if it is helpful for epidemiology, but here's a very brief summary.

    Dimension reduction is particularly important for monetary policy, given that there are hundreds of variables that can all be used to describe and forecast the state of the economy. Here is a paper coauthored by Bernanke:

    http://neumann.hec.ca/pages/jean.boivin/mypapers/

    Stock and Watson, Jean Boivin, and numerous others have papers on factor models. A book that covers Bayesian estimation of factor models:

    http://www.econ.washington.edu/user/cnelson/SSMAR

    Jushan Bai and Serena Ng (2002, Econometrica) discuss the choice of how many factors to include in your model.
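For readers who want to try the principal components suggestion from the first comment, here is a minimal sketch using only NumPy's SVD. The data are hypothetical, the function name is illustrative, and this is a bare-bones first component rather than a full factor model of the kind discussed in the references above.

```python
import numpy as np

def first_principal_component(X):
    """Scores, loadings, and variance share of the first principal
    component of column-standardized data (rows = communities)."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:          # the sign is arbitrary; pick a convention
        loadings = -loadings
    scores = z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Hypothetical counts: schools, health clinics, sanitation facilities.
facilities = np.array([
    [1, 0, 2],
    [3, 1, 5],
    [2, 1, 4],
    [5, 2, 9],
    [4, 1, 6],
])
scores, loadings, explained = first_principal_component(facilities)
print(np.round(loadings, 2), round(float(explained), 2))
```

When the indicators are highly correlated, the loadings tend to come out nearly equal, in which case the component is close to a rescaled simple average.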
