Analyzing the entire population rather than a sample

Lee Mobley writes:

I recently read your blog post, “How does statistical analysis differ when analyzing the entire population rather than a sample?”

What you said in the blog accords with my training in econometrics. However I am concerned about a new wrinkle on this problem that derives from multilevel modeling.

We are analyzing multilevel models of the probability of using cancer screening for the entire Medicare population. I argue that every state has different systems in place (politics, cancer control efforts, culture, insurance regulations, etc.), so that essentially a different probability-generating mechanism is in place for each state. Thus I estimate 50 separate regressions, one for the population of each state, and then note and map the variability in the effect estimates (slope parameters) for each covariate.

Reviewers argue that I should be using random slopes modeling, pooling all individuals in all states together. I am familiar with this approach and have your multilevel modeling text.

In multilevel modeling, the choice between random- and fixed-effects models is usually framed this way: when ‘population’ inference is desired, the random-slopes model is indicated; when inference about the particular sample is desired, fixed-effects models are indicated.

Here is my question: When the sample is the population, and one is interested in explaining outcomes, isn’t the separate-state-regressions approach essentially a fixed-effects approach (mimicking separate intercepts for each state, with interactions between those and the covariates so that every parameter can vary across states)? If so, the approach allows inference about the particular ‘sample’ (which happens to be the population we observe).

Is there anything inherently wrong with this approach?

My reply:

I’m happy to hear that the reviewers recommend multilevel modeling! My quick answer is that, even if you are only interested in these 50 slopes, the multilevel model will give more efficient estimates, especially for states with smaller sample sizes, because it partially pools information across states. Separately estimating for each state is fine, but it’s not the most statistically efficient procedure. To put it another way, multilevel modeling allows you to estimate deeper interactions than would be feasible via separate regressions in the 50 states.
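
To make the efficiency point concrete, here is a minimal simulation sketch in Python. It is not the analysis from the question: every number in it (the mean and spread of the true slopes, the sample sizes, the noise scale) is an assumption chosen for illustration, and instead of fitting a full multilevel model it uses a simple empirical-Bayes shrinkage of the 50 separate slope estimates toward their precision-weighted mean. That shrinkage is a rough stand-in for what a random-slopes model does.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

J = 50                          # number of states
TRUE_MEAN, TRUE_SD = 0.5, 0.2   # assumed distribution of the true state slopes

# Hypothetical truth: each state has its own slope.
true_slopes = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(J)]

# Assumed sample sizes, spread log-uniformly between about 25 and 900,
# and the standard error each separate regression would then have.
n = [int(math.exp(random.uniform(math.log(25), math.log(900)))) for _ in range(J)]
se = [3.0 / math.sqrt(nj) for nj in n]

# "Separate regressions": one noisy slope estimate per state (no pooling).
separate = [random.gauss(b, s) for b, s in zip(true_slopes, se)]

# Method-of-moments estimate of the between-state variance of the slopes.
mean_se2 = statistics.mean([s * s for s in se])
tau2 = max(0.0, statistics.variance(separate) - mean_se2)

# Precision-weighted estimate of the overall mean slope.
weights = [1.0 / (s * s + tau2) for s in se]
mu_hat = sum(w * b for w, b in zip(weights, separate)) / sum(weights)

# Partial pooling: shrink each estimate toward mu_hat, more when its se is large.
partial = []
for b, s in zip(separate, se):
    shrink = (s * s) / (s * s + tau2)   # 1 = complete pooling, 0 = no pooling
    partial.append(shrink * mu_hat + (1.0 - shrink) * b)


def rmse(estimates, truth):
    """Root mean squared error of the estimates against the true slopes."""
    return math.sqrt(statistics.mean([(e - t) ** 2 for e, t in zip(estimates, truth)]))


print("RMSE, separate estimates:", round(rmse(separate, true_slopes), 3))
print("RMSE, partially pooled:  ", round(rmse(partial, true_slopes), 3))
```

In this setup the partially pooled estimates should sit closer to the true slopes on average, with most of the gain coming from the states with the fewest observations. When every state's sample is large relative to the between-state variation, the shrinkage factor approaches zero and the two sets of estimates nearly coincide.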

2 thoughts on “Analyzing the entire population rather than a sample”

  1. Andrew, because "every state has different systems in place (politics, cancer control efforts, culture, insurance regulations, etc)," the effects of these variables may be different as well. One state with a preponderance of people who trust in God may eschew all palliative measures, another state with a lot of high-tech hospitals may have a culture of intervention, etc. In other words, the relationships among the many influences on the outcome (I hesitate to call all of them formal "variables") may be entirely different. Since the Medicare population in even the smaller states is pretty large, my own approach would be to drill down and see if one can determine these relationships. What's wrong with this approach?
