A Very Delayed Lightbulb Over my Head

Daniel Scharfstein (http://commprojects.jhsph.edu/faculty/bio.cfm?F=Daniel&L=Scharfstein) recently gave a very good talk at the Columbia Biostatistics Department. He presented an application of causal inference using principal stratification. The example was similar to something I’ve heard Don Rubin and others speak about before, but I realized I’d been missing something important about this particular example.

The example in Dr. Scharfstein’s talk was estimating the effect of vision loss on depression in older adults. Briefly, vision was tested early in the study and patients were followed for a few years. At the end of the study, patients were screened for depression, with the goal of determining whether patients with vision loss were more depressed than patients without it. The complication (one of them, anyway) is that many patients died before the end of the study. If vision loss affects death at all (and it seems suspect to assume that it doesn’t), then restricting the analysis to patients who were alive at the end of the study can bias results and won’t produce estimates of causal effects that are valid in the Rubin sense. On the other hand, treating depression outcomes for patients who died as missing data isn’t quite right, because these outcomes are fundamentally unobservable (a priori counterfactuals). The goal of the analysis, then, is to estimate the effect of vision loss on depression for the latent subclass (principal stratum) of patients for whom depression could be observed both with and without vision loss: the patients who would survive whether or not they had vision loss.
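To make the selection problem concrete, here is a small simulation sketch in Python. None of it comes from the talk: the latent “health” variable, the survival thresholds, the linear depression score, and the constant effect tau are all hypothetical choices, and vision loss is assigned at random only to keep the illustration simple (in the real study it is of course not randomized). Because a simulation reveals both potential outcomes for every patient, we can compute the effect within the always-survivor stratum directly and compare it with the naive comparison of survivors.

```python
import random
from statistics import fmean


def simulate(n=100_000, tau=1.0, seed=0):
    """Hypothetical potential outcomes for a vision-loss study.

    For each patient: z = True if vision loss (randomized here for simplicity),
    s0/s1 = survives to the end of study without/with vision loss,
    y0/y1 = depression score without/with vision loss.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)                     # latent baseline health
        s0 = health > -1.0                               # survival without vision loss
        s1 = health > -0.5                               # vision loss makes death more likely
        y0 = 10.0 - 2.0 * health + rng.gauss(0.0, 1.0)   # sicker patients are more depressed
        y1 = y0 + tau                                    # constant effect of vision loss
        z = rng.random() < 0.5                           # hypothetical random assignment
        patients.append((z, s0, s1, y0, y1))
    return patients


def naive_vs_sace(patients):
    # Naive: compare observed depression among patients alive at the end.
    alive_with_loss = [y1 for z, s0, s1, y0, y1 in patients if z and s1]
    alive_without_loss = [y0 for z, s0, s1, y0, y1 in patients if not z and s0]
    naive = fmean(alive_with_loss) - fmean(alive_without_loss)
    # SACE: average effect in the principal stratum that survives either way.
    # Computable here only because the simulation reveals both potential outcomes.
    sace = fmean(y1 - y0 for z, s0, s1, y0, y1 in patients if s0 and s1)
    return naive, sace


naive, sace = naive_vs_sace(simulate())
print(f"naive survivor comparison: {naive:.2f}, SACE: {sace:.2f}")
```

With these settings the naive comparison understates the effect, because patients who survive with vision loss are, on average, healthier and less depressed than patients who survive without it. Under other hypothetical setups the bias could just as well go the other way.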

As I said, I’ve seen presentations on this type of analysis before, and I’ve heard it said that this setting is very different from the standard econometric instrumental variables setting. It wasn’t that I didn’t believe it was different; I just never gave much thought to exactly how it was different. It wasn’t until I asked a dumb question in Dr. Scharfstein’s talk that I realized what the difference is. Instrumental variables models rely on the exclusion restriction: the assumption that if the treatment (vision loss, in this case) doesn’t affect the “intermediate outcome” (death), it doesn’t affect the primary outcome (depression) either. Making that assumption here would defeat the whole purpose of the analysis, since the only subgroup of patients we’re interested in is one for whom the treatment does not affect the intermediate outcome. So imposing the exclusion restriction would lead to the very unhelpful result of forcing our quantity of interest to be zero.
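In potential-outcomes notation the point takes one line. Write S(z) for survival and Y(z) for depression under vision-loss status z (notation introduced here for illustration, not taken from the talk), and call the quantity of interest the survivor average causal effect (SACE), the average effect among patients who survive either way. Applied to survival as the intermediate outcome, the exclusion restriction says that patients whose survival is unaffected by vision loss also have unaffected depression, which is exactly the always-survivor stratum:

$$
\mathrm{SACE} = E\big[\,Y(1) - Y(0) \mid S(0) = S(1) = 1\,\big]
$$

$$
\text{exclusion restriction:}\quad S_i(0) = S_i(1) \;\Longrightarrow\; Y_i(0) = Y_i(1)
$$

$$
\text{so}\quad \mathrm{SACE} = E\big[\,0 \mid S(0) = S(1) = 1\,\big] = 0.
$$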

But I digress. Enter the belated lightbulb. Without the exclusion restriction, additional assumptions and covariates are usually necessary to obtain precise estimates of causal effects. One nice thing about the potential outcomes/principal stratification framework is that these assumptions can be made very clear and usually have a scientific interpretation, so appropriate assumptions can be chosen based on subject-matter knowledge.
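As one example of such an assumption, suppose vision loss never keeps alive a patient who would otherwise die, so that S(1) ≤ S(0) for every patient, where S(z) denotes survival under vision-loss status z (monotonicity). Then every survivor with vision loss is an always-survivor, and the always-survivors make up an estimable fraction of the survivors without vision loss, so the effect can be bounded by trimming that group from the top or from the bottom. The sketch below computes such trimming bounds. The function names, the simulated data, and the assumption that vision loss is as good as randomly assigned are all hypothetical; covariates can then help narrow the resulting interval.

```python
import random
from statistics import fmean


def sace_bounds(z, s, y):
    """Trimming bounds on the survivor average causal effect (SACE).

    Assumes treatment is ignorable and monotonicity S(1) <= S(0):
    vision loss never keeps alive a patient who would otherwise die.
    z, s, y are parallel lists: 1 = vision loss, 1 = alive at end of
    study, and the depression score (ignored for patients who died).
    """
    n1 = sum(z)
    n0 = len(z) - n1
    y1_alive = [yi for zi, si, yi in zip(z, s, y) if zi == 1 and si == 1]
    y0_alive = sorted(yi for zi, si, yi in zip(z, s, y) if zi == 0 and si == 1)
    # Share of control survivors who are always-survivors: P(S=1|Z=1) / P(S=1|Z=0).
    pi = min((len(y1_alive) / n1) / (len(y0_alive) / n0), 1.0)
    k = max(1, round(pi * len(y0_alive)))
    # Under monotonicity every survivor with vision loss is an always-survivor.
    mu1 = fmean(y1_alive)
    # E[Y(0) | always-survivor] lies between the mean of the k lowest
    # and the mean of the k highest control-survivor outcomes.
    low0 = fmean(y0_alive[:k])
    high0 = fmean(y0_alive[-k:])
    return mu1 - high0, mu1 - low0


def simulate(n=100_000, tau=1.0, seed=1):
    """Hypothetical data satisfying monotonicity, with a true SACE of tau."""
    rng = random.Random(seed)
    z, s, y = [], [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)             # latent baseline health
        zi = 1 if rng.random() < 0.5 else 0       # vision loss, randomized here
        threshold = -0.5 if zi else -1.0          # vision loss raises the risk of death
        z.append(zi)
        s.append(1 if health > threshold else 0)
        y.append(10.0 - 2.0 * health + tau * zi + rng.gauss(0.0, 1.0))
    return z, s, y


lower, upper = sace_bounds(*simulate())
print(f"SACE bounds under monotonicity: [{lower:.2f}, {upper:.2f}]")
```

The true effect in this simulated data is 1 and falls inside the interval. Monotonicity alone gives bounds rather than a point estimate; sharper answers need further assumptions or informative covariates, which is where subject-matter knowledge comes in.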

For the abstract of Professor Scharfstein’s talk, go to http://cpmcnet.columbia.edu/dept/sph/biostat/seminar/spring2005.html, scroll down to January 27, and click on Abstract. See Zhang and Rubin (2003, Journal of Educational and Behavioral Statistics) for another example of this type of model.