Hypothesis testing with multiple imputations

Vincent Yip writes:

I have read your paper [with Kobi Abayomi and Marc Levy] regarding multiple imputation application.

In order to diagnose my imputed data, I used Kolmogorov-Smirnov (K-S) tests to compare the distributions of the imputed and observed values of a single attribute, as mentioned in your paper. My questions are:

For example I have this attribute X with the following data: (NA = missing)

Original dataset: 1, NA, 3, 4, 1, 5, NA

Imputed dataset: 1, 2 , 3, 4, 1, 5, 6

a) In order to run the K-S test, should I treat the observed data as 1, 3, 4, 1, 5?

b) And for the imputed side, should I treat 1, 2, 3, 4, 1, 5, 6 as the imputed dataset for the K-S test, or just 2 and 6?

c) If I use m = 5, I will have 5 imputed datasets. How would I apply the K-S test to the 5 of them and compare them to the single observed distribution? Do I combine the 5 imputed datasets into one by averaging each imputed value, so that I have a single imputed dataset to compare with the observed data? Or do I run the K-S test on all 5 and average the test results (i.e., average the p-values)?

My reply:

I have to admit I have not thought about this in detail. I suppose it would make sense to compare the observed data (1, 3, 4, 1, 5) to the imputed values (2, 6). I would do the test separately for each imputation. I also haven't thought about what to do with the p-values. My intuition would be to average them, but again this is not something I've thought much about. Also, if the test does reject, this implies a difference between observed and imputed values. It does not show that the imputations are wrong, merely that under the model the data are not missing completely at random.

I'm sure there's a literature on combining hypothesis tests with multiple imputation. Usually I'm not particularly interested in testing; we just threw that Kolmogorov-Smirnov idea into our paper without thinking too hard about what we would do with it.
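To make the above concrete, here is a minimal Python sketch of that procedure: run the two-sample K-S test separately for each imputation, comparing the observed values of the variable with its imputed values only, and then average the p-values. The imputed draws beyond the first set are made up for illustration, and the averaging is just the ad hoc summary discussed above, not an established combining rule.

```python
# Minimal sketch, assuming m = 5 completed datasets produced elsewhere.
# Only the first set of imputed values comes from Vincent's example; the
# rest are made-up draws for illustration.
import numpy as np
from scipy.stats import ks_2samp

observed = np.array([1, 3, 4, 1, 5])      # non-missing values of X
imputed_draws = [                         # imputed values for the 2 missing cells
    np.array([2, 6]),
    np.array([1, 4]),
    np.array([3, 5]),
    np.array([2, 4]),
    np.array([1, 6]),
]

# Two-sample K-S test for each imputation: observed values vs. imputed values
# (not the completed dataset).
p_values = [ks_2samp(observed, imp).pvalue for imp in imputed_draws]

print("per-imputation p-values:", np.round(p_values, 3))
print("average p-value:", float(np.mean(p_values)))
```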

6 thoughts on "Hypothesis testing with multiple imputations"

  1. In the related problem of convergence diagnostics for several MCMC chains, Robert & Casella (2004) propose comparing the maximum of all the K-S statistic values to the simulated distribution of the maximum of K-S statistic values obtained from independent random numbers (e.g., from a Gaussian). (A sketch of this idea, adapted to the imputation setting, appears after the comments.)

    Robert, C. P., and Casella, G. (2004). Monte Carlo Statistical Methods. Springer, pp. 468-470.

  2. The null hypothesis is not true when the data are MAR but not MCAR. However, multiple imputation is most useful in exactly that scenario, so I don't think this test tests a meaningful hypothesis.

  3. Maarten:

    Indeed, as we say in our paper (and as I note in the blog post above), rejection of the test does not show that the imputations are wrong, merely that under the model the data are not missing completely at random. Rejection implies a difference between observed and imputed values, which can be used as a flag for looking more carefully at the variable in question. A difference between observed and imputed data can indeed be meaningful and important, even if it does not show that the imputation model is wrong.

  4. Just curious: what would be the next step if the K-S test's null hypothesis is rejected? We could say the data are not MCAR but MAR. Does this mean that we should accept the imputed result?

    Also, how accurate is the K-S test at evaluating the imputed values? Are there other evaluation methods that are more accurate?

  5. Jim:

    As Kobi and I write in our paper, we see the K-S test as a tool to use when there are large numbers of variables. Seeing a difference between the distributions of observed and imputed values, we learn that the imputations have some effect on this variable. That is not necessarily a sign that anything is wrong, but it is a sign to look carefully at the variable and make sure you understand what's going on.

    In general I'm not a big fan of hypothesis tests; the point here is not to test a hypothesis so much as to screen variables in a setting where the user might not be inclined to look at each variable in detail. (A sketch of this screening use appears below.)
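As a rough illustration of the screening use described in the last comment, here is a hypothetical Python helper that loops over the variables of one completed dataset and flags those whose imputed values look different in distribution from the observed ones. The data-frame and mask inputs, the function name, and the 0.05 cutoff are all assumptions made for this sketch, and a flag is only a prompt to inspect the variable, not evidence that the imputation model is wrong.

```python
# Hypothetical screening helper: flag variables whose imputed values differ
# noticeably in distribution from their observed values.
import pandas as pd
from scipy.stats import ks_2samp

def flag_variables(completed: pd.DataFrame,
                   missing_mask: pd.DataFrame,
                   alpha: float = 0.05) -> list:
    """completed: one completed (imputed) dataset;
    missing_mask: same shape, True where the value was originally missing."""
    flagged = []
    for col in completed.columns:
        obs = completed.loc[~missing_mask[col], col]
        imp = completed.loc[missing_mask[col], col]
        if len(obs) < 2 or len(imp) < 2:
            continue  # too few values for a two-sample comparison
        if ks_2samp(obs, imp).pvalue < alpha:
            flagged.append(col)  # worth a closer look, not necessarily wrong
    return flagged
```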
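Finally, here is a rough sketch of the maximum-statistic calibration mentioned in the first comment, transplanted from the MCMC-convergence setting to the imputation setting: take the largest of the m observed-vs-imputed K-S statistics and compare it to the simulated distribution of that maximum for independent Gaussian samples of the same sizes. The sample sizes, the number of simulations, and the Gaussian reference are assumptions made for illustration, not the procedure as given in Robert and Casella's book.

```python
# Sketch of a max-K-S calibration: compare the largest observed-vs-imputed
# K-S statistic to its simulated distribution under independent Gaussian draws.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

def max_ks(observed, imputed_list):
    """Largest two-sample K-S statistic over the m observed-vs-imputed comparisons."""
    return max(ks_2samp(observed, imp).statistic for imp in imputed_list)

def simulated_max_ks(n_obs, n_imp, m, n_sims=1000):
    """Simulate the distribution of the maximum K-S statistic when the
    'observed' and 'imputed' samples are independent standard Gaussians."""
    sims = np.empty(n_sims)
    for s in range(n_sims):
        obs = rng.standard_normal(n_obs)
        sims[s] = max(ks_2samp(obs, rng.standard_normal(n_imp)).statistic
                      for _ in range(m))
    return sims

# Made-up example: 5 observed values, 2 imputed values, m = 3 imputations.
observed = np.array([1, 3, 4, 1, 5])
imputed_list = [np.array([2, 6]), np.array([1, 4]), np.array([3, 5])]
t_max = max_ks(observed, imputed_list)
reference = simulated_max_ks(len(observed), 2, len(imputed_list))
print("max K-S statistic:", t_max)
print("approximate p-value:", float(np.mean(reference >= t_max)))
```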
