Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box

Our article (by Yu-Sung, Jennifer, Masanao, and myself, and based also on work with Kobi, Grazia, and Peter Messeri) will be appearing in the Journal of Statistical Software, in a special issue on missing-data imputation. Here’s the abstract:

Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: flexible choice of predictors, models, and transformations for chained imputation models; binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data in one and two dimensions. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.

We’ve made lots of improvements since listing the package last year (here). There’s still a lot more work to do, in many different directions (including multilevel models, nonignorable models, the self-cleaning oven, and making the program run faster in all sorts of ways), and we keep improving it. But it’s good to have something out there.

To actually get the R package, just open your R window, click on Packages, Install packages, and grab mi.
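
If you'd rather work from the console than the menus, something like the following should do the same thing. This is only a minimal sketch: it assumes your machine can reach a CRAN mirror and that the package is still listed there under the name mi.

```r
# Install mi from CRAN only if it is not already installed
# (assumes a CRAN mirror is reachable from this machine).
if (!requireNamespace("mi", quietly = TRUE)) {
  install.packages("mi")
}

# Attach the package for the current session.
library(mi)
```

After that, help(package = "mi") lists the documentation that ships with whichever version you installed.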

4 thoughts on “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box”

  1. I am trying to use mi with some Likert data (41 variables, n=167), and I'm finding myself stumped. When I try to apply the principles of the examples in your online paper to my own dataset, I get errors (see below). Are there help files available, or a wiki or somesuch?

    Thanks.

    My Error, FWIW:
    Beginning Multiple Imputation ( Sat Sep 26 09:17:18 2009 ):
    Iteration 1
    Imputation 1 : SI1*
    Error while imputing variable: SI1 , model: mi.polr
    Error in parse(text = x) :
    unexpected numeric constant in "ordered(SI1) ~ SI2 + SI3 + SI4 + SI5 + SI6 + SI7 + SI8 + SI9 + ordered(SI1)0"

    The command that generated it:
    imp

  2. I am having the same problem. I really like the concept of the mi package, but I have been having quite a difficult time getting it to run without errors. Given the size of my data set (233 variables, 1726 observations), this has made for a very long and, so far, fruitless process (as you can see from the R output below). If we can get some people working together on ironing out the problems in the package, I think it would be tremendously helpful. My sense is that most people right now have never used it (a conclusion based on the dearth of responses I get when posting questions about mi on the R-help list).

    imp = mi(imp.data, info=info2, n.iter=6)
    Beginning Multiple Imputation ( Wed Jul 14 11:52:50 2010 ):
    Iteration 1
    Imputation 1 : min.func* Tenure* Salary* Housing* ord_stat* rural_now* a3* a7* a8* a9* a10* a12* a13* a14* a15* a16* a17a* a17b* a17c* a17d* a17e* a17f* a18* a21* b1a* b1b* b1c* b1d* b1e* b1f* b1g* b1h* b1i* b3a* b3b* b3c* b3d* b3e* b3f* b3g* b4a* b4b* b4c* b4d* b4e* b4f* b4g* b4h* b4i* b4j* b4k* b4l* b4m* b4n* b4o* b4p* b5a* b5b* b5c* b5d* b5e* b5f* b5g* b5h* b5i* b5j* b5k* b5l* b5m* b5n* b5o* b6a* b6b* b6c* b6d* b6e* b6f* b6g* b6h* b6i* b6j* b6k* b6l* b7a* b7b* b7c* b7d* b7e* b7f* b7g* b7h* b7i* b7j* b7k* b9b* b10a* b10b* b13a* b13b* b13c* b14* c3* c4* c5* c12a* c12b* c12c* c12d* c12e* c12f* c14a4* c19* d1*
    Error while imputing variable: d1 , model: mi.polr
    Error in parse(text = x) :
    unexpected numeric constant in "c5 + c12a + c12b + c12c + c12d + c12e + c12f + c13count + c14a4 + c19 + c20 + d2 + d3 + d6 + d7a + d7b + d7c + d7d + d7e + d7f + d7g + d7h + d7i + d7j + d7k + d8a + d8b + d8c + d8d + d8e + d8f"
