Number of parameters in a model and DIC

Posted on July 6, 2006 12:12 AM by Andrew

OK, this one is for hard-core Bayesians only . . . it’s some info from Brad Carlin, Nicky Best, and Angelika van der Linde on the deviance information criterion (DIC):

Brad Carlin wrote to me

. . . to clarify why we [David Spiegelhalter and Brad Carlin] don’t like the measure of effective sample size . . . half the sample variance of the deviance scores. This same formula appears on the top of p.182 of Gelman et al. (2004), so we wanted you to have a chance to look at this too.

Basically we show that this formula does not deliver the trace of the hat matrix even in a simple Gaussian model, and in fact will tend to be much higher than this. Thus any DIC-like statistic that uses this complexity penalty will be overpenalizing the more complicated models in many hierarchical settings.

I replied,

We give both formulas in our book, and if you notice carefully, we give preference to the formula from your paper. (We mention it first, and we give it an equation number.) I’ve been playing around with the 1/2 var formula for awhile (it came up in my thesis research in the late 80s) but i agree that it has problems, and I prefer the other one. The only reason I used 1/2 var in r2winbugs is that there’s no easy way to calculate your formula using the Bugs output. (The difficulty is that your formula requires evaluating D at theta-hat, and there’s no way to evaluate D as a function, if I’m calling Bugs as a “black box.”)

So, in your paper, no need to say that we “recommend” the use of half the variance. Better to say that we “offer it as an alternative” which can be easier to compute from black-box simulation output from Bugs.

But, given David and Brad’s recent research, maybe we should just completly remove the 1/2 var formula from our book. Especially since Nicky Best let us know that we actually should be able to compute the standard DIC formula from Bugs.

Nicky writes:

After consulting with a couple of my post-docs who are regular R2WinBUGS users, I [Nicky] have discovered that it *IS* possible to easily obtain the value of pD that we propose in the DIC paper using R2WinBUGS. If you set the following argument in the ‘bugs()’ function:

clearWD = FALSE

this keeps a copy of the winbugs log file (saved as “log.txt” in the working directory).

The R2WinBUGS function ‘bugs.log’ can then be used to extract the DIC table (including pD) reported by WinBUGS:

bugs.log(“log.txt”)$DIC

which gives the following output for the schools example in the R2WinBUGS help on the bugs function (note – you have to request more than 1000 iterations for this example – requesting 1000 with R2WinBUGS defaults means that the resulting bugs script does 500 burn-in, then tries to set DIC. But, because slice sampler is being used to update sigma.theta, this has a 500 iteration adaptive phase when you can’t set DIC. Due to a small bug, you actually have to do *501* iterations before winbugs will allow you to set the dic monitor (Dave L – we should probably add this bug to our wish list of things to fix in a patch to winbugs 1.4 at some point). Anyway, if you request more than 1000 iterations in the bugs() function, then winbugs will monitor and report DIC and pD, and this can then be extracted from the log file in R using bugs.log():

Dbar Dhat pD DIC
y 60.444 57.665 2.779 63.222
total 60.444 57.665 2.779 63.222

Nicky continues:

Andrew, Sibylle – my experience of talking to various people about DIC, and of teaching bugs courses etc., is that the different versions of pD reported by WinBUGS and R2WinBUGS are a source of confusion to a lot of users. I fully appreciate that you are not “recommending” the use of half the variance instead of our pD, but the very fact that this is what R2WinBUGS gives by default means that this is what a lot of people end up using (in much the same way that many people have ended up using Gamma(0.001, 0.001) priors for precisions just because that is what was used in the winbugs examples). As Brad and David point out in their brief note in response to Raftery, half the variance can be a very poor estimate in many situations. My own (quite limited) experience of using half the variance also supports this view, particularly in hierarchical models. At a recent conference in Spain, I saw a poster where someone had carried out a simulation study on spatial random effects models, comparing DIC calculated using Dbar-Dhat, with DIC calculated using 1/2 variance, and found that Dbar-Dhat did much better than 1/2 variance. I accept that there are also problems with using Dbar – Dhat to estimate the effective number of parameters (particularly with the way WinBUGS 1.4 currently chooses the plug-in estimates to use for Dhat) but this still seems to be more reliable that 1/2 variance.

So, my request is whether you would consider changing the next version of R2WinBUGS to report the winbugs DIC and pD as the default, rather than using the 1/2 variance version?

Sibylle has altered R2WinBUGS to use the Dbar-Dhat estimate (extracting it from the Bugs output). We are working out some details and it should be in the next release of R2WinBUGs.

Finally, Angelika van der Linde pointed us to this paper on DIC, that appeared in Statistica Neerlandica. She writes:

I [Angelika van der Linde] am astonished to learn that even given Brad’s and David’s discussion of Raftery’s paper a “complexity term” called 1/2 variance survives in Bayesian ? statistical software. I haven’t seen any justification of this term yet, and I think the use of unjustified procedures should not be encouraged “technically”.

Already that was a point in the discussion of the DIC paper in the RSS and it turned out to be correct in as much as pD itself is a valid estimate of complexity only if the sampling distribution belongs to an exponential family.
Anyone can build a criterion for model choice combining some term of “fit” and a “felt term of model complexity”. But that is naive, not science and we (our community) should not spend time on simulation studies of such criteria. We should spend our time on finding good approximations for predictive criteria (for example for mixtures, still unsolved). The choice of the predictive criterion remains somewhat subjective like the choice of a loss function, but its approximation / estimation requires sound mathematics. I keep trying to tell people that the mathematical core of the decomposition of a predictive criterion into a term of fit and a term of complexity is the decomposition of marginal entropy into condional entropy (model adequacy, not fit in terms of the DIC paper) and mutual info (representing model complexity). Thus any suggested term of model complexity is to be justified as an estimate of (the version of) mutual info occurring in a particular predictive criterion to be made explicit. Did anyone see such a derivation of 1/2 variance ??

Beyond, other than predictive criteria for model assessment may of course be used. Tom Leonard and John Hsu systematically explored the use of the posterior distribution of the log likelihood to this aim in a recent paper (submitted). But that’s another story.

2 thoughts on “Number of parameters in a model and DIC”

Gregor on July 6, 2006 1:07 AM at 1:07 am said:

The reason for bugs.log() function was exactly this issue i.e. to get pD from BUGS, but I did not had any time to look into fusing it with output by bugs().
Bob O'H on July 7, 2006 8:29 AM at 8:29 am said:

Martyn Plummer presented an interesting idea for estimating p_D at IceBUGS over the winter, but unfortunately I didn't get the slides, so I can't remember the details! As I recall, it was fairly easy to calculate from two chains.

Bob

Comments are closed.