Validation of posterior probabilities

John Monahan and Dennis Boos pointed out that one of the key ideas in this paper by Sam Cook, Don Rubin, and myself also arose in two papers of theirs. They write,

In reading your recent paper ‘Validation of Software for Bayesian Models Using Posterior Quantiles,’ we noted your use of the uniform distribution of the posterior distribution function evaluated at the true parameter point. We would like to point out that we used that same tool in a similar fashion in our 1986 paper (pp. 82-83) to evaluate the propriety of posteriors constructed from a bootstrapped likelihood. In our situation, we were computing posterior probabilities numerically, rather than using Monte Carlo methods. In looking at what functions could serve as a likelihood function for a valid posterior, we employed that same tool in our 1992 paper to show that certain approximate methods were invalid, and that any posterior constructed using the density of a statistic, not necessarily sufficient, would be valid; that is, the posterior probability statements would be correct. This relates to your work in that your method would also be unable to detect a problem arising from any valid posterior, which may not have been the one intended. We have included the references to these papers below.

John Monahan
Dennis Boos

Dennis D. Boos and John F. Monahan (1986) ‘Bootstrap methods using prior information,’ Biometrika 73:77-83.

John F. Monahan and Dennis D. Boos (1992) ‘Proper likelihoods for Bayesian analysis,’ Biometrika 79:271-278.

There’s a lot of interesting stuff to chew on here, including the idea of using bootstrap ideas to get a robust likelihood, and issues of implementation of our debugging method.
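
To make the debugging method concrete, here is a minimal sketch of the posterior-quantile check in Python. It uses a conjugate normal model that I chose purely for illustration, so the posterior quantile is available in closed form; in an actual software check the quantile would instead be computed from the posterior draws produced by the program under test. The model, sample sizes, and variable names below are my own assumptions, not anything from the paper.

```python
import numpy as np
from scipy import stats

# Toy model: theta ~ N(0, 1) prior, y | theta ~ N(theta, 1), n = 5 observations.
# With this conjugate setup the posterior is N(n*ybar/(n+1), 1/(n+1)), so the
# posterior quantile of the true theta can be computed exactly.
rng = np.random.default_rng(1)
n_reps, n_obs = 1000, 5
quantiles = np.empty(n_reps)

for i in range(n_reps):
    theta_true = rng.normal(0.0, 1.0)            # draw theta from the prior
    y = rng.normal(theta_true, 1.0, size=n_obs)  # draw data given theta
    post_mean = n_obs * y.mean() / (n_obs + 1)   # conjugate posterior mean
    post_sd = np.sqrt(1.0 / (n_obs + 1))         # conjugate posterior sd
    # Posterior c.d.f. evaluated at the true parameter point.
    quantiles[i] = stats.norm.cdf(theta_true, loc=post_mean, scale=post_sd)

# If the posterior is computed correctly, these quantiles should look Uniform(0,1).
print(stats.kstest(quantiles, "uniform"))
```

If the software is correct, the 1,000 quantiles should be indistinguishable from Uniform(0,1) draws; a systematic departure (for example, too many or too few values near 0 and 1) signals a bug or a mis-specified posterior.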

P.S. I can’t resist plugging our own paper again by pointing out its pretty graphics, which I think are not gratuitous but are actually essential to the story (as they say in the movies).

2 thoughts on “Validation of posterior probabilities”

  1. I'm going to present a poster at a conference next week in which one of the ways I evaluate my model is by withholding 10% of the data at random for cross-validation. I look at the amount of predictive probability mass below the true values of the data to check whether the predictive distributions are properly calibrated in a frequency sense. They turn out to be conservative in the sense that they slightly overestimate the frequency of occurrence of extreme values, leading to a deficiency in p-values close to 0 and 1 relative to the uniform distribution. [A sketch of this kind of check appears after the comments.]

  2. In an attempt to explain Bayesian statistics to epidemiology students via a demonstration that involved as little mathematics as possible, I repeatedly generated draws from p(y, theta), first drawing from p(theta) and then from p(y|theta), and then kept just those (theta, y) pairs that had the given y if discrete (or something very close to y if continuous). Now one can only get approximate posteriors for “toy” examples this way, but it is “direct”. [A sketch of this procedure also appears after the comments.]

    Of course the students did a web search and came back with the claim – “no one else explains Bayesian statistics that way” – to which I had no reply :-(

    Now I can reference Cook et al. (and one other paper that did a direct simulation of p(y, theta) to resolve in everybody’s eyes what the posterior actually was, in a special case where this was feasible), but I am surprised it was not done [more often] in introductory or expository papers on Bayesian statistics.

    I think when direct calculations are not feasible for real problems, they get “forgotten” and perhaps overlooked when they might be very useful.
    (Sorry if I am incorrectly assuming this was as big a step as Theorem 1 in Cook et al.)
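
As a postscript to the first comment above, here is a rough sketch of that kind of holdout calibration check. The normal data and the plug-in normal predictive are stand-ins of my own choosing, since the commenter's model is not given; the point is only the mechanics of withholding 10% of the data and examining the predictive probability mass below each held-out value.

```python
import numpy as np
from scipy import stats

# Illustrative data and model (assumed, not the commenter's actual setup):
# i.i.d. normal observations and a plug-in normal predictive fit to the
# training portion.
rng = np.random.default_rng(7)
y = rng.normal(3.0, 2.0, size=500)

# Withhold 10% of the data at random for cross-validation.
holdout = rng.random(y.size) < 0.10
y_train, y_test = y[~holdout], y[holdout]

# Predictive distribution fit to the training data.
mu_hat, sigma_hat = y_train.mean(), y_train.std(ddof=1)

# Predictive probability mass below each true held-out value (PIT values).
pit = stats.norm.cdf(y_test, loc=mu_hat, scale=sigma_hat)

# Calibrated predictions give roughly Uniform(0,1) PIT values; too few values
# near 0 and 1 indicates conservative (over-wide) predictive distributions.
print(stats.kstest(pit, "uniform"))
print("share of PIT values in the outer 10%:", np.mean((pit < 0.05) | (pit > 0.95)))
```

And here is a sketch of the direct-simulation demonstration described in the second comment, using a toy beta-binomial example of my own choosing: draw theta from the prior, draw y given theta, and keep only the pairs whose simulated y equals the observed y. The retained thetas approximate the posterior, which can be checked here against the exact conjugate answer.

```python
import numpy as np
from scipy import stats

# Toy discrete example (my own choice): theta ~ Beta(2, 2),
# y | theta ~ Binomial(10, theta), observed y = 7.
# Repeatedly draw (theta, y) from the joint p(y, theta) and keep the draws
# whose simulated y matches the observed y; those thetas are (approximate)
# draws from the posterior p(theta | y).
rng = np.random.default_rng(42)
n_sims, n_trials, y_obs = 200_000, 10, 7

theta = rng.beta(2, 2, size=n_sims)    # draw theta from the prior
y_sim = rng.binomial(n_trials, theta)  # draw y given theta
theta_kept = theta[y_sim == y_obs]     # keep draws that match the observed data

# Compare with the exact conjugate posterior, Beta(2 + y_obs, 2 + n_trials - y_obs).
print("accepted draws:", theta_kept.size)
print("direct-simulation mean:", theta_kept.mean())
print("exact posterior mean:  ", stats.beta(2 + y_obs, 2 + n_trials - y_obs).mean())
```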
