Why I think a Bayesian formulation of exploratory data analysis and goodness-of-fit testing is so important

In a comment to this entry on Bayesian exploratory data analysis, Carolyn asks why I love the idea so much.

My response: People will often not use a statistical method until it has a theoretical justification. In particular, many Bayesians don’t seem to want to check their models. This is something that I noticed, and that disturbed me, at the 1991 Valencia meeting. There were graphs of raw data, graphs of posterior inferences, graphs of convergence of Gibbs samplers, but no plots comparing data to simulated replicated data. This seemed (to me) like such a natural idea, and it had been done (in a non-Bayesian context) by distinguished statisticians including Mosteller in 1954 and Ripley in 1987.

How could it be that Bayesians–the most model-bound statisticians of all–were doing less model checking than classical statisticians? Worse, I found many Bayesians to be actively hostile to model checking. I had an example with 20,000 parameters where the chi^2 statistic was 30,000, and Bayesians were telling me I wasn’t allowed to say this was a problem with the model.

My take on this was that these Bayesians needed a theoretical justification for model checking, a theoretical framework under which model checking is a natural part. We did a bit of this in our earlier paper (Gelman, Meng, and Stern, 1996). My 2003 paper (linked to in the above blog entry) put this in the context of other generalizations of Bayesian theory such as hierarchical modeling and missing-data modeling (see Section 2.1 of that paper) and put exploratory data analysis in the same theoretical framework. EDA and modeling are usually considered opposites, so I really liked the idea of putting them together.

I think I’ve succeeded quite a bit (although not completely) in the mathematics, but I still have a long way to go in the sociology, in that most Bayesians still don’t use the y.rep notation that I think is key to understanding the application and checking of models. I’m hoping that the next paper, based on ideas in Jouni’s thesis, will push things along.
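The replication idea behind the y.rep notation can be sketched with a minimal posterior predictive check. This is a hypothetical illustration, not an example from any of the papers mentioned: it assumes a normal model with known variance and a flat prior on the mean, so the posterior is available in closed form. For each posterior draw we simulate a replicated dataset y_rep and compare a test statistic of the replications to the same statistic of the observed data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed data: y ~ Normal(mu, 1) with unknown mu.
y = rng.normal(loc=5.0, scale=1.0, size=50)
n = len(y)

# Under a flat prior on mu (known sd = 1), the posterior for mu
# is Normal(ybar, 1/n); draw posterior simulations of mu.
ybar = y.mean()
n_sims = 1000
mu_post = rng.normal(loc=ybar, scale=1.0 / np.sqrt(n), size=n_sims)

# For each posterior draw, simulate a replicated dataset y_rep
# and compute a test statistic (here, the sample maximum).
t_obs = y.max()
t_rep = np.array(
    [rng.normal(loc=mu, scale=1.0, size=n).max() for mu in mu_post]
)

# Posterior predictive p-value: the proportion of replications whose
# statistic is at least as large as the observed one. Values near 0
# or 1 would flag a misfit in this aspect of the model.
p_value = (t_rep >= t_obs).mean()
print(p_value)
```

In practice one would usually plot the observed data against a grid of replicated datasets rather than reduce everything to a single p-value, but the simulation structure is the same.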

2 thoughts on “Why I think a Bayesian formulation of exploratory data analysis and goodness-of-fit testing is so important”

  1. Hi,
    Could you elaborate a bit about this "y.rep" notation? Is it the R way to say: "sample from your fitted model and look if this synthetic data looks like the true data"? Thanks.
    Pierre

  2. A quick comment drawing on C. S. Peirce's theory of there being three aspects of thinking about anything.

    Briefly, you MIGHT entertain a particular joint probability model for parameters and outcomes (or just outcomes in a frequency approach).

    Under that particular joint probability model, you MUST accept various implications, such as the independence or not of various parameters and, when given observations, the exact posterior probabilities (implied by conditioning).

    Given the above two, SHOULD you accept the exercise or take it seriously, and should others?
    (Amongst other things, the comfort with the joint probability model will matter.)

    Now, mathematics is mostly about MUSTs, and it may be too much to expect anything close to "rules" for the first (MIGHTs) and the third (SHOULDs).

    Mallows had a nice paper on the statistics profession neglecting the first, which he called the zeroth problem, in his Fisher Memorial Lecture, "The Zeroth Problem."

    Keith

Comments are closed.