Some thoughts on the sociology of statistics

One thing that bugs me is that there seems to be so little model checking done in statistics. As I wrote in this referee report,

I’d like to see some graphs of the raw data, along with replicated datasets from the model. The paper admirably connects the underlying problem to the statistical model; however, the Bayesian approach requires a lot of modeling assumptions, and I’d be a lot more convinced if I could (a) see some of the data and (b) see that the fitted model would produce simulations that look somewhat like the actual data. Otherwise we’re taking it all on faith.

But why, if this is such a good idea, do people not do it? I don’t buy the cynical answer that people don’t want to falsify their own models. My preferred explanation might be called sociological, and it goes as follows: We’re often told to check model fit. But suppose we fit a model, write a paper, and check the model fit with a graph. If the fit is OK, then why bother with the graph: the model is OK, right? If the fit shows problems (which, realistically, it should, if you think hard enough about how to make your model-checking graph), then you’d better not include the graph in the paper, or the reviewers will reject it, saying that you should fix your model. And once you’ve fit the better model, there’s no need for the graph.

The result is: (a) a bloodless view of statistics in which only the good models appear, leaving readers in the dark about all the steps needed to get there; or, worse, (b) statisticians (and, in general, researchers) not checking the fit of their model in the first place, so that neither the original researchers nor the readers of the journal learn about the problems with the model.

One more thing . . .

You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model. (See chapter 6 of Bayesian Data Analysis for some examples.)
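Here is a minimal sketch of the kind of check I have in mind, using fake data, a deliberately too-simple normal model, and the sample maximum as the test quantity (the data, model, and test quantity are all stand-ins; the point is the comparison of observed and replicated data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data, heavier-tailed than the model below will assume.
y = 5 + 2 * rng.standard_t(df=3, size=100)

# A deliberately too-simple model: y_i ~ Normal(mu, sigma), flat prior on mu,
# sigma fixed at the sample value (a rough approximation, just for the sketch).
n = len(y)
sigma = y.std(ddof=1)
mu_post_mean = y.mean()
mu_post_sd = sigma / np.sqrt(n)

# Draw replicated datasets from the fitted model and compare a test quantity
# (here, the sample maximum) with the same quantity in the observed data.
T_obs = y.max()
T_rep = np.empty(1000)
for i in range(1000):
    mu_draw = rng.normal(mu_post_mean, mu_post_sd)
    y_rep = rng.normal(mu_draw, sigma, size=n)
    T_rep[i] = y_rep.max()

# A posterior predictive p-value near 0 or 1 flags an aspect of the data
# (here, the heavy tails) that the fitted model does not capture.
p_value = (T_rep >= T_obs).mean()
print(f"observed max = {T_obs:.2f}, posterior predictive p = {p_value:.3f}")
```

With heavy-tailed data like these, the observed maximum sits far out in the tail of the replicated maxima, which is exactly the kind of misfit a graph of the replicated test quantities would show at a glance.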

11 thoughts on “Some thoughts on the sociology of statistics”

  1. This is great advice in principle.

    But reviewers in general punish honesty and reward rhetoric. In my own area (psycholinguistics), many reviewers (many of them senior researchers who've been around for decades) do not even understand basic things in hypothesis testing. For example, an editor-in-chief of a major journal recently rejected a paper of mine on the grounds that the p values, though below 0.05, were not small enough (in his words, "the effects were not strong enough"). If you ask a random researcher what a confidence interval is, 95% will give you the wrong answer; many don't even know what a p value tells you (many think it's the probability of the null being false). I could go on and on.

    In the face of all this, it's pretty pointless to talk about the very relevant issues you raise. Poor fits can be very informative in a sense, but reviewers will see "poor fit" and say: reject.

    A further issue is that reviewers often say things like this when confronted with a statistical procedure they were not taught in graduate school: "The statistical analyses presented are non-standard. I would suggest that the authors use the well-understood methods already in use." In other words, "new" approaches are discouraged because the reviewers (a) didn't learn them in grad school, (b) don't want to learn about them for lack of time, and (c) on principle won't accept anything that does not tread the beaten path.

    This sort of inertia usually changes when a senior researcher (and it has to be a senior researcher) writes an article advocating change, and then the new approach becomes the norm, repeating the cycle described above.

    I suspect my area is not an isolated case of this holding pattern, because I keep seeing articles in Ecology and Economics that bring up similar suggestions for change.

  2. Correction to my above comment: I meant of course that many people seem to think that the p value tells you the probability of the null hypothesis being true.

  3. Shravan,

    In political science, I've found almost no resistance to new statistical ideas. Perhaps this can be explained by there being no established group of "polimetricians" (or whatever the equivalent of psychometricians would be) who would view themselves as the enforcers of statistical truth. I imagine things are getting better in psychology, since this is one of the earliest fields to use multilevel modeling.

    Regarding the question of what to do when writing papers in psychology: I'd suggest including all the classical tests (even that multiple comparisons stuff that I hate) but also including some displays of observed and replicated data, along the lines of the sketch just below. Try framing this not as "testing" but rather as "model-assisted data exploration." (See my 2004 paper for more of the theory on this.)
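    Here is roughly the kind of display I mean, with made-up skewed data and a deliberately misspecified normal model (everything here is a stand-in for the real analysis):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Made-up, right-skewed outcome data (a stand-in for a real experiment).
y = rng.lognormal(mean=6.0, sigma=0.5, size=200)

# Fit a deliberately misspecified normal model and simulate replicated
# datasets from it.
mu_hat, sigma_hat = y.mean(), y.std(ddof=1)
reps = [rng.normal(mu_hat, sigma_hat, size=len(y)) for _ in range(8)]

# Show the observed data next to the replications: the skew in the real
# data has no counterpart in the symmetric simulations.
fig, axes = plt.subplots(3, 3, figsize=(9, 7), sharex=True)
axes[0, 0].hist(y, bins=30, color="black")
axes[0, 0].set_title("observed")
for ax, y_rep in zip(axes.ravel()[1:], reps):
    ax.hist(y_rep, bins=30, color="gray")
    ax.set_title("replicated")
plt.tight_layout()
plt.show()
```

    Laid out this way it reads as exploration rather than a formal test: the skew in the observed panel simply does not show up in any of the replications.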

  4. I think one thing that doesn't help us as Bayesians is that the standard model-checking toolbox isn't developed far enough. There's a lot to be said for things like normal probability plots and simple residual plots, but we have to do some work to get them out of the analysis; a rough sketch of what I mean follows below.

    Or write one line of BUGS code. :-)

    Bob
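    Here is that sketch, with made-up regression data (in practice the residuals would come out of the actual fit, whether from lmer or from BUGS output):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)

# Made-up regression data (a stand-in for whatever model is being checked).
x = rng.uniform(0, 10, size=120)
y = 1.5 + 0.8 * x + rng.standard_t(df=4, size=120)  # slightly heavy-tailed errors

# Ordinary least-squares fit and its residuals.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
resid = y - fitted

# The two classic displays: a normal probability (Q-Q) plot of the residuals
# and residuals plotted against fitted values.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(resid, dist="norm", plot=ax1)
ax1.set_title("normal probability plot of residuals")
ax2.scatter(fitted, resid, s=10)
ax2.axhline(0, color="gray")
ax2.set_xlabel("fitted value")
ax2.set_ylabel("residual")
ax2.set_title("residuals vs. fitted")
plt.tight_layout()
plt.show()
```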

  5. Regarding resistance to new ideas, I think there is certainly quite a bit of this, but I think it also interacts with three other aspects of journal writing and publishing.

    1. Length. Journals (many? most? nearly all?) have word limits. Authors needing to cut something will often be more willing to cut explanation of new statistical ideas than the substantive parts of the article.

    2. Clarity. Explaining new methods is HARD! Some people do it very well. Andrew is one example of such a person. If we can somehow explain our techniques better, perhaps (PERHAPS) they will be more accepted.

    3. Review. I do statistical reviews for PLoS medicine. How many journals have statistical reviewers for substantive articles? I honestly don't know. In the articles I've submitted I've never run into it, but that's a small sample. Is there data on this?

  6. Andrew,

    Suppose you create replicated data sets and see a discrepancy; then you go back to your model and make it more complex to accommodate your data set. When you do this, the apparent fit of your model will always improve, just as it does when you add more parameters and interactions to a regression. But what penalty, if any, is there for a Bayesian who makes the model more and more complex?

  7. Current grad student wrote:

    Suppose you create replicated data sets and see a discrepancy; then you go back to your model and make it more complex to accommodate your data set. When you do this, the apparent fit of your model will always improve, just as it does when you add more parameters and interactions to a regression. But what penalty, if any, is there for a Bayesian who makes the model more and more complex?

    This is a great question (if I understand it correctly; well, it may be a great question even if I don't). I am reading a book, Scientific Reasoning: The Bayesian Approach (Howson and Urbach). As is my wont, I'm reading the last chapter first. There they argue against the AIC and BIC (which penalize more complex models in model comparison) and suggest that model plausibility is the main criterion of interest. This certainly squares with Gelman and Hill's admonition in their book not to drop predictors that are not significant if they "make sense" theoretically, but it still leaves us in la-la land as far as formal model comparison goes. AIC and friends provide a nice way to characterize this trade-off between complexity and fit. I still don't know what Bayesians offer in its place (other than plausibility, which is often in the eye of the beholder).

  8. Vasishth, AIC corresponds to a choice of an uninformative but complexity-averse prior and then doing MAP (I know the derivation is different, but it really comes down to this).

    As for "making sense", just assign a higher prior probability to those configurations that are more plausible. If AIC doesn't yield good results, get a more informed prior.
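    To spell out the correspondence I have in mind (a rough reading, not the standard derivation of AIC): with k free parameters and maximized likelihood,

```latex
\[
  \mathrm{AIC} = 2k - 2\log\hat{L},
\]
% so choosing the model with the smallest AIC is the same as maximizing
\[
  \log\hat{L} + \log p(M), \qquad p(M) \propto e^{-k},
\]
% i.e. picking the posterior mode over models under a prior that charges a
% factor of e^{-1} for each additional parameter: uninformative about the
% parameter values themselves, but averse to complexity.
```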

  9. Er, what is MAP? I guess I don't really understand the AIC and its relatives, except superficially as used with lmer for model comparison.

    Nothing works like a simple illustrative example. I'd like to see a concrete example of how one gets "a more informed prior".

  10. Aleks,
    Are you suggesting that if your model complexity prior still does not result in adequate posterior predictive datasets, you should go back and adjust your prior to favor either more or less complexity?

  11. Going back and adjusting priors wouldn't really be the proper approach.

    MAP stands for "maximum a posteriori" – finding the configuration of parameters that has the highest posterior probability.

    As an example of a "more informed prior", assume a problem where you're predicting survival based on hair color and quantity of antibiotic used. Your Bayesian prior could reflect your belief that the importance of hair color is going to be lower than the importance of the antibiotic. However, plain AIC doesn't do that.
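    In code, such a prior might look something like this; the data, coefficient scales, and model here are all made up for illustration, and the MAP estimate mentioned above is just the mode of the resulting penalized log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(11)

# Made-up data for the example above: survival (0/1) predicted by hair color
# (dark = 1) and a standardized antibiotic dose.
n = 200
hair = rng.integers(0, 2, size=n).astype(float)
dose = rng.normal(size=n)
p_true = expit(-0.5 + 0.0 * hair + 1.2 * dose)  # hair color truly irrelevant
survived = rng.binomial(1, p_true)
X = np.column_stack([np.ones(n), hair, dose])

# The "more informed prior": independent normal priors on the coefficients,
# with a tight scale on hair color (we doubt it matters much) and a loose
# scale on the antibiotic dose.
prior_sd = np.array([10.0, 0.2, 2.0])  # intercept, hair color, dose

def neg_log_posterior(beta):
    eta = X @ beta
    log_lik = np.sum(survived * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return -(log_lik + log_prior)

# The MAP estimate is the mode of this penalized log-likelihood.
beta_map = minimize(neg_log_posterior, x0=np.zeros(3)).x
print("MAP coefficients (intercept, hair color, dose):", np.round(beta_map, 2))
```

    The only "informed" part is the prior_sd vector: a tight scale on the hair-color coefficient encodes the belief that it matters less than the antibiotic dose, without ruling it out entirely.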
