Bayesian inference from the outside, vs. Bayesian inference as it is practiced

SIAM Review asked me to review Jeff Gill’s new book (Bayesian Methods: A Social and Behavioral Sciences Approach, second edition) but they said they’d like a general review essay that would be of interest to their readers, not a mere Siskel-and-Ebert on the book itself. Below is my first draft. Any comments would be appreciated.

Bayesian inference from the outside, vs. Bayesian inference as it is practiced

Andrew Gelman

25 Aug 2008

Bayesian statistics means different things to different people. To non-statisticians, Bayes is about assigning probabilities to scientific hypotheses. For example, one summary of moderately informed opinion says:

“Bayesian inference uses aspects of the scientific method, which involves collecting evidence that is meant to be consistent or inconsistent with a given hypothesis. As evidence accumulates, the degree of belief in a hypothesis ought to change. . . . proponents of Bayesian inference say that it can be used to discriminate between conflicting hypotheses: hypotheses with very high support should be accepted as true and those with very low support should be rejected as false.” — Wikipedia article on “Bayesian inference”

It may come as a surprise to many readers that this is not how I view Bayesian inference at all!

In my work, I treat Bayesian methods as a souped-up least squares or maximum likelihood, a way to perform better inference within a model. For example, when modeling the rates at which New York City police stopped people of different ethnic groups (Gelman, Fagan, and Kiss, 2007), we did not attempt to compute the probability of the hypothesis that police stopped blacks, Hispanics, and whites at the same rate; rather, we estimated these different probabilities and assessed how they varied for different types of crimes and in different parts of the city. In estimating social networks using survey data (Zheng, Salganik, and Gelman, 2006), we did not estimate our degree of belief in the hypothesis that people formed social ties at random; instead, we fit a model in which people varied in their social networks, and compared our fitted model to predictions from the simpler model. And so on. I have worked on hundreds of applied research projects, but I don't know that I've ever accepted a hypothesis as "true" (as the Wikipedia quote above suggests is appropriate). Conversely, I would not reject a model just because it is "false"! False models help us learn about the world; that's what much of statistics is about (as in the famous quote of Box and Draper, 1987, that "all models are wrong, but some are useful"). Otherwise, I don't know what all that material in classical statistics about maximum likelihood from the Poisson distribution, and so forth, is for.
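To make the "souped-up least squares" idea concrete—estimating a collection of group-level rates rather than testing a single sharp hypothesis—here is a minimal empirical-Bayes partial-pooling sketch. The counts and exposures below are invented for illustration (they are not from the NYPD study or any other work cited in this review), and the method-of-moments estimate of the between-group variance is just one simple choice among many:

```python
import numpy as np

# Hypothetical data: event counts and exposures for 10 groups
# (invented numbers, purely for illustration).
counts = np.array([12, 45, 7, 30, 22, 5, 18, 40, 9, 25], dtype=float)
exposure = np.array([100, 250, 80, 200, 150, 60, 120, 220, 90, 160], dtype=float)

rates = counts / exposure               # raw per-group rate estimates
se2 = rates * (1 - rates) / exposure    # approximate sampling variance of each rate

# Crude method-of-moments estimate of the between-group variance tau^2.
grand = np.average(rates, weights=exposure)
tau2 = max(np.var(rates) - se2.mean(), 0.0)

# Partial pooling: each group's estimate is pulled toward the grand mean,
# with more shrinkage for groups that are measured less precisely.
shrink = tau2 / (tau2 + se2)
pooled = grand + shrink * (rates - grand)
```

The point of the sketch is not the particular formulas but the attitude: the output is a set of stabilized estimates (`pooled`), each a compromise between a group's own data and the overall pattern, not a verdict on whether the groups are "really" identical.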

I’m not trying to argue that the Wikipedians’ interpretation is wrong, just that their view focuses on what seems to me to be a small part of what Bayesian statistics is about. It also represents a view of the philosophy of science with which I disagree, but this review is not the place for such a discussion. What is relevant here—and, again, I suspect this will come as a surprise to many readers who are not practicing applied statisticians—is that what appears in Bayesian statistics textbooks is very different from what outsiders think is important about Bayesian inference, or Bayesian data analysis.

This brings me to the second edition of Jeff Gill’s book, which does an excellent job at presenting my view of Bayesian inference—as a method for fitting models and estimating parameters—in the language of scientific hypotheses.

The thing that makes this book excellent for the social sciences is not so much its examples (although these are real social-science examples, which can be hard to find in statistics textbooks) but the tone, a sort of theoretically-minded empiricism that is hard for me to characterize exactly but strikes me as a style of writing, and of thinking, that will resonate with the social science readership.

Compared to a more mainstream Bayesian data analysis book such as Carlin and Louis (2000) or our own, Gill has more on the history (addressing questions such as why Bayes has suddenly seemed to become more popular) and a lot on hypothesis testing, which is a big issue in social science, where a standard research paradigm is to set up falsifiable research hypotheses and then put them to the test.

One great feature of this book is its use of examples where real prior information is used: not just convenient noninformative priors, but real discussion of how prior information comes into the analysis. As a related point, summaries such as that on page 64 are particularly useful in comparing Bayesian and classical approaches to statistics. This kind of thing is great for a class: if students disagree on these points, it can spark useful discussion.

The presentation of results is largely done in a standard social science manner; for example the table on page 121 presents posterior intervals to three decimal places ([6.510:11.840], etc.), and the table on page 126 presents variable names in all-caps (EXTENT, DIVERSE, etc.). This isn’t how I would do it, but it does place things closer to what is usually done in social science, which can be a virtue here.

Gill’s book also has a fairly theoretical treatment of computational issues, actually more theoretical than our book, which might seem surprising (I’d think that, if anything, social scientists would be less likely to want to see heavy Markov chain theory), but it makes sense for a couple of reasons. First, Gill himself does research in statistical computation and can give readers the benefit of his insights. Second, social scientists, not being mathematicians themselves, do want to see the rigorous mathematical foundations of their methods. It’s fine for me to just describe methods and sketch proofs in my book, because much of my audience is statisticians who will know where to find the more detailed derivations if they need to, but Gill is connecting with social science students who might not see this material anywhere else—and the good ones will want some rigor.

Finally, given that Gill does talk about history, I would have liked to see a bit more discussion of the applied Bayesian work in the “dark ages” between Laplace and Gauss in the early 1800s and the use of the Gibbs sampler and related algorithms in the late 1980s. In particular: Henderson et al. used these methods in animal breeding (and, for that matter, Fisher himself thought Bayesian methods were fine when they were used in actual multilevel settings where the “prior distribution” corresponded to an actual, observable distribution of entities rather than a mere subjective statement of uncertainty); Lindley and Smith; Dempster, Rubin, and their collaborators (who did sophisticated pre-Gibbs-sampler work, published in JASA and elsewhere, applying Bayesian methods to educational data); and I’m sure others. Also, in parallel, there was the theoretical work by Box, Tiao, Stein, Efron, Morris, and others on shrinkage estimation and robustness. These statisticians and scientists worked their butts off getting applied Bayesian methods to work before the new computational methods were around and, in doing so, motivated the development of said methods and actually developed some of these methods themselves. Writing that these methods, “while superior in theoretical foundation, led to mathematical forms that were intractable,” is a bit unfair. Intractable is as intractable does, and the methods of Box, Rubin, Morris, and the rest worked. The Gibbs sampler and its relatives took the methods to the next level (more people could use the methods with less training, and the experts could fit more sophisticated models), but Bayesian statistics was more than a theoretical construct back in 1987 or whenever.

In conclusion, Gill has written a thoughtful and thought-provoking book, focusing more on priors, motivation, model evaluation, and computation, and less on the nuts-and-bolts of constructing and fitting models. As such, it fits in very well with existing books that focus more on the models.

References

Box, G. E. P., and Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. New York: Wiley.

Carlin, B. P., and Louis, T. A. (2000). Bayes and Empirical Bayes Methods for Data Analysis. London: Chapman and Hall.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis. London: Chapman and Hall.

Gelman, A., Fagan, J., and Kiss, A. (2007). An analysis of the NYPD’s stop-and-frisk policy in the context of claims of racial bias. Journal of the American Statistical Association.

Zheng, T., Salganik, M., and Gelman, A. (2006). “How many people do you know in prison?”: Using overdispersion in count data to estimate social structure in networks. Journal of the American Statistical Association 101, 409-423.

4 thoughts on “Bayesian inference from the outside, vs. Bayesian inference as it is practiced”

  1. It reads well to me (but I'm not an expert on these methods).

    This bit: "but Gill is connecting with the social science students who might not want see this anywhere else—and the good ones will want some rigor." makes no sense to me!

    I think I'd like more detail in a couple of places. For example:

    "One great feature of this book is its use of examples where real prior information is used: not just convenient noninformative priors, but real discussion of how prior information comes into the analysis."

    An example of use of real priors would be helpful to me.

  2. I think more rigor would just make it harder for a typical social science audience – and applied researchers in general – to truly understand and use these methods effectively. On the contrary, the applied types who need/want more rigor can certainly find it elsewhere. Yet I think the VERY BEST type of statistical book for an applied audience is a book like your own on multilevel modeling: this cannot be found elsewhere, for sure.

  3. I just need statistics advice, specifically on structural equation modeling with AMOS. I have a sample size of 260 and about 20 variables. My individual basic two-variable regressions and ANOVAs are all significant and in the correct direction. For some reason, though, when I added the data from a pilot sample that had much stronger relationships for all basic analyses, the SEM model became weaker. My knowledge of SEM is limited and I cannot understand why the SEM model can become weaker when each individual variable correlates more strongly with the others in the model. I think it may be because one of the latent variables is negatively related to all of the other latent variables…
