Anything worth doing is worth doing repeatedly, or The fundamental connection between frequentist statistics and hierarchical models

Hierarchical modeling is gradually being recognized as central to Bayesian statistics. Why? Well, one way of looking at it is that any given statistical model or estimation procedure will be applied to a series of problems–not just once–and any such set of problems corresponds to some distribution of parameter values which can themselves be modeled, conditional on any other available information. (This is the “meta-analysis” paradigm.)

Hierarchical modeling

Anything worth doing is worth doing repeatedly–hence all estimation problems can be formulated as hierarchical models. To use some notation, the repeated problems are j=1,…,J, the unknown parameters are theta_j, and the datasets are y_j. In old-fashioned Bayesian statistics, J=1 and there is a lot of talk about principles for assigning the prior distribution, p(theta). In modern, hierarchical Bayesian statistics, the theta_j’s come from a common distribution, p(theta), whose hyperparameters are themselves estimated from the data. (If appropriate, there are group-level input variables x_j, so that the group-level model becomes p(theta_j|x_j).)
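
To make the notation concrete, here is a minimal sketch in Python of the simplest version of this setup: a normal-normal model in which each y_j is a single noisy estimate of theta_j with known standard error sigma_j, the theta_j’s come from a normal population distribution with hyperparameters mu and tau, and those hyperparameters are estimated by marginal maximum likelihood (empirical Bayes). The model, the simulated numbers, and all the variable names are illustrative assumptions, not anything specified in the post.

```python
# A minimal sketch of the setup above, assuming a normal-normal model:
# y_j ~ N(theta_j, sigma_j^2) within groups, theta_j ~ N(mu, tau^2) across
# groups. All names and numbers here are illustrative, not from the post.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Simulate J repeated problems: true effects theta_j and noisy estimates y_j.
J = 50
mu_true, tau_true = 2.0, 1.0
sigma = rng.uniform(0.5, 2.0, size=J)          # known standard errors
theta = rng.normal(mu_true, tau_true, size=J)  # group-level parameters
y = rng.normal(theta, sigma)                   # one estimate per problem

def neg_marginal_loglik(log_tau):
    """Negative marginal log-likelihood of tau, with mu profiled out."""
    tau = np.exp(log_tau)
    v = sigma**2 + tau**2                      # marginal variance of y_j
    mu_hat = np.sum(y / v) / np.sum(1.0 / v)   # precision-weighted mean
    return 0.5 * np.sum(np.log(v) + (y - mu_hat)**2 / v)

# Estimate the hyperparameters by marginal maximum likelihood.
res = minimize_scalar(neg_marginal_loglik, bounds=(-5.0, 5.0), method="bounded")
tau_hat = np.exp(res.x)
v = sigma**2 + tau_hat**2
mu_hat = np.sum(y / v) / np.sum(1.0 / v)

# Posterior means shrink each raw estimate toward mu_hat; noisier estimates
# (larger sigma_j) are shrunk more.
w = tau_hat**2 / (sigma**2 + tau_hat**2)
theta_post = w * y + (1 - w) * mu_hat
print(f"mu_hat = {mu_hat:.2f}, tau_hat = {tau_hat:.2f}")
```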

Frequentist statistics

The so-called “frequentist” approach to statistics is based on evaluating the long-run frequency properties of statistical estimates–that is, considering what would happen if a particular estimation procedure were applied repeatedly to different data sets. Well, that’s what hierarchical modeling is all about.

As Don Rubin pointed out in 1984, it makes sense for applied statisticians to be interested in frequency properties, since we do apply our methods repeatedly. Hierarchical modeling is the framework under which we can understand these frequency evaluations.
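
One way to see the connection concretely (an illustrative simulation, not Rubin's own calculation): apply the shrinkage procedure from the sketch above, over and over, to independent replications of the whole problem, and record how often the nominal 95% posterior intervals cover the true theta_j’s. When the theta_j’s really are drawn from the assumed population distribution, the long-run coverage matches the nominal level:

```python
# An illustrative frequency evaluation: apply the normal-normal shrinkage
# procedure to many fresh replications of the whole problem and check the
# long-run coverage of nominal 95% posterior intervals for the theta_j's.
# (To keep the sketch short, the hyperparameters mu and tau are treated as
# known; in the empirical-Bayes version they would be re-estimated each time.)
import numpy as np

rng = np.random.default_rng(1)
J, mu, tau = 50, 2.0, 1.0
n_reps, covered, total = 1000, 0, 0

for _ in range(n_reps):
    sigma = rng.uniform(0.5, 2.0, size=J)
    theta = rng.normal(mu, tau, size=J)   # fresh parameters each replication
    y = rng.normal(theta, sigma)          # fresh data each replication
    w = tau**2 / (sigma**2 + tau**2)      # shrinkage factor
    post_mean = w * y + (1 - w) * mu
    post_sd = np.sqrt(w * sigma**2)       # posterior sd of each theta_j
    lo = post_mean - 1.96 * post_sd
    hi = post_mean + 1.96 * post_sd
    covered += np.sum((lo <= theta) & (theta <= hi))
    total += J

print(f"empirical coverage: {covered / total:.3f}")  # close to 0.95
```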

Multiple comparisons

For example, consider a problem of multiple comparisons, in which we repeatedly examine differences, theta_j – theta_k, and compare these differences to zero. Assuming we have a lot of these comparisons, we can easily estimate the underlying theta_j’s through a hierarchical model. Comparisons based on the resulting shrinkage estimates have desirable frequency properties. (See here for more on this.)
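
As a hedged sketch of why this works, still in the assumed normal-normal setting with known hyperparameters: declare a difference theta_j – theta_k nonzero when the 95% interval for it excludes zero, and count how often the declared sign disagrees with the sign of the true difference, for raw estimates versus shrinkage (posterior-mean) estimates. The shrinkage-based procedure makes fewer claims, and the claims it does make are less likely to have the wrong sign:

```python
# A sketch of the multiple-comparisons point, in the same assumed
# normal-normal setting with known hyperparameters: declare theta_j - theta_k
# nonzero when the 95% interval for the difference excludes zero, and compare
# the sign-error rates of raw versus shrinkage (posterior-mean) estimates.
import numpy as np

rng = np.random.default_rng(2)
J, mu, tau = 100, 0.0, 1.0
sigma = np.full(J, 1.0)                   # common known standard error
theta = rng.normal(mu, tau, size=J)
y = rng.normal(theta, sigma)

w = tau**2 / (sigma**2 + tau**2)
post_mean = w * y + (1 - w) * mu
post_sd = np.sqrt(w * sigma**2)

j, k = np.triu_indices(J, k=1)            # all pairs j < k
true_diff = theta[j] - theta[k]

def claims_and_sign_errors(est, sd):
    """Fraction of pairs claimed nonzero, and the fraction of those claims
    whose sign disagrees with the true difference."""
    diff = est[j] - est[k]
    se = np.sqrt(sd[j]**2 + sd[k]**2)
    claimed = np.abs(diff) > 1.96 * se
    wrong = np.sign(diff[claimed]) != np.sign(true_diff[claimed])
    return claimed.mean(), (wrong.mean() if claimed.any() else 0.0)

print("raw:      claims %.3f, sign-error rate %.3f" % claims_and_sign_errors(y, sigma))
print("shrunken: claims %.3f, sign-error rate %.3f" % claims_and_sign_errors(post_mean, post_sd))
```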

Take-home point

It makes sense for Bayesians to think about frequency properties in the context of hierarchical models. It is well known that one can typically interpret a classical statistical estimate as a summary of a posterior distribution under a particular choice of model. Similarly, one can typically interpret a frequentist statement as a summary of posterior inferences in a particular hierarchical setting.

2 thoughts on “Anything worth doing is worth doing repeatedly, or The fundamental connection between frequentist statistics and hierarchical models”

  1. There was some research on meta-learning, "multi-task learning," and "learning to learn" in machine learning that addressed problems similar to those of hierarchical modelling. I personally prefer hierarchical modelling for its cleanliness.

    But if anyone is interested in digging deeper, there is a book, "Learning to Learn," that collects the related research in machine learning. A good single paper to read is Baxter's "A Bayesian/information theoretic model of learning to learn via multiple task sampling."

  2. Aleks,

    I took a look at the Baxter paper–it's pretty abstract, hard for me to follow without an applied example. But I guess that serves me right for not including any applied example in my posting…
