Whassup with altering priors after you see the data??

Juned Siddique writes:

I have a question regarding a paragraph in your paper, “Prior distributions for variance parameters in hierarchical models.”

In the paper, you write, “We view any noninformative or weakly-informative prior distribution as inherently provisional: after the model has been fit, one should look at the posterior distribution and see if it makes sense. If the posterior distribution does not make sense, this implies that additional prior knowledge is available that has not been included in the model, and that contradicts the assumptions of the prior distribution that has been used. It is then appropriate to go back and alter the prior distribution to be more consistent with this external knowledge.”

I don’t quite understand this passage, especially the part where you write, “this implies that additional prior knowledge is available that has not been included in the model,” and was hoping to get more explanation.

My situation is that I am fitting a random-effects probit model and using posterior predictive checking to check the fit of the model. One way to get the model to fit the data well is to use an informative prior that I arrived at by iterating between posterior predictive checking and making my prior more informative. While changing one’s model to make it fit the data better is standard in statistics, it seems like I should be changing the likelihood, not the prior. On the other hand, my “model” is my posterior distribution, which also includes the prior.

My reply:

1. You should certainly feel free to change the likelihood as well as the prior. Both can have problems.

2. With hierarchical models, the whole likelihood/prior distinction becomes less clear. In your example, you have a probability model for the data (independent observations, I assume), a nonlinear probit model with various predictors, a normal model (probably) for your group-level coefficients, and some sort of hyperprior for what remains.

3. My point in the quoted passage is that in the phrase “the posterior distribution does not make sense,” the “does not make sense” part is implicitly (or explicitly) a comparison to some external knowledge or expectation that you have. “Does not make sense” compared to what? This external knowledge represents additional prior information.

6 thoughts on “Whassup with altering priors after you see the data??”

  1. I've had a very, very, very traditional statistics (really, econometrics) education – I check in on your blog from time to time, but often get lost on posts like this.

    Do you have a good resource explaining the logic of Bayesian statistics, with examples of how the empirical process differs between Bayesians and non-Bayesians? Not a book, but maybe an essay or a paper that is accessible but thorough?

    Thanks

  2. Daniel: I recommend chapter 1 of Bayesian Data Analysis. There are various explanations of Bayesian inference on the web, but to my taste they don't give the real flavor of applied Bayesian statistics. The examples in these web tutorials tend to be toy examples with discrete parameter spaces.

    On the specific issue of prior distributions, you could take a look at this article and this article.

    Jonathan: Yes, exactly. But in a good way.

  3. I think I'd prefer to spend a little more time constructing a decent prior in the first place. I guess it might be due to my chosen field (engineering) but generally I can come up with several reasons to put bounds on some coefficients or to see why they should be at least in some vicinity…

    For example, suppose we're talking about estimating travel time from one location to another by car for a website like Google Maps. You know it isn't likely to be less than the time it would take to travel the given distance at 100 mph, and it's not likely to take longer than if your average speed were, say, 10 mph. Furthermore, you can guess that the travel time is skewed to the right, since there are more ways to be delayed than to go really fast (in a sense). Based on this, you could perhaps construct a reasonable prior from a shifted lognormal distribution… or a beta distribution mixed with a shifted exponential distribution.

    Creating such a prior and putting it into your simulation might be sort of difficult depending on your computational sophistication and your tools. So you might look for a simpler distribution that can approximate the right shape, like a mixture of two or three normal distributions… You can tell a story that goes something like this: if I don't hit traffic, I will go between 50 and 70 mph (first normal), but if I do hit moderate traffic I might go 30 to 50 mph (second normal), and if there's a major accident, I might go as slow as 10 mph the whole way, but that only happens, say, 10% of the time (third normal).

    I think this kind of story is also constructable in areas like econometrics or political science, but it might take a little more thinking to figure out what the right story is.

    If you've got enough data you can ignore all this, since your likelihood will dominate. But if data are limited and your prior makes some difference, arguments like these can get you an approximate prior that's a lot more informative than a normal distribution with a massive default variance.
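The traffic story above can be turned into a quick numeric sketch. The mixture weights, component means and standard deviations, and the 30-mile trip distance below are hypothetical numbers chosen to match the narrative, not anything from the comment:

```python
import random

random.seed(0)

# Hypothetical mixture-of-normals prior on average speed (mph):
# (weight, mean, sd) per component, roughly following the three-part story.
components = [
    (0.60, 60.0, 5.0),   # free-flowing traffic: roughly 50-70 mph
    (0.30, 40.0, 5.0),   # moderate traffic: roughly 30-50 mph
    (0.10, 15.0, 5.0),   # major accident: as slow as ~10 mph, ~10% of the time
]

def draw_speed():
    # sample a component by its weight, then draw from that normal
    u, cum = random.random(), 0.0
    for w, mean, sd in components:
        cum += w
        if u < cum:
            return random.gauss(mean, sd)
    return random.gauss(*components[-1][1:])

distance = 30.0  # miles, a hypothetical trip
# convert speed draws to travel times in minutes, truncating speeds below 1 mph
times = [60.0 * distance / max(draw_speed(), 1.0) for _ in range(10_000)]
times.sort()
median = times[len(times) // 2]
print(f"median travel time ~ {median:.0f} min; 90% interval "
      f"({times[500]:.0f}, {times[9500]:.0f}) min")
```

The implied travel-time distribution comes out right-skewed, as the comment anticipates: the slow-traffic component stretches the upper tail much more than the fast component compresses the lower one.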

  4. Daniel: What you say is completely reasonable to me.

    John: Thanks for pointing that out. I fixed the links.

Comments are closed.