Some practical questions about prior distributions

Bob writes:

I’ve been meaning to follow up two comments you made in passing about priors:

1. You said you didn’t like Dirichlet priors for multinomials because they didn’t model covariance. What alternative do you suggest?

2. When I told you I was using the prior from the hierarchical binomial survival example from [page 128 of] your BDA book, you said you didn’t like that prior any more. Why and what would you suggest as an alternative?

The book model reparameterized Beta(a,b) in terms of mean a/(a+b) which got a uniform prior, and scale a+b with a Pareto(1.5) prior [p(a+b) proportional to (a+b)**-2.5].

It works fairly well in practice, though it does lead to a fair number of large scale (a+b) samples.

I used your prior for baseball batting average estimation; the post includes the raw data (2006 AL position players) in tsv form, BUGS code, and the R calling harness.

I also use your prior for a hierarchical model of diagnostic test accuracy in epidemiology (or other data coding tasks).

I have longer versions of that paper with more analysis, simulations, data, alternative item-response type models, and pointers to all the code and data.

The basic epidemiology model keeps getting rediscovered. I’m still the only one who’s drunk enough of your Kool-Aid to go the full Bayesian hierarchical model route.

My reply:

0. When I started blogging, I made a conscious effort to be serious and focused on research. I didn’t want to be like the many bloggers out there who had academic credentials but really just mouthed off on current events. (I’m happy to provide my take on current events, but I try to add something new when doing so, not just giving my political opinions as if anyone should give a damn about what I happen to think is right and wrong about the world.) Anyway, I took a look at Bob’s blog and he does me one better: at Bob’s place, it’s all business, all the time. Talk about restraint!

1. For modeling parameters that sum to 1, I prefer the following sort of model. Suppose you are modeling a_1, a_2, a_3, a_4, where a_1+a_2+a_3+a_4=1. Then I’d assign a multivariate normal model to a new set of random variables b_1, b_2, b_3, b_4 and define a_j = exp(b_j) / (exp(b_1)+exp(b_2)+exp(b_3)+exp(b_4)), for j=1,2,3,4. This model has extra parameters that give you more flexibility than the Dirichlet. There’s also a slight nonidentifiability (you can add a constant to all four b_j’s without changing the a_j’s), but that’s no big deal; you’re only using the b’s as a way to model the a’s.

See here for some discussion of setting prior parameters for such a model and here for an example of its application (for parameters describing fractions of blood flow in the human body).
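To make the construction in point 1 concrete, here is a minimal sketch in Python with NumPy. The mean vector and covariance matrix are made-up values chosen only for illustration (they are not from the papers linked above), and the helper name softmax is mine. The sketch draws the b's from a multivariate normal, maps them to the a's, and checks the add-a-constant nonidentifiability.

```python
import numpy as np

def softmax(b):
    """Map real-valued rows of b to the simplex.

    Subtracting the row maximum only improves numerical stability;
    it does not change the result, for the same reason that adding a
    constant to every b_j leaves the a_j unchanged.
    """
    z = np.exp(b - np.max(b, axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical hyperparameters, for illustration only:
# b_1 and b_2 are positively correlated, b_3 and b_4 are independent.
mu = np.zeros(4)
sigma = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

b = rng.multivariate_normal(mu, sigma, size=10_000)  # draws of (b_1, ..., b_4)
a = softmax(b)                                       # draws of (a_1, ..., a_4)

row_sums = a.sum(axis=1)                            # each row should sum to 1
shift_invariant = np.allclose(softmax(b + 3.7), a)  # adding a constant changes nothing
corr_a = np.corrcoef(a, rowvar=False)               # implied correlations among the a's
```

Changing the off-diagonal entries of sigma and inspecting corr_a is a quick way to see how the covariance of the b's carries over to the a's. In a real model you would typically pin down the nonidentifiability by fixing one b_j to zero or constraining the b's to sum to zero, though as noted above it does no harm to leave it in.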

2. Yeah, I’m not so thrilled with the prior distribution in Section 5.2 of BDA. I was trying too hard to come up with a natural noninformative distribution. The solution I came up with was reasonable, and it was moderately clever, but now I lean toward more brute-force approaches, and also I prefer weakly-informative priors. Here’s what I’d do if I were rewriting this chapter today:

The challenge is to set a prior for the hyperparameters (a, b) for the Beta(a, b) population distribution on the probabilities of tumor in a bunch of rat experiments. In the book, we first transformed to (a/(a+b), 1/sqrt(a+b)). This makes sense: a/(a+b) is the expected value and 1/sqrt(a+b) is close to the standard deviation of the population distribution. I think a Uniform(0, 1) distribution on a/(a+b) is reasonable, but I’m no longer such a fan of the Uniform(0, infinity) distribution on 1/sqrt(a+b) that we used in the book. Instead, I’d probably prefer something like a half-Cauchy(0, 1). I don’t think it would make much difference in this example, but I’m moving away from the whole “noninformative” thing.
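Here is a rough sketch of what that prior looks like in terms of (a, b), again in Python with NumPy. I am reading half-Cauchy(0, 1) as location 0 and scale 1, and the variable names are just for illustration. The sketch draws the mean and the scale from their priors and then inverts the transformation to recover a and b.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Prior on the population mean a/(a+b): Uniform(0, 1).
mean = rng.uniform(0.0, 1.0, size=n)

# Prior on 1/sqrt(a+b): half-Cauchy with location 0 and scale 1 (assumed
# reading of "half-Cauchy (0, 1)"), drawn as the absolute value of a
# standard Cauchy variate.
scale = np.abs(rng.standard_cauchy(size=n))

# Invert the transformation: a + b = 1 / scale^2, then split by the mean.
total = 1.0 / scale**2
a = mean * total
b = (1.0 - mean) * total

# Summaries of the implied prior on a + b, to see where it puts its mass.
quantiles = np.quantile(total, [0.05, 0.25, 0.5, 0.75, 0.95])
```

Looking at quantiles gives a feel for how much prior mass this choice places on large values of a+b, which is the behavior Bob was asking about.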

3 thoughts on “Some practical questions about prior distributions”

  1. Thanks for the detailed answers. I'll have to digest them and then take 'em out for a spin in BUGS.

    I do try to stay on topic. Mainly because I can't stand the blogs that mix science and "where I went on vacation", or worse yet, science and political or economic rants. I draw the line at book reviews and Manhattan traffic reports.

    Mitzi once pointed me at this Slice blog post about staying on the topic of pizza. The author, Adam Kuban, said:

    Sure, I'd love to rhapsodize about how great the new Battlestar Galactica is (I'm dying for the new season to start). Or the sad, sad cancelation of Arrested Development. But I can't. I've gotta stay on message here.

    So until I see Admiral Adama chowing down on a slice between cylon attacks or I run across one of the Bluths ordering a pie on the AD DVDs, I can't go there.

    The post's actually about an appearance of pizza on an American TV show called "The Colbert Report".

  2. Is this also called the "logistic normal" distribution? I believe this model is used by Blei et al. in correlated topic models. See http://www.cs.cmu.edu/~lafferty/pub/ctm.pdf

    The Dirichlet has the advantage that it is conjugate to the multinomial. And the posterior can be expressed as a simple closed-form formula of the prior and – is there an alternative prior for use with normalized vectors which allows the modeling of covariance but is still in the exponential family?

    I guess if you are using MCMC this doesn't matter.

  3. I like this distribution too. It was called the logistic normal by Aitchison, though I think it was invented earlier than his work. His book is nice and contains some history. John Lafferty and I used it as a latent variable in a hierarchical mixed membership model of text (i.e., a "topic model"). A longer AOAS paper about this model is here. (The link in the earlier comment is to a shorter conference version.)
