On the half-Cauchy prior for a global scale parameter

Nick Polson and James Scott write:

We generalize the half-Cauchy prior for a global scale parameter to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. Finally, we prove a result that characterizes the frequentist risk of the Bayes estimators under all priors in the class. These arguments provide an alternative, classical justification for the use of the half-Cauchy prior in Bayesian hierarchical models, complementing the arguments in Gelman (2006).

This makes me happy, of course. It’s great to be validated.

The only thing I didn’t catch is how they set the scale parameter for the half-Cauchy prior. In my 2006 paper I frame it as a weakly informative prior and recommend that the scale be set based on actual prior knowledge. Polson and Scott, by contrast, are talking about a default choice. I used to think that such a default would not really be possible, but given our recent success with automatic priors for regularized point estimates, I now think a reasonable default might be possible in the full Bayes case too.
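To make the scale question concrete, here's a quick prior-predictive sketch in Python (my own illustration, not from either paper; the candidate scales A are made up): each choice of A implies a different prior guess about the group-level standard deviation tau, which is exactly what "actual prior knowledge" would pin down.

    import numpy as np

    rng = np.random.default_rng(0)

    # A half-Cauchy(0, A) draw is the absolute value of A times a standard Cauchy.
    def half_cauchy(scale, size, rng):
        return np.abs(scale * rng.standard_cauchy(size))

    # Prior-predictive check: what does each candidate scale A imply about
    # the group-level sd tau? (The A values here are made up for illustration.)
    for A in (0.5, 1.0, 5.0):
        tau = half_cauchy(A, 100_000, rng)
        print(f"A = {A}: median tau = {np.median(tau):.2f}, "
              f"P(tau > 10) = {np.mean(tau > 10):.3f}")

The median of a half-Cauchy(0, A) is A itself, so the printed medians track the chosen scale; the tail probabilities show how heavy-tailed the prior stays even for small A.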

P.S. I found the above article while looking on Polson’s site for this excellent paper, which considers in a more theoretical way some of the themes that Jennifer, Masanao, and I are exploring in our research on hierarchical models and multiple comparisons.

3 thoughts on “On the half-Cauchy prior for a global scale parameter”

  1. This is an interesting note, but the authors do not seem to show the minimaxity of their Bayes estimator, which I think cannot hold for all p's (Strawderman, 1973, AoS). We gave sufficient conditions for minimaxity in Berger and Robert (1992, AoS), but these conditions do not seem to cover the case of the half-Cauchy.

  2. Andrew: we scale the half-Cauchy prior by sigma (the scale of the error model), which in turn gets Jeffreys' prior.
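    To illustrate what that scaling buys you, here is a minimal numpy sketch of my reading of the setup (not code from the paper; the proper log-uniform prior below is an arbitrary stand-in for Jeffreys' improper prior): because tau | sigma ~ half-Cauchy(0, sigma), the ratio tau/sigma is standard half-Cauchy no matter what the measurement scale of the data is.

        import numpy as np

        rng = np.random.default_rng(1)

        # Stand-in for Jeffreys' improper prior p(sigma) proportional to 1/sigma:
        # a proper prior that is flat in log(sigma) over a wide range.
        sigma = np.exp(rng.uniform(np.log(0.1), np.log(10.0), 50_000))

        # tau | sigma ~ half-Cauchy(0, sigma)
        tau = np.abs(sigma * rng.standard_cauchy(50_000))

        # The ratio tau/sigma is standard half-Cauchy, independent of the
        # data's measurement scale; its median should come out near 1.
        print("median tau/sigma:", np.median(tau / sigma))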

    Christian: Yuzo Maruyama has some great work on minimaxity, which we discovered after writing this article. He doesn't consider the wider class we consider, but I believe he proves minimaxity of the half-Cauchy prior for p >= 8.

  3. James:

    I thought of that, but then doesn't the scaling depend on the sample size? And if so, how does it work when the sample sizes are unequal across groups?

    Consider the following model. For simplicity I'll set the residual data variance to be 1:

    y_i ~ N(theta_{j[i]}, 1), for i=1,…,n, with groups j=1,…,J.
    Sample size in group j is n_j; thus, n=n_1+…+n_J.

    Consider three scenarios, each with J=10 groups:

    (1) n_j=1 for all j. Then I can definitely see how you can use sigma (that is, 1 in this case) as a scaling parameter for the variance of the theta_j's.

    (2) n_j=10 for all j. Here it's not clear to me if you would use 1 or 0.31 (i.e., 1/sqrt(10)) as your scaling factor.

    (3) n_1=1, n_2=2, n_3=3, …, n_{10}=10. What's the scaling factor here?

    Thanks for the clarification. (A little simulation below spells out the three cases.)
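    Here is that simulation, a minimal numpy sketch of the three scenarios (my own illustration; the values of sigma and the true group-level sd are made up): each group mean carries sampling noise sigma/sqrt(n_j), so with unequal n_j there is no single factor that converts sigma into a scale for the theta_j's.

        import numpy as np

        rng = np.random.default_rng(2)
        J, sigma, tau_true = 10, 1.0, 0.5  # sigma and tau_true are made-up values

        scenarios = {
            "(1) n_j = 1":     np.ones(J, dtype=int),
            "(2) n_j = 10":    np.full(J, 10),
            "(3) n_j = 1..10": np.arange(1, J + 1),
        }

        theta = rng.normal(0.0, tau_true, J)  # group effects with true sd tau_true
        for name, n_j in scenarios.items():
            ybar = np.array([rng.normal(theta[j], sigma, n_j[j]).mean()
                             for j in range(J)])
            se = sigma / np.sqrt(n_j)  # sampling s.e. of each group mean
            # With unequal n_j the group means have unequal noise, so no single
            # factor like 1/sqrt(n) relates sigma to the spread of the theta_j's.
            print(f"{name}: s.e. range [{se.min():.2f}, {se.max():.2f}], "
                  f"sd(ybar) = {ybar.std(ddof=1):.2f}")

    In scenarios (1) and (2) all groups share one standard error (1 and 0.31 respectively), so a single scaling factor exists; in scenario (3) the standard errors run from 1 down to 0.31, which is the heart of the question.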
