Free advice: you get what you pay for [statistics edition]

Gregg Keller asks,

When setting up a regression model with no obvious hierarchical structure, with normal distribution priors for all the coefficients (where the normal distribution’s mean and variance are also defined by a prior distribution), does it make sense to give the prior variance distributions for the various coefficients a shared component, or is it better to give each coefficient a completely independent set of prior distributions with no shared components?

For instance, in Prof. Radford Neal’s Bayesian regression/neural network software, he gives all the inputs a common prior variance distribution (with an optional ARD prior to allow some variation). Is there any obvious reason to do this in a regression model?

My response: first off, I love Radford’s book, although I haven’t actually fit his models so can’t comment on the details there. And to give another copout, it’s hard to be specific without knowing the context of the problem. That said, when there are many predictors, I think it should be possible to structure them (see Section 5.2 of this paper or, for some real examples, some of Sander Greenland’s papers on multilevel modeling). With many variance parameters, these themselves can be modeled hierarchically (see Section 6 of this paper for an exchangeable half-t model). I think that some of your questions about variance components can be addressed by a hierarchical model of this sort; the recent work of Jim Hodges and others on this topic is also relevant (actually, it’s probably more relevant than my own papers).
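To make the shared-versus-separate-variance question concrete, here is a minimal prior-simulation sketch in Python. It is not the exact model from either of the papers linked above: the half-Cauchy and half-t choices, and the constants K, A, and nu, are illustrative assumptions. Version (a) gives every coefficient one common prior scale; version (b) gives each coefficient its own scale but ties those scales together through a shared, unknown hyper-scale, along the lines of the exchangeable half-t idea.

```python
# Prior-simulation sketch: shared prior scale vs. hierarchically modeled
# per-coefficient scales.  All constants and distribution choices here are
# illustrative assumptions, not the models from the cited papers.
import numpy as np

rng = np.random.default_rng(0)
K = 10     # number of regression coefficients (assumed)
A = 1.0    # scale of the half-Cauchy hyperprior (assumed)
nu = 4     # degrees of freedom for the half-t on per-coefficient scales (assumed)

def half_cauchy(scale, size=None):
    """Draw |Cauchy(0, scale)| variates."""
    return np.abs(scale * rng.standard_cauchy(size))

def half_t(df, scale, size=None):
    """Draw |t_df(0, scale)| variates."""
    return np.abs(scale * rng.standard_t(df, size))

# (a) fully shared variance component: one scale tau for every coefficient
tau_shared = half_cauchy(A)
beta_shared = rng.normal(0.0, tau_shared, size=K)

# (b) hierarchical: each coefficient gets its own scale tau_k, but the tau_k
#     are exchangeable around a common unknown hyper-scale s
s = half_cauchy(A)                  # shared hyper-scale
tau_k = half_t(nu, s, size=K)       # per-coefficient scales
beta_hier = rng.normal(0.0, tau_k)  # coefficients with differing spreads

print("shared-scale betas:      ", np.round(beta_shared, 2))
print("hierarchical-scale betas:", np.round(beta_hier, 2))
```

Simulating from the prior like this is a cheap way to see what the two structures imply: in (a) all coefficients are forced to have comparable magnitudes, while in (b) some can be large and others shrunk toward zero, with the shared hyper-scale still letting the data inform how spread out the scales are overall.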

1 thought on “Free advice: you get what you pay for [statistics edition]”

  1. In Neal's Bayesian neural network approach, the variance of the weights from an input is related to that input's relevance and, more specifically, to the non-linearity of its effect (see this paper). With a common variance, the assumption is that all inputs have a priori the same relevance and non-linearity. With a hierarchical (ARD) prior, the assumption is that relevances and non-linearities differ. In Neal's FBM software the hierarchical prior is set up so that the prior variation between the variances is fixed (fixed degrees of freedom in the S-Inv-Chi2 prior) while the common scale is unknown, so the assumption is that if some of the inputs are non-linear, the others probably are too. This prior may be problematic if there are both (almost) linear and highly non-linear relevant effects, and results may be sensitive to the fixed values in the hyperprior. Naturally, the S-Inv-Chi2 prior for variances has other problems too, as described by Gelman in his paper, and it would be interesting to test, e.g., a half-Cauchy prior for the prior variances in a Bayesian neural network.
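For readers who want to see the structure described in the comment above, here is a rough sketch of that kind of ARD setup: per-input variances drawn from a scaled-inverse-chi-squared prior with fixed degrees of freedom, while the common scale is unknown. This is not the actual FBM parameterization; the constants, the half-Cauchy hyperprior on the common scale, and the variable names are all illustrative assumptions.

```python
# Sketch of an ARD-style prior: per-input variances from S-Inv-Chi2(nu, s2)
# with fixed nu and an unknown common scale s2.  Not the FBM code; all
# constants and the hyperprior choice are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
K = 10      # number of inputs (assumed)
nu = 2.0    # fixed degrees of freedom: controls how much relevances may differ (assumed)

# unknown common scale, here given a half-Cauchy prior purely for illustration
s = np.abs(rng.standard_cauchy())
s2 = s ** 2

# per-input variances: a Scaled-Inv-Chi2(nu, s2) draw is nu * s2 / chi2(nu)
sigma2_k = nu * s2 / rng.chisquare(nu, size=K)

# weights from input k then get prior N(0, sigma2_k): a small sigma2_k means
# the input is effectively pruned, a large one marks it as relevant/non-linear
w = rng.normal(0.0, np.sqrt(sigma2_k))

print("per-input prior sds:", np.round(np.sqrt(sigma2_k), 2))
print("example weights:    ", np.round(w, 2))
```

With small fixed nu the per-input variances can differ a lot around the common scale; with large nu they are pulled close together, which is one way to see the sensitivity to the fixed hyperprior values that the comment mentions. The comment's suggested alternative would replace the S-Inv-Chi2 draw with, e.g., a half-Cauchy prior directly on the per-input scales.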
