Anova

Cari Kaufman writes,

I am writing a paper on using Gaussian processes for Bayesian functional ANOVA, and I’d like to draw some connections to your 2005 Annals paper. In my own work I’ve chosen to use a 1-1 reparameterization of the cell means, that is, to constrain the levels within each factor. But I am intrigued by your use of exchangeable levels for all factors, and I’m hoping you can take a few minutes to help me clarify your motivation for this decision. Since not all parameters are estimable under the unconstrained model, don’t you encounter problems with mixing when the sums of the levels trade off with the grand mean? It seems in many situations it’s advantageous to have an orthogonal design matrix, especially when the observed levels correspond to all possible levels in the population. Do you have any thoughts on this you can share?

I should say I found the paper very useful, especially your graphical representation of the variance components. I also like your distinction between the superpopulation and finite-population variances, which helped me clarify what happens when generalizing to functional responses. Basically, we can share information across the domain to estimate the superpopulation variances by using a stationary Gaussian process prior, but the finite-population variances can differ over the domain, which gives some nice insight into where various sources of variability are important. (At the moment I’m working with climate modellers, who can really use maps of where various sources of variability show up in their output.)

My reply: I’m not quite sure what the question is, but I think you’re pointing out the redundant parameterization issue, that if we specify all levels of a factor, and then have other crosscutting or nested factors (or even just a constant term), then the linear parameters are not all identifiable. I would deal with this issue by fitting the large, nonidentified model and then summarizing using the relevant finite-population summaries. We discuss this a bit in Sections 19.4-19.5 and Chapters 21-22 of our new book.
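To make the "fit the nonidentified model, then summarize" idea concrete, here is a minimal numerical sketch (not from the original post; the simulated "posterior draws" and all variable names are my own illustration, standing in for output from a Gibbs sampler or similar). In a redundant one-way model y = mu + alpha_j + error with no constraint on the alpha's, only mu + mean(alpha) and the centered effects are identified, so we transform each draw before summarizing:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, J = 1000, 5

# Fake posterior draws that wander along the nonidentified direction:
# a shift can move from the grand mean into the level effects (or back)
# without changing the fit to the data.
drift = rng.normal(0.0, 10.0, n_draws)
mu = 3.0 - drift                                     # raw grand mean: not identified
alpha = rng.normal(0.0, 1.0, (n_draws, J)) + drift[:, None]  # raw level effects

# Transform each draw to identified, finite-population summaries.
alpha_bar = alpha.mean(axis=1)
mu_ident = mu + alpha_bar                 # identified grand mean
alpha_ident = alpha - alpha_bar[:, None]  # centered (sum-to-zero) effects

# Finite-population standard deviation of this factor, one value per draw.
s_alpha = alpha_ident.std(axis=1, ddof=1)

print(mu.std())        # huge: the raw parameter is not identified
print(mu_ident.std())  # modest: the transformed summary is well identified
```

The point is that the constraint is imposed as a post-processing summary of each draw, not as part of the model specification.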

A couple notes on this:

1. Mixing of the Gibbs sampler can be slow on the original, redundant parameter space but fast on the transformed space, which is what we really care about. Also, things work better with proper priors. My new thing is weakly informative priors, which don’t include all your prior information but act to regularize your inferences and keep the algorithms in a reasonable part of the space, where they can converge faster. The orthogonality that you want can come in this lower-dimensional summary.

2. The redundant-parameter model is identified, if only weakly, as long as we use proper prior distributions on the variance parameters. In Bayesian Data Analysis and in my 2005 Anova paper, I was using flat prior distributions on these “sigma” parameters. But since then I’ve moved to proper priors, or, in the Anova context, hierarchical priors. See this paper for more information, including an example in Section 6 of the hierarchical model for the variance parameters.
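Note 1 above, that mixing can look terrible on the raw parameters yet be fine on the transformed summaries, can be seen in a small simulation (again my own illustrative sketch, not code from the paper: the random-walk "chain" just mimics a Gibbs sampler drifting along the nonidentified direction):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, J = 2000, 4

# Mimic a sampler whose grand mean wanders as a random walk, with the
# level effects compensating, so the likelihood barely changes.
drift = np.cumsum(rng.normal(0.0, 0.5, n_draws))
mu = 2.0 + drift                                     # raw grand mean: poor mixing
alpha = rng.normal(0.0, 1.0, (n_draws, J)) - drift[:, None]

def lag1_autocorr(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# Identified summary: mu + mean(alpha) cancels the drift.
mu_ident = mu + alpha.mean(axis=1)

print(lag1_autocorr(mu))        # near 1: the raw chain mixes slowly
print(lag1_autocorr(mu_ident))  # near 0: the transformed quantity mixes well
```

So a trace plot of the raw grand mean can look hopeless while the quantities you actually report are mixing fine, which is why the redundant parameterization is workable in practice.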

1 thought on “Anova”

  1. Thanks for the quick response. My main question was whether there are ANY circumstances under which you think a constrained parameterization for the levels of a given factor is appropriate. In particular, when the observed levels constitute all levels in the population, I wonder why one wouldn't just work in the constrained subspace from the start, treating, say, appropriately normalized contrasts as exchangeable.
