Test variables that don’t depend on data

Ying Yuan writes,

I have a question on posterior predictive model checking (Gelman, Meng and Stern 1996). Consider a simple two-level model

y_{ij} ~ N(\alpha_i + \beta_i x_{ij}, \sigma^2)
\alpha_i ~ N(\alpha, \tau_1^2)
\beta_i ~ N(\beta, \tau_2^2)

I am interested in assessing the second level of the model, and have come up with the discrepancy function

D = (\beta_i - \beta)^2 / \tau_2^2.

However, in Gelman, Meng and Stern (1996), the posterior predictive checking compares

D(y^rep, \theta^j) with D(y^obs, \theta^j) for j = 1, …, J, where \theta^j is a posterior draw of \theta.

The problem is that the above discrepancy function does not involve y, so it seems difficult to apply posterior predictive model checking with it. I am wondering whether the discrepancy function has to depend on the data in order to use the posterior predictive method.

My reply: In general, the discrepancy measure can depend on both data and parameters, but depending only on parameters is fine; that's a special case. I'm not sure, though, that the particular discrepancy function you're considering will be so useful. If you're estimating \tau_2 from the data, then this particular function of the parameters will probably be fit well in any case. I think you have to think a bit harder about which aspects of the \beta model you're interested in studying.

Also, in defining the replications, you have to decide whether you're interested in new data with the same alpha's and beta's, or new alpha's, or new beta's, or both. It depends on how you plan to use the model.
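To see why both points matter, here is a toy numerical sketch (with made-up "posterior draws" standing in for an actual MCMC fit; all names and values are hypothetical). With a parameter-only discrepancy D(theta) = sum_i (beta_i - beta)^2 / tau_2^2, replications that keep the same beta's give D(y^rep, theta^j) = D(y^obs, theta^j) identically, so the check is vacuous; replications that redraw new beta_i^rep ~ N(beta, tau_2^2) give a real comparison, but since tau_2 is fit from the data the p-value tends to land near 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for a toy version of the two-level model.
# In practice these would come from your MCMC fit; here they are simulated.
J = 4000                                      # number of posterior draws
n_groups = 8                                  # number of groups i
beta = rng.normal(0.0, 0.1, size=J)           # draws of beta
tau2 = np.abs(rng.normal(1.0, 0.1, size=J))   # draws of tau_2
# draws of the group-level beta_i's, consistent with beta_i ~ N(beta, tau_2^2)
beta_i = beta[:, None] + tau2[:, None] * rng.normal(size=(J, n_groups))

# Realized discrepancy: D(theta) = sum_i (beta_i - beta)^2 / tau_2^2
d_obs = ((beta_i - beta[:, None]) ** 2 / tau2[:, None] ** 2).sum(axis=1)

# Replication 1: keep the same beta_i's. D does not involve y, so the
# replicated discrepancy equals the realized one and the check is vacuous.
d_rep_same = d_obs
p_same = float((d_rep_same >= d_obs).mean())  # all ties: exactly 1.0

# Replication 2: redraw new beta_i^rep ~ N(beta, tau_2^2) for each posterior
# draw ("new beta's"). Now the comparison is non-degenerate.
beta_i_rep = beta[:, None] + tau2[:, None] * rng.normal(size=(J, n_groups))
d_rep_new = ((beta_i_rep - beta[:, None]) ** 2 / tau2[:, None] ** 2).sum(axis=1)
p_new = float((d_rep_new >= d_obs).mean())

print(p_same)  # 1.0
print(p_new)   # close to 0.5 when the beta model fits, as here
```

Because the toy beta_i's were generated from the model being checked, d_obs and d_rep_new are exchangeable and the p-value hovers near 0.5, illustrating the point that this discrepancy will look fine whenever tau_2 is estimated from the same data.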