Why Bayes

Shravan Vasishth writes,

One thing I [Shravan] have not understood so far (and I’m on the 250th page or so [of Data Analysis Using…]) is what the fully Bayesian approach buys me as an experimenter. As far as I can tell, the major advantage is that I can leverage previous related experiments to set up an “informative” prior. But it’s not clear yet what this leveraging will give me that the conventional approach won’t. I understand and appreciate the importance of the simulation idea, I do that all the time for computational modeling of sentence processing, but the major gains from the Bayesian approach for data analysis are not clear to me yet.

My reply: Bayes gives you partial pooling using a convenient formula (see, for example, chapter 5 of Bayesian Data Analysis, or the equivalent chapters in Carlin and Louis or other books). You can do partial pooling “manually” but that’s more work and gets tougher for models with varying intercepts and slopes, and for nonnested models. Also, by being based on a generative model, Bayesian inferences can be checked by comparing data to replications simulated under the model.
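
As a rough sketch of both points (everything below is invented for illustration, and mu and tau are fixed rather than estimated as they would be in a real analysis): the first half applies the standard normal-normal partial-pooling formula to a handful of group estimates, and the second half simulates replicated data under the model and compares a simple test statistic to the observed data.

```python
import numpy as np

# Invented data: observed mean and standard error for J = 6 groups.
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 12.0])
se = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 13.0])

# Group-level model theta_j ~ Normal(mu, tau); mu and tau are fixed here
# only to keep the formula visible (normally they get priors and are estimated).
mu, tau = 5.0, 8.0

# Partial pooling: a precision-weighted compromise between each group's
# own estimate y_j and the group-level mean mu.
w = (1 / se**2) / (1 / se**2 + 1 / tau**2)
theta_pooled = w * y + (1 - w) * mu
print("partially pooled estimates:", np.round(theta_pooled, 1))

# Model check: simulate replicated datasets under the model and compare
# the spread of the replicated group means to the observed spread.
rng = np.random.default_rng(0)
theta_rep = rng.normal(mu, tau, size=(1000, len(y)))
y_rep = rng.normal(theta_rep, se)
spread_rep = y_rep.max(axis=1) - y_rep.min(axis=1)
print("P(replicated spread >= observed spread) =",
      np.mean(spread_rep >= y.max() - y.min()))
```

With tau large the weights go to one and the groups stay unpooled; with tau small everything is pulled toward mu. That smooth compromise, plus the ability to check the fitted model against simulated replications, is what the partial-pooling formula delivers.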

5 thoughts on “Why Bayes”

  1. Recast Bayes as the pooling of non-data and data information by multiplying the prior (P) by the likelihood (L), i.e. P * L, and the pooling of data from related studies as the multiplication of likelihoods, i.e. L1 * L2. Then, apart from conveniences (especially when there are random parameters), P*L1*L2*…*Ln would not seem to offer much advantage over L1*L2*…*Ln – unless P carries some useful non-data information.
    Not to undervalue convenience, but I believe the bigger challenge is when L1*L2*…*Ln cannot provide answers with reasonable “confidence” (and that, in my opinion, is what experimenters need brought to their attention).
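
    As a small numerical sketch of that recasting (all numbers invented): with normal likelihoods, pooling studies is just precision-weighted averaging, and the prior P enters as one more precision-weighted term, so a vague P changes essentially nothing while an informative P does.

    ```python
    import numpy as np

    # Invented study estimates and standard errors for a common effect.
    est = np.array([0.42, 0.31, 0.55])
    se = np.array([0.20, 0.15, 0.25])

    def pool(means, sds):
        """Precision-weighted combination of normal likelihood/prior terms."""
        prec = 1 / sds**2
        return np.sum(prec * means) / np.sum(prec), np.sqrt(1 / np.sum(prec))

    # L1 * L2 * L3 alone.
    print("likelihoods only:  ", pool(est, se))

    # P * L1 * L2 * L3 with a very diffuse prior: essentially no change.
    print("vague prior added: ", pool(np.append(est, 0.0), np.append(se, 10.0)))

    # P * L1 * L2 * L3 with an informative prior: the answer moves.
    print("informative prior: ", pool(np.append(est, 0.0), np.append(se, 0.1)))
    ```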

  2. Keith,

    I agree. The biggest gains from Bayes come in hierarchical models, where your prior distribution is really a group-level model; it doesn't just come into the product once, it comes in J times (where J is the number of groups).
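
    A minimal sketch of what "comes in J times" looks like (the numbers and the flat hyperprior are purely illustrative): in the log posterior of a normal hierarchical model, the group-level term is a sum over all J groups, sitting alongside the J likelihood terms.

    ```python
    import numpy as np

    def norm_logpdf(x, mean, sd):
        """Log density of Normal(mean, sd), elementwise."""
        return -0.5 * np.log(2 * np.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

    # Illustrative data: J group estimates with known standard errors.
    y = np.array([28.0, 8.0, -3.0, 7.0])
    se = np.array([15.0, 10.0, 16.0, 11.0])

    def log_posterior(theta, mu, tau):
        """Unnormalized log posterior of a normal hierarchical model,
        assuming a flat hyperprior on (mu, tau) for simplicity."""
        data_terms = norm_logpdf(y, theta, se).sum()     # J likelihood terms
        group_terms = norm_logpdf(theta, mu, tau).sum()  # J group-level 'prior' terms
        return data_terms + group_terms

    print(log_posterior(theta=np.array([10.0, 9.0, 5.0, 7.0]), mu=8.0, tau=6.0))
    ```

    That sum over groups is where the prior-as-group-level-model does its work: it ties the theta_j together, which is what produces the partial pooling described in the post.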

  3. Andrew,

    Over at Overcoming Bias, I have now seen several people attempt to use Bayesian reasoning in discussing the question of whether or not the universe is infinite.

    Since no finite amount of evidence can truly constitute evidence for the hypothesis that the universe is infinite, approaching this question from a probabilistic or statistical point of view seems fundamentally flawed.

    See
    http://www.overcomingbias.com/2007/06/1_2_3_infin

  4. One major complaint with frequentist methods is that they are not self-consistent. For instance, suppose you fit a regression to data generated from the DGP
    y = Xb + e; the OLS estimator satisfies
    betahat ~ N(b, sigma^2 (X'X)^(-1)).
    That much is fine. But in practice, frequentists fit, test, restrict, fit again, apply selection criteria, etc. In general, if you were to fit y = X1*beta1 + X2*beta2 + eps,
    then, conditional on having passed the large array of statistical tests, beta1hat is no longer normal. In reality, frequentists are oftentimes fitting the model
    y = (X1*beta1*1(t-stat(beta1hat) > critval) + X2*beta2*1(t-stat(beta2hat) > critval)) * 1(F-stat(X1, X2) > critval) + eps, where 1() is an indicator function.

    I don't know if anyone has worked out the sampling distribution of betahat conditional on it not being rejected, but I am pretty sure it is no longer normal (the sketch at the end of this comment illustrates the point). In effect, frequentists are hiding their assumptions, perhaps not being entirely honest about them.

    Of course, I wind up using MLE more than 50% of the time, since so many MCMC problems are still too close to intractable.
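
    As a quick illustrative simulation of this pretest estimator (the sample size, true coefficients, and cutoffs below are invented, and the cutoffs are only rough 5% values): fit the two-variable regression repeatedly, keep beta1hat only when its t-statistic and the joint F-statistic clear the cutoffs, and look at the retained estimates.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, n_sims = 50, 20000
    b1, b2, sigma = 0.25, 0.0, 1.0   # true coefficients (illustrative)
    t_crit, f_crit = 1.96, 3.18      # rough 5% cutoffs for these df

    kept = []
    for _ in range(n_sims):
        X = rng.normal(size=(n, 2))
        y = X @ np.array([b1, b2]) + rng.normal(scale=sigma, size=n)

        # OLS fit (no intercept in this toy model).
        XtX_inv = np.linalg.inv(X.T @ X)
        bhat = XtX_inv @ X.T @ y
        resid = y - X @ bhat
        s2 = resid @ resid / (n - 2)
        t_stats = bhat / np.sqrt(s2 * np.diag(XtX_inv))

        # Joint F test of beta1 = beta2 = 0.
        f_stat = (bhat @ (X.T @ X) @ bhat) / (2 * s2)

        # The "pretest" rule: report beta1hat only if both hurdles are cleared.
        if abs(t_stats[0]) > t_crit and f_stat > f_crit:
            kept.append(bhat[0])

    kept = np.array(kept)
    print("kept fraction:", len(kept) / n_sims)
    print("mean of retained beta1hat:", kept.mean(), "(true value 0.25)")
    print("share of retained estimates below the true value:", np.mean(kept < b1))
    ```

    On this toy DGP the retained beta1hat values are truncated away from zero, biased upward, and clearly non-normal, which is exactly the conditional sampling distribution described above.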
