Alternatives to regression for social science predictions

Somebody named David writes:

I [David] thought you might be interested in, or have an opinion on, the paper referenced below. I am largely skeptical of the techniques presented, and thought you might have some insight because you work with datasets more similar to those in ‘social science’ than I do.

Dana, J., and Dawes, R. M. (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics, 29(3), 317.

My reply: I read the abstract (available online) and it seemed reasonable to me. They prefer simple averages, or weights based on correlations, to regressions. From a Bayesian perspective, what they’re saying is that least-squares regression and similar methods are noisy, and that one can often do better via massive simplification.
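
As a rough illustration of the point (this is just a simulation sketch, not the setup in Dana and Dawes, and every setting below is my own choice): with a small sample and predictors that all point in the same direction, the estimation noise in least squares can outweigh its flexibility, so equal or correlation-based weights often predict about as well out of sample.

```python
# Illustrative simulation (assumed setup, not the paper's): compare out-of-sample
# correlation between predictions and outcomes for three weighting schemes --
# least squares, equal ("unit") weights, and correlation-based weights.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 30, 1000, 5      # small training sample, five predictors
n_sims = 500
results = {"least squares": [], "equal weights": [], "correlation weights": []}

for _ in range(n_sims):
    beta = rng.uniform(0.5, 1.5, size=p)      # true effects all positive, similar size
    X_tr = rng.normal(size=(n_train, p))      # predictors already standardized
    X_te = rng.normal(size=(n_test, p))
    y_tr = X_tr @ beta + rng.normal(scale=3.0, size=n_train)
    y_te = X_te @ beta + rng.normal(scale=3.0, size=n_test)

    b_ls, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)    # fitted regression weights
    b_eq = np.ones(p)                                      # just add up the predictors
    b_corr = np.array([np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(p)])

    for name, b in [("least squares", b_ls), ("equal weights", b_eq),
                    ("correlation weights", b_corr)]:
        results[name].append(np.corrcoef(X_te @ b, y_te)[0, 1])

for name, r in results.items():
    print(f"{name:20s} mean out-of-sample correlation: {np.mean(r):.3f}")
```

Because the comparison uses correlation with the holdout outcome, the simple rules are not penalized for being on the wrong scale; only the relative ordering of the predictions matters.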

I’ve been a big fan of Robyn Dawes ever since reading his article in the classic Kahneman, Slovic, and Tversky volume. I have no idea how much Dawes knows about modern Bayesian statistics (that is, multilevel models), but if he does, I assume he’d support a partial-pooling approach that makes use of data information in determining weights while keeping stability in the estimates.
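
To sketch the partial-pooling idea in that spirit (this is my own ridge-style stand-in, not a full multilevel model): shrink the least-squares weights toward a common target such as equal weights, with a penalty that controls how far to shrink.

```python
# Rough sketch of partial pooling between least squares and equal weights:
# ridge regression with a non-zero shrinkage target.  The target and penalty
# here are illustrative assumptions, not anyone's recommended defaults.
import numpy as np

def partially_pooled_weights(X, y, lam, target=None):
    """Minimize ||y - X b||^2 + lam * ||b - target||^2 (closed form)."""
    p = X.shape[1]
    if target is None:
        target = np.ones(p)                  # shrink toward equal weights
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * target)

# lam = 0 gives back least squares; lam -> infinity recovers the equal-weight rule.
# A multilevel model would, in effect, let the data choose lam.
```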

To put it another way, least squares regression won’t help you make maps like these, but simple averaging won’t either. At some point you have to step things up to the next level.

5 thoughts on “Alternatives to regression for social science predictions”

  1. Dear Prof. Gelman,
    I took a look at the paper you linked, and I was interested in your technical report where you describe in detail how to produce those maps.

    Thanks
    Manoel Galdino

  2. Prof. Gelman,

    A fascinating link. I followed up and got the paper from CMU, where Prof. Dawes is based, I think. The idea is really interesting, and the study talks about the sample sizes at which regression indeed becomes more valuable.

    I wonder how relevant the overall concept is (simpler techniques are more robust in cross-validation) in a business context, where the availability of data is several orders of magnitude higher than in a pure social sciences context. Any opinions?

  3. Least squares regression is formally just a particular weighted average of slope estimates from individual observations or units of analysis,

    so in general any least squares regression can be recast as a weighted average.

    More generally, any likelihood-based technique can be factorized by individual observations or units of analysis, and if the log of these likelihoods is approximately quadratic, that approximation can be recast as a weighted average (David Cox, in one of his papers, spelled out conditions for when this would be asymptotically fully efficient).

    But I agree with Andrew that when things are simplified, more people may be able to grasp what is going on. And simplifying an analysis by recasting it in terms of weighted averages of estimates from pieces of the data will likely help many. (A quick numerical check of the weighted-average claim appears after these comments.)

    I once wrote a paper about doing this in clinical research, called recasting complex analyses in terms of t-tests to provide "understudy" analyses. The clinical reviewers did not quite seem to get it, the statistical reviewers thought it was either unnecessary or even "harmful", and I ran into other distractions.

    K?
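
A quick numerical check of the weighted-average claim in comment 3, for the simplest case of one predictor plus an intercept: the least-squares slope equals a weighted average of the per-observation slopes (y_i − ȳ)/(x_i − x̄), with weights proportional to (x_i − x̄)². The code below is only an illustrative sketch.

```python
# Verify that the least-squares slope is a weighted average of per-observation slopes.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

xbar, ybar = x.mean(), y.mean()
slope_ls = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

per_obs_slopes = (y - ybar) / (x - xbar)               # one slope estimate per observation
weights = (x - xbar) ** 2 / np.sum((x - xbar) ** 2)    # nonnegative, sum to one
slope_weighted = np.sum(weights * per_obs_slopes)

print(slope_ls, slope_weighted)    # agree up to floating-point error
```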
