If an estimate is statistically significant, it’s probably an overestimate of the magnitude of your effect.
P.S. I think you all know what I mean here. But could someone rephrase it in a more pithy manner? I'd like to include it in our statistical lexicon.
Andrew: Are you concerned about the damage that will be done by those who don't get this?
(hard to imagine, isn't it?)
http://jama.ama-assn.org/cgi/content/extract/304/…
Some upcoming debate about this at the next Cochrane Collaboration, but I could not find the link… (Saturday night and I am behind on making dinner)
K?
The curse of multiple comparisons
The tallest pygmy problem
The problem of infinite monkey testing
Rain dance testing
Cargo cult testing
I can't quite get this right, but something like: "If p is too small for our effect not to be true, our b is probably too big."
I guess you're looking for something like "If it sounds too good to be true, then it probably is" or "Anything that can go wrong will go wrong." Well, you could always just use those as models. How about:
Statistically significant estimates are usually overestimates.
As anyone who has ever had a heated political discussion knows, bias and confidence go hand in hand.
Maybe building on science pundit:
Significant estimates overestimate.
I always liked: "If you can't get a t-statistic of 3, you either aren't trying or you're wrong."
A pithier version of Laurence's, above, might be:
If p is small enough, b is probably too big.
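The "if p is small enough, b is probably too big" version can be checked with a quick simulation. Here is a minimal sketch; the true effect size and standard error below are illustrative assumptions, not numbers from any study mentioned here:

```python
import random

random.seed(1)

TRUE_EFFECT = 0.2   # assumed true effect size (illustrative)
SE = 0.15           # assumed standard error, i.e. a noisy, low-powered study
N_STUDIES = 200_000

# Keep only the estimates that clear the usual p < 0.05 bar.
significant = []
for _ in range(N_STUDIES):
    b = random.gauss(TRUE_EFFECT, SE)   # one study's point estimate
    if abs(b / SE) > 1.96:              # "statistically significant"
        significant.append(abs(b))

mean_sig = sum(significant) / len(significant)
print(f"true effect:                 {TRUE_EFFECT}")
print(f"mean significant |estimate|: {mean_sig:.3f}")
```

Under these assumptions, the estimates that survive the significance filter average roughly twice the true effect: small p, too-big b.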
Casella's original term _recognizable set_ was to the point – was it not?
K?
This seems related to the Proteus Phenomenon
http://clinicaltrials.ploshubs.org/article/info:d…
"In the Proteus phenomenon, the first published study on a scientific question may find a most extravagant effect size; this is followed by the publication of another study that shows a large contradicting effect. Subsequent studies report effect sizes between these extremes"
See also
http://www.jclinepi.com/article/S0895-4356%2805%2…
Andrew,
Can I be the stupid one and ask what, exactly, you do mean here?
Adam:
I'm talking about Type M errors. See here.
I really like that paper on sex ratios. It's good to be reminded to always check previous research, and see what effect sizes and levels of variation/precision can be expected, before you try to make sense of a new claimed result.
Here's a paraphrase from the paper that may or may not be pithy enough for the lexicon:
Large estimates often do not mean "Wow, I’ve found something big!" but, rather, "Wow, this study is underpowered!"
But it's subtly different from the Winner's Curse in economics, right?
http://econ.ucdenver.edu/beckman/Econ%204001/thal…
There, you're cursing the fact that you spent too many resources to win a competition against other people in order to win a prize that's worth less than you had thought.
Here, you're cursing that you spent too few resources (i.e. had too small a sample) to know whether you can trust your "significant" effect. (Except that in many cases, you're cheering that you can slip this problem past reviewers, and then rigorous-minded people like Andrew are left cursing instead of you.)
"Statistically significant estimates are usually overestimates."
Since, on the whole, the residuals sum to zero, statistically insignificant estimates are usually underestimates.
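Both directions of this claim (significant estimates overestimate, non-significant ones underestimate) show up in a quick simulation. A minimal sketch, where the true effect and standard error are illustrative assumptions:

```python
import random

random.seed(2)

TRUE_EFFECT = 0.2   # assumed true effect size (illustrative)
SE = 0.15           # assumed standard error (illustrative)
N_STUDIES = 200_000

# Split the simulated estimates at the p < 0.05 threshold.
sig, nonsig = [], []
for _ in range(N_STUDIES):
    b = random.gauss(TRUE_EFFECT, SE)
    (sig if abs(b / SE) > 1.96 else nonsig).append(b)

mean_sig = sum(sig) / len(sig)
mean_nonsig = sum(nonsig) / len(nonsig)
print(f"mean significant estimate:     {mean_sig:.3f}")    # lands above the true 0.2
print(f"mean non-significant estimate: {mean_nonsig:.3f}")  # lands below the true 0.2
```

The split is exactly the two cancelling biases: the significant pile averages above the truth, the non-significant pile below it.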
Winner's mirage?
"Effects seen in the significance lens are smaller than they appear"
anon: Yes! But what percentage of people get this on their own, versus after it has been pointed out once, versus multiple, multiple times?
Also the reason, as Andrew once put it, you need to keep one eye on the power: the less power in the studies, the larger those two cancelling _biases_ become.
(Even dressing it up in a topic of "beauty and sex" may not be enough)
K?
The scandal of (low) power
"Effects seen in the significance lens are (much) smaller than they appear"
"Effects seen in the non-significance lens are (much) larger than they appear"
K?
In my mind, one of the issues is what is publishable (and how that shapes the literature). In epidemiology, an unexpected large association is immediately publishable, whereas many null studies are extremely hard to publish. If power is low (due to a rare outcome and the difficulties in getting data), then the published associations are almost certainly dramatic over-estimates.
It's not an easy problem as (for example) ignoring a safety signal sometimes leads to very unfortunate outcomes.
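The link between low power and over-estimation can be made concrete by holding the true effect fixed and varying the noise level, then keeping only the "publishable" (significant) results. All numbers below are illustrative assumptions:

```python
import random

random.seed(3)

TRUE_EFFECT = 0.2   # assumed true effect size (illustrative)
N_STUDIES = 200_000

results = []
for se in (0.05, 0.15, 0.40):        # smaller SE = better-powered study
    sig = []
    for _ in range(N_STUDIES):
        b = random.gauss(TRUE_EFFECT, se)
        if abs(b / se) > 1.96:       # the publication filter: significant only
            sig.append(abs(b))
    power = len(sig) / N_STUDIES
    exaggeration = (sum(sig) / len(sig)) / TRUE_EFFECT
    results.append(exaggeration)
    print(f"SE={se:.2f}  power~{power:.2f}  exaggeration~{exaggeration:.1f}x")
```

In the well-powered case the published estimates are essentially unbiased; as power drops, the significance filter inflates them to several times the truth.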
I am curious about the fact that the sentence starts with significance testing and ends with effect estimation. Does that mean we can avoid the problem by avoiding testing (but still screening a lot of effects by effect estimation)?
What about the "filter fallacy"? People pass studies through a filter that excludes small effects (fixed p, fixed n) and are then surprised that they've overestimated the effects …
… or the "too big to be true" effect.
There is also the "file drawer problem," in which excluding non-significant effects biases published study effects upwards.