If an estimate is statistically significant, it’s probably an overestimate of the magnitude of your effect.
P.S. I think you all know what I mean here. But could someone rephrase it in a more pithy manner? I'd like to include it in our statistical lexicon.
Andrew: Are you concerned about the damage that will be done by those who don't get this?
(hard to imagine, isn't it?)
http://jama.ama-assn.org/cgi/content/extract/304/…
Some upcoming debate about this at the next Cochrane Collaboration, but I could not find the link… (Saturday night and I am behind on making dinner)
K?
The curse of multiple comparisons
The tallest pygmy problem
The problem of infinite monkey testing
Rain dance testing
Cargo cult testing
I can't quite get this right, but something like: "If p is too small for our effect not to be true, our b is probably too big."
I guess you're looking for something like "If it sounds too good to be true, then it probably is" or "Anything that can go wrong will go wrong." Well, you could always just use those as models. How about:
Statistically significant estimates are usually overestimates.
As anyone who has ever had a heated political discussion knows, bias and confidence go hand in hand.
Maybe building on science pundit:
Significant estimates overestimate.
I always liked: "If you can't get a t-statistic of 3, you either aren't trying or you're wrong."
A pithier version of Laurence's, above, might be:
If p is small enough, b is probably too big.
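The "if p is small enough, b is probably too big" version can be checked with a quick simulation. Here is a minimal sketch; the true effect size and standard error below are illustrative assumptions, not numbers from any study mentioned here:

```python
import random

random.seed(1)

TRUE_EFFECT = 0.2   # assumed true effect size (illustrative)
SE = 0.15           # assumed standard error, i.e. a noisy, low-powered study
N_STUDIES = 200_000

# Keep only the estimates that clear the usual p < 0.05 bar.
significant = []
for _ in range(N_STUDIES):
    b = random.gauss(TRUE_EFFECT, SE)   # one study's point estimate
    if abs(b / SE) > 1.96:              # "statistically significant"
        significant.append(abs(b))

mean_sig = sum(significant) / len(significant)
print(f"true effect:                 {TRUE_EFFECT}")
print(f"mean significant |estimate|: {mean_sig:.3f}")
```

Under these assumptions, the estimates that survive the significance filter average roughly twice the true effect: small p, too-big b.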
Casella's original term _recognizable set_ was to the point – was it not?
K?
This seems related to the Proteus Phenomenon
http://clinicaltrials.ploshubs.org/article/info:d…
"In the Proteus phenomenon, the first published study on a scientific question may find a most extravagant effect size; this is followed by the publication of another study that shows a large contradicting effect. Subsequent studies report effect sizes between these extremes"
See also
http://www.jclinepi.com/article/S0895-4356%2805%2…
Andrew,
Can I be the stupid one and ask what, exactly, you do mean here?
Adam:
I'm talking about Type M errors. See here.
I really like that paper on sex ratios. It's good to be reminded to always check previous research, and see what effect sizes and levels of variation/precision can be expected, before you try to make sense of a new claimed result.
Here's a paraphrase from the paper that may or may not be pithy enough for the lexicon:
Large estimates often do not mean "Wow, I’ve found something big!" but, rather, "Wow, this study is underpowered!"
But it's subtly different from the Winner's Curse in economics, right?
http://econ.ucdenver.edu/beckman/Econ%204001/thal…
There, you're cursing the fact that you spent too many resources to win a competition against other people in order to win a prize that's worth less than you had thought.
Here, you're cursing that you spent too few resources (i.e. had too small a sample) to know whether you can trust your "significant" effect. (Except that in many cases, you're cheering that you can slip this problem past reviewers, and then rigorous-minded people like Andrew are left cursing instead of you.)
"Statistically significant estimates are usually overestimates."
Since, on the whole, the residuals sum to zero, statistically insignificant estimates are usually underestimates.
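Both directions of this claim (significant estimates overestimate, non-significant ones underestimate) show up in a quick simulation. A minimal sketch, where the true effect and standard error are illustrative assumptions:

```python
import random

random.seed(2)

TRUE_EFFECT = 0.2   # assumed true effect size (illustrative)
SE = 0.15           # assumed standard error (illustrative)
N_STUDIES = 200_000

# Split the simulated estimates at the p < 0.05 threshold.
sig, nonsig = [], []
for _ in range(N_STUDIES):
    b = random.gauss(TRUE_EFFECT, SE)
    (sig if abs(b / SE) > 1.96 else nonsig).append(b)

mean_sig = sum(sig) / len(sig)
mean_nonsig = sum(nonsig) / len(nonsig)
print(f"mean significant estimate:     {mean_sig:.3f}")    # lands above the true 0.2
print(f"mean non-significant estimate: {mean_nonsig:.3f}")  # lands below the true 0.2
```

The split is exactly the two cancelling biases: the significant pile averages above the truth, the non-significant pile below it.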
Winner's mirage?
"Effects seen in the significance lens are smaller than they appear"
anon: Yes! But what percentage of people get this on their own, versus after it has been pointed out once, versus multiple, multiple times?
Also the reason, as Andrew once put it, you need to keep one eye on the power: the less power in the studies, the larger those two cancelling _biases_ become.
(Even dressing it up in a topic of "beauty and sex" may not be enough)
K?
The scandal of (low) power
"Effects seen in the significance lens are (much) smaller than they appear"
"Effects seen in the non-significance lens are (much) larger than they appear"
K?
In my mind, one of the issues is what is publishable (and how that shapes the literature). In epidemiology, an unexpected large association is immediately publishable, whereas many null studies are extremely hard to publish. If power is low (due to a rare outcome and the difficulties in getting data), then the published associations are almost certainly dramatic over-estimates.
It's not an easy problem as (for example) ignoring a safety signal sometimes leads to very unfortunate outcomes.
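The link between low power and over-estimation can be made concrete by holding the true effect fixed and varying the noise level, then keeping only the "publishable" (significant) results. All numbers below are illustrative assumptions:

```python
import random

random.seed(3)

TRUE_EFFECT = 0.2   # assumed true effect size (illustrative)
N_STUDIES = 200_000

results = []
for se in (0.05, 0.15, 0.40):        # smaller SE = better-powered study
    sig = []
    for _ in range(N_STUDIES):
        b = random.gauss(TRUE_EFFECT, se)
        if abs(b / se) > 1.96:       # the publication filter: significant only
            sig.append(abs(b))
    power = len(sig) / N_STUDIES
    exaggeration = (sum(sig) / len(sig)) / TRUE_EFFECT
    results.append(exaggeration)
    print(f"SE={se:.2f}  power~{power:.2f}  exaggeration~{exaggeration:.1f}x")
```

In the well-powered case the published estimates are essentially unbiased; as power drops, the significance filter inflates them to several times the truth.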
I am curious about the fact that the sentence starts with significance testing and ends with effect estimation. Does that mean we can avoid the problem by avoiding testing (but still screening a lot of effects by effect estimation)?
What about the "filter fallacy"? People pass studies through a filter that excludes small effects (fixed p, fixed n) and are then surprised that they've overestimated the effects …
… or the "too big to be true" effect.
There is also the "file drawer problem," in which excluding non-significant effects biases published study effects upwards.