Statistical challenges in estimating small effects

John Carlin had some comments on my paper with Weakliem:

My immediate reaction is that we won’t get people away from these mistakes as long as we talk in terms of “statistical significance” and even power, since these concepts are just too subtle for most people to understand, and they distract from the real issues. Somewhat influenced by others, I spend quite a bit of time eradicating the term “statistical significance” from colleagues’ papers. I suspect that as long as the world sees statistical analysis as dividing “findings” into positives and negatives then the nonsense will keep flowing, so an important step in dealing with this is to change the terminology. In your example you seem to be arguing too much on his ground by focussing on the fact that although he data-dredged a significant p-value, your p-value is not significant. (So the ignorant editor or reader may see it as technical squabbling between statisticians rather than being forced to deal with the real issues about precision of estimation or lack of information.)

I agree entirely that the problem is with the framework of effects as true/false, but this is the very framework that “statistical significance” is built around and your article makes that concept very central by continually referring to “what if the effect is not statistically significant?” etc. I think the focus should be on how dangerous it is to overinterpret small studies with vast imprecision, and I’m not sure why this can’t be clarified by sticking to the precision (or information) concept. I still haven’t looked again at your Type S and Type M but on the face of it wonder if they may just confuse by adding more layers. Statistical significance gets it wrong because it focuses on null hypotheses (usually artificial), but when you say Type S it almost sounds similar in that you are thinking of truth/falsity with respect to the sign, rather than uncertainty about effects…?

My big point in considering Type S errors is to move beyond the idea of hypotheses being true or false (that is, to move beyond the idea of comparisons being exactly zero), but John has a point: I still have to decide how to think about statistical significance. The problem is that, from the Bayesian perspective, you can simply ignore statistical significance entirely and just make posterior statements like Pr(theta_1 > theta_2 | data) = 0.8 or whatever, but such statements seem silly given that you can easily get impressive-seeming probabilities like 80% by chance.
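
To see how easily an "impressive" 80% can arise from pure noise, here is a minimal simulation sketch (Python, with all numbers invented for illustration). Under a flat prior, the posterior for an effect estimated as y with standard error se is normal with mean y and sd se, so Pr(theta > 0 | data) = Phi(y/se); if the true effect is exactly zero, that posterior probability is uniformly distributed and exceeds 0.8 one time in five:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sims = 100_000
se = 1.0     # standard error of each study's estimate
theta = 0.0  # true effect: exactly zero, so the data are pure noise

# Simulated point estimates from many null studies.
y = rng.normal(theta, se, size=n_sims)

# Flat prior => posterior is N(y, se^2), so Pr(theta > 0 | data) = Phi(y / se).
post_prob = norm.cdf(y / se)

# Fraction of null studies yielding a sign probability above 80%.
print((post_prob > 0.8).mean())  # approximately 0.20
```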

2 thoughts on “Statistical challenges in estimating small effects”

  1. A few comments.

    1. If we just forgot about small effects in social science for a while, this would save both trees and media credibility.

    2. I wonder what they do about this in the medical literature, where small effects are more common (e.g., the probability of a side effect, such as dying from a medication that's supposed to cure you)? In some cases you might have a pretty good prior (actuarial data), but probably not in all; see the sketch below for how such a prior would rein in a noisy estimate.
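
    A minimal sketch of that prior idea, with hypothetical numbers (a rate difference of 0.04 with standard error 0.02 from a small trial, and actuarial background suggesting effects near zero with sd 0.01); conjugate normal updating gives the precision-weighted compromise:

    ```python
    # Conjugate normal-normal updating with hypothetical numbers.
    y_hat, se = 0.04, 0.02            # small-trial estimate and its standard error
    prior_mean, prior_sd = 0.0, 0.01  # prior from (hypothetical) actuarial data

    # Posterior precision is the sum of prior and data precisions;
    # the posterior mean is the precision-weighted average of the two sources.
    post_prec = 1 / prior_sd**2 + 1 / se**2
    post_mean = (prior_mean / prior_sd**2 + y_hat / se**2) / post_prec
    post_sd = post_prec ** -0.5

    print(f"posterior: {post_mean:.4f} +/- {post_sd:.4f}")
    # posterior: 0.0080 +/- 0.0089: the estimate shrinks most of the way
    # toward zero, reflecting how little information the small study carries.
    ```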

  2. In education research, just about nothing seems to work, which makes a certain amount of sense since presumably the low-hanging fruit have already been plucked and added to the sauce. But they keep trying since it certainly _seems_ like there's room for improvement. Perhaps there's a problem with the whole research paradigm of incremental improvements. Much of this paradigm derives from agricultural experimentation, I believe (consider the hugely influential statisticians Fisher and Snedecor, both of whom worked in agriculture), and maybe it made sense 80 years ago but not so much now.

    On the other hand, rigorous statistics sometimes seems like the only thing standing between us and uncritical and random acceptance of whatever hot new medical procedure or drug somebody thinks up.
