More on “The difference between ‘significant’ and ‘not significant’ is not itself statistically significant”

Following up on the remark here, Ben Jann writes,

This just sprang to my mind: Do you remember the 2005 paper on oxytocin and trust by Kosfeld et al. in Nature? It has been in the news. I think they made the same mistake. The study contains a “Trust experiment” and a “Risk experiment”. Because the oxytocin effect was significant in the Trust experiment, but not in the Risk experiment, Kosfeld et al. see their hypothesis confirmed that oxytocin increases trust, but not the readiness to bear risks in general. However, this is not a valid conclusion, since they did not test the difference in effects. Such a test would, most likely, not turn out to be significant (at least if performed at the aggregate level, as with the other tests in the paper; the test might be significant if it used the individual-level experimental data). (Furthermore, note that there is an error in Figure 2a: there should be an additional hollow bar of relative frequency 0.10 at transfer 10.)
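To see the arithmetic behind the point, here is a minimal sketch with made-up numbers (not the Kosfeld et al. data): one effect can come out “significant” and the other “not significant” while the difference between the two effects is nowhere near significant.

```python
# Illustrative numbers only -- hypothetical effect estimates and standard errors,
# not the actual Kosfeld et al. results.
from scipy import stats

est_trust, se_trust = 0.25, 0.10   # hypothetical effect in the Trust experiment
est_risk,  se_risk  = 0.10, 0.10   # hypothetical effect in the Risk experiment

def z_and_p(est, se):
    """z-statistic and two-sided normal p-value."""
    z = est / se
    return z, 2 * stats.norm.sf(abs(z))

print("Trust:      z = %.2f, p = %.3f" % z_and_p(est_trust, se_trust))  # p ~ 0.012, "significant"
print("Risk:       z = %.2f, p = %.3f" % z_and_p(est_risk, se_risk))    # p ~ 0.317, "not significant"

# The comparison that actually matters: a test of the difference in effects
# (treating the two estimates as independent).
diff = est_trust - est_risk
se_diff = (se_trust**2 + se_risk**2) ** 0.5
print("Difference: z = %.2f, p = %.3f" % z_and_p(diff, se_diff))        # p ~ 0.29, not significant
```

So under these assumed numbers, concluding “oxytocin affects trust but not risk-taking” from the two separate tests would be exactly the error in question: the data provide no clear evidence that the two effects differ.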