Finding signal from noise

A reporter contacted me to ask my impression of this article by Peter Bancel and Roger Nelson, which reports evidence that “the coherent attention or emotional response of large populations” can affect the output of quantum-mechanical random number generators.

I spent a few minutes looking at the article, and, well, it’s about what you might expect. Very professionally done, close to zero connection between their data and whatever they actually think they’re studying.

(Just for example, they mention that their random number generators are electromagnetically shielded–which seems kinda funny given that they’re trying to detect effects on these generators, and I assume some of these effects might come electromagnetically. They also alternate between assuring the reader that (a) The random number generators really are generating random numbers, (b) They cleaned the data to fix all the cases where the random number generators aren’t generating random numbers, (c) The random numbers can be affected by people all over the place, so they’re not really random.)

OK, OK, fine. The substance of the article isn’t particularly interesting to me: I have little interest in what was happening with these people’s random number generators, and I have little doubt that the researchers could have found similar patterns had they looked at the data in other, more obviously meaningless ways. (For example, instead of taking 236 days that were believed to be particularly important (New Year’s Days, dates of earthquakes, plane crashes, other newsworthy events), I suspect they could’ve taken just about any selection of days and found something interesting.) Anyway, that’s not the issue.
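To make the "just about any selection of days" point concrete, here is a minimal simulation sketch. It is mine, not anything from the Bancel and Nelson paper. It assumes a perfectly fair bit generator, 1000 days of data at 200 bits per day, and a Stouffer-style combined z-score over the selected days; the function names and all the numbers other than 236 are made up for illustration. The idea is to compare how often pure noise looks "significant" when you commit to one selection of days in advance versus when you get to try 20 arbitrary selections and keep the most impressive one.

```python
import math
import random


def day_z_scores(n_days, bits_per_day, rng):
    """Per-day z-score of the count of ones from a fair (null) bit generator."""
    mean = bits_per_day / 2
    sd = math.sqrt(bits_per_day) / 2
    zs = []
    for _ in range(n_days):
        ones = bin(rng.getrandbits(bits_per_day)).count("1")
        zs.append((ones - mean) / sd)
    return zs


def stouffer_z(zs):
    """Combine per-day z-scores into one z-score (sum divided by sqrt(n))."""
    return sum(zs) / math.sqrt(len(zs))


def best_of_k_selections(zs, n_selected, k, rng):
    """Largest |combined z| over k arbitrary selections of n_selected days."""
    return max(abs(stouffer_z(rng.sample(zs, n_selected))) for _ in range(k))


def false_positive_rate(n_trials, k, rng, n_days=1000, n_selected=236,
                        bits_per_day=200, threshold=1.96):
    """Share of pure-noise datasets where the best of k selections beats the threshold."""
    hits = 0
    for _ in range(n_trials):
        zs = day_z_scores(n_days, bits_per_day, rng)
        if best_of_k_selections(zs, n_selected, k, rng) > threshold:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    rng = random.Random(0)
    print("one pre-chosen selection:", false_positive_rate(200, 1, rng))
    print("best of 20 selections:   ", false_positive_rate(200, 20, rng))
```

With a single selection fixed in advance, the rate should sit near the nominal 5%; with the freedom to pick the best of 20 selections, it should come out far higher, even though nothing but noise went in. This only varies the choice of days. Freedom in the choice of test statistic, time window, or data cleaning would add further forking paths on top.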

My real point here is that the article reads like a physics paper–and, indeed, when I looked up the first author, I found that he is a physicist. Physicists can look pretty silly doing data analysis on non-physics problems. But, now I’m wondering: is data analysis in experimental physics this bad? Or am I just succumbing to my own selection bias, judging the academic field of physics by its most publicity-worthy rather than its best practitioners? I’d hate to think that the occasionally headline-grabbing research out of CERN, etc., is really just blind data manipulation!

P.S. Yeah, sure, make all the jokes you want about how we do things in quantitative social science. Still, it’s not this bad, is it? At least we try to have some connection between our measurements and the phenomena we’re studying. Here, these dudes are torturing the data to within an inch of its life (as the saying goes). But such behavior might be second nature to physicists, who routinely have to process and process and process the noise out of their experimental data.

P.P.S. Yes, I know it’s in poor taste to make fun of people who (unlike others whom I make fun of here) are neither trying to do any harm nor succeeding in doing any (beyond, maybe, wasting the money of some of their funders). Somebody asked me to read the article and I felt like procrastinating, but maybe that’s not really enough of a justification. I apologize and promise never to do it again.

15 thoughts on “Finding signal from noise”

  1. OK… We know you don't like adjustments for multiple comparisons. And the fact that you could write a critique of this paper without mentioning it shows just how little the term "multiple comparisons" penetrates your "global consciousness."

  2. I definitely believe that multiple comparisons ideas are important; I just don't think multiple comparisons problems arise when things are done right. See the examples here for illustrations of both these points.

    The multiple comparisons issues in the Bancel and Nelson article are so huge and obvious that I thought there was no point mentioning them.

  3. I think the statistics in the average physics experiment are closer to a simple textbook model than the statistics in social science are. So physicists don't need as much statistical literacy as social scientists do.

  4. Physics is a large field which has statistics at its very heart – you must have heard of Boltzmann, Gibbs? Laws of Thermodynamics? The models used in High Energy Physics are very interesting, multiple models using Monte Carlo simulations all over the place. These are so good they can pick out hundreds of events out of trillions as background.

    That is not to say there isn't controversy. There are literally shouting matches at meetings about what kind of models to use. Tommaso Dorigo often has statistics-related posts at his blog "A Quantum Diaries Survivor" on issues at Fermilab or CERN. The thing is, here you have extremely good models to draw on, which match effects amazingly well and can be used for engineering new tests, which the models generally match. The order they are looking at now is, as I said above, extremely small numbers of events out of a huge data set.

    The article you are talking about isn't physics and just because someone has a physics background doesn't make it so. There are some relatively famous guys who I won't dignify with names that used to work at physics labs that produced even worse junk.

  5. The statistics used for large-scale particle physics such as LHC are highly sophisticated, and the physicists I have encountered have proved to have a good grasp of the fundamental issues of inference and the philosophical issues underlying frequentist/Bayesian differences. I would recommend that you take a look at the Banff challenge (http://newton.hep.upenn.edu/~heinrich/birs/), which was designed with the Higgs detection problem in mind, and the SAMSI working group on statistics in particle physics from 2006 (http://sisla06.samsi.info/astro/phy/).

  6. I think that physics uses a lot of sophisticated stochastic theory, but this isn't really the same thing as sophisticated statistical modeling. In my experience with physics-trained folks working in biology, they can be awfully naive about data issues in complex systems.

  7. I think this should become a genre of statistical analysis: what does it look like when we torture innocent data within an inch of its life? What stories will it tell to make us go away and leave it alone?

  8. I'm not a physicist and as an outsider, I pretty much have the same impression Yolio does. Physicists use a lot of probability (like in statistical mechanics) and are interested in a lot of the same computational tools as statisticians (like integration problems). But apart from exceptions in particular areas where statistics is valuable, like particle physics (low signal to noise ratio) and astronomy (lots of sample selection problems), they don't really do statistical data analysis as such.

  9. I don't think statistical data analysis in physics is naive but there are really two cases here.

    Outside of high energy experiment you basically do as little statistical data analysis as you can. More or less plot the data points against a theoretical model and leave it up to the eye to decide. A least squares fit, if you're trying to fit some parameter when you're confident of the functional form, is the fanciest thing I see. I think physicists would get suspicious if you did something statistically fancy. Basically because it's not necessary.

    In high energy experiment, as Alex pointed out, you are dealing with fantastically complicated statistics and people spend a long time trying to figure out what a statistically significant signal looks like. A lot of HEX is statistics. Simulated data sets are a foundational tool over there and I believe you're a fan of that, so that's something. Also funky things like neural nets.

    Punchline: Physicists are not naive about data issues.

  10. Predicting the likelihood of meaningful collaborations with others on stats?

    From personal experience in clinical research I have noticed something seemingly distinctive about the basic science guys like biophysicists.

    And I can put it into one of my favorite quotes:

    They know what math they know and they know they don't know some of the math needed for statistics – but they don't know that they don't know that there is more to statistics than math.

    Not that that's "their" fault, and for instance Frank Harrell has recently put together some stat materials for basic clinical scientists at Vanderbilt.

    Keith
