“An ounce of replication…”

I was looking through this old blog entry and found an exchange I like enough to repost. Raymond Hubbard and R. Murray Lindsay wrote,

An ounce of replication is worth a ton of inferential statistics.

I questioned this, writing:

More data are fine, but sometimes it’s worth putting in a little effort to analyze what you have. Or, to put it more constructively, the best inferential tools are those that allow you to analyze more data that have already been collected.

Seth questioned my questioning, writing:

I’d like to hear more about why you don’t think an ounce of replication is worth a ton of inferential statistics. That has been my experience. The value of inferential statistics is that they predict what will happen. Plainly another way to figure out what will happen is to do it again.

To which I replied:

I’m not sure how to put replication and inferential statistics on the same scale . . . but a ton is 32,000 times an ounce. To put it in dollar terms, for example, I think that in many contexts, $32,000 of data analysis will tell me more than $1 worth of additional data. Often the additional data are already out there but haven’t been analyzed.

I think it’s fun to take this sort of quotation literally and see where it leads. It’s a rhetorical strategy that I think works well for me, as a statistician.

5 thoughts on “An ounce of replication…”

  1. Putting everything on a financial scale settles the argument in many cases: $1 of replication is worth $1 of inference because they're both worth a dollar! This isn't just a flippant answer. It points out that these debates are ultimately about economic decisions. Statisticians often go to great lengths to avoid discussing economics.

  2. @1: Perhaps, also, I should cook breakfast on my copy of Bayesian Data Analysis – it certainly costs as much as a decent skillet and thus should work just as well.

    Would it be fair to say that analyzing the data you have can give you a strategy for better collecting new data (guide you to the right ounce)? Of course the term "inferential statistics" has a lot of baggage attached which I don't understand, but certainly one can model the data collection system and gain a lot.

    In some cases e.g. genomics there is only so much data: one evolutionary tree, one human genome, however-many polymorphisms in a coding region, &c. Although people say genomics is the ultimate in data-richness, in many ways the sample size is one. We are reaching quite hard limits and some of our difficulty comes from having to redigest data – to repeat the study is in the realm of Dyson-level science-fiction.

  3. Thanks for reprinting this, it's interesting. I think in lots of real situations with lots of data, the investigation is badly compromised by the fact that the data have been used to choose the hypothesis to be tested. And the same data are then used to "test" the hypothesis. And then the authors are impressed that the hypothesis is true at a modest level of significance. Taking advantage of random variation, in other words. More analysis won't help them. To allow for the fact that they have used the data to choose the hypothesis, they need replication.

    I also think there's a big difference here between coming up with new ideas (for which additional stats is often very helpful for finding new patterns in your data and replication isn't helpful at all) and testing the ideas you already have (where new data will help but additional stats probably won't, because of the capitalization-on-chance problem).

  4. Seth,

    Psychologists like to do experiments, political scientists like to analyze existing data. I'm always impressed that when a psychologist wants to learn something, he or she will typically run an experiment. Whenever I've tried to run an experiment, it's been a mess. Which makes sense: I've been analyzing data continuously for the past 25 years but I have very little experience with data collection (and that is mostly helping others with their data collection projects).

  5. Why not say that $32,000,000 of data analysis will NOT tell me more than $10,000 spent on another study or "independent" data source?

    Or, as Mosteller & Tukey put it in their chapter on Hunting for the Real Uncertainty: you only learn how studies differ (the degree of replication) when you have more than one study.

    cheers
    Keith
