The data (and the questions you ask) are more important than the analysis

Andrew Sullivan links to this amusing study [link fixed]. The whole blog is lots of fun–I’ve linked to it before–and it illustrates an important point in statistics, which I’ve given as the title of this blog entry.

P.S. I’m not trying to say that statistical methodology is a waste of time. Good methods–and I include good graphical methods in this category–allow us to make use of more data. If all you can do is pie charts and chi-squared tests (for example), you won’t be able to do much.

9 thoughts on “The data (and the questions you ask) are more important than the analysis”

  1. Do you think someone could have produced that study, though, without a very, very solid grasp of statistics? Sure, the techniques actually used are very low tech, but it's all done so flawlessly, and with so much sophistication and attention to detail, that I highly suspect that a serious geek was at work here. It's the interaction terms (cleavage/abs and age), the exclusion of outliers, and, not least, the very smart and clear graphs that I'd suspect just don't come to someone who just has the data and the question, but not the methods training.
    (Can anyone spot the statistics package? It's neither R nor Stata. SPSS?)

  2. I think I agree that the analysis was done by someone with a firm grasp of statistics; as others have said, they ask good questions and are willing to take "the data don't specify" for an answer (my favourite graph in the article is the "glurg" scatterplot). There's something to be said for intelligently using the simplest tools.

  3. Those "researchers" did a terrific job at identifying questions that their intended audience would find interesting and that can be answered by looking only at data that didn't involve looking into anyone's mail. They have enough data that they could have (and, I think, should have) done a lot more with more than one variable at a time. Perhaps they were worried that people would be confused. But basically this is an excellent illustration of the title of Andrew's post. Ask the right questions and a simple summary plot may be all you need.

  4. …we finalized our data pool at 7,140 users. Aside from running each picture through a variety of analysis scripts, we tagged, by hand, each picture for various contextual indicators. We double-checked the tags before generating our data.

    You have to give them credit for hand tagging over 7000 photos and then double-checking them.

    If it was anywhere but America I would suspect it was the "wink" that told you they were pulling your leg over the whole thing.

  5. Interesting that nobody's pointed out that the questions they're asking are about causality and the analysis is purely correlational.
