Pretty graph, could be made even prettier

Here’s a pretty graph (from Steven Levitt, who says “found on the web” but I don’t know the original source):

[Figure: Birthsbymonth.jpg]

This is a good one for your stat classes. My only suggestions:

1. Get rid of the dual-colored points. What’s that all about? One color per line, please! As Tufte might say, this is a pretty graph on its own; it doesn’t need to get all dolled up. Better to reveal its natural beauty through simple, tasteful attire.

2. Normalize each month’s data by the number of days in the month. Correcting for the “30 days hath September” effect will give a smoother and more meaningful graph.

3. Something wacky happened with the y-axis: the “6” is too close to the “7”. Actually, I think it would be fine to just label the axis at 6, 8, 10,… Not that it was necessarily worth the effort in this particular case; I’m just using this graph to illustrate general principles. Ideally, the graphing software would make smart choices here.

4. (This takes a bit more work, but…) consider putting +/- 1 s.e. bounds on the hockey-player data. Hmm, I can do it right now: 761/12 ≈ 63 per month, so we’re talking about relative errors of approximately 1/sqrt(63) ≈ 1/8. Each month’s share is about 1/12 ≈ 8%, so the estimates are on the order of 8% +/- 1%.
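Suggestions 2 and 4 can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: 761 is taken to be the total count behind the hockey-player line (the post divides it by 12), monthly counts are treated as roughly Poisson, the year is a non-leap year, and the function names `normalize_by_days` and `monthly_share_se` are made up for this example.

```python
import math

# Days per month, Jan..Dec. Assumption: a non-leap year.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def normalize_by_days(monthly_counts):
    """Convert monthly counts to per-day rates, rescaled so the shares sum to 1."""
    per_day = [count / days for count, days in zip(monthly_counts, DAYS_IN_MONTH)]
    total = sum(per_day)
    return [rate / total for rate in per_day]


def monthly_share_se(n_total, n_months=12):
    """Back-of-envelope share and standard error for a single month.

    Assumption: the count in each month is roughly Poisson, so its
    relative error is about 1/sqrt(expected count).
    """
    expected = n_total / n_months          # about 63 when n_total is 761
    rel_err = 1 / math.sqrt(expected)      # about 1/8
    share = 1 / n_months                   # about 8%
    return share, share * rel_err


# Counts exactly proportional to month length come out flat after normalizing.
flat = normalize_by_days(DAYS_IN_MONTH)

# The arithmetic from suggestion 4.
share, se = monthly_share_se(761)
print(f"share = {share:.1%}, s.e. = {se:.1%}")
```

With real monthly counts in place of `DAYS_IN_MONTH`, the first function removes the “30 days hath September” sawtooth, and `share ± se` gives the rough width of the error bars for the hockey-player line.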

P.S. See Junk Charts for more.

5 thoughts on “Pretty graph, could be made even prettier”

  1. You didn't check the comments in the post. I discovered that Levitt got this graph from an astrology site! I also hypothesize that the reason that Levitt failed to provide a link — even though he almost always does in his other posts — is that he was embarrassed to admit the source for the graph.

    Despite the source, I think that the data is probably correct. See the comments for a full discussion. Alas, it also seems clear that the New York Times article that started the conversation is almost total hokum.

    Beware the riches of fame.

  2. Great post as usual. Especially #4. I haven't seen the "overwhelming" evidence that Freakonomics alleged to exist, but judging from the sample size and the sampling variation, it is unclear whether we should immediately jump to the stage of explaining the trend. The current discussion smacks of using our creativity to explain potentially spurious patterns.

    Another distortion introduced by this chart is the rounding used for the NHL line but not for the other two lines.

    I expanded upon your #4 on my blog.

  3. My first thought was that one color in each series is used for 31-day months, another for shorter months, but this is not true, since July and August would then have the same color.

    Note the alternation in the U.S. and Canada data: Jan high, Feb low, Mar high, April low, May high…, which is what we would expect of course.

    Apologies if I'm just stating the obvious.

  4. An astrology site, huh? Astrology is an example of the rule that the lower your signal-to-noise ratio, the stronger your statistics need to be. Astrologers and ESP researchers are statistical experts by necessity.
