This is what is done

Posted on March 9, 2010 2:58 PM by Andrew

This is from a commercial software package:

[image removed to avoid embarrassing anybody]

This is page 1 of a 66-page document. This was essentially impossible to follow on the screen, so I printed it out in 6-pages-per-sheet format, at which size the tiny text was difficult but barely possible to read.

Now here’s a fun assignment. How many flaws can you find in this display? Here’s what I noticed (in no particular order):

– Sizing of graph is inconsistent with sizing of text: If the text is readable, the graph is too huge to process; if the graph is sized nicely (so that many pages can be fit on a single sheet), the text is tiny.

– The cyclical display of a piechart is inappropriate for these ordered variables.

– The ordering isn’t even done right! See the legend: “6-10 hours” should be between “less than 1 hour” and “over 10 hours.” The categories appear to have been sorted either alphabetically or in order of increasing frequency, neither of which makes sense here.

– At least one category (1-5 hours) is missing. Even if there are no responses to this one, it should be included.

– The labels 2, 3, 4 on the pie wedges have no connection to anything else on the page.

– The wedges should be labeled directly; no need for the reader to have to go back and forth between the legend and the picture. This will also allow you to get rid of the distracting colors. As Ed Tufte would say, if you want pretty colors, include a pretty picture with your report–but don’t clutter your graphs with that stuff!

– If you are going to label the wedges, put the labels right there; the steppy lines connecting the labels to the pie are unnecessary; they just add to the complexity of the presentation. (Perhaps this is a desired effect, to make the results look more professional, in some sense?)

– 100% is written, inappropriately, as “100.00%”.

– The text of the question (very top of the page) is completely separated from the responses (at the bottom).

– In fact, it’s easy to miss the question entirely. The main label of the graph is some sort of gobbledygook. This sort of identifying information would be better kept in small print in the lower-right corner of the display. (If you’re doing this, I’d also recommend a date and time identifier. Datasets get modified all the time.)

– With the distorted shape of the pie chart and the edge added to the lower border, the visible area of each wedge no longer corresponds to the numbers being displayed.

– The percentages are inappropriately displayed as 22.2%, 33.3%, 44.4%; they should be 22%, 33%, 44%. It’s just about never meaningful to look at fractions of percentages: we are almost never studying proportions that can be measured to that level of accuracy. Certainly not with a sample of size 9, but even if n=10,000 I would round all percentages to the nearest integer.
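
Several of the fixes above (put the categories in their natural order, keep the empty category, label directly, round to whole percentages) are mechanical enough to sketch in a few lines of Python. This is only an illustration, not what the package does: the display shows counts of 2, 3, and 4 out of 9, but which count goes with which category is an assumption made here, and the names `CATEGORIES`, `summarize`, and `render` are made up for the example.

```python
# Response categories in their natural order, including one nobody chose.
CATEGORIES = ["less than 1 hour", "1-5 hours", "6-10 hours", "over 10 hours"]

# Hypothetical assignment of the counts 2, 3, 4 to categories (n = 9);
# the original display does not let us confirm which count goes where.
counts = {"less than 1 hour": 3, "6-10 hours": 2, "over 10 hours": 4}


def summarize(counts, categories):
    """Return (label, count, whole-number percent) rows in category order."""
    total = sum(counts.values())
    rows = []
    for label in categories:
        n = counts.get(label, 0)  # empty categories stay in the table as 0
        pct = round(100 * n / total) if total else 0
        rows.append((label, n, pct))
    return rows


def render(rows):
    """Build a plain-text bar display with every bar labeled directly."""
    label_width = max(len(label) for label, _, _ in rows)
    bar_width = max(n for _, n, _ in rows)
    lines = []
    for label, n, pct in rows:
        bar = ("#" * n).ljust(bar_width)
        lines.append(f"{label.rjust(label_width)}  {bar}  {n}  ({pct}%)")
    return "\n".join(lines)


rows = summarize(counts, CATEGORIES)
print(render(rows))
```

With n=9 the rounded percentages add up to 99 rather than 100, which is fine; there is no need to print a total at all.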

Somebody paid money for this!

15 thoughts on “This is what is done”

j on March 9, 2010 10:21 AM at 10:21 am said:

Stunningly awful. Reminds me of Komar and Melamid's 'Most Wanted Paintings.' The survey section includes a series of images about people's views on art, like this one:
http://awp.diaart.org/km/fin/favcol2.html
Of course, theirs are meant to be funny.
Andrew Gelman on March 9, 2010 11:01 AM at 11:01 am said:

Hey–if you think the above is awful, imagine looking at 66 pages of this stuff!

P.S. I love that Komar and Melamid book! I think my favorite image there is the pie chart of people's favorite colors. As I recall, blue is the #1 choice.
Wayne on March 9, 2010 11:03 AM at 11:03 am said:

All very good observations. I didn't see half of them, and feel like I need to get out Tufte and reread 3 times as my penance.

The pie wedge labels appear to be the number of observations, so they indirectly relate to the total of 9 observations, but you basically have to figure this out and add them up to confirm that's what's going on.
JF on March 9, 2010 11:17 AM at 11:17 am said:

Also,

Is there a need to tell us 9 of 9 is 100%?

Is there a need to tell us that all responses were included in the graph? It seems to me that this could be assumed and that special mention made only if not all responses are used. Or at least shorten the two lines into one: "All 9 responses were used."

Why are "hours per day" and "you" surrounded by asterisks?

The graph does not accommodate color blindness. As it is, the user is being asked to match colors that they might not be able to distinguish. This is another reason why the parts should be labeled directly. (This applies to any color scheme, but I think the present color scheme is especially problematic: red/green and blue/green with similar brightness levels.)
zbicyclist on March 9, 2010 11:32 AM at 11:32 am said:

Using a 3-D pie chart for anything is the first mistake.
zbicyclist on March 9, 2010 11:39 AM at 11:39 am said:

Why 66 pages of graphs? Graphs create the illusion of work:

http://www.phdcomics.com/comics/archive.php?comic…
Oliver on March 9, 2010 12:22 PM at 12:22 pm said:

Great comic. I have literally used that approach multiple times when meeting with my advisor. I feel ashamed.
Andrew Gelman on March 9, 2010 1:38 PM at 1:38 pm said:

That's a scary thought–the idea that when meeting with me, students might be trying to manipulate me in some way. I almost always assume people are completely sincere, except in those few cases where it's obvious to me that they're not.
veblen on March 9, 2010 2:00 PM at 2:00 pm said:

I don't have a comment on this particular graph, but I was curious if you've heard of the free Web-based graphing software Tableau Public and, if you have, what you think of it.
Oliver on March 9, 2010 5:46 PM at 5:46 pm said:

I don't think it's usually insincere. I think it speaks to the experience of many students, where visible progress does not equal actual progress. It's easy to spend weeks in the lab or writing code with nothing to show for it. But you always need something to talk about at a meeting (by the definition of a meeting, not by some desire to manipulate). It's surprisingly easy to create a graph and have a productive meeting talking about it, even if it only represents 1% of what you're actually working on…
Willem on March 9, 2010 10:27 PM at 10:27 pm said:

"If you want to fail this class, just show me a piechart." – Ross Ihaka
Leo Martins on March 10, 2010 4:18 PM at 4:18 pm said:

On the other hand, I just saw this very nice Google motion chart about the genomic coverage of vertebrates – how genomic information has been accumulating over the years. It took me a while to interpret the "video", but it certainly attracted my attention…
Ulviyya on March 11, 2010 5:49 AM at 5:49 am said:

Veblen – I use Tableau at work, and it is a great tool for data visualization. It is quick, somewhat flexible, and easy to use. It is not as versatile as SAS graphs, but those require programming, and when you are short on time or just need exploratory graphs, Tableau will do the trick. I alternate between the two. You should try it out; it is a good tool for an analyst to have.
veblen on March 11, 2010 8:03 AM at 8:03 am said:

Ulviyya, thanks.
Georgia Sam on March 12, 2010 11:05 AM at 11:05 am said:

Flaw number 1: It's a pie chart.
