I'd love it if someone else were to write my article, tentatively titled "Better than a boxplot," with the following abstract: "We demonstrate graphical options that dominate the boxplot. We hope that, once these alternatives are understood, boxplots are never used again." But I have a horrible feeling I'm going to have to write this article myself.
Better than a boxplot

Well, the abstract is a start. Now you just need the conference to send it to. Then you wait 6 months before you realise that the meeting is only a week away, so you'd better do something about it.
The other option would be to find a journal I've never published in before where I could submit it. Then I'd do it right away.
Maybe someone should write a plot function in R for jittered dot plots that is as easy to use as the boxplot function. This would influence me. The boxplot function is quite flexible, easy to use, and general. But usually when I make a jittered dot plot, I have to fiddle with it a lot to get the dots small enough, to add the median lines, to make the column width wide enough to be able to see the dots...
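As a rough illustration of what such a function might look like, here is a minimal sketch in Python rather than R. The names `jittered_dotplot_data` and `jittered_dotplot`, the default jitter width, and the dot-size rule are all assumptions made up for this example, not an existing library API. It covers the three chores mentioned above: jittering the dots, shrinking them as the sample grows, and adding median lines.

```python
import random
import statistics


def jittered_dotplot_data(groups, width=0.3, seed=0):
    """Return (xs, ys, medians) for a jittered dot plot.

    groups: dict mapping a group label to a list of numbers.
    width:  maximum horizontal offset from the column centre (assumed default).
    """
    rng = random.Random(seed)  # fixed seed so the jitter is reproducible
    xs, ys, medians = [], [], {}
    for position, (label, values) in enumerate(groups.items(), start=1):
        for value in values:
            # spread each dot uniformly around its column centre
            xs.append(position + rng.uniform(-width, width))
            ys.append(value)
        medians[label] = (position, statistics.median(values))
    return xs, ys, medians


def jittered_dotplot(groups, width=0.3, seed=0):
    """Draw the plot. matplotlib is imported lazily so the helper above works without it."""
    import matplotlib.pyplot as plt

    xs, ys, medians = jittered_dotplot_data(groups, width, seed)
    # arbitrary rule of thumb (an assumption): smaller dots for larger samples
    size = max(2, min(40, 400 / max(len(ys), 1)))
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=size, alpha=0.6)
    for position, median in medians.values():
        # one horizontal median line per column, as wide as the jitter
        ax.hlines(median, position - width, position + width, colors="black")
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(list(groups))
    return ax
```

Calling `jittered_dotplot({"control": [...], "treated": [...]})` would then be about as short as a boxplot call; whether these defaults hold up for both large and small data sets is exactly the part that would need the fiddling.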
You could try Pharmaceutical Programming. You do have to send it in Word, though.
I have to say I use Rob's HDR boxplots wherever possible. Especially good if you may have bi-modality.
DaveG
Start by writing a single function in R that produces plots that dominate the boxplot, has good defaults, and also can deal with both large and small datasets in a good way.
The boxplot is pretty darn useful, but I can easily believe that it can be dominated. I would use your function. Once you've got the function, a 2 page or 3 page paper about why it is better would be pretty easy.
Here is the first image that comes up when you google it: http://www.mathworks.com/matlabcentral/fx_files/23661/3/distributionPlot.png
The one on the bottom left is pretty great.
I feel like this paper has already been written:
Jackson 2008, "Displaying Uncertainty With Shading" (American Statistician)
http://pubs.amstat.org/doi/abs/10.1198/000313008X370843
He describes density strips here as a replacement for more than just boxplots. Do you prefer something else?
Doesn't it depend on the size of the data set and the goal of the graphical display (inference vs data description, exploratory vs publication)?
For displaying data distributions, would it be reasonable to say
* small to moderate: dotplots, jittered dotplots (a special challenge is small, discrete data sets, where there are lots of repeated values)
* moderate: boxplots
* moderate to large: violin plots, density strips, beanplots, box-percentile plots [Hmisc::panel.bpplot], etc.
Or do you think the range in which boxplots are useful is squeezed out by the ranges of the "small to moderate" and "moderate to large" choices?
Does your allergy include side-by-side boxplots to compare groups?
And what caused the high pollen count this morning?
Beanplots!
http://www.jstatsoft.org/v28/c01/paper
There's even a comparison in there with boxplots.
I've started analysing some drug trial data, and I'm interested in this, because we have bi-modality. What are Rob's HDR boxplots, and where could I find code and documentation?
Thanks, Jocelyn Paine
Hi,
The HDR boxplots are in the R package hdrcde. You can get it from CRAN or see Rob Hyndman's page:
http://www.robjhyndman.com/software/hdrcde
I don't know of any implementations in SAS.
Bimodality is very common in biological data from populations. Another good idea which is relatively easy to implement is to use histograms with quantiles as the cut points. This is much better for multimodality than standard histogram methods.
See also the recent post from Andrew about histograms with more complicated ways to calculate the cut points.
http://www.stat.columbia.edu/~cook/movabletype/archives/2009/10/variations_on_t.html
enjoy
Dave
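The idea of a histogram with quantiles as the cut points can be sketched in a few lines. The following is a minimal Python illustration, not code from hdrcde or from any of the posts linked above; the function name `quantile_histogram` and the default of 10 bins are assumptions chosen for the example. Because each bin holds roughly the same share of the data, a bin's height is its share divided by its width, so narrow, tall bins mark the modes and a wide, flat bin marks the gap between them.

```python
import statistics


def quantile_histogram(values, n_bins=10):
    """Return (edges, densities) for a histogram whose cut points are quantiles.

    edges:     increasing bin boundaries, from the minimum to the maximum.
    densities: for each bin, (fraction of data in the bin) / (bin width).
    """
    data = sorted(values)
    # interior cut points at the 1/n_bins, 2/n_bins, ... quantiles
    cuts = statistics.quantiles(data, n=n_bins, method="inclusive")
    # ties can produce repeated cut points; keep each edge only once
    edges = sorted(set([data[0]] + cuts + [data[-1]]))
    densities = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        is_last = i == len(edges) - 2
        # bins are half-open [lo, hi), except the last, which includes the maximum
        count = sum(1 for v in data if lo <= v < hi or (is_last and v == hi))
        densities.append(count / (len(data) * (hi - lo)))
    return edges, densities
```

With bimodal data, the bins that span the empty stretch between the two clusters come out wide and low, which is the feature that makes this display friendlier to multimodality than equal-width bins.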