Graphics suggest new ideas

Posted on January 13, 2006 12:10 AM by Andrew

Seth wrote an article, “Three Things Statistics Textbooks Don’t Tell You.” I don’t completely agree with the title (see below) but I pretty much agree with the contents. As with other articles by Seth, there are lots of interesting pictures. My picky comments are below.

Using graphs to suggest new research ideas

Seth writes, “Tukey (1977) [the book “Exploratory Data Analysis”] had not made clear a major reason for graphing one’s data: A tiny fraction of one’s graphs will suggest new lines of research.” I don’t know if Tukey was clear on this in his 1977 book, but in an article from 1972 (cited in this paper) Tukey discusses “graphs intended to let us see what may be happening over and above what we have already described,” contrasting exploratory analysis with calculations of p-values, or confirmatory data analysis. So I think he was very aware of graphing as a way to inspire new ideas.

I certainly agree with Seth that good graphs can suggest new directions for research. This happens to me all the time.

Also, on page 15 Seth mentions scatterplot matrices. I’ve never found these to be useful in my work. I’ve had lots of success with plots of raw data (as in the examples of Seth’s article) and also I’ve found plots of parameter estimates to be extremely helpful. This is the tool I call the “secret weapon.”

One-number-per-subject summaries

I also agree with Seth about the value of one-number-per-subject summaries. This is related to the idea from cluster sampling that you can usually just work with the cluster averages and not worry about within-cluster variation. In fact, I just suggested the one-number-per-subject idea the other day to an economist who was studying political attitudes: she was constructing some measure of internal consistency of individual attitudes, which she was measuring using some regression coefficient. I suggested that her results would be more clearly interpretable if she were to create a measure of consistency for each person and then look at these measures.
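To make the idea concrete, here is a minimal sketch in Python. The data, the subject labels, and the function names are all made up for illustration; none of them come from Seth's article. Each subject's repeated measurements are reduced to a single number (here, mean under treatment minus mean under control), and the analysis is then just a one-sample t statistic on those numbers.

```python
import math
import statistics

def per_subject_summary(measurements):
    """Reduce each subject's repeated measurements to one number:
    mean under treatment minus mean under control."""
    return {
        subject: statistics.mean(obs["treatment"]) - statistics.mean(obs["control"])
        for subject, obs in measurements.items()
    }

def one_sample_t(values):
    """t statistic for testing whether the mean of `values` is zero."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values) / standard_error

# Hypothetical data: four subjects, two measurements per condition.
data = {
    "s1": {"treatment": [5.0, 7.0], "control": [4.0, 4.0]},
    "s2": {"treatment": [6.0, 8.0], "control": [5.0, 7.0]},
    "s3": {"treatment": [9.0, 9.0], "control": [6.0, 8.0]},
    "s4": {"treatment": [4.0, 6.0], "control": [1.0, 3.0]},
}

summaries = per_subject_summary(data)           # one number per subject
t_stat = one_sample_t(list(summaries.values()))  # test on the summaries only
print(summaries, round(t_stat, 2))
```

The within-subject variation never enters the test directly; it only affects how noisy each subject's summary is.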

Of course, for complicated problems, summarizing each person by one number won’t work so well. Multilevel models will do better. Seth in his paper compares the one-number-per-subject method to F-tests, but that’s not a fair comparison from my perspective, since F-tests are super-crude.

I disagree with Seth’s comment on page 22: “The one-number-per-subject method of testing has no interesting statistical content. To a statistician, it is obvious.” It actually depends on some statistical features of the data, most notably that the design be balanced.
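One way to see why balance can matter is the small sketch below. It uses made-up numbers and hypothetical function names, and it illustrates only one aspect of the issue. With the same number of observations per subject, averaging the per-subject means gives the same answer as averaging all the observations. With unequal numbers, the two diverge, because the summary weights every subject equally no matter how much data stands behind each one.

```python
import statistics

def mean_of_subject_means(data):
    """Average the one-number summaries, weighting every subject equally."""
    return statistics.mean([statistics.mean(obs) for obs in data.values()])

def pooled_mean(data):
    """Average all observations, so subjects with more data count more."""
    return statistics.mean([x for obs in data.values() for x in obs])

# Hypothetical data: two observations per subject (balanced).
balanced = {"a": [1.0, 3.0], "b": [5.0, 7.0]}
# Same subjects, but subject "b" now has six observations (unbalanced).
unbalanced = {"a": [1.0, 3.0], "b": [5.0, 7.0, 6.0, 6.0, 6.0, 6.0]}

for name, data in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, mean_of_subject_means(data), pooled_mean(data))
```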

Transformations can increase sensitivity

Seth writes that this idea is “not in textbooks.” Please see Section 9.3 of Bayesian Data Analysis, 2nd edition. (It’s also in the first edition, in the last chapter.) But, yeah, I think a lot of people skip chapter 9 when they teach this book.
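Here is a rough illustration of the general point, not an example from Seth's article or from the book. The data are invented so that the treatment acts multiplicatively and one large value dominates the raw scale; `welch_t` is a helper written for this sketch. On such data, the same comparison gives a larger t statistic after a log transformation.

```python
import math
import statistics

def welch_t(a, b):
    """Welch two-sample t statistic for mean(b) - mean(a)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(b) - statistics.mean(a)) / standard_error

# Hypothetical right-skewed data with a multiplicative treatment effect.
control = [1.0, 2.0, 4.0, 8.0, 100.0]
treated = [3.0 * x for x in control]

t_raw = welch_t(control, treated)
t_log = welch_t([math.log(x) for x in control], [math.log(x) for x in treated])
print(f"raw scale t = {t_raw:.2f}, log scale t = {t_log:.2f}")
```

On the raw scale the single large value inflates the standard error; on the log scale the effect is a constant shift and the outlier carries less weight. Whether a transformation helps in practice depends on the data.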

Summary

I like this article and I like its direct and exploratory presentation. It’s good to think about research tools and also about what it takes for these tools to be used.

P.S. to Seth: you might be amused by the most recent comment on this entry.

2 thoughts on “Graphics suggest new ideas”

dsquared on January 20, 2006 at 7:51 am said:

wholeheartedly agree about scatterplot matrices. I have never got on with trellis plots either although some people swear by them.

Antony Unwin on January 22, 2006 at 6:15 am said:

It is great to see someone emphasising the use of graphics for exploration as well as for presentation, but if you want to do EDA you need interactive graphics. It's a mystery to me why more people do not use them. Perhaps it's because you have to have fast and flexible software, perhaps it's because of the subjective component in exploratory work, perhaps it's because using interactive graphics is fun as well as productive and statisticians are serious people. I'd be grateful for explanations.
