Results matching “R”

Exploratory and confirmatory data analysis

Seth points me to this discussion he wrote on Tukey's famous book, Exploratory Data Analysis. I use exploratory methods all the time and have thought a lot about Tukey and his book and so wanted to add a few comments.

In particular, I'd like to separate Seth's important points about statistical practice from his inflammatory rhetoric and pop sociology. (Disclaimer: I engage in rantin', ragin', and pop sociology all the time--but, when I do it, it's for a higher purpose and it's ok.)

I have several important points to make here, so I recommend you stop whatever else you're doing and read all of this.

1. As Seth discusses, so-called exploratory and confirmatory methods are not in opposition (as is commonly assumed) but rather go together. The history on this is that "confirmatory data analysis" refers to p-values, while "exploratory data analysis" is all about graphs, but both these approaches are ways of checking models. I discuss this point more fully in my articles, Exploratory data analysis for complex models and A Bayesian formulation of exploratory data analysis and goodness-of-fit testing. The latter paper is particularly relevant for the readers of this blog, I think, as it discusses why Bayesians should embrace graphical displays of data--which I interpret as visual posterior predictive checks--rather than, as is typical, treating exploratory data analysis as something to be done quickly before getting to the real work of modeling.

2. Let me expand upon this point. Here's how I see things usually going in a work of applied statistics:

Step 1: Exploratory data analysis. Some plots of raw data, possibly used to determine a transformation.

Step 2: The main analysis--maybe model-based, maybe non-parametric, whatever. It is typically focused, not exploratory.

Step 3: That's it.

I have a big problem with Step 3 (as maybe you could tell already). Sometimes you'll also see some conventional model checks such as chi-squared tests or qq plots, but rarely anything exploratory. Which is really too bad, considering that a good model can make exploratory data analysis much more effective and, conversely, I'll understand and trust a model a lot more after seeing it displayed graphically along with data.

3. Seth writes:

A more accurate title of Tukey's book would have been Low-Status Data Analysis. Graphs and transformations are low-status. They are low-status because graphs are common and transformations are easy. Anyone can make a graph or transform their data. I believe they were neglected for that reason. To show their high status, statistics professors focused their research and teaching on more difficult and esoteric stuff -- like complicated regression. That the new stuff wasn't terribly useful (compared to graphs and transformations) mattered little. Like all academics -- like everyone -- they cared enormously about showing high status. It was far more important to be impressive than to be useful.

This is, in my experience, ridiculous. Seth doesn't just say that some work that is useful is low status, or that some work that is low status is useful. He says that useful statistical research work is generally low status. No, no, no, no! It's hard to be useful! Just about everybody in statistics tries to do work that is useful.

OK, I know what Seth is talking about. I used to teach at Berkeley (as did Seth), and indeed the statistics department back then was chock-full of high-status professors (the department was generally considered #1 or #2 in the world) who did little if anything useful in applied statistics. But they were trying to be useful! They were just so clueless that they didn't know better. And there was also some socialization going on, where the handful of faculty members who really were doing useful work seemed to most highly value the non-useful work of the others. It's certainly true that they didn't appreciate graphical methods or the challenges of getting down and dirty with data. (They might have dismissed such work as being insufficiently general and enduring, but in my experience such applied work has been crucial in motivating the development of new methods.) And they were also dismissive of applied research areas such as survey research that are fascinating and important but did not happen to be "hot" at the time. This is consistent with Seth's hypothesis of status-seeking, but I'm inclined to give the more charitable interpretation that my Berkeley colleagues wanted to work on what they viewed as the most important and challenging problems.

I repeat: I completely disagree with Seth's claim that, in statistics, it is "low status" to develop useful methods. Developing useful methods is as central to statistics as saving souls is in church--well, I'm guessing on this latter point, but stay with me on this, OK?--it's just hard to do, so some people occupy themselves in other useful ways such as building beautiful cathedrals (or managing bazaars). But having people actually use your methods--that's what it's all about.

Seth is the psychologist, not me, so I won't argue his claim that "everyone cares enormously about showing high status." In statistics, though, I think he has his status attributions backward. (I also worry about the circularity of arguments about status, like similar this-can-explain-anything arguments based on self-interest or, for that matter, "the unconscious.")

4. Hey, I almost forgot Seth's claim, "Anyone can make a graph or transform their data." No, no, no. Anyone can run a regression or an Anova! Regression and Anova are easy. Graphics is hard. Maybe things will change with the software and new media--various online tools such as Gapminder make graphs that are far far better than the Excel standard, and, with the advent of blogging, hot graphs are popular on the internet. We've come a long way from the days in which graphs were in drab black-and-white, when you had to fight to get them into journals, and when newspaper graphics were either ugly or (in the case of USA Today) of the notoriously trivial "What are We Snacking on Today?" style.

Even now, though, if you're doing research work, it's much easier to run a plausible regression or Anova than to make a clear and informative graph. I'm an expert on this one. I've published thousands of graphs but created tens of thousands more that didn't make the cut.

One problem, perhaps, is that statistics advice is typically given in terms of the one correct analysis that you should do in any particular setting. If you're in situation A, do a two-sample t-test. In situation B, it's Ancova; for C you should do differences-in-differences; for D the correct solution is weighted least squares, and so forth. If you're lucky, you'll get to make a few choices regarding selection of predictors or choice of link function, but that's about it. And a lot of practical advice on statistics actually emphasizes how little choice you're supposed to have--the idea that you should decide on your data analysis before gathering any data, that it's cheating to do otherwise.

One of the difficulties with graphs is that they clearly don't work that way. Default regressions and default Anovas look like real regressions and Anovas, and in many cases they actually are! Default graphics may sometimes do a solid job of conveying information that you already have (see, for example, the graphs of estimated effect sizes and odds ratios that are, I'm glad to say, becoming standard adjuncts to regression analyses published in medical and public health journals), but it usually takes a bit more thought to really learn from a graph. Even the superplot--a graph I envisioned in my head back in 2003 (!), at the very start of our Red State, Blue State project, before doing any data analysis at all--even the superplot required a lot of tweaking to look just right.

Perhaps things will change. One of my research interests is to more closely tie graphics to modeling and to develop a default process for looking through lots of graphs in a useful way. Researchers were doing this back in the 1960s and 70s--methods for rotating point clouds on the computer, and all that--but I'm thinking of something slightly different, something more closely connected to fitted models. But right now, no, graphs are harder, not easier, than formal statistical analysis.

Seth also writes:

Most statistics professors and their textbooks have neglected all uses of graphs and transformations, not just their exploratory uses. I used to think exploratory data analysis (and exploratory science more generally) needed different tools than confirmatory data analysis and confirmatory science. Now I don't. A big simplification.

All I can say is: Things are changing! The most popular book on Bayesian statistics and the most popular book on multilevel modeling have a strong exploratory framework and strongly support the view that similar tools are used for exploratory and confirmatory data analysis. (Not exactly the same tools, of course: there are lots of technical issues specific to graphics, and other technical issues specific to probability calculations. But I agree with Seth, there's a lot of overlap.)

5. To return briefly to Tukey's extremely influential book: EDA was published in 1977 but I believe he began to work in that area in the 1960s, about ten or fifteen years after doing his also extremely influential work on multiple comparisons (that is, confirmatory data analysis). I've always assumed that Tukey was finding p-values to be too limited a tool for doing serious applied statistics--something like playing the piano with mittens. I'm sure Tukey was super-clever at using the methods he had to learn from data, but it must have come to him that he was getting the most from his graphical displays of p-values and the like, rather than from their Type 1 and Type 2 error probabilities that he'd previously focused so strongly on. From there it was perhaps natural to ditch the p-values and the models entirely--as I've written before, I think Tukey went a bit too far in this particular direction--and see what he could learn by plotting raw data. This turned out to be an extremely fruitful direction for researchers, and followers in the Tukey tradition--I'm thinking of statisticians such as Bill Cleveland, Howard Wainer, Andreas Buja, Diane Cook, Antony Unwin, etc.--are continuing to make progress here.

(I'll have to talk with my colleagues who knew Tukey to see how accurate the above paragraph is as a description of his actual progression of thoughts, rather than merely my rational reconstruction.)

The actual methods and case studies in the EDA book . . . well, that's another story. Hanging rootograms, stem-and-leaf plots, goofy plots of interactions, the January temperature in Yuma, Arizona--all of this is best forgotten or, at best, remembered as an inspiration for important later work. Tukey was a compelling writer, though--I'll give him that. I read Exploratory Data Analysis twenty-five years ago and was captivated. At some point I escaped its spell and asked myself why I should care about the temperature in Yuma--but, at the time, it all made perfect sense. Even more so once I realized that his methods are ultimately model-based and can be even more effective if understood in that way (a point that I became dimly aware of while completing my Ph.D. thesis in 1990--when I realized that the model I'd spent two years working on didn't actually fit my data--and which I first formalized at a conference talk in 1997 and published in 2003 and 2004. It's funny how slowly these ideas develop.).

P.S. It's funny that Seth characterizes Freakonomics as low-status economics. Freakonomics is great, but this particular "rogue economist" was tenured at the University of Chicago and had been given major awards by the economics profession. The problem here, I think, is Seth's tendency to sort everyone and everything into good guys and bad guys. Levitt's a good guy and low-status work is good, therefore Levitt's work must be low-status. Levitt's "real innovation" (in Seth's words) was to do excellent, headline-worthy work, then to actually get the headlines, to do more excellent, newsworthy work, and to attract the attention of a dedicated journalist. (Now he has to decide where to go next.)

That said, the economics profession (and academia in general) faces some tough questions, such as: How much in the way of resources should go toward studying sumo wrestlers and how much should go toward studying international trade? I assume Seth would argue that sumo wrestling is low-status and deserves more resources, while the study of international trade is overvalued and should be studied less. This line of thinking can run into trouble, though. For example, consider various declining academic subjects such as anthropology, philosophy, and classics. I'm not quite sure how Seth (or others) define status, but it's my impression that--in addition to their difficulties with funding and enrollment--anthropology, philosophy, and classics are lower status than more applied subjects such as economics, architecture, and law. I guess what I'm saying is: Low status isn't always such a good thing! Some topics are low-status for a good reason, for example that they've been bypassed or discredited. Consider various forgotten fads such as astrology or deconstructionism.

(I have some sympathy for "let the market decide" arguments: we don't need a commissar to tell us how many university appointments should be made in art history and how many in statistics, for example. Still, even in a market or quasi-market situation, someone has to decide. For example, suppose a particular economics department has the choice of hiring a specialist in sumo wrestling or in international trade. They still have to make a decision.)

P.P.S. You might wonder why I'm spilling so much ink (so to speak) responding to Seth. I'll give two reasons (beyond the fact that he's my friend, and that one of the functions of this blog is to allow me to share thoughts that would otherwise be restricted to personal emails). First is the journalistic tradition of the hook to hang a story on. All these thoughts spilled out of me, but Seth's thoughts were what got me started in this particular instance. Second, I know that exploratory statistical ideas have been important in Seth's own applied research, and so his views on what makes these methods work are worth listening to, even if I disagree on many points.

In any case, I'll try at some point to figure out a way to package these and related ideas in a more coherent way and perhaps publish them in some journal that no one reads or put them deep inside a book that nobody ever reads past chapter 5. Blogging as the higher procrastination, indeed.

P.P.P.S. Sometimes people ask me how much time I spend on blogging, and I honestly answer that I don't really know, because I intersperse blogging with other work. This time, however, I know the answer because I wrote this on the train. It took 2 hours.

The chorus of ablationists

Dan Walter writes:

I am writing an article about the study of a medical procedure that was recently published in the Journal of the American Medical Association. The study concerns a procedure called catheter ablation for atrial fibrillation. It was bought and paid for by the device manufacturer. You can see my [Walter's] take on it here.

Doug Bates's lmer book

Here.

Philosophy and the practice of Bayesian statistics in the social sciences

I present my own perspective on the philosophy of Bayesian statistics, based on my experiences doing applied statistics in the social sciences and elsewhere. My motivation for this project is dissatisfaction with what I perceive as the standard view of the philosophical foundations of Bayesian statistics, a view in which Bayesian inference is inductive and scientific learning proceeds via the computation of the posterior probability of hypotheses. In contrast, I view Bayesian inference as deductive and as part of a larger Bayesian data-analytic process, different parts of which I believe can be usefully understood in light of the philosophical frameworks of Popper, Kuhn, and Lakatos. The practical implication of my philosophy is to push Bayesian data analysis toward a continual creative-destruction process of model building, inference, and model-checking rather than to aim for an overarching framework of scientific learning via posterior probabilities of hypotheses. This work is joint with Cosma Shalizi.

Paris Diderot Philmath Seminar
Monday, February 15, 2010, 2:00-3:30 pm

Université Paris Diderot - Site Rive Gauche, Bâtiment Condorcet
4 rue Elsa Morante
75205 PARIS CEDEX 13
Salle Klee (454A)

Another Valentine's Day entry

Tom Ball points me to this news article about the data-based OK Cupid dating service. I like their analysis. Regular readers of this blog will recall this picture, for example:

[OK Cupid chart: "If you knew for sure you would not get caught, would you commit murder for any reason?" -- 359,761 people have answered]

More Baye$

As we tell our students all the time, Bayes is useful. Bayesian data analysis is all about solving problems, using mathematical tools to learn about the world.

Here's an example. Jim Rogers sends along this interesting opportunity.

Bayesian Modeling Statistician

MetrumRG is selectively seeking enthusiastic and energetic individuals to join a team of scientists in a unique working environment at the level of Research Scientist, Senior Scientist or Principal Scientist. As a member of the MetrumRG team, you will participate in the research, development, and application of quantitative data analysis methods in biomedical sciences through collaborations with both academia and the pharma/biotech industry. Job responsibilities include modeling and simulation to support drug development and clinical trials, consultation and technical guidance for clients, preparation of reports and presentations, and participation in MetrumRG training courses. You will also be expected to develop your own research interests, present and publish results, and expand your scientific knowledge and skills through professional development opportunities. Ideal candidates will have a doctoral degree in statistics, excellent written and verbal communication skills, statistical consulting experience, and proficiency with programming/statistical tools such as R or S-PLUS. Experience with Bayesian data analysis and C or Fortran programming is highly desirable.

The MetrumRG offices are located in Tariffville, CT, a vibrant New England town situated in the Farmington River Valley. MetrumRG also offers the flexibility and professional rewards that come from working in a small, focused business environment. MetrumRG offers a competitive salary and full benefits package.

Valentine's Day statistical love poems

Elissa Brown sends these in. They're actually pretty good, with a quite reasonable Ogden-Nash-style rhythm and a certain amount of statistical content. It's good to know that the kids today are learning useful skills in their graduate programs.

You are perfect; I'd make no substitutions
You remind me of my favorite distributions
With a shape and a scale that I find reliable
You're as comforting as a two parameter Weibull
When I ask you a question and hope you answer truly
You speak as clearly as a draw from a Bernoulli
Your love of adventure is most influential
Just like the constant hazard of an exponential.
With so many moments, all full of fun,
You always integrate perfectly to one.

And here are a bunch more:

The Whiter Olympics

Matthew Yglesias links to Reihan Salam's article on the whiteness of the Winter Olympics. And they're not talkin bout the snow, either.

Things are actually worse than Yglesias and Salam realize. Did you know that Puerto Rico had a Winter Olympics team? One year it featured my cousin Bill, who finished last in the slalom. I'm pretty sure he wasn't born in Puerto Rico (despite what it says on one website), but I guess he's probably been there on vacation on occasion. And I wouldn't be surprised if he speaks Spanish--he does live in L.A., after all. And, of course, it takes some skill to finish last in the slalom. I'd probably fall off the chairlift and never even get to the starting line.

Someone writes:

Here. And here's the background:

David Johnstone writes:

Generational effects

James O'Brien writes:

Do you have any comment on what seems to be widespread interest in generational effects (Gen X, Y, millennials, etc.)? My sense is that these categories may not be particularly meaningful, because cutting the year of birth distribution at these points seems pretty arbitrary, and because there's likely more variance within the groups than between them.

I'd be interested to know what you think about this, because I occasionally read things that seem devoted to the idea that these sorts of differences are quite meaningful. I would be inclined to agree that attitudes toward social issues, etc., likely shift over the life course, and there are likely many other causal forces at work here through history, but the cohorts seem like a pretty coarse way to go after this.

My reply: There is definitely something to this, if only from basic numerical calculations (what happens to the so-called marriage market when millions of baby boomers hit their twenties, and so forth). And sometimes we see new sorts of generational effects. For instance, the huge difference in voting patterns between under-30s and everyone else in the 2006 and 2008 elections was unprecedented, at least in the modern age of political polling.

I imagine people have studied generational patterns and probably not found any sharp divisions, for example between people born before 1965 ("boomers") and those born after. The sharp dividing points must be matters of convenience more than anything else.

A coauthor and I just recently submitted a revision of our manuscript to a journal. If we'd known it was going to be so much work, we probably never would've written the paper in the first place. . . . It's a surprising amount of work between idea and execution (even forgetting about issues such as writing the letter in response to the referee reports). And, actually, this particular review process was very easy, as such things go. Still a lot of effort, though. It reminds me that being able to do something once is a lot less than describing a method clearly and in appropriate generality.

Get off that goddam cell phone!

Mark Glaser writes an interesting but confusing article about a journalism class at NYU where students aren't allowed to blog or twitter about the class content:

After New York University journalism student Alana Taylor wrote her first embed report for MediaShift on September 5, it didn't take long for her scathing criticism of NYU to spread around the web and stir conversations. . . . By Taylor's account, [journalism professor Mary] Quigley had a one-on-one meeting with Taylor to discuss the article, and Quigley made it clear that Taylor was not to blog, Twitter or write about the class again.

Glaser then corresponds with Prof. Quigley, who emails:

I [Quigley] will confirm that I asked the class not to text, email or make cell phone calls during class. It's distracting to both me and other students, especially in a small class seated around a conference table. This has always been my policy, and I would hazard a guess that it's the policy of many professors no matter the discipline.

However, I did say after the class session they were free to text, Twitter, blog, email, post on Facebook or whatever outlet they wanted about the course, my teaching, the content, etc.

Seems clear enough: Keep your thumbs to yourself during the class period, then write it all down later. Makes sense to me. But then Glaser reports:

When I [Glaser] followed up and asked her whether that meant students still needed to get permission before writing about class, she said: "Yes, I would certainly require a student to ask permission to use direct quotes from the class on a blog written after class."

Huh? Didn't she just say "they were free to text, Twitter, blog, email, . . . whatever they wanted about the course"? At this point, I wish Glaser had gone back to Quigley one more time for a clarification.

P.S. I looked up Mary Quigley on the web and found this list of articles by her students--judging from the quick summaries, apparently Quigley teaches a class on feature writing--and
this homepage, which to me was surprisingly brief, but I suppose that journalists have a tradition of not giving out their work for free.

P.P.S. Without knowing more details than what is in the links above, I'm 100% in support of Taylor, the student who was told not to blog. But I can definitely sympathize with Quigley: I can well imagine a student in one of my classes blogging something like this:

At the halfway point in the class, Quigley lets us go on a break. In the bathroom I run into an old classmate who asks me if I am going to stay in the class. I ask her if she doesn't like it and she responds that she is worried of it being too "all-over the place" or "disorganized" or "confusing."

Ouch!

P.P.P.S. I was amused that Taylor wrote that "I like to think that having a blog is as normal as having a car." Where exactly does she park?

After writing this, I scrolled down Ben Casnocha's blog and read a few more entries and came to this discussion of Nassim Taleb. Casnocha writes:

At the bottom of Taleb's homepage he posts his email address and invites readers to contact him. With some qualifications:
Concise messages are much preferable (say a maximum < 40 words) as I will not be able to read long letters. Please do not 1) send me your papers or other "interesting material" to read, 2) ask finance questions (not my specialty), 3) make me to rewrite sections of my books (I write books, not emails), 4) ask for a list of "other interesting books to read", 5) ask me to provide career or educational advice, 6) send me passages from Tolstoy or the Ecclesiast on luck and randomness, 7) send me the list of typos in my drafts. Note that I almost always reply (but ONLY to short messages), time permitting (but once) -even to nasty emails. Finally, note that, thanks to my new keyboard, I sometimes reply in Arabic, particularly to academics. [Also please please refrain from offering to "improve" my web site].

He opens his piece on walking by noting that thanks to the "exposure" of his books he came onto theories about fitness by two authors. I imagine this happened by a reader writing in and sharing "interesting material" of the sort he says he does not want. I have never emailed Taleb, but I [Casnocha] don't take his qualifications seriously. It is, in fact, a very naked way to signal busyness and importance.

I think there's something important that Casnocha (and his blog commenter) are not understanding here, and that is the interaction between the linear scaling of a person's time and the exponential scaling of fame.

Here's the deal. Taleb is one person. I'm sure he can answer emails faster than most of us--and he might even have a secretary to filter out the spam--but, still, he's responding to these on human scale. Similarly, he writes just like the rest of us (James Patterson and Doris Kearns Goodwin excepted), putting one word after the other. Even if he writes 10 times faster than a less practiced writer, he still has to do the work.

But . . . he's really famous. OK, not famous like Elvis or even Bob Dylan, but he could very well be receiving a zillion emails per day. Taleb doesn't need to signal busyness and importance. He's certainly important, and if he tries to answer all his emails, he's gonna be busy also. Lots of famous people don't have email at all.

I do, however, think it's a bit silly for Taleb to ask people not to send him things to read. I like when people send me things to read. I can look at a couple paragraphs and decide if I want to read more. Sometimes people send interesting things. Also, I'd recommend that Taleb get rid of his "almost always reply" rule. I almost always reply to emails, but Taleb must receive many many more than I do.

P.S. Heinlein's solution.

Last week, Christian Robert and I separately reviewed Krzysztof Burdzy's book, The Search for Certainty, which I characterized as a harmless if misleading discussion of the philosophy of probability. Burdzy sent us his reply, which I will post below, followed by my comments. I am omitting some parts of Burdzy's comments that are specific to Christian's review and not of general interest.

Blog style

I followed this link from Tyler Cowen to "Ben Casnocha on Chile" and found . . . a long blog entry that was exactly in the style of Tyler Cowen! I wonder if Cowen realized this when he linked to it. Probably not: just as we don't notice our own strong smells (or so I've been told), it's probably also hard for anyone to notice an imitation of one's own style. I do wonder whether Casnocha was imitating Cowen on purpose--not such a bad idea when blogging to imitate a master, just as short-story writers continue to imitate John Updike. Personally, I'm sick and tired of book and movie reviewers imitating Pauline Kael--I didn't even like her own writing and I don't enjoy seeing her stylistic tics repeated by others--but, hey, that's their choice.

P.S. In case you're wondering, here are a few Cowenisms in Casnocha's blog:

Annie Lowrey speculates:

Based on Census Bureau data, five senators would represent Americans earning between $100,000 and $1 million individually per year, with [2/10 of a senator] working on behalf of the millionaires. Eight senators would represent Americans with no income. Sixteen would represent Americans who make less than $10,000 a year, an amount well below the federal poverty line for families. The bulk of the senators would work on behalf of the middle class, with 34 representing Americans making $30,000 to $80,000 per year. . . . Or how about if senators represented particular demographic groups, based on gender and race? White women would elect the biggest group of senators -- 37 of them, though only 38 women have ever served in the Senate.

I don't know how well all of this would work in practice--for one thing, I wouldn't want the senator who represents two-year-olds to be anywhere near the nuclear button--but I agree that ideas of fairness and political representation are subtle.

Along similar lines, here is my response to economists who complained that there were not enough economists in elective office:

My talk at Ensae

Here.

A matter of perspective?

An article in The Guardian says:


David Champion, director of automobile testing for Consumer Reports magazine, said the core problem of faulty Toyota accelerators had been linked to 19 deaths in a decade, amounting to two a year of the 40,000 people killed annually on American roads.

"I find it a little odd that we're going to have a Congressional hearing to look at those two deaths out of 40,000," said Champion.

Eric Bettinger, Bridget Terry Long, Philip Oreopoulos, and Lisa Sanbonmatsu write:

Growing concerns about low awareness and take-up rates for government support programs like college financial aid have spurred calls to simplify the application process and enhance visibility.

Here's the study:

H&R Block tax professionals helped low- to moderate-income families complete the FAFSA, the federal application for financial aid. Families were then given an estimate of their eligibility for government aid as well as information about local postsecondary options. A second randomly-chosen group of individuals received only personalized aid eligibility information but did not receive help completing the FAFSA.

And the results:

Comparing the outcomes of participants in the treatment groups to a control group . . . individuals who received assistance with the FAFSA and information about aid were substantially more likely to submit the aid application, enroll in college the following fall, and receive more financial aid. . . . However, only providing aid eligibility information without also giving assistance with the form had no significant effect on FAFSA submission rates.

The treatment raised the proportion of applicants in this group who attended college from 27% (or, as they quaintly put it, "26.8%") to 35%. Pretty impressive. Overall, it appears to be a clean study. And they estimate interactions (that is, varying treatment effects), which is always, always, always a good idea.

Here are my recommendations for improving the article (and thus, I hope, increasing the influence of this study):

Update on the coffee experiment

It's working, so far.

This program introduces students to three modern, applied statistics research problems, and gives them a sense of how statisticians approach large, complex problems, with the aim of encouraging them to pursue advanced degrees in statistics.

The program takes place at the National Center for Atmospheric Research, Boulder, Colorado. According to the website, the summer 2011 program will be at Columbia.

I emailed David Runciman my discussion of his BBC broadcast (in which he wrote: "It is striking that the people who most dislike the whole idea of healthcare reform - the ones who think it is socialist, godless, a step on the road to a police state - are often the ones it seems designed to help" and "many of America's poorest citizens have a deep emotional attachment to a party that serves the interests of its richest").

Runciman responded with some comments which made me feel that I was being unfair in my original description of his statements as "the usual errors."

Below is my dialogue with Runciman and also my response to a related comment by Megan Pledger.

Runciman replied to my original blog, reasonably enough, as follows:

I [Runciman] don't think I say at any point (either in the radio program, or the article which is a shortened version of the script) that there is more opposition among the poor than among the rich, or among the young than among the old. I don't say that more people vote against their own interests than vote in their own interests - obviously not true. Maybe it reads like that's implied. But many also implies more than you would expect and I still believe that's true.

To which I replied:

A propensity for bias?

Teryn Mattox writes:

Matt Stephenson points me to this BBC article, "Why do people vote against their own interests?", which seems to me to be a bit misleading. This would seem to fall into the dog-bites-man category of "This is important. Someone is wrong on the internet"--but it is the fabled BBC, and it is written by a political scientist at fabled Cambridge University--so maybe it's worth going through some of its problems.

It is striking [says David Runciman, speaking on the BBC] that the people who most dislike the whole idea of healthcare reform - the ones who think it is socialist, godless, a step on the road to a police state - are often the ones it seems designed to help.

B-b-b-but . . . what about this?

[Image: mapsnyt.jpg]

The people who dislike healthcare reform are primarily those over 65 (who already have free medical care in America) and people with above-average income. No, these are not really the ones the new bill is most designed to help.

To be fair, though, my maps are based on survey data from 2004. I haven't been able to grab more recent individual-level data to replicate our analysis with current public opinion. Still, my guess is that it is the older and richer who most strongly oppose changing the health-care system.

Next:

If people vote against their own interests, it is not because they do not understand what is in their interest or have not yet had it properly explained to them. They do it because they resent having their interests decided for them by politicians who think they know best. There is nothing voters hate more than having things explained to them as though they were idiots.

Hey, I didn't know that! Maybe it's true. I thought that in a relatively peaceful and prosperous country such as the United States, there's nothing voters hate more than an economic downturn.

Beyond this, there's little evidence that people vote based on their individual interest or even that they should vote based on their interest; rather, survey data and theory both suggest that people vote based on what they think is best for the country. (See here and here.) This is not to say that the psychological models of Drew Westen, which are touched upon in this article, are wrong or irrelevant, but merely to point out that "people voting against their interests" is not such a surprise or paradox.

And then there's this:

It was Oscar Wilde, was it not, who said he would sooner believe a falsehood told well than a truth told falsely? And George Orwell who wrote that good prose is like a windowpane, but sometimes it needs a bit of Windex and a clean rag to fully do its job.

Along those lines, Don Rubin long ago convinced me of the importance of clean statistical notation. One example that's been important to me is model checking--residual plots, p-values, and all the rest. The key, to me, is the Tukeyesque idea of comparing observed data to what could've occurred if the model were true. The usual way this used to be done in statistics books was to talk about data y and a random variable Y. If the test statistic is T(y), then the p-value is Pr (T(Y)>T(y)) or, more generally, Pr (T(Y)>T(y) | theta). (I'm assuming continuous models here so as to avoid having to use the "greater than or equal" symbol.)

But this notation starts to break down once you start thinking about uncertainty in theta. If theta can be well estimated from data, then maybe you're ok with Pr (T(Y)>T(y) | theta.hat). But once we go beyond point estimation, we're in trouble, and the trouble is that y is said to be a "realization" of Y. Just as Clark Kent is a particular realization of Superman.

Ouch.
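For what it's worth, here's a sketch of the way out that I have in mind, which is essentially the posterior predictive notation: write y^rep for replicated data that could have been generated under the model, and average the tail probability over the posterior distribution of theta,

\[
p_B(y) \;=\; \Pr\bigl(T(y^{\mathrm{rep}}) > T(y) \,\big|\, y\bigr) \;=\; \int \Pr\bigl(T(y^{\mathrm{rep}}) > T(y) \,\big|\, \theta\bigr)\, p(\theta \mid y)\, d\theta.
\]

In simulation terms: draw theta from its posterior distribution, draw y^rep given that theta, and see how often T(y^rep) exceeds T(y). No Clark Kent required.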

Here's the story (which Kaiser forwarded to me). The English medical journal The Lancet (according to its publisher, "the world's leading independent general medical journal") published an article in 1998 in support of the much-derided fringe theory that MMR vaccination causes autism. From the BBC report:

The Lancet said it now accepted claims made by the researchers were "false".

It comes after Dr Andrew Wakefield, the lead researcher in the 1998 paper, was ruled last week to have broken research rules by the General Medical Council. . . . Dr Wakefield was in the pay of solicitors who were acting for parents who believed their children had been harmed by MMR. . . .

[The Lancet is now] accepting the research was fundamentally flawed because of a lack of ethical approval and the way the children's illnesses were presented.

The statement added: "We fully retract this paper from the published record." Last week, the GMC ruled that Dr Wakefield had shown a "callous disregard" for children and acted "dishonestly" while he carried out his research. It will decide later whether to strike him off the medical register.

The regulator only looked at how he acted during the research, not whether the findings were right or wrong - although they have been widely discredited by medical experts across the world in the years since publication.

They also write:

The publication caused vaccination rates to plummet, resulting in a rise in measles.

An interesting question, no? What's the causal effect of a single published article?

P.S. I love it how they refer to the vaccine as a "three-in-one jab." So English! They would never call it a "jab" in America. So much more evocative than "shot," in my opinion.

Problems with Census data

Following this link from John Sides, I read this blog by Justin Wolfers on a problem with U.S. Census data discovered by Trent Alexander, Michael Davern and Betsey Stevenson:

The authors compare the official census count (based on the tallying up of all Census forms) with their own calculations, based on the sub-sample released for researchers (the "public use micro sample," available through IPUMS). If all is well, then the authors' estimates should be very close to 100% of the official population count. But they aren't:

[Image: blogSpan.jpg]
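To make the comparison concrete, here's a minimal sketch of the sort of consistency check being described, with hypothetical file and column names (the real IPUMS extract and official tables are of course more involved):

import pandas as pd

# Hypothetical inputs: a public-use microdata extract with person weights,
# and the official census counts, both broken down by age group.
micro = pd.read_csv("pums_extract.csv")        # columns: age_group, perwt
official = pd.read_csv("official_counts.csv")  # columns: age_group, count

# Estimate each group's population by summing the person weights,
# then express the estimate as a percentage of the official count.
estimate = micro.groupby("age_group")["perwt"].sum().rename("estimate")
check = official.set_index("age_group").join(estimate)
check["pct_of_official"] = 100 * check["estimate"] / check["count"]

# If all is well, pct_of_official should be close to 100 in every row.
print(check.sort_values("pct_of_official"))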

Sort of multiple comparisons problem

Nick Allum writes:

Talkin bout Doris Kearns Goodwin blues

I heard a rumour that Doris Kearns Goodwin is still being interviewed on TV, and . . . yes, it's true!

My first thought was: What, they couldn't find an equally appealing talking head who wasn't also a plagiarist? I'm sure there are lots of well-spoken historians who'd love the chance to go on Johnny Carson or whatever it's called nowadays.

But then I looked around on her website, and now I'm not sure. Her books have received all sorts of praise as exemplary popular history, and that sounds like as good a qualification as any for explaining history on TV. Who cares if she's a plagiarist? She's not on the tube for her creative writing talent or, for that matter, for her ability to learn from the primary sources.

The other dimension is that plagiarism is a moral offense. At the very least, I think it might help if Goodwin's TV interviewers every once in a while brought up the plagiarism issue in some relevant way. For example, "Since we're on the topic of authenticity in political candidates, what do you think of the accusation that candidate X is ripping off the ideas of politician Y? As a plagiarist yourself, you must have some thoughts on this?" Or, "The relations between senators and their staff are complicated, no? You must have some insights into this, having delegated the writing of your book to research assistants who copied whole chunks from others' work. How many of the 100 members of the U.S. Senate do you think actually read more of the health care bill than you've read of your own publications?"

Arlen Specter's running for reelection??

Really??? He's almost 80 years old! Yeah, I know, U.S. senator is a pretty cushy job, not much heavy lifting involved, but still . . .

P.S. If I'm still blogging when I'm 80, please don't throw this one back at me.

Tyler Cowen quotes Barbara Demick as writing, "North Koreans have multiple words for prison in much the same way that the Inuit do for snow." So do we, no? But in our case, they seem to come from 1930s B-movies.

I wonder if there are almost as many words for prison in Russia, Turkmenistan, and the other leaders on the list. Apparently North Korea is off the charts, so perhaps they have ten times as many words for prison/jail as we do.

P.S. America includes a bunch of Inuits, so I guess we have multiple words for snow also!

Stop me before I rant again

David Shor writes:

I just read an idea for a pollster that crowd-sources statistical work, and was curious what you thought about the idea.

Here's the idea:

Today, there is a new polling method available: IVR, or 'Interactive Voice Response' polling. Basically, the pollster records several questions, a computer auto-dials hundreds of landlines, and with the people who are willing to participate in the survey, they go through the script automatically.

Even though the old media pollsters and traditional polling organisations like AAPOR are busy discrediting those polls that they condescendingly call 'robopolls', there is not much evidence that they do any worse than live-interviewer polls- but they are much, much cheaper. . . .

Now, the next step to make polls even easier to access for everyone is there- with the mid-January start-up of the IVR pollster Precision Polling.

From Precision Polling's website:

Automated Phone Surveys are phone calls where a recorded voice asks you questions and you type in responses on your keypad (e.g. "Who will get your vote for mayor? Press 1 for Joe..."). This provides a fast and affordable way to get answers from real people.

What do I think? I think it's evil. These robopolls are "fast and affordable" for the pollster but not for the person being hassled by the phone call. I think these machine phone calls should be illegal--yes, I would eagerly support a law making it illegal to call someone if there's no human making the call (fax and data transmission excepted, of course). This would have the side benefit of making all those pre-election endorsement auto-calls illegal, as well as various obnoxious calls used by collection agencies.

It's simply an abuse of the phone system, just as it would be an abuse of the electrical system to sneak into your neighbor's house one night, plug in a really long extension cord, and run it out their window to your house to power your appliances.

"Fast and affordable," indeed! Fast, affordable, and abusive is more like it.

P.S. I feel bad even giving these dudes publicity, but I figure, once it's on Daily Kos it's already been read by a million people, so I hope the good I'm doing by disparaging this idea outweighs the harm I'm doing by publicizing it.

P.P.S. I'm not saying the Daily Kos diarist ("twohundertseventy") is evil, or even that the people at Precision Polling are bad guys. I just don't know if they've thought through the ethical implications of their suggestion, which amounts to bombarding millions of people with irritating calls at dinnertime. Or perhaps they have a retort to my ethical argument, something like: Lots of people enjoy answering polls, or Polls are essential to democracy. OK, if they're so damn essential, try paying people to participate in your poll. You're making money off of them, why not give something back to the people you're hassling? Grrrr.

P.P.P.S. I agree with commenter Tom that robocalls should be legal if the person being called agrees to it ahead of time.

Hal Daume pointed me to this plan by some marathon-running dude named Matt to quit drinking caffeine. Here's Matt's motivation:

I [Matt] try hard to stay away from acid-forming foods and to eat by the principles of Thrive, where energy comes not from stimulation but from nourishment. I want to maximize the energy I have available to create an exciting life, and coffee, in the long-term, only robs me of this energy.

I've tried hard to quit coffee in the past--I even went a month without coffee a while back. But I keep coming back to it. I come back to it because I have this idea that it helps me think better. I enjoy reading books and doing math more when I drink coffee, and I think I come up with better ideas when I'm caffeinated. But I know that's not true. The type of thinking coffee helps me with is a very linear kind, a proficiency at checking items off a list or even of recombining old ideas in a new way. This isn't real creativity. Real creativity is nonlinear, the creation of truly new ideas that haven't yet been conceived, not simply the reordering of old ones.

What's cool about Matt's project is that he's randomizing: some days he'll drink regular coffee, some days decaf, and other days a mix. (To be precise, his wife is doing the randomizing, and she gets to choose the mix.) Each week, he alters the proportions to have more and more decaf--that way he can transition to fully-decaffeinated coffee, but in a way that is slightly unpredictable, so that he's never quite sure what he's getting in any day.
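Here's a minimal sketch of how such a randomization schedule could be generated (my reconstruction for illustration, not Matt's actual procedure; the weekly target fractions are made up):

import random

# Made-up weekly targets: the average fraction of decaf in each day's blend,
# increasing week by week so the transition to full decaf is gradual.
weekly_target = [0.25, 0.50, 0.75, 1.00]

for week, target in enumerate(weekly_target, start=1):
    for day in range(1, 8):
        # The randomizer (Matt's wife, in the real version) picks each day's
        # blend at random around the weekly target, so the drinker never
        # knows exactly what he's getting on any given day.
        decaf_fraction = min(1.0, max(0.0, random.gauss(target, 0.15)))
        print(f"week {week}, day {day}: {100 * decaf_fraction:.0f}% decaf")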

Also, of course, he's making all this public, which I guess will make it tougher for him to break his self-imposed rules.

This is an interesting example in which randomization is used for something other than the typical statistical reason of allowing unbiased comparisons of treatment groups.

I was also amused by his method of having his wife randomize. I remember thinking about this when Seth was telling me about one of his self-experiments, where I worried that expectation effects could be large--Seth knows what he's doing to himself (in this case, I believe it was some choice of which oil he was drinking every day) and I was thinking that this could have a huge effect on his self-measurements. I spent awhile trying to think of a way that Seth could randomize his treatment, but it wasn't easy--Seth was living alone at the time, and there wasn't anyone who could conveniently do it for him--and for reasons having to do with the effects that Seth was expecting to see, a simple randomization wouldn't work. (Seth was expecting results to last over several days, so a randomization by day wouldn't do the trick. But randomizing weeks wouldn't do either, because then you're losing independence of the daily measurements, if Seth guesses (or thinks he can guess) the new treatment on the day of the switch.) It would've been so so easy to do it using a friend, but not at all easy to do alone.

A prediction

What it takes

From a recent email exchange with a collaborator on a paper that a bunch of us are working on:

Yes, it's definitely a methodology paper. But, given that we don't have any theorems or simulation studies, the motivation for the methodology has to come from the application, no?

A few days ago, I suggested that we could invert the usual forecast-the-election-from-the-economy rule and instead use historical election returns to make inferences about past economic trends.

Bob Erikson is skeptical. He writes:

It is an interesting idea but I don't think the economics-vote connection is strong enough to make it work. At best economics explains no more than "half the variance" and often less. Like I [Bob] am on record as saying the economy has little to do with midterm elections (AJPS 1990) unlike prez elections.

Damn. It's such a cute idea, though, I still want to give it a try.
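As a back-of-the-envelope illustration of what the inversion would look like (synthetic data and a made-up vote share, just to show the mechanics):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: income growth (%) and incumbent-party vote share (%)
# for elections where both are observed.
growth = rng.normal(2.0, 1.5, size=30)
vote = 46 + 3.0 * growth + rng.normal(0, 2.5, size=30)

# Fit the usual forecasting regression: vote share on economic growth.
slope, intercept = np.polyfit(growth, vote, 1)

# Invert it to hindcast growth in a year where only the election return is known.
observed_vote = 53.2   # hypothetical historical vote share
hindcast_growth = (observed_vote - intercept) / slope
print(f"implied income growth: {hindcast_growth:.1f}%")

Bob's point shows up right away in this calculation: if the economy explains only half the variance of the vote, then dividing by the fitted slope inflates all the remaining noise, and any historical hindcast would come with embarrassingly wide error bars.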

Some thoughts on final exams

I just finished grading my final exams--see here for the problems and the solutions--and it got me thinking about a few things.

#1 is that I really really really should be writing the exams before the course begins. Here's the plan (as it should be):
- Write the exam
- Write a practice exam
- Give the students the practice exam on day 1, so they know what they're expected to be able to do, once the semester is over.
- If necessary, write two practice exams so that you have more flexibility in what should be on the final.

The students didn't do so well on my exam, and I totally blame myself: they didn't have a sense of what to expect. I'd given them weekly homework, but these were a bit different than the exam questions.

My other thought on exams is that I like to follow the principles of psychometrics and have many short questions testing different concepts, rather than a few long, multipart essay questions. When a question has several parts, the scores on these parts will be positively correlated, thus increasing the variance of the total.
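To spell out the variance point: if the total score is the sum of part-scores X_1, ..., X_k, then

\[
\operatorname{Var}\Bigl(\sum_{i=1}^{k} X_i\Bigr) \;=\; \sum_{i=1}^{k} \operatorname{Var}(X_i) \;+\; 2 \sum_{i<j} \operatorname{Cov}(X_i, X_j),
\]

so positive covariances among the parts of a single multipart question inflate the variance of the total, compared to the same number of points spread over short questions that test different concepts.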

More generally, I think there's a tradeoff in effort. Multi-part essay questions are easier to write but harder to grade. We tend to find ourselves in a hurry when it's time to write an exam, but we end up increasing our total workload by writing these essay questions. Better, I think, to put in the effort early to write short-answer questions that are easier to grade and, I believe, provide a better evaluation of what the students can do. (Not that I've evaluated that last claim; it's my impression based on personal experience and my casual reading of the education research literature. I hope to do more systematic work in this area in the future.)

I just graded the final exams for my first-semester graduate statistics course that I taught in the economics department at Sciences Po.

I posted the exam itself here last week; you might want to take a look at it and try some of it yourself before coming back here for the solutions.

And see here for my thoughts about this particular exam, this course, and final exams in general.

Now on to the exam solutions, which I will intersperse with the exam questions themselves:

Kevin Spacey famously said that the greatest trick the Devil ever pulled was convincing the world he didn't exist. When it comes to The Search for Certainty, a new book on the philosophy of statistics by mathematician Krzysztof Burdzy, the greatest trick involved was getting a copy into the hands of Christian Robert, who trashed it on his blog and then passed it on to me.

The flavor of the book is given by this quotation from the back cover: "Similarly, the 'Bayesian statistics' shares nothing in common with the 'subjective philosophy of probability.'" We actually go on and on in our book about how Bayesian data analysis does not rely on subjective probability, but . . . "nothing in common," huh? That's pretty strong.

Rather than attempt to address the book's arguments in general, I will simply do two things. First, I will do a "Washington read" (as Yair calls it) and see what Burdzy says about my own writings. Second, I will address the question of whether Burdzy's arguments will have any effect on statistical practice. If the answer to the latter question is no, we can safely leave the book under review to the mathematicians and philosophers, secure in the belief that it will do little mischief.

This is pretty funny. And, to think that I used to work there. This guy definitely needs a P.R. consultant. I've seen dozens of these NYT mini-interviews, and I don't think I've ever seen someone come off so badly. The high point for me was his answering a question about pay cuts by saying that he's from Philadelphia. I don't know how much of this is sheer incompetence and how much is coming from the interviewer (Deborah Solomon) trying to string him up. Usually she seems pretty gentle to her interview subjects. My guess is what happened is her easygoing questions lulled Yudof into a false sense of security, he got too relaxed, and he started saying stupid things. Solomon must have been amazed by what was coming out of his mouth.

P.S. The bit about the salary was entertaining too. I wonder if he has some sort of deal like sports coaches do, so that even if they fire him, they have to pay out X years on his contract.

Question on propensity score matching

Ban Chuan Cheah writes:

I'm trying to learn propensity score matching and used your text as a guide (pg 208-209). After creating the propensity scores, the data is matched and after achieving covariate balance the treatment effect is estimated by running a regression on the treatment variable and some other covariates. The standard error of the treatment effect is also reported - in the book it is 10.2 (1.6).
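For readers trying to follow along, here's a rough sketch of the workflow being described; the data file, variable names, and the crude nearest-neighbor matching are stand-ins for illustration, not the book's actual code:

import pandas as pd
import statsmodels.api as sm

# Hypothetical data: 'treat' is the treatment indicator, 'y' the outcome,
# and the x's are pre-treatment covariates.
df = pd.read_csv("study_data.csv")
covs = ["x1", "x2", "x3"]

# Step 1: estimate propensity scores with a logistic regression.
X = sm.add_constant(df[covs])
df["pscore"] = sm.Logit(df["treat"], X).fit(disp=0).predict(X)

# Step 2: crude one-to-one nearest-neighbor matching on the propensity score,
# with replacement; a real analysis would also check covariate balance here.
treated = df[df["treat"] == 1]
control = df[df["treat"] == 0]
matched_idx = [(control["pscore"] - p).abs().idxmin() for p in treated["pscore"]]
matched = pd.concat([treated, control.loc[matched_idx]])

# Step 3: estimate the treatment effect (and its standard error) by regressing
# the outcome on the treatment indicator and covariates in the matched sample.
fit = sm.OLS(matched["y"], sm.add_constant(matched[["treat"] + covs])).fit()
print(fit.params["treat"], fit.bse["treat"])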

We all know, following the research of Rosenstone, Hibbs, Erikson, and others, that economic conditions can predict vote swings at state and national levels.

But, what about the reverse? Could we deduce historical economic conditions from election returns? Instead of forecasting elections from the economy, we could hindcast the economy from elections.

Would this make sense as a way of studying local and regional economic conditions in the U.S. in the 1800s, for example? I could imagine that election data are a lot easier to come by than economic data.

P.S. Don't forget that there have been big changes over time in our impressions of the ability of presidents to intervene successfully in the economy.

Patterson update

I went to the library and took a look at a book by James Patterson. It was pretty much the literary equivalent of a TV cop show. I couldn't really see myself reading it all the way through, but it was better-written than I'd expected. It's hard for me to see why Patterson wants to keep doing it (even if his coauthors are doing most of the work at this point). But I suppose that, once you're on the bestseller list, it's a bit addictive and you want to stay up there.

Today I faced some tedious work on a project that must be finished by the end of the week, so my procrastination methods reached new heights of creativity. For the first time, I clicked on the "Most Popular" tab at the top of the NY Times website. This gives me another opportunity for procrastination, by typing this blog post, because I noticed something surprising: There's not much overlap between the 10 "most e-mailed" and the 10 "most blogged" recent stories. Only 3 stories are on both "top 10" lists...which is to say, 7 of the most e-mailed stories are not among those that drew the attention of the most bloggers, and 7 of the most-blogged stories didn't make the cut for most emailers. I don't know if this is typical -- maybe this is an unusual week -- but I find it surprising. If a story seems like the kind of thing that would interest your friends, wouldn't it also be a good one to blog about? Does the difference simply reflect demographics? Perhaps bloggers are younger, and are interested in different stories than non-bloggers?

It's not 1933, it's 1930

A major storyline of the 2008 election was that it was the Great Depression all over again: George W. Bush was the hapless Herbert Hoover and Barack Obama was the FDR figure, coming in on a wave of popular resentment to clean things up. The stock market crash made the parallels pretty direct. One could continue the analogy, with Bill Clinton playing the Calvin Coolidge role, mindlessly stoking the paper economy and complicit in the rise of the stock market as a national sport. Public fascination with various richies seemed very 1920s-ish, and we had lots of candidates for the "Andrew Mellon" of the 2000s. Obama's decisive victory echoed Roosevelt's in 1932.

But history doesn't really repeat itself--or if it does, it's not always quite the repetition that was expected. With his latest plan of a spending freeze (on the 17% of the federal budget that is not committed to the military, veterans, homeland security and international affairs, Social Security, or Medicare), Obama is being labeled by many liberals as the second coming of Herbert Hoover--another well-meaning technocrat who can't put together a political coalition to do anything to stop the slide. Conservatives, too, may have switched from thinking of Obama as a scary realigning Roosevelt to viewing him as a Hoover from their own perspective--as a well-meaning fellow who took a stock market crash and made it worse through a series of ill-timed government interventions.

I can see the future debates already: was Obama a Hoover who dithered while the economy burned, too little and too late (the Krugman version), or a Hoover who hindered the ability of the economy to recover on its own by pushing every button he could find on the national console (the Chicago-school version)?

In either storyline, it's 1930, not 1932: rather than being three years into a depression, we're still just getting started and we're still in the Hoover-era position of seeing things fall apart but not quite being ready to take the next step.

Anyway, I'm not claiming to offer any serious political or economic analysis here, just pointing out that the 1932 election was a full three years after the 1929 stock market crash, so Obama's stepping into the story at a different point than when Roosevelt stepped in to his.

Or maybe we're still on track for Obama to "do a Reagan": ride out the recession through the off-year election and sit tight as the economy returns in years 3 and 4.

Tufte recommendation

A former student writes:

I'm going to get a Tufte book. Do you recommend "The Visual Display of Quantitative Information" or "Envisioning Information?"

My reply: My favorite is his second book, Envisioning Information. His first book was his breakthrough but the second book is the one that I learned the most from, myself.

P.S. I don't know if this counts as a 3-star thread.

What can search predict?

You've all heard about how you can predict all sorts of things, from movie grosses to flu trends, using search results. I earlier blogged about the research of Yahoo's Sharad Goel, Jake Hofman, Sebastien Lahaie, David Pennock, and Duncan Watts in this area. Since then, they've written a research article.

Here's a picture:

sharadsearch.png

And here's their story:

We [Goel et al.] investigate the degree to which search behavior predicts the commercial success of cultural products, namely movies, video games, and songs. In contrast with previous work that has focused on realtime reporting of current trends, we emphasize that here our objective is to predict future activity, typically days to weeks in advance. Specifically, we use query volume to forecast opening weekend box-office revenue for feature films, first month sales of video games, and the rank of songs on the Billboard Hot 100 chart. In all cases that we consider, we find that search counts are indicative of future outcomes, but when compared with baseline models trained on publicly available data, the performance boost associated with search counts is generally modest--a pattern that, as we show, also applies to previous work on tracking flu trends.

The punchline:

We [Goel et al.] conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries may provide a useful guide to the near future.

I like how they put this. My first reaction upon seeing the paper (having flipped through the graphs and not read the abstract in detail) was that it was somewhat of a debunking exercise: Search volume has been hyped as the greatest thing since sliced bread, but really it's no big whoop, it adds almost no information beyond a simple forecast. But then my thought was that, no, this is a big whoop, because, in an automatic computing environment, it could be a lot easier to gather/analyze search volume than to build those baseline models.

Sharad's paper is cool. My only suggestion is that, in addition to fitting the separate models and comparing, they do the comparison on a case-by-case basis. That is, what percentage of the individual cases are predicted better by model 1, model 2, or model 3, and what is the distribution of the difference in performance? I think they're losing something by only doing the comparisons in aggregate.
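To be concrete, the case-by-case comparison could be as simple as the following R sketch, where pred1 and pred2 are hypothetical vectors of predictions from two of the models and y is the vector of observed outcomes:

# absolute prediction error for each individual case under each model
err1 <- abs(y - pred1)
err2 <- abs(y - pred2)

# share of individual cases predicted better by each model
mean(err1 < err2)  # proportion of cases where model 1 wins
mean(err2 < err1)  # proportion of cases where model 2 wins

# distribution of the case-by-case difference in performance
hist(err1 - err2, main = "Model 1 error minus model 2 error")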

It also might be good if they could set up some sort of dynamic tracker that could perform the analysis in this paper automatically, for thousands of outcomes. Then in a year or so they'd have tons and tons of data. That would take this from an interesting project to something really cool.

Alex Lundry sent along this presentation. As some of you know, I hate videos, so I didn't actually look at this, but it seems to combine two of my main interests, so I thought it might interest some of you too. If you like it (or you don't), feel free to say so in the comments.

The man with the golden gut

Seth links to this fascinating article by Jonathan Mahler about the popular novelist James Patterson:

Last year, an estimated 14 million copies of his books in 38 different languages found their way onto beach blankets, airplanes and nightstands around the world. Patterson may lack the name recognition of a Stephen King, a John Grisham or a Dan Brown, but he outsells them all. Really, it's not even close. (According to Nielsen BookScan, Grisham's, King's and Brown's combined U.S. sales in recent years still don't match Patterson's.) This is partly because Patterson is so prolific: with the help of his stable of co-authors, he published nine original hardcover books in 2009 and will publish at least nine more in 2010.

Patterson has written in just about every genre -- science fiction, fantasy, romance, "women's weepies," graphic novels, Christmas-themed books. He dabbles in nonfiction as well. In 2008, he published "Against Medical Advice," a book written from the perspective of the son of a friend who suffers from Tourette's syndrome.

More than Grisham, King, and Brown combined: that really is pretty impressive. The sixty-something Patterson has written 35 New York Times #1 best sellers but doesn't seem to have too much of a swelled head:

A new kind of spam

As a way of avoiding work, I check the comments on this blog and decide which to approve and which to send to the spam folder. (Lots of stuff gets sent directly to spam; these are almost 100% classified correctly and I basically never need to check there.)

There are different kinds of spam, but I can typically spot it: it's close to content-free, with a link to a site that is selling something. I don't mind if you're a statistical consultant and you link to your consulting site, but, no, if you submit a comment with a link to some discount DVD site or whatever, yes, you're going straight to the spam filter.

Today, though, I got a new kind of spam: it looked just like the usual stuff but there was no URL, either in the message or in the regular URL field. I can't figure out why somebody would bother to do this.

More on the estimation of war deaths

Following up on our recent discussion (see also here) about estimates of war deaths, Megan Price pointed me to this report, where she, Anita Gohdes, and Patrick Ball write:

Several media organizations including Reuters, Foreign Policy and New Scientist covered the January 21 release of the 2009 Human Security Report (HSR) entitled, "The Shrinking Cost of War." The main thesis of the HSR authors, Andrew Mack et al, is that "nationwide mortality rates actually fall during most wars" and that "today's wars rarely kill enough people to reverse the decline in peacetime mortality that has been underway in the developing world for more than 30 years." . . . We are deeply skeptical of the methods and data that the authors use to conclude that conflict-related deaths are decreasing. We are equally concerned about the implications of the authors' conclusions and recommendations with respect to the current academic discussion on how to count deaths in conflict situations. . . .

The central evidence that the authors provide for "The Shrinking Cost of War" is delivered as a series of graphs. There are two problems with the authors' reasoning.

From blogging legend Phil Nugent:


If Scott Brown wins, I [Nugent] suspect that it will have less to do with a massive swing to the right in the bosom of liberalism than with a tendency there to vote against the repulsive and inept candidate in favor of the one who seems Kennedyesque, no matter whether he belongs to the Kennedys' party or not. On the other hand, if the election comes down to a squeaker that finds Coakley victorious, it'll probably be because the last minute media explosion, complete with the sight of all those gleeful Republicans turning cartwheels in the end zone, alerted voters to the strategic importance of holding their noses and voting for the monster over the centerfold.

Andrew Sullivan links to this amusing study [link fixed]. The whole blog is lots of fun--I've linked to it before--and it illustrates an important point in statistics, which I've given as the title of this blog entry.

P.S. I'm not trying to say that statistical methodology is a waste of time. Good methods--and I include good graphical methods in this category--allow us to make use of more data. If all you can do is pie charts and chi-squared tests (for example), you won't be able to do much.

Alan Turing is said to have invented a game that combines chess and middle-distance running. It goes like this: You make your move, then you run around the house, and the other player has to make his or her move before you return to your seat. I've never played the game but it sounds like fun. I've always thought, though, that the chess part has got to be much more important than the running part: the difference in time between a sprint and a slow jog is small enough that I'd think it would always make sense just to do the jog and save one's energy for the chess game.

But when I was speaking last week at the University of London, Turing's chess/running game came up somehow in conversation, and somebody made a point which I'd never thought of before, that I think completely destroys the game. I'd always assumed that it makes sense to run as fast as possible, but what if you want the time to think about a move? Then you can just run halfway around the house and sit for as long as you want.

It goes like this. You're in a tough spot and want some time to think. So you make a move where the opponent's move is pretty much obvious, then you go outside and sit on the stoop for an hour or two to ponder. Your opponent makes the obvious move and then has to sit and wait for you to come back in. Sure, he or she can plan ahead, but with less effectiveness than you because of not knowing what you're going to do when you come back in.

So . . . I don't know if anyone has actually played Turing's running chess game, but I think it would need another rule or two to really work.

Quick summary of statistics for media

This looks interesting; too bad I'm not around to hear it:

Book titles

My collaborators and I have had some successes and some failures; here are some stories, with the benefit of (varying degrees of) hindsight.

"Bayesian Data Analysis." We thought a lot about this one. It was my idea to use the phrase "data analysis": the idea was that "inference" is too narrow (being only one of the three data analysis steps of model-building, inference, and model checking) and "statistics" is too broad (seeing as it also includes design and decision making as well as data analysis). I hadn't thought of the way that BDA sounds like EDA but that came out well, even though the first edition of BDA was pretty weak on the EDA stuff--we fit more of that into the second edition (in chapter 6 and even in the cover). Beyond this, I was never satisfied with "Bayes" in the title--it seemed, and still seems, too jargony and not descriptive enough for me. I'd prefer something like "Data Analysis Using Probability Models" or even "Data Analysis Using Generative Models" (to use a current buzzword that, yes, may be jargon but is also descriptive). But we eventually decided (correctly, I think) that we had to go with Bayes because it's such a powerful brand name. Every once in awhile I see the phrase "Bayesian data analysis" used generically, not in reference to our book, and when this happens it always makes me happy; I think the statistical world is richer to have this phrase rather than the formerly-standard "Bayesian inference" (which, as noted above, misses some big issues).

"Teaching Statistics: A Bag of Tricks." Should've been called "Learning Statistics: A Bag of Tricks." Only a few people want to teach statistics; lots of people want to learn it. And, ultimately, a book of teaching methods is really a book of learning methods. Also, many people have told me that they've bought the book and read it. I actually think it's had more effect from people reading it than from people using it in their classes. Sort of like one of those golf books that people put by their bedside and read even if they don't get around to practicing and following all the instructions.

"Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives." The title seems fine, but something went wrong in the promotion of this book. Xiao-Li and I collected some excellent articles and put a huge amount of effort into editing them. I think the book is great but it hasn't sold a lot. Perhaps we should've structured it slightly differently so it could've been used as a course book? And of course we shouldn't have published with Wiley, who are notorious for pricing their books too high. (I notice they now charge $132 (!) for Feller's famous book on probability theory.) Why did we go with Wiley? At the time, Xiao-Li and I thought it would be difficult to find a publisher so we didn't really try shopping it around. In retrospect, we didn't fully realize how great our book was; we were satisfied just to get it out there without thinking clearly about what would happen next.

"Data Analysis Using Regression and Multilevel/Hierarchical Models." The awkward "Multilevel/Hierarchical" thing is Phil's fault: I wanted to go with "multilevel" (because I felt, and still feel, that "hierarchical" can be seen as implying nested models, and it was very important for me in this book to go beyond the simple identification of multilevel models with simple hierarchical designs and data structures), but Phil pointed out that "hierarchical" is a much more standard word than "multilevel" (for example, "hierarchical model" gets four times as many Google hits as "multilevel model"). So I did the awkward think and kept both words. (And Jennifer was fine with this too.) Also we needed to put Regression in there because a multilevel model is really just regression with a discrete predictor. And Data Analysis for the reasons described above. The book has sold well so the title doesn't seem to have hurt it any.

"Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do." I think this was a mistake. First, as some people have pointed out and as we realized even at the time, we don't actually say why Americans vote the way they do. I really wish we had chosen our other candidate subtitle, "How Americans are Polarized and How They're Not." Beyond this, I'm actually down on the whole Red State, Blue State thing. Sure, it's grabby, but I fear it makes the book seem less serious. Given that we didn't become the next Freakonomics and we didn't sell a zillion copies, if I could go back in time I'd give it a more serious title, such as, hmmm..., "Geographic and Demographic Polarization in American Poliitcs"--no, that's too serious-sounding. Maybe "Democrats and Republicans: Who They Are, Where They Live, and Where They Stand on the Issues." Or "American Voters, Red and Blue: Who They Are, Where They Live, and Where They Stand on the Issues." Something that is a bit grabby but conveys more of our research content. (Many people were misled by our title into thinking the book was merely a retread of our Red State, Blue State article, but really it was full of original research that, to this date, has still only appeared in the book.)

"A Quantitative Tour of the Social Sciences." I can't imagine a better title for this one. And I love the book, too. In addition to having wonderful content, it has a great cover that was contributed by a blog commenter (who I still have to send a free book to; sorry!). We've gotta do a better job of promoting it, but I'm not quite sure how. Here's a nice review.

I have a few more books in (various stages of) the pipeline, but I'll hold off telling you their titles until they're closer to done.

I remember many years ago being told that political ideologies fall not along a line but on a circle: if you go far enough to the extremes, left-wing communists and right-wing fascists end up looking pretty similar.

I was reminded of this idea when reading Christian Robert and George Casella's fun new book, "Introducing Monte Carlo Methods with R."

I do most of my work in statistical methodology and applied statistics, but sometimes I back up my methodology with theory or I have to develop computational tools for my applications. I tend to think of this sort of ordering:

Probability theory - Theoretical statistics - Statistical methodology - Applications - Computation

Seeing this book, in which two mathematical theorists write all about computation, makes me want to loop this line in a circle. I knew this already--my own single true published theorem is about computation, after all--but I tend to forget. In some way, I think that computation--more generally, numerical analysis--has taken some of the place in academic statistics that was formerly occupied by theorem-proving. I think it's great that many of our more mathematical-minded probabilists and statisticians can follow their theoretical physicist colleagues and work on computational methods. I suspect that applied researchers such as myself will get much more use out of theory as applied to computation, as compared to traditionally more prestigious work on asymptotic inference, uniform convergence, mapping the rejection regions of hypothesis tests, M-estimation, three-armed bandits, and the like.

Don't get me wrong--I'm not saying that computation is the only useful domain for statistical theory, or anything close to that. There are lots of new models to be built and lots of limits to be understood. Just, for example, consider the challenges of using sample data to estimate properties of a network. Lots of good stuff to do all around.

Anyway, back to the book by Robert and Casella. It's a fun book, partly because they resist the impulse to explain everything or to try to be comprehensive. As a result, reading the book requires the continual solution of little puzzles (as befits a book that introduces its chapters with quotations from detective novels). I'm not sure if this was intended, but it makes it a much more participatory experience, and I think for that reason it would also be an excellent book for a course on statistical computing.

Charles Warne writes:

A colleague of mine is running logistic regression models and wants to know if there's any sort of a test that can be used to assess whether a coefficient of a key predictor in one model is significantly different to that same predictor's coefficient in another model that adjusts for two other variables (which are significantly related to the outcome). Essentially she's wanting to statistically test for confounding, and while my initial advice was that a single statistical test isn't really appropriate since confounding is something that we make an educated judgement about given a range of factors, she is still keen to see if this can be done. I read your 2006 article with Hal Stern "The difference between 'significant' and 'not significant' is not itself statistically significant" which included the example (p. 328) where evidence for a difference between the results of two independent studies was assessed by summing the squares of the standard errors of each and taking the square root to give the standard error of the difference (se=14). My question is whether this approach can be applied to my colleague's situation, given that both logistic regression models are based on the same sample of individuals and therefore are not independent? Is there an adjustment that can be used to produce more accurate standard errors for non-independent samples, or should I not be applying this approach at all? Is there a better way this problem could be tackled?

My reply: No, you wouldn't want to take the two estimates and treat them as if they were independent. My real question, though, is why your colleague wants to do this in the first place. It's not at all clear what question such an analysis would be answering.
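If she does push ahead anyway, one way to respect the dependence between the two fits, rather than treating them as independent, is a paired bootstrap: refit both models on each resampled dataset and look at the distribution of the difference in the coefficient of interest. Here is a rough R sketch, with a hypothetical data frame dat containing the binary outcome y, the key predictor x, and the two adjustment variables z1 and z2:

n_boot <- 2000
diffs <- replicate(n_boot, {
  d <- dat[sample(nrow(dat), replace = TRUE), ]  # resample individuals, keeping the two fits paired
  b1 <- coef(glm(y ~ x, family = binomial, data = d))["x"]
  b2 <- coef(glm(y ~ x + z1 + z2, family = binomial, data = d))["x"]
  b1 - b2
})
quantile(diffs, c(0.025, 0.975))  # interval for the change in the coefficient across the two models

This gives an uncertainty interval for the difference that accounts for the shared sample, but it doesn't answer the more basic question of what the comparison would be telling us.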

P.S. Warne adds:

My final exam

I'm not particularly proud of this one, but I thought it might interest some of you in any case. It's the final exam for the course I taught this fall to the economics students at Sciences Po. Students were given two hours.

Overexposure

Thinking about Erma Bombeck, I'm reminded of the whole "overexposure" phenomenon. Some people get overexposed but it's still ok. The classic example is Michael Jackson: no matter what, people still think Billie Jean and the rest are cool. And somehow Dave Barry managed to hit the stratosphere without getting that "overexposed" vibe. But Bombeck had more of the classic pattern: at first, she was this exciting new thing--I remember when we got The Grass Is Always Greener Over The Septic Tank out of the library--then, somewhere along the way, she became tacky. I guess it would make sense to go reread The Grass is Always Greener and see if it's still funny. I think I'd still think Art Buchwald's old columns are funny, but who knows.

And then there's Erle Stanley Gardner. I have no sense whether he was "overexposed" or just had his deserved period of popularity which naturally ended.

Boris writes, regarding the recent U.S. Senate election (in which moderate Republican Scott Brown narrowly beat liberal Democrat Martha Coakley in usually reliably-Democratic Massachusetts):

I [Boris] disagree with Josh Tucker that the election isn't that consequential. First, the pivotal Senator will now be a Republican, not a Democrat. The parties put a lot of pressure on moderate members of Congress to vote one way or the other; it's often unsuccessful, but it's a pretty powerful source of influence. Second, that pivotal Senator will be Brown, not Snowe (if my prediction proves accurate). Finally, this pivotality will exist on every issue, not just health care reform, which probably just expired in its current form. Not too shabby as a consequential election, right?

Based upon his voting record in the Massachusetts State Senate as well as the Votesmart surveys of MA state legislators (including his own from 2002), I [Boris] estimate that Brown is to the left of the leftmost Republican in the Senate, Olympia Snowe of Maine, and to the right of the rightmost Democrat in the Senate, Ben Nelson of Nebraska. Just as important, Brown stands to become the pivotal member of the Senate--that is, the 60th least liberal (equivalently, the 40th most conservative)--a distinction previously held by Nelson.

More here.

I posted a note on the other blog about the difference between internal and external coherence of political ideology. The basic idea is that a particular person or small group can have an ideology (supporting positions A, B, C, and D, for example) that is perfectly internally coherent--that is, all these positions make sense given the underlying ideology--while being incoherent with other ideologies (for example, those of people who support positions A, B, not-C, and not-D). What's striking to me is how strongly people can feel that their beliefs on a particular issue flow from their being a liberal, or a conservative, or whatever, even though others with similar opinions will completely disagree with them on that issue.

Stephen Dubner reports on an observational study of bike helmet laws, a study by Christopher Carpenter and Mark Stehr that compares bicycling and accident rates among children across states that did and did not have helmet laws. In reading the data analysis, I'm reminded of the many discussions Bob Erikson and I have had about the importance, when fitting time-series cross-sectional models, of figuring out where your identification is coming from (this is an issue that's come up several times on this blog)--but I have no particular reason to doubt the estimates, which seem plausible enough. The analysis is clear enough, so I guess it would be easy enough to get the data, fit a hierarchical model, and, most importantly, make some graphs of what's happening before and after the laws, to see what's going on in the data.

Beyond this, I had one more comment, which is that I'm surprised that Dubner found it surprising that helmet laws seem to lead to a decrease in actual bike riding. My impression is that when helmet laws are proposed, this always comes up: the concern that if people are required to wear helmets, they'll just bike less. Hats off to Carpenter and Stehr for estimating this effect in this clever way, but it's certainly an idea that's been discussed before. In this context, I think it would be useful to think in terms of sociology-style models of default behaviors as well as economics-style models of incentives.

I read this report by Matthew Yglesias that Blue Cross/Blue Shield is "covertly backing far-right efforts to get health reform declared unconstitutional." I don't want to get into a discussion about whether these efforts are really "far-right"--I know next to nothing about the politics of the health reform battle.

What I really wanted to convey here was my first reaction upon seeing this, which was: Blue Cross/Blue Shield?? I remember this organization from the 70s, when it was my vague impression that Blue Cross was synonymous with "health insurance." I've always thought of it as a quasi-public organization, a sort of default health plan. I mean, sure, they're a private organization, so I assume that, just like the gas company and the electric company and the phone company, they're probably top-heavy with overpaid executives who don't do anything while earning ten times what they'd get on the federal scale. Whatever. That's the system we have here: people who work for quasi-public companies get a soft deal.

I was surprised, though, to hear about Blue Cross doing such strong lobbying. Sort of similar to the reaction I had seeing the percentage of political contributions from employees at Harvard etc. that went to the Democrats. I mean, sure, employees of Harvard have the right to give to whoever they want, but, still, there's something funny about a quasi-public institution such as Harvard (or Blue Cross) leaning so strongly on one side of the debate.

I don't really know if I should think of any of this as a problem; it just seems strange to think of Blue Cross as sponsoring a covert political agenda. It almost sounds like something from one of those '60s parody spy movies, where the bad guys aren't the Russians or ex-Nazis or whatever, but . . . Blue Cross!

I like paperback books that fit in my pocket. Unfortunately, about 25 years ago they pretty much stopped printing books in that size. Usually the closest you can get are those big floppy "trade paperbacks" or, in the case of the occasional Stephen King-type bestseller, a thick-as-a-brick paperback with big printing and fat pages.

It's not my place to question book marketers. My best theory is that book prices went up, for whatever reason, and then people wanted to feel like they're getting their money's worth: instead of a little pocket book for $2.95, you get the trade paperback for $16.95. Personally, I'd prefer the little book--whether or not I'm paying $16.95--but probably others feel differently. It's sort of like the way they'll sell you 50 aspirins in a bottle that would hold 200, and so forth.

Anyway, I pretty much have to get my pocket books used. I was in a used bookstore the other day and bought Killing Time (1961) by Donald E. Westlake, an author whom I've referred to before as the master of the no-redeeming-social-value thriller. This book was pretty good, and, on top of that, it actually had some redeeming social value.

I'll get back to this point in a moment, but first I wanted to say that one of the funnest things about reading a book from fifty years ago is to get a sense of how things used to be. Killing Time takes place in a small East Coast town which is dominated by a few local bigwigs. I imagine there used to be a lot of places like this in the old days but not so much any more, now that not so many people work in factories, and local ties are weaker. It reminded me of when I watched a bunch of Speed Racer cartoons with Phil in a movie theater in the early 90s. These were low-budget Japanese cartoons from the 60s that we loved as kids. From my adult perspective, the best parts were during the characters' long drives, where you could see Japanese industrial scenes in the background.

OK, now back to the "redeeming social value" thing. In Killing Time, Westlake takes the traditional Philip Marlowe private eye scenario and turns it inside out. The main character of the book (named Smith--make of that what you will) follows the standard pattern: he's outwardly cynical, just wanting to live his life and get by, but underlying this he has a philosophy of government that you might call "realistic idealism" or "idealistic realism." In the book, some reformers from the state capital come to town with the goal of exposing corruption, but private eye Smith doesn't want to go along with this: in his view, the reformers are naive, society has a balance, and it's best to keep things on an even keel. There's a crucial scene about two-thirds of the way through the book, though, where I suddenly realized (through the words of another character) how Smith's apparent cynicism is an extreme form of idealism. And then when I got to the end of the book, I had a sense of the explosive internal contradictions inherent in the standard "private eye" view of the world.

What I can't figure out is how anybody could write a private eye story with a straight face after reading the Westlake book. To me, it really closes the door on the genre. It's the Watchmen of private eye novels.

P.S. An interesting thing about Westlake is that he has not, I believe, ever had a breakout bestseller. I don't know what it takes to get such success, but I don't think it ever happened to him. He had many books made into movies, though, so I'm sure he did just fine financially.

P.P.S. Don't get me wrong, it's not like I'm saying Westlake is some sort of unrecognized literary master. He has great plots and settings and charming characters, but nothing I've ever read of his has the emotional punch of, say, Scott Smith's A Simple Plan (to choose a book whose plot would fit well into the Westlake canon).

It's the Gatsby seminar in the Computational Neuroscience Unit at University College London, Mon 18 Jan at 4pm:

Creating structured and flexible models: some open problems

A challenge in statistics is to construct models that are structured enough to be able to learn from data but not be so strong as to overwhelm the data. We introduce the concept of "weakly informative priors" which contain important information but less than may be available for the given problem at hand. We also discuss some related problems in developing general models for taxonomies and deep interactions. We consider how these ideas apply to problems in social science and public health. If you don't walk out of this talk a Bayesian, I'll eat my hat.

P.S. Link updated.

Nate does Bayes

The classical statisticians among you can call it a measurement-error model. Whatever.

Bayesian statistics then and now

The following is a discussion of articles by Brad Efron and Rob Kass, to appear in the journal Statistical Science. I don't really have permission to upload their articles, but I think (hope?) this discussion will be of general interest and will motivate some of you to read the others' articles when they come out. (And thanks to Jimmy and others for pointing out typos in my original version!)

It is always a pleasure to hear Brad Efron's thoughts on the next century of statistics, especially considering the huge influence he's had on the field's present state and future directions, both in model-based and nonparametric inference.

Three meta-principles of statistics

Before going on, I'd like to state three meta-principles of statistics which I think are relevant to the current discussion.

First, the information principle, which is that the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use. Good methods make use of more information. This can come in different ways: in my own experience (following the lead of Efron and Morris, 1973, among others), hierarchical Bayes allows us to combine different data sources and weight them appropriately using partial pooling. Other statisticians find parametric Bayes too restrictive: in practice, parametric modeling typically comes down to conventional models such as the normal and gamma distributions, and the resulting inference does not take advantage of distributional information beyond the first two moments of the data. Such problems motivate more elaborate models, which raise new concerns about overfitting, and so on.
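To make the partial-pooling idea concrete, here is a tiny R sketch, using lme4's approximate (empirical Bayes style) fit as a stand-in for a full hierarchical Bayesian analysis; the data frame dat, with outcome y and grouping variable group, is hypothetical:

library(lme4)

# complete pooling: a single common mean, ignoring the groups entirely
fit_pool <- lm(y ~ 1, data = dat)

# no pooling: a separate mean for each group, noisy for small groups
fit_nopool <- lm(y ~ factor(group) - 1, data = dat)

# partial pooling: group means shrunk toward the overall mean, with the
# amount of shrinkage estimated from the data
fit_partial <- lmer(y ~ 1 + (1 | group), data = dat)
coef(fit_partial)$group  # partially pooled group-level estimates

The partially pooled estimates sit between the no-pooling and complete-pooling extremes, with small or noisy groups pulled more strongly toward the overall mean.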

As in many areas of mathematics, theory and practice leapfrog each other: as Efron notes, empirical Bayes methods have made great practical advances but "have yet to form into a coherent theory." In the past few decades, however, with the work of Lindley and Smith (1972) and many others, empirical Bayes has been folded into hierarchical Bayes, which is part of a coherent theory that includes inference, model checking, and data collection (at least in my own view, as represented in chapters 6 and 7 of Gelman et al, 2003). Other times, theoretical and even computational advances lead to practical breakthroughs, as Efron illustrates in his discussion of the progress made in genetic analysis following the Benjamini and Hochberg paper on false discovery rates.

My second meta-principle of statistics is the methodological attribution problem, which is that the many useful contributions of a good statistical consultant, or collaborator, will often be attributed to the statistician's methods or philosophy rather than to the artful efforts of the statistician himself or herself. Don Rubin has told me that scientists are fundamentally Bayesian (even if they don't realize it), in that they interpret uncertainty intervals Bayesianly. Brad Efron has talked vividly about how his scientific collaborators find permutation tests and p-values to be the most convincing form of evidence. Judea Pearl assures me that graphical models describe how people really think about causality. And so on. I'm sure that all these accomplished researchers, and many more, are describing their experiences accurately. Rubin wielding a posterior distribution is a powerful thing, as is Efron with a permutation test or Pearl with a graphical model, and I believe that (a) all three can be helping people solve real scientific problems, and (b) it is natural for their collaborators to attribute some of these researchers' creativity to their methods.

The result is that each of us tends to come away from a collaboration or consulting experience with the warm feeling that our methods really work, and that they represent how scientists really think. In stating this, I'm not trying to espouse some sort of empty pluralism--the claim that, for example, we'd be doing just as well if we were all using fuzzy sets, or correspondence analysis, or some other obscure statistical method. There's certainly a reason that methodological advances are made, and this reason is typically that existing methods have their failings. Nonetheless, I think we all have to be careful about attributing too much from our collaborators' and clients' satisfaction with our methods.

My third meta-principle is that different applications demand different philosophies. This principle comes up for me in Efron's discussion of hypothesis testing and the so-called false discovery rate, which I label as "so-called" for the following reason. In Efron's formulation (which follows the classical multiple comparisons literature), a "false discovery" is a zero effect that is identified as nonzero, whereas, in my own work, I never study zero effects. The effects I study are sometimes small but it would be silly, for example, to suppose that the difference in voting patterns of men and women (after controlling for some other variables) could be exactly zero. My problems with the "false discovery" formulation are partly a matter of taste, I'm sure, but I believe they also arise from the difference between problems in genetics (in which some genes really have essentially zero effects on some traits, so that the classical hypothesis-testing model is plausible) and in social science and environmental health (where essentially everything is connected to everything else, and effect sizes follow a continuous distribution rather than a mix of large effects and near-exact zeroes).

To me, the false discovery rate is the latest flavor-of-the-month attempt to make the Bayesian omelette without breaking the eggs. As such, it can work fine if the implicit prior is ok, it can be a great method, but I really don't like it as an underlying principle, as it's all formally based on a hypothesis testing framework that, to me, is more trouble than it's worth. In thinking about multiple comparisons in my own research, I prefer to discuss errors of Type S and Type M rather than Type 1 and Type 2 (Gelman and Tuerlinckx, 2000, Gelman and Weakliem, 2009, Gelman, Hill, and Yajima, 2009). My point here, though, is simply that any given statistical concept will make more sense in some settings than others.
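For readers who haven't seen Type S and Type M errors, here is a small simulation sketch in R of the basic idea; the true effect and standard error are made-up numbers for illustration. When the true effect is small relative to the noise, estimates that reach statistical significance have a nontrivial chance of having the wrong sign (Type S) and on average exaggerate the magnitude of the effect (Type M).

set.seed(123)
true_effect <- 0.1  # small but nonzero true effect (hypothetical)
se <- 0.5           # standard error of the estimate (hypothetical)
est <- rnorm(1e5, mean = true_effect, sd = se)  # many simulated replications of the study

signif <- abs(est) > 1.96 * se                 # replications that reach statistical significance
mean(sign(est[signif]) != sign(true_effect))   # Type S rate: wrong sign, given significance
mean(abs(est[signif])) / abs(true_effect)      # Type M factor: average exaggeration, given significance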

For another example of how different areas of application merit different sorts of statistical thinking, consider Rob Kass's remark: "I tell my students in neurobiology that in claiming statistical significance I get nervous unless the p-value is much smaller than .01." In political science, we're typically not aiming for that level of uncertainty. (Just to get a sense of the scale of things, there have been barely 100 national elections in all of U.S. history, and political scientists studying the modern era typically start in 1946.)

Progress in parametric Bayesian inference

I also think that Efron is doing parametric Bayesian inference a disservice by focusing on a fun little baseball example that he and Morris worked on 35 years ago. If he would look at what's being done now, he'd see all the good statistical practice that, in his section 10, he naively (I think) attributes to "frequentism." Figure 1 illustrates with a grid of maps of public opinion by state, estimated from national survey data. Fitting this model took a lot of effort which was made possible by working within a hierarchical regression framework--"a good set of work rules," to use Efron's expression. Similar models have been used recently to study opinion trends in other areas such as gay rights in which policy is made at the state level, and so we want to understand opinions by state as well (Lax and Phillips, 2009).

I also completely disagree with Efron's claim that frequentism (whatever that is) is "fundamentally conservative." One thing that "frequentism" absolutely encourages is for people to use horrible, noisy estimates out of a fear of "bias." More generally, as discussed by Gelman and Jakulin (2007), Bayesian inference is conservative in that it goes with what is already known, unless the new data force a change. In contrast, unbiased estimates and other unregularized classical procedures are noisy and get jerked around by whatever data happen to come by--not really a conservative thing at all. To make this argument more formal, consider the multiple comparisons problem. Classical unbiased comparisons are noisy and must be adjusted to avoid overinterpretation; in contrast, hierarchical Bayes estimates of comparisons are conservative (when two parameters are pulled toward a common mean, their difference is pulled toward zero) and less likely to appear to be statistically significant (Gelman and Tuerlinckx, 2000).

Another way to understand this is to consider the "machine learning" problem of estimating the probability of an event on which we have very little direct data. The most conservative stance is to assign a probability of ½; the next-conservative approach might be to use some highly smoothed estimate based on averaging a large amount of data; and the unbiased estimate based on the local data is hardly conservative at all! Figure 1 illustrates our conservative estimate of public opinion on school vouchers. We prefer this to a noisy, implausible map of unbiased estimators.
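As a toy version of that last comparison, here is a short R sketch; the counts and the prior weight are arbitrary numbers chosen for illustration. With only a few observations, the raw proportion jumps around wildly, while an estimate smoothed toward 1/2 barely moves:

y <- 1; n <- 3                    # say, 1 event observed in 3 trials (hypothetical)
y / n                             # unbiased local estimate: 0.33, very noisy
prior_mean <- 0.5; prior_n <- 10  # shrink toward 1/2 with the weight of 10 pseudo-observations
(y + prior_mean * prior_n) / (n + prior_n)  # smoothed estimate: about 0.46, far more conservative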

Of course, frequentism is a big tent and can be interpreted to include all sorts of estimates, up to and including whatever Bayesian thing I happen to be doing this week--to make any estimate "frequentist," one just needs to do whatever combination of theory and simulation is necessary to get a sense of my method's performance under repeated sampling. So maybe Efron and I are in agreement in practice, that any method is worth considering if it works, but it might take some work to see whether something really does work.

Comments on Kass's comments

Before writing this discussion, I also had the opportunity to read Rob Kass's comments on Efron's article.

I pretty much agree with Kass's points, except for his claim that most of Bayes is essentially maximum likelihood estimation. Multilevel modeling is only approximately maximum likelihood if you follow Efron and Morris's empirical Bayesian formulation in which you average over intermediate parameters and maximize over hyperparameters, as I gather Kass has in mind. But then this makes "maximum likelihood" a matter of judgment: what exactly is a hyperparameter? Things get tricky with mixture models and the like. I guess what I'm saying is that maximum likelihood, like many classical methods, works pretty well in practice only because practitioners interpret the methods flexibly and don't do the really stupid versions (such as joint maximization of parameters and hyperparameters) that are allowed by the theory.

Regarding the difficulties of combining evidence across species (in Kass's discussion of the DuMouchel and Harris paper), one point here is that this works best when the parameters have a real-world meaning. This is a point that became clear to me in my work in toxicology (Gelman, Bois, and Jiang, 1996): when you have a model whose parameters have numerical interpretations ("mean," "scale," "curvature," and so forth), it can be hard to get useful priors for them, but when the parameters have substantive interpretations ("blood flow," "equilibrium concentration," etc.), then this opens the door for real prior information. And, in a hierarchical context, "real prior information" doesn't have to mean a specific, pre-assigned prior; rather, it can refer to a model in which the parameters have a group-level distribution. The more real-worldy the parameters are, the more likely this group-level distribution can be modeled accurately. And the smaller the group-level error, the more partial pooling you'll get and the more effective your Bayesian inference is. To me, this is the real connection between scientific modeling and the mechanics of Bayesian smoothing, and Kass alludes to some of this in the final paragraph of his comment.

Hal Stern once said that the big divide in statistics is not between Bayesians and non-Bayesians but rather between modelers and non-modelers. And, indeed, in many of my Bayesian applications, the big benefit has come from the likelihood. But sometimes that is because we are careful in deciding what part of the model is "the likelihood." Nowadays, this is starting to have real practical consequences even in Bayesian inference, with methods such as DIC, Bayes factors, and posterior predictive checks, all of whose definitions depend crucially on how the model is partitioned into likelihood, prior, and hyperprior distributions.

On one hand, I'm impressed by modern machine-learning methods that process huge datasets and I agree with Kass's concluding remarks that emphasize how important it can be that the statistical methods be connected with minimal assumptions; on the other hand, I appreciate Kass's concluding point that statistical methods are most powerful when they are connected to the particular substantive question being studied. I agree that statistical theory is far from settled, and I agree with Kass that developments in Bayesian modeling are a promising way to move forward.

This story is pretty funny. "Distractions in the classroom," indeed. They take nursery school pretty seriously down there in Texas, huh?

Where's Ripley on the web?

Related to our discussion of influential statisticians, I looked up Brian Ripley, who has long been an inspiration to me. (Just to take one example, the final chapter of his book on spatial processes had an example of simulation-based model checking that had a big influence on my ideas in that area.)

I was stunned to find that his webpage hasn't been updated since 2002, and it links to a "list of recent and forthcoming papers" that, believe it or not, hasn't been updated since 1997! I can't figure it out, especially given that Ripley is so computer-savvy and still appears to be active in the computational statistics community. Perhaps someone can explain?

P.S. No rude comments, please. Thank you.

P.P.S. Somebody pointed out that you can search for B D Ripley's recent papers using Google. Here's what's been going on since 2002. Aside from the R stuff, he seems to have been focusing on applied work. Perhaps he could be persuaded to write an article for a statistics journal discussing what he's learned from these examples. I find that working with applied collaborators gives me insights that I never would've had on my own, and I'd be interested in hearing Ripley's thoughts on his own successes and struggles on applied problems.

First the scientific story, then the journalist, then my thoughts.

Part 1: The scientific story

From the Daily News:

Spanking makes kids perform better in school, helps them become more successful: study

The research, by Calvin College psychology professor Marjorie Gunnoe, found that kids smacked before age 6 grew up to be more successful . . . Gunnoe, who interviewed 2,600 people about being smacked, told the [London] Daily Mail: "The claims that are made for not spanking children fail to hold up. I think of spanking as a dangerous tool, but then there are times when there is a job big enough for a dangerous tool. You don't use it for all your jobs."

From the Daily Mail article:

Professor Gunnoe questioned 2,600 people about being smacked, of whom a quarter had never been physically chastised. The participants' answers then were compared with their behaviour, such as academic success, optimism about the future, antisocial behaviour, violence and bouts of depression. Teenagers in the survey who had been smacked only between the ages of two and six performed best on all the positive measures. Those who had been smacked between seven and 11 fared worse on negative behaviour but were more likely to be academically successful. Teenagers who were still smacked fared worst on all counts.

Part 2: The journalist

Po Bronson (whose life and career are eerily similar to those of the slightly older and slightly more famous Michael Lewis) writes about this study in Newsweek:

Unfortunately, there's been little study of [kids who haven't been spanked], because children who've never been spanked aren't easy to find. Most kids receive physical discipline at least once in their life. But times are changing, and parents today have numerous alternatives to spanking. The result is that kids are spanked less often overall, and kids who've never been spanked are becoming a bigger slice of the pie in long-term population studies.

One of those new population studies underway is called Portraits of American Life. It involves interviews of 2,600 people and their adolescent children every three years for the next 20 years. Dr. Marjorie Gunnoe is working with the first wave of data on the teens. It turns out that almost a quarter of these teens report they were never spanked.

So this is a perfect opportunity to answer a very simple question: are kids who've never been spanked any better off, long term?

Gunnoe's summary is blunt: "I didn't find that in my data." . . . those who'd been spanked just when they were young--ages 2 to 6--were doing a little better as teenagers than those who'd never been spanked. On almost every measure.

A separate group of teens had been spanked until they were in elementary school. Their last spanking had been between the ages of 7 and 11. These teens didn't turn out badly, either.

Compared with the never-spanked, they were slightly worse off on negative outcomes, but a little better off on the good outcomes. . . .

Gunnoe doesn't know what she'll find, but my thoughts jump immediately to the work of Dr. Sarah Schoppe-Sullivan, whom we wrote about in NurtureShock. Schoppe-Sullivan found that children of progressive dads were acting out more in school. This was likely because the fathers were inconsistent disciplinarians; they were emotionally uncertain about when and how to punish, and thus they were reinventing the wheel every time they had to reprimand their child. And there was more conflict in their marriage over how best to parent, and how to divide parenting responsibilities.

I [Bronson] admit to taking a leap here, but if the progressive parents are the ones who never spank (or at least there's a large overlap), then perhaps the consistency of discipline is more important than the form of discipline. In other words, spanking regularly isn't the problem; the problem is having no regular form of discipline at all.

I couldn't find a copy of Gunnoe's report on the web. Her local newspaper (the Grand Rapids News) reports that she "presented her findings at a conference of the Society for Research in Child Development," but the link only goes to the conference website, not to any manuscript. Following the link for Marjorie Gunnoe takes me to this page at Calvin College, which describes itself as "the distinctively Christian, academically excellent liberal arts college that shapes minds for intentional participation in the renewal of all things."

Gunnoe is quoted in the Grand Rapids Press as saying:

"This in no way should be thought of as a green light for spanking . . . This is a red light for people who want to legally limit how parents choose to discipline their children. I don't promote spanking, but there's not the evidence to outlaw it."

I'm actually not sure why these results, if valid, should not be taken as a "green light" for spanking, but I guess Gunnoe's point is that parental behaviors are situational, and you might not want someone reading her article and then hitting his or her kid for no reason, just for its as-demonstrated-by-research benefits.

Unsurprisingly, there's lots of other research on the topic of corporal punishment. A commenter at my other blog found a related study of Gunnoe's, from 1997. It actually comes from an entire issue of the journal that's all about discipline, including several articles on spanking.

Another commenter linked to several reports of research, including this from University of New Hampshire professor Murray Straus:

090924231749.jpg

(I don't know who is spanked exactly once, but maybe this is the number of times spanked per week, or something like that. I didn't search for the original source of the graph.)

I agree with the commenter that it would be interesting to see Gunnoe and Straus speaking on the same panel.

Part 3: My thoughts

I can't exactly say that Po Bronson did anything wrong in his writeup--he's knowledgeable in this area (more than I am, certainly) and has thought a lot about it. He's a journalist who's written a book on child-rearing, and this is a juicy topic, so I can't fault him for discussing Gunnoe's findings. And I certainly wouldn't suggest that this topic is off limits just because nothing has really been "proved" on the causal effects of corporal punishment. Research in this area is always going to be speculative.

Nonetheless, I'm a little bothered by Bronson's implicit acceptance of Gunnoe's results and his extrapolations from her more modest claims. I get a bit uncomfortable when a reporter starts to give explanations for why something is happening, when that "something" might not really be true at all. I don't see any easy solution here--Bronson is even careful enough to say, "I admit to taking a leap here." Still, I'm bothered by what may be a too-easy implicit acceptance of an unpublished research claim. Again, I'm not saying that blanket skepticism is a solution either, but still . . .

It's a tough situation to be in, to report on headline-grabbing claims when there's no research paper to back them up. (I assume that if Bronson had a copy of Gunnoe's research article, he could send it to various experts he knows to get their opinions.)

P.S. I altered the second-to-last paragraph above in light of Jason's comments.

Jim Madden writes:

I have been developing interactive graphical software for visualizing hierarchical linear models of large sets of student performance data that I have been allowed access to by the Louisiana Department of Education. This is essentially pre/post test data, with student demographic information (birthday, sex, race, ses, disability status) and school associated with each record. (Actually, I can construct student trajectories for several years, but I have not tried using this capability yet.) My goal is to make the modeling more transparent to audiences that are not trained in statistics, and in particular, I am trying to design the graphics so that nuances and uncertainties are apparent to naive viewers.

Andrew Gelman, Jingchen Liu, and Sophia Rabe-Hesketh are looking for two fulltime postdocs, one to be based at the Department of Statistics, Columbia University, and the other at the Graduate School of Education, University of California, Berkeley.

The positions are funded by a grant entitled "Practical Tools for Multilevel/Hierarchical Modeling in Education" (Institute of Education Sciences, Department of Education). The project addresses modeling and computational issues that stand in the way of fitting larger, more realistic models of variation in social science data, in particular the problem that (restricted) maximum likelihood estimation often yields estimates on the boundary, such as zero variances or perfect correlations. The proposed solution is to specify weakly informative prior distributions - that is, prior distributions that will affect inferences only when the data provide little information about the parameters. Existing software for maximum likelihood estimation can then be modified to perform Bayes posterior modal estimation, and these ideas can also be used in fully Bayesian computation. In either case, the goal is to be able to fit and understand more complex, nuanced models. The postdocs will contribute to all aspects of this research and will implement the methods in C and R (Columbia postdoc) and Stata (Berkeley postdoc).
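As a small illustration of the kind of problem the project addresses (this is only a sketch, not the project's software; the data frame dat with outcome y, predictor x, and grouping variable group is hypothetical): with few groups, (restricted) maximum likelihood can put the group-level standard deviation exactly at zero, while a Bayesian fit with weakly informative priors--here full Bayes via rstanarm rather than the posterior-mode approach described above--keeps the estimate away from the boundary.

library(lme4)
library(rstanarm)

# (restricted) maximum likelihood: the group-level sd can be estimated as exactly 0
fit_ml <- lmer(y ~ x + (1 | group), data = dat)
VarCorr(fit_ml)

# the same model with rstanarm's default weakly informative priors:
# the group-level sd is pulled off the boundary
fit_bayes <- stan_lmer(y ~ x + (1 | group), data = dat)
print(fit_bayes)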

Both locations are exciting places to work with research groups comprising several faculty, postdocs, and students working on a wide range of interesting applied problems. The Columbia and Berkeley groups will communicate with each other on a regular basis, and there will be annual workshops with outside panels of experts.

Applicants should have a Ph.D. in Statistics or a related area and have strong programming skills, preferably with some experience in C/R/Stata. Experience with Bayesian methods and good knowledge of hierarchical Bayesian and/or "classical" multilevel modeling would be a great advantage.

The expected duration of the positions is 2 years with an approximate start date of September 1, 2010.

Please submit a statement of interest, not exceeding 5 pages, and your CV to either Andrew Gelman and Jingchen Liu (asc.coordinator@stat.columbia.edu) or Sophia Rabe-Hesketh (Graduate School of Education, 3659 Tolman Hall, Berkeley, CA 94720-1670, sophiarh@berkeley.edu), stating whether you would also be interested in the other position. Applications will be considered as they arrive but should be submitted no later than April 1st.

This is in addition to, but related to, our other postdoctoral positions. A single application will work for all of them.

Alex Tabarrok has posts on the amusing story of Westerners overestimating the Soviet economy. For example, here's a graph from the legendary Paul Samuelson textbook (from 1961):

samuelson.png

Tabarrok points out that it's even worse than it looks: "in subsequent editions Samuelson presented the same analysis again and again except the overtaking time was always pushed further into the future so by 1980 the dates were 2002 to 2012. In subsequent editions, Samuelson provided no acknowledgment of his past failure to predict and little commentary beyond remarks about 'bad weather' in the Soviet Union."

The bit about the bad weather is funny. If you've had bad weather in the past, maybe the possibility of future bad weather should be incorporated into the forecast, no?

As Tabarrok and his commenters point out, this mistake can't simply be attributed to the socialist sympathies of the center-left Samuelson: for one thing, various other leftist economists did not think that the Soviets were catching up to us; and, for another, political commentators on the right at the time were all telling us that the communists were about to overwhelm us militarily.

I don't really have anything to add here, I just agree with Alex that it's a funny graph.

This is just to say

They are always saying to check the temperature of your oven. Well, "they" aren't kidding. I checked with an oven thermometer, and it was 30 degrees (C) higher than labeled. We were suspicious that something was going on, but who'd ever think it could be off by 30 degrees??

pourquoi michael jackson est blanc ("why is michael jackson white")

Alex Tabarrok links to these amusing partial Google searches found by Dan Ariely:

ariely1.png

ariely2.png

Ariely pretty much takes these at face value, labeling them "What Boyfriends and Girlfriends Search for on Google" and writing: "This shows Google's remarkable power as a source of data on a range of human behaviors, emotions, and opinions. It gives us insights into what people might care the most about concerning a given topic. . . ."

I followed a link in Ariely's comments to a blog whose entire content is partial Google searches. Seems like a bit of a niche market to me, but the results were so weird (for example, one of the top ten searches for "my rob" is "my robot friend is pregnant") that I started to get skeptical.

So I tried the simplest thing I could think of on my own computer, and here's what came out:

jackson.png


My second choice was est-ce-que, which also yielded some strange results.

So my current thought is not to take these Google partial searches so seriously. I wonder if the algorithm purposely spits out wacky searches in order to make the search function more fun to play with.

Maybe some of the Google employees who read this blog can enlighten us (anonymously, if necessary) about how seriously we should interpret these?

P.S. Ariely's blog is pretty cool--a mix of some basic intro stuff (as is appropriate, since the blog is attached to his popular book) and some deeper ideas too. When Predictably Irrational came out, we received several emails from his publicist, a copy of the book (which Juli reviewed), and a suggestion that we could interview Ariely or he could guest blog for us. We said yes to both but never heard back. I can understand it: publicists get busy, and we did get a free book out of it. But, Dan, if you're reading this, get in touch: we'd still be glad to have you guest blog for us!

P.P.S. Our earlier discussion of googlefights as a teaching tool.

Europe vs. America: the grudge match

Tyler Cowen adds to the always-popular "Europe vs. America" debate. At stake is whether western European countries are going broke and should scale back their social spending (for example, here in Paris they have free public preschool starting at 3 years old, and it's better than the private version we were paying for in New York), or whether, conversely, the U.S. should ramp up spending on public goods, as Galbraith suggested back in the 1950s when he wrote The Affluent Society.

Much of the debate turns on statistics, oddly enough. I'm afraid I don't know enough about the topic to offer any statistical contributions to the discussion, but I wanted to bring up one thing that I remember people used to talk about but that doesn't seem to have come up in the current discussion (unless I've missed something, which is quite possible).

Here's my question. Shouldn't we be impressed by the performance of the U.S. economy, given that we've spent several zillion dollars more on the military than all the European countries combined, yet our economy has continued to grow at roughly the same rate as Europe's? (Cowen does briefly mention "military spending" but only in a parenthetical, and I'm not sure what he was referring to.) From the other direction, I guess you could argue that in the U.S., military spending is a form of social spending--it's just that, instead of providing health care etc. for everyone, it's provided just for military families, and instead of the government supporting some modern-day equivalent of a buggy-whip factory, it's supporting some company that builds airplanes or submarines. Anyway, this just seemed somewhat relevant to the discussion.

P.S. OK, there's one place where I can offer a (very small) bit of statistical expertise.

