Recently in Zombies Category

I think I'm starting to resolve a puzzle that's been bugging me for a while.

Pop economists (or, at least, pop micro-economists) are often making one of two arguments:

1. People are rational and respond to incentives. Behavior that looks irrational is actually completely rational once you think like an economist.

2. People are irrational and they need economists, with their open minds, to show them how to be rational and efficient.

Argument 1 is associated with "why do they do that?" sorts of puzzles. Why do they charge so much for candy at the movie theater, why are airline ticket prices such a mess, why are people drug addicts, etc. The usual answer is that there's some rational reason for what seems like silly or self-destructive behavior.

Argument 2 is associated with "we can do better" claims such as why we should fire 80% of public-school teachers or Moneyball-style stories about how some clever entrepreneur has made a zillion dollars by exploiting some inefficiency in the market.

The trick is knowing whether you're gonna get 1 or 2 above. They're complete opposites!

Our story begins . . .

Here's a quote from Steven Levitt:

One of the easiest ways to differentiate an economist from almost anyone else in society is to test them with repugnant ideas. Because economists, either by birth or by training, have their mind open, or skewed in just such a way that instead of thinking about whether something is right or wrong, they think about it in terms of whether it's efficient, whether it makes sense. And many of the things that are most repugnant are the things which are indeed quite efficient, but for other reasons -- subtle reasons, sometimes, reasons that are hard for people to understand -- are completely and utterly unacceptable.

As statistician Mark Palko points out, Levitt is making an all-too-convenient assumption that people who disagree with him are disagreeing because of closed-mindedness. Here's Palko:

There are few thoughts more comforting than the idea that the people who disagree with you are overly emotional and are not thinking things through. We've all told ourselves something along these lines from time to time.

I could add a few more irrational reasons to disagree with Levitt: political disagreement (on issues ranging from abortion to pollution) and simple envy at Levitt's success. (It must make the haters even more irritated that Levitt is, by all accounts, amiable, humble, and a genuinely nice guy.) In any case, I'm a big fan of Freakonomics.

But my reaction to reading the above Levitt quote was to think of the puzzle described at the top of this entry. Isn't it interesting, I thought, that Levitt is identifying economists as rational and ordinary people as irrational. That's argument 2 above. In other settings, I think we'd hear him saying how everyone responds to incentives and that what seems like "efficiency" to do-gooding outsiders is actually not efficient at all. The two different arguments get pulled out as necessary.

The set of all sets that don't contain themselves

Which in turn reminds me of this self-negating quote from Levitt protégé Emily Oster:

anthropologists, sociologists, and public-health officials . . . believe that cultural differences--differences in how entire groups of people think and act--account for broader social and regional trends. AIDS became a disaster in Africa, the thinking goes, because Africans didn't know how to deal with it.

Economists like me [Oster] don't trust that argument. We assume everyone is fundamentally alike; we believe circumstances, not culture, drive people's decisions, including decisions about sex and disease.

I love this quote for its twisted logic. It's Russell's paradox all over again. Economists are different from everybody else, because . . . economists "assume everyone is fundamentally alike"! But if everyone is fundamentally alike, how is it that economists are different "from almost anyone else in society"? All we can say for sure is that it's "circumstances, not culture." It's certainly not "differences in how entire groups of people think and act"--er, unless these groups are economists, anthropologists, etc.

OK, fine. I wouldn't take these quotations too seriously; they're just based on interviews, not careful reflection. My impression is that these quotes come from a simple division of the world into good and bad things:

- Good: economists, rationality, efficiency, thinking the unthinkable, believing in "circumstances"

- Bad: anthropologists, sociologists, public-health officials, irrationality, being deterred by repugnant ideas, believing in "culture"

Good is entrepreneurs, bad is bureaucrats. At some point this breaks down. For example, if Levitt is hired by a city government to help reform its school system, is he a rational, taboo-busting entrepreneur (a good thing) or a culture-loving bureaucrat who thinks he knows better than everybody else (a bad thing)? As a logical structure, the division into Good and Bad has holes. But as emotionally-laden categories ("fuzzy sets," if you will), I think it works pretty well.

The solution to the puzzle

OK, now to return to the puzzle that got us started. How is it that economics-writers such as Levitt are so comfortable flipping back and forth between argument 1 (people are rational) and argument 2 (economists are rational, most people are not)?

The key, I believe, is that "rationality" is a good thing. We all like to associate with good things, right? Argument 1 has a populist feel (people are rational!) and argument 2 has an elitist feel (economists are special!). But both are ways of associating oneself with rationality. It's almost like the important thing is to be in the same room with rationality; it hardly matters whether you yourself are the exemplar of rationality, or whether you're celebrating the rationality of others.


I'm not saying that arguments based on rationality are necessarily wrong in particular cases. (I can't very well say that, given that I wrote an article on why it can be rational to vote.) I'm just trying to understand how pop-economics can so rapidly swing back and forth between opposing positions. And I think it's coming from the comforting presence of rationality and efficiency in both formulations. It's ok to distinguish economists from ordinary people (economists are rational and think the unthinkable, ordinary people don't) and it's also ok to distinguish economists from other social scientists (economists think ordinary people are rational, other social scientists believe in "culture"). You just have to be careful not to make both arguments in the same paragraph.

P.S. Statisticians are special because, deep in our bones, we know about uncertainty. Economists know about incentives, physicists know about reality, movers can fit big things in the elevator on the first try, evolutionary psychologists know how to get their names in the newspaper, lawyers know you should never never never talk to the cops, and statisticians know about uncertainty. Of that, I'm sure.

When it rains it pours . . .

John Transue writes:

I saw a post on Andrew Sullivan's blog today about life expectancy in different US counties. With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties.

However, the paper (see here) includes a pretty interesting methods section. This is from page 5, "Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations)."

They have downloadable data. I believe that the data are predicted values from the model. A web appendix also gives 90% CIs for their estimates.

Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests?

My reply:

I don't have a chance to look in detail but it sounds like they're on the right track. I like that they cross-validated; that's what we did to check we were ok with our county-level radon estimates.

Regarding your question about the small county problem: no matter what you do, all maps of parameter estimates are misleading. Even the best point estimates can't capture uncertainty. As noted above, cross-validation (at the level of the county, not of the individual observation) is a good way to keep checking.
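The small-county problem Transue asks about is easy to demonstrate with a quick simulation (all numbers here are hypothetical, not from the paper): even when every county has the same true death rate, the raw estimates for the smallest counties dominate both extremes of the map.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: every county has the same true death rate,
# but populations vary from tiny to large.
true_rate = 0.010  # 10 deaths per 1000 person-years
pops = np.concatenate([
    rng.integers(500, 2_000, 1500),       # small counties
    rng.integers(50_000, 500_000, 1500),  # large counties
])

deaths = rng.poisson(true_rate * pops)
raw_rate = deaths / pops

# The 100 most extreme raw estimates are almost all small counties.
extreme = np.argsort(np.abs(raw_rate - true_rate))[-100:]
small_share = np.mean(pops[extreme] < 10_000)
print(f"share of small counties among the 100 most extreme estimates: {small_share:.2f}")
```

This is exactly the artifact that hierarchical modeling and county-level cross-validation are meant to catch: the extremes on a raw map mostly reflect sample size, not true risk.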

Brendan Nyhan points me to this from Don Taylor:

Can national data be used to estimate state-level results? . . . A challenge is the fact that the sample size in many states is very small . . . Richard [Gonzales] used a regression approach to extrapolate this information to provide state-level estimates of support for health reform:
To get around the challenge presented by small sample sizes, the model presented here combines the benefits of incorporating auxiliary demographic information about the states with the hierarchical modeling approach commonly used in small area estimation. The model is designed to "shrink" estimates toward the average level of support in the region when there are few observations available, while simultaneously adjusting for the demographics and political ideology in the state. This approach therefore takes fuller advantage of all information available in the data to estimate state-level public opinion.

This is a great idea, and it is already being used all over the place in political science. For example, here. Or here. Or here.

See here for an overview article, "How should we estimate public opinion in the states?" by Jeff Lax and Justin Phillips.

It's good to see practical ideas being developed independently in different fields. I know that methods developed by public health researchers have been useful in political science, and I hope that in turn they can take advantage of the progress we've made in multilevel regression and poststratification.
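For readers unfamiliar with it, the "shrink toward the regional average" idea in the quoted passage can be sketched with a simple precision-weighted estimator; the survey numbers and variance components below are hypothetical, not from Gonzales's model:

```python
def shrink(y_bar, n, sigma2_within, mu_region, tau2_between):
    """Precision-weighted compromise between a state's raw mean and its
    regional average; states with few observations get pulled toward the region."""
    precision_data = n / sigma2_within
    precision_prior = 1.0 / tau2_between
    w = precision_data / (precision_data + precision_prior)
    return w * y_bar + (1 - w) * mu_region

# Hypothetical: both states show 62% raw support; the region averages 50%.
big_state   = shrink(0.62, n=2000, sigma2_within=0.25, mu_region=0.50, tau2_between=0.01)
small_state = shrink(0.62, n=10,   sigma2_within=0.25, mu_region=0.50, tau2_between=0.01)

print(round(big_state, 3), round(small_state, 3))  # → 0.619 0.534
```

The large-sample state keeps nearly its raw estimate, while the small-sample state is pulled most of the way back to the regional average; that is the partial-pooling behavior that multilevel models provide automatically.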

Last Wegman post (for now)


John Mashey points me to a news article by Eli Kintisch with the following wonderful quote:

Will Happer, a physicist at Princeton University who questions the consensus view on climate, thinks Mashey is a destructive force who uses "totalitarian tactics"--publishing damaging documents online, without peer review--to carry out personal vendettas.

I've never thought of uploading files as "totalitarian" but maybe they do things differently at Princeton. I actually think of totalitarians as acting secretly--denunciations without evidence, midnight arrests, trials in undisclosed locations, and so forth. Mashey's practice of putting everything out in the open seems to me the opposite of totalitarian.

The article also reports that Edward Wegman's lawyer said that Wegman "has never engaged in plagiarism." If I were the lawyer, I'd be pretty mad at Wegman at this point. I can just imagine the conversation:

Lawyer: You never told me about that 2005 paper where you stole from Brian Everitt. That was a mistake, dude! Everitt's a more famous statistician than you are!

Wegman: Ummm . . . I think my other student did that one. But I can't remember, I must have lost the files in an office move. . . .

P.S. This is not a personal vendetta on my part. I've never met Edward Wegman and have nothing against him personally. I just think it's super-tacky to plagiarize, and even more so to not admit it when you're caught. Yeah, I know Chris Rock says: when you're caught, just deny everything. But Chris Rock's a comedian. Wegman is supposed to be a scientist.

The funny part is how clear the evidence is--that's why I keep throwing in jokes. The sad part is that Wegman, Goodwin, Fischer, etc etc don't have the decency to say they're sorry.

Since we're on the topic of nonreplicable research . . . see here (link from here) for a story of a survey that's so bad that the people who did it won't say how they did it.

I know of too many cases where people screwed up a survey even when they were actually trying to get the right answer; that's why I don't trust any report of a survey that doesn't say what they did.

I'm reminded of this survey which may well have been based on a sample of size 6 (again, the people who did it refused to release any description of methodology).
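To see why a sample of size 6 is nearly worthless, a quick back-of-the-envelope calculation helps (hypothetical numbers, not from that survey): even if all 6 respondents agreed, the exact (Clopper-Pearson) 95% interval for the population proportion is enormous.

```python
# Clopper-Pearson 95% interval when all n respondents say "yes" (k = n).
# With k = n the lower bound has the closed form (alpha/2)**(1/n).
n, alpha = 6, 0.05
lower = (alpha / 2) ** (1 / n)
print(f"95% CI for the population proportion: [{lower:.2f}, 1.00]")  # → [0.54, 1.00]
```

A unanimous result still leaves everything from 54% to 100% on the table; any split result would be even less informative.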

Sanjay Srivastava reports:

Recently Ben Goldacre wrote about a group of researchers (Stuart Ritchie, Chris French, and Richard Wiseman) whose null replication of 3 experiments from the infamous Bem ESP paper was rejected by JPSP - the same journal that published Bem's paper.

Srivastava recognizes that JPSP does not usually publish replications but this is a different story because it's an anti-replication.

Here's the paradox:

- From a scientific point of view, the Ritchie et al. results are boring. To find out that there's no evidence for ESP . . . that adds essentially zero to our scientific understanding. What next, a paper demonstrating that pigeons can fly higher than chickens? Maybe an article in the Journal of the Materials Research Society demonstrating that diamonds can scratch marble but not the reverse??

- But from a science-communication perspective, the null replication is a big deal because it adds credence to my hypothesis that the earlier ESP claims arose from some sort of measurement error (which might be of interest for people doing straight psychology experiments using similar methods).

The rules of journal publication are all about scientific progress, but scientific journals are plugged into the news media, where the rules are different. My guess is that the JPSP editors thought the original Bem article was not real science even when they accepted it for publication, but they wanted to be open-minded and bend over backward to be fair. Sort of like what happened when Statistical Science published that notorious Bible Code paper back in 1994.

E. J. Wagenmakers writes:

Here's a link for you. The first sentences tell it all:
Climate warming since 1995 is now statistically significant, according to Phil Jones, the UK scientist targeted in the "ClimateGate" affair. Last year, he told BBC News that post-1995 warming was not significant--a statement still seen on blogs critical of the idea of man-made climate change. But another year of data has pushed the trend past the threshold usually used to assess whether trends are "real."

Now I [Wagenmakers] don't like p-values one bit, but even people who do like them must cringe when they read this. First, this apparently is a sequential design, so I'm not sure what sampling plan leads to these p-values. Secondly, comparing significance values suggests that the data have suddenly crossed some invisible line that divided nonsignificant from significant effects (as you pointed out in your paper with Hal Stern). Ugh!

I share Wagenmakers's reaction. There seems to be some confusion here between inferential thresholds and decision thresholds. Which reminds me how much I hate the old 1950s literature (both classical and Bayesian) on inference as decision, loss functions for estimators, and all the rest. I think the p-value serves a role in summarizing certain aspects of a model's fit to data, but I certainly don't think it makes sense as any kind of decision threshold (despite the fact that it is nearly universally used as such when deciding whether to accept research for publication in scientific journals).
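Wagenmakers's "invisible line" complaint is easy to make concrete. The two hypothetical trend estimates below (not the actual temperature data) differ only trivially, yet one is conventionally "nonsignificant" and the other "significant":

```python
from scipy import stats

# Hypothetical: two nearly identical trend estimates with the same standard
# error, computed a year apart; they land on opposite sides of p = 0.05.
se = 0.010
results = []
for slope, n_years in [(0.0195, 15), (0.0230, 16)]:
    t = slope / se
    p = 2 * stats.t.sf(abs(t), df=n_years - 2)
    results.append(p)
    print(f"slope = {slope:.4f} deg/yr, p = {p:.3f}")
```

Nothing about the underlying evidence "suddenly" changed between the two rows; the estimate merely drifted across an arbitrary threshold.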

Nicholas Christakis and James Fowler are famous for finding that obesity is contagious. Their claims, which have been received with both respect and skepticism (perhaps we need a new word for this: "respecticism"?), are based on analysis of data from the Framingham heart study, a large longitudinal public-health study that happened to have some social network data (for the odd reason that each participant was asked to provide the name of a friend who could help the researchers locate them if they were to move away during the study period).

The short story is that if your close contact became obese, you were likely to become obese also. The long story is a debate about the reliability of this finding (that is, can it be explained by measurement error and sampling variability) and its causal implications.

This sort of study is in my wheelhouse, as it were, but I have never looked at the Christakis-Fowler work in detail. Thus, my previous and current comments are more along the lines of reporting, along with general statistical thoughts.

We last encountered Christakis-Fowler last April, when Dave Johns reported on some criticisms coming from economists Jason Fletcher and Ethan Cohen-Cole and mathematician Russell Lyons.

Lyons's paper was recently published under the title, The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis. Lyons has a pretty aggressive tone--he starts the abstract with the phrase "chronic widespread misuse of statistics" and it gets worse from there--and he's a bit rougher on Christakis and Fowler than I would be, but this shouldn't stop us from evaluating his statistical arguments. Here are my thoughts:

Another Wegman plagiarism


At the time of our last discussion, Edward Wegman, a statistics professor who has also worked for government research agencies, had been involved in three cases of plagiarism: a report for the U.S. Congress on climate models, a paper on social networks, a paper on color graphics.

Each of the plagiarism stories was slightly different: the congressional report involved the distorted copying of research by a scientist (Raymond Bradley) whose conclusions Wegman disagreed with, the social networks paper included copied material in its background section, and the color graphics paper included various bits and pieces by others that had been used in old lecture notes.

Since then, blogger Deep Climate has uncovered another plagiarized article by Wegman, this time an article in a 2005 volume on data mining and data visualization. Deep Climate writes, "certain sections of Statistical Data Mining rely heavily on lightly edited portions on lectures from Wegman's statistical data mining course at GMU. In turn, those lectures contain 'copy-and-paste' material from a variety of sources, some partially attributed and some not at all." It looks pretty bad. And, as with the other cases of plagiarism, sometimes the small changes they made caused errors that were not in the original sources. Ouch!

One of the authors Wegman stole from was Brian Everitt. Couldn't Wegman have just invited Everitt to be a coauthor of his article? To steal his work, that's sooooo tacky.

After posting on David Rubinstein's remarks on his "cushy life" as a sociology professor at a public university, I read these remarks by some of Rubinstein's colleagues at the University of Illinois, along with a response from Rubinstein.

Before getting to the policy issues, let me first say that I think it must have been so satisfying, first for Rubinstein and then for his colleagues (Barbara Risman, William Bridges, and Anthony Orum) to publish these notes. We all have people we know and hate, but we rarely have a good excuse for blaring our feelings in public. (I remember when I was up for tenure, I was able to read the outside letters on my case (it's a public university and they have rules), and one of the letter writers really hated my guts. I was surprised--I didn't know the guy well (the letters were anonymized but it was clear from context who the letter writer was) but the few times we'd met, he'd been cordial enough--but there you have it. He must have been thrilled to have the opportunity to write, for an audience, what he really thought about me.)

Anyway, reading Rubinstein's original article, it's clear that his feelings of alienation had been building up inside of him for oh I don't know how long, and it must have felt really great to tell the world how fake he really felt in his job. And his colleagues seem to have detested him for decades but only now have the chance to splash this all out in public. Usually you just don't have a chance.

Looking for a purpose in life

To me, the underlying issue in Rubinstein's article was his failure to find a purpose to his life at work. To go into the office, year after year, doing the very minimum to stay afloat in your classes, to be teaching Wittgenstein to a bunch of 18-year-olds who just don't care, to write that "my main task as a university professor was self-cultivation"--that's got to feel pretty empty.

Siobhan Mattison pointed me to this. I'm just disappointed they didn't use my Fenimore Cooper line. Although I guess that reference wouldn't resonate much outside the U.S.

P.S. My guess was correct. See comments below. Actually, the reference probably wouldn't resonate so well among under-50-year-olds in the U.S. either. Sort of like the Jaycees story.

Funniest comment ever


Here (scroll down to the bottom; for some reason the link doesn't go directly to the comment itself). I've never actually seen a Kaypro but I remember the ads.

(Background here.)

Another silly graph


Somebody named Justin writes:

Check this out for some probably bad statistics and bad graphs. It looks like they tallied the most frequent names of CEOs, professions, geography, etc. What inference they are trying to make from this, I have no clue.

I agree this is pretty horrible and isn't the sort of graph that I would make. But, readers, please! You don't have to just send me the bad stuff!

A common reason for plagiarism is laziness: you want credit for doing something but you don't really feel like doing it--maybe you'd rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you.

Interestingly enough, we see that in many defenses of plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn't credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work.

As I wrote last year, I like to think that directness and openness is a virtue in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces.


Which brings us to Ed Wegman, whose defense of plagiarism in that Computational Statistics and Data Analysis paper is as follows (from this report by John Mashey):

(a) In 2005, he and his colleagues needed "some boilerplate background on social networks" for a high-profile report for the U.S. Congress. But instead of getting an expert on social networks for this background, or even simply copying some paragraphs (suitably cited) from a textbook on the topic, he tasked a Ph.D. student, Denise Reeves, to prepare the boilerplate. Reeves was no expert: her knowledge of social networks came from having taken a short course on the topic. Reeves writes the boilerplate "within a few days" and Wegman writes "of course, I took that to be her original work."

(b) Wegman gave this boilerplate to a second student, Walid Sharabati, who included it in his Ph.D. dissertation "with only minor amendments." (I think he's saying Sharabati copied it.)

(c) Sharabati was a coauthor of the Computational Statistics and Data Analysis article. He took the material he'd copied from Reeves's report and stuck it into the CSDA article.

Now let's apply our theme of the day, laziness:

Here and here, for example.

I just hope they're using our survey methods and aren't trying to contact the zombies face-to-face!

See more at the Statistics Forum (of course).

Missed Friday the 13th Zombie Plot Update


The revised paper

Slightly improved figures

And just the history part from my thesis - that some find interesting.
(And to provide a selfish wiki meta-analysis entry pointer)

I have had about a dozen friends read this or earlier versions - they split between finding it interesting (and pragmatic) and finding it incomprehensible.

The reason for that may or may not point to ways to make it clearer.


Much-honored playwright Tony Kushner was set to receive one more honor--a degree from John Jay College--but it was suddenly taken away from him on an 11-1 vote of the trustees of the City University of New York. This was the first rejection of an honorary degree nomination since 1961.

The news article focuses on one trustee, Jeffrey Wiesenfeld, an investment adviser and onetime political aide, who opposed Kushner's honorary degree, but to me the relevant point is that the committee as a whole voted 11-1 to ding him.

Kushner said, "I'm sickened," he added, "that this is happening in New York City. Shocked, really." I can see why he's shocked, but perhaps it's not so surprising that it's happening in NYC. Recall the famous incident from 1940 in which Bertrand Russell was invited and then uninvited to teach at City College. The problem that time was Russell's views on free love (as they called it back then). There seems to be a long tradition of city college officials being willing to risk controversy to make a political point.

P.S. I was trying to imagine what these 11 trustees could've been thinking . . . my guess is it was some sort of group-dynamics thing. They started talking about it and convinced each other that the best thing to do would be to set Kushner's nomination aside. I bet if they'd had to decide separately most of them wouldn't have come to this conclusion. And I wouldn't be surprised if, five minutes after walking away from that meeting, most of those board members suddenly thought, Uh oh--we screwed up on this one! As cognitive psychologists have found, this is one of the problems with small-group deliberation: a group of people can be led to a decision which is not anywhere near the center of their positions considered separately.

Arrow's other theorem


I received the following email from someone who'd like to remain anonymous:

Lately I [the anonymous correspondent] noticed that Bruno Frey has published two articles in two well-known refereed journals on the Titanic disaster that try to explain the survival rates of the passengers on board.

The articles were published in the Journal of Economic Perspectives and Rationality & Society. While looking up the name of the second journal, where I had stumbled across the article, I saw that they had even put the same message in a third journal, the Proceedings of the National Academy of Sciences of the United States of America.

To say it in Sopranos-like style - with all due respect, I know Bruno Frey from conferences, I really appreciate his take on economics as a social science, and he has published more interesting stuff than most economists ever will. But putting the same message into three journals gives me headaches for at least two reasons:

1) When building a track record and scientific reputation, it's publish or perish. What about young scholars who may have interesting things to say but get rejected for (sometimes) obscure reasons, especially if they have innovative ideas that run against the mainstream? Meanwhile, acceptance is granted to papers with identical messages in three journals, which both congests the review process and biases acceptance, assuming that for the two articles that are not entirely unique, two other manuscripts will be rejected, from an editorial point of view, to preserve exclusivity by sticking to low or constant acceptance rates. Do you see this as a problem? Or is the main point against this argument that if the other papers had the quality, they would be published?

2) As an author one usually gets the questions "are the results published in another journal" (and therefore not original) and "is this paper under review in another journal". In their case the answer should be no to both questions, as they report different results and use different methods in every paper. But if you check the descriptive statistics in the papers, they are awkwardly similar. At what point do the questions and the content overlap so much that it really causes problems for authors? Have you ever heard any stories about double publications that were not authorized reprints or translations into other languages (which usually should not be problematic, as shown by the way in Frey's publication list) and had to be withdrawn? Barely happens, I guess.

Best regards and thank you for providing an open forum to discuss stuff like that.

I followed the links and read the abstracts. The three papers do indeed seem to describe similar work. But the abstracts are in remarkably different styles. The Rationality and Society abstract is short and doesn't say much. The Journal of Economic Perspectives abstract is long with lots of detail but, oddly, no conclusions! This abstract has the form of a movie trailer: lots of explosions, lots of drama, but no revealing of the plot. Finally, here's the PNAS abstract, which tells us what they found:

To understand human behavior, it is important to know under what conditions people deviate from selfish rationality. This study explores the interaction of natural survival instincts and internalized social norms using data on the sinking of the Titanic and the Lusitania. We show that time pressure appears to be crucial when explaining behavior under extreme conditions of life and death. Even though the two vessels and the composition of their passengers were quite similar, the behavior of the individuals on board was dramatically different. On the Lusitania, selfish behavior dominated (which corresponds to the classical homo economicus); on the Titanic, social norms and social status (class) dominated, which contradicts standard economics. This difference could be attributed to the fact that the Lusitania sank in 18 min, creating a situation in which the short-run flight impulse dominated behavior. On the slowly sinking Titanic (2 h, 40 min), there was time for socially determined behavioral patterns to reemerge. Maritime disasters are traditionally not analyzed in a comparative manner with advanced statistical (econometric) techniques using individual data of the passengers and crew. Knowing human behavior under extreme conditions provides insight into how widely human behavior can vary, depending on differing external conditions.

Interesting. My only quibble here is with the phrase "selfish rationality," which comes up in the very first sentence. As Aaron Edlin, Noah Kaplan, and I have stressed, rationality doesn't have to imply selfishness, and selfishness doesn't have to imply rationality. One can achieve unselfish goals rationally. For example, if I decide not to go on a lifeboat, I can still work to keep the peace and to efficiently pack people onto existing lifeboat slots. I don't think this comment of mine affects the substance of the Frey et al. papers; it's just a slight change of emphasis.

Regarding the other question, of how could the same paper be published three times, my guess is that a paper on the Titanic can partly get published for its novelty value: even serious journals like to sometimes run articles on offbeat topics. I wouldn't be surprised if the editors of each journal thought: Hey, this is fun. We don't usually publish this sort of thing, but, hey, why not? And then it appeared, three times.

How did this happen? Arrow's theorem. Let me explain.

Catherine Rampell highlights this stunning Gallup Poll result:

6 percent of Americans in households earning over $250,000 a year think their taxes are "too low." Of that same group, 26 percent said their taxes were "about right," and a whopping 67 percent said their taxes were "too high."

OK, fine. Most people don't like taxes. No surprise there. But get this next part:

And yet when this same group of high earners was asked whether "upper-income people" paid their fair share in taxes, 30 percent said "upper-income people" paid too little, 30 percent said it was a "fair share," and 38 percent said it was too much.

30 percent of these upper-income people say that upper-income people pay too little, but only 6 percent say that they personally pay too little. 38% say that upper-income people pay too much, but 67% say they personally pay too much.

We were having so much fun on this thread that I couldn't resist linking to this news item by Adrian Chen. The good news is that Scott Adams (creator of the Dilbert comic strip) "has a certified genius IQ" and that he "can open jars with [his] bare hands." He is also "able to lift heavy objects." Cool!

In all seriousness, I knew nothing about this aspect of Adams when I wrote the earlier blog. I was just surprised (and remain surprised) that he was so impressed with Charlie Sheen for being good-looking and being able to remember his lines. At the time I thought it was just a matter of Adams being overly-influenced by his direct experience, along with some satisfaction in separating himself from the general mass of Sheen-haters out there. But now I wonder if something more is going on, that maybe he feels that he and Sheen are on the same side in a culture war.

In any case, the ultimate topic of interest here is not Sheen or Adams but rather more general questions of what it takes for someone to root for someone. I agree with some of the commenters on the earlier thread that it's not about being a good guy or a bad guy. Lots of people rooted for the Oakland Raiders (sorry, I'm showing my age here), maybe partly because of their reputation as bad boys. And Charlie Sheen is definitely an underdog right now.

P.S. Amazingly enough, Chen includes a link to a Dilbert strip mocking the very behavior that Adams was doing. Not a big deal, but it's a bit odd.

P.P.S. No, I'm not Dilbert-obsessed! It just happened that I was reading Gawker (sorry!) and the Scott Adams entry caught my eye.

P.P.P.S. My favorite part of this whole story is Russell's-paradox-evoking thread centered around Adams's self-contradicting statement, "You're talking about Scott Adams. He's not talking about you."

Awhile ago I was cleaning out the closet and found some old unread magazines. Good stuff. As we've discussed before, lots of things are better read a few years late.

Today I was reading the 18 Nov 2004 issue of the London Review of Books, which contained (among other things) the following:

- A review by Jenny Diski of a biography of Stanley Milgram. Diski appears to want to debunk:

Milgram was a whiz at devising sexy experiments, but barely interested in any theoretical basis for them. They all have the same instant attractiveness of style, and then an underlying emptiness.

Huh? Michael Jordan couldn't hit the curveball and he was reportedly an easy mark for golf hustlers but that doesn't diminish his greatness on the basketball court.

She also criticizes Milgram for being "no help at all" for solving international disputes. OK, fine. I haven't solved any international disputes either. Milgram, though, . . . he conducted an imaginative experiment whose results stunned the world. And then in his afterlife he must suffer the indignity of someone writing that his findings are useless because people still haven't absorbed them. I agree with Diski that some theory might help, but it hardly seems to be Milgram's fault that he was ahead of his time.

- A review by Patrick Collinson of a biography of Anne Boleyn. Mildly interesting stuff, and no worse for being a few years delayed. Anne Boleyn isn't going anywhere.

- An article by Charles Glass on the U.S. in Afghanistan. Apparently it was already clear in 2004 that it wasn't working. Too bad the policymakers weren't reading the London Review of Books. For me, though, it's even more instructive to see this foretold six years ago.

- A review by Wyatt Mason of a book by David Foster Wallace. Mason reviews in detail a story with a complicated caught-in-a-dream plot which the critic James Wood, writing for the New Republic, got completely wrong. Wood got a key plot point backwards and as a result misunderstands the story and blames Wallace for creating an unsympathetic character.

Again, the time lag adds an interesting twist. I was curious as to whether Wood ever acknowledged Mason's correction, or apologized to Wallace for misreading his story, so I Googled "james wood david foster wallace." What turned up was a report by James Yeh of a lecture by Wood at the 92nd St. Y on Wallace after the author's death. Discussing a later book by Wallace, Wood said, "Wallace gives you the key, overexplaining the hand, instead of actually being enigmatic, like Beckett."

I dunno: After reading Wood's earlier review, maybe Wallace felt he had to overexplain. Damned if you do, etc.

- A review by Hugh Pennington of some books about supermarkets that contains the arresting (to me) line:

Consumption [of chicken] in the US has increased steadily since Herbert Hoover's promise of 'a chicken in every pot' in 1928; it rose a hundredfold between 1934 and 1994, from a quarter of a chicken a year to half a chicken a week.

A hundredfold--that's a lot! I thought it best to look this one up so I Googled "chicken consumption usda" and came up with this document by Jean Buzby and Hodan Farah, which contains this delightfully-titled graph:


OK, so it wasn't a hundredfold increase, actually only sixfold. People were eating way more than a quarter of a chicken a year in 1934. And chicken consumption did not increase steadily since 1928. The curve is flat until the early 1940s.
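Pennington's two endpoints are at least internally consistent, which takes one line to check (a sketch using only the figures in his sentence):

```python
# Endpoints as quoted: a quarter of a chicken a year (1934) vs. half a
# chicken a week (1994).
per_year_1934 = 0.25
per_year_1994 = 0.5 * 52   # half a chicken a week, in chickens per year

print(per_year_1994 / per_year_1934)  # 104.0: roughly a hundredfold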

This got me curious: who is Hugh Pennington, exactly? In that issue of the LRB, it says he "sits on committees that advise the World Food Programme and the Food Standards Agency." I guess he was just having a bad day, or maybe his assistant gave him some bad figures. Too bad they didn't have Google back in 1994 or he could've looked up the numbers directly. "A hundredfold" . . . didn't that strike him as a big number??

Jonathan Chait writes that the most important aspect of a presidential candidate is "political talent":

Republicans have generally understood that an agenda tilted toward the desires of the powerful requires a skilled frontman who can pitch Middle America. Favorite character types include jocks, movie stars, folksy Texans and war heroes. . . . [But the frontrunners for the 2012 Republican nomination] make Michael Dukakis look like John F. Kennedy. They are qualified enough to serve as president, but wildly unqualified to run for president. . . . [Mitch] Daniels's drawbacks begin -- but by no means end -- with his lack of height, hair and charisma. . . . [Jeb Bush] suffers from an inherent branding challenge [because of his last name]. . . . [Chris] Christie . . . doesn't cut a trim figure and who specializes in verbally abusing his constituents. . . . [Haley] Barbour is the comic embodiment of his party's most negative stereotypes. A Barbour nomination would be the rough equivalent of the Democrats' nominating Howard Dean, if Dean also happened to be a draft-dodging transsexual owner of a vegan food co-op.

Chait continues:

The impulse to envision one of these figures as a frontman represents a category error. These are the kind of people you want advising the president behind the scenes; these are not the people you put in front of the camera. The presidential candidate is the star of a television show about a tall, attractive person who can be seen donning hard hats, nodding at the advice of military commanders and gazing off into the future.

Geddit? Mike Dukakis was short, ethnic-looking, and didn't look good in a tank. (He did his military service in peacetime.) And did I mention that his middle name was Stanley? Who would vote for such a jerk?

All I can say is that Dukakis performed about as well in 1988 as would be predicted from the economy at the time. Here's a graph based on Doug Hibbs's model:


Sorry, but I don't think the Democrats would've won the 1988 presidential election even if they'd had Burt Reynolds at the top of the ticket. And, remember, George H. W. Bush was widely considered to be a wimp and a poser until he up and won the election. Conversely, had Dukakis won (which he probably would've, had the economy been slumping that year), I think we'd be hearing about how he was a savvy low-key cool dude.

Let me go on a bit more about the 1988 election.

Of Beauty, Sex, and Power: Statistical Challenges in Estimating Small Effects. At the Institute of Policy Research, Thurs 7 Apr 2011, 3.30pm.

Regular blog readers know all about this topic. (Here are the slides.) But, rest assured, I don't just mock. I also offer constructive suggestions.

My last talk at Northwestern was fifteen years ago. Actually, I gave two lectures then, in the process of being turned down for a job, er, enjoying their chilly Midwestern hospitality.

P.S. I searched on the web and also found this announcement which gives the wrong title.

Tyler Cowen links to this article by Matt Ridley that manages to push all my buttons. Ridley writes:

As many of you know, this blog is on an approximate one-month delay. I schedule my posts to appear roughly once a day, and there's currently a backlog of about 20 or 30 posts.

Recently I've decided to spend less time blogging, but I have some ideas I'd still like to share. To tweet, if you will. So I thought I'd just put a bunch of ideas out there that interested readers could follow up on. Think of it like one of those old-style dot-dot-dot newspaper columns.

After seeing my recent blogs on Nathan Myhrvold, a friend told me that, in the tech world, the albedo-obsessed genius is known as a patent troll.


Yup. My friend writes:

Physics is hard


Readers of this bizarre story (in which a dubious claim about reflectivity of food in cooking transmuted into a flat-out wrong claim about the relevance of reflectivity of solar panels) might wonder how genius Nathan Myhrvold (Ph.D. in theoretical physics from Princeton at age 24, postdoc with Stephen Hawking for chrissake) could make such a basic mistake.

In an earlier comment, I dismissed this with a flip allusion to Wile E. Coyote. But now I'm thinking there's something more going on.

In our blog discussion (see links above), Phil is surprised I didn't take a stronger stance on the albedo issue after reading Pierrehumbert's explanation. Phil asks: Why did I write "experts seem to think the albedo effect is a red herring" instead of something stronger such as, "as Pierrehumbert shows in detail, the albedo effect is a red herring"?

I didn't do this because my physics credentials are no better than Myhrvold's. And, given that Myhrvold got it wrong, I don't completely trust myself to get it right!

I majored in physics in college and could've gone to grad school in physics--I actually almost did so, switching to statistics at the last minute. I could be a Ph.D. in physics too. But I've never had a great physical intuition. I could definitely get confused by a slick physics argument. And I suspect Myhrvold is the same way. Given what he's written on albedo, I doubt his physics intuition is anywhere near as good as Phil's. My guess is that Myhrvold, like me, got good grades and was able to solve physics problems but made a wise choice in leaving physics to do something else.

Now, it's true, I don't think I would've made Myhrvold's particular mistake, because I would've checked--to start with, I would've asked my friends Phil and Upmanu before making any public claims about physics. In that sense, the difference between me and Myhrvold is not that I know more (or less) than he does, but that I have more of a clear sense of my areas of ignorance.

P.S. I'm on a Windows machine but my spell checker keeps flagging "Myhrvold." I'm surprised that in all his years there, he didn't use his influence to put his name in the dictionary. Then again, "Obama" gets flagged as a typo too. But "Clinton" it knows about. Hmm, lemme try some more: "Dukakis" gets flagged. But not "Reagan" or "Nixon" or "Roosevelt." Or "Quayle." If I were Nathan Myhrvold or Mike Dukakis, I'd be pretty annoyed at this point. Getting frozen out by Reagan or Roosevelt, fine. But Quayle??

I followed the link of commenter "Epanechnikov" to his blog, where I found, among other things, an uncritical discussion of Richard von Mises's book, "Probability, Statistics and Truth."

The bad news is that, based on the evidence of his book, Mises didn't seem to understand basic ideas of statistical significance. See here. Or at the very least, he was grossly overconfident (which can perhaps be seen from the brash title of his book). This is not the fault of "Epanechnikov," but I just thought that people should be careful about taking too seriously the statistical philosophy of someone who didn't think to do a chi-squared test when it was called for. (This is not a Bayesian/non-Bayesian thing; it's just basic statistics.)
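For readers wondering what's involved, a chi-squared goodness-of-fit test really is basic: compare observed counts to expected counts. A minimal sketch with made-up die-roll data (nothing from von Mises's book):

```python
# Observed counts from 600 hypothetical die rolls vs. the fair-die expectation.
observed = [115, 97, 91, 101, 110, 86]
expected = [100] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # about 6.12, below the 5% critical value of 11.07 on 5 df:
             # no evidence against fairness
```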

A commenter wrote (by email):

I've noticed that you've quit approving my comments on your blog. I hope I didn't anger you in some way or write something you felt was inappropriate.

My reply:

I have not been unapproving any comments. If you have comments that have not appeared, they have probably been going into the spam filter. I get literally thousands of spam comments a day and so anything that hits the spam filter is gone forever. I think there is a way to register as a commenter; that could help.

What Zombies see in Scatterplots


This video caught my interest - news video clip (from this).

The news commentator did seem to be trying to point out what a couple of states had to say about the claimed relationship - almost on their own.

Some methods have been worked out for zombies to do just this!

So I grabbed the data as close as I quickly could, modified the code slightly, and here's the zombie view of it.


North Carolina is the bolded red curve, Idaho the bolded green curve.
Mississippi and New York are the bolded blue curves.

As ugly as it is, this is the Bayesian marginal picture - exactly (given MCMC error).

P.S. You will get a very confusing picture if you forget to centre the x (see chapter 4 of the Gelman and Hill book).
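The centring point can be seen directly from the least-squares algebra: with an uncentred predictor (a year variable, say), the intercept and slope estimates are almost perfectly negatively correlated, which makes any joint or marginal picture confusing. A small sketch with a hypothetical predictor:

```python
import numpy as np

# With an uncentred predictor, the least-squares intercept and slope are
# nearly collinear: their correlation is -xbar / sqrt(mean(x^2)).
rng = np.random.default_rng(0)
x = rng.uniform(1990, 2010, size=50)   # hypothetical predictor (e.g. year)

def coef_correlation(x):
    X = np.column_stack([np.ones_like(x), x])
    V = np.linalg.inv(X.T @ X)         # proportional to cov(intercept, slope)
    return V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])

print(coef_correlation(x))             # close to -1 with raw x
print(coef_correlation(x - x.mean()))  # ~0: centring decouples the two
```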

This was just bizarre. It's an interview with Colin Camerer, a professor of finance and economics at Caltech. The topic is Camerer's marriage, but what's weird is that he doesn't say anything specific about his wife at all. All we get are witticisms of the sub-Henny-Youngman level; for example, in response to the question, "Any free riding in your household?", Camerer says:

No. Here's why: I am one of the world's leading experts on psychology, the brain and strategic game theory. But my wife is a woman. So it's a tie.

Also some schoolyard evolutionary biology ("men signaling that they will help you raise your baby after conception, and women signaling fidelity" blah blah blah) and advice for husbands in "upper-class marriages with assets." (No advice to the wives, but maybe that's a good thing.) And here are his insights on love and marriage:

Marriage is like hot slow-burning embers compared to the flashy flames of love. After the babies, the married brain has better things to do--micromanage, focus on those babies, create comfort zones. Marriage love can then burrow deeper, to the marrow.

To the marrow, eh? And what about couples who have no kids? Then maybe you're burrowing through the skin and just to the surface of the bone, I guess.

It seems like a wasted opportunity, really: this dude could've shared insights from his research and discussed its applicability (or the limitations of its applicability) to love, a topic that everybody cares about. (In contrast, another interview in this Economists in Love series, by Daniel Hamermesh, was much more to the point.)

Yeah, sure, I'm a killjoy, the interview is just supposed to be fluff, etc. Still, what kind of message are you sending when you define yourself as "one of the world's leading experts on psychology" and define your wife as "a woman"? Yes, I realize it's supposed to be self-deprecating, but to me it comes off as self-deprecating along the lines of, "Yeah, my cat's much smarter than I am. She gets me to do whatever she wants. Whenever she cries out, I give her food."

I'm not talking about political correctness here. I'm more worried about the hidden assumptions that can sap one's research, as well as the ways in which subtle and interesting ideas in psychology can become entangled with various not-so-subtle, not-so-well-thought-out ideas on sex roles etc.

I'm being completely unfair to Camerer

I have no idea how this interview was conducted but it could well have been done over the phone in ten minutes. Basically, Camerer is a nice guy and when these reporters called him up to ask him some questions, he said, Sure, why not. And then he said whatever came to his mind. If I were interviewed without preparation and allowed to ramble, I'd say all sorts of foolish things too. So basically I'm slamming Camerer for being nice enough to answer a phone call and then having the misfortune to see his casual thoughts spread all over the web (thanks to a link from Tyler Cowen, who really should've known better). So I take it all back.

P.S. Camerer's webpage mentions that he received his Ph.D. in 1981 at the age of 22. Wouldn't it be more usual to simply give your birth year (1958 or 1959, in this case)? Perhaps it's some principle of behavioral economics, that if people have to do the subtraction they'll value the answer a bit more.

Heat map


Jarad Niemi sends along this plot:


and writes:

2010-2011 Miami Heat offensive (red), defensive (blue), and combined (black) player contribution means (dots) and 95% credible intervals (lines) where zero indicates an average NBA player. Larger positive numbers for offensive and combined are better while larger negative numbers for defense are better.

In retrospect, I [Niemi] should have plotted -1*defensive_contribution so that larger was always better. The main point with this figure is that this awesome combination of James-Wade-Bosh that was discussed immediately after the LeBron trade to the Heat has a one-of-these-things-is-not-like-the-other aspect. At least according to my analysis, Bosh is hurting his team compared to the average player (although not statistically significant) due to his terrible defensive contribution (which is statistically significant).

All fine so far. But the punchline comes at the end, when he writes:

Anyway, a reviewer said he hated the figure and demanded to see a table with the actual numbers instead.


Statisticians vs. everybody else


Statisticians are literalists.

When someone says that the U.K. boundary commission's delay in redistricting gave the Tories an advantage equivalent to 10 percent of the vote, we're the kind of person who looks it up and claims that the effect is less than 0.7 percent.

When someone says, "Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm," we're like, Hey, really? And we go look that one up too.

And when someone says that engineers have more sons and nurses have more daughters . . . well, let's not go there.

So, when I was pointed to this blog by Michael O'Hare making the following claim, in the context of K-12 education in the United States:

My [O'Hare's] favorite examples of this junk [educational content with no workplace value] are spelling and pencil-and-paper algorithm arithmetic. These are absolutely critical for a clerk in an office of fifty years ago, but being good at them is unrelated to any real mental ability (what, for example, would a spelling bee in Chinese be?) and worthless in the world we live in now. I say this, by the way, aware that I am the best speller that I ever met (and a pretty good typist). But these are idiot-savant abilities, genetic oddities like being able to roll your tongue. Let's just lose them.

My first reaction was: Are you sure? I also have no systematic data on this, but I strongly doubt that being able to spell and add are "unrelated to any real mental ability" and are "genetic oddities like being able to roll your tongue." For one thing, people can learn to spell and add but I think it's pretty rare for anyone to learn how to roll their tongue! Beyond this, I expect that one way to learn spelling is to do a lot of reading and writing, and one way to learn how to add is to do a lot of adding (by playing Monopoly or whatever). I'd guess that these are indeed related to "real mental ability," however that is defined.

My guess is that, to O'Hare, my reactions would miss the point. He's arguing that schools should spend less time teaching kids spelling and arithmetic, and his statements about genetics, rolling your tongue, and the rest are just rhetorical claims. I'm guessing that O'Hare's view on the relation between skills and mental ability, say, is similar to Tukey's attitude about statistical models: they're fine as an inspiration for statistical methods (for Tukey) or as an inspiration for policy proposals (for O'Hare), but should not be taken literally. The things I write are full of qualifications, which might be a real hindrance if you're trying to propose policy changes.

This came in the inbox today:

Chris Masse points me to this response by Daryl Bem and two statisticians (Jessica Utts and Wesley Johnson) to criticisms by Wagenmakers of Bem's recent ESP study. I have nothing to add but would like to repeat a couple bits of my discussions of last month, first from here:

Classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects.

I think it's naive when people implicitly assume that the study's claims are correct, or the study's statistical methods are weak. Generally, the smaller the effects you're studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics.

To put it another way: whatever methodological errors happen to be in the paper in question probably occur in lots of research papers in "legitimate" psychology research. The difference is that when you're studying a large, robust phenomenon, little statistical errors won't be so damaging as in a study of a fragile, possibly zero effect.

In some ways, there's an analogy to the difficulties of using surveys to estimate small proportions, in which case misclassification errors can loom large.
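To make the survey analogy concrete, here is a back-of-the-envelope sketch with made-up rates (not from any of the studies discussed):

```python
# True prevalence 0.1%; survey instrument with a 1% false-positive rate and
# a 10% false-negative rate (all rates invented for illustration).
p_true = 0.001
fpr, fnr = 0.01, 0.10

p_observed = p_true * (1 - fnr) + (1 - p_true) * fpr
print(p_observed)  # about 0.011: an estimate roughly 11 times the truth,
                   # because almost every "positive" is a misclassified negative
```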

And here:

[One thing that Bem et al. and Wagenmakers et al. both miss] is that Bayes is not just about estimating the weight of evidence in favor of a hypothesis. The other key part of Bayesian inference--the more important part, I'd argue--is "shrinkage" or "partial pooling," in which estimates get pooled toward zero (or, more generally, toward their estimates based on external information).

Shrinkage is key, because if all you use is a statistical significance filter--or even a Bayes factor filter--when all is said and done, you'll still be left with overestimates. Whatever filter you use--whatever rule you use to decide whether something is worth publishing--I still want to see some modeling and shrinkage (or, at least, some retrospective power analysis) to handle the overestimation problem. This is something Martin and I discussed in our discussion of the "voodoo correlations" paper of Vul et al.
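The significance-filter point is easy to demonstrate by simulation. This sketch uses purely illustrative numbers, not anything from Bem's data:

```python
import numpy as np

# Simulate many low-power studies of a small true effect and apply the
# significance filter; the survivors systematically overestimate.
rng = np.random.default_rng(1)
true_effect, se = 0.5, 1.0

estimates = rng.normal(true_effect, se, size=100_000)
significant = estimates[np.abs(estimates) > 1.96 * se]

print(significant.mean())  # about 2.0: the estimates that survive the filter
                           # average four times the true effect
```

Shrinking these surviving estimates toward zero (or toward an informed prior) is exactly the correction that a pure significance or Bayes-factor filter doesn't provide.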

Finally, my argument for why a top psychology journal should never have published Bem's article:

I mean, how hard would it be for the experimenters to gather more data, do some sifting, find out which subjects are good at ESP, etc. There's no rush, right? No need to publish preliminary, barely-statistically-significant findings. I don't see what's wrong with the journal asking for better evidence. It's not like a study of the democratic or capitalistic peace, where you have a fixed amount of data and you have to learn what you can. In experimental psychology, once you have the experiment set up, it's practically free to gather more data.

I made this argument in response to a generally very sensible paper by Tal Yarkoni on this topic.

P.S. Wagenmakers et al. respond (to Bem et al., that is, not to me). As Tal Yarkoni would say, I agree with Wagenmakers et al. on the substantive stuff. But I still think that both they and Bem et al. err in setting up their models so starkly: either there's ESP or there's not. Given the long history of ESP experiments (as noted by some of the commenters below), it seems more reasonable to me to suppose that these studies have some level of measurement error of magnitude larger than that of any ESP effects themselves.

As I've already discussed, I'm not thrilled with the discrete models used in these discussions and I am for some reason particularly annoyed by the labels "Strong," "Substantial," "Anecdotal" in figure 4 of Wagenmakers et al. Whether or not a study can be labeled "anecdotal" seems to me to be on an entirely different dimension than what they're calculating here. Just for example, suppose you conduct a perfect randomized experiment on a large random sample of people. There's nothing anecdotal at all about this (hypothetical) study. As I've described it, it's the opposite of anecdotal. Nonetheless, it might very well be that the effect under study is tiny, in which case a statistical analysis (Bayesian or otherwise) is likely to report no effect. It could fall into the "anecdotal" category used by Wagenmakers et al. But that would be an inappropriate and misleading label.

That said, I think people have to use what statistical methods they're comfortable with, so it's sort of silly for me to fault Wagenmakers et al. for not using the sorts of analysis I would prefer. The key point that they and other critics have made is that the Bem et al. analyses aren't quite as clean as a casual observer might think, and it's possible to make that point coming from various statistical directions. As I note above, my take on this is that if you study very small effects, then no amount of statistical sophistication will save you. If it's really true, as commenter Dean Radin writes below, that these studies "took something like 6 or 7 years to complete," then I suppose it's no surprise that something turned up.

Matthew Yglesias links approvingly to the following statement by Michael Mandel:

Homeland Security accounts for roughly 90% of the increase in federal regulatory employment over the past ten years.

Roughly 90%, huh? That sounds pretty impressive. But wait a minute . . . what if total federal regulatory employment had increased a bit less? Then Homeland Security could've accounted for 105% of the increase, or 500% of the increase, or whatever. The point is that the change in total employment is the sum of a bunch of pluses and minuses. It happens that, if you don't count Homeland Security, the total hasn't changed much--I'm assuming Mandel's numbers are correct here--and that could be interesting.

The "roughly 90%" figure is misleading because, written as a percent of the total increase, it's natural to read it as a percentage bounded by 100%. But the individual agencies' changes sum to the total, and some of those components are positive while others are negative. If the total happens to be near zero, the individual pieces can appear to be huge fractions of it, even fractions over 100%.

I'm not saying that Mandel made any mistakes, just that, in general, ratios can be tricky when the denominator is the sum of positive and negative parts. In this particular case, the margins were large but not quite over 100%, which somehow gives the comparison more punch than it deserves, I think.
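A toy version of the arithmetic, with hypothetical agency numbers (not Mandel's):

```python
# Net change is the sum of positive and negative pieces; shares of a small
# net total can exceed 100%. All numbers are hypothetical.
changes = {"Homeland Security": 90, "Agency A": 15, "Agency B": -85}

total = sum(changes.values())  # net increase: 20
shares = {k: v / total for k, v in changes.items()}
print(shares)  # Homeland Security "accounts for" 450% of the net increase
```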

We discussed a mathematically identical case a few years ago involving the 2008 Democratic primary election campaign.

What should we call this?

There should be a name for this sort of statistical slip-up. The Fallacy of the Misplaced Denominator, perhaps? The funny thing is that the denominator has to be small (so that the numerator seems like a lot, "90%" or whatever) but not too small (because if the ratio is over 100%, the jig is up).

P.S. Mandel replies that, yes, he agrees with me in general about the problems of ratios where the denominator is a sum of positive and negative components, but that in this particular case, "all the major components of regulatory employment change are either positive or a very tiny negative." So it sounds like I was choosing a bad example to make my point!

New innovations in spam


I received the following (unsolicited) email today:

Hello Andrew,

I'm interested in whether you are accepting guest article submissions for your site Statistical Modeling, Causal Inference, and Social Science? I'm the owner of the recently created nonprofit site and am interested in writing / submitting an article for your consideration to be published on your site. Is that something you'd be willing to consider, and if so, what specs in terms of topics or length requirements would you be looking for?

Thanks you for your time, and if you have any questions or are interested, I'd appreciate you letting me know.

Samantha Rhodes


P.S. My vote for most obnoxious spam remains this one, which does its best to dilute whatever remains of the reputation of Wolfram Research. Or maybe that particular bit of spam was written by a particularly awesome cellular automaton that Wolfram discovered? I guess in the world of big-time software it's ok to lie if it might net you a bit of money.

Kaiser nails it. The offending article, by John Tierney, somehow ended up in the Science section rather than the Opinion section. As an opinion piece (or, for that matter, a blog), Tierney's article would be nothing special. But I agree with Kaiser that it doesn't work as a newspaper article. As Kaiser notes, this story involves a bunch of statistical and empirical claims that are not well resolved by P.R. and rhetoric.

My Wall Street Journal story


I was talking with someone the other day about the book by that Yale law professor who called her kids "garbage" and didn't let them go to the bathroom when they were studying piano . . . apparently it wasn't so bad as all that, she was misrepresented by the Wall Street Journal excerpt:

"I was very surprised," she says. "The Journal basically strung together the most controversial sections of the book. And I had no idea they'd put that kind of a title on it. . . . "And while it's ultimately my responsibility -- my strict Chinese mom told me 'never blame other people for your problems!' -- the one-sided nature of the excerpt has really led to some major misconceptions about what the book says, and about what I really believe."

I don't completely follow her reasoning here: just because, many years ago, her mother told her a slogan about not blaming other people, therefore she can say, "it's ultimately my responsibility"? You can see the illogic of this by flipping it around. What if her mother had told her that nothing is really your fault, everything you do is a product of what came before you, etc.? Then would she be able to say that that WSJ article is not her responsibility?

But I digress.

What I really want to say here is that I find completely plausible the claim that the Wall Street Journal sensationalized her book. I say this based on an experience I had last year.

Seeing as the Freakonomics people were kind enough to link to my list of five recommended books, I'll return the favor and comment on a remark from Levitt, who said:

Thiel update


A year or so ago I discussed the reasoning of zillionaire financier Peter Thiel, who seems to believe his own hype and, worse, seems to be able to convince reporters of his infallibility as well. Apparently he "possesses a preternatural ability to spot patterns that others miss."

More recently, Felix Salmon commented on Thiel's financial misadventures:

Peter Thiel's hedge fund, Clarium Capital, ain't doing so well. Its assets under management are down 90% from their peak, and total returns from the high point are -65%. Thiel is smart, successful, rich, well-connected, and on top of all that his calls have actually been right . . . None of that, clearly, was enough for Clarium to make money on its trades: the fund was undone by volatility and weakness in risk management.

There are a few lessons to learn here.

Firstly, just because someone is a Silicon Valley gazillionaire, or any kind of successful entrepreneur for that matter, doesn't mean they should be trusted with other people's money.

Secondly, being smart is a great way of getting into a lot of trouble as an investor. In order to make money in the markets, you need a weird combination of arrogance and insecurity. Arrogance on its own is fatal, but it's also endemic to people in Silicon Valley who are convinced that they're rich because they're smart, and that since they're still smart, they can and will therefore get richer still. . . .

Just to be clear, I'm not saying that Thiel losing money is evidence that he's some sort of dummy. (Recall my own unsuccess as an investor.) What I am saying is, don't believe the hype.

R Advertised


The R language is definitely going mainstream:


Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. (See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended:

The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study -- say, a correlation between a personality trait and the risk of depression -- is considered "significant" if its probability of occurring by chance is less than 5 percent.

This arbitrary cutoff makes sense when the effect being studied is a large one -- for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match ("red" in red letters) than when they do not ("red" in blue letters), and is very strong in almost everyone.

"But if the true effect of what you are measuring is small," said Andrew Gelman, a professor of statistics and political science at Columbia University, "then by necessity anything you discover is going to be an overestimate" of that effect.

The above description of classical hypothesis testing isn't bad. Strictly speaking, one would follow "is less than 5 percent" above with "if the null hypothesis of zero effect were actually true," but they have serious space limitations, and I doubt many readers would get much out of that elaboration, so I'm happy with what Carey put there.

One subtlety that he didn't quite catch was the way that researchers mix the Neyman-Pearson and Fisher approaches to inference. The 5% cutoff (associated with Neyman and Pearson) is indeed standard, and it is indeed subject to all the problems we know about, most simply that statistical significance occurs at least 5% of the time, so if you do a lot of experiments you're gonna have a lot of chances to find statistical significance. But p-values are also used as a measure of evidence: that's Fisher's approach and it leads to its own problems (as discussed in the news article as well).

The other problem, which is not so well known, comes up in my quote: when you're studying small effects and you use statistical significance as a filter and don't do any partial pooling, whatever you have that's left standing that survives the filtering process will overestimate the true effect. And classical corrections for "multiple comparisons" do not solve the problem: they merely create a more rigorous statistical significance filter, but anything that survives that filter will be even more of an overestimate.
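The filtering point is easy to check with a quick simulation. The numbers here are made up purely for illustration: a true effect of 0.1 measured with standard error 0.5, i.e., a small effect studied with a noisy instrument.

```python
import numpy as np

rng = np.random.default_rng(42)
true_effect, se, n_sims = 0.1, 0.5, 100_000

# Simulate many noisy estimates of the same small true effect.
estimates = rng.normal(true_effect, se, n_sims)

# Keep only the "statistically significant" ones: |estimate| > 1.96 * se.
significant = estimates[np.abs(estimates) > 1.96 * se]

# Average magnitude of the estimates that survive the filter.
print(np.mean(np.abs(significant)))
```

With these numbers the surviving estimates average out to more than ten times the true effect: that's the overestimation problem in miniature, and a stricter cutoff only makes the survivors more extreme.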

If classical hypothesis testing is so horrible, how could it be so popular? In particular, what was going on when a well-respected researcher like this ESP guy used inappropriate statistical methods?

My answer to Carey was to give a sort of sociological story, which went as follows.

Psychologists have experience studying large effects, the sort of study in which data from 24 participants is enough to estimate a main effect and 50 will be enough to estimate interactions of interest. I gave the example of the Stroop effect (they have a nice one of those on display right now at the Natural History Museum) as an example of a large effect where classical statistics will do just fine.

My point was, if you've gone your whole career studying large effects with methods that work, then it's natural to think you have great methods. You might not realize that your methods, which appear quite general, actually fall apart when applied to small effects. Such as ESP or human sex ratios.

The ESP dude was a victim of his own success: His past accomplishments studying large effects gave him an unwarranted feeling of confidence that his methods would work on small effects.

This sort of thing comes up a lot, and in my recent discussion of Efron's article, I list it as my second meta-principle of statistics, the "methodological attribution problem," which is that people think that methods that work in one sort of problem will work in others.

The other thing that Carey didn't have the space to include was that Bayes is not just about estimating the weight of evidence in favor of a hypothesis. The other key part of Bayesian inference--the more important part, I'd argue--is "shrinkage" or "partial pooling," in which estimates get pooled toward zero (or, more generally, toward their estimates based on external information).

Shrinkage is key, because if all you use is a statistical significance filter--or even a Bayes factor filter--when all is said and done, you'll still be left with overestimates. Whatever filter you use--whatever rule you use to decide whether something is worth publishing--I still want to see some modeling and shrinkage (or, at least, some retrospective power analysis) to handle the overestimation problem. This is something Martin and I discussed in our discussion of the "voodoo correlations" paper of Vul et al.
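For concreteness, here is a minimal sketch of what partial pooling does in the simplest normal-normal case, with made-up numbers: an estimate of 2.0 with standard error 1.0, and a prior centered at zero with standard deviation 0.5 encoding external information that effects in this area tend to be small.

```python
# Minimal normal-normal shrinkage: a noisy estimate y with standard error
# sigma gets pulled toward a prior mean (here 0) with prior sd tau.
def shrink(y, sigma, tau, prior_mean=0.0):
    w = tau**2 / (tau**2 + sigma**2)          # weight placed on the data
    post_mean = prior_mean + w * (y - prior_mean)
    post_sd = (1 / (1 / sigma**2 + 1 / tau**2)) ** 0.5
    return post_mean, post_sd

# A "statistically significant" estimate of a small effect gets shrunk a lot:
print(shrink(y=2.0, sigma=1.0, tau=0.5))
```

The estimate of 2.0 gets pulled down to 0.4. That downward adjustment is exactly what a pure significance filter, or a Bayes factor filter, never applies.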

Should the paper have been published in a top psychology journal?

Real-life psychology researcher Tal Yarkoni adds some good thoughts but then he makes the ridiculous (to me) juxtaposition of the following two claims: (1) The ESP study didn't find anything real, there's no such thing as ESP, and the study suffered many methodological flaws, and (2) The journal was right to publish the paper.

If you start with (1), I don't see how you get to (2). I mean, sure, Yarkoni gives his reasons (basically, the claim that the ESP paper, while somewhat crappy, is no crappier than most papers that are published in top psychology journals), but I don't buy it. If the effect is there, why not have them demonstrate it for real? I mean, how hard would it be for the experimenters to gather more data, do some sifting, find out which subjects are good at ESP, etc.? There's no rush, right? No need to publish preliminary, barely-statistically-significant findings. I don't see what's wrong with the journal asking for better evidence. It's not like a study of the democratic or capitalistic peace, where you have a fixed amount of data and you have to learn what you can. In experimental psychology, once you have the experiment set up, it's practically free to gather more data.

P.S. One thing that saddens me is that, instead of using the sex-ratio example (which I think would've been perfect for this article), Carey uses the following completely fake example:

Consider the following experiment. Suppose there was reason to believe that a coin was slightly weighted toward heads. In a test, the coin comes up heads 527 times out of 1,000.

And then he goes on to write about coin flipping. But, as I showed in my article with Deb, there is no such thing as a coin weighted to have a probability p (different from 1/2) of heads.
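For what it's worth, the evidence in that fake example is marginal anyway; the exact one-sided binomial probability of seeing 527 or more heads in 1,000 tosses of a fair coin is easy to compute:

```python
from math import comb

# One-sided tail probability P(X >= 527) for X ~ Binomial(1000, 0.5).
n, k = 1000, 527
p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(round(p_one_sided, 3))
```

It comes out around 0.047: just under the conventional cutoff one-sided, and above it two-sided, which is presumably why the example was built around these particular numbers.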

OK, I know about fake examples. I'm writing an intro textbook, and I know that fake examples can be great. But not this one!

P.P.S. I'm also disappointed he didn't use the famous dead-fish example, in which Bennett, Baird, Miller, and Wolford found statistically significant correlations in an fMRI scan of a dead salmon. The correlations were not only statistically significant, they were large and newsworthy!

P.P.P.S. The Times does this weird thing with its articles where it puts auto-links on Duke University, Columbia University, and the University of Missouri. I find this a bit distracting and unprofessional.

I received the following in email from our publisher:

I write with regards to the project to publish a China Edition of your book "Data Analysis Using Regression and Multilevel/Hierarchical Models" (ISBN-13: 9780521686891) for the mainland Chinese market. I regret to inform you that we have been notified by our partner in China, Posts & Telecommunications Press (PTP), that due to various politically sensitive materials in the text, the China Edition has not met with the approval of the publishing authorities in China, and as such PTP will not be able to proceed with the publication of this edition. We will therefore have to cancel plans for the China Edition of your book. Please accept my apologies for this unforeseen development. If you have any queries regarding this, do feel free to let me know.

Oooh, it makes me feel so . . . subversive. It reminds me how, in Sunday school, they told us that if we were ever visiting Russia, we should smuggle Bibles in our luggage because the people there weren't allowed to worship.

Xiao-Li Meng told me once that in China they didn't teach Bayesian statistics because the idea of a prior distribution was contrary to Communism (since the "prior" represented the overthrown traditions, I suppose).

And then there's this.

I think that the next printing of our book should have "Banned in China" slapped on the cover. That should be good for sales, right?

P.S. Update here.

John Talbott points me to this, which I briefly mocked a couple months ago. I largely agree with the critics of this research, but I want to reiterate my point from earlier that all the statistical sophistication in the world won't help you if you're studying a null effect. This is not to say that the actual effect is zero--who am I to say?--just that the comments about the high-quality statistics in the article don't say much to me.

There's lots of discussion of the lack of science underlying ESP claims. I can't offer anything useful on that account (not being a psychologist, I could imagine all sorts of stories about brain waves or whatever), but I would like to point out something that usually doesn't seem to get mentioned in these discussions, which is that lots of people want to believe in ESP. After all, it would be cool to read minds. (It wouldn't be so cool, maybe, if other people could read your mind and you couldn't read theirs, but I suspect most people don't think of it that way.) And ESP seems so plausible, in a wish-fulfilling sort of way. It really feels like if you concentrate really hard, you can read minds, or predict the future, or whatever. Heck, when I play squash I always feel that if I really really try hard, I should be able to win every point. The only thing that stops me from really believing this is that I realize that the same logic holds symmetrically for my opponent. But with ESP, absent a controlled study, it's easy to see evidence all around you supporting your wishful thinking. (See my quote in bold here.) Recall the experiments reported by Ellen Langer, that people would shake their dice more forcefully when trying to roll high numbers and would roll gently when going for low numbers.

When I was a little kid, it was pretty intuitive to believe that if I really tried, I could fly like Superman. In that case, of course, there was abundant evidence--many crashes in the backyard--that it wouldn't work. For something as vague as ESP, that sort of simple test isn't there. And ESP researchers know this--they use good statistics--but it doesn't remove the element of wishful thinking. And, as David Weakliem and I have discussed, classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects.

I think it's naive when people implicitly assume either that the study's claims are correct or that the study's statistical methods are weak. Generally, the smaller the effects you're studying, the better the statistics you need. ESP is a field of small effects, and so ESP researchers use high-quality statistics.

To put it another way: whatever methodological errors happen to be in the paper in question probably occur in lots of research papers in "legitimate" psychology research. The difference is that when you're studying a large, robust phenomenon, little statistical errors won't be so damaging as in a study of a fragile, possibly zero effect.

In some ways, there's an analogy to the difficulties of using surveys to estimate small proportions, in which case misclassification errors can loom large, as discussed here.
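The arithmetic behind that survey problem is worth a line of code. The numbers here are hypothetical: a true rate of 0.1%, and a survey item with a 1% false-positive rate and 99% sensitivity.

```python
# Hypothetical numbers: how a 1% false-positive rate swamps a 0.1% true rate.
true_rate = 0.001    # true proportion in the population
sensitivity = 0.99   # P(classified positive | truly positive)
false_pos = 0.01     # P(classified positive | truly negative)

# Expected observed proportion, mixing true and false positives.
observed = true_rate * sensitivity + (1 - true_rate) * false_pos
print(observed)
```

The observed proportion comes out to about 0.011, roughly eleven times the true 0.001: the false positives from the huge negative group swamp the true positives.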

Now to criticize the critics: some so-called Bayesian analysis that I don't really like

I agree with the critics of the ESP paper that Bayesian analysis is a good way to combine the results of this not-so-exciting new finding that people in the study got 53% correct instead of the expected 50% correct, with the long history of research in this area.

But I wouldn't use the Bayesian methods that these critics recommend. In particular, I think it's ludicrous for Wagenmakers to claim a prior probability of 10^-20 for ESP, and I also think that they're way off base when they start talking about "Bayesian t-tests" and point null hypotheses. I think a formulation based on measurement-error models would be far more useful. I'm very disturbed by purportedly Bayesian methods that start with meaningless priors and then yield posterior probabilities that, instead of being interpreted quantitatively, have to be converted to made-up categories such as "extreme evidence," "very strong evidence," "anecdotal evidence," and the like. This seems to me to carry over some of the most arbitrary aspects of classical statistics. Perhaps I should call this the "no true Bayesian" phenomenon.

And, if you know me at all (in a professional capacity), you'll know I hate statements like this:

Another advantage of the Bayesian test is that it is consistent: as the number of participants grows large, the probability of discovering the true hypothesis approaches 1.

The "true hypothesis," huh? I have to go to bed now (no, I'm not going to bed at 9am; I set this blog up to post entries automatically every morning). If you happen to run into an experiment of interest in which psychologists are "discovering a true hypothesis," (in the statistical sense of a precise model), feel free to wake me up and tell me. It'll be newsworthy, that's for sure.

Anyway, the ESP thing is pretty silly, and so there are lots of ways of shooting it down. I'm only picking on Wagenmakers because often we're full of uncertainty about more interesting problems. For example, new educational strategies and their effects on different sorts of students. For these sorts of problems, I don't think that models of null effects, verbal characterizations of Bayes factors, and reassurances about "discovering the true hypothesis" are going to cut it. These problems are important, and I think that, even when criticizing silly studies, we should think carefully about what we're doing and what our methods are actually purporting to do.

Clarity on my email policy


I never read email before 4. That doesn't mean I never send email before 4.

A couple people pointed me to this recent news article which discusses "why, beyond middle age, people get happier as they get older." Here's the story:

When people start out on adult life, they are, on average, pretty cheerful. Things go downhill from youth to middle age until they reach a nadir commonly known as the mid-life crisis. So far, so familiar. The surprising part happens after that. Although as people move towards old age they lose things they treasure--vitality, mental sharpness and looks--they also gain what people spend their lives pursuing: happiness.

This curious finding has emerged from a new branch of economics that seeks a more satisfactory measure than money of human well-being. Conventional economics uses money as a proxy for utility--the dismal way in which the discipline talks about happiness. But some economists, unconvinced that there is a direct relationship between money and well-being, have decided to go to the nub of the matter and measure happiness itself. . . There are already a lot of data on the subject collected by, for instance, America's General Social Survey, Eurobarometer and Gallup. . . .

And here's the killer graph:


All I can say is . . . it ain't so simple. I learned this the hard way. After reading a bunch of articles on the U-shaped relation between age and happiness--including some research that used the General Social Survey--I downloaded the GSS data (you can do it yourself!) and prepared some data for my introductory statistics class. I made a little dataset with happiness, age, sex, marital status, income, and a couple other variables and ran some regressions and made some simple graphs. The idea was to start with the fascinating U-shaped pattern and then discuss what could be learned further using some basic statistical techniques of subsetting and regression.

But I got stuck--really stuck. Here was my first graph, a quick summary of average happiness level (on a 0, 1, 2 scale; in total, 12% of respondents rated their happiness at 0 (the lowest level), 56% gave themselves a 1, and 32% described themselves as having the highest level on this three-point scale). And below are the raw averages of happiness vs. age. (Note: the graph has changed. In my original posted graph, I plotted the percentage of respondents of each age who had happiness levels of 1 or 2; this corrected graph plots average happiness levels.)


Uh-oh. I did this by single years of age so it's noisy--even when using decades of GSS, the sample's not infinite--but there's nothing like the famous U-shaped pattern! Sure, if you stare hard enough, you can see a U between ages 35 and 70, but the behavior from 20-35 and from 70-90 looks all wrong. There's a big difference between the published graph, which has maxima at 20 and 85, and my graph from GSS, which has minima at 20 and 85.

There are a lot of ways these graphs could be reconciled. There could be cohort or period effects, perhaps I should be controlling for other variables, maybe I'm using a bad question, or maybe I simply miscoded the data. All of these are possibilities. I spent several hours staring at the GSS codebook and playing with the data in different ways and couldn't recover the U. Sometimes I could get happiness to go up with age, but then it was just a gradual rise from age 18, without the dip around age 45 or 50. There's a lot going on here and I very well may still be missing something important. [Note: I imagine that sort of cagey disclaimer is typical of statisticians: by our training we are so aware of uncertainty. Researchers in other fields don't seem to feel the same need to do this.]

Anyway, at some point in this analysis I was getting frustrated at my inability to find the U (I felt like the characters in that old movie they used to show on TV on New Year's Eve, all looking for "the big W") and beginning to panic that this beautiful example was too fragile to survive in the classroom.

So I called Grazia Pittau, an economist (!) with whom I'd collaborated on some earlier happiness research (in which I contributed multilevel modeling and some ideas about graphs but not much of substance regarding psychology or economics). Grazia confirmed to me that the U-shaped pattern is indeed fragile, that you have to work hard to find it, and often it shows up when people fit linear and quadratic terms, in which case everything looks like a parabola. (I'd tried regressions with age & age-squared, but it took a lot of finagling to get the coefficient for age-squared to have the "correct" sign.)
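That last point is easy to demonstrate with fake data: a regression on age and age-squared returns a parabola no matter what the true age pattern is, so the fitted curve can suggest a U (or inverted U) purely as an artifact of the functional form. A sketch, with a hypothetical gently and linearly increasing trend:

```python
import numpy as np

rng = np.random.default_rng(0)
age = np.arange(18, 90)
# Fake data: happiness rises gently and linearly with age, plus noise.
happiness = 1.0 + 0.003 * age + rng.normal(0, 0.05, age.size)

# A quadratic fit returns a parabola regardless of the data's true shape.
c2, c1, c0 = np.polyfit(age, happiness, deg=2)
fitted = c0 + c1 * age + c2 * age**2
print(c2)  # tiny and noise-driven, yet it forces a U or inverted-U curvature
```

The fitted curve is a parabola by construction; whether it opens up or down here depends only on the noise, which is one way a "U" can appear without being in the data.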

And then I encountered a paper by Paul Frijters and Tony Beatton which directly addressed my confusion. Frijters and Beatton write:

Whilst the majority of psychologists have concluded there is not much of a relationship at all, the economic literature has unearthed a possible U-shape relationship. In this paper we [Frijters and Beatton] replicate the U-shape for the German SocioEconomic Panel (GSOEP), and we investigate several possible explanations for it.

They conclude that the U is fragile and that it arises from a sample-selection bias. I refer you to the above link for further discussion.

In summary: I agree that happiness and life satisfaction are worth studying--of course they're worth studying--but, in the midst of looking for explanations for that U-shaped pattern, it might be worth looking more carefully to see what exactly is happening. At the very least, the pattern does not seem to be as clear as implied from some media reports. (Even a glance at the paper by Stone, Schwartz, Broderick, and Deaton, which is the source of the top graph above, reveals a bunch of graphs, only some of which are U-shaped.) All those explanations have to be contingent on the pattern actually existing in the population.

My goal is not to debunk but to push toward some broader thinking. People are always trying to explain what's behind a stylized fact, which is fine, but sometimes they're explaining things that aren't really happening, just like those theoretical physicists who, shortly after the Fleischmann-Pons experiment, came up with ingenious models of cold fusion. These theorists were brilliant but they were doomed because they were modeling a phenomenon which (most likely) doesn't exist.

A comment from a few days ago by Eric Rasmusen seems relevant, connecting this to general issues of confirmation bias. If you make enough graphs and you're looking for a U, you'll find it. I'm not denying the U is there, I'm just questioning the centrality of the U to the larger story of age, happiness, and life satisfaction. There appear to be many different age patterns and it's not clear to me that the U should be considered the paradigm.

P.S. I think this research (even if occasionally done by economists) is psychology, not economics. No big deal--it's just a matter of terminology--but I think journalists and other outsiders can be misled if they hear about this sort of thing and start searching in the economics literature rather than in the psychology literature. In general, I think economists will have more to say than psychologists about prices, and psychologists will have more insights about emotions and happiness. I'm sure that economists can make important contributions to the study of happiness, just as psychologists can make important contributions to the study of prices, but even a magazine called "The Economist" should know the difference.

"'Why work?'"


Tyler Cowen links to a "scary comparison" that claims that "a one-parent family of three making $14,500 a year (minimum wage) has more disposable income than a family making $60,000 a year."

Kaiser Fung looks into this comparison in more detail. As Kaiser puts it:

As we said in Red State, Blue State, it's not the Prius vs. the pickup truck, it's the Prius vs. the Hummer. Here's the graph:


Or, as Ross Douthat put it in an op-ed yesterday:

This means that a culture war that's often seen as a clash between liberal elites and a conservative middle America looks more and more like a conflict within the educated class -- pitting Wheaton and Baylor against Brown and Bard, Redeemer Presbyterian Church against the 92nd Street Y, C. S. Lewis devotees against the Philip Pullman fan club.

Our main motivation for doing this work was to change how the news media think about America's political divisions, and so it's good to see our ideas getting mainstreamed and moving toward conventional wisdom.

A few days after "Dramatic study shows participants are affected by psychological phenomena from the future," (see here) the British Psychological Society follows up with "Can psychology help combat pseudoscience?."

Somehow I'm reminded of that bit of financial advice which says, if you want to save some money, your best investment is to pay off your credit card bills.

I guess there's a reason they put this stuff in the Opinion section and not in the Science section, huh?

P.S. More here.

I came across an interesting article by T. W. Farnam, "Political divide between coasts and Midwest deepening, midterm election analysis shows."

There was one thing that bugged me, though.

Near the end of the article, Farnam writes:

Latinos are not swing voters . . . Exit polls showed that 60 percent of Latino voters favored Democratic House candidates - a relatively steady proportion with the 69 percent the party took in 2006, the year it captured 31 seats.

Huh? In what sense is 60% close to 69%? That's a swing of 9 percentage points. The national swing to the Republicans can be defined in different ways (depending on how you count uncontested races, and whether you go with total vote or average district vote) but in any case was something like 8 percentage points.

The swing among Latinos was, thus, about the same as the national swing. At least based on these data, the statement "Latinos are not swing voters" does not seem supported by the facts. Unless you also want to say that whites are not swing voters either.

It probably got caught in the spam filter. We get tons and tons of spam (including the annoying spam that I have to remove by hand).

If your comment was accompanied by an ad or a spam link, then maybe I just deleted it.

Just chaid


Reading somebody else's statistics rant made me realize the inherent contradictions in much of my own statistical advice.

Is parenting a form of addiction?


The last time we encountered Slate columnist Shankar Vedantam was when he puzzled over why slightly more than half of voters planned to vote for Republican candidates, given that polls show that Americans dislike the Republican Party even more than they dislike the Democrats. Vedantam attributed the new Republican majority to irrationality and "unconscious bias." But, actually, this voting behavior is perfectly consistent with there being some moderate voters who prefer divided government. The simple, direct explanation (which Vedantam mistakenly dismisses) actually works fine.

I was flipping through Slate today and noticed a new article by Vedantam headlined, "If parenthood sucks, why do we love it? Because we're addicted." I don't like this one either.

Xian pointed me to this recycling of a classic probability error. It's too bad it was in the New York Times, but at least it was in the Opinion Pages, so I guess that's not so bad. And, on the plus side, several of the blog commenters got the point.

What I was wondering, though, was who was this "Yitzhak Melechson, a statistics professor at the University of Tel Aviv"? This is such a standard problem, I'm surprised to find a statistics professor making this mistake. I was curious what his area of research is and where he was trained.

I started by googling Yitzhak Melechson but all I could find was this news story, over and over and over and over again. Then I found Tel Aviv University and navigated to its statistics department but couldn't find any Melechson in the faculty list. Next stop: entering Melechson in the search engine at the Tel Aviv University website. It came up blank.

One last try: I entered the Yitzhak Melechson into Google Scholar. Here's what came up:

Your search - Yitzhak Melechson - did not match any articles

Computing wrong probabilities for the lottery must be a full-time job! Get this guy on the Bible Code next.

P.S. If there's some part of this story that I'm missing, please let me know. How many statistics professors could there be in Tel Aviv, anyway? Perhaps there's some obvious explanation that's eluding me.

Why it can be rational to vote


I think I can best do my civic duty by running this one every Election Day, just like Art Buchwald on Thanksgiving. . . .

With a national election coming up, and with the publicity at its maximum, now is a good time to ask, is it rational for you to vote? And, by extension, was it worth your while to pay attention to whatever the candidates and party leaders have been saying for the year or so? With a chance of casting a decisive vote that is comparable to the chance of winning the lottery, what is the gain from being a good citizen and casting your vote?

The short answer is, quite a lot. First the bad news. With 100 million voters, your chance that your vote will be decisive--even if the national election is predicted to be reasonably close--is, at best, 1 in a million in a battleground district and much less in a noncompetitive district such as where I live. (The calculation is based on the chance that your district's vote will be exactly tied, along with the chance that your district's electoral vote is necessary for one party or the other to take control of a house of congress. Both these conditions are necessary for your vote to be decisive.) So voting doesn't seem like such a good investment.

But here's the good news. If your vote is decisive, it will make a difference for 300 million people. If you think your preferred candidate could bring the equivalent of a $50 improvement in the quality of life to the average American--not an implausible hope, given the size of the Federal budget and the impact of decisions in foreign policy, health, the courts, and other areas--you're now buying a $15 billion lottery ticket. With this payoff, a 1 in 10 million chance of being decisive isn't bad odds.
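The arithmetic in the paragraph above is worth spelling out, using the numbers from the text: a roughly 1 in 10 million chance of being decisive, a hypothesized $50 average benefit, and 300 million people affected (note that $50 times 300 million comes to $15 billion).

```python
p_decisive = 1e-7          # roughly 1 in 10 million, the battleground estimate
benefit_per_person = 50.0  # hypothesized average improvement, in dollars
population = 300e6         # people affected if your vote is decisive

total_benefit = benefit_per_person * population  # size of the "lottery prize"
expected_value = p_decisive * total_benefit      # expected social return per vote
print(total_benefit, expected_value)  # a $15 billion prize, about $1,500 expected
```

So even with a tiny probability of mattering, the expected social value per vote is on the order of a thousand dollars, which is the sense in which voting can be rational for voters who care about others.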

And many people do see it that way. Surveys show that voters choose based on who they think will do better for the country as a whole, rather than their personal betterment. Indeed, when it comes to voting, it is irrational to be selfish, but if you care how others are affected, it's a smart calculation to cast your ballot, because the returns to voting are so high for everyone if you are decisive. Voting and vote choice (including related actions such as the decision to gather information in order to make an informed vote) are rational in large elections only to the extent that voters are not selfish.

That's also the reason for contributing money to a candidate: Large contributions, or contributions to local elections, could conceivably be justified as providing access or the opportunity to directly influence policy. But small-dollar contributions to national elections, like voting, can be better motivated by the possibility of large social benefit than by any direct benefit to you. Such civically motivated behavior is consistent with both small and large anonymous contributions to charity.

The social benefit from voting also explains the declining response rates in opinion polls. In the 1950s, when mass opinion polling was rare, we would argue that it was more rational to respond to a survey than to vote in an election: for example, as one of 1000 respondents to a Gallup poll, there was a real chance that your response could noticeably affect the poll numbers (for example, changing a poll result from 49% to 50%). Nowadays, polls are so common that a telephone poll was done recently to estimate how often individuals are surveyed (the answer was about once per year). It is thus unlikely that a response to a single survey will have much impact.

So, yes, if you are in a district or state that might be close, it is rational to vote.

For further details, see our articles in Rationality and Society and The Economist's Voice.

I'd like to add one more thing. You've all heard about low voter turnout in America, but, among well-educated, older white people, turnout is around 90% in presidential elections. Some economists treat this as a source of amusement--and, sure, I'd be the first to admit that well-educated, older white people have done a lot of damage to this country--but it's a funny thing . . . Usually economists tend not to question the actions of this particular demographic. I'm not saying that the high turnout of these people (e.g., me) is evidence that voting is rational. But I would hope that it would cause some economists to think twice before characterizing voting as irrational or laughable.

(And, no, it's not true that "the closer an election is, the more likely that its outcome will be taken out of the voters' hands." See the appendix on the last page of this article for a full explanation, with calculus!)

I don't exactly disagree with the two arguments that I reproduce below, but I think they miss the point.

Is "the battle over elitism" really central to this election?

First, the easy one. Peter Baker in the New York Times, under the heading, "Elitism: The Charge That Obama Can't Shake":

For all the discussion of health care and spending and jobs, at the core of the nation's debate this fall has been the battle of elitism. . . . Ron Bonjean, a Republican strategist, said Mr. Obama had not connected with popular discontent. "A lot of people have never been to Washington or New York, and they feel people there are so out of touch," he said. . . . Rather than entertaining the possibility that the program they have pursued is genuinely and even legitimately unpopular, the White House and its allies have concluded that their political troubles amount to mainly a message and image problem.

I think this is misleading for the usual reason that these message-oriented critiques are misleading: When things are going well, your message is going to sound good; when things aren't going so well, it doesn't matter much how you spin things. Baker recognizes this: "the debate has taken on particular resonance in a time of economic distress." To put it another way, I disagree that the battle of elitism is "at the core of the nation's debate this fall." I think it would be more accurate to say that economic policy and outcomes are at the core of the debate, and "elitism" is just being attached to that debate. If it wasn't "elitism," it would be something else.

Is it really true that you can't blame the Democrats for their anticipated poor performance in the upcoming election?

Here's the second story I'm not thrilled with, this time from Jonathan Chait in the New Republic:

Republicans are going to gain a lot of seats in the midterm elections. The big question is why. Political punditry has been saturated with arguments interpreting this result as a verdict of sorts on the Obama administration. Liberals are blaming the incipient GOP win on poor communication or perhaps timid policies by the Democrats. Conservatives are interpreting it as the natural punishment for a party that moved too far left. . . .

But in order to have that conversation, you need to begin with a baseline expectation. What sort of performance should we expect normally? Clearly, in the current environment, it's not rational to expect the majority party to escape any losses whatsoever. If you want to blame the Democrats' loss on bad messaging or wimpy policies or rampaging socialism, then you need to establish how you'd expect them to do given normal messaging and policies.

Chait then discusses Doug Hibbs's model, featured on this blog before, which predicts midterm election outcomes given incumbency, the current distribution of House seats, and recent economic performance. Given that Hibbs's model--which does not use any polling data--predicts a Republican gain of 45 seats, Chait concludes that you can't blame Obama (or, by extension, congressional Democrats) for what's happening now. In Chait's words, "If you want to have the 'what did Obama do wrong' argument, you first need to establish what 'wrong' would look like. That's probably a 50-seat-or-more loss."

But I don't think that's right. From Paul Krugman on the left to Casey Mulligan on the right, commenters have been arguing that the governing party can make a difference. Whether it's Krugman recommending a larger stimulus or Mulligan saying the government should avoid intervention in the financial sector, the claim is that the economy could be doing better (or worse). In statistical jargon, recent personal income is "endogenous."

Or, to put it another way, I think it's perfectly reasonable for liberals to attribute some of the current state of the economy, and thus the predicted election outcome, to "timid policies by the Democrats." That's what Krugman is saying every day. Similarly, why shouldn't conservatives think that the current economic doldrums are partly explained by the Democrats' policies, from regulation to stimulus to health care? Republicans have been making this argument for a while, that these activist government policies are counterproductive.

I see where Chait is coming from, and I agree with him that it's silly to say that the Democrats are losing because of bad messaging or any such shallow thing, but I think he's too quick to dismiss the idea that different policies on the part of the Democrats could've led to better (or worse) economic outcomes.


Baker is following a traditional path in political journalism by focusing on what seem to me to be superficial issues--the kinds of things that voters sometimes tell pollsters but I think are ultimately driven by more fundamental concerns (the economy, security, etc).

Chait is making an opposite mistake. He's right that elections are largely determined by the fundamentals, but he's wrong to think that the governing party has no effect on the economy. (Or, maybe he's right, but if so, he should take this up with Krugman et al., not me.) To say that the fundamentals matter, that the economy is key, is not the same as saying that the president and Congress can't influence elections. The 2010 economy was not predetermined as of January, 2009. Or, if it was, a lot of people were wasting their time arguing about the macroeconomic consequences of the stimulus, the bailouts, the deficit, the tax cuts, etc etc etc.

Maria Wolters writes:

The parenting club Bounty, which distributes their packs through midwives, hospitals, and large UK supermarket and pharmacy chains, commissioned a fun little survey for Halloween from the company OnePoll. Theme: Mothers as tricksters - tricking men into fathering their babies. You can find a full smackdown courtesy of UK-based sex educator and University College London psychologist Petra Boynton here.

My talk at American University


Red State Blue State: How Will the U.S. Vote?

It's the "annual Halloween and pre-election extravaganza" of the Department of Mathematics and Statistics, and they suggested I could talk on the zombies paper (of course), but I thought the material on voting might be of more general interest.

The "How will the U.S. vote?" subtitle was not of my choosing, but I suppose I can add a few slides about the forthcoming election.

Fri 29 Oct 2010, 7pm in Ward I, in the basement of the Ward Circle building.

Should be fun. I haven't been to AU since taking a class there, over 30 years ago.

P.S. It was indeed fun. Here's the talk. I did end up briefly describing my zombie research but it didn't make it into any of the slides.

In the inbox today:

From Jimmy.

From Kieran.

The relevant references are here and, of course, here.

Shankar Vedantam writes:

Americans distrust the GOP. So why are they voting for it? . . . Gallup tells us that 71 percent of all Americans blame Republican policies for the bad economy, while only 48 percent blame the Obama administration. . . . while disapproval of congressional Democrats stands at 61 percent, disapproval of congressional Republicans stands at 67 percent.

[But] Republicans are heavily tipped to wrest control of one or both houses of Congress from the Democrats in the upcoming midterms.

Hey! I know the answer to that one. As I wrote in early September:

Subtle statistical issues to be debated on TV.


There is a live debate that will be available this week for those who might be interested. The topic: Can early stopped trials result in misleading results of systematic reviews?



How am I supposed to handle this sort of thing? (See below.) I just stuck it in one of my email folders without responding, but then I wondered . . . what's it all about? Is there some sort of Glengarry Glen Ross-like parallel world where the down-on-their-luck Jack Lemmons of the public relations world send out electronic cold calls? More than anything else, this sort of thing makes me glad I have a steady job.

Here's the (unsolicited) email, which came with the subject line "Please help a reporter do his job":

Dear Andrew,

As an Editor for the Bulldog Reporter, a media relations trade publication, my job is to help ensure that my readers have accurate info about you and send you the best quality pitches. By taking five minutes or less to answer my questions (pasted below), you'll receive targeted PR pitches from our client base that will match your beat and interests. Any help or direction is appreciated. Here are my questions.

We have you listed in our media database as : Andrew Gelman, Editor with Chance Magazine covering Gambling.

1. Which specific beats and topic areas do you cover?
2. What do the best PR people do to grab you, to get your attention and make you want to work with them?
3. On the other hand, what are some inappropriate pitches for your type of coverage (i.e., material that PR keeps sending you that you don't cover or pet peeves you may have about PR people)?
4. Can you briefly tell me about a PR pitch that resulted in a story? What was it about the pitch or PR pro that sparked your interest?

Thanks so much for helping me gather this information.


Jim Bucci
Research Journalist
Bulldog Reporter
124 Linden Street
Oakland, CA 94607

Bayes jumps the shark


John Goldin sends in this, from an interview with Alan Dershowitz:

After noticing these remarks on expensive textbooks and this comment on the company that bribes professors to use their books, Preston McAfee pointed me to this update (complete with a picture of some guy who keeps threatening to sue him but never gets around to it).

The story McAfee tells is sad but also hilarious. Especially the part about "smuck." It all looks like one more symptom of the imploding market for books. Prices for intro stat and econ books go up and up (even mediocre textbooks routinely cost $150), and the publishers put more and more effort into promotion.

McAfee adds:

I [McAfee] hope a publisher sues me about posting the articles I wrote. Even a takedown notice would be fun. I would be pretty happy to start posting about that, especially when some of them are charging $30 per article.

Ted Bergstrom and I used state Freedom of Information acts to extract the journal price deals at state university libraries. We have about 35 of them so far. Like textbooks, journals have gone totally out of control. Mostly I'm focused on journal prices rather than textbooks, although of course I contributed a free text. People report liking it and a few schools, including Harvard and NYU, used it, but it fizzled in the marketplace. I put it in to see if things like testbanks make a difference; their model is free online, cheap ($35) printed. The beauty of free online is it limits the sort of price increases your book experienced.

Here is a link to the FOIA work, which also has some discussion of the failed attempts to block us.

By the way, I had a spoof published in "Studies in Economic Analysis", a student-run journal that was purchased by Emerald Press. Emerald charges about $35 for reprints. I wrote them a take-down notice since SEA didn't bother with copyright forms so I still owned the copyright. They took it down but are not returning any money they collected on my article, pleading a lack of records. These guys are the schmucks of all schmucks.



Last night I spoke at the Columbia Club of New York, along with some of my political science colleagues, in a panel about politics, the economy, and the forthcoming election. The discussion was fine . . . until one guy in the audience accused us of bias based on what he imputed as our ethnicity. One of the panelists replied by asking the questioner what of all the things we had said was biased, and the questioner couldn't actually supply any examples.

It makes sense that the questioner couldn't come up with a single example of bias on our part, considering that we were actually presenting facts.

At some level, the questioner's imputation of our ethnicity and accusation of bias isn't so horrible. When talking with my friends, I engage in casual ethnic stereotyping all the time--hey, it's a free country!--and one can certainly make the statistical argument that you can guess people's ethnicities from their names, appearance, and speech patterns, and in turn you can infer a lot about people's political attitudes from their occupations, ethnicities, and so on. Still, I think it was a pretty rude comment and pretty pointless. How was he expecting us to respond? Maybe he thought we'd break down under the pressure and admit that we were all being programmed by our KGB handlers??

Then, later on, someone asked a truly racist question--a rant, really--that clearly had a close relation to his personal experiences even while having essentially zero connection to the real world as we understand it statistically.

I've seen the polls and I know that there are a lot of racists out there, of all stripes. Still, I don't encounter this sort of thing much in my everyday life, and it was a bit upsetting to see it in the flesh. Blog commenters come to life, as it were. (Not this blog, though!)

P.S. Yes, I realize that women and minorities have to deal with this all the time. This was the first time in my professional life that I've been accused of bias based on my (imputed) ethnicity, but I'm sure that if you're a member of a traditionally-disparaged group, it happens all over. So I'm not complaining, exactly, but it still upsets me a bit.

Somebody I know sent me a link to this news article by Martin Robbins describing a potential scientific breakthrough. I express some skepticism but in a vague enough way that, in the unlikely event that the research claim turns out to be correct, there's no paper trail showing that I was wrong. I have some comments on the graphs--the tables are horrible, no need to even discuss them!--and I'd prefer if the authors of the paper could display their data and model on a single graph. I realize that their results reached a standard level of statistical significance, but it's hard for me to interpret their claims until I see their estimates on some sort of direct real-world scale. In any case, though, I'm sure these researchers are working hard, and I wish them the best of luck in their future efforts to replicate their findings.

I'm sure they'll have no problem replicating, whether or not their claims are actually true. That's the way science works: Once you know what you're looking for, you'll find it!

what is = what "should be" ??


This hidden assumption is a biggie.

Brendan Nyhan gives the story.

Here's Sarah Palin's statement introducing the now-notorious phrase:

The America I know and love is not one in which my parents or my baby with Down Syndrome will have to stand in front of Obama's "death panel" so his bureaucrats can decide, based on a subjective judgment of their "level of productivity in society," whether they are worthy of health care.

And now Brendan:

Palin's language suggests that a "death panel" would determine whether individual patients receive care based on their "level of productivity in society." This was -- and remains -- false. Denying coverage at a system level for specific treatments or drugs is not equivalent to "decid[ing], based on a subjective judgment of their 'level of productivity in society.'"

Seems like an open-and-shut case to me. The "bureaucrats" (I think Palin is referring to "government employees") are making decisions based on studies of the drug's effectiveness:

I can't escape it


I received the following email:

Ms. No.: ***

Title: ***

Corresponding Author: ***

All Authors: ***

Dear Dr. Gelman,

Because of your expertise, I would like to ask your assistance in determining whether the above-mentioned manuscript is appropriate for publication in ***. The abstract is pasted below. . . .

My reply:

I would rather not review this article. I suggest ***, ***, and *** as reviewers.

I think it would be difficult for me to review the manuscript fairly.

I came across this blog by Jonathan Weinstein that illustrated, once again, some common confusion about ideas of utility and risk. Weinstein writes:

When economists talk about risk, we talk about uncertain monetary outcomes and an individual's "risk attitude" as represented by a utility function. The shape of the function determines how willing the individual is to accept risk. For instance, we ask students questions such as "How much would Bob pay to avoid a 10% chance of losing $10,000?" and this depends on Bob's utility function.

This is (a) completely wrong, and (b) known to be completely wrong. To be clear: what's wrong here is not that economists talk this way. What's wrong is the identification of risk aversion with a utility function for money. (See this paper from 1998 or a more formal argument from Yitzhak in a paper from 2000.)

It's frustrating. Everybody knows that it's wrong to associate a question such as "How much would Bob pay to avoid a 10% chance of losing $10,000?" with a utility function, yet people do it anyway. It's not Jonathan Weinstein's fault--he's just calling this the "textbook definition"--but I guess it is the fault of the people who write the textbooks.

P.S. Yes, yes, I know that I've posted on this before. It's just sooooooo frustrating that I'm compelled to write about it again. Unlike some formerly recurring topics on this blog, I don't associate this fallacy with any intellectual dishonesty. I think it's just an area of confusion. The appealing but wrong equation of risk aversion with nonlinear utility functions is a weed that's grown roots so deep that no amount of cutting and pulling will kill it.

P.P.S. To elaborate slightly: The equation of risk aversion with nonlinear utility is empirically wrong (people are much more risk averse for small sums than could possibly make sense under the utility model) and conceptually wrong (risk aversion is an attitude about process rather than outcome).
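The small-stakes point can be seen in a few lines. Assuming (hypothetically) that Bob has log utility and $50,000 in wealth--both numbers invented for this sketch; only the gamble itself comes from Weinstein's example--his willingness to pay to avoid the gamble barely exceeds its expected loss:

```python
import math

wealth = 50_000.0         # hypothetical wealth, assumed for this sketch
loss, p = 10_000.0, 0.10  # the gamble from Weinstein's example

# Certainty equivalent under log utility: solve
#   log(wealth - wtp) = (1 - p) * log(wealth) + p * log(wealth - loss)
eu = (1 - p) * math.log(wealth) + p * math.log(wealth - loss)
wtp = wealth - math.exp(eu)   # willingness to pay to avoid the gamble
premium = wtp - p * loss      # excess over the expected loss of $1,000
```

The risk premium comes out to roughly a hundred dollars on top of the $1,000 expected loss--nearly risk-neutral. Any smooth utility curve gives answers of this magnitude at stakes that are small relative to wealth, which is nowhere near the risk aversion people actually exhibit for small gambles.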

P.P.P.S. I'll have to write something more formal about this some time . . . in the meantime, let me echo the point made by many others that the whole idea of a "utility function for money" is fundamentally in conflict with the classical axiom of decision theory that preferences should depend only on outcomes, not on intermediate steps. Money's value is not in itself but rather in what it can do for you, and in the classical theory, utilities would be assigned to the ultimate outcomes. (But even if you accept the idea of a "utility of money" as some sort of convenient shorthand, you still can't associate it with attitudes about risky gambles, for the reasons discussed by Yitzhak and myself and which are utterly obvious if you ever try to teach the subject.)

P.P.P.P.S. Yes, I recognize the counterargument: that if this idea is really so bad and yet remains so popular, it must have some countervailing advantages. Maybe so. But I don't see it. It seems perfectly possible to believe in supply and demand, opportunity cost, incentives, externalities, marginal cost and benefits, and all the rest of the package--without building it upon the idea of a utility function that doesn't exist. To put it another way, the house stands up just fine without the foundations. To the extent that the foundations hold up at all, I suspect they're being supported by the house.

Inbox zero. Really.


Just in time for the new semester:


This time I'm sticking with the plan:

1. Don't open a message until I'm ready to deal with it.
2. Don't store anything--anything--in the inbox.
3. Put to-do items in the (physical) bookje rather than the (computer) "desktop."
4. Never read email before 4pm. (This is the one rule I have been following.)
5. Only one email session per day. (I'll have to see how this one works.)

Masanao sends this one in, under the heading, "another incident of misunderstood p-value":

Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students. Warren has just released a Psychology Study Guide, which covers information on statistics, research methods and study skills for psychology students.
Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you're testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke - but still, there is always a certain probability that it was.

Statistical significance testing gives you an idea of what this probability is.

In science we're always testing hypotheses. We never conduct a study to 'see what happens', because there's always at least one way to make any useless set of data look important. We take a risk; we put our idea on the line and expose it to potential refutation. Therefore, all statistical tests in psychology test the possibility that the hypothesis is correct, versus the possibility that it isn't.

I like the BPS Research Digest, but one more item like this and I'll have to take them off the blogroll. This is ridiculous! I don't blame Warren Davies--it's all too common for someone teaching statistics to (a) make a mistake and (b) not realize it. But I do blame the editors of the website for getting a non-expert to emit wrong information. One thing that any research psychologist should know is that statistics is tricky. I hate to see this sort of mistake (saying that statistical significance is a measure of the probability that the null hypothesis is true) being given the official endorsement of the British Psychological Society.

P.S. To any confused readers out there: The p-value is the probability of seeing something as extreme as the data or more so, if the null hypothesis were true. In social science (and I think in psychology as well), the null hypothesis is almost certainly false, false, false, and you don't need a p-value to tell you this. The p-value tells you the extent to which a certain aspect of your data is consistent with the null hypothesis. A lack of rejection doesn't tell you that the null hyp is likely true; rather, it tells you that you don't have enough data to reject the null hyp. For more more more on this, see for example this paper with David Weakliem which was written for a nontechnical audience.
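For readers who like to see that definition in action, here's a minimal simulation sketch (all numbers made up for illustration): the p-value is just the long-run fraction of null-hypothesis datasets whose summary statistic is at least as extreme as the one observed.

```python
import random
import statistics

random.seed(7)  # reproducibility for this sketch

def sim_pvalue(observed_mean, n, sims=5000):
    """Monte Carlo p-value: the fraction of datasets simulated under
    the null hypothesis (mean 0, sd 1) whose sample mean is at least
    as extreme as the observed one."""
    hits = 0
    for _ in range(sims):
        null_mean = statistics.fmean(random.gauss(0, 1) for _ in range(n))
        if abs(null_mean) >= abs(observed_mean):
            hits += 1
    return hits / sims

# A true (but small) effect of 0.2 sd, measured with only n = 20 points:
sample = [random.gauss(0.2, 1) for _ in range(20)]
p = sim_pvalue(statistics.fmean(sample), n=20)
```

Note that a large p here would not mean the null is true--in this sketch it's false by construction--it would only mean that 20 observations can't rule it out.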

P.P.S. This "zombies" category is really coming in handy, huh?

Note to semi-spammers


I just deleted another comment that seemed reasonable but was attached to an advertisement.

Here's a note to all of you advertisers out there: If you want to leave a comment on this site, please do so without the link to your website on search engine optimization or whatever. Or else it will get deleted. Which means you were wasting your time in writing the comment.

I want your comments and I don't want you to waste your time. So please just stop already with the links, and we'll both be happier.

P.S. Don't worry, you're still not as bad as the journal Nature (see the P.S. here).

Fake newspaper headlines


I used this convenient site to create some images for a talk I'm preparing. (The competing headlines: "Beautiful parents have more daughters" vs. "No compelling evidence that beautiful parents are more or less likely to have daughters." The latter gets cut off at "No compelling evidence that," which actually works pretty well to demonstrate the sort of dull headline that would result if newspapers were to publish null results.)

One can quibble about the best way to display county-level unemployment data on a map, since a small, populous county gets much less visual weight than a large, sparsely populated one. Even so, I think we can agree that this animated map by LaToya Egwuekwe is pretty cool. It says it shows the unemployment rate by county, as a function of time, but anyone with even the slightest knowledge of what happens during a zombie attack will recognize it for what it is.

I was reading this article by Ariel Levy in the New Yorker and noticed something suspicious. Levy was writing about an event in 1979 and then continued:

One year later, Ronald Reagan won the Presidency, with overwhelming support from evangelicals. The evangelical vote has been a serious consideration in every election since.

From Chapter 6 of Red State, Blue State:


According to the National Election Study, Reagan did quite a bit worse, relative to Carter, among evangelical Protestants than among voters as a whole--no surprise, really, given that Reagan was not particularly religious and Carter was an evangelical himself.

It was 1992, not 1980, when evangelicals really started to vote Republican.

What's it all about?

I wouldn't really blame Ariel Levy for this mistake; a glance at her website reveals a lot of experience as a writer and culture reporter but not much on statistics or politics. That's fine by me: there's a reason I subscribe to the New Yorker and not the American Political Science Review!

On the other hand, I do think that the numbers are important, and I worry about misconceptions of American politics--for example, the idea that Reagan won "overwhelming support from evangelicals." A big reason we wrote Red State, Blue State was to show people how all sorts of things they "knew" about politics were actually false.

Perhaps the New Yorker and other similar publications should hire a statistical fact checker or copy editor? Maybe this is the worst time to suggest such a thing, with the collapsing economics of journalism and all that. Still, I think the New Yorker could hire someone at a reasonable rate who could fact check their articles. This would free up their writers to focus on the storytelling that they are good at without having to worry about getting the numbers wrong.

Another option would be to write a letter to the editor, but I don't think the New Yorker publishes graphs.

P.S. I've written before about the need for statistical copy editors (see also here, here, and, of course, the notorious "But viewed in retrospect, it is clear that it has been quite predictable").

P.P.S. I think one of my collaborators made this graph, maybe by combining the National Election Study questions on religious denomination and whether the respondent describes him/herself as born again.

P.P.P.S. Somebody pointed out that Reagan did do well among white evangelicals, so maybe that's what Levy was talking about.

