December 2009 Archives

End-of-the-Year Altruists


I've been picking on the Freakonomics blog a lot recently, while occasionally adding the qualifier that in general it's great. What you see here is the result of selection bias: when the Freakonomics blog has material of its usual high quality, I don't have much to add, and when there's material of more questionable value, I notice and sometimes comment on it.

In this end-of-the-year spirit, though, I'd like to point to this entry by Stephen Dubner on altruism, which to my mind captures many of the Freakonomics strengths: it's an engaging, topical, and thought-provoking article on an important topic, and it discusses some current research in economics.

And this is as good a way as any for me to end another year of blogging.



My colleague Macartan Humphreys recently came out with a book, Coethnicity (written with James Habyarimana, Daniel Posner, and Jeremy Weinstein), which addresses the question of why public services and civic cooperation tend to be worse in areas with more ethnic variation. To put it another way: people in homogeneous areas work well together, whereas in areas of ethnic diversity, there's a lot less cooperation.

I'll give my comments, then at the end I'll post a response from Macartan.

From one perspective, this one falls into the "duh" category. Of course, we cooperate with people who are more like us! But it's not so simple. Macartan and his colleagues discuss and discard a number of reasonable-sounding explanations before getting to their conclusion, which is that people of the same ethnic group are more able to enforce reciprocity and thus are more motivated to cooperate with each other.

But, looking at it another way, I wonder whether it's actually true that people in homogeneous societies cooperate more. I think of the U.S. as pretty ethnically diverse, compared to a lot of much more disorganized places. One question is what counts as ethnicity. Fifty or a hundred years ago in the U.S., I think we'd be talking about Irish, English, Italians, etc., as different ethnic groups, but now they'd pretty much all count as white. To what extent is noncooperation not just the product of ethnic diversity but also a contributor to its continuation?

Macartan and his collaborators address some of these issues in their concluding chapter, and I'm sure there's a lot more about this in the literature. This is an area of political science that I know almost nothing about. When researchers such as myself write a book on American politics, we don't have to explain much--our readers are already familiar with the key ideas. Comparative politics, though, is a mystery to the general reader such as myself.

I should say something about the methods used by Macartan and his collaborators. They went to a city in Uganda, told people about their study, and performed little psychology/economics experiments on a bunch of volunteers. Each experiment involved some task or choice involving cooperation or the distribution of resources, and they examined the results by comparing people, and pairs of people, by ethnicity, to see where and how people of the same or different ethnic groups worked together in different ways.

One thing that was cool about this study, and which reminded me of research I've seen in experimental psychology, was that they did lots of little experiments to tie up loose ends and to address possible loopholes. Just for example, see the discussion on pages 137-139 of how they rule out the possibility that their findings could be explained by collusion among study participants.

I was also thinking about the implications of their findings for U.S. politics. (Macartan has told me that he doesn't understand how there can be a whole subfield of political scientists specializing in American politics, but that he'll accept "Americanists" by thinking of us as comparative politics scholars who happen to be extremely limited in what we study.) The authors allude to research by Robert Putnam and others comparing civic behavior in U.S. communities of varying ethnic homogeneity, but I also wonder about public opinion at the national level: not just local cooperation but also the extent to which people feel that "we're all in this together" and the extent to which people evaluate policies and candidates based on how they affect their ethnic group (however defined). I'm also interested in the sometimes-vague links between ethnicity and geography, for example the idea that being a Southerner (in the U.S.) or a Northerner (in England) seems like an ethnic identity. Even within a city, different neighborhoods have different identities.

If I haven't made the point clear enough already, I think the book is fascinating, and it looks like it will open the door to all sorts of interesting new work as well.

Complicated categories


From a letter by Caroline Williamson of Brunswick, Australia, in the London Review of Books:

Ange Mlinko repeats the rumour that Barbara Guest married an English lord (LRB, 3 December 2009). She married Stephen Haden-Guest in 1948; he was the son of the Labour MP Leslie Haden-Guest, who was made a political peer in 1950. Stephen Haden-Guest inherited the title in 1960, six years after the couple divorced.

As an American, I'm eternally amused by this sort of thing. I just love it that people out there care whether someone is a lord, or a knight, or whatever. It reminds me of the rule that the wife of a king is a queen, but the husband of a queen is not necessarily a king.

P.S. Yes, I know that Americans are silly in other ways. I grew up 2 blocks away from a McDonald's! I'm not saying that we're better than people from other countries, just that this particular thing amuses me.

Reuters 1, New York Times 0




(See here for background.)

I recently blogged on the following ridiculous (to me) quote from economist Gary Becker:

According to the economic approach, therefore, most (if not all!) deaths are to some extent "suicides" in the sense that they could have been postponed if more resources had been invested in prolonging life.

In my first entry I dealt with Becker's idea pretty quickly and with a bit of mockery ("Sure, 'counterintuitive' is fine, but this seems to be going off the deep end . . ."), and my commenters had no problem with it. But then I updated with a more elaborate argument and discussion of how Becker could've ended up making such a silly-seeming (to me) statement, and the commenters here and here just blasted me. I haven't had such a negative reaction from my blog readers since I made the mistake of saying that PC's are better than Macs.

This got me thinking that sometimes a quick reaction is better than a more carefully thought-out analysis. But I also thought I'd take one more shot at explaining my reasoning and, more importantly, understanding where I might have gone astray. After all, if I can barely convince half the commenters at the sympathetic venue of my own blog, I must be doing something wrong!

Yesterday I posted this graph, a parallel-coordinates plot showing health care spending and life expectancy in a sample of countries:


I remarked that a scatterplot should be better. Commenter Freddy posted a link to the data--you guys are the best blog commenters in the world!--so, just for laffs, I spent a few minutes making a scatterplot containing all the same information. Here it is. (Clicking on any of the graphs gives a larger version.)


(I was able to make the circles gray thanks to the commenters here.)

How do the two graphs compare? There are some ways in which the first graph is better, but I think these have to do with that graph being made by a professional graphic designer--at least, I assume he's a professional; in any case, he's better at this than I am! He also commented that he removed a few countries from the plot to make it less cluttered. Here's what happens if I take them out too:


(Unlike the National Geographic person, I kept in Turkey. It didn't seem right to remove a point that was on the edge of the graph. I also kept in Norway, which was the highest-spending country on the graph, outside the U.S. And I took out Sweden and Finland--sorry, Jouni!--because they overlapped, too. Really, I prefer jittering rather than removing as a solution to overlap, but here I'll go with what was already done in this example.)

What the scatterplot really made me realize was the arbitrariness of the scaling of the parallel coordinate plot. In particular, the posted graph gives a sense of convergence, that spending is all over the map but all countries have pretty much the same life expectancy--look at the way the lines converge to a narrow zone as you follow the lines from the left to the right of the plot.

Actually, though, once you remove the U.S., there's a strong correlation between spending and life expectancy, and this is super-clear from the scatterplot.

The only other consideration is novelty. The scatterplot is great, but it looks like lots of other graphs we've all seen. This is a plus--familiar graphical forms are easier to read--but also a minus, in that it probably looks "boring" to many readers. The parallel-coordinate plot isn't really the right choice for the goal of conveying information, but it's new and exciting, and that's maybe why one of the commenters at the National Geographic site hailed it as "a masterpiece of succinct communication." Recall our occasional discussions here on winners of visualization contests. The goal is not just to display information, it's also to grab the eye. Ultimately, I think the solution is to do both--in this case, to make a scatterplot in some pretty, eye-catching way.

P.S. I never know how much to trust these purchasing-power-adjusted numbers. Recall our discussion of Russia's GDP.

P.P.S. And here's the R code. Yes, I know it could be cleaner, but I just thought some of the beginners out there might find it helpful:
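A minimal sketch of this kind of scatterplot in base R (the spending and life-expectancy numbers below are made-up placeholders for illustration, not the actual data from the graphs):

```r
# Sketch: health spending vs. life expectancy as a scatterplot with gray
# points and country labels. The data here are invented for illustration.
spending <- c(2580, 3600, 2990, 7290)   # per-capita spending (fake numbers)
life.exp <- c(82.0, 81.8, 80.5, 78.1)   # life expectancy (fake numbers)
country  <- c("Japan", "Switzerland", "UK", "USA")
plot(spending, life.exp, type="n", bty="l",
     xlab="Health care spending per person (PPP US$)",
     ylab="Life expectancy (years)")
points(spending, life.exp, pch=20, col="gray40")
text(spending, life.exp, country, pos=3, cex=0.8, col="gray30")
```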

How to make colored circles in R?


In R, I can plot circles at specified sizes using the symbols() function, but for some reason it won't allow me to do it in color. For example, try this:

symbols(0, 0, circles=1, col="red")

It makes a black circle, just as if the "col" argument had never been specified. What's wrong?

P.S. I could just write my own function to draw circles, but that would be cheating. . . .
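(A possible resolution, sketched here from my reading of the symbols() help page: the function takes fg= for the border color and bg= for the fill, rather than the usual col= argument.)

```r
# Possible fix: symbols() uses fg= (border) and bg= (fill), not col=.
symbols(0, 0, circles=1, inches=0.5, fg="red", bg="pink")
```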

Ben Hyde and Aleks both sent me this:


The graph isn't as bad as all that, but, yes, a scatterplot would make a lot more sense than a parallel coordinate plot in this case. Also, I don't know how they picked which countries to include. In particular, I'm curious about Taiwan. We visited there once and were involved in a small accident. We were very impressed by the simplicity and efficiency of their health care system. France's system is great too, but everybody knows that.

Ekkehart Schlicht points to this article by Bruno Frey suggesting a change in journal review processes, so that the editorial board first decides whether to accept or reject a paper and then referees are brought in solely to suggest changes on accepted papers. Frey's paper was published in 2003 and, according to Google Scholar, has been cited about 100 times, but I don't know what effects it's had. One journal I know with something close to Frey's system is Statistica Sinica, which screens all submissions through the editorial board before sending out to reviewers. Another is Economic Inquiry, which accepts or rejects your paper as is, without going through a painful revision process. On the downside, Economic Inquiry charged $75 to submit an article, which was kind of irritating. Statistics journals don't do that.

Frey's article is thoughtful and entertaining but does not mention what seems to me to be the biggest advantage of his proposal, which is that it offers a huge reduction in the amount of labor put in by the referees! Frey quotes journal rejection rates of 95%. It would be a lot easier to get these referee reports if there were only 1/20th as many to chase down. When I write a referee report it usually takes about 15 minutes, but other people put more effort into each review, and it's inefficient to waste their time.

I also don't think Frey makes enough of the fact that editing and reviewing journal articles is volunteer work. Sure, there's some prestige involved in editing a journal, and it also gives you some chance to influence the direction of the field, but my impression is that these payoffs are low compared to the cost. (Rather than edit a journal, I've chosen to edit a magazine--that is, to blog--which is similar in many ways but gives me the freedom to focus on the topics that interest me rather than on whatever happens to be submitted. Regular readers know that I often do react to "submissions"--that is, things that people email me--but I don't have to.)

Beyond this, I agree with Frey's general point that, when you write a book, you're writing for the reader, whereas when you write a journal article, you're writing for the referees. Econ journals are particularly bad, in my experience. It's just a different style. In statistics or political science, someone might publish 5 or 10 major papers in a year. In economics (maybe also in psychology?), people work over and over again on a single paper, trying for that elusive "home run." I don't know that either approach is better, but I find it difficult to switch from one to the other.

Frey also points out that as researchers get older, they're less inclined to spend the time on the referee process, instead writing books or publishing in less-demanding journals or simply placing their articles on the web. Schlicht recommends something called RePEc, and Christian posts papers on arXiv, something that I've found to be a pain in the ass because of the requirement that the article be in LaTeX. I certainly don't plan to submit many articles to econ journals unless I have an economist collaborator who feels like dealing with the review process.

This is all very important to us because we work hard and, having done the work, we'd like others to follow our lead. It's so frustrating to figure something out but then not be able to communicate our findings to others who might be interested.

P.S. Here's Frey's decision tree (which he calls the Journal Publication Game):

Who owns sparklines?


This is pretty funny (in a horrible sort of way).

My Tiger Woods post


I was just thinking about how everyone's buggin Tiger about his stuff on the side, but nobody cared that the Beatles were doing all the same things (well, not the text messaging, I guess) with groupies. The Beatles are the rock-star equivalent of Tiger, right? A long sequence of #1's, disciplined about work, and so on?

Then again, Lennon and McCartney didn't do ads for AT&T, Gillette, Nike, Accenture (huh? what's that, anyway?), Gatorade, or TLC Laser Eye Centers (or any eye centers, as far as I know). Maybe the standards are higher for people in advertising?

Felix Salmon mocks the above-linked study which claims evidence that Tiger Woods's scandal hurt his sponsors financially, but what I really don't understand is how it can make sense for these companies to be paying a golfer to endorse their products. I mean, Golf Digest, sure, but the others? I'm gonna buy somebody's razor because they paid a million dollars to some dude who can putt? I mean, sure, I understand the reasoning, sort of: Tiger gets attention, you see his face on TV and you whip around to see what the ad is about. If you're a 30 billion dollar company, it can be worth spending $20 million if you think it will increase profits by 0.067%. But it still seems a bit weird to me. At the level of individual decisions, it makes some sense, but if you step back a bit, it's just bizarre.

P.S. The Freakonomics blog links to a Yahoo News report of the study claiming that Tiger Woods's sponsors lost money, but without linking to Felix Salmon's demolition job. I assume that I'm not the only Freakonomics reader who reads Salmon, so maybe someone will point this out in the comments there.

Taxation curves and poverty traps


Dan Lakeland has been thinking about taxation curves and the poverty trap.

Is this true? I usually bill by the hour, but I have to say that there's always some awkwardness about this aspect of consulting. Compared to the typical hourly rates charged by statistical consultants, my impression is that I charge more but bill for far fewer hours--partly because I do consulting as an extra, not as my main job, so I'm typically trying to keep the hours limited.

Maybe a fixed charge would be better, but the trouble is that it's not always clear to me what exactly is needed. Or maybe there should be two stages: first a fixed charge where the product is an assessment of the problem, then another fixed charge for the main projects. Or maybe charge an hourly rate for the little problems and a fixed rate for the big ones. It's something to think about. It would be great to get enough money from consulting to really support some of my research efforts.

I recently reviewed Bryan Caplan's book, The Myth of the Rational Voter, for the journal Political Psychology. I wish I thought this book was all wrong, because then I could've titled my review "The Myth of the Myth of the Rational Voter." But, no, I saw a lot of truth in Caplan's arguments. Here's what I wrote:

Bryan Caplan's The Myth of the Rational Voter was originally titled "The logic of collective belief: the political economy of voter irrationality," and its basic argument goes as follows:

(1) It is rational for people to vote and to make their preferences based on their views of what is best for the country as a whole, not necessarily what they think will be best for themselves individually.

(2) The feedback between voting, policy, and economic outcomes is weak enough that there is no reason to suppose that voters will be motivated to have "correct" views on the economy (in the sense of agreeing with the economics profession).

(3) As a result, democracy can lead to suboptimal outcomes--foolish policies resulting from foolish preferences of voters.

(4) In comparison, people have more motivation to be rational in their economic decisions (when acting as consumers, producers, employers, etc.). Thus it would be better to reduce the role of democracy and increase the role of the market in economic decision-making.

Caplan says a lot of things that make sense and puts them together in an interesting way. Poorly informed voters are a big problem in democracy, and Caplan makes the compelling argument that this is not necessarily a problem that can be easily fixed--it may be fundamental to the system. His argument differs from that of Samuel Huntington and others who claimed in the 1970s that democracy was failing because there was too much political participation. As I recall, the "too much democracy" theorists of the 1970s saw a problem with expectations: basically, there is just no way for "City Hall" to be accountable to everyone, thus they preferred limiting things to a more manageable population of elites. Caplan thinks that voting itself (not just more elaborate demands for governmental attention) is the problem.

Bounding the arguments

I have a bunch of specific comments on the book but first want to bound its arguments a bit.

The Death of the Blog Post?


Aleks sent me this ugly thing. Is it a joke? Or perhaps a sad reflection that people prefer production values over substance?

While putting together a chapter on inference from simulations and monitoring convergence (for a forthcoming Handbook of Markov Chain Monte Carlo; more on that another day), I came across this cool article from 2003 by Jarkko Venna, Samuel Kaski, and Jaakko Peltonen, who show how tools from multivariate discriminant analysis can be used to make displays of MCMC convergence that are much more informative than what we're used to. There's also an updated article from 2009 by Venna with Jaakko Peltonen and Samuel Kaski.

After a brief introduction, Venna et al. set up the problem:

It is common practice to complement the convergence measures by visualizations of the MCMC chains. Visualizations are useful especially when analyzing reasons of convergence problems. Convergence measures can only tell that the simulations did not converge, not why they did not. MCMC chains have traditionally been visualized in three ways. Each variable in the chain can be plotted as a separate time series, or alternatively the marginal distributions can be visualized as histograms. The third option is a scatter or contour plot of two parameters at a time, possibly showing the trajectory of the chain on the projection. The obvious problem with these visualizations is that they do not scale up to large models with lots of parameters. The number of displays would be large, and it would be hard to grasp the underlying high-dimensional relationships of the chains based on the component-wise displays.

Some new methods have been suggested. For three dimensional distributions advanced computer graphics methods can be used to visualize the shape of the distribution. Alternatively, if the outputs of the models can be visualized in an intuitive way, the chain can be visualized by animating the outputs of models corresponding to successive MCMC samples. These visualizations are, however, applicable only to special models.

This seems like an accurate summary to me. If visualizations for MCMC have changed much since 2003, the news certainly hasn't reached me. I'd only add a slight modification to point out that with high resolution and small multiples, we can plot dozens of trace plots on the screen at once, rather than the three or four that have become standard (because that's what Bugs does).
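As a sketch of what I mean (with random draws standing in for real MCMC output):

```r
# Sketch: two dozen trace plots as small multiples on one screen.
# The "draws" here are random numbers standing in for real MCMC output.
set.seed(1)
sims <- matrix(rnorm(1000 * 24), nrow=1000, ncol=24)
par(mfrow=c(4, 6), mar=c(1, 1, 2, 1))
for (j in 1:24) {
  plot(sims[, j], type="l", main=paste0("theta[", j, "]"),
       xaxt="n", yaxt="n", xlab="", ylab="")
}
```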

In any case, it's a problem trying to see everything at once in a high-dimensional model. Venna et al. propose to use discriminant analysis on the multiple chains to identify directions in which there is poor mixing, and then display the simulations on this transformed scale.
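My reading of their idea can be sketched in a few lines using MASS::lda, with fake chains standing in for real simulations (each "chain" below gets its own random center, so the discriminant projection has something to find):

```r
# Sketch of the Venna et al. idea: label each draw by its chain and use
# linear discriminant analysis to find directions where mixing is poorest.
library(MASS)
set.seed(1)
n.chains <- 10; n.iter <- 200; d <- 5
draws <- do.call(rbind, lapply(1:n.chains, function(k) {
  center <- rnorm(d, sd=0.5)   # fake between-chain differences
  matrix(rnorm(n.iter * d), n.iter, d) + rep(center, each=n.iter)
}))
chain <- rep(1:n.chains, each=n.iter)
fit <- lda(draws, grouping=chain)
proj <- draws %*% fit$scaling[, 1:2]   # first two discriminant directions
plot(proj, col=chain, pch=20, cex=0.3, xlab="LD1", ylab="LD2")
```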

Here's an example, a two-dimensional linear discriminant analysis projection of 10 chains simulated from a hierarchical mixture model:


And here's another plot, this time showing the behavior of the chains near convergence, using discriminative component analysis:


The next step, once these patterns are identified, would be to go back to the original parameters in the models and try to understand what's happening inside the chains.

Venna et al. have come up with what seems like a great idea, and it looks like it could be implemented automatically in Bugs etc. The method is simple and natural enough that probably other people have done it too, but I've never seen it before.

P.S. I wish these people had sent me a copy of their paper years ago so I didn't have to wait so long to discover it.

Garbage Time


Phil amusingly introduces the basketball term "garbage time" to refer to the point where a discussion thread reduces to back-and-forth arguments without much hope of further progress.

That got me thinking how garbage time starts at different points on different blogs. Here at Statistical Modeling, it doesn't happen much at all. The Freakonomics blog often has high-quality comments (as I discussed here), but there are so many of them that there's not really a chance for progress to be made in the comment threads. The posts at 538 get lots of comments, but garbage time there usually starts around comment #2 or so.

Alex Tabarrok posts this beautiful graph that was prepared for the ultimate in bureaucratic institutions:


Learn to program!


I often run into people who'd like to learn how to program but don't know where to start. Over the past few years, a number of interactive tutorial systems have emerged, in which a student is walked through basic examples and syntax.

  • Try Ruby! will teach you Ruby, a Python-like language that's extremely powerful when it comes to preprocessing data.
  • WeScheme will teach you Scheme, a Lisp-like language that makes writing interpreters for variations of Scheme very easy.
  • Lists by Andrew Plotkin is a computer game that requires you to be able to program in Lisp. Lisp is the second-oldest programming language (after Fortran), but Ruby and Python do most of what Lisp has traditionally been useful for.

Maybe there will be a similar tool for R someday!

Thanks to Edward for the pointers!

Linking the unlinkable


Two of the bloggers I find the most entertaining and thought-provoking are Phil Nugent and Steve Sailer. I don't know that they would agree with each other on anything, but they do have one thing in common, which is that they like to review movies. Anyway, each of them has a super-long blogroll, and what I'm wondering is: what's the shortest set of links that will take you from Nugent to Sailer (or vice versa)? It has to be a series of links going from one to the other--i.e., it's not enough that both link to the same page (Arts & Letters Daily, in case you're wondering).

I'm hoping that a long long chain is needed--it's too much to hope that you just "can't get there from here," but I'm pessimistically guessing that, the Internet being what it is, you can get there in two links.

P.S. I wasted a few more minutes and found that Nugent links to 2 Blowhards, who links to Sailer. So that's it. A bit of a letdown, but I guess inevitable given the huge number of links on these guys' pages. I'm hoping that Nugent will hear about this and eliminate his 2 Blowhards link, thus making my linking question more interesting. And, believe me, 2 Blowhards is not nearly as interesting as Nugent or Sailer.

Gueorgi Kossinets writes:

We have an opening in the Content Ads Quality group at Google (Mountain View).

Interested candidates can email their resume directly to me, which should speed up the process. I will also be happy to talk informally about the type of work we do, Google culture, etc.

Update on estimates of war deaths


I posted a couple days ago on a controversy over methods of counting war deaths. This is not an area I know much about, and fortunately some actual experts (in addition to Mike Spagat, who got the ball rolling) wrote in to comment.

Their comments are actually better than my original discussion, and so I'm reposting them here:

Bethany Lacina writes:

I didn't work on the Spagat et al. piece, but I'm behind the original battle deaths data. Your readers might be interested in the "Documentation of Coding Decisions" available at the Battle Deaths Database website. The complete definition of "battle deaths"--admittedly a tricky concept--starts on page 5. The discussion of Guatemala starts on page 219.

The goal of the Documentation is to preserve all the sources we used and the logic of how we interpreted them. If you or any of your readers know of sources we haven't consulted, for any conflict, it would be terrific to hear about them:

Amelia Hoover writes:

This debate is missing a key part -- namely, any sort of awareness that there are estimation methods out there that improve on both surveys (usual stalking horse of Spagat, et al.) and convenience data such as press reports (usual stalking horse of many other people).

Spagat et al. are more or less correct about all of the many, many problems with survey data. They're right to criticize OMG (OMG!). But this isn't, or at any rate shouldn't be, a debate between survey and convenience methods.

The authors dismiss (at page 936; again in footnote 2) estimation techniques other than retrospective mortality surveys and "collation of other reports". But while it's true that demographers often (usually? Help me out here, demographers) use retrospective survey data in their analyses, there's also a long-standing literature that uses census data instead, matching across sources in order to model (a) patterns of inclusion in convenience sources and (b) the number of uncounted cases. This method accurately counts deer, rabbits, residents of the United States, children with various genetic disorders, and HIV patients in Rome (to name a few examples I can think of) -- and, yes, also conflict-related deaths.

Bethany Lacina's link to the PRIO documentation is really interesting on this point. For El Salvador, the case with which I'm most familiar, PRIO's best estimate is 75,000 total deaths -- 55,000 battle deaths and 20,000 "one sided" deaths. I think this is reasonable-ish (maybe the total is between 50,000 and 100,000?), but there's no actual evidence to support such a number. The sources PRIO cites are expert guesses, rather than statistical analyses of any sort.

PRIO's El Salvador estimates are based on *neither* documented/documentable convenience data (e.g., press reports, NGO reports) *nor* survey data. The United Nations-sponsored Truth Commission for El Salvador's list of documented (and partially documented) deaths includes about 14,000 total deaths, many of which are duplicates. Two other NGO databases include about 6,000 and about 1,500 deaths, respectively. Again, there's significant overlap and many duplicates. Yet no one imagines that the total deaths in this conflict were 21,500. In the Salvadoran case as in many others, inclusion in the data is incredibly biased toward urban, educated, and politically active victims. (They're also biased in any number of other ways, of course.)

Prof. Gelman is right to point out the discrepancy between the Guatemala survey numbers, the Guatemala convenience (PRIO) numbers, and the number that most people cite as the best approximation for Guatemala (200,000). Importantly, that "200,000" is based in large part on census numbers. (See the statistical analyses from the Commission for Historical Clarification, Guatemala's Truth Commission.) So why ignore census correction methods?

Given that discrepancies between survey and convenience data are very often dwarfed by discrepancies between those numbers and the numbers we believe to be correct, I worry that the surveys-versus-convenience-data fight is more about protecting academic projects and prerogatives than about actually finding the correct answer.

Romesh Silva writes:

The claim that demographers often/usually use retrospective mortality surveys in their analyses is a bit off the mark. It looks like it is borne out of some confusion in parts of the academy between the methods of demographers and those of epidemiologists... Broadly speaking, demographers use a wide array of sources, including population censuses, vital registration systems, demographic surveillance systems, and surveys (of all flavors: longitudinal, panel, and retrospective). In the field of conflict-related mortality, demographers have actually relied almost exclusively on sources other than surveys. For example, Patrick Heuveline and Beth Daponte have used population censuses (and voter registration lists) in Cambodia and Iraq, respectively, and demographers at the ICTY (Helge Brunborg and Ewa Tabeau) have used various types of "found data" which equate to (incomplete) registration lists alongside census correction methods. The distinguished demographers Charles Hirschman and Sam Preston were in the minority amongst demographers when they used a household survey to estimate Vietnamese military and civilian casualties between 1965 and 1975. The folks who routinely use surveys in the field of conflict-related mortality are epidemiologists, not demographers. The folks at Johns Hopkins, Columbia's Mailman School of Public Health, the Harvard Humanitarian Initiative, Physicians for Human Rights, MSF, Epicentre, etc., who use variants of the SMART methodology with a 2-stage cluster design, are epidemiologists. This design and methodology has been coarsely adapted from a back-of-the-envelope method used to evaluate vaccination coverage in least developed countries. However, epidemiologists at the London School of Hygiene and Tropical Medicine have recently noted that this method "tends to be followed without considering alternatives" and that "there is a need for expert advice to guide health workers measuring mortality in the field" (See

I just thought it might help to put this all in one place.

Stephen Dubner quotes Gary Becker as saying:

According to the economic approach, therefore, most (if not all!) deaths are to some extent "suicides" in the sense that they could have been postponed if more resources had been invested in prolonging life.

Dubner describes this as making "perfect sense" and as being "so unusual and so valuable."

When I first saw this I was irritated and whipped off a quick entry on the sister blog. But then I had some more systematic thoughts about how Becker's silly-clever statement, and Dubner's reaction to it, demonstrate several logical fallacies that I haven't seen isolated before.

No, it's not true that most deaths are suicides

I'll get to the fallacies in a moment but first I'll explain in some detail why I disagree with Becker's statement. The claim that most deaths are suicides seemed evidently ridiculous to me (not just bold, counterintuitive, taboo-shattering, etc., but actually false), and my inclination in such settings is to mock rather than explicate--but Becker and Dubner are smart guys, and if they can get confused on this topic, I'm sure others can too.

The following is the last paragraph in a (positive) referee report I just wrote. It's relevant for lots of other articles too, I think, so I'll repeat it here:

Just as a side note, I recommend that the authors post their estimates immediately; I imagine their numbers will be picked up right away and be used by other researchers. First, this is good for the authors, as others will cite their work; second, these numbers should help advance research in the field; and, third, people will take the estimates seriously enough that, if there are problems, they will be uncovered. It makes sense to start this process now, so if anything bad comes up, it can be fixed before the paper gets published!

I have to admit that I'm typically too lazy to post my estimates right away; usually it doesn't happen until someone sends me an email request and then I put together a dataset. But, after writing the above paragraph, maybe I'll start following my own advice.

Mike Spagat writes:

I hope that this new paper [by Michael Spagat, Andrew Mack, Tara Cooper, and Joakim Kreutz] on serious errors in a paper on conflict mortality published in the British Medical Journal will interest you. For one thing, I believe that it is highly teachable. Beyond that, I think it's important for the conflict field (if I do say so myself). Another aspect of this is that the BMJ is refusing to recognize that there are any problems with the paper. This seems to be sadly typical behavior of journals when they make mistakes.

Spagat et al's paper begins:

In a much-cited recent article, Obermeyer, Murray, and Gakidou (2008a) examine estimates of wartime fatalities from injuries for thirteen countries. Their analysis poses a major challenge to the battle-death estimating methodology widely used by conflict researchers, engages with the controversy over whether war deaths have been increasing or decreasing in recent decades, and takes the debate over different approaches to battle-death estimation to a new level. In making their assessments, the authors compare war death reports extracted from World Health Organization (WHO) sibling survey data with the battle-death estimates for the same countries from the International Peace Research Institute, Oslo (PRIO). The analysis that leads to these conclusions is not compelling, however. Thus, while the authors argue that the PRIO estimates are too low by a factor of three, their comparison fails to compare like with like. Their assertion that there is "no evidence" to support the PRIO finding that war deaths have recently declined also fails. They ignore war-trend data for the periods after 1994 and before 1955, base their time trends on extrapolations from a biased convenience sample of only thirteen countries, and rely on an estimated constant that is statistically insignificant.

Here they give more background on the controversy. They make a pretty convincing case that many open questions remain before we can rely on survey-based estimates of war deaths. In particular, they very clearly show that the survey-based estimates provide no evidence at all regarding questions of trends in war deaths--the claims of Obermeyer et al. regarding trends were simply based on a statistical error. The jury is still out, I think, on what numbers should be trusted in any particular case.

Here's a summary of the data used by Obermeyer et al.:


Who's on Facebook?


David Blei points me to this report by Lars Backstrom, Jonathan Chang, Cameron Marlow, and Itamar Rosenn on an estimate of the proportion of Facebook users who are white, black, hispanic, and asian (or, should I say, White, Black, Hispanic, and Asian).

Funding research


Via Mendeley, a nice example of several overlapping histograms:


The x axis is overlabelled, but I don't want to nitpick.

Previous post on histogram visualization: The mythical Gaussian distribution and population differences

Update 12/21/09: JB links to an improved version of the histograms by Eric Drexler below. And Eric links to the data. Thanks!


Dan Goldstein points to a draft article by Andreas Graefe and J. Scott Armstrong:

Dean Eckles writes:

I have a hopefully interesting question about methods for analyzing varying coefficients as a way of describing similarity and variation between the groups.

In particular, we are studying individual differences in susceptibility to different influence strategies. We have quantitative outcomes (related to buying books), and influence strategy (7 levels, including a control) is a within-subjects factor with two implementations of each strategy (also within-subjects).

Wired reports a great new opportunity to make money online by suing internet companies for revealing the data:

An in-the-closet lesbian mother is suing Netflix for privacy invasion, alleging the movie rental company made it possible for her to be outed when it disclosed insufficiently anonymous information about nearly half-a-million customers as part of its $1 million contest to improve its recommendation system.

I'm not sure whether the litigators have read this particular section of the Netflix prize rules:

To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.

So yes, you can match a set of reviews with someone, but how will you know that it's really that person and not a random coincidence? Half a million review traces give plenty of opportunity for a false positive match. Netflix learned from AOL's data release disaster, which resulted in a few people getting fired.
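To put rough numbers on the false-positive worry (the values of p below are my own made-up assumptions, not anything derived from the Netflix data): if a matching rule fires by pure coincidence with probability p on any single unrelated trace, then across n traces you should expect n*p spurious matches, and the chance of at least one is 1 - (1-p)^n.

```python
# Toy back-of-the-envelope calculation; p values are hypothetical.
n = 500_000  # number of rating traces in the released dataset
for p in (1e-7, 1e-6, 1e-5):
    expected = n * p                    # expected spurious matches
    at_least_one = 1 - (1 - p) ** n     # P(at least one false match)
    print(f"p={p:g}  expected={expected:g}  P(>=1 false match)={at_least_one:.3f}")
```

Even a one-in-a-million coincidence rate gives a roughly 40% chance of at least one spurious hit somewhere among half a million traces.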

But this theme is important. Many internet companies provide free services in return for the ability to employ user data for profit. Andrew Parker looked at which companies make a profit out of user data. Usually the data is never given away directly, but just used to make other people's lives easier. Let's say that you bookmark a particular page: others won't see that you've done it, but they will see that there are people who find that page worth saving, so it can be listed higher up in search results.

A more problematic area is medicine. Wired reports that there is a market out there for medical records, and that anonymity protection isn't very secure.

Making medical data public would allow massive advances in medicine. For example, the Personal Genome Project seeks to analyze a number of volunteers in a lot of detail (see, for example, Steven Pinker's medical record). If a few million people did that, we'd know so much more about diseases, their risk factors, the effectiveness of drugs and diets, and the effects of the genome.

One-sided disclosure gets many people worried: their insurance rates might go up, or they might not get a job. It would help if everyone were doing it: nobody feels comfortable being naked when everyone else is wearing a swimsuit.

But we should also ask ourselves as a society - what is insurance? Is insurance a protection against uncontrollable risk or is it an instrument of equality? Is genome our destiny or an uncontrollable risk?

Previous posts on this topic: EU data protection guidelines, Privacy vs Transparency.

Some scams


Tyler Cowen links to this article by Frank Stajano and Paul Wilson:

The success of many attacks on computer systems can be traced back to the security engineers not understanding the psychology of the system users they meant to protect. We [Stajano and Wilson] examine a variety of scams and "short cons" that were investigated, documented and recreated for the BBC TV programme The Real Hustle and we extract from them some general principles about the recurring behavioural patterns of victims that hustlers have learnt to exploit. We argue that an understanding of these inherent "human factors" vulnerabilities, and the necessity to take them into account during design rather than naïvely shifting the blame onto the "gullible users", is a fundamental paradigm shift for the security engineer which, if adopted, will lead to stronger and more resilient systems security.

I wasn't blown away by the theoretical arguments in the article, but the scams are fascinating.

Universities and $


In this article about college funding, Kevin Carey says something that I've long believed, which is that government-supported financial aid doesn't quite work how you might imagine: colleges can just raise their prices along with any aid packages that come along. The price tag for college is not fixed, and so what looks like a subsidy for low-income students can just end up being a way for universities to jack up their prices by a corresponding amount.

But Carey also says some things that don't convince me so much. My impression is that he just threw in all sorts of negative attitudes about universities, without thinking about how they all fit together.

In discussing this, I'm not trying to pick on Kevin Carey, who makes excellent points about the desirability of publicly available information on what students actually learn in college. My point here is to use this generally fine article to highlight some ways in which people get confused when talking about higher education.

Carey writes:

Essentially, colleges don't figure out how much money they need to spend and then go get it. Instead, they get as much money as they can and then spend it. Since reputations are relational-the goal is to be better than the other guy-there is no practical limit on how much colleges can spend in pursuit of self-glorification. As former Harvard President Derek Bok wrote, "Universities share one characteristic with compulsive gamblers and exiled royalty: There is never enough money to satisfy their desires."

I agree that this describes colleges, and I'll take Bok's word for it that it describes compulsive gamblers and exiled royalty too. But doesn't it really describe almost anybody? I mean, who among us, Ubs excepted, figures out how much money they need to spend and then goes and gets it? The much, much more common pattern, I think, is that people take the jobs they can get and, ideally, want to do, and then if they need more money, sure, they try to get more. But when people make more, they tend to spend more and feel the need for even more, etc. I don't see at all what's special about universities here--this just seems like a cheap shot to me. Universities are like other organizations: they're happy to take money that people are willing to give them. I mean, I don't see Apple saying, "Hey, we have enough money--we're gonna give out iPods for free."

p = 0.5


In the middle of a fascinating article on South Africa's preparations for the World Cup, R. W. Johnson makes the following offhand remark:

Any minute now the usual groaning will be heard from teams which claim that they, uniquely, have been drawn in a 'group of death'. What is the point, one might ask, in groaning about a random draw? Well, the trouble starts there, for the draw is not entirely random. In practice, seven teams are seeded, according to how well they've been doing in international matches, along with an eighth team, the host nation, whose passage into the second round is thus made easier - on paper. The draw depends on which balls rise to the top of the jar and thus get plucked out first; but it's rumoured that certain balls get heated in an oven before a draw, thus guaranteeing that they will bubble to the top. The weakest two teams aside from South Africa and North Korea are South Korea and New Zealand. The odds are, of course, heavily against any two or more of these bottom four finding themselves in the same group. If they do, we will have to be deeply suspicious of the draw.

This got me wondering. What is the probability that the bottom four teams will actually end up in different groups?

Given the rules as stated above, eight of the teams (including South Africa) start in eight different groups. There are 24 slots remaining, three per group. Now let's assign the next three low-ranking teams. The first has a 21/24 chance of landing in one of the seven groups that do not contain South Africa; the next then has an 18/23 chance of landing in one of the six remaining groups; and the last has a 15/22 chance of landing in one of the five remaining. Combining these, the probability that the bottom four teams end up in four different groups is (21/24)*(18/23)*(15/22) = 0.47, so the probability that at least two of them share a group is 1 - 0.47 = 0.53. (Unless I did the calculation wrong. Such things happen.)
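Such things can be checked by brute force. Here's a quick simulation of the draw under the rules as stated (group structure only; team identities don't matter), which puts the all-four-in-different-groups probability at about 0.47, and hence about 0.53 for at least two landing together:

```python
import random
from fractions import Fraction

# Exact: South Africa is seeded into one group; the other three weak
# teams fill 3 of the 24 unseeded slots (3 unseeded slots per group).
p_exact = Fraction(21, 24) * Fraction(18, 23) * Fraction(15, 22)
print(float(p_exact))  # probability all four are in different groups

# Monte Carlo check: slot s belongs to group s // 3, and group 0 is
# South Africa's group.
random.seed(2009)
trials = 200_000
hits = 0
for _ in range(trials):
    slots = random.sample(range(24), 3)
    groups = {s // 3 for s in slots}
    # all four differ iff the three teams land in three distinct
    # groups, none of which is South Africa's
    if len(groups) == 3 and 0 not in groups:
        hits += 1
print(hits / trials)  # close to the exact value above
```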

So, no, I don't think that if two of these teams happen to find themselves in the same group, that "we will have to be deeply suspicious of the draw."

P.S. It came out fine: the four bottom-ranked teams are in different brackets. So we can breathe a sigh of relief.

Lowering the minimum wage


Paul Krugman asks, "Would cutting the minimum wage raise employment?" The macroeconomics discussion is interesting, if over my head.

But, politically, of course nobody's going to cut the minimum wage. Can you imagine the unpopularity of a minimum wage cut during a recession? I can't imagine that all the editorial boards of all the newspapers in the country could convince a majority of Congress to vote for this one, whatever its economic merits.

Which makes me wonder why the idea is being discussed at all. Is it an attempt to shoot down a minimum wage increase that might be in the works? Krugman mentions that Serious People are proposing a minimum wage cut, but he doesn't mention who those Serious People are. I can't imagine that they're serious about thinking this might happen.

Other voices, other blogs


What follows is a "meta" sort of discussion, so I'll put it below the fold and most of you can skip it.

Jimmy pointed me to this news article. My reaction to this is that the standards in teaching are low enough that someone like Xiao-Li or me can be considered an entertaining lecturer. It would be a lot harder to get by in standup.

Say a little prior for me: more on climate change


Four out of the last 15 posts on this blog have been related to climate change, which is probably a higher ratio than Andrew would like. But lots of people keep responding to them, so the principle "give the people what they want" suggests that another one won't hurt too much. So, here it is. If you haven't read the other posts, take a look at Andrew's thoughts about forming scientific attitudes, and my thoughts on Climategate and my suggestions for characterizing beliefs. And definitely read the comments on those, too, many of which are excellent.

I want to get a graphic "above the fold", so here's the plot I'll be talking about.

Here's the entry from the statistical lexicon:

The "All Else Equal" Fallacy: Assuming that everything else is held constant, even when it's not gonna be.

My original note about this fallacy came a couple years ago when New York Times columnist John Tierney made the counterintuitive claim (later blogged by Steven Levitt) that driving a car is good for the environment. As I wrote at the time:

These guys are making a classic statistical error, I think, which is to assume that all else is held constant. This is the error that also leads people to misinterpret regression coefficients causally. (See chapters 9 and 10 of our book for discussion of this point.) In this case, the error is to assume that the walker and the driver will be making the same trip. In general, the driver will take longer trips--that's one of the reasons for having a car, that you can easily take longer trips. Anyway, my point is not to get into a long discussion of transportation pricing, just to point out that this seemingly natural calculation is inappropriate because of its mistaken assumption that you can realistically change one predictor, leaving all the others constant.
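The fallacy is easy to demonstrate with simulated data (the variables below are generic, not an actual transportation dataset): a regression coefficient answers "what if this predictor changed, holding the others fixed," but a real intervention typically moves the other predictors too.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)      # x2 moves along with x1
y = x1 + x2 + rng.normal(size=n)

# Multiple regression: each coefficient is "holding the other fixed"
X = np.column_stack([np.ones(n), x1, x2])
b_joint = np.linalg.lstsq(X, y, rcond=None)[0]

# But if intervening on x1 also changes x2 (as with trip length and
# mode of travel), the realized change in y per unit of x1 is the
# simple-regression slope, about 2, not 1:
b_total = np.polyfit(x1, y, 1)[0]
print(b_joint[1:], b_total)  # roughly [1, 1] and roughly 2
```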

I hadn't thought much about this but then I see that Levitt repeated this error in his new Freakonomics book and on his blog, where he writes:



This story makes me think of a few things:

The lively discussion on Phil's entries on global warming here and here prompted me to think about the sources of my own attitudes toward this and other scientific issues.

For the climate change question, I'm well situated to have an informed opinion: I have a degree in physics, two of my closest friends have studied the topic pretty carefully, and I've worked on a couple related research projects, one involving global climate models and one involving tree ring data.

In our climate modeling project we were trying to combine different temperature forecasts on a scale in which Africa was represented by about 600 grid boxes. No matter how we combined these models, we couldn't get any useful forecasts out of them. Also, I did some finite-element analysis many years ago as part of a research project on the superheating of silicon crystals (for more details of the project, you can go to my published research papers and scroll way, way, way down). We were doing analysis on a four-inch wafer, and even that was tricky, so I'm not surprised that you'll have serious problems trying to model the climate in this way. As for the tree-ring analysis, I'm learning more about this now--we're just at the beginning of a three-year NSF-funded project--but, so far, it seems like one of those statistical problems that's easy to state but hard to solve, involving a sort of multilevel modeling of splines that's never been done before. It's tricky stuff, and I can well believe that previous analyses will need to be seriously revised.

Notwithstanding my credentials in this area, I take my actual opinions on climate change directly from Phil: he's more qualified to have an opinion on this than I am--unlike me, he's remained in physics--and he's put some time into reading up and thinking about the issues. He's also a bit of an outsider, in that he doesn't do climate change research himself. And if I have any questions about what Phil says, I can run it by Upmanu--a water-resources expert--and see what he thinks.

What if you don't know any experts personally?

It helps to have experts who are personal friends. Steven Levitt has been criticized for not talking over some of his climate-change speculations with climate expert Raymond Pierrehumbert at the University of Chicago (who helpfully supplied a map showing how Levitt could get to his office), but I can almost sort-of understand why Levitt didn't do this. It's not so easy to understand what a subject-matter expert is saying--there really are language barriers, and if the expert is not a personal friend, communication can be difficult. It's not enough to simply be at the same university, and perhaps Levitt realized this.

Twitteo killed the bloggio star


I've seen the future of Liebling optimality, and it ain't pretty.

A. J. Liebling (author of The Honest Rainmaker and many other classics) once boasted, "I can write faster than anyone who can write better and I can write better than anyone who can write faster." I've long admired this sentiment, as has political journalist Mickey Kaus, who has lived it by moving from magazine and book writing to blogging and, now, twittering.

I'm worried, though, now that Kaus's blogging has become more twitter-like, that he's approaching a logical extreme of Liebling optimality, which is to make his posts shorter and shorter and faster and faster until he's reduced to sitting at his keyboard, posting single characters, one at a time, very rapidly:

e...r...y...4...2...n...u...and so forth.

Some spots on the efficient frontier are more comfortable than others, no?

P.S. On the other hand, I'm sure Kaus still has another book or two or three within him, if he decides to move back in the other direction along that curve.

I recently came across some links showing readers how to make their own data analysis and graphics from scratch. This is great stuff--spreading power tools to the masses and all that.

From Nathan Yau: How to Make a US County Thematic Map Using Free Tools and How to Make an Interactive Area Graph with Flare. I don't actually think the interactive area graphs are so great--they work with the Baby Name Wizard but to me they don't do much in the example that Nathan shows--but, that doesn't really matter, what's cool here is that he's showing us all exactly how to do it. This stuff is gonna put us statistical graphics experts out of business^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^Ha great service.

And Chris Masse points me to these instructions from blogger Iowahawk on downloading and analyzing a historical climate dataset. Good stuff.

David points me to this news article by Dennis Cauchon, which begins:

Professor Risk


From the co-author of the celebrated Scholarpedia article on Bayesian statistics...

Visualizing UK budget


I was impressed by the Where Does My Money Go? - an interactive visualization interface to the UK government budget. If one ignores the painful color scheme (see below), the interactivity of exploring the data is notable.


One particularly interesting aspect is a regional spending breakdown, which shows which regions are contributing to the budget and which ones are disproportionally benefiting from it.

The British also have a great website that quantitatively analyzes the behavior in their parliament: Public Whip.

Question about Regression


Marcos Sanches writes:

Hey, I don't think I ever posted a link to this. It's a discussion in the journal Statistics in Medicine of an article by David Lunn, David Spiegelhalter, Andrew Thomas, and Nicky Best. (Sorry but I can't find the Lunn et al. article online, or I'd link to it.) Anyway, here's my discussion. Once upon a time . . .

I first saw BUGS in a demonstration version at a conference in 1991, but I didn't take it seriously until over a decade later, when I found that some of my Ph.D. students in political science were using Bugs to fit their models. It turned out that Bugs's modeling language was ideal for students who wanted to fit complex models but didn't have a full grasp of the mathematics of likelihood functions, let alone Bayesian inference and integration. I also learned that the modular structure of BUGS was a great way for students, and researchers in general, to think more about modeling and less about deciding which conventional structure should be fit to data.

Since then, my enthusiasm for BUGS has waxed and waned, depending on what sorts of problems I was working on. For example, in our study of income and voting in U.S. states [1], my colleagues fit all our models in BUGS. Meanwhile we kept running into difficulty when we tried to expand our model in different ways, most notably when going from varying-intercept multilevel regressions, to varying-intercept, varying-slope regressions, to models with more than two varying coefficients per group. Around this time I discovered lmer [2], a function in R which fits multilevel linear and generalized linear models allowing for varying intercepts and slopes. The lmer function can have convergence problems and does not account for uncertainty in the variance parameters, but it is faster than Bugs and in many cases more reliable-so much so that Jennifer Hill and I retooled our book on multilevel models to foreground lmer and de-emphasize Bugs, using the latter more as a way of illustrating models than as a practical tool.
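The varying-intercept idea can be sketched in a few lines. With known variances, the multilevel estimate of each group's intercept is just a precision-weighted compromise between that group's mean and the population mean (the data here are simulated for illustration; in practice lmer or BUGS also estimates the variance parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, tau, sigma = 0.0, 1.0, 2.0           # population mean, group sd, data sd
true_alpha = rng.normal(mu, tau, 8)       # true group intercepts
n_j = np.array([5, 10, 20, 5, 50, 10, 3, 30])   # observations per group
ybar = np.array([rng.normal(a, sigma / np.sqrt(n))
                 for a, n in zip(true_alpha, n_j)])  # observed group means

# Precision weights: big groups trust their own mean, small groups
# get pulled ("shrunk") toward the population mean mu.
w = (n_j / sigma**2) / (n_j / sigma**2 + 1 / tau**2)
alpha_hat = w * ybar + (1 - w) * mu
print(alpha_hat)
```

The weight w equals n/(n+4) here, so the group with 3 observations keeps less than half of its raw mean while the group with 50 keeps over 90%.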

What does BUGS do best and what does it do worst?

Climate skeptics, deniers, hawks, and True Believers


Lots of accusations are flying around in the climate change debate. People who believe in anthropogenic (human-caused) climate change are accused of practicing religion, not science. People who don't are called "deniers", which some of them think is an attempt to draw a moral link with holocaust deniers. Al Gore referred to Sarah Palin as a "climate change denier," and Palin immediately responded that she believes the climate changes, she just doesn't think the level of greenhouse gases in the atmosphere has anything to do with it. What's the right word to use for people like her? And yes, we do need some terminology if we want to be able to discuss the climate change debate!

Differential Evolution MCMC


John Salvatier writes:

I remember that you once mentioned an MCMC algorithm based on Differential Evolution, so I thought you might be interested in this paper, which introduces an algorithm based on Differential Evolution and claims to be useful even in high dimensional and multimodal problems.

Cool! Could this be implemented in Bugs, Jags, HBC, etc?
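I haven't tried the paper's algorithm, but the basic differential-evolution proposal is simple enough to sketch: each chain jumps along the scaled difference of two other randomly chosen chains, then accepts or rejects in the usual Metropolis way. Here's a minimal toy version targeting a standard 2-D normal (my own setup, not the paper's implementation):

```python
import numpy as np

def log_post(x):                     # target: standard 2-D normal
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(3)
n_chains, d, n_iter = 10, 2, 4000
X = rng.normal(size=(n_chains, d))   # current state of each chain
gamma = 2.38 / np.sqrt(2 * d)        # standard DE-MCMC scale factor
samples = []
for t in range(n_iter):
    for i in range(n_chains):
        # pick two other chains and propose a jump along their difference
        r1, r2 = rng.choice([j for j in range(n_chains) if j != i],
                            2, replace=False)
        prop = X[i] + gamma * (X[r1] - X[r2]) + 1e-4 * rng.normal(size=d)
        if np.log(rng.uniform()) < log_post(prop) - log_post(X[i]):
            X[i] = prop              # Metropolis accept
    if t > n_iter // 2:              # discard first half as burn-in
        samples.append(X.copy())
samples = np.concatenate(samples)
print(samples.mean(axis=0), samples.std(axis=0))  # near [0 0] and [1 1]
```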

Jenny quotes Erica Wagner:

Isaac Bashevis Singer wrote for more than four decades on an Underwood portable. For him, his machine was a kind of first editor. "If this typewriter doesn't like a story, it refuses to work," he said. "I don't get a man to correct it since I know if I get a good idea the machine will make peace with me again. I don't believe my own words saying this, but I've had the experience so many times that I'm really astonished. But the typewriter is 42 years old. It should have some literary experience, it should have a mind of its own."

Hey, I've been writing for almost 42 years myself!

More to the point, the Singer quote reminds me of my own experience in doing mathematics. It's virtually impossible for me to write down a formula with pen on paper unless I understand what the formula means. The act of writing enforces rigor. It makes perfect sense to me that a similar thing would happen to Singer when typing stories.

A journalist contacted me to ask what I thought about this article by Marshall Burke, Edward Miguel, Shanker Satyanath, John Dykema, and David Lobell:

ClimateGate: How do YOU choose what to believe?


Like a lot of scientists -- I'm a physicist -- I assumed the "Climategate" flap would cause a minor stir but would not prompt any doubt about the threat of global warming, at least among educated, intelligent people. The evidence for anthropogenic (that is, human-caused) global warming is strong, comes from many sources, and has been subject to much scientific scrutiny. Plenty of data are freely available. The basic principles can be understood by just about anyone, and first- and second-order calculations can be performed by any physics grad student. Given these facts, questioning the occurrence of anthropogenic global warming seems crazy. (Predicting the details is much, much more complicated). And yet, I have seen discussions, articles, and blog posts from smart, educated people who seem to think that anthropogenic climate change is somehow called into question by the facts that (1) some scientists really, deeply believe that global warming skeptics are wrong in their analyses and should be shut out of the scientific discussion of global warming, and (2) one scientist may have fiddled with some of the numbers in making one of his plots. This is enough to make you skeptical of the whole scientific basis of global warming? Really?

"Orange" ain't so special


Mark Liberman comes in with a data-heavy update (and I mean "data-heavy" in a good way, not as some sort of euphemism for "data-adipose") on my comments of the other day. I'm glad to see that he agrees with me that my impressedness with Laura Wattenberg's observation was justified.

Yet more antblogging



James Waters writes:

Equation search, part 2


Some further thoughts on the Eureqa program which implements the curve-fitting method of Michael Schmidt and Hod Lipson:

The program kept running indefinitely, so I stopped it in the morning, at which point I noticed that the output didn't quite make sense, and I went back and realized that I'd messed up when trying to delete some extra data in the file. So I re-ran with the actual data. The program functioned as before but moved much more quickly to a set of nearly-perfect fits (R-squared = 99.9997%, and no, that's not a typo). Here's what the program came up with:


The model at the very bottom of the list is pretty pointless, but in general I like the idea of including "scaffolding" (those simple models that we construct on the way toward building something that fits better) so I can't really complain.

It's hard to fault the program for not finding y^2 = x1^2 + x2^2, given that it already had such a success with the models that it did find.
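For reference, that relation is easy to verify against the data listed in "part 1" of these posts; a few rows suffice (values copied from that table):

```python
import math

# First three rows of the homework data from "Equation search, part 1"
rows = [(15.68, 6.87, 14.09), (6.18, 4.40, 4.35), (18.10, 0.43, 18.09)]
for y, x1, x2 in rows:
    # math.hypot(x1, x2) computes sqrt(x1**2 + x2**2)
    print(y, round(math.hypot(x1, x2), 2))
```

Each computed value agrees with y to about ±0.01, consistent with the near-perfect R-squared reported above.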

Commenter Michael linked to a blog by somebody called The Last Psychiatrist, discussing the recent study by Rank and Hirschl estimating that half the kids in America in the 1970s were on food stamps at some point in their childhood. I've commented on some statistical aspects of that study, but The Last Psychiatrist makes some good points regarding how the numbers can and should be interpreted.

Hey, made ya look!

From Ubs:

How fast is Rickey? Rickey is so fast that he can steal more bases than Rickey. (And nobody steals more bases than Rickey.)

Maybe so, actually.

Equation search, part 1


For some reason, Aleks doesn't blog anymore; he just sends me things to blog. I guess he's decided his time is more usefully spent doing other things. I, however, continue to enjoy blogging as an alternative to real work and am a sucker for all the links Aleks sends me (except for the videos, which I never watch).

Aleks's latest find is a program called Eureqa that implements a high-tech curve-fitting method of Michael Schmidt and Hod Lipson. (And, unlike Clarence Thomas, I mean "high-tech" in a good way.) Schmidt and Lipson describe their algorithm as "distilling free-form natural laws from experimental data," which seems a bit over the top, but the basic idea seems sound: Instead of simply running a linear regression, the program searches through a larger space of functional forms, building models like Tinker Toys by continually adding components until the fit stops improving:


I have some thoughts on the limitations of this approach (see below), but to get things started I wanted to try out an example where I suspected this new approach would work well where more traditional statistical methods would fail.

The example I chose was a homework assignment that I included in a couple of my books. Here's the description:

I give students the following data points and ask them to fit y as a function of x1 and x2.

y x1 x2
15.68 6.87 14.09
6.18 4.4 4.35
18.1 0.43 18.09
9.07 2.73 8.65
17.97 3.25 17.68
10.04 5.3 8.53
20.74 7.08 19.5
9.76 9.73 0.72
8.23 4.51 6.88
6.52 6.4 1.26
15.69 5.72 14.62
15.51 6.28 14.18
20.61 6.14 19.68
19.58 8.26 17.75
9.72 9.41 2.44
16.36 2.88 16.1
18.3 5.74 17.37
13.26 0.45 13.25
12.1 3.74 11.51
18.15 5.03 17.44
16.8 9.67 13.74
16.55 3.62 16.15
18.79 2.54 18.62
15.68 9.15 12.74
4.08 0.69 4.02
15.45 7.97 13.24
13.44 2.49 13.21
20.86 9.81 18.41
16.05 7.56 14.16
6 0.98 5.92
3.29 0.65 3.22
9.41 9 2.74
10.76 7.83 7.39
5.98 0.26 5.97
19.23 3.64 18.89
15.67 9.28 12.63
7.04 5.66 4.18
21.63 9.71 19.32
17.84 9.36 15.19
7.49 0.88 7.43

[If you want to play along, try to fit the data before going on.]
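For those playing along, a natural baseline (the one the fancy search would have to beat) is an ordinary linear regression of y on x1 and x2. Here's a minimal sketch using just the first ten rows above; it makes no claim about the true functional form.

```python
import numpy as np

# First ten rows of the assignment data: columns are y, x1, x2
data = np.array([
    [15.68, 6.87, 14.09],
    [ 6.18, 4.40,  4.35],
    [18.10, 0.43, 18.09],
    [ 9.07, 2.73,  8.65],
    [17.97, 3.25, 17.68],
    [10.04, 5.30,  8.53],
    [20.74, 7.08, 19.50],
    [ 9.76, 9.73,  0.72],
    [ 8.23, 4.51,  6.88],
    [ 6.52, 6.40,  1.26],
])
y, x1, x2 = data[:, 0], data[:, 1], data[:, 2]

# Ordinary least squares of y on an intercept, x1, and x2
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print("intercept, b1, b2:", np.round(beta, 3))
print("residual sd:", np.round(resid.std(), 3))
```

The linear fit looks respectable at a glance, which is part of what makes the assignment instructive: a model can fit well numerically without being the right functional form.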

A few days ago I posted some skeptical notes about the comparison of unemployment rates over time within education categories. My comments were really purely statistical; I know next to nothing about unemployment numbers.

One of the advantages of running this blog is I sometimes get emails from actual experts. In this case, economist John Schmitt wrote in:

Your post looks at just how comparable the unemployment rates are in 2009 and the early 1980s. The specific issue concerns whether we should factor in the big changes in educational attainment between the early 1980s and the present--our working population is a lot better educated today than it was in the early 1980s.

According to the piece that motivated your blog post, the unemployment rate for workers at each level of education is higher now than it was in the early 1980s. So, in a mechanical sense, the unemployment rate is lower in 2009 than it was in the early 1980s only because a larger portion of the population in 2009 has shifted to the "low-unemployment" higher education groups.

You take the view that the aggregate unemployment rate is what matters, not the disaggregated unemployment rates by education. Dean Baker and I, however, did a recent analysis in the spirit of the education-based analysis you cite.

We focused on how much older the workforce is today (rather than how much better educated it is), and concluded that if you want to do a sensible comparison, you'll want to factor in the age change.

The main argument from our [Schmitt and Baker's] paper:

Tom Ball writes:

Didn't know if you had seen this article [by Jason Richwine] about political allegiance and IQ but wanted to make sure you did. I'm surprised the author hasn't heard of or seen your work on Red and Blue states! What do you think?

I think the article raises some interesting issues, but he seems undecided about whether to take the line that intelligent Americans mostly have conservative views ("[George W.] Bush's IQ is at least as high as John Kerry's" and "Even among the nation's smartest people, liberal elites could easily be in the minority politically") or the fallback position that, yes, maybe liberals are more intelligent than conservatives, but intelligence isn't such a good thing anyway ("The smartest people do not necessarily make the best political choices. William F. Buckley once famously declared that he would rather give control of our government to 'the first 400 people listed in the Boston telephone directory than to the faculty of Harvard University.'"). One weakness of this latter argument is that the authorities he relies on for this point--William F. Buckley, Irving Kristol, etc.--were famous for being superintelligent. Richwine is in the awkward position of arguing that Saul Bellow's aunt (?) was more politically astute than Bellow, even though, in Kristol's words, "Saul's aunt may not have been a brilliant intellectual." Huh? We're taking Richwine's testimony on Saul Bellow's aunt's intelligence?

Richwine also gets into a tight spot when he associates conservatism with "following tradition" and liberalism with "non-traditional ideas." What is "traditional" can depend on your social setting. What it takes to be a rebel at the Columbia University faculty club is not necessarily what will get you thrown out of a country club in the Dallas suburbs. I think this might be what Tom Ball was thinking about when he referred to Red State, Blue State: political and cultural divisions mean different things in different places.

I do, however, agree with Richwine's general conclusion, which is that you're probably not going to learn much by comparing average IQs of different groups. As Richwine writes, "The bottom line is that a political debate will never be resolved by measuring the IQs of groups on each side of the issue." African-Americans have low IQs on average, Jews have high IQs on average, and both groups vote for the Democrats. Latinos have many socially conservative views but generally don't let those views get in the way of voting for Democrats.

Comment spam


All of a sudden we're getting a lot of comment spam, and so we've changed the settings so that it immediately approves comments only from "any authenticated commenters." Until we can figure out how to solve the spam problem, I guess we'll go back to approving comments a couple times a day.

I encourage you all to "authenticate" your comments (whatever that means) so they will appear immediately and people can respond right away, rather than having to wait to see your comment until it has been approved.

Aaron Swartz links to this rant from Philip Greenspun on university education. Despite Swartz's blurb, I didn't actually see any "new ideas" in Greenspun's article. (I agree with Greenspun's advice that teachers not grade their own students, but no, this isn't a new idea, it's just a good idea that's difficult enough to implement that it usually isn't done).

That's ok. New ideas are overrated. But this bit was just hilarious:

I'm on an email list of media experts for the American Statistical Association: from time to time a reporter contacts the ASA, and their questions are forwarded to us. Last week we got a question from Cari Tuna about the following pattern she had noticed:

Measured by unemployment, the answer appears to be no, or at least not yet. The jobless rate was 10.2% in October, compared with a peak of 10.8% in November and December of 1982.

But viewed another way, the current recession looks worse, not better. The unemployment rate among college graduates is higher than during the 1980s recession. Ditto for workers with some college, high-school graduates and high-school dropouts.

So how can the overall unemployment rate be lower today but higher among each group?

Several of us sent in answers. Call us media chasers or educators of the populace; whatever. Luckily I wasn't the only one to respond: I sent in a pretty lame example that I'd recalled from an old statistics textbook; whereas Xiao-Li Meng, Jeff Witmer, and others sent in more up-to-date items that Ms. Tuna had the good sense to use in her article.

There's something about this whole story that bothers me, though, and that is the implication that the within-group comparisons are real and the aggregate is misleading. As Tuna puts it:

The Simpson's Paradox in unemployment rates by education level is but the latest example. At a glance, the unemployment rate suggests that U.S. workers are faring better in this recession than during the recession of the early 1980s. But workers at each education level are worse off . . .

This discussion follows several examples where, as the experts put it, "The aggregate number really is meaningless. . . . You can't just look at the overall rate. . . ."

Here's the problem. Education categories now do not represent the same slices of the population that they did in 1976. A larger proportion of the population are college graduates (as is noted in the linked news article), and thus the comparison of college grads (or any other education category) from 1982 to the college grads today is not quite an apples-to-apples comparison. Being a college grad today is less exclusive than it was back then.

In this sense, the unemployment example is different in a key way from the other Simpson's paradox examples in the news article. In those other examples, the within-group comparison is clean, while the aggregate comparison is misleading. In the unemployment example, it's the aggregate that has a cleaner interpretation, while the within-group comparisons are a bit of a mess.

As a statistician and statistical educator, I think we have to be very careful about implying that the complicated analysis is always better. In this example, the complicated analysis can mislead! It's still good to know about Simpson's paradox, to understand how the within-group and aggregate comparisons can differ--but I think it's highly misleading in this case to imply that the aggregate comparison is wrong in some way. It's more of a problem of groups changing their meaning over time.
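The composition effect is easy to verify with a toy calculation. The shares and rates below are invented purely to illustrate the arithmetic; they are not the actual 1982 or 2009 figures.

```python
# Invented workforce shares and unemployment rates for two education
# groups in two years (illustration only, not real BLS numbers)
share_1982 = {"non_college": 0.8, "college": 0.2}
rate_1982  = {"non_college": 0.12, "college": 0.04}

share_2009 = {"non_college": 0.5, "college": 0.5}
rate_2009  = {"non_college": 0.13, "college": 0.05}

def aggregate(share, rate):
    """Overall unemployment rate: share-weighted average of group rates."""
    return sum(share[g] * rate[g] for g in share)

agg_1982 = aggregate(share_1982, rate_1982)  # 0.8*0.12 + 0.2*0.04 = 0.104
agg_2009 = aggregate(share_2009, rate_2009)  # 0.5*0.13 + 0.5*0.05 = 0.090

# Both groups are worse off in 2009, yet the aggregate rate is lower,
# because the workforce shifted toward the lower-unemployment group.
print(agg_1982, agg_2009)
```

Each group's rate is a full point higher in the later year, yet the aggregate falls from 10.4% to 9.0%, entirely because of the shift in composition. Neither number is wrong; they answer different questions, which is exactly why the changing meaning of the groups matters.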

Regular readers of this blog are familiar with the pinch-hitter syndrome: People whose job it is to do just one thing are not always so good at that one thing. I first encountered this when noting the many silly errors introduced into my books by well-meaning copy-editors with too much time on their hands. As I wrote a few years ago:

This is a funny thing. A copy editor is a professional editor. All they do (or, at least, much of what they do) is edit, so how is it that they do such a bad job compared to a statistician, for whom writing is only a small part of the job description?

The answer certainly isn't that I'm so wonderful. Non-copy-editor colleagues can go through anything I write and find lots of typos, grammatical errors, confusing passages, and flat-out mistakes.

No, the problem comes with the copy editor, and I think it's an example of the pinch-hitter syndrome. The pinch-hitter is the guy who sits on the bench and then comes up to bat, often in a key moment of a close game. When I was a kid, I always thought that pinch hitters must be the best sluggers in baseball, because all they do (well, almost all) is hit. But of course this isn't the case--the best hitters play outfield, or first base, or third base, or whatever. If the pinch hitter were really good, he'd be a starter. So, Kirk Gibson in the 1988 World Series notwithstanding, pinch hitters are generally not the best hitters.

There must be some general social-science principle here, about generalists and specialists, roles in an organization, etc?

This idea was recently picked up by a real-life baseball statistician--Eric Seidman of Baseball Prospectus--who writes:

I wanted to talk to you about the pinch-hitter theory you presented, as I've noticed it in an abundance of situations as well.

When I read your theory it made perfect sense, although a slight modification is needed, namely in that it makes more sense as a relief-pitcher theory. In sabermetrics, we have found that pitchers perform better as relievers than they do as starters. In fact, if a starter becomes a reliever, you can expect him to lop about 1.4 runs off of his ERA and vice-versa, simply by virtue of facing batters more often. When you get to facing the batting order the 2nd and 3rd time through, relievers are almost always better options because they are fresh. Their talent levels are nowhere near those of the starters--otherwise, they would BE starters--but in that particular situation, their fresh "eyes" as it pertains to this metaphor are much more effective.

For another example, when working on my book Bridging the Statistical Gap, I found that my editor would make great changes but would miss a lot of ancillary things that I would notice upon delving back in after a week away from it. Applying that to the relief pitcher idea, the editor was still more talented when it came to editing, but his being "in too deep", the equivalent of facing the opposing batting order a few times, made my fresh eyes a bit more accurate.

I'm wondering if you have seen this written about in other areas, as it really intrigues me as a line of study, applying psychological concepts as well as those in statistics.

These are interesting thoughts--first, the idea of applying the syndrome to relief pitchers, and, second, the "fresh eyes" idea, which adds some subtlety to the concept. I'm still not quite sure what he's saying about the pitchers, though: Is he saying that because relief pitchers come in with fresh arms, they can throw harder, or is he saying that, because hitters see starters over and over again, they can improve their swing as the game goes on, whereas when the reliever comes in, the hitters are starting afresh?

Beyond this, I'm interested in Seidman's larger question, about whether this is a more general psychological/sociological phenomenon. Do any social scientists out there have any thoughts?

P.S. I seem to recall Bill James disparaging the ERA statistic--he felt that "unearned" runs count too, and they don't happen by accident. So I'm surprised that the Baseball Prospectus people use ERA rather than RA. Is it just because ERA is what we're all familiar with, so the professional baseball statisticians want to talk our language? Or is ERA actually more useful than I thought?

Scientists behaving badly


Steven Levitt writes:

My view is that the emails [extracted by a hacker from the climatic research unit at the University of East Anglia] aren't that damaging. Is it surprising that scientists would try to keep work that disagrees with their findings out of journals? When I told my father that I was sending my work saying car seats are not that effective to medical journals, he laughed and said they would never publish it because of the result, no matter how well done the analysis was. (As is so often the case, he was right, and I eventually published it in an economics journal.)

Within the field of economics, academics work behind the scenes constantly trying to undermine each other. I've seen economists do far worse things than pulling tricks in figures. When economists get mixed up in public policy, things get messier. So it is not at all surprising to me that climate scientists would behave the same way.

I have a couple of comments, not about the global-warming emails--I haven't looked into this at all--but regarding Levitt's comments about scientists and their behavior:

1. Scientists are people and, as such, are varied and flawed. I get particularly annoyed with scientists who ignore criticisms that they can't refute. The give and take of evidence and argument is key to scientific progress.

2. Levitt writes about scientists who "try to keep work that disagrees with their findings out of journals." This may or may not be ethical behavior, depending on how it's done. If I review a paper for a journal and find that it has serious errors or, more generally, that it adds nothing to the literature, then I should recommend rejection--even if the article claims to have findings that disagree with my own work. Sure, I should bend over backwards and all that, but at some point, crap is crap. If the journal editor doesn't trust my independent judgment, that's fine, he or she should get additional reviewers. On occasion I've served as an outside "tiebreaker" referee for journals on controversial articles outside of my subfield.

Anyway, my point is that "trying to keep work out of journals" is ok if done through the usual editorial process, not so ok if done by calling the journal editor from a pay phone at 3am or whatever.

I wonder if Levitt is bringing up this particular example because he served as a referee for a special issue of a journal that he later criticized. So he's particularly aware of issues of peer review.

3. I'm not quite sure how to interpret the overall flow of Levitt's remarks. On one hand, I can't disagree with the descriptive implications: Some scientists behave badly. I don't know enough about economics to verify his claim that academics in that field "constantly trying to undermine each other . . . do far worse things than pulling tricks in figures"--but I'll take Levitt's word for it.

But I'm disturbed by the possible normative implications of Levitt's statement. It's certainly not the case that everybody does it! I'm a scientist, and, no, I don't "pull tricks in figures" or anything like this. I don't know what percentage of scientists we're talking about here, but I don't think this is what the best scientists do. And I certainly don't think it's ok to do so.

What I'm saying is, I think Levitt is doing a big service by publicly recognizing that scientists sometimes--often?--engage in unethical behavior such as hiding data. But I'm unhappy with the sense of amused, world-weary tolerance that I get from reading his comment.

Anyway, I had a similar reaction a few years ago when reading a novel about scientific misconduct. The implication of the novel was that scientific lying and cheating wasn't so bad, these guys are under a lot of pressure and they do what they can, etc. etc.--but I didn't buy it. For the reasons given here, I think scientists who are brilliant are less likely to cheat.

4. Regarding Levitt's specific example--the article on car seats that was rejected by medical journals--I wonder if he's being too quick to assume that the journals were trying to keep his work out because it disagreed with previous findings.

As a scientist whose papers have been rejected by top journals in many different fields, I think I can offer a useful perspective here.

Much of what makes a paper acceptable is style. As a statistician, I've mastered the Journal of the American Statistical Association style and have published lots of papers there. But I've never successfully published a paper in political science or economics without having a collaborator in that field. There are certain things that a journal expects to see. It may be comforting to think that a journal will not publish something "because of the result," but my impression is that most journals like a bit of controversy--as long as it is presented in their style. I'm not surprised that, with his training, Levitt had more success publishing his public health work in econ journals.

P.S. Just to repeat, I'm speaking in general terms about scientific misbehavior, things such as, in Levitt's words, "pulling tricks in figures" or "far worse things." I'm not making a claim that the scientists at the University of East Anglia were doing this, or were not doing this, or whatever. I don't think I have anything particularly useful to add on that; you can follow the links in Freakonomics to see more on that particular example.

All Meehl, all the time


Brad Evans points me to this website devoted to the publications of the great Paul Meehl.

Commenter RogerH pointed me to this article by Welton, Ades, Carlin, Altman, and Sterne on models for potentially biased evidence in meta-analysis using empirically based priors. The "Carlin" in the author list is my longtime collaborator John, so I really shouldn't have had to hear about this through a blog comment. Anyway, they write:

We present models for the combined analysis of evidence from randomized controlled trials categorized as being at either low or high risk of bias due to a flaw in their conduct. We formulate a bias model that incorporates between-study and between-meta-analysis heterogeneity in bias, and uncertainty in overall mean bias. We obtain algebraic expressions for the posterior distribution of the bias-adjusted treatment effect, which provide limiting values for the information that can be obtained from studies at high risk of bias. The parameters of the bias model can be estimated from collections of previously published meta-analyses. We explore alternative models for such data, and alternative methods for introducing prior information on the bias parameters into a new meta-analysis. Results from an illustrative example show that the bias-adjusted treatment effect estimates are sensitive to the way in which the meta-epidemiological data are modelled, but that using point estimates for bias parameters provides an adequate approximation to using a full joint prior distribution. A sensitivity analysis shows that the gain in precision from including studies at high risk of bias is likely to be low, however numerous or large their size, and that little is gained by incorporating such studies, unless the information from studies at low risk of bias is limited. We discuss approaches that might increase the value of including studies at high risk of bias, and the acceptability of the methods in the evaluation of health care interventions.

I really really like this idea. As Welton et al. discuss, their method represents two key conceptual advances:

1. In addition to downweighting questionable or possibly-biased studies, they also shift the estimates in the direction that corrects for the bias.

2. Instead of merely deciding which studies to trust based on prior knowledge, literature review, and external considerations, they also use the data, through a meta-analysis, to estimate the amount of adjustment to do.
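Here's a toy numerical sketch of those two moves, using fixed point estimates for the bias parameters (which, per the abstract, approximates a full joint prior reasonably well). All numbers are invented for illustration; this is not Welton et al.'s actual model.

```python
import numpy as np

# Invented per-study effect estimates and standard errors,
# with a flag marking studies judged at high risk of bias
effects   = np.array([0.30, 0.25, 0.45, 0.50])
ses       = np.array([0.10, 0.12, 0.08, 0.09])
high_risk = np.array([False, False, True, True])

# Assumed bias model (point estimates): high-risk studies overstate
# the effect by 0.15 on average, with between-study bias s.d. of 0.10
bias_mean, bias_sd = 0.15, 0.10

adj_effects = effects - bias_mean * high_risk        # 1. shift to correct bias
adj_var     = ses ** 2 + bias_sd ** 2 * high_risk    # 2. inflate uncertainty

# Precision-weighted pooling of the adjusted estimates
w = 1.0 / adj_var
pooled    = np.sum(w * adj_effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

print("implied weights:", np.round(w / w.sum(), 2))
print("pooled estimate:", round(pooled, 3), "s.e.:", round(pooled_se, 3))
```

Printing w / w.sum() shows the estimate as an explicit weighted average: the high-risk studies are both shifted downward and down-weighted relative to naive fixed-effect pooling, which is one concrete way to see the "equivalent weights" framing.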

And, as a bonus, the article has excellent graphs. (It also has three ugly tables, with gratuitous precision such as "-0.781 (-1.002, -0.562)," but the graph-to-table ratio is much better than usual in this sort of statistical research paper, so I can't really complain.)

This work has some similarities to the corrections for nonsampling errors that we do in survey research. As such, I have one idea here. Would it be possible to take the partially-pooled estimates from any given analysis and re-express them as equivalent weights in a weighted average? (This is an idea I've discussed with John and is also featured in my "Survey weighting is a mess" paper.) I'm not saying there's anything so wonderful about weighted estimates, but it could help in understanding these methods to have a bridge to the past, as it were, and see how they compare in this way to other approaches.

Payment demanded for the meal

There's no free lunch, of course. What assumptions did Welton et al. put in to make this work? They write:

We base the parameters of our bias model on empirical evidence from collections of previously published meta-analyses, because single meta-analyses typically provide only limited information on the extent of bias . . . This, of course, entails the strong assumption that the mean bias in a new meta-analysis is exchangeable with the mean biases in the meta-analyses included in previous empirical (meta-epidemiological) studies. For example, the meta-analyses that were included in the study of Schulz et al. (1995) are mostly from maternity and child care studies, and we must doubt whether the mean bias in studies on drugs for schizophrenia (the Clozapine example meta-analysis) is exchangeable with the mean biases in this collection of meta-analyses.

Assumptions are good. I expect their assumptions are better than the default alternatives, and it's good to have the model laid out there for possible criticism and improvement.

P.S. The article focuses on medical examples but I think the methods would also be appropriate for experiments and observational studies in social science. A new way of thinking about the identification issues that we're talking about all the time.

What can search predict?

Actually, I don't have an RSS myself, but I think you get my point.

P.S. This reminds me that I have to talk with Sharad and Duncan again about their results on people's perceptions of their friends' attitudes. Last we spoke, I felt like we were closing in on a way of distinguishing between two stories: (1) people think their friends are like them, and (2) people predict their friends' attitudes from their friends' other characteristics. But we didn't completely close the deal. Sharad: if you're reading this, let's talk!

P.P.S. I'm a little irritated by one aspect of Sharad's blog, which is that the nearly contentless illustration at the top is much prettier than the ugly, sloppy bit of data graphics at the bottom. The illustration is great, but if you care that much about how things look, why not spend a few minutes on your statistical graphs? It's not just about appearances. With better graphics, you can learn more from the data. Especially if you also use multilevel models, so you can get good estimates about subsets of your population.

