November 2009 Archives

For observations like this:

Here's a little pet peeve of mine [Wattenberg's]: nothing rhymes with orange. You've heard that before, right? Orange is famous for its rhymelessness. There's even a comic strip called "Rhymes with Orange." Fine then, let me ask you something. What the heck rhymes with purple?

She continues, reasonably enough:

If you stop and think about it, you'll find that English is jam-packed with rhymeless common words. What rhymes with empty, or olive, or silver, or circle? You can even find plenty of one-syllable words like wolf, bulb, and beige. Yet orange somehow became notorious for its rhymelessness, with the curious result that people now assume its status is unique.


P.S. Also this.

DIC question


Kent Holsinger writes:

I'm fitting a moderately complicated hierarchical model including a structural equation model with latent variables to plant traits measured in a greenhouse.

Just one of these little things


In my paper with Aaron and Nate, we were getting probabilities such as 5e-85 or 9e-16 from our normal-distribution model. Sure, the probability that a vote in D.C. would be decisive was low, but 10^-85? No way. So we switched to a t_4 distribution, which smoothed these probabilities down to 2e-12 and 2e-10. Not perfect, I'm sure, but more plausible.

Just remember: don't be a slave to your model. Work with it. Taking a model too seriously is really just another way of not taking it seriously at all. (By this I mean that, if you say you really believe a probability such as 10^-85, what you're really doing is devaluing the concept of belief in probabilities, and I'm thinking that will leak back and corrupt all the other probabilistic statements you make.)
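To illustrate (with made-up numbers, not the actual election model from the paper), here's how much difference the tails make. An outcome 20 standard errors from the mean is essentially impossible under a normal model but merely very unlikely under a t with 4 degrees of freedom:

```python
from scipy import stats

# Hypothetical numbers: an outcome about 20 standard errors from the mean.
# Under a normal model the tail probability is absurdly small; a t
# distribution with 4 degrees of freedom has heavy enough tails to keep
# it at a more believable order of magnitude.
z = 20.0
p_normal = stats.norm.sf(z)   # on the order of 1e-89
p_t4 = stats.t.sf(z, df=4)    # on the order of 1e-5
print(p_normal, p_t4)
```

The t_4 tails fall off polynomially rather than exponentially, which is why the swap moved those probabilities up by dozens of orders of magnitude.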

My talk at the Institut Henri Poincaré tomorrow 2pm:

Progress in statistical computation often leads to advances in statistical modeling. For example, it is surprisingly common that an existing model is reparameterized, solely for computational purposes, but then this new configuration motivates a new family of models that is useful in applied statistics. One reason why this phenomenon may not have been noticed in statistics is that reparameterizations do not change the likelihood. In a Bayesian framework, however, a transformation of parameters typically suggests a new family of prior distributions. We discuss examples in censored and truncated data, mixture modeling, multivariate imputation, stochastic processes, and multilevel models.

Here's the first slide:

Probably. But I'd like to see an analysis closer to the raw data.

An irritating copy editor at a journal changed all my instances of "for example" to "e.g." Because, y'know, so many more people read Latin than English. Ugh! I used to go and change all these back, but I'm just too lazy now; I only do it with my books and don't bother with the articles.

Just so you know: if you ever see "e.g." or "i.e." in something I write: no, it's not me, it's the copy editor. Can't these people serve society in some useful way, maybe get jobs as spammers or debt collectors?

This comment by Tyler Cowen on Sarah Palin's poor Scrabble strategy reminds me of my blog a few months ago with six suggested Scrabble reforms. Without further ado:

In the old days, I guess I never would've heard about this one. With the internet, all sorts of horrible bits of local news get spread around the world.

I was watching Mad Men the other day (we're still watching season 1 on DVD, so, please, no spoilers in the comments) and it struck me that just about none of the characters on the show are in their forties (on average, the unhappiest time in people's lives, according to Andrew Oswald's research). Almost everyone is in their 20s and 30s, with a few kids and a few people in their 50s and up.

This is just one show, of course. The last show I watched, The Sopranos, was full of fortysomethings. But it got me wondering what the ages are of characters on various fictional and dramatic productions. It would be fun to have a big pile of this information and then play with it in various ways.

Once you start with this, of course, there's all sorts of things you could go around tabulating. If anybody out there happens to do such an analysis (or knows of something out there already), please let me know.

Julien Emile-Geay writes about a postdoc opportunity in climate dynamics, applied statistics, or applied mathematics:

"Beyond the Hockey Stick: new approaches to paleoclimate reconstruction"


Including today's talk. See here.

In response to my offhand remark that the kernel densities in the article by Chen and Rodden are "tacky" and would be much better as histograms, commenter Anne asks:

What's wrong with a kernel density? Too opaque a connection with the data? I [Anne] have had some unpleasant surprises using histograms lately, so I've been trying to get a feel for the alternatives.

My reply: Here are my problems with kernel densities (in this example, and more generally):

1. Annoying artifacts, such as all-positive quantities whose kernel density estimates go into the negative zone. This can be fixed, but (a) it typically isn't, and (b) when there isn't an obvious bound, you still have the issue of the kernel density including places that it shouldn't.

2. It's hard to see where the data are. As I wrote in my blog linked above, I think it's better to just see the data directly. Especially for something like vote proportions that I can understand pretty well directly. For example, when I see the little peak at 3% in the density in Figure 2 of Chen and Rodden, or the falloff after 80%, I'd rather just see what's happening there rather than trying to guess by taking the density estimate and mentally un-convolving the kernel.

3. The other thing I like about a histogram is that it contains the seeds of its own destruction--that is, an internal estimate of uncertainty, based on variation in the heights of the histogram bars. See here for more discussion of this point, in particular the idea that the goal of a histogram or density estimate is not to most accurately estimate the true superpopulation density (whatever that means in this example) but rather to get an understanding of what the data are telling you.
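As a small illustration of point 1 (using made-up exponential data, not the Chen and Rodden vote shares), a Gaussian kernel density estimate of an all-positive quantity will cheerfully assign positive density to negative values, which a histogram by construction cannot do:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up all-positive data: a default Gaussian KDE leaks density
# below zero, while histogram bins start at the observed minimum.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=500)  # every value is positive
kde = gaussian_kde(x)
print(kde(-0.2)[0] > 0)  # the KDE puts positive density at -0.2
counts, edges = np.histogram(x, bins=30)
print(edges[0] > 0)      # histogram bins begin at the data minimum
```

This is the fixable-but-rarely-fixed artifact: boundary-corrected or log-scale kernel estimates exist, but the default is what usually gets published.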

Defining ethnicity down


One theme that comes up a lot when we discuss race and politics in the United States is the way that the concept "race" itself changes over time. For example, nowadays you hear a lot about white voters, but fifty years ago, the central category was occupied by white Protestants. White Catholics were considered a separate category, not black but not fully mainstream white, sort of like Hispanics and Asians today. (This is not to say that today's commentators treat whites as a monolithic bloc; there's still lots of talk about the Catholic vote. But I think that "white" is perceived as having more meaning as a national political category today than it had in the mid-twentieth century.)

Another example is the perception of Asian Americans. I was thinking about this topic recently after seeing this offhand comment in a blog by Tom Maguire:

The Japanese are neither brown-skinned nor Muslim nor poor . . .

I'm pretty sure he's right about there being very few Muslim or poor people in Japan. (Not zero in any case but a small fraction of the total population of the country. Apparently Japan has a high rate of relative poverty, though.)

But I always thought the Japanese did have brown skin. I guess people used to say "yellow," but it always seemed more like "tan" to me. I wonder if this is some sort of modern redefinition: if Japanese are honorary whites, then their skin gets lightened too? I'm not suggesting any malign intent here on the part of Maguire, just wondering if there's some relabeling going on implicitly.

Sort of vaguely related to the idea that, until very recently, in the U.S. when people said "Asian" they meant east Asian and not, for example, south Asian. (Consider, for example, HABAW. I think she would've been referred to as HIBAW had she been south-Asian-looking. Especially considering that she had a British accent.)

P.S. That first paragraph above was pretty much of a mess. Probably because it's based on my speculations and not backed by any hard facts.

P.P.S. I can't quite bring myself to post this on 538; it doesn't quite seem of general interest. Scrolling through Maguire's blog, I noticed that he and his commenters don't have a very high impression of Nate, so I don't know what they'd think of my comments here.

P.P.P.S. I encountered Maguire's blog through a typically circuitous internet path, starting with a search of my own blog to find this graph that had been posted by Greg Mankiw, then going to the source, then to this update, then to the main page of that blog, where I scrolled through a few pages of Youtube links until I found this link, which caught my eye.

Hal Daume writes some reasonable things here, mocking some silly rules that have been proposed for evaluating clustering procedures.

What's interesting to me is that such a wrongheaded idea (not Hal's, but the stuff that he's criticizing) could be taken so seriously in the first place.

Perhaps it's a problem with mathematics, that it takes people to a level of abstraction where they forget their original goals. I've seen this a lot in statistics, for example when people devise extremely elaborate procedures to calculate p-values that don't correspond to any actual data collection procedure. (Here I'm thinking of calculations of the distributions of contingency tables with fixed margins.)

P.S. Scroll to the end of the comments to see that Hal's a better person than I am, in that he doesn't waste his time cleaning out the spam from his blog comments.

ENGREF, 19 avenue du Maine, 75014 Paris (Room 208)

9:30 - 10:15: Posterior predictive checking and generalized graphical models
(Andrew Gelman, Columbia University - Department of Statistics)

10:15 - 10:35: Break

10:35 - 11:20: Conditional predictive p-values and the double use of the data (Judith Rousseau, Paris-Dauphine - Ceremade)

11:20 - 12:00: Forum, Q&A: computing Bayes factors (Chantal Guihenneuc) - the truncated normal distribution and its treatment in WinBugs (Jean-Louis Foulley, INRA-Jouy)

12:00 - 14:00: Lunch break (on your own)

14:00 - 14:45: Structural equation models: free models vs. Bayesian networks (Christian Derquenne, EDF - R&D)

14:45 - 15:30: A Bayesian approach to structural models
(Séverine Demeyer, LNE)

15:30 - 15:45: Break

15:45 - 16:30: Teaching Bayesian inference to beginners: the analysis of experimental data (Bruno Lecoutre, CNRS - Université de Rouen)

16:30 - 17:00: General discussion of the next Applibugs meeting and suggestions for future talks

Blogging explosion!


If you've been paying attention to this blog over the weekend, you'll have learned that Rush Limbaugh accused Gallup of "upping the sample to black Americans to keep [Obama] up at 50%" in the polls. I remarked that, if you want to rig the polls and you're clever--and Gallup is nothing if not clever--you can do it without resorting to racial sampling.

Alan Abramowitz adds:

It is beyond weird that Rush Limbaugh is now accusing the Gallup Poll of deliberately over-counting Democrats because the truth is that the Gallup Poll has for the past several months consistently shown a smaller Democratic advantage in party identification than other national polls.

Connecting the dots


A key principle in applied statistics is that you should be able to connect between the raw data, your model, your methods, and your conclusions.

Unfortunately, this principle isn't often well understood. We've all seen it a zillion times: someone shows you a regression analysis with a counterintuitive result, but then when you ask to see where in the data this is happening, you're told: Don't worry, it's a regression, we controlled for everything. Or you'll see a regression or some other analysis backed up (if you could say that) by a couple of anecdotes. Again, though, you have to put full trust in the statistical analysis, because you can select an anecdote to support almost any point.

It is possible, however, to do better. In my own work, I try to link data to results in several ways: most obviously, with scatterplots showing data and fitted models (lots of examples in ARM), but also with graphical model checking. Your model's wrong, you know, and it can be a good idea to explore the ways it doesn't match the data used to fit it, and to explore the ways it doesn't jibe with other information you have.

Anyway, this was really all just by way of introducing a beautiful little example from Seth Masket on the topic of national unemployment rates and congressional elections. After Masket posted a graph showing zero correlation between unemployment rates and the President's party's losses in midterm elections, Ross Douthat responded skeptically in the New York Times:

In the last 50 years, there's only been one midterm election fought with unemployment above 8 percent, let alone 10. (That would be 1982, when Reagan's Republicans lost 22 House seats.) The sample size of relevant races is way too small to draw any useful generalizations, in other words, and it's better to fall back on common sense . . .

Masket responded:

I agree with you that the lack of historical cases with very high unemployment should give us some humility in predicting next year's election. . . . As it happens, the average midterm seat loss for the president's party over the past sixty years is 22 seats. So if we knew nothing else about next year's election, the Democrats losing 22 House seats would be a reasonable guess. The fact that the one case with unemployment over nine percent (1982) produced precisely the average number of seat losses suggests that unemployment really isn't a factor.

Very nice.

P.S. For another example of the power of combining models with simple numbers, and also on the topic of unemployment rates, see Greg Mankiw's useful discussion of the difficulties of evaluating interventions when n=1:


Updated graph is here.

Also this scary, scary picture. Here I'd prefer to go back a few years on the x-axis. The graph with the forecast pretty much had to start near 2009--that's when the with/without-recovery-plan lines come from. But the historical jobs graph would be much better going back ten years or longer. Sure, you want enough resolution so you can see the trend in the past year, but you also want enough context to have a sense of the fluctuations, so you can see how often it is that 5 million jobs disappear like this.

Harry Selker and Alastair Wood say yes.

P.S. The answer is no. The offending language is no longer in the bill (perhaps in response to Selker and Wood's article).

P.P.S. Somebody checked again, and the offending language is still there!

Some sort of update to ggplot2


Jeroen Ooms writes:

Here's a first version of a new web application for exploratory graphical analysis. It attempts to implement the layered graphics from the R package ggplot2 in a user-friendly way. This two-minute demo video demonstrates a quick how-to.

He asks for feedback, so if you have any, feel free to comment. I don't know ggplot2 but my impression is that I should really be using it. Maybe Yu-Sung and Daniel should consider using it for mrp.

Everybody's a critic


Christopher Nelson tries his hand at being a graphics curmudgeon.

Mark Blumenthal links to Rush Limbaugh accusing Gallup of "upping the sample to black Americans to keep [Obama] up at 50%" in the polls. (For the context, see the last paragraph of the transcript.)

Frank Newport of Gallup responds here. Newport denies it all, but he would, wouldn't he?

Seriously, though, it's hard to believe that Limbaugh really believes that Gallup is fudging the numbers. As a big-time radio host, he's gotta know all about marketing surveys, right? I'm just assuming he said that "upping the sample" bit as more of a joke or an off-the-wall speculation. It did raise two interesting questions in my mind, though:

1. The assumption behind Limbaugh's argument--as with many arguments about polls--is that the published poll results have an effect of their own, beyond the president's underlying popularity. For example, maybe some senator would vote for the health care bill if he read that Obama's approval rating was 51% but would vote no if he read that Obama only had 49% approval. This might very well be true--it makes sense--I just don't really know.

2. What if you were a pollster and really did want to cheat and overrepresent Democrats? How would you do it? Contra Limbaugh's suggestion, I don't think you'd oversample blacks. I'm assuming Gallup does telephone surveys, and it's not like there's a separate telephone directory for blacks. Also, as several commenters to Newport noted, the percentage of blacks among the survey respondents is easy enough to check. And, for that matter, many survey organizations (possibly including Gallup) do post-sampling weighting adjustments for race, anyway, in which case oversampling blacks won't do anything for you at all.

If you're doing a telephone poll and want to oversample Democrats, you can just call states and area codes where more Democrats live. Call New York, LA, Chicago, etc. You can even call people in Democratic-leaning white areas if you want to mix things up a bit. That'll do the trick. Bury it deep enough in the sampling algorithm and maybe nobody will notice!

P.S. I looked at Gallup's home page and was surprised not to see any link to a description of their sampling methods. Or maybe it's somewhere and I didn't see it.

P.P.S. Blumenthal sent me this helpful link.

Jimmy points me to this article, "Why most discovered true associations are inflated," by J. P. Ioannidis. As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors. I completely agree with Ioannidis's point, which he seems to be making more systematically than David Weakliem and I did in our recent article on the topic.

My only suggestion beyond what Ioannidis wrote has to do with potential solutions to the problem. His ideas include: "being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results."

These are all good ideas. Here are two more suggestions:

1. Retrospective power calculations. See page 312 of our article for the classical version or page 313 for the Bayesian version. I think these can be considered as implementations of Ioannidis's ideas of caution, adjustment, and correction.

2. Hierarchical modeling, which partially pools estimated effects and reduces Type M errors as well as handling many multiple comparisons issues. Fuller discussion here (or see here for the soon-to-go-viral video version).
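To see the inflation itself in action, here's a quick simulation (made-up numbers, not from Ioannidis's paper): when the true effect is small relative to the standard error, the estimates that happen to cross the significance threshold are, on average, severe overestimates.

```python
import random

# Made-up simulation of a Type M (magnitude) error: true effect 0.2,
# standard error 1.0. Conditioning on statistical significance
# (|estimate|/se > 1.96) selects wildly inflated estimates.
rng = random.Random(0)
true_effect, se = 0.2, 1.0
estimates = [true_effect + rng.gauss(0, se) for _ in range(100_000)]
significant = [est for est in estimates if abs(est) / se > 1.96]
exaggeration = sum(abs(e) for e in significant) / len(significant) / true_effect
print(exaggeration)  # roughly a tenfold overestimate, on average
```

Partial pooling attacks exactly this: shrinking noisy estimates toward each other pulls the lucky outliers back toward plausible values.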

P.S. Here's the first mention of Type M errors that I know of. The problem is important enough, though, that I suspect there are articles on the topic going back to the 1950s or earlier in the psychometric literature.

Postdoc opportunities working with Prof. Andrew Gelman in the Department of Statistics on problems related to hierarchical modeling and statistical computing, with projects including high-dimensional modeling, missing-data imputation, and parallel computing. Application areas include public opinion and voting, social networks, international development, dendrochronology, and models of cancer and drug abuse.

Applicants should have experience with Bayesian methods, a willingness to program, and an interest in learning. Applications will be considered as they arrive. The application, consisting of a cover letter, CV, and a selection of published or unpublished articles, should be emailed to Please also arrange for three letters of recommendation to be sent to the same email address.

This is an exciting place to work: our research group involves several faculty, postdocs, graduate students, and undergraduates working on a wide range of interesting applied problems. We also have strong links to the Earth Institute, the Center for Computational Learning Systems, and the Columbia Population Research Center, as well as to Statistics, Political Science, and other academic departments at Columbia. As a postdoc here, you will have an opportunity to work on collaborative projects on theory, application, computation, and graphics. You can talk to our current and former postdocs if you want to hear how great it is to work here.

Positions are usually for two years. Columbia University is an Equal Opportunity/Affirmative Action employer.

Also, if you're finishing up your Ph.D. in statistics, have interest in public health and international development, and would like to work with me, please contact me regarding the Earth Institute postdoc. Application deadline is 1 Dec, so time to get moving on this!

Seth writes:

Is this a fair statement, do you think?
Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.

It's part of an abstract for a talk I [Seth] will give at the ASA conference next July. Haven't submitted the abstract yet so can revise it or leave it out.

My reply: This seems reasonable to me.

You could clarify that the EDA literature is all about discovery of new relationships but with nothing about causality, while the identification literature is all about causality but nothing about the discovery of something new.

Nate, Daniel, and I have an op-ed in the Times today, about senators' positions and state-level opinion on health care. We write:

Lawmakers' support for or opposition to reform generally has less to do with the views of their constituents and more to do with the issue of presidential popularity. . . .

For instance, Senator Blanche Lincoln, a Democrat who has been a less-than-strong supporter of the present health care bill, recently told The Times, "I am responsible to the people of Arkansas, and that is where I will take my direction." But where does she look for her cue? Hers is a poor state whose voters support health care subsidies six percentage points more than the national average. On the other hand, Mr. Obama got just 40 percent of the vote there.

Likewise, in Louisiana, where the Annenberg surveys showed health care reform to be popular but where Mr. Obama is not, the Democrats are not assured of Mary Landrieu's vote. . . .

Here's our graph that makes this point:



Daniel Corsi writes:

I am a PhD student in epidemiology at McMaster University and I am interested in exploring how characteristics of communities are related to child health in developing countries.

I have been using multilevel models to relate physical characteristics of communities such as the number of schools, health clinics, sanitation facilities etc to child height for age and weight for age using observational/survey data.

I have several questions with regards to the group (community-level) level predictors in these models.

I was checking out the comments at my bloggingheads conversation with Eliezer Yudkowsky, and I noticed the following, from commenter bbbeard:

My sense is that there is a fundamental sickness at the heart of Bayesianism. Bayes' theorem is an uncontroversial proposition in both frequentist and Bayesian camps, since it can be formulated precisely in terms of event ensembles. However, the fundamental belief of the Bayesian interpretation, that all probabilities are subjective, is problematic -- for its lack of rigor. . . .

Andrew Roberts writes:

I teach political science at Northwestern. I have a book coming out with U of Chicago Press called "The Thinking Student's Guide to College" and I wanted to ask you a question about one part.

I have a section where I advocate a few "neglected majors". One of them is statistics. I wrote the following (see below) about statistics, but it seems a little dull to me. I'd be curious if you would add anything that would make the major seem more attractive. (FYI, the other neglected majors are linguistics, regional studies, and sociology).

To fully understand just about any phenomenon in the world, from atoms to people to countries, you need a grasp of statistics. Statistics teaches you how to measure quantities, collect data, and then draw inferences from that information. Though this might sound boring, these tasks are necessary to explain most of the forces affecting our lives, whether the workings of markets, the movement of public opinion, or the spread of disease. Not only does a statistics major give you the skills to answer these questions, it is also extremely marketable. There is hardly a firm which could not benefit from a trained statistician, and statisticians are just as desirable for public interest groups hoping to help the disadvantaged. And if you worry that you are not the math type, statistics is considerably less demanding than a pure math major and does more to help you understand the real world in all its complexities.

"Considerably less demanding than a pure math major," huh? OK, OK . . .

My main suggestion would be to be less apologetic. No need to say "Though this might sound boring"!

Perhaps some of you have specific suggestions for Andrew Roberts for his book?

In the spirit of Christian Robert, I'd like to link to my own adaptive Metropolis paper (with Cristian Pasarica):

A good choice of the proposal distribution is crucial for the rapid convergence of the Metropolis algorithm. In this paper, given a family of parametric Markovian kernels, we develop an adaptive algorithm for selecting the best kernel that maximizes the expected squared jumped distance, an objective function that characterizes the Markov chain. We demonstrate the effectiveness of our method in several examples.

The key idea is to use an importance-weighted calculation to home in on a jumping kernel that maximizes expected squared jumped distance (and thus minimizes first-order correlations). We have a bunch of examples to show how it works and to show how it outperforms the more traditional approach of tuning the acceptance rate:


Regarding the adaptivity issue, our tack is to recognize that the adaptation will be done in stages, along with convergence monitoring. We stop adapting once approximate convergence has been reached and consider the earlier iterations as burn-in. Given what is standard practice here anyway, I don't think we're really losing anything in efficiency by doing things this way.

Completely adaptive algorithms are cool too, but you can do a lot of useful adaptation in this semi-static way, adapting every 100 iterations or so and then stopping the adaptation when you've reached a stable point.
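Here's a toy sketch of the staged-adaptation idea, for a one-dimensional standard normal target. To be clear, this is not the importance-weighted method from the paper: it just tries a few candidate proposal scales in successive batches, keeps the one with the largest average squared jumped distance, and then stops adapting (treating the earlier batches as burn-in):

```python
import math
import random

def log_target(x):
    return -0.5 * x * x  # standard normal target, up to an additive constant

def run_batch(x, scale, n, rng):
    """Run n random-walk Metropolis steps; return the final state and the
    average squared jumped distance (rejected proposals count as zero jumps)."""
    sq_jump = 0.0
    for _ in range(n):
        prop = x + rng.gauss(0.0, scale)
        if rng.random() < math.exp(min(0.0, log_target(prop) - log_target(x))):
            sq_jump += (prop - x) ** 2
            x = prop
    return x, sq_jump / n

rng = random.Random(1)
x = 0.0
best_scale, best_esjd = None, -1.0
for scale in [0.5, 1.0, 2.4, 5.0]:  # candidate scales (2.4 is the classic 1-d choice)
    x, esjd = run_batch(x, scale, 100, rng)
    if esjd > best_esjd:
        best_scale, best_esjd = scale, esjd
# Adaptation done: discard the batches above as burn-in and keep sampling
# with the chosen fixed scale.
x, _ = run_batch(x, best_scale, 1000, rng)
```

Because the proposal scale is frozen after the adaptation stage, the final run is an ordinary (non-adaptive) Metropolis chain, which sidesteps the usual worries about adaptive algorithms breaking stationarity.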

The article will appear in Statistica Sinica.

In response to my note on the limited ideological constraints faced by legislators running for reelection, Alan Abramowitz writes:

I [Abramowitz] agree--although they probably have less leeway now than in the past due to growing pressure toward ideological conformity within parties, especially GOP. But one thing that struck me as very interesting in your graph is that it looks like the advantage of a moderate voting record is considerably smaller now than it used to be, down from over 4 percentage points in the 1980s to maybe 1.5 points on average now. It suggests to me that the electorate has become increasingly partisan and that fewer voters are going to defect to an incumbent from the opposing party regardless of voting record. This could reflect more concern among voters with party control of Congress itself. Along these lines, one thing I've found in the NES data is a growing correlation between presidential job evaluations and voting for both House and Senate candidates over time.

My reply: Yes, that makes sense. The trend is suggestive although (as you can see from the error bars) not statistically significant. Recently I have not had my thoughts organized enough to write any articles on this stuff, but it feels good to at least post these fragments for others to chew on.

More on risk aversion etc etc etc


A correspondent writes:

You may be interested in this article by Matthew Rabin which makes the point that you make in your article: if you are an expected utility maximizer, then turning down small actuarially unfair bets (e.g., 50% win $120; 50% lose $100) implies that you would never accept a bet where you could lose $1000 (even if you might win an infinite amount of money). (But proved in more generality.)

This was taught to me in the first year of my econ phd program (which I'm currently in!) as why you probably don't want to extrapolate from decisions over small bets to risk aversion in general, not as why we should throw out risk aversion and expected utility maximization completely. Of course, decision theorists do all kinds of things to try to "fix" this problem.

My reply: Yitzhak (as we called him in high school) wrote his paper after mine had appeared; unfortunately my article was in a statistics journal and he had not heard about it. (This was before I could publicize everything on the blog. And, even now, I think a few papers of mine manage to get out there without being noticed.)

I'm glad they teach this stuff in grad schools now--although, in a way, this still proves my point, in that the nonlinear-utility-function-for-money model is still considered such a standard that they feel the need to debunk it.

My correspondent replied: "I wouldn't call it a debunking....we still go on to use it as the workhorse model in everything we do...."

I think there are good and bad things about this "workhorse model":

The other day I commented on an article by Peter Bancel and Roger Nelson that reported evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I was pretty dismissive of the article; in fact elsewhere I gave my post the title, "Some ESP-bashing red meat for you ScienceBlogs readers out there."

Dr. Bancel was pointed to my blog and felt I wasn't giving the full story. I'll give his comments and then at the end add some thoughts of my own. Bancel wrote:

1. From what I read, 2012 is a big-budget, low-brains remake of Miracle Mile, while completely missing the point of the original. So sad.

2. Meryl Streep was totally wasted in Fantastic Mr. Fox. And I don't mean she was drunk--well, maybe she was, who knows?--but her talent went largely unused. Seems like a crime to have Meryl Streep in a movie and not make more use of what she can do.

On the other hand, everyone deserves to relax now and then. If Streep is going to be taking a break, there's no harm in her doing it in the context of a movie.

And, on the plus side, they didn't let Gilbert Gottfried or anyone who sounds like him get anywhere near the place.

Email update


Somebody in Nigeria wants to send me money. I can't give you the details, but let me say that if this does work out, I'll never have to worry about book royalties again. I feel a little guilty about dealing in blood diamonds, but if this deal works out, I can make it all right by donating a lot to charity.

I've been ranting lately about how I don't like the term "risk aversion," and I was thinking it might help to bring up this post from last year:

Jonathan Rodden and Jowei Chen sent me this article:

When one of the major parties in the United States wins a substantially larger share of the seats than its vote share would seem to warrant, the conventional explanation lies in manipulation of maps by the party that controls the redistricting process. Yet this paper uses a unique data set from Florida to demonstrate a common mechanism through which substantial partisan bias can emerge purely from residential patterns. When partisan preferences are spatially dependent and partisanship is highly correlated with population density, any districting scheme that generates relatively compact, contiguous districts will tend to produce bias against the urban party. In order to demonstrate this empirically, we apply automated districting algorithms driven solely by compactness and contiguity parameters, building winner-take-all districts out of the precinct-level results of the tied Florida presidential election of 2000. The simulation results demonstrate that with 50 percent of the votes statewide, the Republicans can expect to win around 59 percent of the seats without any "intentional" gerrymandering. This is because urban districts tend to be homogeneous and Democratic while suburban and rural districts tend to be moderately Republican. Thus in Florida and other states where Democrats are highly concentrated in cities, the seemingly apolitical practice of requiring compact, contiguous districts will produce systematic pro-Republican electoral bias.

My thoughts:

"Subject: Our marketing plan"


I had mixed feelings about this one.

My first reaction was disappointment that Ellis Weiner, the author of the great book, Decade of the Year, is reduced to writing one-page bits where he's flopping around like a seal on a beach trying to squeeze a few laughs out of . . . whatever. You get the idea.

My second reaction, after reading the piece--I never read this sort of thing but I was curious what Weiner's been up to lately--was how scarily accurate his description was of the book-marketing process, the way they basically want you, the author, to do all the promotion, and also the bit where the publicist says, "I sort of have my hands full, promoting twenty-three new releases this fall, but I'm really excited about working on your book..."

My third reaction was: hey, this story really wasn't very good at all. Basically, he was taking a so-so idea (I mean, really, how many readers can relate to complaints about book publicity?) and then not taking it far enough over the top. How did this end up getting published in the New Yorker, a place where I assume the best humor writers in America would kill to appear? Personal connections are part of the story, possibly. Beyond this, I suppose that magazine editors are the sort of people who would be particularly amused by a joke about the publishing industry. It's a real step down from the days of Veronica Geng, that's for sure.

P.S. I hate to be so negative, but I'm pretty sure that New Yorker writers don't spend their time reading statistics blogs. So I think I'm safe in writing this without worrying that I've hurt his feelings.

Matthew Yglesias remarks that, when staking out positions, congressmembers are not very strongly constrained by the ideologies of their constituents.

Wow, that was a lot of big words. What I meant to say was: Congressmembers and Senators can pretty much vote how they want on most issues, whatever their constituents happen to believe. Not always, of course, but a representative can take a much more liberal or conservative line than the voters in his or her district or state, and still do fine when election time comes.

Yglesias gives some examples from the U.S. Senate, and I just wanted to back him up by citing some research from the House of Representatives.

First, here's a graph (based on research with Jonathan Katz) showing that, when running for reelection, it helps for a congressmember to be a moderate--but not by much:


Being a moderate is worth about 2% of the vote in a congressional election: it ain't nuthin, but it certainly is not a paramount concern for most representatives.

Statistics for firefighters!


This is one I'd never thought about . . . Daniel Rubenson writes:

I'm an assistant professor in the Politics Department at Ryerson University in Toronto. I will be teaching an intro statistics course soon and I wanted to ask your advice about it. The course is taught to fire fighters in Ontario as part of a certificate program in public administration that they can take. The group is relatively small (15-20 students) and the course is delivered over an intensive 5 day period. It is not entirely clear yet whether we will have access to computers with any statistical software; the course is taught off campus at a training facility run by the Ontario Fire College.

Finding signal from noise


A reporter contacted me to ask my impression of this article by Peter Bancel and Roger Nelson, which reports evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I spent a few minutes looking at the article, and, well, it's about what you might expect. Very professionally done, close to zero connection between their data and whatever they actually think they're studying.

Masanao points me to this. Incidentally, Don Rubin was a psychology grad student for awhile.

In the discussion of the attention-grabbing "global cooling" chapter of the new Freakonomics book, some critics have asked how it is that free-market advocates such as Levitt and Dubner can recommend climate engineering (a "controlled injection of sulfur dioxide into the stratosphere"), which seems like the ultimate in big-government solutions. True, the Freakonomics recommendation comes from a private firm, but it's hard to imagine it would just be implemented in somebody's backyard--I think you'd have to assume that some major government involvement would be necessary.

So what gives?

Ben Highton writes:

One of my colleagues thinks he remembers an essay you wrote in response to the Cox/Katz argument about using "involuntary exits" from the House (due to death, etc.) as a means to get leverage on the incumbency advantage as distinct from strategic retirement in their gerrymandering book. Would you mind sending me a copy?

My reply:

It's in the rejoinder to my article with Zaiying Huang, "Estimating incumbency advantage and its variation, as an example of a before/after study" (with discussion), JASA (2008); see page 450. Steve Ansolabehere assisted me in discussing this point.

P.S. There was a question about how this relates to David Lee's work on estimating incumbency advantage using discontinuities in the vote. My short answer is that Lee's work is interesting, but he's not measuring the effect of politicians' incumbency status. He's measuring the effect of being in the incumbent party, which in a country without strong candidate effects (India, perhaps, according to Leigh Linden) can make sense but doesn't correspond to what we think of as incumbency effects in the United States. Identification strategies are all well and good, but you have to look carefully at what you're actually identifying!

David Afshartous writes:

Regarding why one should not control for post-treatment variables (p. 189, Data Analysis Using Regression and Multilevel/Hierarchical Models), the argument is very clear as shown in Figure 9.13, i.e., we would be comparing units that are not comparable, as can be seen by looking at the potential outcomes z^0 and z^1, which can never both be observed. How would you respond to someone who says, "Well, what about a cross-over experiment--wouldn't it be okay for that case?" I suppose one could reply that in a cross-over we do not have z^0 and z^1 in a strict sense, since we observe the effect of T=0 and T=1 on z at different times rather than the counterfactual for an identical time point, etc. Would you add anything further?

My reply: it could be OK; it depends on the context. One point that Rubin has made repeatedly over the past few decades is that inference depends on a model. With a clean, completely randomized design, you don't need much of a model to get inferences. A crossover design is more complicated. If you make some assumptions about how the treatment at time 1 affects the outcome after time 2, then you can go from there.

To put it another way, the full Bayesian analysis always conditions on all information. Whether this looks like "controlling" for an x-variable, in a regression sense, depends on the model that you're using.

Tyler Cowen links to a blog by Paul Kedrosky that asks why winning times in the Boston marathon have been more variable, in recent years, than winning times in New York. This particular question isn't so interesting--when I saw the title of the post, my first thought was "the weather," and, in fact, that and "the wind" are the most common responses of the blog commenters--but it reminded me of a more general question that we discussed the other day, which is how to think about Why questions.

Many years ago, Don Rubin convinced me that it's a lot easier to think about "effects of causes" than "causes of effects." For example, why did my cat die? Because she ran into the street, because a car was going too fast, because the driver wasn't paying attention, because a bird distracted the cat, because the rain stopped so the cat went outside, etc. When you look at it this way, the question of "why" is pretty meaningless.

Similarly, if you ask a question such as, What caused World War 1, the best sort of answers can take the form of potential-outcomes analyses. I don't think it makes sense to expect any sort of true causal answer here.

But, now let's get back to the "volatility of the Boston marathon" problem. Unlike the question of "why did my cat die" or "why did World War 1 start," the question, "Why have the winning times in the Boston marathon been so variable" does seem answerable.

What happens if we try to apply some statistical principles here?

Principle #1: Compared to what? We can't try to answer "why" without knowing what we are comparing to. This principle seems to work in the marathon-times example. The only way to talk about the Boston times as being unexpectedly variable is to know what "expectedly variable" is. Or, conversely, the New York times are unexpectedly stable compared to what was happening in Boston those same years. Either way, the principle holds that we are comparing to some model or another.

Principle #2: Look at effects of causes, rather than causes of effects. This principle seems to break down in marathon example, where it seems very natural to try to understand why an observed phenomenon is occurring.

What's going on? Perhaps we can understand in the context of another example, something that came up a couple years ago in some of my consulting work. The New York City Department of Health had a survey of rodent infestation, and they found that African Americans and Latinos were more likely than whites to have rodents in their apartments. This difference persisted (albeit at a lesser magnitude) after controlling for some individual and neighborhood-level predictors. Why does this gap remain? What other average differences are there among the dwellings of different ethnic groups?

OK, so now maybe we're getting somewhere. The question on deck now is, how do the "Boston vs. NY marathon" and "too many rodents" problems differ from the "dead cat" problem.

One difference is that we have data on lots of marathons and lots of rodents in apartments, but only one dead cat. But that doesn't quite work as a demarcation criterion (sorry, forgive me for working under the influence of Popper): even if there were only one running of each marathon, we could still quite reasonably answer questions such as, "Why was the winning time so much lower in NY than in Boston?" And, conversely, if we had lots of dead cats, we could start asking questions about attributable risks, but it still wouldn't quite make sense to ask why the cats are dying.

Another difference is that the marathon question and the roach question are comparisons (NY vs. Boston and blacks/hispanics vs. whites), while the dead cat stands alone (or swings alone, I guess I should say). Maybe this is closer to the demarcation we're looking for, the idea being that a "cause" (in this sense) is something that takes you away from some default model. In these examples, it's a model of zero differences between groups, but more generally it could be any model that gives predictions for data.

In this model-checking sense, the search for a cause is motivated by an itch--a disagreement with a default model--which has to be scratched and scratched until the discomfort goes away, by constructing a model that fits the data. Said model can then be interpreted causally in a Rubin-like, "effects of causes," forward-thinking way.

Is this the resolution I'm seeking? I'm not sure. But I need to figure this out, because I'm planning on basing my new intro stat course (and book) on the idea of statistics as comparisons.

P.S. I remain completely uninterested in questions such as: What is the cause? Is it A or is it B? (For example, what caused the differences in marathon-time variations in Boston and New York--is it the temperature, the precipitation, the wind, or something else? Of course, if it can be any of these factors, it can be all of them.) I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation to draw artificial distinctions, fundamentally no different from the notorious comparisons of statistical significance to non-significance.

This last point has nothing to do with causal inference and everything to do with my preference for continuous over discrete models in applications in which I've worked in social science, environmental science, and public health.

So go read the stuff on the main page now before it scrolls off your screens.

No, not me. It's somebody else. Story here.



Joshua Clover's fun new book, 1989, features a blurb from Luc Sante, author some years back of the instant classic Low Life. 1989 has some similarities to Low Life--both are about culture and politics--but Clover is much more explicit in making his connections, whereas Sante left most of his implications unsaid. I read Low Life when it came out, and I immediately felt: This-book-is-awesome-and-I-get-it-but-nobody-else-will. I think, actually, that just about everybody who read Low Life had that same reaction, which is what being a "cult classic" is all about.

1989 was a bit of a nostalgia-fest for me, at least in the chapter about rap. (I'm not really familiar enough with the other musical styles of that era, so the other chapters were harder for me to follow. I read Cauty and Drummond's "The Manual" (another cult classic, I think) several years ago but only had a vague sense of them and I've never thought of taking their music seriously as some sort of cultural indicator.)

I remember when I first heard Straight Outta Compton, how dense and echoing the sound was, an intensity comparable (in my view) to the movie Salvador.

Judea Pearl sends along this article and writes:

This research note was triggered by a discussant on your blog, who called my attention to Wooldridge's paper, in response to my provocative question: "Has anyone seen a proof that adjusting for an intermediary would introduce bias?"

It led to some interesting observations which I am now glad to share with your bloggers.

As Pearl writes, it is standard advice to adjust for all pre-treatment variables in an experiment or observational study--and I think everyone would agree on the point--but exactly how to "adjust" is not always clear. For example, you wouldn't want to just throw an instrumental variable in as another regression predictor. And this then leads to tricky questions of what to do with variables that are sort-of instruments and sort-of covariates. I don't really know what to do in such situations, and maybe Pearl and Wooldridge are pointing toward a useful way forward.

I'm curious what Rosenbaum and Rubin (both of whom are cited in Pearl's article) have to say about this paper. And, of course, WWJD.

6 cents a word


Helen DeWitt links to a blog by John Scalzi pointing out that today's science fiction magazine writers get the same rate--6 cents per word--as F. Scott Fitzgerald did for his short stories in 1920. After correcting for inflation, this means Fitzgerald was paid 20 times as much.

Scalzi writes that this "basically sucks." But I'd frame this somewhat differently. After all, this is F. Scott Fitzgerald we're talking about. I'd guess he really was worth at least 20 times as much per word as the authors of articles in Fantasy & Science Fiction and the like.

P.S. As a blogger, my word rate is 0 cents, of course.

1. Understanding the 'Russian Mortality Paradox' in Central Asia: Evidence from Kyrgyzstan

Short answer: alcohol and suicide.

2. Lumberjacks as a counterexample to the idea of a "risk premium"

They take lots of risks and don't get paid well for it.

3. Cell size and scale

This is a visualization you won't want to miss.

4. Three guys named Matt

5. The political philosophy of the private eye

A genre that was rendered obsolete in 1961 (but nobody realizes it).

The two blogs


Tyler Cowen writes:

Andrew Gelman will have a second blog. I don't yet understand the forthcoming principle of individuation across the two blogs.

I don't like the term "risk aversion" (see here and here). For a long time I've been meaning to write something longer and more systematic on the topic, but every once in awhile I see something that reminds me of the slipperiness of the topic.

For example, Alex Tabarrok asks, "Why are Americans more risk averse about medicine than Europeans?" It's a good question, and it's something I've wondered about myself. But I don't know what he's talking about when he says that "the stereotype is that Americans are more risk-loving" than Europeans. Huh? Americans are notorious for worrying about risks, with car seats, bike helmets, high railings on any possible place where someone could fall, Purell bottles everywhere, etc etc. The commenters on Alex's blog are all talking about drug company regulations, but it seems like a broader cultural thing to me.

But I'm bothered by the term "risk aversion." Why exactly is it appropriate to refer to strict rules on drug approvals as "risk averse"? In a general English-language use of the words, I understand it, but it gets slippery when you try to express it more formally.

Asa writes:

I took your class on multilevel models last year and have since found myself applying them in several different contexts. I am about to start a new project with a dataset in the tens of millions of observations. In my experience, multilevel modeling has been most important when the number of observations in at least one subgroup of interest is small. Getting started on this project, I have two questions:

1) Do multilevel models still have the potential to add much accuracy to predictions when n is very large in all subgroups of interest?

2) Do you find SAS, Stata, or R to be more efficient at handling multilevel/"mixed effects" models with such a large dataset (won't be needing any logit/poisson/glm models)?

My reply:

Regarding software, I'm not sure, but my guess is that Stata might be best with large datasets. Stata also has an active user community that can help with such questions.

For your first question: if n is large in all subgroups, multilevel modeling is typically not needed--you can simply fit a separate model in each group, which is equivalent to a full-interaction model. At that point, though, you might become interested in details within subgroups, and then you might want a multilevel model after all.

Asa then wrote:

Yes, a "full interaction" model was the alternative I was thinking of. And yes, I can imagine the results from that model raising further questions about what's going on within groups as well.

My previous guess was that SAS would be the most efficient for multilevel modeling with big data. But I just completely wrecked my (albeit early 2000's era) laptop looping proc mixed a bunch of times with a much smaller dataset.

I don't really know on the SAS vs. Stata issue. In general, I have warmer feelings toward Stata than SAS, but, on any particular problem, who knows? I'm pretty sure that R would choke on any of these problems.

On the other hand, if you end up breaking the problem into smaller pieces anyway, maybe the slowness of R wouldn't be so much of a problem. R does have the advantage of flexibility.
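To illustrate the point about large groups with a quick sketch of my own (made-up numbers, and not tied to any particular SAS/Stata/R fitting routine): under the usual normal-normal model, the partial-pooling estimate shrinks each group's mean toward the grand mean, and the amount of shrinkage goes to zero as the group's sample size grows.

```python
# Why multilevel models matter mainly for small groups: the partial-pooling
# (posterior mean) estimate is a precision-weighted average of the group mean
# and the grand mean, and the data swamp the prior as n_j grows.
# Hypothetical variances; a normal-normal sketch, not a real software fit.

sigma2 = 1.0   # within-group variance (assumed known for the sketch)
tau2 = 0.25    # between-group variance

def partial_pool(ybar_j, n_j, mu=0.0):
    """Posterior-mean estimate for group j under the normal-normal model."""
    precision_data = n_j / sigma2
    precision_prior = 1.0 / tau2
    return (precision_data * ybar_j + precision_prior * mu) / (precision_data + precision_prior)

for n_j in [2, 20, 20000]:
    est = partial_pool(ybar_j=1.0, n_j=n_j)
    print(f"n={n_j:6d}: no-pooling estimate 1.00 -> partial pooling {est:.3f}")
```

With n = 2 the estimate is pulled most of the way toward the grand mean; with n = 20,000 the pooled and unpooled estimates are essentially identical--which is the sense in which multilevel modeling is "typically not needed" when every subgroup is huge.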

Aleks sends along this amusing news article by Jennifer Levitz:

A new study found that rates of marriage outside the faith were sharply curbed among young Jews who have taken "birthright" trips to Israel . . . Over the past decade, Taglit-Birthright Israel, a U.S. nonprofit founded by Jewish businessmen, has sponsored nearly 225,000 young Jewish adults for free 10-day educational tours of Israel as a way to foster Jewish identity. . . .

A study [by Brandeis University researcher Leonard Saxe and partly funded by Taglit-Birthright] showed that 72% of those who went on the trip married within the faith, compared with 46% of people who applied for the trip but weren't selected in a lottery. . . . The Brandeis study looked at 1,500 non-Orthodox Jewish adults who took Taglit trips or applied for one between 2001 and 2004. . . .

The article also said that 10,000 people participated in these trips last summer, which suggests that the 1,500 people in the research study represent a very small fraction of the participants from 2001-2004. I have no idea if this is a random sample, or what. Also I wonder about the people who participated in the lottery, were selected, but didn't go on the trip. Excluding these people (if there are many of them) could bias the results. The news article unfortunately doesn't link to any research report.
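The worry about the excluded participants is a standard selection-bias story, and a made-up simulation shows how it can play out: suppose the trip itself does nothing, but among lottery winners only the more religiously engaged actually go. Comparing trip-takers to lottery losers then shows a spurious gap, while the intent-to-treat comparison (all winners vs. losers) shows none. (All numbers here are invented for illustration; this is not the Brandeis design.)

```python
import random
random.seed(1)

# Zero true effect of the trip; a latent "engagement" variable drives both
# taking the trip (among lottery winners) and marrying within the faith.
N = 100_000
rows = []
for _ in range(N):
    engagement = random.random()                    # latent religious engagement
    won = random.random() < 0.5                     # lottery
    went = won and (random.random() < engagement)   # winners go only if engaged
    in_marry = random.random() < 0.3 + 0.4 * engagement  # trip does nothing
    rows.append((won, went, in_marry))

goers = [m for w, g, m in rows if g]
losers = [m for w, g, m in rows if not w]
winners = [m for w, g, m in rows if w]   # intent-to-treat comparison

rate = lambda xs: sum(xs) / len(xs)
print(f"goers vs losers:   {rate(goers):.3f} vs {rate(losers):.3f}")   # biased gap
print(f"winners vs losers: {rate(winners):.3f} vs {rate(losers):.3f}") # ~no gap
```

The goers-vs-losers comparison manufactures a gap out of nothing, because going selects on engagement; the winners-vs-losers comparison, which preserves the randomization, does not. Whether anything like this is happening in the Brandeis study depends on how many selected applicants skipped the trip and how they were handled.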

Philip Stark sent along this set of calculations on the probability that the hidden message in Gov. Schwarzenegger's letter could've occurred by chance. The message, if you haven't heard, is:

Med School Interview Questions


The questions are no big deal, but what I find interesting is that medical schools do personal interviews at all. No place where I've ever worked has interviewed grad school applicants. It's hard for me to see what you get from an interview that would be worth the cost. I guess there must be quite a bit of psychology literature on this question.

Constructing informative priors


Christiaan de Leeuw writes:

I write to you with a question about the construction of informative priors in Bayesian analysis. Since most Bayesians at the statistics department here are more of the 'Objective' Bayes persuasion, I wanted some outside opinions as well.

Jay Kaufman writes:

I received the following email:

Hello, my name is Lauren Schmidt, and I recently graduated from the Brain & Cognitive Sciences graduate program at MIT, where I spent a lot of time doing online research using human subjects. I also spent a lot of time being frustrated with the limitations of various existing online research tools. So now I am co-founding a start-up, HeadLamp Research, with the goal of making online experimental design and data collection as fast, easy, powerful, and painless as can be. But we need your help to come up with an online research tool that is as useful as possible!

We have a short survey (5-10 min) on your research practices and needs, and we would really appreciate your input if you are interested in online data collection.

I imagine they're planning to make money off this start-up and so I think it would be only fair if they pay their survey participants. Perhaps they can give them a share of the profits, if any exist?

Guilherme Rocha writes:

About this Archive

This page is an archive of entries from November 2009 listed from newest to oldest.
