April 2005 Archives

The other day in our research group we discussed a recent paper by Delia Grigg and Jonathan Katz (Social Sciences, Caltech) on majority-minority districts and Congressional elections. Jeronimo presented the paper, and David Epstein and I discussed it. This was a lively discussion, partly because Jonathan's conclusions disagreed with the findings of David's work on majority-minority redistricting (for example, this paper with Cameron and O'Halloran). In fact, scanning David's online C.V., he appears to have a paper with Sharyn O'Halloran from 2000 entitled, "The Impact of Majority-Minority Districts on Congressional Elections," which is the exact same title as Grigg and Katz's paper!

The Grigg and Katz paper had two main conclusions: first, that majority-minority districts (MMDs) increase minority representation, and second, that there is no evidence that MMDs help the Republicans. According to David, the first claim is in agreement with what all others have found, so we focused on the second claim, which would seem to contradict David's earlier study that found MMDs helping the Republicans.

This is (or has been) a big deal in redistricting in the U.S.: is it appropriate to carve out districts with mostly minority population, in order to increase the representation of ethnic minorities in the legislature? Will such redistricting, paradoxically, help the Republicans (a party supported by only a small proportion of ethnic minority voters in the U.S.)?

I don't have any recent data easily at hand, but here's some representation data from 1989 (reproduced in this 2002 paper in Chance):

Group        Proportion of      Proportion of seats in
             U.S. population    House of Representatives
Catholic     28%                27%
Methodist     4%                14%
Jewish        2%                 7%
Black        12%                 9%
Female       51%                 6%
Under 25     37%                 0

Major comments

On to Grigg and Katz . . . their paper has some data on individual districts but focuses on analyses with state-years as units, comparing states with no majority-minority districts to states that have at least one. Our main comment was that these comparisons will be difficult, for two reasons:

1. States with majority-minority districts are very different from states without them. For one thing, states without MMDs are much smaller (this can be seen in Figure 3 of Grigg and Katz; the discreteness of the "no MMD" proportions implies that these are states with few congressional districts).

If we are imagining MMD's to be a "treatment" and are interested in the effect of MMD's, then we want to compare comparable states that do and don't have MMD's. Keeping all 50 states in the analysis might not make sense, since many states just don't have a chance of getting MMD's.

2. We wondered whether it would be helpful to look at the number of MMDs in a state. We could imagine that a state would necessarily have 1 or 2 MMDs, just from geography, but that redistricters would then have the option to increase that to 3 or 4. In this case, we'd want to compare numbers, not just 0 vs. 1-or-more.

Other comments

Grigg and Katz used a parametric-form seats-votes curve (from King and Browning, 1987) to estimate partisan bias and electoral responsiveness in groups of state elections. I suspect they'd get much more precise and informative results using the newer JudgeIt approach (see here for a description and derivation).
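As I remember it, the King-Browning family is essentially a bilogit curve relating average district vote share to seat share; here is a sketch in Python (my paraphrase of the functional form, so check their 1987 paper for the exact parameterization):

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

def seats_votes(v, bias=0.0, rho=2.0):
    """Bilogit seats-votes curve: logit(seat share) = bias + rho * logit(vote share).

    rho is electoral responsiveness (rho = 1 is proportional representation,
    rho near 3 is the classical 'cube law'); bias shifts the curve so that one
    party wins more than half the seats with half the votes.
    """
    return inv_logit(bias + rho * logit(v))

# Partisan bias is commonly summarized as the expected seat share, minus 1/2,
# when the vote is split 50/50:
print(seats_votes(0.5, bias=0.0))  # symmetric curve: 0.5
print(seats_votes(0.5, bias=0.3))  # biased curve: more than half the seats
```

With estimates of bias and rho in hand for each group of state elections, one can then ask whether states with and without MMDs differ in bias, which is how I understand their comparison to be set up.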

To confirm things, I'd suggest that Grigg and Katz fit their models using the Republican share of seats in the state as the outcome. This is cruder than partisan bias but might show some general patterns, and it's less subject to criticisms of their parametric seats-votes model.

I liked how they presented their data and results using graphs. But we had a couple of questions. First, what are those "no MMD" points on the far right of Figure 14? We were wondering which was the state that was 35% minority, with minority Congressional seat shares around 40%, but no majority-minority districts. We were also confused about the tables on page 27 because we couldn't get the numbers to add up.

In summary . . .

The Grigg and Katz paper is an innovative look at majority-minority districting, following an approach of looking at the whole state rather than one district at a time. This is an approach Gary King and I have liked in studying redistricting in other contexts. However, I am not sure what to make of Grigg and Katz's substantive conclusions, because I don't know that their comparisons of states are appropriate for this observational study, and I worry about their measure of partisan bias. I hope these comments are helpful in their revision of this paper, and I thank Jonathan for sharing the paper with us.

P.S. Some people find political redistricting, or race-based redistricting, distasteful. By evaluating these programs, we are not making any moral judgment one way or another. Rather, we're trying to answer some empirical questions that could be relevant for considering such plans in the political process.

A question on graphics


Boris writes,

Fully Bayesian analyses of hierarchical linear models have been considered for at least forty years. A persistent challenge has been choosing a prior distribution for the hierarchical variance parameters. Proposed models include uniform distributions (on various scales), inverse-gamma distributions, and other families. We have recently made some progress in this area (see this paper, to appear in the online journal Bayesian Analysis).


The inverse-gamma has been popular recently (partly because it was used in the examples in the manual for the Bugs software package), but it has some unattractive properties--most notably, in the limit as the hyperparameters approach zero, the posterior distribution does not approach any reasonable limit. This casts suspicion on the standard use of prior densities such as inverse-gamma(0.001, 0.001).


We have a new folded-noncentral-t family of prior distributions that are conditionally conjugate for the hierarchical normal model. The trick is to use a redundant multiplicative parameterization for the hierarchical standard deviation parameter--writing it as a product of two random variables, one with a normal prior distribution and the other with a square-root-inverse-gamma model. The product is a folded-noncentral-t.

A special case of this model is the half-Cauchy (that is, the positive part of a Cauchy distribution, centered at 0). We tried it out on a standard example--the 8-schools problem from Chapter 5 of Bayesian Data Analysis--and it works well, even for the more challenging 3-schools problem, where usual noninformative prior distributions don't work so well.
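To illustrate the construction (just a sketch of the special case; see the paper for the general folded-noncentral-t), here is a simulation checking that the product of an absolute normal and a square-root-inverse-gamma random variable reproduces the half-Cauchy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = 5.0        # scale of the implied half-Cauchy prior (arbitrary choice)
n = 200_000

# Redundant multiplicative parameterization: sigma = |xi| * eta, where
# xi is normal and eta^2 is inverse-gamma.  With xi ~ N(0, 1) and
# eta^2 ~ Inv-gamma(1/2, A^2/2), the product |xi| * eta is half-Cauchy
# with scale A.
xi = rng.normal(0.0, 1.0, n)
eta = np.sqrt((A**2 / 2) / rng.gamma(0.5, 1.0, n))  # Inv-gamma(1/2, A^2/2) draws
sigma = np.abs(xi) * eta

# Direct half-Cauchy draws for comparison: A * |Cauchy(0, 1)|
direct = A * np.abs(rng.standard_cauchy(n))

# The half-Cauchy with scale A has median A, so both medians should be near A.
print(np.median(sigma), np.median(direct))
```

The practical payoff of the multiplicative form is that both factors have conditionally conjugate updates in a Gibbs sampler for the hierarchical normal model.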

Hierarchy of hierarchies

The half-Cauchy and its related distributions need some hyperparameters to be specified. The next step is to estimate these from data, by setting up a hyper-hyper-prior distribution on the multiple variance parameters that will exist in any hierarchical model. Actually, it's sort of cool--I think it's the next logical step to take in Bayesian Anova. We have an example in Section 6 of this paper.

What comes next?

The next step is to come up with reasonable models for deep interactions (see here for some of our flailing in this area). Currently, the most challenging problems in multilevel models arise with sparse data with many possible levels of variance--and these are the settings where hierarchical hyper-hyper modeling of variance parameters should be possible. I think we're on the right track, at least for the sorts of social-science problems we work on.

The other challenge is multivariate hierarchical modeling, for example the covariance structures that arise in varying-intercept, varying-slope models. Here I think the so-called Sam method has promise, but we're still thinking about this.

A fix to the mvrnorm function


Recently, when our Bayesian Data Analysis class was doing Gibbs sampling in R, some students noticed that they were getting missing values when sampling from a multivariate normal distribution using the function mvrnorm (in the package MASS). The function seems to fail in some cases. This is due to some weirdness in the basic R function eigen, which computes eigenvalues, so mvrnorm itself is not to blame. But since mvrnorm is probably the function you use more often, I wrote a simple patched version of mvrnorm that should fix this problem. It will also alert the user if the patch fails.

Details and more are here.
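I don't know the internals of Jouni's patch, but the generic fix is to guard the eigendecomposition against the tiny negative eigenvalues that rounding error produces for nearly singular covariance matrices. Here is a sketch of the idea in Python (mvrnorm itself is an R function in MASS; the function below is just an illustration):

```python
import numpy as np

def mvrnorm_patched(n, mu, Sigma, tol=1e-8, rng=None):
    """Draw n samples from N(mu, Sigma) via eigendecomposition, clipping
    tiny negative eigenvalues that arise from floating-point rounding."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    vals, vecs = np.linalg.eigh(Sigma)   # symmetric eigendecomposition
    if np.any(vals < -tol * abs(vals.max())):
        raise ValueError("Sigma is not positive semi-definite")
    vals = np.clip(vals, 0.0, None)      # the 'patch': zero out tiny negatives
    z = rng.standard_normal((n, len(mu)))
    return mu + (z * np.sqrt(vals)) @ vecs.T

# A singular covariance matrix whose computed eigenvalues can dip slightly
# below zero, producing NaNs in a naive square-root step:
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])
x = mvrnorm_patched(10_000, [0.0, 0.0], Sigma, rng=np.random.default_rng(1))
print(np.cov(x.T))  # close to Sigma, with no NaNs
```

Without the clipping step, an eigenvalue of exactly zero that comes back as, say, -1e-17 turns into NaN under the square root, which matches the symptom the students saw.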

Have you done any MCMC sampling in R? What programs do you use to do iterative sampling? How do you summarize the results? Or do you just trust BUGS to do everything?

Sometimes writing your own sampler may be inevitable. In our research there have been cases where BUGS just gets stuck. I'm writing an R program that should make writing samplers a semi-automatic task. I'm also finishing the beta version of a random-variable object class in R. Manipulating such objects instead of raw simulations should make Bayesian programming much more intuitive.
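For readers who have never written a sampler by hand, here is about the smallest useful example: a Gibbs sampler for a bivariate normal, alternating draws from the two full conditionals. (This is a generic illustration in Python, not the semi-automatic R program mentioned above.)

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, rng=None):
    """Gibbs sampler for (x, y) ~ bivariate normal with means 0, variances 1,
    correlation rho.  Each full conditional is x | y ~ N(rho * y, 1 - rho^2)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.sqrt(1 - rho**2)
    x, y = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, s)  # draw x given the current y
        y = rng.normal(rho * x, s)  # draw y given the new x
        draws[t] = x, y
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_iter=20_000,
                               rng=np.random.default_rng(2))
burned = draws[1000:]               # discard burn-in
print(np.corrcoef(burned.T)[0, 1])  # should be near 0.8
```

The same alternating structure carries over to real hierarchical models; the work is in deriving each full conditional.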

If you have any comments about your experiences about writing your own samplers and summarizing simulations, please leave a comment! Your comments will be helpful in developing the programs.

Carrie noticed an article in the Carlat Report describing some methods used in sponsored research to induce bias in drug trials:

1. Make sure your drug has a dosage advantage. This way, you can present your findings as a “head-to-head” trial without worrying that your drug will be outperformed. Thus, a recent article on Cymbalta concluded that “in three comparisons, the mean improvement for duloxetine was significantly greater than paroxetine or fluoxetine.” (Depression and Anxiety 2003, 18; 53-61). Not a surprising outcome, considering that Cymbalta was ramped up to a robust 120 mg QD, while both Prozac and Paxil were kept at a meek 20 mg QD.

2. Dose their drug to cause side effects. . . . The original Lexapro marketing relied heavily on a study comparing Lexapro 10 mg and 20 mg QD with Celexa 40 mg QD—yes, patients in the Celexa arm were started on 40 mg QD (J Clin Psychiatry 2002; 63:331-336). The inevitably higher rate of discontinuation with high-dose Celexa armed Forest reps with the spin that Lexapro is the best tolerated of the SSRIs. . . .

3. Pick and choose your outcomes. If the results of the study don’t quite match your high hopes for the drug, start digging around in the data, and chances are you’ll find something to make you smile! Neurontin (gabapentin) is a case in point. . . .

4. Practice "creative writing" in the abstract.

Carlat also cites a study from the British Medical Journal finding that "Studies sponsored by pharmaceutical firms were four times more likely to show results favoring the drug being tested than studies funded by other sources."

I don't know enough about medical trials to have a sense of how big a problem this is (or, for that matter, how to compare the negatives of biased research to the positives associated with research sponsorship), but at the very least it would seem to be a great example for that "how to lie with statistics" lecture in an intro statistics class.

One thing that interests me about the methods Carlat describes is that only item 3 ("Pick and choose your outcomes") and possibly item 4 ("Practice creative writing") fit into the usual "how to lie with statistics" framework. Items 1 and 2, which involve rigging the design, are new to me. So maybe this would be a good article for an experimental design class.

For more examples and discussion, see the article by Daniel Safer in Journal of Nervous and Mental Disease 190, 583-592 (2002), cited by Carlat.

Too busy


Here's another Applied Micro talk I won't be able to see. But it looks interesting . . .

We had an interesting discussion on the blog entry last week about Bayesian statistics, where we wrote,

Boris presented the TSCS paper at Midwest and was accused by Neal Beck of not being a real Bayesian. Beck was making the claim that "we're not Bayesians" because we're using uninformative priors. He seems to be under the assumption that Bayesians only use informative priors.

Neal reports that he had more to say and graciously emailed me a longer version of his comment about Bayesian methods. Neal has some interesting things to say. I'll present his comments, then my reactions.

I'm planning to teach a new course on statistical graphics next spring.


Graphical methods play several key roles in statistics:

- "Exploratory data analysis": finding patterns in raw data. This can be a challenge, especially with complex data sets.
- Understanding and making sense of a set of statistical analyses (that is, finding patterns in "cooked data")
- Clear presentation of results to others (and oneself!)

Compared to other areas of statistics, graphical methods require new ways of thinking and also new tools.

The borders of "statistical graphics" are not precisely defined. Neighboring fields include statistical computing, statistical communication, and multivariate analysis. Neighboring fields outside statistics include computer programming and graphics, visual perception, data mining, and graphical presentation.

Structure of the course:

Class meetings will include demonstrations, discussions of readings, and lectures. Depending on their individual interests, different students will have to master different in-depth topics. All students will learn to make clear and informative graphs for data exploration, substantive research, and presentation to self and others.

Students will work in pairs on final projects. A final project can be a new graphical analysis of a research topic of interest, an innovative graphical presentation of important data or data summaries, an experiment investigating the effectiveness of some graphical method, or a computer program implementing a useful graphical method. Each final project should take the form of a publishable article.

The primary textbook will be R Graphics, by Paul Murrell (to be published Summer, 2005).

See below for more information on the course; also see here for a related course by Bill Cleveland (inventor of lowess, among other things). Any further suggestions would be appreciated.

Against parsimony, again


The comments to a recent entry on "what is a Bayesian" moved toward a discussion of parsimony in modeling (also noted here). I'd like to comment on something that Dan Navarro wrote. First I'll repeat Dan's comments, then give my reactions.

Loss aversion etc


If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization. The required utility function for money would curve so sharply as to be nonsensical (for example, U($2000)-U($1000) would have to be less than U($1000)-U($950)). This result is shown in a specific case as a classroom demonstration in Section 5 of a paper of mine in the American Statistician in 1998 and, more generally, as a mathematical theorem in a paper by my old economics classmate Matthew Rabin in Econometrica in 2000.
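The arithmetic behind that parenthetical claim is easy to check: indifference at every x means u(x+10) = 0.55 u(x+20) + 0.45 u(x), so each successive $10 increment of utility must shrink by a factor of 0.45/0.55. Summing the resulting geometric series gives the nonsensical comparison:

```python
# Indifference between [x + $10] and [55% chance of x + $20, 45% chance of x]
# for all x implies u(x+10) - u(x) shrinks by r = 0.45/0.55 with every $10.
r = 0.45 / 0.55

def utility_gain(lo, hi, step=10):
    """Utility of moving from $lo to $hi, in units of the very first $10
    increment, under the geometric decay implied by the indifference."""
    return sum(r ** (d // step) for d in range(lo, hi, step))

# U($2000) - U($1000) versus U($1000) - U($950):
big_gain = utility_gain(1000, 2000)
small_gain = utility_gain(950, 1000)
print(big_gain, small_gain)
print(big_gain < small_gain)  # the nonsensical implication: True
```

By $1000 of wealth the marginal utility of money has decayed by a factor of roughly (9/11)^100, which is why gaining a further $1000 is worth less than the last $50 below $1000.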

I was thinking about this stuff recently because of a discussion I had with Deb Frisch on her blog. I like Matt's 2000 paper a lot, but Deb seems to be really irritated by it. Her main source of irritation seems to be that Matt writes, "The theorem is entirely 'nonparametric,' assuming nothing about the utility function except concavity." But actually he makes fairly strong assumptions about preferences (basically, a more general version of my [x, x+$10, x+$20] gamble above), and under expected utility, these have strong implications for the utility function.

Matt's key assumption could be called "translation invariance"--the point is that the small-stakes risk aversion holds at a wide range of wealth levels. That's the key assumption--the exact functional form isn't the issue. Deb compares it to a power-law utility function, but expected-utility preferences under this power law would not show substantial small-scale risk aversion across a wide range of initial wealth levels.

Deb did notice one mistake in Matt's paper (and in mine too). Matt attributes the risk-averse attitude at small scales to "loss aversion." As Deb points out, this can't be the explanation, since if the attitude is set up as "being indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x]", then no losses are involved. I attributed the attitude to "uncertainty aversion," which has the virtue of being logically possible in this example, but which, thinking about it now, I don't really believe.

Right now, I'm inclined to attribute small-stakes risk aversion to some sort of rule-following. For example, it makes sense to be risk averse for large stakes, and a natural generalization is to continue that risk aversion for payoffs in the $10, $20, $30 range. Basically, a "heuristic" or a simple rule giving us the ability to answer this sort of preference question.

Attitudes, not preference or actions

By the way, I've used the term "attitude" above, rather than "preference." I think "preference" is too much of a loaded word. For example, suppose I ask someone, "Do you prefer $20 or [55% chance of $30, 45% chance of $10]?" If he or she says, "I prefer the $20," I don't actually consider this any sort of underlying preference. It's a response to a question. Even if it's set up as a real choice, where they really get to pick, it's just a preference in a particular setting. But for most of these studies, we're really talking about attitudes.

Continuing the discussion of Neal Beck's comment on David Park's models: the concept of Bayesian inference has been steadily generalized over the decades. Let me steal some words from my 2003 article in the International Statistical Review:

It is an important tradition in Bayesian statistics to formalize potentially vague ideas, starting with the axiomatic treatment of prior information and decision making from the 1920s through the 1950s. For a more recent example, consider hierarchical modeling.

Intermittent phone service


I had always thought of "households with phones" and "households without phones" as two disjoint populations, with only the first group reachable by a telephone survey. In fact, I used this as an example in teaching surveys to distinguish between the "population" of phone households and the "universe" of all households. But when doing the weighting for the NYC Social Indicators Survey, we learned that about as many people in the U.S. have intermittent phone service as have no phone service--and if people with intermittent service have a phone about half the time, then they are indeed represented (although underrepresented) in phone surveys.
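The weighting implication is simple: if an intermittent-service household has a phone half the time, its chance of being covered by the survey is half that of a full-service household, so it should get roughly double the weight. A toy simulation (all the population numbers below are hypothetical, not from the Social Indicators Survey):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical population: 90% full phone service, 5% intermittent service
# (reachable half the time), 5% no phone at all.
service = rng.choice(["full", "intermittent", "none"], size=n,
                     p=[0.90, 0.05, 0.05])
# An outcome that differs by phone-service group:
y = (rng.normal(0.0, 1.0, n)
     + (service == "none") * 1.0
     + (service == "intermittent") * 0.5)

# Coverage probability: the chance a household is reachable by the survey.
p_cover = np.select([service == "full", service == "intermittent"],
                    [1.0, 0.5], 0.0)
covered = rng.random(n) < p_cover

# Unweighted mean over covered households vs. inverse-probability weighting:
unweighted = y[covered].mean()
w = 1 / p_cover[covered]
weighted = np.sum(w * y[covered]) / np.sum(w)
print(unweighted, weighted, y[service != "none"].mean())
```

The weighted estimate recovers the mean for all phone households (full plus intermittent); the no-phone group is still out of reach, which is the remaining "universe" vs. "population" gap.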

Are we not Bayesians?


David reports,

Boris presented the TSCS paper at Midwest and was accused by Neal Beck of not being a real Bayesian. Beck was making the claim that "we're not Bayesians" because we're using uninformative priors. He seems to be under the assumption that Bayesians only use informative priors. Boris should have just directed him to your book and told him to read chapters 1 and 2! I know you've spoken to Beck before, but have you ever had such an exchange with him on this topic? He kept making the claim that if you use diffuse priors, all you're doing is MLE. It may be true that for many simple analyses Bayesian inference and MLE can produce similar results, but Bayesian inference can easily be extended to more complex problems (something that MLE may have a harder time doing).

What is Bayesian inference?

My reply: Bayesian inference is characterized by the use of the posterior distribution--the distribution of unknowns, conditional on knowns. Bayesian inference can be done with different sorts of models. In general, more complex models are better (see here, also with some interesting discussion), but a simpler model is less effort to set up and can be used as a starting point in a wide range of examples.

Diffuse prior distributions

Diffuse prior distributions are a type of simplification. Other simplifications we commonly use are conventional data models such as the normal distribution, and conventional transformations such as logit or probit. Bayesian inference with these models is still Bayesian. In the model-checking stage of Bayesian data analysis (see Chapter 6 of our book), you can check the fit of the model and think about how to improve it.

More technically, an improper prior distribution can be considered as "noninformative" if it is a stable limit of proper prior distributions (see Sections 2.2-2.4 of this paper).

Hmmm . . . let me try to put this more aphoristically. Bayesian inference with the right model is better than Bayesian inference with a wrong model. "Improper" models (that is, models without a joint probability distribution for all knowns and unknowns in the model) cannot be right. But Bayesian inference with a wrong model is still Bayesian.

Update (19 Apr 05): Neal says he was misquoted. He also says he'll reply soon.

We would like to incorporate matching methods into a Bayesian regression framework for causal inference, with the ultimate goal of being able to do more effective inference using hierarchical modeling. The founding works here are papers by Cochran and Rubin in 1973, demonstrating that matching followed by regression outperforms either method alone, and papers by Rosenbaum and Rubin in 1984 on propensity scores.

Right now, our starting points are two recent review articles, one by Guido Imbens on the theory of regression and matching adjustments, and one by Liz Stuart on practical implementations of matching. So far, I've read Guido's article and have a bunch of comments/questions. Much of this involves my own work (since that's what I'm most familiar with), so I apologize in advance for that.

There was a lively discussion of my entry with googlefights between Clinton and Bush, so I thought it might be worth saying how this could be used for a project in an intro stats class.

Higher-income states support the Democrats, but higher-income voters support the Republicans. This confuses a lot of people (for example, see here and here).

Boris presented our paper on the topic at the Midwest Political Science meeting last weekend. Here's the presentation (we're still working on the paper).

Here's the abstract for the paper:

After reading Seth Roberts's article on self-experimentation, I had a dialogue with him about when to move from individual experimentation to a full-scale controlled experiment with a large-enough n to obtain statistically significant results. My last comment said:

But back to the details of your studies. What about the weight-loss treatment? That seems pretty straightforward--drink X amount of sugar water once a day, separated by at least an hour from any meals. To do a formal study, you'd have to think a bit about what would be a good control treatment (and then there are some statistical-power issues, for example in deciding whether it's worth trying to estimate a dose-response relation for X), but the treatment itself seems well defined.

Seth replied as follows:

Here are some relevant "facts":

Long ago, John Tukey said that he would rather have a sample of n = 3 (randomly selected) than Kinsey's really large non-random samples. He did not explain how one would get a randomly selected person to answer intimate questions. Once one considers that point Kinsey's work looks a little better -- because ANY actual sample will involve some compromise (probably large) with perfectly random sampling. Likewise, the closer one looks at the details of doing a study with n = 100, the more clearly one sees the advantages of smaller n studies.

How do the results of self-experimentation make their way in the world? An example is provided by blood-sugar testing for diabetics. Now it is everywhere -- "the greatest advance since the discovery of insulin," one diabetic told me. It began with self-experimentation by Richard Bernstein, an engineer at the time. With great difficulty, Bernstein managed to present his work at a scientific conference. It was then followed up by a British academic researcher, who began with relatively small n studies. I don't think he ever did a big study (e.g., n = 100). The benefits were perfectly clear with small n. From there it spread to become the norm. Likewise, I don't think that a really large study of my weight-loss ideas will ever be necessary. The benefits should be perfectly clear with small n. Fisher once said that what is really convincing is not a single study with a really low p value but repeated studies with p < .05. Likewise, I don't think that one study with n = 100 is half as convincing as several diverse studies with much smaller n.

It is so easy to believe that bigger is better (when in fact that is far from clear) that I wonder if it is something neurological: Our brains are wired to make us think that way. I cannot remember ever hearing a study proposed that I thought was too small; and I have heard dozens of proposed studies that I thought were too large. When I discussed this with Saul Sternberg, surely one of the greatest experimental psychologists of all time, he told me that he himself had made this very mistake: Gone too quickly to a large study. He wanted to measure something relatively precisely so he did an experiment with a large n (20 is large in cognitive psychology). The experiment failed to repeat the basic effect.

P.S. Seth's paper was also noted here.
See also here for Susan's comments.

No connection to statistics


From The Red Hot Typewriter: The Life and Times of John D. MacDonald, by Hugh Merrill:

A new book on R graphics


Jouni pointed me to a forthcoming book on statistical graphics in R, written by Paul Murrell at the University of Auckland (New Zealand). R is the open-source version of S and by far the best all-around computer package for statistical research and practice.

Based on the webpage, the book looks like it's going to be great. I was hoping to use it as one of the texts for my new course on statistical graphics, but now I'm thinking I'll also include it as a recommended text in all my classes. I particularly like Figure 1.8 (the "graphical table") which reminds me of my own work on turning tables into graphs.

More on Bayes in China


Here's an update on the question of whether Bayesian statistics went untaught in China because the "prior distribution" violated the principles of communism:

Chuanhai writes, "Zaiying Huang and I took a Bayesian course taught in the department of Mathematics at Wuhan University in 1984-1985."

Hao writes, "Interesting. I didn't learn Bayes in China and never heard of this. But it sounds possible at that time."

Hongyu: "I did not hear this, I only learned the Bayes theorem but nothing else."


In my only "mathematical statistics" course back in college, my teacher told us the philosophical views of Bayesian statistics without much detail, but it sounded very cool.

Mao's quote should be interpreted (in the most direct Chinglish way) as "the truth needs to be examined using empirical facts". So I don't think it completely conflicts with the views of Bayesian statistics.

Just my 2 cents!

Finally, Xiao-Li clarifies:

It's not my teachers, but rather time (or generation!) differences. It was late 70th when I got to colleague, when the culture revolution just ended. And indeed, my teachers told me about this in reference to why they did not "dare" to study Bayes *during* culture revolution (or study anything else for that matter). By 84-85, things have changed considerably. Indeed, in 85, I took a seminar course at Fudan during which I learned empirical Bayes. And according to Don, that is why I was admitted because I wrote a personal statement on why I wanted to study empirically Bayes. Don said he was impressed because finally there was a Chinese student who did not just say how good his/her mathematics was, but of course retrospectively I have to confess that I really didn't know much about what I was talking about! :-)

Of course Xiao-Li is being modest. He understood everything, but just in Chinese, not English!

Jeronimo pointed out this analysis by a bunch of statisticians comparing the 2004 exit polls with election results. The report (by Josh Mitteldorf, Kathy Dopp, and several others) claims an "absence of any statistically-plausible explanation for the discrepancy between Edison/Mitofsky’s exit poll data and the official presidential vote tally" and hence suggests that the vote itself may have been rigged.

Mitteldorf et al. (hereafter, "US Count") present a number of arguments based on the results in the Edison/Mitofsky report that leave me intrigued but not convinced.

1. US Count starts with a histogram (on page 6 of their report) demonstrating that Bush outperformed the exit polls in most of the states. US Count then performs some p-value calculations showing how unlikely this would be ("less than 1 in 10,000,000") if the state errors were independent. But the errors are clearly a national phenomenon, so a calculation based on independent state errors misses the point. The real issue, as US Count recognizes elsewhere in its report, is: how plausible is it that Kerry voters responded to exit polls at a 6% higher rate than Bush voters, as would be needed to explain the errors?
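For what it's worth, here is the style of calculation that produces such tiny probabilities under independence--a simple binomial sign test (the counts below are hypothetical, not the report's exact numbers):

```python
from math import comb

def sign_test_tail(n_states, n_bush_better, p_half=2):
    """P(Bush outperforms the exit poll in at least k of n states), treating
    each state's error as an independent fair coin flip."""
    total = sum(comb(n_states, k)
                for k in range(n_bush_better, n_states + 1))
    return total / p_half**n_states

# Hypothetical example: Bush outperforming the poll in 43 of 50 states.
print(sign_test_tail(50, 43))
```

The answer is astronomically small, but the independence assumption is exactly what fails when there is a common national component to the errors, so the tiny p-value doesn't distinguish fraud from nationwide differential nonresponse.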

2. US Count makes various calculations of error rates in different precincts. These are interesting--at the very least, I don't think the patterns they find should be occurring if the poll is executed according to plan--but I don't see how they rule out an overall higher rate of response by Kerry voters than by Bush voters.

3. US Count notes that the exit polls predicted the Senate races better than the Presidential races in the states. Here are the errors (just for the 32 states in the data with Senate races):


(By the way, I give all error rates in terms of Democrats' share of the vote. Edison/Mitofsky gave errors in vote margin, which I found confusing. My numbers are just theirs divided by 2.)

Anyway, there is definitely something going on, but again it appears to be a national phenomenon. I'm not quite sure what sort of hypothesis of "cheating" would explain this.

Considering the US Count hypothesis more carefully, it makes sense to look at the Edison/Mitofsky "composite estimates," which combine the exit poll with a "Prior Estimate, which is based upon analysis of the available pre-election surveys in each state." Unsurprisingly, these composite estimates are better (see page 20 of the Edison/Mitofsky report). And in assessing the hypothesis that the polls are OK but the votes were rigged, it makes sense to use these better estimates as a baseline.

Here are the errors in the Presidential and Senate vote from the composite predictions (exit polls combined with pre-election polls):


Discrepancies have changed but are still there. One hypothesis for the differences between Presidential and Senate error, considered and dismissed by US Count, is split-ticket voting. In fact, though, the states with more split-ticket voting (as crudely measured by the absolute difference between Democratic Senatorial and Presidential vote shares) do show bigger absolute differences between Senate and Presidential errors.

4. Discrepancies are lowest with paper ballots and higher with election machines. I don't know that there are any reasonable hypotheses of fraud occurring with all machines (mechanical, touch screen, punch cards, optical scan), so I'm inclined to agree with Edison/Mitofsky that these differences can be better explained by rural/urban and other differences between precincts with different types of polling equipment.

5. The exit poll data do show some strange patterns, though. Let's go back to the state-by-state errors in the Presidential vote for the 49 states in the data. Here's a plot of the state error vs. Kerry's vote share:


What gives with the negative slope (which is, just slightly, "statistically significant")? This is not what you'd expect to see if the poll discrepancies are entirely due to sampling error. With only sampling error, the poll gives a so-called unbiased estimate of the population average, and so the errors should be uncorrelated with the actual outcome.

This doesn't mean there was any election fraud. It just means that the exit poll estimates (above, I was using Edison/Mitofsky's "best Geo estimator"; their "within-precinct error" gives very similar results) are not simply based on a random sample of all ballots cast in a precinct. As Edison/Mitofsky note on page 31 of their report, there are sources of error other than random sampling, most notably differential nonresponse. Perhaps these factors include votes cast during hours when the exit pollsters weren't there, or other coverage issues. In some elections, vote tallies are statistically stable over time, but it doesn't have to be that way. Or maybe there were some other adjustments going on with the polls.
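Differential nonresponse can produce exactly this kind of systematic discrepancy even when the sampling of voters is perfectly random. A sketch, with invented response rates (the magnitudes here are assumptions chosen only to make the mechanism visible):

```python
import random

random.seed(2)

true_kerry_share = 0.50
p_respond_kerry = 0.56  # assumed: Kerry voters slightly more likely to respond
p_respond_bush = 0.50   # assumed response rate for Bush voters

# Simulate an exit poll that intercepts voters at random but loses
# some of them to nonresponse at different rates
n_voters = 100_000
kerry_responses = bush_responses = 0
for _ in range(n_voters):
    voted_kerry = random.random() < true_kerry_share
    p_respond = p_respond_kerry if voted_kerry else p_respond_bush
    if random.random() < p_respond:
        if voted_kerry:
            kerry_responses += 1
        else:
            bush_responses += 1

poll_share = kerry_responses / (kerry_responses + bush_responses)
print(round(poll_share - true_kerry_share, 3))  # poll overstates Kerry
```

Even a modest gap in response rates shifts the poll by a couple of percentage points in Kerry's favor, which is the order of magnitude of the within-precinct errors Edison/Mitofsky report.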


US Votes is correct to point out an inherent contradiction in the Edison/Mitofsky report: it blamed the exit polls for the discrepancy while at the same time not seeming to look hard enough to find out where the problems were occurring. (To me, the most interesting table of the Edison/Mitofsky report came on page 35, where they report average within-precinct errors of 3.3%, 0.9%, 1.2%, 2.5%, and 1.1%--all in favor of the Democrats--in the five most recent elections. Again, I'm dividing all their numbers by 2 to give errors in vote proportion rather than vote differential.)

The errors appear to be nationwide and would seem to be more consistent with nonresponse and undercoverage than with something more local such as fraud.

Just scanning the web, I found more on this here, here, here, here, and here.

As Jeronimo said, let's just hope this doesn't happen in Mexico!

Full disclosure: Five years ago, I briefly consulted for Voter News Service and met Warren Mitofsky. I have no current conflicts of interest.

P.S. Mark Blumenthal discusses these issues here and here.

On January 19, 2005 the now well-known Edison Media Research and Mitofsky International (EM) organizations published a report evaluating their exit-poll system for the National Election Pool (NEP). In a nutshell, the report concluded that the discrepancies between the exit polls and the actual vote tally arose because those who voted for Bush were less likely than those who voted for Kerry to respond to the pollsters. This post-hoc theorizing is being challenged by a group of statisticians who argue that "The required pattern of exit poll participation by Kerry and Bush voters to satisfy the E/M exit poll data defies empirical experience and common sense under any assumed scenario." (p. 12)

What would happen if the same situation arose in another country, like Mexico? Think about it...

I'll be speaking at Harvard next Monday on some joint work with Tian Zheng, Matt Salganik, Tom DiPrete, and Julien Teitler:

Networks--sets of objects connected by relationships--are important in a number of fields. The study of networks has long been central to sociology, where researchers have attempted to understand the causes and consequences of the structure of relationships in large groups of people. Using insight from previous network research, McCarty, Bernard, Killworth, et al. (1998, 2001) developed and evaluated a method for estimating the sizes of hard-to-count populations using network data collected from a simple random sample of Americans. In this paper we show how, using a multilevel overdispersed Poisson regression model, these data can also be used to estimate aspects of social structure in the population. Our work goes beyond most previous research by using variation as well as average responses to learn about social networks and leads to some interesting results. We apply our method to the McCarty et al. data and find that Americans vary greatly in their number of acquaintances. Further, Americans show great variation in propensity to form ties to people in some groups (e.g., males in prison, the homeless, and American Indians), but little variation for other groups (e.g., people named Michael or Nicole). We also explore other features of these data and consider ways in which survey data can be used to estimate network structure.

Our paper is here. And here's a paper by McCarty, Killworth, Bernard, Johnsen, and Shelley describing some of their work that we used as a starting point. (They estimate average network size at 290, but we get an estimate, using their data, of 750. The two estimates correspond to different depths of the social network.) McCarty et al. were very collegial in sharing their data with us, which we reanalyzed using a multilevel model. Here's a presentation I found on the web from Killworth on this stuff.
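The basic scale-up idea behind these network-size estimates can be shown with a toy calculation. All numbers below are invented, and the actual papers fit far richer models (ours a multilevel overdispersed Poisson regression) rather than this simple ratio estimator:

```python
# Toy network scale-up estimator: if a respondent knows y_k people in
# subpopulation k, and subpopulation k has N_k members out of a total
# population N, then the respondent's network degree d is estimated by
#     d_hat = N * (sum_k y_k) / (sum_k N_k)
# All figures below are invented for illustration.
N = 280_000_000  # rough U.S. population around the time of the surveys

# One respondent's answers: (people known in group, size of group)
responses = [
    (2, 1_000_000),
    (0, 500_000),
    (1, 750_000),
    (3, 2_000_000),
]

total_known = sum(y for y, _ in responses)
total_group_size = sum(Nk for _, Nk in responses)

d_hat = N * total_known / total_group_size
print(round(d_hat))
```

The choice of subpopulations matters a great deal in practice: groups like "people named Michael" probe a shallower layer of acquaintanceship than rarer groups, which is one way the 290 and 750 estimates can both be reasonable.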

Update: Our paper will appear in the Journal of the American Statistical Association.

