Christopher Avery, Mark Glickman, Caroline Hoxby, and Andrew Metrick wrote a paper recently ranking colleges and universities based on the "revealed preferences" of the students making decisions about where to attend. They apply, to data on 3000 high-school students, statistical methods that have been developed to evaluate chess players, hospitals, and other things. If a student has been accepted to colleges A and B, and he or she chooses to attend A, this counts as a "win" for A and a "loss" for B.
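The win/loss idea can be sketched with a minimal Bradley-Terry fit, the classic model for paired-comparison data of this kind. To be clear, the matchups below are invented and the actual paper fits a much richer model; this is just an illustration of how "wins" turn into a ranking.

```python
# Each "match" is a student choosing the first college over the second
# (a win for the first). These matchups are made up for illustration.
matches = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"),
           ("C", "B"), ("A", "C"), ("C", "B"), ("B", "C")]
colleges = sorted({c for m in matches for c in m})
wins = {}
for w, l in matches:
    wins[(w, l)] = wins.get((w, l), 0) + 1

# Iterative minorization-maximization updates for Bradley-Terry "strengths".
strength = {c: 1.0 for c in colleges}
for _ in range(200):
    new = {}
    for i in colleges:
        w_i = sum(n for (w, l), n in wins.items() if w == i)
        denom = sum(
            (wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
            for j in colleges if j != i
        )
        new[i] = w_i / denom
    total = sum(new.values())
    strength = {c: v / total for c, v in new.items()}

ranking = sorted(colleges, key=lambda c: -strength[c])
```

College A, which wins most of its matchups in this toy data, comes out on top of the ranking.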

# November 2004 Archives

Or maybe the right quote is, "It's what you learn after you know it all that counts." In any case, Chris Genovese and I have a little discussion on his blog on the topic of estimating the uncertainty in function estimates. My part of the discussion is pretty vague, but Chris is promising a link to an actual method, so this should be interesting.

Jasjeet Sekhon sent me the following note regarding analyses of votes for Bush and Kerry in counties in Florida:

Hi Sam and Andrew,

I just saw your blog entries on the e-voting controversy. A week ago I posted a short research note about the optical voting machine vs. DRE issue in Florida, entitled "The 2004 Florida Optical Voting Machine Controversy: A Causal Analysis Using Matching". In this note, I try to obtain balance on all of the baseline variables I can find, and the results give NO support to the conjecture that optical voting machines resulted in fewer Kerry votes than the DREs would have. Of course, one really needs precinct-level data to make inferences to many of the counties in the state.

Also of interest to you may be a research note by Jonathan Wand. He has also obtained a ZERO effect for optical machines by using Walter's and my robust estimator. In that analysis, ALL of the counties are used. Wand introduces a key new variable: he uses campaign finance contributions as a covariate. But as he notes, the linearity assumption is dubious with this dataset.

Cheers,

Jas.

Many of the wells used for drinking water in Bangladesh and other South Asian countries are contaminated with natural arsenic, affecting an estimated 100 million people. Arsenic is a cumulative poison, and exposure increases the risk of cancer and other diseases.

**Is my well safe?**

One of the challenges of reducing arsenic exposure is that there's no easy way to tell if your well is safe. Kits for measuring arsenic levels exist (and the evidence is that arsenic levels are stable over time in any given well), but we and other groups are just beginning to make these kits widely available locally.

Suppose your neighbor's well is low in arsenic. Does this mean that you can relax? Not necessarily. Below is a map of arsenic levels in all the wells in a small area (see the scale of the axes) in Araihazar upazila in Bangladesh:

Blue and green dots are the safest wells, yellow and orange exceed the Bangladesh standard of 50 micrograms per liter, and red and black indicate the highest levels of arsenic.

**Bad news: dangerous wells are near safe wells**

As you can see, even if your neighbor has a blue or green well, you're not necessarily safe. (The wells are located where people live. The empty areas between the wells are mostly cropland.) Safe and dangerous wells are intermingled.

**Good news: safe wells are near dangerous wells**

There is an upside, though: if you currently use a dangerous well, you are probably close to a safe well. The following histogram shows the distribution of distances to the nearest safe well, for the people in the map above who currently (actually, as of 2 years ago) have wells that are yellow, orange, red, or black:
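The quantity behind that histogram, the distance from each unsafe well to the nearest safe well, is straightforward to compute. Here is a minimal sketch using a handful of made-up well coordinates and arsenic levels (the real analysis used the surveyed wells in Araihazar):

```python
import math

# Hypothetical wells: (x, y, arsenic in micrograms per liter).
# These values are invented for illustration.
wells = [(0.0, 0.0, 120), (0.1, 0.05, 30), (0.3, 0.2, 60),
         (0.25, 0.3, 10), (0.5, 0.1, 250), (0.45, 0.4, 45)]
STANDARD = 50  # Bangladesh standard, micrograms per liter

safe = [(x, y) for x, y, a in wells if a <= STANDARD]
unsafe = [(x, y) for x, y, a in wells if a > STANDARD]

# For each unsafe well, the distance to the nearest safe well;
# a histogram of these distances is what the post describes.
def nearest_safe_distance(p, safe_wells):
    return min(math.dist(p, s) for s in safe_wells)

distances = [nearest_safe_distance(u, safe) for u in unsafe]
```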

**Switching and sharing**

So if you are told where that safe well is, maybe you can ask your neighbor who owns that well to share. In fact, a study by Alex Pfaff, Lex van Geen, and others has found that people really do switch wells when they are told that their well is unsafe. We're currently working on a cell-phone-based communication system to allow people in Bangladesh to get some of this information locally.

**General implications for decision analysis**

This is an interesting example for decision analysis because decisions must be made locally, and the effectiveness of various decision strategies can be estimated using direct manipulation of data, bypassing formal statistical analysis.

**Other details**

Things are really more complicated than this because the depth of the well is an important predictor, with different depths being "safe zones" in different areas, and people are busy drilling new wells as well as using and measuring existing ones. More details are in our papers in Risk Analysis and Environmental Science & Technology.

Here is one of my favorite homework assignments. I give students the following twenty data points and ask them to fit y as a function of x1 and x2.

| x1 | x2 | y |
|-----|------|------|
| 0.4 | 19.7 | 19.7 |
| 2.8 | 19.1 | 19.3 |
| 4.0 | 18.2 | 18.6 |
| 6.0 | 5.2 | 7.9 |
| 1.1 | 4.3 | 4.4 |
| 2.6 | 9.3 | 9.6 |
| 7.1 | 3.6 | 8.0 |
| 5.3 | 14.8 | 15.7 |
| 9.7 | 11.9 | 15.4 |
| 3.1 | 9.3 | 9.8 |
| 9.9 | 2.8 | 10.3 |
| 5.3 | 9.9 | 11.2 |
| 6.7 | 15.4 | 16.8 |
| 4.3 | 2.7 | 5.1 |
| 6.1 | 10.6 | 12.2 |
| 9.0 | 16.6 | 18.9 |
| 4.2 | 11.4 | 12.2 |
| 4.5 | 18.8 | 19.3 |
| 5.2 | 15.6 | 16.5 |
| 4.3 | 17.9 | 18.4 |

[If you want to play along, try to fit the data before going on.]
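If you want to play along on a computer, a natural first pass (not necessarily the right answer!) is an ordinary least-squares fit of y on x1 and x2:

```python
import numpy as np

# The twenty (x1, x2, y) points from the assignment.
data = np.array([
    [0.4, 19.7, 19.7], [2.8, 19.1, 19.3], [4.0, 18.2, 18.6],
    [6.0, 5.2, 7.9],   [1.1, 4.3, 4.4],   [2.6, 9.3, 9.6],
    [7.1, 3.6, 8.0],   [5.3, 14.8, 15.7], [9.7, 11.9, 15.4],
    [3.1, 9.3, 9.8],   [9.9, 2.8, 10.3],  [5.3, 9.9, 11.2],
    [6.7, 15.4, 16.8], [4.3, 2.7, 5.1],   [6.1, 10.6, 12.2],
    [9.0, 16.6, 18.9], [4.2, 11.4, 12.2], [4.5, 18.8, 19.3],
    [5.2, 15.6, 16.5], [4.3, 17.9, 18.4],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]

# Ordinary least squares: y ~ b0 + b1*x1 + b2*x2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
```

Look at the residuals before declaring victory; that is where the lesson of the assignment lives.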

Gonzalo pointed me to a paper by Michael Hout, Laura Mangels, Jennifer Carlson, and Rachel Best at Berkeley that points out some systematic differences between election outcomes in e-voting and non-e-voting counties in Florida. To jump to the punch line: they have found an interesting pattern, which closer study suggests arises from just two of the e-voting counties: Broward and Palm Beach, which unexpectedly swung about 3% toward Bush in 2004. They also make some pretty strong causal claims which I would think should be studied further, but with some skepticism.

**Pretty pictures**

Before getting to a discussion of this paper, let me show you a few pictures (adapted from an analysis performed by Bruce Shaw here at Columbia).

First, a scatterplot of the counties in Florida, displaying the change in the Republican vote percentage from 2000 to 2004, plotted against the Republican vote percentage in 2000 (in both cases, using the Republican share of the two-party vote). Red circles indicate the counties that used e-voting in 2004, and black circles indicate counties that used optical scans. The radius of each circle is roughly proportional to the log of the number of votes in the county.

There are three obvious patterns in the figure:

1. The e-voting counties, especially the largest of them, were more Democrat-leaning.

2. For the optical-scan counties, there was a consistent trend: the counties that favored Bush more in 2000 tended to move even further toward Bush in 2004.

3. For the e-voting counties, no such trend is apparent. In particular, the two large red circles on the left of the plot (Broward and Palm Beach) moved strongly toward the Republicans.

Next: looking at previous years, and commenting on the Hout et al. paper.

This is an important question because, as they note in the article,

Rural areas of developing countries contain almost the entire stock of the world's tropical forest. The poverty levels in these areas and the world demands for forest conservation have generated discussions concerning the determinants of deforestation and the appropriate policies for conservation.

When neighbors have cleared their land of forest, a farmer is likely to clear his or her land also. However, as Robalino and Pfaff note, neighboring plots of land will have many potentially unobserved similarities, and so mere correlation between neighbors' decisions is not sufficient evidence of causation.

Robalino and Pfaff estimate the effect of neighbors' actions on individual deforestation decisions using a two-stage probit regression. In their model, they treat the slopes of the neighboring farmers' land as an instrumental variable. I don't fully understand instrumental variables, but this looks like an interesting example as well as being an important application.

A common design in an experiment or observational study is to have two groups--treated and control units--and to take "before" and "after" measurements on each unit. (The basis of any experimental or observational study is to compare treated to control units; for example, there might be improvement from before to after whether or not a treatment was applied.)

**The usual model**

The usual statistical model for such data is a regression of "after" on "before" with parallel lines for treatment and control groups, with the difference between the lines representing the treatment effect. The implication is that the treatment has a constant effect, with the only difference between the two groups being an additive shift.

We went back and looked at some before-after data that we had kicking around (two observational studies from political science and an educational experiment) and found that this standard model was *not* true--in fact, treatment and control groups looked systematically different, in consistent ways.

**Actually...**

In the examples we studied, the correlation between "before" and "after" measurements was higher in the control group than in the treatment group. When you think about this, it makes sense: applying the "treatment" induces changes in the units, and so it is reasonable to expect a lower correlation with the "before" measurement.

Another way of saying this is: if the treatment effect varies (instead of being a constant), it will reduce the before-after correlation. So our finding can be interpreted as evidence that treatment effects generally vary, which of course makes sense.

In fact, the only settings we found where the controls did *not* have a higher before-after correlation than treated units, were when treatment effects were essentially zero.
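The claim that varying treatment effects lower the before-after correlation can be checked with a quick simulation. All the numbers here are invented; the point is only the qualitative pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# "Before" measurements and measurement noise.
before = rng.normal(0, 1, n)
noise = rng.normal(0, 0.5, n)

# Control group: after = before + noise.
after_control = before + noise

# Treatment group: a constant effect of 1 plus a unit-level varying
# component, which adds variance unrelated to "before".
effect = 1.0 + rng.normal(0, 1.0, n)
after_treated = before + noise + effect

r_control = np.corrcoef(before, after_control)[0, 1]
r_treated = np.corrcoef(before, after_treated)[0, 1]
```

With these settings the control-group correlation comes out clearly higher than the treatment-group correlation, exactly the pattern described above.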

The term "decision analysis" has multiple meanings in Bayesian statistics. When we use the term here, we are not talking about problems of parameter estimation, squared error loss, etc. Rather, we use "decision analysis" to refer to the solution of particular decision problems (such as in medicine, public health, or business) by averaging over uncertainties as estimated from a probability model. (See here for an example.)

That said, decision analysis has fundamental difficulties, most notably that it requires one to set up a utility function, which on one hand can be said to represent subjective feelings but on the other hand is presumably solid enough that it is worth using as the basis for potentially elaborate calculations.

From a foundational perspective, this problem can be resolved using the concept of *institutional decision analysis*.

Partisan voting is back. It is fed by new issues that fall on the

left/right ideological continuum. These are likely to be social,

religious and racial issues. This trend has led to an increase in

rationalization and, therefore, a weakening role for

retrospection. Voters are less willing to vote based on past

performance but more willing to offer evaluations that, even if

untrue, rationalize their partisan predispositions and vote

choices.

This is one of my three dissertation papers. It is the basis of my job talk. Any thoughts, citations, etc. are appreciated.

An Environmental Index, as we use the term, is an agglomeration of data compiled to provide a relative measure of environmental conditions. Environmental data are often sparse or missing non-randomly; many concepts, such as environmental risk or sustainability, are still being defined; and indexers must balance modeling sophistication against modeling facility and interpretability. We review our approaches to these constraints in the construction of the 2002 ESI and the UN Development Programme risk report.

This presentation, delivered at INFORMS 2004, is a sketch of some work completed at CIESIN from 2001 to 2004, where I spent two years as a grad student. A paper has been submitted on diagnostics for the multiple imputation used in the ESI. I hope to generate a paper on the Bayesian network aggregation used in the risk index. I'm speaking on Dec. 9.

Lisa Levine sent me this interesting combinatorial explanation (written by Harry Graber, it seems) of why good anagrams are possible. He uses as an example an anagram transcription by Richard Brodie of the Khayyam/Fitzgerald Rubaiyat (and here's another webpage with information on it).

Graber writes:

Radon is a radioactive gas that is generally believed to cause lung cancer, even in low concentrations, and may exist in high concentrations in the basement of your house (see the map).

The EPA recommends that you should test your home for radon and then fix the problem if your measurement is 4 picoCuries per liter or higher. We estimate that this strategy, if followed, would cost about $25 billion and save about 110,000 lives over the next thirty years.

We can do much better by using existing information on radon levels to target homes that are likely to have high levels. If measurements are more targeted, we estimate that the same savings of 110,000 lives can be achieved at a cost of only $15 billion. The problem with the EPA's recommendation is that, by measuring everyone, including those who will probably have very low radon, it increases the number of false alarms--high measurements that occur just by chance in low-radon houses.

We found formal decision analysis to be a useful tool in quantifying the recommendations of where to measure and remediate. (For more details, see Section 22.4 of Bayesian Data Analysis and this paper).

Carrie McLaren has an interesting interview with Frank Ackerman and Lisa Heinzerling in the current Stay Free magazine, on the topic of cost-benefit analysis, as it is used in environmental regulations (for example, how much money is it worth spending to reduce arsenic exposures by a specified amount). Apparently, a case of chronic bronchitis has been judged to have a cost of $260,000, and IQ points are worth $8300 each. Ackerman and Heinzerling argue that cost-benefit analysis is "fundamentally flawed," basically because it involves a lot of arbitrary choices that allow regulators to do whatever they want and justify their choices with numbers.

This made me a little worried, since I've done some cost-benefit analysis myself! In particular, I'm sympathetic to the argument that cost-benefit analysis requires arbitrary choices of the value of a life (for example). Garbage in, garbage out, and all that. But, on the plus side, cost-benefit analysis allows one to quantify the gains from setting priorities. Even if you don't "believe" a particular value specified for value of a life, you can calculate conditional on that assumed value, as a starting point to understanding the full costs of different decision options.

With this mixture of interest and skepticism as background, I was interested to read the following exchange in the Stay Free interview:

What can be done to move cross-validation from a research idea to a routine step in Bayesian data analysis?

Cross-validation is a method for evaluating a model using the following steps: (1) remove part of the data, (2) fit the model to the smaller dataset that excludes the removed part, (3) use the fitted model to predict the removed part, and (4) summarize the prediction error by comparing the predictions to the actual left-out data. The entire procedure can then be repeated with different pieces of data left out. Various versions of cross-validation correspond to different choices of left-out data--for example, removing just one point, removing a randomly selected 1/10 of the data, or removing half the data.
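The four steps above can be sketched generically; here is a minimal k-fold version where the "model" is a least-squares fit and the data are synthetic stand-ins:

```python
import numpy as np

# Stand-ins for whatever model is being cross-validated.
def fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(beta, X):
    return X @ beta

def kfold_cv_error(X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    sq_errors = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        beta = fit(X[train], y[train])        # (1)-(2): drop a fold, refit
        pred = predict(beta, X[held_out])     # (3): predict the held-out fold
        sq_errors.extend((y[held_out] - pred) ** 2)  # (4): score the predictions
    return float(np.mean(sq_errors))

# Tiny synthetic check: linear data with noise (invented numbers).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100)
cv_mse = kfold_cv_error(X, y, k=10)
```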

Several conceptual and computational challenges arise when attempting to apply cross-validation for Bayesian multilevel modeling.

**Background on cross-validation**

Unlike predictive checking (which is a method to discover ways in which a particular model does not fit the data), cross-validation is used to estimate the predictive error of a model and to compare models (choosing the model with lower estimated predictive error).

**Computational challenges**

With leave-one-out cross-validation, the model must be re-fit n times. That can take a long time, since fitting a Bayesian model even once can require iterative computation!

In classical regression, there are analytic formulas for estimates and predictions with one data point removed. But for full Bayesian computation, there are no such formulas.

Importance sampling has sometimes been suggested as a solution: if the posterior distribution is p(theta|y), and we remove data point y_i, then the leave-one-out posterior distribution is p(theta|y_{-i}), which is proportional to p(theta|y)/p(y_i|theta). One could then just use draws of theta from the posterior distribution and weight by 1/p(y_i|theta). However, this isn't a great practical solution since the weights, 1/p(y_i|theta), are unbounded, so the importance-weighted estimate can be unstable.

I suspect a better approach would be to use importance resampling (that is, sampling without replacement from the posterior draws of theta using 1/p(y_i|theta) as sampling weights) to get a few draws from an approximate leave-one-out posterior distribution, and then use a few steps of Metropolis updating to get closer.
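Here is what the importance-resampling step might look like in a toy conjugate-normal setting, where the full-data posterior is available in closed form (in a real problem, `draws` would come from MCMC, and the Metropolis refinement step is omitted here; all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: theta is the mean of a normal with known sd 1.
y = rng.normal(0.5, 1.0, 50)
y_i = y[0]                                                # the point to leave out
draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), 4000)   # flat-prior posterior draws

# Importance weights proportional to 1 / p(y_i | theta).
log_lik = -0.5 * (y_i - draws) ** 2          # normal log-likelihood (sd = 1)
w = np.exp(log_lik.min() - log_lik)          # stabilized 1/p(y_i|theta)
w /= w.sum()

# Importance resampling: sample without replacement using the weights,
# giving approximate draws from the leave-one-out posterior p(theta | y_{-i}).
keep = rng.choice(len(draws), size=500, replace=False, p=w)
loo_draws = draws[keep]
```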

For particular models (such as hierarchical linear and generalized linear models) it would also seem reasonable to try various approximations, for example estimating predictive errors conditional on the posterior distribution of the hyperparameters. If we avoid re-estimating hyperparameters, the computation becomes much quicker--basically, it's classical regression--and this should presumably be reasonable when the number of groups is high (another example of the blessing of dimensionality!).

**Leaving out larger chunks; fewer replications**

The computational cost of performing each cross-validation suggests that it might be better to do fewer. For example, instead of leaving out one data point and repeating n times, we could leave out 1/10 of the data and repeat 10 times.

**Multilevel models: cross-validating clusters**

When data have a multilevel (hierarchical) structure, it would make sense to cross-validate by leaving out data individually or in clusters, for example, leaving out a student within a school or leaving out an entire school. The two cross-validations test different things. Thus, there would be a cross-validation at each level of the model (just as there is an R-squared at each level).

**Comparing models in the presence of lots of noise, as in binary-data regression**

A final difficulty of cross-validation is that, in models where the data-level variation is high, most of the predictive error will be due to this data-level variation, and so vastly different models can actually have similar levels of cross-validation error.

Shouhao and I have noticed this problem in a logistic regression of vote preferences on demographic and geographic predictors. Given the information we have, most voters are predicted to have a probability between .3 and .7 of supporting either party. The predictive root mean squared error is necessarily then close to .5, no matter what we do with the model. However, when evaluating errors at the group level (leaving out data from an entire state), the cross-validation appears to be more informative.

**Summary**

Cross-validation is an important technique that should be standard, but there is no standard way of applying it in a Bayesian context. A good summary of some of the difficulties is in the paper, "Bayesian model assessment and comparison using cross-validation predictive densities," by Aki Vehtari and Jouko Lampinen, Neural Computation 14 (10), 2339-2468. Yet another idea is DIC, which is a mixed analytical/computational approximation to an estimated predictive error.

I don't really know what's the best next step toward routinizing Bayesian cross-validation.

A wise statistician once told me that to succeed in statistics, one could either be really smart, or be Bayesian. He was joking (I think), but maybe an appropriate corollary to that sentiment is that to succeed in Bayesian statistics, one should either be really smart, or be a good programmer. There's been an explosion in recent years in the number and type of algorithms Bayesian statisticians have available for fitting models (i.e., generating a sample from a posterior distribution), and it cycles: as computers get faster and more powerful, more complex model-fitting algorithms are developed, and we can then start thinking of more complicated models to fit, which may require even more advanced computational methods, which creates a demand for bigger, better, faster computers, and the process continues.

As new computational methods are developed, there is rarely well-tested, publicly available software for implementing the algorithms, and so statisticians spend a fair amount of time doing computer programming. Not that there's anything wrong with that, but I know I at least have been guilty (once or twice, a long time ago) of being a little bit lax about making sure my programs actually work. It runs without crashing, it gives reasonable-looking results, it must be doing the right thing, right? Not necessarily, and this is why standard debugging methods from computer science and software engineering aren't always helpful for testing statistical software. The point is that we don't know exactly what the software is supposed to output (if we knew what our parameter estimates were supposed to be, for example, we wouldn't need to write the program in the first place), so if software has an error that doesn't cause crashes or really crazy results, we might not notice.

So computing can be a problem. When fitting Bayesian models, however, it can also come to the rescue. (Like alcohol to Homer Simpson, computing power is both the cause of, and solution to, our problems. This particular problem, anyway.) The basic idea is that if we generate data from the model we want to fit and then analyze those data under the same model, we'll know what the results should be (on average), and can therefore test that the software is written correctly. Consider a Bayesian model p(θ)p(y|θ), where p(y|θ) is the sampling distribution of the data and p(θ) is the prior distribution for the parameter vector θ. If you draw a "true" parameter value θ0 from p(θ), then draw data y from p(y|θ0), and then analyze the data (i.e., generate a sample from the posterior distribution, p(θ|y)) under this same model, θ0 and the posterior sample will both be drawn from the same distribution, p(θ|y), if the software works correctly. Testing that the software works then amounts to testing that θ0 looks like a draw from p(θ|y). There are various ways to do this. Our proposed method is based on the idea that if θ0 and the posterior sample are drawn from the same distribution, then the quantile of θ0 with respect to the posterior sample should follow a Uniform(0,1) distribution. Our method for testing software is as follows:

1. Generate θ0 from p(θ)

2. Generate y from p(y|θ0)

3. Generate a sample from p(θ|y) using the software to be tested.

4. Calculate the quantile of θ0 with respect to the posterior sample. (If θ is a vector, do this for each scalar component of θ.)

Steps 1-4 comprise one replication. Performing many replications gives a sample of quantiles that should be uniformly distributed if the software works. To test this, we recommend performing a z test (individually for each component of θ) on the following transformation of the quantiles: h(q) = (q - 0.5)². If q is uniformly distributed, h(q) has mean 1/12 and variance 1/180. Click here for a draft of our paper on software validation [reference updated], which explains why we (we being me, Andrew Gelman, and Don Rubin) like this particular transformation of the quantiles, and also presents an omnibus test for all components of θ simultaneously. We also present examples and discuss design issues, the need for proper prior distributions, why you can't really test software for implementing most frequentist methods this way, etc.
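As a concrete sketch of the procedure, here it is run on a toy model where the "software under test" is an exact conjugate-normal posterior sampler (so, correct by construction, and the quantiles should come out uniform). The model, sample sizes, and number of replications are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def one_replication(n=20, n_draws=1000):
    # (1) Draw theta0 from the prior, here N(0, 1).
    theta0 = rng.normal(0, 1)
    # (2) Draw data y | theta0 ~ N(theta0, 1), n points.
    y = rng.normal(theta0, 1, n)
    # (3) "Software under test": exact draws from the conjugate posterior,
    #     N(n*ybar/(n+1), 1/(n+1)) for a N(0,1) prior and unit data variance.
    post_mean = n * y.mean() / (n + 1)
    post_sd = np.sqrt(1 / (n + 1))
    draws = rng.normal(post_mean, post_sd, n_draws)
    # (4) Quantile of theta0 with respect to the posterior draws.
    return np.mean(draws < theta0)

q = np.array([one_replication() for _ in range(500)])

# If the sampler is correct, q is uniform; test via h(q) = (q - 0.5)^2,
# which has mean 1/12 and variance 1/180 under uniformity.
h = (q - 0.5) ** 2
z = (h.mean() - 1 / 12) / np.sqrt(1 / 180 / len(q))
```

A buggy sampler (say, one using the wrong posterior variance) would push the mean of h(q) away from 1/12 and blow up the z statistic.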

This may take a lot of computer time, but it doesn't take much more programming time than that required to write the model-fitting program in the first place, and the payoff could be big.

Here's another journalistic account of the Red/Blue divide. It's from today's (11/03/04) NY Times by Nicholas D. Kristof. He asserts that the poor (from America's heartland) vote Republican and the wealthy (from suburban America) vote Democratic.

In the aftermath of this civil war that our nation has just fought, one result is clear: the Democratic Party's first priority should be to reconnect with the American heartland.

I'm writing this on tenterhooks on Tuesday, without knowing the election results. But whether John Kerry's supporters are now celebrating or seeking asylum abroad, they should be feeling wretched about the millions of farmers, factory workers and waitresses who ended up voting - utterly against their own interests - for Republican candidates.

One of the Republican Party's major successes over the last few decades has been to persuade many of the working poor to vote for tax breaks for billionaires. Democrats are still effective on bread-and-butter issues like health care, but they come across in much of America as arrogant and out of touch the moment the discussion shifts to values.

"On values, they are really noncompetitive in the heartland," noted Mike Johanns, a Republican who is governor of Nebraska. "This kind of elitist, Eastern approach to the party is just devastating in the Midwest and Western states. It's very difficult for senatorial, Congressional and even local candidates to survive."

In the summer, I was home - too briefly - in Yamhill, Ore., a rural, working-class area where most people would benefit from Democratic policies on taxes and health care. But many of those people disdain Democrats as elitists who empathize with spotted owls rather than loggers.

One problem is the yuppification of the Democratic Party. Thomas Frank, author of the best political book of the year, "What's the Matter With Kansas: How Conservatives Won the Heart of America," says that Democratic leaders have been so eager to win over suburban professionals that they have lost touch with blue-collar America.

"There is a very upper-middle-class flavor to liberalism, and that's just bound to rub average people the wrong way," Mr. Frank said. He notes that Republicans have used "culturally powerful but content-free issues" to connect to ordinary voters.

To put it another way, Democrats peddle issues, and Republicans sell values. Consider the four G's: God, guns, gays and grizzlies.

One-third of Americans are evangelical Christians, and many of them perceive Democrats as often contemptuous of their faith. And, frankly, they're often right. Some evangelicals take revenge by smiting Democratic candidates.

Then we have guns, which are such an emotive issue that Idaho's Democratic candidate for the Senate two years ago, Alan Blinken, felt obliged to declare that he owned 24 guns "and I use them all." He still lost.

As for gays, that's a rare wedge issue that Democrats have managed to neutralize in part, along with abortion. Most Americans disapprove of gay marriage but do support some kind of civil unions (just as they oppose "partial birth" abortions but don't want teenage girls to die from coat-hanger abortions).

Finally, grizzlies - a metaphor for the way environmentalism is often perceived in the West as high-handed. When I visited Idaho, people were still enraged over a Clinton proposal to introduce 25 grizzly bears into the wild. It wasn't worth antagonizing most of Idaho over 25 bears.

"The Republicans are smarter," mused Oregon's governor, Ted Kulongoski, a Democrat. "They've created ... these social issues to get the public to stop looking at what's happening to them economically."

"What we once thought - that people would vote in their economic self-interest - is not true, and we Democrats haven't figured out how to deal with that."

Bill Clinton intuitively understood the challenge, and John Edwards seems to as well, perhaps because of their own working-class origins. But the party as a whole is mostly in denial.

To appeal to middle America, Democratic leaders don't need to carry guns to church services and shoot grizzlies on the way. But a starting point would be to shed their inhibitions about talking about faith, and to work more with religious groups.

Otherwise, the Democratic Party's efforts to improve the lives of working-class Americans in the long run will be blocked by the very people the Democrats aim to help.

Do we still see an (income) paradox in 2004? Let's first look at the state level. A quick correlation between median family income and percent Republican vote shows a Pearson correlation of -0.41 (Spearman: -0.46). Both are significant. So at the state level, it looks like lower-income states are voting for the Republican candidate and higher-income states are voting for the Democratic candidate.
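For readers who want to reproduce this kind of check on their own data, here is a minimal sketch of both correlations. The state-level numbers below are made up for illustration, not the actual 2004 figures:

```python
import numpy as np

# Illustrative only: invented state-level data.
median_income = np.array([38.0, 42, 45, 50, 47, 58, 61, 65])   # $K
pct_republican = np.array([62.0, 58, 55, 53, 56, 46, 44, 40])

def rank(a):
    # Ranks 0..n-1 (this toy data has no ties).
    order = a.argsort()
    r = np.empty_like(order)
    r[order] = np.arange(len(a))
    return r

# Pearson correlation of the raw values; Spearman is the
# Pearson correlation of the ranks.
r_pearson = np.corrcoef(median_income, pct_republican)[0, 1]
r_spearman = np.corrcoef(rank(median_income), rank(pct_republican))[0, 1]
```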

What about the individual level? Let's look at the exit polls.

| Income | R | D |
|--------|----|----|
| <$15K | 36% | 63% |
| $15-30K | 41 | 58 |
| $30-50K | 48 | 51 |
| $50-75K | 55 | 44 |
| $75-100K | 53 | 46 |
| $100-150K | 56 | 43 |
| $150-200K | 57 | 43 |
| >$200K | 62 | 37 |

So it looks like the paradox is alive and kicking. So do we still believe it's an aggregation problem? Is the paradox only alive in rural areas, but dead in urban areas? More to come...

In political science, there is an increasing availability of cross-country survey data, such as the Comparative Study of Electoral Systems (CSES, 33 countries) and the World Values Study (WVS, more than 70 countries). What is the best way to analyze data with this structure, especially when one suspects a great deal of heterogeneity across countries?

Cross-country survey data have a small number of countries relative to the number of observations in each country. This, of course, is the exact opposite of panel data. Methods such as random-effects logit or probit work well under the assumption that the number of countries goes to infinity and the number of observations in each country is small. In fact, the computational strategies (Gauss-Hermite quadrature and variants) are only guaranteed to work when the number of observations per country is small. Another useful technique, robust standard errors clustered by country, is also known to provide overconfident standard errors when the number of clusters (in our case, countries) is small. Bayesian multilevel models would work, but are we really worried about efficiency when we have more than 1000 observations per country?

Different people in the discipline have been suggesting a two step strategy. The first step involves estimating separate models for each country, obviously including only variables which vary within countries. Then one estimates a model for each coefficient as a function of contextual level variables (that are the main interest). Since the number of observations in each country is large, under standard assumptions the individual level estimates are consistent and asymptotically normal. We can take each of the individual level estimates to be a reduced form parameter of a fully interactive model.
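The two-step strategy can be sketched in a small simulation: fit a separate regression in each country, then regress the estimated coefficients on a contextual variable. Everything below (the number of countries, the sample sizes, the true coefficient values) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated survey data: 20 "countries", 1000 respondents each.
# The true individual-level slope varies with a country-level variable z.
n_countries, n_obs = 20, 1000
z = rng.normal(0, 1, n_countries)        # contextual variable
true_slopes = 0.5 + 0.8 * z              # slope = gamma0 + gamma1 * z

slope_hats = np.empty(n_countries)
for c in range(n_countries):
    x = rng.normal(0, 1, n_obs)          # individual-level predictor
    y = 1.0 + true_slopes[c] * x + rng.normal(0, 1, n_obs)
    # Step 1: separate OLS within each country.
    X = np.column_stack([np.ones(n_obs), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    slope_hats[c] = beta[1]

# Step 2: regress the estimated slopes on the contextual variable.
Z = np.column_stack([np.ones(n_countries), z])
gamma, *_ = np.linalg.lstsq(Z, slope_hats, rcond=None)
```

With 1000 observations per country, the first-stage estimates are precise enough that the second-stage regression recovers the country-level relationship well; a serious application would also carry the first-stage uncertainty into the second stage (e.g., via meta-analysis weights).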

The country level model might be estimated via ordinary least squares, or one of the various weighting schemes proposed in the meta-analysis literature (in addition to, of course, Bayesian Meta-Analysis). What are the potential problems and advantages of such approach? Here are the ones I can think of:

Advantages :

1) We don't need to specify a distribution for the individual-level estimates. That is, one need not assume that the "random effects" have, for example, a normal distribution. The coefficients are simply estimated from the data.

2) Computational Speed when compared to full MCMC methods.

3) There is some Monte Carlo evidence that the standard errors are closer to the nominal levels than under alternative strategies.

Disadvantages:

1) When fitting discrete-choice models (e.g., probit, logit) we need to worry about scale invariance: we estimate beta/sigma in each country, but we do not constrain sigma to be the same across countries. Any ideas on how to solve this problem?

2) Efficiency losses (which I think are minimal)

Further issues:

1) Does it have any advantages over a fully interactive regression model with, say, clustered standard errors? Or GEE? Relatedly, do we interpret the effects as in a regular conditional (i.e., random-effects) model?

2) Is it worrisome to fit a maximum-likelihood model in the first step and a Bayesian model in the second?

We (John Huber, Georgia Kernell, and Eduardo Leoni) took this approach in this paper, if you want to see an application. It is still a very rough draft; comments are more than welcome.

There are a bunch of methods floating around for estimating ideal points of legislators and judges. We've done some work on the logistic regression ("3-parameter Rasch") model, and it might be helpful to see some references to other approaches.

I don't have a unified theory of these models, and I don't really have any good reason to prefer one of them to another. Just a couple of general comments: (1) Any model that makes probabilistic predictions can be judged on its own terms by comparing those predictions to actual data. (2) When a model is multidimensional, the number of dimensions is a modeling choice. (In our paper we use one-dimensional models, but in any given application we would consider that just a starting point. More dimensions will explain more of the data, which is a good thing.) I do not consider the number of dimensions to be, in any real sense, a "parameter" to be estimated.
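As a toy illustration of point (1), here is a minimal simulation of a one-dimensional logistic ideal-point model (written here in the two-parameter logistic form; all dimensions and parameters are invented), checking the model's predicted vote rates against data it generated:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-dimensional logistic ideal-point model:
# Pr(legislator i votes yea on bill j) = logit^-1(alpha_j + beta_j * theta_i)
I, J = 100, 400                    # legislators, roll calls (made-up sizes)
theta = rng.normal(size=I)         # ideal points
alpha = rng.normal(size=J)         # bill "difficulty" parameters
beta = rng.normal(size=J)          # bill "discrimination" parameters

logits = alpha[None, :] + beta[None, :] * theta[:, None]
p = 1.0 / (1.0 + np.exp(-logits))  # predicted yea probabilities
votes = rng.binomial(1, p)         # simulated roll-call matrix

# The model makes probabilistic predictions, so it can be checked against
# observed data; here it is true by construction, so the rates agree.
print(p.mean(), votes.mean())
```

With real roll-call data one would compare fitted probabilities to observed votes in the same way, e.g. via calibration checks within bins of predicted probability.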

Now, on to the models.

Most of us are familiar with the Poole and Rosenthal model for ideal points in roll-call voting. The website has tons of data and some cool dynamic graphics.

For a nice overview of distance-based models, see Simon Jackman's webpage on ideal-point models. This page has a derivation of the model from first principles along with code for fitting it yourself.

Aleks Jakulin has come up with his own procedure for hierarchical classification of legislators using roll-call votes and has lots of detail and cool pictures on his website. He also discusses the connection of these measures to voting power.

Jan de Leeuw has a paper on ideal point estimation as an example of principal component analysis. The paper is mostly about computation but it has an interesting discussion of some general ideas about how to model this sort of data.

Any other good references on this stuff? Let us know.

In his paper, Homer Gets a Tax Cut: Inequality and Public Policy in the American Mind, Larry Bartels studies the mystery of why most Americans support repeal of the estate tax, even at the same time that they believe the rich should pay more in taxes.

For example, as described in Bartels's article for The American Prospect,

In the sample as a whole, almost 70 percent favored repeal [of the estate tax]. But even among people with family incomes of less than $50,000 (about half the sample), 66 percent favored repeal. . . . Among people who said that the difference in incomes between rich and poor has increased in the past 20 years and that it is a bad thing, 66 percent favored repeal. . . . Among people who said that the rich are asked to pay too little in federal income taxes (more than half the sample), 68 percent favored repeal. And, most remarkably, among those respondents sharing all of these characteristics -- the 11 percent of the sample with the strongest conceivable set of reasons to support the estate tax -- 66 percent favored repeal.

Bartels's basic explanation of this pattern is that most people are confused, and they (mistakenly) think the estate tax repeal will benefit them personally.

His explanation sounds reasonable to me, at least in explaining many people's preferences on the issue. But I wonder if ideology can explain some of it, too. If you hold generally conservative views of the economy and politics, then the estate tax might seem unfair--and holding this norm of fairness would be consistent with other views such as "the rich should have to pay their share." The point is that voters don't necessarily think in terms of total tax burden; rather, they can legitimately view each separate tax on its own and rate it with regard to fairness, effectiveness, etc.

Statistically, I'm envisioning a mixture model of different types of voters: some who support tax cuts for ideological reasons (I don't mean "ideological" negatively here--I'm just referring to people who tend to support tax cuts on principle), others who support the estate tax repeal because they (generally falsely) think it benefits them, and of course others who oppose repeal for various reasons. A mixture model might be able to separate these groups more effectively than can be done using simple regressions.
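As a sketch of what such a mixture model might look like, here is a two-type Bernoulli mixture fit by EM on simulated survey responses. The type share and item probabilities are invented for illustration, not estimates from Bartels's data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each respondent answers K binary items (support repeal, rich taxed too
# little, ...); item probabilities depend on a latent voter type.
n, K = 5000, 4
pi_true = 0.4                               # share of latent type 1
P_true = np.array([[0.9, 0.2, 0.8, 0.3],    # type 0: "ideological" profile
                   [0.7, 0.8, 0.3, 0.7]])   # type 1: "self-interest" profile
z = (rng.random(n) < pi_true).astype(int)   # latent type
y = (rng.random((n, K)) < P_true[z]).astype(int)

# EM for a two-component Bernoulli mixture.
pi = 0.5
P = rng.uniform(0.3, 0.7, size=(2, K))      # random starting values
for _ in range(200):
    # E-step: posterior probability that each respondent is type 1.
    ll = y @ np.log(P).T + (1 - y) @ np.log(1 - P).T   # n x 2
    w = np.array([1 - pi, pi]) * np.exp(ll)
    r = w[:, 1] / w.sum(axis=1)
    # M-step: update mixing proportion and item probabilities.
    pi = r.mean()
    P = np.vstack([((1 - r) @ y) / (1 - r).sum(),
                   (r @ y) / r.sum()])

print(pi)  # recovered mixing proportion (up to label switching)
```

In a real analysis one would let the type probabilities depend on covariates (income, partisanship) and handle the label-switching that any mixture model exhibits; the point is only that latent types with distinct response profiles are recoverable from multiple items.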
