Results matching “R”

Doing graphics the George Orwell way

I want to talk about some similarities between writing and statistical graphics. Just about everybody knows something about writing, and I'd like to help transfer some of this expertise to thinking about statistical graphics.

The story begins with some ugly pie charts I noticed the other day. I was commenting on them and suddenly realized . . . the graphs weren't as bad as I thought they were! To be more precise, the graphs had a lot of failings, but the sum total of all these problems wasn't so bad.

Here are the actual charts:

sanford1.PNG

sanford2.PNG

As I wrote earlier, these graphs have lots of obviously-fixable problems, most notably that the wedges aren't labeled directly. Instead, the reader has to go back and forth, back and forth, between the chart and the legend. On the other hand, the information is conveyed unambiguously.

I'd like to make the analogy to sloppy writing--misspellings, grammatical errors, sentence fragments and run-ons, garden-path sentences, distracting cliches, and all the rest. (All these "errors" can be used to good effect. No rule is absolute. For sure, baby. Much of the time, though, I think these really are mistakes rather than intentional use for emphasis or clarity.)

Why is sloppy writing a bad thing? For example, what's wrong with using "it's" instead of "its," or messing up subject-verb agreement, or losing track of an adverb's pointer, in a setting where the meaning is clear? The problem is that it creates work for your readers, who often have to double back to figure out the meaning. If you're Ezra Pound writing a poem, maybe you want to have that effect, but I don't think it's the goal of most journalists, news bloggers, etc.

OK, back to the pie charts. They could be worse, but they require a lot of work to read. Arguably, this criticism could be thrown at any graph: for example, I love line plots, but if you've never seen a line plot before, you'll struggle with it. The difference is that you can learn to read line plots, but you'll never be able to quickly read the pie charts shown above: no matter what, you have to go back and forth between the pie, the legend, the pie, the legend, and so forth, to keep it all in your mind at once.

To push the analogy further, I'm recommending what might be called the George Orwell approach to statistical graphics: the goal is to be clear as a window pane. This isn't the only option, though. There's the Chris Ware style: graphs that are tiny and nearly impossible to read, but if you stare at them for a long time you realize they actually make sense. Or the Martin Amis style: flashy gimmicks that make the graph fun to read even if you don't care so much about the subject. Or the Veronica Geng style: playing it straight while going over the top at the same time. And so forth.

I think some of the confusion that has arisen from Ed Tufte's work is that people read his book and then want to go make cool graphs of their own. But cool like Amis, not cool like Orwell. We each have our own styles, and I'm not trying to tell you what to do, just to help you look at your own writing and graphics so you can think harder about what you want your style to be.

P.S. Yes, yes, I'm sure I have various usage, grammatical, and stylistic errors above. Give me a break, man! It's just a blog entry. More to the point . . . by now you should trust me enough to think, when you see something discordant, that maybe I've done it on purpose!

P.P.S. Another issue is cost or effort. It wasn't necessarily worth it for Tom Schaller to learn a bunch of new graphical tools just to make his blog entry slightly easier to read. In my discussion above, I'm ignoring the investment in time required to think in terms of graphics and to learn the relevant software.

Ben Hyde pointed me to this data-based dating site. I have no comments on how it works for dates, but they have a lot of fun maps, for example this:

[Map: "Are some human lives worth more than others?" -- 268,864 people have answered]

And this:

[Map: "If you knew for sure you would not get caught, would you commit murder for any reason?" -- 359,761 people have answered]

This is great; I can't resist giving a couple more:

Progress

Going through the Profiles in Research published by the Journal of Educational and Behavioral Statistics, I was amused to see the following concluding paragraph in the interview with Lyle Jones:

Despite my [Jones's] strong preference for interval estimation, there are situations for which a test of significance still may be appropriate. One is multiple comparisons, such as comparisons between all pairs of states for average student achievement scale scores in NAEP [National Assessment of Educational Progress]. A related application is assessing the goodness of fit of a model to an array of values. In these cases, interval estimation is not easily employed and the careful application of significance tests may continue to serve about as well as any alternative.

No! Not at all! My paper with Jennifer and Masanao specifically shows how interval estimation (i.e., multilevel modeling) solves the NAEP comparisons problem just fine (setting aside the question of whether we should be interested in these state-level averages in the first place). It's good to know that some progress has been made since 2003.
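
To give a flavor of what I mean, here's a crude empirical-Bayes sketch in R of partial pooling for multiple comparisons, with made-up numbers standing in for the state-level estimates; the real analysis fits the full multilevel model, but the point is that every pairwise comparison comes with an interval, no significance tests required:

    # Toy partial pooling: shrink noisy group estimates toward the grand mean,
    # then compare any pair with an interval. All numbers are hypothetical.
    set.seed(7)
    J <- 8
    true_means <- rnorm(J, 0, 1)
    se <- rep(0.8, J)                              # sampling sd of each raw estimate
    y <- rnorm(J, true_means, se)                  # raw group estimates
    tau2 <- max(var(y) - mean(se^2), 0.01)         # crude between-group variance
    shrink <- tau2 / (tau2 + se^2)
    theta_hat <- mean(y) + shrink * (y - mean(y))  # partially pooled estimates
    theta_se <- sqrt(shrink * se^2)                # approx. posterior sd (mu, tau treated as known)
    # Approximate 95% interval for the difference between groups 1 and 2:
    diff_12 <- theta_hat[1] - theta_hat[2]
    diff_12 + c(-1, 1) * 1.96 * sqrt(theta_se[1]^2 + theta_se[2]^2)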

Hard sell for Bayes

Here's our estimate of public support for vouchers, broken down by religion/ethnicity, income, and state:

vouchermapsBAYES2000.png

(Click on image to see larger version.)

We're mapping estimates from a hierarchical Bayes model fit to data from the 2000 Annenberg survey (approximately 50,000 respondents).

In case you're wondering what Bayesian modeling did for us, here are the corresponding maps from the raw data (weighted to adjust for voter turnout, but that doesn't actually do that much anyway):

vouchermapsRAW2000.png

(Click on image to see larger version.)

OK, so Bayes gives you a lot. The costs?

Beta distribution explorer

Brendan O'Connor created a small applet that allows exploring the beta distribution interactively (just hit arrow keys on the keyboard):

beta_explorer.png

This is a good example of what interactive visualization can do - Andreas Buja was also showing some cool examples some time ago.

He also has source available (for Processing).
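
If you'd rather poke at the beta distribution from inside R than run the applet, here's a minimal (non-interactive) sketch that overlays a few densities; the particular shape parameters are just ones I picked:

    # Overlay beta densities for a few (a, b) pairs to see how the shape changes.
    x <- seq(0.001, 0.999, length.out = 501)
    params <- list(c(0.5, 0.5), c(1, 1), c(2, 5), c(5, 2), c(10, 10))
    plot(x, dbeta(x, 2, 5), type = "n", ylim = c(0, 4),
         xlab = "x", ylab = "density", main = "Beta(a, b) densities")
    for (i in seq_along(params)) {
      p <- params[[i]]
      lines(x, dbeta(x, p[1], p[2]), col = i, lwd = 2)
    }
    legend("topright", legend = sapply(params, function(p) paste0("a=", p[1], ", b=", p[2])),
           col = seq_along(params), lwd = 2, bty = "n")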

What with all this discussion of causal inference, I thought I'd rerun a blog entry from a couple years ago about my personal trick for understanding instrumental variables:

A correspondent writes:

I've recently started skimming your blog (perhaps steered there by Brad deLong or Mark Thoma) but despite having waded through such enduring classics as Feller Vol II, Henri Theil's "Econometrics", James Hamilton's "Time Series Analysis", and T.W. Anderson's "Multivariate Analysis", I'm finding some of the discussions such as Pearl/Rubin a bit impenetrable. I don't have a stats degree so I am thinking there is some chunk of the core curriculum on modeling and causality that I am missing. Is there a book (likely one of yours - e.g. Bayesian Data Analysis) that you would recommend to help fill in my background?

1. I recommend the new book, "Mostly Harmless Econometrics," by Angrist and Pischke (see my review here).

2. After that, I'd read the following chapters from my book with Jennifer:

Chapter 9: Causal inference using regression on the treatment variable

Chapter 10: Causal inference using more advanced models

Here are some pretty pictures, from the low-birth-weight example:

fig10.3.png

and from the Electric Company example:

fig23.1_small.png

3. Beyond this, you could read the books by Morgan and Winship and Pearl, but both these are a bit more technical and less applied than the two books linked to above.

The commenters may have other suggestions.

A student at another university writes in with some questions about Red State, Blue State:

court.png

To learn why I made this graph, see here.

Robin Hanson is skeptical of my response in the following exchange:

Hanson: What do the customers who are paying your salary get from you?

Gelman: They learn how to fit multilevel models.

Going beyond exchangeable models

Richard Hahn writes:

In some talk slides you recently posted you have the following bullet point: "Need to go beyond exchangeability to shrink batches of parameters in a reasonable way." If you think other readers of the blog might find it interesting, I'd love to see you elaborate on this. While the whole talk is, of course, an elaboration, you do not elsewhere explicitly mention exchangeability. Isn't the point of de Finetti-style theorems that exchangeability is precisely the "reasonable" assumption that leads to parametric models with nice conditional independence properties? Such results entail that we're at liberty to make sophisticated, highly structured models based on conditional independence with the knowledge that a set of exchangeability judgments on observables lies back of them. Even very flexible, fancy DP-based Bayesian nonparametric models are based on notions of exchangeable random partitions. I'm probably just misreading you, but would be very interested in a clarification about what exactly you mean. If not, at root, exchangeability, then what else exactly is driving the batch shrinkage and how is it not ad hoc?

My quick reply: Consider a two-way data structure modeled as y_ij = a_i + b_j + c_ij, with no other information on the rows, the columns, or the individual cells. Then you have no choice but to model the a_i's and b_j's exchangeably. But the c_ij's can be modeled conditional on the a_i's and b_j's--that is, these latent parameters can be considered as group-level predictors. The model is still exchangeable on the i's and the j's, but not on the (ij)'s. This is sometimes called "partial exchangeability." More generally, one can consider three-way models, etc.
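
Here's a small R simulation sketch of the structure I have in mind; the coefficient linking the interaction to the row and column effects is made up, just to show the idea:

    # Simulate y_ij = a_i + b_j + c_ij, where the interaction c_ij depends on the
    # row and column effects (here via their product, a hypothetical choice).
    set.seed(1)
    I <- 10; J <- 8
    a <- rnorm(I, 0, 1)                        # row effects, modeled exchangeably
    b <- rnorm(J, 0, 1)                        # column effects, modeled exchangeably
    gamma <- 0.5                               # interaction coefficient (assumed)
    c_ij <- gamma * outer(a, b) + matrix(rnorm(I * J, 0, 0.3), I, J)
    y <- outer(a, rep(1, J)) + outer(rep(1, I), b) + c_ij
    # In a fitted model, a_i and b_j are latent and enter as "group-level
    # predictors" for the c_ij's: exchangeable in i and in j, but not in (i,j).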

Daniel Egan sent me a link to an article, "Standardized or simple effect size: What should be reported?" by Thom Baguley, that recently appeared in the British Journal of Psychology. Here's the abstract:

It is regarded as best practice for psychologists to report effect size when disseminating quantitative research findings. Reporting of effect size in the psychological literature is patchy -- though this may be changing -- and when reported it is far from clear that appropriate effect size statistics are employed. This paper considers the practice of reporting point estimates of standardized effect size and explores factors such as reliability, range restriction and differences in design that distort standardized effect size unless suitable corrections are employed. For most purposes simple (unstandardized) effect size is more robust and versatile than standardized effect size. Guidelines for deciding what effect size metric to use and how to report it are outlined. Foremost among these are: (i) a preference for simple effect size over standardized effect size, and (ii) the use of confidence intervals to indicate a plausible range of values the effect might take. Deciding on the appropriate effect size statistic to report always requires careful thought and should be influenced by the goals of the researcher, the context of the research and the potential needs of readers.

Egan writes:

I run into the problem of reporting coefficients all the time, mostly in the context of presenting effects to non-statisticians. While my audiences are generally bright, the obvious question always asked is "which of these is the biggest effect?" The fact that a sex dummy has a large numerical point estimate relative to number-of-purchases is largely irrelevant - it's because sex's range is tiny compared to other covariates. But moreover, sex is irrelevant to "policy-making" - we can't change a person's sex! So what we're interested in is the viable range over which we could influence an independent variable, and the second-order likely effect upon the dependent. So two questions: 1. For pedagogical effect, is there any way of getting around these problems? How can we communicate the effects to non-statisticians easily (and think of someone who has exactly 10 minutes to understand your whole report)? 2. Is there any easy way to infer the elasticity of the effect - i.e. how much can we change the dependent by attempting to exogenously change one of the independents? While I know that I could design the experiment to do this, I work with far more observational data - and this "effect" size is really what matters the most.

My quick reply to Egan is to refer to my article with Iain Pardoe on average predictive comparisons, where we discuss some of these concerns.

I also have some thoughts on the Baguley article:

Work with me in Paris on a postdoc!

Among other things, while on sabbatical in Paris next year I'll be working with my longtime collaborator Frederic Bois, a toxicologist who uses hierarchical Bayes models extensively. We have a project in toxicology that necessarily also involves research in Bayesian computation.

And, there's a postdoctoral position available! Here are the details:

In the most recent round of our recent discussion, Judea Pearl wrote:

There is nothing in his theory of potential-outcome that forces one to "condition on all information" . . . Indiscriminate conditioning is a culturally-induced ritual that has survived, like the monarchy, only because it was erroneously supposed to do no harm.

I agree with the first part of Pearl's statement but not the second part (except to the extent that everything we do, from Bayesian data analysis to typing in English, is a "culturally induced ritual"). And I think I've spotted a key point of confusion.

To put it simply, Donald Rubin's approach to statistics has three parts:

1. The potential-outcomes model for causal inference: the so-called Neyman-Rubin model in which observed data are viewed as a sample from a hypothetical population that, in the simplest case of a binary treatment, includes y_i^1 and y_i^2 for each unit i.

2. Bayesian data analysis: the mode of statistical inference in which you set up a joint probability distribution for everything in your model, then condition on all observed information to get inferences, then evaluate the model by comparing predictive inferences to observed data and other information.

3. Questions of taste: the preference for models supplied from the outside rather than models inspired by data, a preference for models with relatively few parameters (for example, trends rather than splines), a general lack of interest in exploratory data analysis, a preference for writing models analytically rather than graphically, an interest in causal rather than descriptive estimands.

As that last list indicates, my own taste in statistical modeling differs in some ways from Rubin's. But what I want to focus on here is the distinction between item 1 (the potential outcomes notation) and item 2 (Bayesian data analysis).

The potential outcome notation and Bayesian data analysis are logically distinct concepts!

Items 1 and 2 above can occur together or separately. All four combinations (yes/yes, yes/no, no/yes, no/no) are possible:

- Rubin uses Bayesian inference to fit models in the potential outcome framework.

- Rosenbaum (and, in a different way, Greenland and Robins) use the potential outcome framework but estimate using non-Bayesian methods.

- Most of the time I use Bayesian methods but am not particularly thinking about causal questions.

- And, of course, there's lots of statistics and econometrics that's non-Bayesian and does not use potential outcomes.

Bayesian inference and conditioning

In Bayesian inference, you set up a model and then you condition on everything that's been observed. Pearl writes, "Indiscriminate conditioning is a culturally-induced ritual." Culturally-induced it may be, but it's just straight Bayes. I'm not saying that Pearl has to use Bayesian inference--lots of statisticians have done just fine without ever cracking open a prior distribution--but Bayes is certainly a well-recognized approach. As I think I wrote the other day, I use Bayesian inference not because I'm under the spell of a centuries-gone clergyman; I do it because I've seen it work, for me and for others.

Pearl's mistake here, I think, is to confuse "conditioning" with "including on the right-hand side of a regression equation." Conditioning depends on how the model is set up. For example, in their 1996 article, Angrist, Imbens, and Rubin showed how, under certain assumptions, conditioning on an intermediate outcome leads to an inference that is similar to an instrumental variables estimate. They don't suggest including an intermediate variable as a regression predictor or as a predictor in a propensity score matching routine, and they don't suggest including an instrument as a predictor in a propensity score model.

If a variable is "an intermediate outcome" or "an instrument," this is information that must be encoded in the model, perhaps using words or algebra (as in econometrics or in Rubin's notation) or perhaps using graphs (as in Pearl's notation). I agree with Steve Morgan in his comment that Rubin's notation and graphs can both be useful ways of formulating such models. To return to the discussion with Pearl: Rubin is using Bayesian inference and conditioning on all information, but "conditioning" is relative to a model and does not at all imply that all variables are put in as predictors in a regression.

Another example of Bayesian inference is the poststratification which I spoke of yesterday (see item 3 here). But, as I noted then, this really has nothing to do with causality; it's just manipulation of probability distributions in a useful way that allows us to include multiple sources of information.

P.S. We're lucky to be living now rather than 500 years ago, or we'd probably all be sitting around in a village arguing about obscure passages from the Bible.

A websearch turned up this link to our report on Jeff and Justin's research. It's great to see this stuff out there, but, really, "LGBTQI"? The way things are going, we'll be going through the whole alphabet soon! There's gotta be another way. Once you have "Q" in there, doesn't that pretty much cover all the contingencies?

To continue with our discussion (earlier entries 1, 2, and 3):

1. Pearl has mathematically proved the equivalence of Pearl's and Rubin's frameworks. At the same time, Pearl and Rubin recommend completely different approaches. For example, Rubin conditions on all information, whereas Pearl does not do so. In practice, the two approaches are much different. Accepting Pearl's mathematics (which I have no reason to doubt), this implies to me that Pearl's axioms do not quite apply to many of the settings that I'm interested in.

I think we've reached a stable point in this part of the discussion: we can all agree that Pearl's theorem is correct, and we can disagree as to whether its axioms and conditions apply to statistical modeling in the social and environmental sciences. I'd claim some authority on this latter point, given my extensive experience in this area--and of course, Rubin, Rosenbaum, etc., have further experience--but of course I have no problem with Pearl's methods being used on political science problems, and we can evaluate such applications one at a time.

2. Pearl and I have many interests in common, and we've each written two books that are relevant to this discussion. Unfortunately, I have not studied Pearl's books in detail and I doubt he's had the time to read my books in detail also. It takes a lot of work to understand someone else's framework, work that we don't necessarily want to do if we're already spending a lot of time and effort developing our own research programmes. It will probably be the job of future researchers to make the synthesis. (Yes, yes, I know that Pearl feels that he already has the synthesis, and that he's proved this to be the case, but Pearl's synthesis doesn't yet take me all the way to where I want to go, which is to do my applied work in social and environmental sciences.) I truly am open to the probability that everything I do can be usefully folded into Pearl's framework someday.

That said, I think Pearl is on shaky ground when he tries to say that Don Rubin or Paul Rosenbaum is making a major mistake in causal inference. If Pearl's mathematics implies that Rubin and Rosenbaum are making a mistake, then my first step would be to apply the syllogism the other way and see whether Pearl's assumptions are appropriate for the problem at hand.

3. I've discussed a poststratification example. As I discussed yesterday (see the first item here), a standard idea, both in survey sampling and causal inference, is to perform estimates conditional on background variables, and then average over the population distribution of the background variables to estimate the population average. Mathematically, p(theta) = sum_x p(theta|x)p(x). Or, if x is discrete and takes on only two values, p(theta) = (N_1 p(theta|x=1) + N_2 p(theta|x=2)) / (N_1 + N_2).

This has nothing at all to do with causal inference: it's straight Bayes.
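
For concreteness, here's a toy R version of that identity applied to posterior draws; the cell counts and the draws themselves are invented:

    # Poststratification: weight subgroup posterior draws by known cell sizes.
    N <- c(6000, 4000)                               # population counts for x = 1, 2
    theta_draws <- cbind(rnorm(5000, 0.40, 0.03),    # posterior draws, theta | x = 1
                         rnorm(5000, 0.55, 0.04))    # posterior draws, theta | x = 2
    theta_pop <- theta_draws %*% (N / sum(N))        # draws of the population average
    quantile(theta_pop, c(0.025, 0.5, 0.975))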

Pearl thinks that if the separate components p(theta|x) are nonidentifiable, you can't do this, and you should not include x in the analysis. He writes:

I [Pearl] would really like to see how a Bayesian method estimates the treatment effect in two subgroups where it is not identifiable, and then, by averaging the two results (with two huge posterior uncertainties) gets the correct average treatment effect, which is identifiable, hence has a narrow posterior uncertainty. . . . I have no doubt that it can be done by fine-tuned tweaking . . . But I am talking about doing it the honest way, as you described it: "the uncertainties in the two separate groups should cancel out when they're being combined to get the average treatment effect." If I recall my happy days as a Bayesian, the only operation allowed in combining uncertainties from two subgroups is taking a linear combination of the two, weighted by the (given) relative frequencies of the groups. But, I am willing to learn new methods.

I'm glad that Pearl is willing to learn new methods--so am I--but, no new methods are needed here! This is straightforward, simple Bayes. Rod Little has written a lot about these ideas. I wrote some papers on it in 1997 and 2004. Jeff Lax and Justin Phillips do it in their multilevel modeling and poststratification papers where, for the first time, they get good state-by-state estimates of public opinion on gay rights issues. No "fine-tuned tweaking" required. You just set up the model and it all works out. If the likelihood provides little to no information on theta|x but it does provide good information on the marginal distribution of theta, then this will work out fine.

In practice, of course, nobody is going to control for x if we have no information on it. Bayesian poststratification really becomes useful in that it can put together different sources of partial information, such as data with small sample sizes in some cells, along with census data on population cell totals.

Please, please don't say "the correct thing to do is to ignore the subgroup identity." If you want to ignore some information, that's fine--in the context of the models you are using, it might even make sense. But Jeff and Justin and the rest of us use this additional information all the time, and we get a lot out of it. What we're doing is not incorrect at all. It's Bayesian inference. We set up a joint probability model and then work from it. If you want to criticize the probability model, that's fine. If you want to criticize the entire Bayesian edifice, then you'll have to go up against mountains of applied successes.

As I wrote earlier, you don't have to be a Bayesian (or, I could say, you don't have to be a Bayesian--I have a great respect for the work of Hastie, Tibshirani, Robins, Rosenbaum, and many others who are developing methods outside the Bayesian framework)--but I think you're on thin ice if you want to try to claim that Bayesian analysis is "incorrect."

4. Jennifer and I and many others make the routine recommendation to exclude post-treatment variables from analysis. But, as both Pearl and Rubin have noted in different contexts, it can be a very good idea to include such variables--it's just not a good idea to include them as regression predictors. If the only thing you're allowed to do is regression (as in chapter 9 of ARM), then I think it's a good idea to exclude post-treatment predictors. If you're allowed more general models, then one can and should include them. I'm happy to have been corrected by both Pearl and Rubin on this one.

5. As I noted yesterday (see second-to-last item here), all statistical methods have holes. This is what motivates us to consider new conceptual frameworks as well as incremental improvements in the systems with which we are most familiar.

Summary . . . so far

I doubt this discussion is over yet, but I hope the above notes will settle some points. In particular:

- I accept (on authority of Pearl, Wasserman, etc.) that Pearl has proved the mathematical equivalence of his framework and Rubin's. This, along with Pearl's other claim that Rubin and Rosenbaum have made major blunders in applied causal inference (a claim that I doubt), leads me to believe that Pearl's axioms are in some way not appropriate to the sorts of problems that Rubin, Rosenbaum, and I work on: social and environmental problems that don't have clean mechanistic causation stories. Pearl believes his axioms do apply to these problems, but then again he doesn't have the extensive experience that Rosenbaum and Rubin have. So I think it's very reasonable to suppose that his axioms aren't quite appropriate here.

- Poststratification works just fine. It's straightforward Bayesian inference, nothing to do with causality at all.

- I have been sloppy when telling people not to include post-treatment variables. Both Rubin and Pearl, in their different ways, have been more precise about this.

- Much of this discussion is motivated by the fact that, in practice, none of these methods currently solves all our applied problems in the way that we would like. I'm still struggling with various problems in descriptive/predictive modeling, and causation is even harder!

- Along with this, taste--that is, working with methods we're familiar with--matters. Any of these methods is only as good as the models we put into them, and we typically are better modelers when we use languages with which we're more familiar. (But not always. Sometimes it helps to liberate oneself, try something new, and break out of the implicit constraints we've been working under.)

I visited AT&T Labs today--lots of fun, a great group of people, an interesting mix of statistics and machine learning. They showed me some cool visualizations that I'll display soon.

Anyway, while I was there, somebody asked me about voters with different educational levels. In discussing it, we realized we wanted to break this down by ethnicity and age. So I quickly prepared a grid of graphs for him.

On the train ride back, I spent a few minutes making the graphs prettier:

edu.png

These are based on raw Pew data, reweighted to adjust for voter turnout by state, income, and ethnicity. No modeling of vote on age, education, and ethnicity. I think our future estimates based on the 9-way model will be better, but these are basically OK, I think. All but six of the dots in the graph are based on sample sizes greater than 30.

To follow up on yesterday's discussion, I wanted to go through a bunch of different issues involving graphical modeling and causal inference.

Contents:
- A practical issue: poststratification
- 3 kinds of graphs
- Minimal Pearl and Minimal Rubin
- Getting the most out of Minimal Pearl and Minimal Rubin
- Conceptual differences between Pearl's and Rubin's models
- Controlling for intermediate outcomes
- Statistical models are based on assumptions
- In defense of taste
- Argument from authority?
- How could these issues be resolved?
- Holes everywhere
- What I can contribute

Philip Dawid (a longtime Bayesian researcher who's done work on graphical models, decision theory, and predictive inference) saw our discussion on causality and sends in some interesting thoughts, which I'll post here and then very briefly comment on:

Having just read through this fascinating interchange, I [Dawid] confess to finding Shrier and Pearl's examples and arguments more convincing than Rubin's. At the risk of adding to the confusion, but also in hope of helping at least some others, let me briefly describe yet another way (related to Pearl's, but with significant differences) of formulating and thinking about the problem. For those who, like me, may be concerned about the need to consider the probabilistic behaviour of counterfactual variables, on the one hand, or deterministic relationships encoded graphically, on the other, this provides an observable-focused, fully stochastic, alternative. A full presentation of the essential ideas can be found in Chapters 9 (Confounding and Sufficient Covariates) and 10 (Reduction of Sufficient Covariate) of my online document "Principles of Statistical Causality".

Like Pearl, I like to think of "causal inference" as the task of inferring what would happen under a hypothetical intervention, say F_E = e, that sets the value of the exposure E at e, when the data available are collected, not under the target "interventional regime", but under some different "observational regime". We could code this regime as F_E = idle. We can think of the non-stochastic variable F_E as a parameter, indexing the joint distribution of all the variables in the problem, under the regime indicated by its value.

Aleks was nice enough to pass this on to us.

Stuart Buck writes:

You posted about this once on your blog, i.e., how many times observational studies have been refuted by clinical trials. Check out the following, especially Table 3.

This Thursday at 7pm Jake Hofman and Suresh Velagapundi will present a session at the New York R Statistical Programming Meetup at NYU - Silver Center (100 Washington Square East, Room 401). Here's the outline:

Background:
  • Conditional probability & Bayes' Rule
  • Treating parameters as random variables & putting distributions on them
  • Bayesian inference: from priors & likelihoods to posteriors
From Principles to Practice:
  • Simple plan; difficult to execute (normalization)
  • Resort to approximation methods (variational & MCMC)
  • Model selection / complexity control a la Bayes

Greg Mankiw links to an article that illustrates the challenges of interpreting raw numbers causally. This would really be a great example for your introductory statistics or economics classes, because the article, by Robert Book, starts off by identifying a statistical error and then goes on to make a nearly identical error of its own! Fun stuff.

This is a pretty long one. It's an attempt to explore some of the differences between Judea Pearl's and Don Rubin's approaches to causal inference, and is motivated by a recent article by Pearl.

Pearl sent me a link to this piece of his, writing:

I [Pearl] would like to encourage a blog-discussion on the main points raised there. For example:

Whether graphical methods are in some way "less principled" than other methods of analysis.

Whether confounding bias can only decrease by conditioning on a new covariate.

Whether the M-bias, when it occurs, is merely a mathematical curiosity, unworthy of researchers' attention.

Whether Bayesianism instructs us to condition on all available measurements.

I've never been able to understand Pearl's notation: notions such as a "collider of an M-structure" remain completely opaque to me. I'm not saying this out of pride--I expect I'd be a better statistician if I understood these concepts--but rather to give a sense of where I'm coming from. I was a student of Rubin and have used his causal ideas for a while, starting with this article from 1990 on estimating the incumbency advantage in politics. I'm pleased to see these ideas gaining wider acceptance. In many areas (including studying incumbency, in fact), I think the most helpful feature of Rubin's potential-outcome framework is to get you, as a researcher, to think hard about what you are in fact trying to estimate. In much of the current discussion of identification strategies, regression discontinuities, differences in differences, and the like, I think there's too much focus on technique and not enough thought put into what the estimates are really telling you. That said, it makes sense that other theoretical perspectives such as Pearl's could be useful too.

To return to the article at hand: Pearl is clearly frustrated by what he views as Rubin's bobbing and weaving to avoid a direct settlement of their technical dispute. From the other direction, I think Rubin is puzzled by Pearl's approach and is not clear what the point of it all is.

I can't resolve the disagreements here, but maybe I can clarify some technical issues.

Controlling for pre-treatment and post-treatment variables

Much of Pearl's discussion turns upon notions of "bias," which in a Bayesian context is tricky to define. We certainly aren't talking about the classical-statistical "unbiasedness," in which E(theta.hat | theta) = theta for all theta, an idea that breaks down horribly in all sorts of situations (see page 248 of Bayesian Data Analysis). Statisticians are always trying to tell people, Don't do this, Don't do that, but the rules for saying this can be elusive. This is not just a problem for Pearl: my own work with Rubin suffers from similar problems. In chapter 7 of Bayesian Data Analysis (a chapter that is pretty much my translation of Rubin's ideas), we talk about how you can't do this and you can't do that. We avoid the term "bias," but then it can be a bit unclear what our principles are. For example, we recommend that your model should, if possible, include all variables that affect the treatment assignment. This is good advice, but really we could go further and just recommend that an appropriate analysis should include all variables that are potentially relevant, to avoid omitted-variable bias (or the Bayesian equivalent). Once you've considered a variable, it's hard to go back to the state of innocence in which that information was never present.

If I'm reading his article correctly, Pearl is making two statistical points, both in opposition to Rubin's principle that a Bayesian analysis (and, by implication, any statistical analysis) should condition on all available information:

1. When it comes to causal inference, Rubin says not to control for post-treatment variables (that is, intermediate outcomes), which seems to contradict Rubin's more general advice as a Bayesian to condition on everything.

2. Rubin (and his collaborators such as Paul Rosenbaum) state unequivocally that a model should control for all pre-treatment variables, even though including such variables, in Pearl's words, "may create spurious associations between treatment and outcome and this, in turn, may increase or decrease confounding bias."

Let me discuss each of these criticisms, as best as I can understand them. Regarding the first point, a Bayesian analysis can control for intermediate outcomes--that's ok--but then the causal effect of interest won't be summarized by a single parameter--a "beta"--from the model. In our book, Jennifer and I recommend not controlling for intermediate outcomes, and a few years ago I heard Don Rubin make a similar point in a public lecture (giving an example where the great R. A. Fisher made this mistake). Strictly speaking, though, you can control for anything; you just then should suitably postprocess your inferences to get back to your causal inferences of interest.

I don't fully understand Pearl's second critique, in which he says that it's not always a good idea to control for pre-treatment variables. My best reconstruction is that Pearl is thinking about a setting where you could estimate a causal effect in a messy observational setting in which there are some important unobserved confounders, and it could well happen that controlling for a particular pre-treatment variable happens to make the confounding worse. The idea, I think, is that if you have an analysis where various problems cancel each other out, then fixing one of these problems (by controlling for one potential confounder) could result in a net loss. I can believe this could happen in practice, but I'm wary of setting this up as a principle. I'd rather control for all the pre-treatment predictors that I can, and then make adjustments if necessary to attempt to account for remaining problems in the model. Perhaps Pearl's position and mine are not so far apart, however, if his approach of not controlling for a covariate could be seen as an approximation to a fuller model that controls for it while also adjusting for other, unobserved, confounders.

The sum of unidentifiable components can be identifiable

At other points, Pearl seems to be displaying a misunderstanding of Bayesian inference (at least, as I see it). For example, he writes:

For example, if we merely wish to predict whether a given person is a smoker, and we have data on the smoking behavior of seat-belt users and non-users, we should condition our prior probability P(smoking) on whether that person is a "seat-belt user" or not. Likewise, if we wish to predict the causal effect of smoking for a person known to use seat-belts, and we have separate data on how smoking affects seat-belt users and non-users, we should use the former in our prediction. . . . However, if our interest lies in the average causal effect over the entire population, then there is nothing in Bayesianism that compels us to do the analysis in each subpopulation separately and then average the results. The class-specific analysis may actually fail if the causal effect in each class is not identifiable.

I think this discussion misses the point in two ways.

First, at the technical level, yes you definitely can estimate the treatment effect in two separate groups and then average. Pearl is worried that the two separate estimates might not be identifiable--in Bayesian terms, that they will individually have large posterior uncertainties. But, if the study really is being done in a setting where the average treatment effect is identifiable, then the uncertainties in the two separate groups should cancel out when they're being combined to get the average treatment effect. If the uncertainties don't cancel, it sounds to me like there must be some additional ("prior") information that you need to add.
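
Here's a little R simulation of that cancellation, with made-up numbers: the average effect has a tight posterior, the group difference is essentially unidentified, and yet the weighted average of the (individually wide) subgroup draws comes back tight:

    # Non-identified subgroup effects whose weighted average is well identified.
    set.seed(123)
    n_sims <- 10000
    w1 <- 0.6; w2 <- 0.4                       # known subgroup proportions
    theta_avg <- rnorm(n_sims, 2.0, 0.1)       # identified: narrow posterior
    delta <- rnorm(n_sims, 0, 5)               # group difference: basically the prior
    theta1 <- theta_avg + w2 * delta           # subgroup effects, wide individually
    theta2 <- theta_avg - w1 * delta
    c(sd(theta1), sd(theta2))                  # large posterior uncertainties
    sd(w1 * theta1 + w2 * theta2)              # ~0.1: the uncertainties cancel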

The second way that I disagree with Pearl's example is that I don't think it makes sense to estimate the smoking behavior separately for seat-belt users and non-users. This just seems like a weird thing to be doing. I guess I'd have to see more about the example to understand why someone would do this. I have a lot of confidence in Rubin, so if he actually did this, I expect he had a good reason. But I'd have to see the example first.

Final thoughts

Hal Stern once told me the real division in statistics was not between the Bayesians and non-Bayesians, but between the modelers and the non-modelers. The distinction isn't completely clear--for example, where does the "Bell Labs school" of Cleveland, Hastie, Tibshirani, etc. fall?--but I like the idea of sharing a category with all the modelers over the years--even those who have not felt the need to use Bayesian methods.

Reading Pearl's article, however, reminded me of another distinction, this time between discrete models and continuous models. I have a taste for continuity and always like setting up my model with smooth parameters. I'm just about never interested in testing whether a parameter equals zero; instead, I'd rather infer about the parameter in a continuous space. To me, this makes particular sense in the sorts of social and environmental statistics problems where I work. For example, is there an interaction between income, religion, and state of residence in predicting one's attitude toward school vouchers? Yes. I knew this ahead of time. Nothing is zero, everything matters to some extent. As discussed in chapter 6 of Bayesian Data Analysis, I prefer continuous model expansion to discrete model averaging.

In contrast, Pearl, like many other Bayesians I've encountered, seems to prefer discrete models and procedures for finding conditional independence. In some settings, this can't matter much: if a source of variation is small, then maybe not much is lost by setting it to zero. But it changes one's focus, pointing Pearl toward goals such as "eliminating bias" and "covariate selection" rather than toward the goals of modeling the relations between variables. I think graphical models are a great idea, but given my own preferences toward continuity, I'm not a fan of the sorts of analyses that attempt to discover whether variables X and Y really have a link between them in the graph. My feeling is, if X and Y might have a link, then they do have a link. The link might be weak, and I'd be happy to use Bayesian multilevel modeling to estimate the strength of the link, partially pool it toward zero, and all the rest--but I don't get much out of statistical procedures that seek to estimate whether the link is there or not.

Finally, I'd like to steal something I wrote a couple years ago regarding disputes over statistical methodology:

Different statistical methods can be used successfully in applications--there are many roads to Rome--and so it is natural for anyone (myself included) to believe that our methods are particularly good for applications. For example, Adrian Raftery does excellent applied work using discrete model averaging, whereas I don't feel comfortable with that approach. Brad Efron has used bootstrapping to help astronomers solve their statistical problems. Etc etc. I don't think that Adrian's methods are particularly appropriate to sociology, or Brad's to astronomy--these are just powerful methods that can work in a variety of fields. Given that we each have successes, it's unsurprising that we can each feel strongly in the superiority of our own approaches. And I certainly don't feel that the approaches in Bayesian Data Analysis are the end of the story. In particular, nonparametric methods such as those of David Dunson, Ed George, and others seem to have a lot of advantages.

Similarly, Pearl has achieved a lot of success and so it would be silly for me to argue, or even to think, that he's doing everything all wrong. I think this expresses some of Pearl's frustration as well: Rubin's ideas have clearly been successful in applied work, so it would be awkward to argue that Rubin is actually doing the wrong thing in the problems he's worked on. It's more that any theoretical system has holes, and the expert practitioners in any system know how to work around these holes.

P.S. More here (and follow the links for still more).

This note by Steve Hsu on the history of the Wranglers (winners of a mathematics competition held each year from 1753-1909 at Cambridge University) reminded me of my experience in the U.S. math olympiad training program in high school. At the time, it seemed clear that we were ordered by ability (with my position somewhere between 15th and 20th out of 24!). In retrospect, I think there are a lot of tricks to solving and writing up solutions to "Olympiad problems," and I didn't know a lot of these tricks.

It was the usual paradox of measurement: I was confusing reliability with validity, as they say in the psychometric literature.

Daljit Dhadwal writes:

On the Ask Metafilter site, someone asked the following:

How does statistical analysis differ when analyzing the entire population rather than a sample? I need to do some statistical analysis on legal cases. I happen to have the entire population rather than a sample. I'm basically interested in the relationship between case outcomes and certain features (e.g., time, the appearance of certain words or phrases in the opinion, the presence or absence of certain issues). Should I do anything different than I would if I were using a sample? For example, is a p-value meaningful in this kind of case?

My reply:

This is a question that comes up a lot. For example, what if you're running a regression on the 50 states? These aren't a sample from a larger number of states; they're the whole population.

To get back to the question at hand, it might be that you're thinking of these cases as a sample from a larger population that includes future cases as well. Or, to put it another way, maybe you're interested in making predictions about future cases, in which case the relevant uncertainty comes from the year-to-year variation. That's what we did when estimating the seats-votes curve: we set up a hierarchical model with year-to-year variation estimated from a separate analysis. (Original model is here, later version is here.)

So, one way of framing the problem is to think of your "entire population" as a sample from a larger population, potentially including future cases. Another frame is to think of there being an underlying probability model. If you're trying to understand the factors that predict case outcomes, then the implicit full model includes unobserved factors (related to the notorious "error term") that contribute to the outcome. If you set up a model including a probability distribution for these unobserved outcomes, standard errors will emerge.
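
As a toy illustration of that second framing, here's what it looks like with an ordinary regression on all 50 states (the variables are invented); the standard errors come from the model's error term, not from any notion of sampling states out of a larger set:

    # Regression on the full "population" of 50 states: uncertainty comes from
    # the probability model for unobserved factors (the error term).
    set.seed(2)
    income  <- rnorm(50, 50, 8)                        # hypothetical state-level predictor
    outcome <- 0.3 * income + rnorm(50, 0, 2)          # error term = unobserved factors
    fit <- lm(outcome ~ income)
    summary(fit)$coefficients                          # standard errors emerge from the model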

Statistics on hiring statisticians

After finding the Howard Wainer interview, I looked up the entire series of Profiles in Research published by the Journal of Educational and Behavioral Statistics. I don't have much to say about most of these interviews: some of these people I'd never heard of, and I don't really have much research overlap with the others. Probably I have the most overlap with R. D. Bock, who's done a lot of work on multilevel modeling, but, for whatever reason, his stories didn't grab my interest.

But I was curious about the interview with Arthur Jensen. I've never met him--he gave a talk at the Berkeley statistics department once when I was there, but for some reason I wasn't able to attend the talk. But I've heard of him. As the interviewers (Daniel Robinson and Howard Wainer) state:

More on the median voter

A correspondent read my recent note on the limited influence of the median voter and writes:

My understanding of median voter theorem is that each election has its own median voter, and that the median voter's influence is limited to the outcome of that election only. I don't understand, then, why the graph in your post is evidence that the median voter has little influence. It seems to me that there are two elections being considered in that graph, with two different median voters. The graph appears to consider "moderation" to be having a moderate voting record in Congress, but it seems to me that the median voter in Congress is likely quite different from the median voter in any particular Congressional district. The power of the median voter in Congress, it seems to me, is to affect the outcome of Congressional votes, not to improve his own chances for re-election, which are determined by his proximity to the median voter in his district. Thus, I'm not sure why we would expect moderation, as measured by the median Congressional voter, to translate into electoral success, which we would expect to be determined by the median district voter.

My reply:

Should Mark Sanford resign?

At our sister blog, Tom Schaller says no:

Is Sanford a cad for bolting his family on Father's Day weekend? Of course, but that is a private, moral failing, rather than a failure of public duty. . . .

I [Schaller] oppose most of what Mr. Sanford stands for politically. His showy rejection of federal stimulus money targeted for his state was a crass publicity stunt designed to garner national attention for Mr. Sanford at the expense of his constituents, many of whom are struggling economically. . . . Should Mr. Sanford's ambitions founder on the shoals of a personal scandal, however, yet another opportunity will be lost to establish the long-overdue separation between private comportment and public service. So here's hoping he doesn't resign or, if he does, it is a matter of personal choice rather than him bowing to political pressure.

I see where Schaller is coming from. Lots of people have complicated personal lives, and it's not clear at all that these difficulties have much if anything to do with governing. But I don't know if I agree with him on the wall of separation between private comportment and public service.

Consider the Sanford case. Schaller's a Democrat, so he can evaluate Sanford on his policies. But if Schaller were a Republican, he might very well want Sanford out of there because he tarnishes the brand, makes the party a laughingstock, etc. Also makes it harder for Sanford to convincingly follow a "family values" agenda which Schaller (if he were a Republican) might want. These are legitimate concerns for a Republican to have. Even if you don't think Sanford's personal indiscretions are important, you might want him gone and replaced by a more effective Republican. Just as, from the other direction, a Democrat would've preferred a zipped-fly version of Bill Clinton.

Visualizing correlations circularly

Some time ago FlowingData had an article on visualizing tables - which really is about visualizing spreadsheets in terms of correlations between columns. While Circos generates very colorful displays:

circos.png

Today I was impressed by a much cleaner and Tuftier variant on the theme by Mike Bostock, called Dependency Tree:

dependency-tree.png

Click on the link, it's interactive. Jeff Heer and Bostock also have a new JavaScript visualization toolkit out, ProtoVis, which simplifies the creation of such stuff. The computer scientist in me finds this development very cool. But I still like my correlation matrices.

"A paved United States in our day"

Sometimes you hear discussion of how the red states get more from the government than they pay in taxes while the blue states get less and pay more. This is slightly misleading because the blue states are richer and rich people pay a higher rate of income tax, but it does raise the interesting question of the regionally distributive effects of national taxing and spending policies.

minimap.jpg

For some perspective on where this is coming from: In our office is a map from 1924 titled "Good Roads Everywhere" that shows a proposed system of highways spanning the country, "to be built and forever maintained by the United States Government." The map, made by the National Highways Association, also includes the following explanation for the proposed funding system: "Such a system of National Highways will be paid for out of general taxation. The 9 rich densely populated northeastern States will pay over 50 per cent of the cost. They can afford to, as they will gain the most. Over 40 per cent will be paid for by the great wealthy cities of the Nation. . . . The farming regions of the West, Mississippi Valley, Southwest and South will pay less than 10 per cent of the cost and get 90 per cent of the mileage." Beyond its quaint slogans ("A paved United States in our day") and ideas that time has passed by ("Highway airports"), the map gives a sense of the potential for federal taxing and spending to transfer money between states and regions.

P.S. Yes, I posted this last year, but without the pretty map image (click on it for higher resolution, which unfortunately still isn't quite good enough to make out the text).

The Howard Wainer story.

One of the fun parts is this story from his days as an assistant professor:

Casey Mulligan is consistent

Back in April, in an article about partisan perceptions of the economy, John Sides and I wrote:

A scary thought

A colleague and I were talking the other day about how much we pay our research assistants. It turns out that she pays much more. In fact, sometimes I don't get around to paying my research assistants at all, but she pays hers a decent amount.

My colleague, who's an untenured professor, said that was understandable because she makes less money than I do, so she can better relate to the students' lifestyles. That's a pretty scary thought--it should really go the other way, right? I get paid more so I should be able to afford to be more generous. But maybe she's right; if so, it's a sobering insight.

One major impediment, scientists agree, is the grant system itself. It has become a sort of jobs program, a way to keep research laboratories going year after year . . .

I was on an NIH panel a couple of years ago with about 25 other scientists, reviewing something like 90 grants. It was pointless. 25 people is just too many to make a decision. What happened was that there were 3 or 4 people who were experienced in the process, who ended up guiding the entire discussion.

The highlight--or, I should say, lowlight--was when we were reviewing a proposal involving the study of the carcinogenic effects of hookah (water pipe) smoking. I asked if this was really such a big deal, and one of the panel members told me that smoking tobacco through a hookah is something like 10 times worse than smoking a cigarette. If so, the public health consequences could be pretty serious, even if not so many people did it. I said this sounded like a reasonable point to me. Then this guy across the table from me spoke up and said that he knew somebody who was 80 years old, had been smoking with a hookah all his life and was none the worse from it. At this point, I blew up. I couldn't believe that the "my elderly aunt smokes and she didn't get cancer" argument could be brought up at an NIH panel!

Statistical Tests and Election Fraud

My final thoughts on those Iran vote analyses:

Our article (by Yu-Sung, Jennifer, Masanao, and myself, and based also on work with Kobi, Grazia, and Peter Messeri) will be appearing in the Journal of Statistical Software, in a special issue on missing-data imputation. Here's the abstract:

Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: flexible choice of predictors, models, and transformations for chained imputation models; binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data in one and two dimensions. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.

We've made lots of improvements since listing the package last year (here). There's still a lot more work to do, in many different directions (including multilevel models, nonignorable models, the self-cleaning oven, and making the program run faster in all sorts of ways), and we keep improving it. But it's good to have something out there.

To actually get the R package, just open your R window, click on Packages, Install packages, and grab mi.
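
If you prefer typing to clicking, something like the following should work; the exact workflow depends on which version of mi you have installed, so treat the commented calls as a sketch rather than gospel:

    install.packages("mi")
    library(mi)
    # df is your data frame with missing values:
    # imputations <- mi(df)              # older versions take a data frame directly
    # mdf <- missing_data.frame(df)      # newer versions wrap the data first,
    # imputations <- mi(mdf)             #   then run the chained imputations
    # summary(imputations)               # diagnostics for the fitted imputation models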

Pinchas Lev writes:

Sometimes people think it's a disaster when you have more predictors than data points, but I always point out that, no, it's better to have 9 predictors than just 1 or 2. After all, if you really wanted just 1 or 2, you could just throw out most of your data!

Nate's chart is excellent, especially the ordering of the candidates by the percent favoring resignation:

sanford2.PNG

I also like the gratuitous exclamation marks which add fun value without actually making the graph any harder to read. The key reason this works is that Nate wisely did not fill in the blank squares with "No!"s.

My only comments are:

Andrew Knight points me to this Kafkaesque report on Bayesian methods and evidence-based medicine. It's always good to see things like this out there.

My main disagreement with the report is on their framework in which there is a fixed data model and different choices of prior distribution. As we discuss in Section 2.8 of Bayesian Data Analysis, I much prefer the framework in which a single prior distribution (or "population distribution") is applied to many different data settings. I think that framing it my way makes the benefits of Bayesian inference much clearer.

I also don't like all the tables. But that's not really a Bayesian issue.

The roach-bombing puzzle

I've been assured, and I believe, that the effective way to get rid of the roaches in your apartment is to clean the place, put poison in the cracks, and then seal them. Some people do that. But a lot of people go for the "bombing" approach: the exterminator comes to the building once a month, drops the bomb, leaves, and comes back the next month.

My question is: what are these people thinking?? Why do these people willingly get bombed once a month instead of following the simpler and more effective approach? Part of this is ignorance, surely, but I think there's more to it than that, some underlying psychological appeal. I don't think it's just ignorance because, when I talk with people who get bombed and discuss the "clean, poison, and seal" approach, I've found them to be very resistant and (I would say) "defensive." They seem to want to believe that bombing is effective and really don't want to hear about alternative strategies.

What's going on? I have some theories. Maybe bombing seems like less effort than cleaning the food out of your closet and sealing the cracks. Also it seems sort of decisive. On the other hand, shouldn't people pause a little when they think about needing the exterminator every month? Yet, that doesn't seem to bother people. Conceptually, getting the exterminator to bomb your apartment feels to me a bit like "taking a pill." Maybe there's some technological appeal. Sort of like the way that photovoltaics are sexy in a way that passive solar isn't.

I don't know. I'll have to ask some psychologists of my acquaintance who work on environmental decision making.

Some election audits in California

Hall, J.L., L.W. Miratrix, P.B. Stark, M. Briones, E. Ginnold, F. Oakley, M. Peaden, G. Pellerin, T. Stanionis and T. Webber, 2009. Implementing Risk-Limiting Audits in California, USENIX EVT/WOTE, In press.

Related discussion here.

Donna Harrington writes:

I will be teaching a new multilevel models course in the fall and am currently reading your text, Data Analysis Using Regression and Multilevel/Hierarchical Models, as I prepare. I am enjoying the book and am considering adopting it for use in the course.

Would you be willing to share the syllabus you have used for your Applied Regression and Multilevel Models course? I am particularly interested in seeing how much of the book you use in a one semester course.

My reply:

I have to admit that, over the years, I've made my syllabuses less and less detailed as I've focused more and more on writing the books. For a multilevel modeling course, I suggested the following:

- chapters 3,4,5: linear and logistic regression
- chapter 7: basics of simulation
- chapter 9: basics of causal inference
- chapters 11-14: multilevel linear and logistic regression (up to and including varying-intercept, varying-slope models)
- chapter 18: all the theory that they'll need.

For a one-semester introductory course, my usual strategy is to focus on chapters 2-10: that is, cover everything except multilevel modeling. Linear regression, logistic, glm, computation, and causal inference. Then for the last part of the course, I can choose among some options, including: intro to multilevel models, sample size and power calculations, and missing data imputation.

P.S. To those of you who haven't had the opportunity to take a course from me: Don't worry about it. I'm better at writing than teaching. Maybe you're better off learning out of one of my books with somebody else actually teaching the class.

A political scientist writes:

Here's a question that occurred to me that others may also have. I imagine "Mister P" will become a popular technique to circumvent sample size limitations and create state-level data for various public opinion variables. Just wondering: are there any reasons why one wouldn't want to use such estimates as a state-level outcome variable? In particular, does the dependence between observations caused by borrowing strength in the multilevel model violate the independence assumptions of standard statistical models? Lax and Phillips use "Mister P" state-level estimates as a predictor, but I'm not sure if someone has used them as an outcome or whether it would be appropriate to do so.

First off, I love that the email to me was headed, "mister p question." And I know Jeff will appreciate that too. We had many discussions about what to call the method.

To get back to the question at hand: yes, I think it should be ok to use estimates from Mister P as predictor or outcome variables in a subsequent analysis. In either case, it could be viewed as an approximation to a full model that incorporates your regression of interest, along with the Mr. P adjustments.

I imagine, though, that there are settings where you could get the wrong answer by using the Mr. P estimates as predictors or as outcomes. One way I could imagine things going wrong is through varying sample sizes. Estimates will get pooled more in the states with fewer respondents, and I could see this causing a problem. For a simple example, imagine a setting with a weak signal, lots of noise, and no state-level predictors. Then you'd "discover" that small states are all near the average, and large states are more variable.
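To make that scenario concrete, here is a rough simulation sketch. Everything in it is made up--fifty "states," a weak true signal, lots of noise, and sample sizes of either 20 or 1000 per state--but it shows how the partially pooled estimates for the small states huddle near the grand mean while the raw means for those same states spread out:

library(lme4)

set.seed(123)
n_states <- 50
n_per_state <- sample(c(20, 1000), n_states, replace = TRUE)  # small and large states
true_effect <- rnorm(n_states, 0, 0.2)                        # weak signal
state <- factor(rep(1:n_states, times = n_per_state), levels = 1:n_states)
y <- rnorm(length(state), mean = true_effect[as.integer(state)], sd = 2)  # lots of noise

fit <- lmer(y ~ 1 + (1 | state))
pooled_est <- coef(fit)$state[, 1]   # partially pooled state estimates
raw_est <- tapply(y, state, mean)    # unpooled state means

# Average distance from the grand mean, by state sample size:
round(tapply(abs(pooled_est - fixef(fit)), n_per_state, mean), 3)
round(tapply(abs(raw_est - mean(y)), n_per_state, mean), 3)

Feed pooled_est into a second-stage regression and you would "discover" that small states sit near the average, which is exactly the artifact described above.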

Another way a problem could arise, perhaps, is if you have a state-level predictor that is not statistically significant but still induces a correlation. With the partial pooling, you'll see a stronger relation with the predictor in the Mr. P estimates than in the raw data, and if you pipe this through to a regression analysis, I could imagine you could see statistical significance when it's not really there.

I think there's an article to be written on this.

Robin Hanson writes:

In academia, one often finds folks who are much more (or less) smart and insightful than their colleagues, where most who know them agree with this assessment. Since academia is primarily an institution for credentialling folks as intellectually impressive, so that others can affiliate with them, one might wonder how such mis-rankings can persist.

I added the bold font myself for emphasis. Granted, Robin is far from a typical economist. Nonetheless, that he would write such an extreme statement without even feeling the need to justify it (and, no, I don't think it's true, at least not in the "academia" that I know about) . . . that I see as a product of being in an economics department.

P.S. Robin definitely is correct about the "more (or less) smart and insightful" bit. But here I think there are two things going on. First, in any group of people you'll see some variation, especially given that there are other factors going on than "smart and insightful" when it comes to selecting people in an academic environment. Second, there's more to life--even to academic life--than being smart and insightful. Even setting aside teaching, advising, administration, etc., some other crucial qualities for academic research include working hard, having the "taste" to work on important problems, intellectual honesty, and caring enough about getting the right answer. I know some very smart and insightful people who have not made the contributions that they are capable of, because (I think) of gaps in some of these other important traits.

My former Berkeley colleague Phil Stark has written a series of articles on election auditing which might be of interest to some of you. Here they are:

Stark, P.B., 2009. Auditing a collection of races simultaneously. Working draft.

Miratrix, L.W., and Stark, P.B., 2009. Election Audits using a Trinomial Bound. Submitted to IEEE Transactions on Information Forensics and Security: Special Issue on Electronic Voting.

Stark, P.B., 2009. Risk-limiting post-election audits: P-values from common probability inequalities. Submitted to IEEE Transactions on Information Forensics and Security: Special Issue on Electronic Voting.

Stark, P.B., 2009. CAST: Canvass Audits by Sampling and Testing. Submitted to IEEE Transactions on Information Forensics and Security: Special Issue on Electronic Voting.

Stark, P.B., 2008. A Sharper Discrepancy Measure for Post-Election Audits. Annals of Applied Statistics, 2, 982-985.

Stark, P.B., 2008. Conservative Statistical Post Election Audits. Annals of Applied Statistics, 2, 550-581.

Phil has an interesting background: he got into statistics after working on inverse problems in geology. The methods he uses are based on exact error bounds, really much different than the Bayesian stuff I do, much more focused on getting conservative p-values and the like. As a result, the things he does in his papers are nothing at all like what I would do in these problems.

In a larger sense, though, I believe in methodological pluralism, and I'm glad to see a researcher such as Phil, who's working from a statistical framework so different from mine, work on these problems.

Update here and data here. I haven't looked at this in detail, but Walter Mebane is the expert on this stuff so I'm inclined to believe him. Even though he uses tables instead of graphs.

Again, just to emphasize: this sort of statistical analysis doesn't prove anything by itself, but it can be useful in giving people a sense of where to focus attention if they want to look further.

How will support for same-sex marriage change over time? One way to speculate is to break down current support across age groups, and that's what Justin and I have done, building off of our forthcoming paper.

We plot explicit support for allowing same-sex marriage broken down by state and by age. Seven states cross the 50% mark overall as of our current estimates, but the generation gap is huge. If policy were set by state-by-state majorities of those 65 or older, none would allow same-sex marriage. If policy were set by those under 30, only 12 states would not allow same-sex marriage.

marriagebyage.png

Of Beauty, Sex, and Power

Our article has appeared in The American Scientist. (Here's a link to the full article; hit control-plus to make the font more readable.) I highly recommend it for your introductory (or advanced) statistics classes. We start with a silly story of a flawed statistical analysis of sex ratios that managed to sneak into a serious scientific journal, then discuss general issues of how to interpret inconclusive statistical findings (including a brief analysis of data from People Magazine's 50 Most Beautiful People lists), and then loop back and discuss the statistical reasons that exaggerated claims can get amplified by the news media.

20096592237373-2009-07GelmanF4.jpg

The article begins as follows:

Alex Scacco and Bernd Beber follow up on their analysis of the Iran election data:

After we wrote our op-ed using the province-level data, we've now also done some preliminary tests with the county-level data. In the latter dataset, the last digits don't appear fraudulent. Why might we find suspicious last digits at the province level, while, at the same time, Walter Mebane and Boudewijn Roukema find evidence that first and second digits are fishy at the county level?

We can only speculate about what happened behind closed doors, but here is a scenario of top-down fraud that is consistent with the patterns found in the quantitative analyses mentioned above:

Greg Mankiw writes:

The next time you hear someone cavalierly point to international comparisons in life expectancy as evidence against the U.S. healthcare system, you should be ready to explain how schlocky that argument really is.

He points to the following claim by Gary Becker:

National differences in life expectancies are a highly imperfect indicator of the effectiveness of health delivery systems. For example, life styles are important contributors to health, and the US fares poorly on many life style indicators, such as incidence of overweight and obese men, women, and teenagers. To get around such problems, some analysts compare not life expectancies but survival rates from different diseases. The US health system tends to look pretty good on these comparisons.

Becker cites a study that finds that the U.S. does better than Europe in cancer survival rates and in the availability of hip and knee replacements and cataract surgery.

It makes a lot of sense to think of health as multidimensional, so that some countries can do better in life expectancy while others do better in hip replacements and cancer survival.

But I disagree with Mankiw's claim that it's "schlocky" to compare life expectancy. If the U.S. really is spending lots more per person on health care and really getting less in life expectancy compared to other countries . . . that seems like relevant information.

The Devil is in the Digits

Bernd Beber and Alex Scacco present another quantitative analysis of the Iranian election data, this time looking at last digits. They write:

[Suspicions of fraud] have led experts to speculate that the election results released by Iran's Ministry of the Interior had been altered behind closed doors. But we don't have to rely on suggestive evidence alone. We can use statistics more systematically to show that this is likely what happened. Here's how.

We'll concentrate on vote counts -- the number of votes received by different candidates in different provinces -- and in particular the last and second-to-last digits of these numbers. For example, if a candidate received 14,579 votes in a province (Mr. Karroubi's actual vote count in Isfahan), we'll focus on digits 7 and 9.

I want to explore the distinction between self-experimentation and formal experimentation in the context of a recent discussion on Seth's blog.

The story begins with two people who found, via self-experimentation, how to make their acne go away:

- A student . . . had gone on a camping trip and found that her acne went away. At first she thought it was the sunshine; but then, by self-experimentation, she discovered that the crucial change was that she had stopped using soap to wash her face.
- A friend of Seth writes: "I started 'washing' my face with water about a month ago, and [now] my face is acne free and soft as a pair of brand new UGG boots." [He had had acne for years.]

In the comments section, someone writes:

While it would be nice to think that all we have to do to get rid of acne is stop using those expensive cleanser and just use water - this is just anecdotal evidence you present. It would require a large clinical trial to be conclusive.

Seth replies that informal experimentation is cheaper and faster than more formal clinical trials. Also, different things might work for different people, so whether or not a treatment has been evaluated in a large study, it might make sense to test it yourself--especially for something such as acne or weight loss that is not an urgent concern.

This got me thinking . . . what are the benefits (if any) of a formal controlled trial? In statistics, we usually frame these benefits by comparing to observational studies. The big risk in an observational study is that the treatment and control groups will differ in important ways (as in the famous hormone replacement therapy story). Is this worth the cost? Maybe. Sometimes.

A related issue is bias, a word which I am using in the conversational rather than the statistical sense. For example, how would you want to evaluate the risks and effectiveness of a new drug that was developed by a pharmaceutical company at the cost of millions of dollars? I'd be suspicious of an observational study: even if conducted by professionals, there just seem to be too many ways for things to be biased.

In Seth's acne example, there is no financial source of bias. And, as Seth points out, the test is free to apply on yourself. If I had a kid with acne, I'd give it a try and do an experiment--which means trying the soap and no-soap conditions on different days (or different weeks, or months) and measuring and recording acne levels. One thing I've gathered from Seth's work is that there are big benefits to be gained by doing self-experimentation with careful measurement and record keeping, rather than simply trying different things and trying to remember what works.

On the other hand, yeah, I'm skeptical about Seth's acne claims, and I think a larger study would be more likely to convince me. But I don't think it would have to be expensive. All Seth (or somebody) needs is to set up a protocol for deciding when to wash with soap or water and a protocol for measuring acne, then he could get a bunch of volunteers to flip coins and try it. This blog has a few thousand readers, and Seth's diet forum has thousands of participants, so it shouldn't be so hard to find people to do this. I'm not so interested in acne myself, but according to Seth (and others, I assume), "acne really matters," so maybe it's worth giving this a try.
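For anyone who actually wants to run it, the randomization and record-keeping fit in a few lines of R. The four-week length, the weekly blocks, and the 0-10 rating scale are arbitrary choices of mine, not Seth's protocol:

set.seed(1)
days <- seq(as.Date("2009-07-01"), by = "day", length.out = 28)
week <- rep(1:4, each = 7)
# Randomize whole weeks (two weeks of each condition, in random order),
# on the assumption that skin responds over days rather than hours.
condition <- rep(sample(rep(c("soap", "water only"), 2)), each = 7)
acne_log <- data.frame(date = days, week = week, condition = condition, acne_score = NA)
write.csv(acne_log, "acne_log.csv", row.names = FALSE)

# Each evening, record acne_score on a fixed 0-10 self-rating scale; at the end:
# aggregate(acne_score ~ condition, data = acne_log, FUN = mean)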

This is not meant as a put-down of Roth; I think Marquand is great.

P.S. Is this what a Twitter post is like? Basically, I'm too lazy to back up my statement here with evidence. But I think that any of you out there who've read both Roth and Marquand will agree upon a moment's reflection that I'm right.

P.P.S. When I'm talking about Marquand, I exclude the way overrated The Late George Apley. I'm talking about real Marquand books like Point of No Return, Wickford Point, etc.

The Tourist

Good airplane reading. Lived up to the reviews.

The American Statistical Association awarded the 2009 Excellence in Statistical Reporting Award to Sharon Begley of Newsweek. From the official announcement:

The above remark, which came in the midst of my discussion of an analysis of Iranian voting data, illustrates a gap--nay, a gulf--in understanding between statisticians and (many) nonstatisticians, one of whom commented that my quote "makes it sound that [I] have not a shred of a clue what a p-value is."

Perhaps it's worth a few sentences of explanation.

Benford's law is an amusing mathematical pattern in which the first digits of randomly sampled numbers tend to have a distribution in which 1 is the most common first digit, followed by 2, then 3, and so forth. It's the distribution of digits that arises from numbers that are sampled uniformly on a logarithmic scale.
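For reference, the first-digit probabilities are log10(1 + 1/d), and a few lines of R confirm both the formula and the uniform-on-a-log-scale story (a generic sketch, not tied to any particular dataset):

d <- 1:9
benford <- log10(1 + 1/d)   # P(first digit = d)
round(benford, 3)
# 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046

# Numbers sampled uniformly on the log scale reproduce the pattern:
x <- 10^runif(1e5, min = 0, max = 6)   # spread over six orders of magnitude
first_digit <- floor(x / 10^floor(log10(x)))
round(table(first_digit) / length(x), 3)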

In our Teaching Statistics book, Deb and I describe a classroom demonstration where we show how Benford's law applies to street addresses sampled randomly from the telephone book. In a more serious vein, Walter Mebane has written about the application of Benford's law to vote counts.

In the past several days, a few people have asked me about applying these ideas to the recent Iranian election. Today, someone pointed me to an article by Boudewijn Roukema, which states:

The results of the 2009 Iranian presidential election presented by the Iranian Ministry of the Interior (MOI) are analysed based on Benford's Law and an empirical variant of Benford's Law. The null hypothesis that the vote count distributions satisfy these distributions is rejected at a significance of p < 0.007, based on the presence of 41 vote counts for candidate K that start with the digit 7, compared to an expected 21.2-22 occurrences expected for the null hypothesis. A less significant anomaly suggested by Benford's Law could be interpreted as an overestimate of candidate A's total vote count by several million votes. Possible signs of further anomalies are that the logarithmic vote count distributions of A, R, and K are positively skewed by 4.6, 5.8, and 2.5 standard errors in the skewness respectively, i.e. they are inconsistent with a log-normal distribution with p ~ 4 × 10^-6, 7 × 10^-9, and 1.2 × 10^-2 respectively. M's distribution is not significantly skewed.

I don't buy it. First off, the whole first-digit-of-7 thing seems irrelevant to me. Second, the sample size is huge, so a p-value of 0.007 isn't so impressive. After all, we wouldn't expect the model to really be true with actual votes. It's just a model! Finally, I don't see why we should be expecting distributions to be lognormal.
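To illustrate the sample-size point with made-up numbers (nothing here comes from the actual returns): take a digit distribution that differs from Benford by a fraction of a percentage point per digit, and watch what a goodness-of-fit test does as the number of observations grows.

benford <- log10(1 + 1/(1:9))
tweak <- c(0.004, -0.002, 0.001, -0.001, 0.001, -0.001, 0, -0.001, -0.001)
nearly_benford <- benford + tweak   # still sums to 1, differs only trivially

for (n in c(300, 30000, 3e6)) {
  counts <- round(n * nearly_benford)
  print(chisq.test(counts, p = benford, rescale.p = TRUE)$p.value)
}
# The departure is identical in each case; only the p-value changes.

With a few hundred counts the test sees nothing; with millions, the same trivial discrepancy is "significant" at any level you like.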

Maybe there's something I'm missing here, but that's my quick take. This is not to say that I think the election was fair, or rigged, or whatever--I have absolutely zero knowledge on that matter--just that I don't find this analysis convincing of anything. I will say, though, that Roukema deserves credit for presenting the analysis clearly.

P.S. In response to comments: let me emphasize that I'm not saying that I think nothing funny was going on in the election. As I wrote, I'm commenting on the statistics; I don't know the facts on the ground. To move my comments in a more constructive direction (I hope), let me pull out this useful comment from Roukema's article: "One possible method to test whether this is just an odd fluke would be to check the validity of the vote counts for candidate K in the voting areas where the official number of votes for K starts with the digit 7." Further investigation could be a good thing here.

I did not find Roukema's argument convincing; that does not mean that I consider it a bad thing that the article was written. The article is a first draft of an analysis; it might end up leading to nothing, or it might be unconvincing as it stands now but lead to some important breakthroughs. We can see what further analysis turns up. Again, my verdict is not a Yes or a No, it's an "I'm not convinced."

The defining values

From Flat Earth News:

You could argue that every profession has its defining value. For carpenters, it might be accuracy: a carpenter who isn't accurate shouldn't be a carpenter. For diplomats, it might be loyalty: they can lie and spy and cheat and pull all sorts of dirty tricks, and as long as they are loyal to their government, they are doing their job. For journalists, the defining value is honesty--the attempt to tell the truth. That is our primary purpose. All that we do--all that is said about us--must flow from the single source of truth-telling.

What is the defining value of statisticians?

P.S. My favorite of the responses below is Mike Anderson's:

Separate the signal from the noise, then look at the noise for more signals.

I like this because (a) it acknowledges the presence of "noise" (that is, variation) but (b) recognizes that the "signal" is what's most important.

Mandelbrot on taxonomy

Taxonomies are fractal: at any node there are some number of branches (typically one or two major branches and several minor ones). Here's a fascinating article by Benoit Mandelbrot from 1955 on models of taxonomic structures. Great stuff. The article was published in Information Theory--3rd London Symposium, ed. Colin Cherry, and is hard to find online. At least it was until now.

mandelbrot2.png

As part of our Red State, Blue State research, we developed statistical tools for estimating public opinion among subsets of the population. Recently Yu-Sung Su, Yair Ghitza, and I applied these methods to see where school vouchers are more or less popular.

We started with the 2000 National Annenberg Election Survey, which had responses from about 50,000 randomly sampled Americans to the question: "Give tax credits or vouchers to help parents send their children to private schools--should the federal government do this or not?" 45% of those who expressed an opinion on this question said yes, but the percentage varied a lot by state, income level, and religious/ethnic group. These maps show our estimates:

vouchermaps2000A.png

(Click on image to see larger version.)

Vouchers are most popular among high-income white Catholics and Evangelicals and low-income Hispanics. In general, among white groups, the higher the income, the more popular are school vouchers. But among nonwhites, it goes the other way, with vouchers being popular in the lower income categories but then becoming less popular among the middle class.

You can also see that support for vouchers roughly matches Republican voting, but not completely. Vouchers are popular in the heavily Catholic Northeast and California, less so in many of the mostly Protestant states in the Southeast. We also see a regional pattern among African Americans, where vouchers are most popular outside the South.

We checked our results by fitting the same model to the Annenberg survey from 2004, and, much to our relief, we found similar patterns:

The only thing that puzzles me about this article (sent to me by Chris Wiggins) is that at first it's presented as new: "The trend is buried deep in United States census data . . ." A couple paragraphs down, the article explains that these patterns were published last year by Lena Edlund and Doug Almond (who presented the results in our quantitative political science seminar). In any case, it's an excellent news article and discusses the issues well. The only thing I'd like to see is some sample sizes, so that students who are given this article to read can compute the standard errors on their own.

Also, I have a couple problems with their graph. First, I'm not a fan of expressing sex ratios as #boys per 100 girls. To me, it's clearer just to give %girls (or %boys) as a straight number: 48.8% or whatever. Second, it's a mistake to present these as bar graphs starting at zero. Here, zero is not a reasonable baseline: it's not like you're really expecting to see zero girl births. I appreciate that they were trying to make a pretty graph, but in this case I'd go with a simple dot plot with +/- 1 standard error bars on the points. Or, better still, a line plot with time on the x-axis (one point for each decade), with lines connecting the dots for each ethnic group and vertical lines indicating standard errors.
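Here is roughly what that last option looks like in R, with invented numbers standing in for the census figures (two hypothetical groups, three decades):

decades <- c(1980, 1990, 2000)
pct_girls <- rbind(group_A = c(48.7, 48.5, 48.2),   # hypothetical values, not the census data
                   group_B = c(48.8, 48.8, 48.7))
se <- matrix(0.15, nrow = 2, ncol = 3)

matplot(decades, t(pct_girls), type = "b", pch = 19, lty = 1,
        xlab = "Decade", ylab = "Percent girls at birth",
        ylim = range(pct_girls - se, pct_girls + se))
# +/- 1 standard error bars instead of bars anchored at zero:
segments(rep(decades, each = 2), as.vector(pct_girls - se),
         rep(decades, each = 2), as.vector(pct_girls + se))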

Line plots are the best, and it's great when you can put time on the x-axis.

"A fondness for collecting a salary and getting away with as little intellectual intercourse as possible is endemic to the academic world." Not just the academic world, I think. Working is hard work. That's why they call it work. On the other hand, I'm doing this for free.

This issue reminds me of a discussion that's sometimes come up about a well-known listserv participant who is (a) very helpful, and (b) very rude. Or maybe I'm exaggerating a bit: this person is (a) often helpful, and (b) often rude. Anyway, I've always maintained that, rudeness aside, this person is altruistic, providing free statistical help to strangers. But it's true that answering listserv questions isn't intellectually taxing. Sort of like writing this blog, it's work-like without usually quite being work.

P.S. I think the point is best made by keeping the listserv and its well-known participant anonymous.

Fred Bookstein was at my talk in Seattle on voting power (the relevant articles are here and here) but didn't get a chance to ask a question, so he's asking it now:

Why is voting power considered a "good" in all those models? What is good about it? With what generally shared human desiderata, if any, is it associated?

As the saying goes, everybody wants to go to heaven but nobody wants to die. Or, to put in political terms, people want lower taxes and more government services--with the gap filled, presumably, with a mixture of borrowed funds and savings realized by cutting government waste. In their new book "Class War? What Americans Really Think about Economic Inequality," Benjamin Page and Lawrence Jacobs put together survey data and make a convincing case that this cynical story is not a fair summary of public opinion in the United States. Actually, most Americans--Democrats and Republicans alike--support government intervention in health care, education, and jobs, and are willing to pay more in taxes for these benefits.

Page and Jacobs recognize that Americans are confused on some of these issues, for example not realizing that sales taxes cost lower-income people more, as a percentage of their earnings, while the personal income tax hits higher-income groups more, on average. The result is widespread confusion about what are the most effective ways to pay for government spending. People are also confused about how to cut the budget. To choose a well-known example that is not in the book at hand, Americans overwhelmingly support reducing the share of the federal budget that goes to foreign aid, but they also vastly overestimate the current share of the budget that goes to this purpose (average estimate of 15%, compared to an actual value of 0.3%).

Confusions on specific tax and budget items aside, Page and Jacobs are persuasive that majority public opinion is consistent with tax increases targeted to specific government programs aimed at bringing a basic standard of living and economic opportunity to all Americans. They discuss how survey respondents generally feel that such an expansion of the role of government is consistent with generally expressed free-market attitudes, a philosophy which they call "conservative egalitarianism."

This is a book of public opinion, not policy, and the authors offer no judgment on whether the public's majority preference is achievable. For example, a vast majority of Americans--including 80% of Republicans--feel that "Government should spend whatever is necessary to ensure that all children have really good public schools they can go to" (p. 59), and another clear majority--this time including 60% of Republicans--agree with the statement that "The government in Washington ought to see to it that everyone who wants to work can find a job" (p. 62). It is an open question whether these goals are possible given the tax increases that voters are willing to accept.

Carl Klarner writes:

I'm currently doing work on state legislative elections that uses Democratic success as the dependent variable. I do these analyses with either the percent of the two-party vote for the Democrat as Y, or a dichotomous measure of a Democratic victory as Y.

Jeff Lax and Justin Phillips posted this summary of attitudes on a bunch of gay rights questions:

gay.png

They did it all using multilevel regression and poststratification. And a ton of effort.

P.S. My only criticisms of the above graph are:

(a) I'd just put labels at 20%, 30%, 40%, etc. I think the labels at 25, 35, etc., are overkill and make the numbers harder to read. And the tick marks should be smaller.

(b) The use of color and the legend on the upper left are well done. But they should place the items in the legend in the same order as the averages in the graphs. Thus, it should be same-sex marriage, then second-parent adoption, then civil unions, then health benefits, and so forth. (A rough sketch of both tweaks is below.)
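Both tweaks are one-liners in base R. This is a deliberately stripped-down sketch with two invented series over twenty invented states, not a reconstruction of the actual figure:

set.seed(7)
civil_unions <- sort(runif(20, 40, 75))                # hypothetical state-level support
same_sex_marriage <- civil_unions - runif(20, 8, 18)   # lower on average

plot(1:20, civil_unions, pch = 19, col = "blue", ylim = c(20, 80),
     xlab = "States (ordered)", ylab = "Percent support", yaxt = "n")
points(1:20, same_sex_marriage, pch = 19, col = "red")
# (a) labels only at 20, 30, 40, ..., with shorter tick marks:
axis(2, at = seq(20, 80, 10), labels = paste(seq(20, 80, 10), "%", sep = ""), tcl = -0.25)
# (b) legend entries in the same order as the series' averages:
legend("topleft", legend = c("civil unions", "same-sex marriage"),
       col = c("blue", "red"), pch = 19, bty = "n")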

As I noted a couple days ago, gay marriage has had the largest recent increases in popularity in liberal states where the general population was already pro-gay.

But if you count the number of same-sex couples, you see something different, with the fastest increases in conservative areas of the country. Gary Gates writes:

You discussed the issue of social networks and knowing gay people as a possible explanation. You might want to look at some of the work of Greg Herek (a psychologist at UC-Davis) who is now saying that "knowing" someone is becoming a much less salient predictor of support for gay rights. Since nearly everyone now knows a gay person, he claims that the issue today is more whether or not you have a closer personal relationship with a gay person.

Your findings were also intriguing to me when comparing them to some of the work I [Gates] have done on the enumeration of same-sex couples in the US Census and the American Community Survey. This paper looks at changes in the counts over time.

I [Gates] find the largest changes (which I interpret as increased visibility of same-sex couples) in the most conservative parts of the country.

I looked at Gates's report and it looks like good stuff. It would definitely be a good idea to reconcile his findings of the largest increases in conservative parts of the country, with Lax and Phillips's findings that public opinion on gay marriage has changed fastest in liberal states.

Bizarre bumper sticker

I saw this one today, can't figure it out:

"Don't take away my rights because you won't control your child"

What is this, the right to punch somebody else's kids?? I can't imagine somebody exercising that particular right very often before getting hurt.

It's a funny thing: we typically think of bumper sticker slogans as being simplistic, but in this case it appears to be the opposite: the compression of an idea into a short phrase has made it incomprehensible to outsiders such as myself. Or maybe that's the point. I wouldn't want to see the owner of this car near any kids, that's for sure.

David VandenBos writes:

I stumbled upon your blog a few weeks ago . . . However, a good amount of your technical articles go over my head because of my lack of statistics education/training/experience. Do you have any basic reading suggestions for learning applied statistics? My organization captures tons of info and safely tucks it away into databases, but I'm really interested in learning how to get it out and make use of it.

Does anybody have any suggestions? I like my book with Jennifer but maybe there's something more basic to start with? There's also this online book on statistical graphics by Rafe Donahue which is actually fun to read.

P.S. I don't think any of the usual intro stat books would be good here. I think they focus too much on conventional topics and not enough on applied statistics. Not really the fault of these books: they're designed for the undergraduate curriculum, not for practitioners.

Triple blind is only the beginning

Kieran points me to this.

Gay marriage: a tipping point?

Fancy statistical analysis can indeed lead to better understanding.

Jeff Lax and Justin Phillips used the method of multilevel regression and poststratification ("Mister P"; see here and here) to estimate attitudes toward gay rights in the states. They put together a dataset using national opinion polls from 1994 through 2009 and analyzed several different opinion questions on gay rights.

Policy on gay rights in the U.S. is mostly set at the state level, and Lax and Phillips's main substantive finding is that state policies are strongly responsive to public opinion. However, in some areas, policies are lagging behind opinion somewhat.
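For readers who haven't seen the machinery, here is a heavily stripped-down sketch of what multilevel regression and poststratification involves. This is not Lax and Phillips's code; the data frames (polls, census_cells), the variable names, and the single state-level predictor are placeholders:

library(lme4)

# Step 1: multilevel logistic regression on the pooled survey responses.
fit <- glmer(support ~ state_religiosity + (1 | state) + (1 | age_group),
             family = binomial(link = "logit"), data = polls)

# Step 2: predicted support in every state-by-demographic cell.
census_cells$pred <- predict(fit, newdata = census_cells,
                             type = "response", allow.new.levels = TRUE)

# Step 3: poststratify, weighting each cell by its census population count N.
state_estimates <- with(census_cells,
                        tapply(pred * N, state, sum) / tapply(N, state, sum))

The multilevel model borrows strength across states with few respondents; the poststratification step then reweights the cell predictions to match each state's actual population composition.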

A fascinating trend

Here I'll focus on the coolest thing Lax and Phillips found, which is a graph of state-by-state trends in public support for gay marriage. In the past fifteen years, gay marriage has increased in popularity in all fifty states. No news there, but what was a surprise to me is where the largest changes have occurred. The popularity of gay marriage has increased fastest in the states where gay rights were already relatively popular in the 1990s.

In 1995, support for gay marriage exceeded 30% in only six states: New York, Rhode Island, Connecticut, Massachusetts, California, and Vermont. In these states, support for gay marriage has increased by an average of almost 20 percentage points. In contrast, support has increased by less than 10 percentage points in the six states that in 1995 were most anti-gay-marriage--Utah, Oklahoma, Alabama, Mississippi, Arkansas, and Idaho.

Here's the picture showing all 50 states:

lax6.png

I was stunned when I saw this picture. I generally expect to see uniform swing, or maybe even some "regression to the mean," with the lowest values increasing the most and the highest values declining, relative to the average. But that's not what's happening at all. What's going on?

Some possible explanations:

- A "tipping point": As gay rights become more accepted in a state, more gay people come out of the closet. And once straight people realize how many of their friends and relatives are gay, they're more likely to be supportive of gay rights. Recall that the average American knows something like 700 people. So if 5% of your friends and acquaintances are gay, that's 35 people you know--if they come out and let you know they're gay. Even accounting for variation in social networks--some people know 100 gay people, others may only know 10--there's the real potential for increased awareness leading to increased acceptance.

Conversely, in states where gay rights are highly unpopular, gay people will be slower to reveal themselves, and thus the knowing-and-accepting process will go slower.

- The role of politics: As gay rights become more popular in "blue states" such as New York, Massachusetts, California, etc., it becomes more in the interest of liberal politicians to push the issue (consider Governor David Paterson's recent efforts in New York). Conversely, in states where gay marriage is highly unpopular, it's in the interest of social conservatives to bring the issue to the forefront of public discussion. So the general public is likely to get the liberal spin on gay rights in liberal states and the conservative spin in conservative states. Perhaps this could help explain the divergence.

Where do we go next in studying this?

- We can look at other issues, not just on gay rights, to see where this sort of divergence occurs, and where we see the more expected uniform swing or regression-to-the-mean patterns.

- For the gay rights questions, we can break up the analysis by demographic factors--in particular, religion and age--to see where opinions are changing the fastest.

- To study the "tipping point" model, we could look at survey data on "Do you know any gay people?" and "How many gay people do you know?" over time and by state.

- To study the role of politics, we could gather data on the involvement of state politicians and political groups on gay issues.

I'm sure there are lots of other good ideas we haven't thought of.

P.S. More here.

Triple-blinding

Fred Bookstein writes:

Your blog comment about triple-blinding was a joke, but there IS a triple-blinding procedure in which the identity of the two groups is not revealed to the statistician on the project until the very end. At all times the data analyses proceed solely in reference to a comparison of some unspecified "group A" with a similarly unspecified "group B," and the identification of who were the intervened-upon and who were not is concealed from him or her until the computations are finished. (There are some other assumptions, e.g. absence of baseline differences, required for this to make sense; it applies mainly in contexts like randomized clinical trials.) You can't really purge the Discussion section of an article of the possibility of spin, but at least you can get the right scatters and tables into the dossier that they're spinning. The possibility was called to my attention a while ago by Michael Myslobodsky, a wise old man from my schizophrenia research world, who did not remotely intend it as a joke.

Interesting. My only experience along these lines is when I was working with a student doing matching for a public health study: There were something like 100 treated units and 1000 potential controls, and we wanted to select 300 of these as matched controls. The researchers were careful to give us only the background information and no outcomes.

Google Fusion Tables

Google just launched a pre-alpha "Fusion Tables". The visualization capability is okay, the interface is not fully stable, but the cool thing is the ability to merge two tables, something I've spent a lot of time doing manually in the past, or with ad-hoc scripts.

Here's an example where I merge their GDP table with a disease table. I need to pick the "WHO Regions/Country" in the right column, so that both tables get aligned:

fusion-tables.png

Afterwards, I can do a scatter plot of GDP rank (X) with child mortality/1000 (Y):

gdp-child-mortality.png

So, high GDP generally goes with lower child mortality, but not always, and the relationship is far from a clean correlation.
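The same join-then-plot takes a couple of lines in R, for anyone who'd rather work outside the browser (the file and column names below are made up, standing in for the two tables above):

gdp <- read.csv("gdp_by_country.csv")      # columns: country, gdp_rank
disease <- read.csv("who_disease.csv")     # columns: country, child_mortality_per_1000

merged <- merge(gdp, disease, by = "country")   # align the two tables on the country column
plot(merged$gdp_rank, merged$child_mortality_per_1000,
     xlab = "GDP rank", ylab = "Child mortality per 1000")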

Even though Fusion Tables is pre-alpha, the table fusion capability makes it immediately useful. The collaboration features look cool, but it will take some time to get them to work right. Then we'll have proper horizontal collaboration.

Banova

Mark Bucciarelli writes:

I'm interested in applying the Bayesian ANOVA you described in your 2005 paper to some data I am analyzing.

Is your arm package for R the place to start? (I'm on Linux/Mac, so I'll have to build OpenBugs and maybe update the RBugs package.)

Or is there a more direct path?

I'm analyzing the impact of ad attributes on the variance in click rates; e.g., product category, time of day, graphical vs. text, etc, etc.

My reply: For most of the computations in that article, I actually used a Fortran program that I wrote (and ran from Splus). There's no way this could be recovered; it would be easier to start from scratch. For the last example (in Figures 6 and 7), I used Bugs; actually, I repeated this example in my book with Hill. For your example, I'd suggest Bugs if your sample size isn't too large. Or you could try Doug Bates's lmer() function which we use in our arm package. lmer doesn't currently express uncertainty in the variance parameters but it's a good start, certainly much better than trying to use R's aov() function or similar procedures in other packages.
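For concreteness, the lmer() route looks something like the sketch below. The data frame and attribute names are stand-ins for the ad data, not anything from the 2005 paper, and as noted you only get point estimates of the variance components this way:

library(lme4)

# Each ad attribute enters as a grouping factor; its variance component measures
# how much of the variation in click rates that attribute accounts for.
fit <- lmer(click_rate ~ 1 + (1 | product_category) + (1 | time_of_day) + (1 | ad_format),
            data = ads)
VarCorr(fit)   # point estimates of the standard deviation for each source of variation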

Double blind, double trouble

Double-Bubble-Tin-Sign-C13111649.jpg

A correspondent who prefers to remain anonymous asks:

Since you publish a lot of papers, I wonder if you've ever come across this issue. Journal reviews are supposed to be double-blind, but authors always have great familiarity with their own work, and cite it frequently. So what is the sense of sending an "anonymized" review copy to a journal editor when a line like "In a previous paper (Smith and Jones, 1999) we showed that ..." lets you know right away that Smith and Jones are the authors of the paper being reviewed?

I have thought about altering the review copy to make it look as if we are citing a paper by someone else ["In a previous paper, Smith and Jones (1999) showed that..."]. Should I even worry about this? How do you handle it?

My reply: I don't think it matters much. If the rules say to anonymize the references, then I do so, but I don't really worry whether a reviewer can figure out whether it is me writing the article. From the other direction, I review lots of articles (more than I write, actually), and I am very rarely curious enough to bother trying to figure out (for example, using Google) who is writing them.

What bothers me more, actually, is the idea that somebody out there is submitting a crappy article but citing me in such a way that the reviewers think I wrote it. The other thing I worry about is when I review an article negatively, that the authors might be able to figure out that I'm the reviewer. Or, that someone else is reviewing an article negatively and in the review points to my work, leading the author to think that I'm being the bad guy.

P.S. Somebody once told me about triple-blind submission, where even the author doesn't know who wrote the article. Apparently this is standard in medical research.

P.P.S. More thoughts here.

A great image viewer

I used to display my .png files using the default viewer in Windows. Then Aleks told me about Irfanview. It's much better.
