Results matching “R”

Equation search, part 1

For some reason, Aleks doesn't blog anymore; he just sends me things to blog. I guess he's decided his time is more usefully spent doing other things. I, however, continue to enjoy blogging as an alternative to real work and am a sucker for all the links Aleks sends me (except for the videos, which I never watch).

Aleks's latest find is a program called Eureqa that implements a high-tech curve-fitting method of Michael Schmidt and Hod Lipson. (And, unlike Clarence Thomas, I mean "high-tech" in a good way.) Schmidt and Lipson describe their algorithm as "distilling free-form natural laws from experimental data," which seems a bit over the top, but the basic idea seems sound: Instead of simply running a linear regression, the program searches through a larger space of functional forms, building models like Tinker Toys by continually adding components until the fit stops improving:

[Image: Eureqa screenshot]

I have some thoughts on the limitations of this approach (see below), but to get things started I wanted to try out an example where I suspected this new approach would work well where more traditional statistical methods would fail.

The example I chose was a homework assignment that I included in a couple of my books. Here's the description:

I give students the following forty data points and ask them to fit y as a function of x1 and x2.

y x1 x2
15.68 6.87 14.09
6.18 4.4 4.35
18.1 0.43 18.09
9.07 2.73 8.65
17.97 3.25 17.68
10.04 5.3 8.53
20.74 7.08 19.5
9.76 9.73 0.72
8.23 4.51 6.88
6.52 6.4 1.26
15.69 5.72 14.62
15.51 6.28 14.18
20.61 6.14 19.68
19.58 8.26 17.75
9.72 9.41 2.44
16.36 2.88 16.1
18.3 5.74 17.37
13.26 0.45 13.25
12.1 3.74 11.51
18.15 5.03 17.44
16.8 9.67 13.74
16.55 3.62 16.15
18.79 2.54 18.62
15.68 9.15 12.74
4.08 0.69 4.02
15.45 7.97 13.24
13.44 2.49 13.21
20.86 9.81 18.41
16.05 7.56 14.16
6 0.98 5.92
3.29 0.65 3.22
9.41 9 2.74
10.76 7.83 7.39
5.98 0.26 5.97
19.23 3.64 18.89
15.67 9.28 12.63
7.04 5.66 4.18
21.63 9.71 19.32
17.84 9.36 15.19
7.49 0.88 7.43

[If you want to play along, try to fit the data before going on.]

A few days ago I posted some skeptical notes about the comparison of unemployment rates over time within education categories. My comments were really purely statistical; I know next to nothing about unemployment numbers.

One of the advantages of running this blog is I sometimes get emails from actual experts. In this case, economist John Schmitt wrote in:

Your post looks at just how comparable the unemployment rates are in 2009 and the early 1980s. The specific issue concerns whether we should factor in the big changes in educational attainment between the early 1980s and the present--our working population is a lot better educated today than it was in the early 1980s.

According to the piece that motivated your blog post, the unemployment rate for workers at each level of education is higher now than it was in the early 1980s. So, in a mechanical sense, the unemployment rate is lower in 2009 than it was in the early 1980s only because a larger portion of the population in 2009 has shifted to the "low-unemployment" higher education groups.

You take the view that the aggregate unemployment rate is what matters, not the disaggregated unemployment rates by education. Dean Baker and I, however, did a recent analysis in the spirit of the education-based analysis you cite.

We focused on how much older the workforce is today (rather than how much better educated it is), and conclude that if you want to do a sensible comparison, you'll want to factor in the age change.

The main argument from our [Schmitt and Baker's] paper:

Are Liberals Smarter Than Conservatives?

Tom Ball writes:

Didn't know if you had seen this article [by Jason Richwine] about political allegiance and IQ but wanted to make sure you did. I'm surprised the author hasn't heard or seen of your work on Red and Blue states! What do you think?

I think the article raises some interesting issues but he seems to be undecided about whether to take the line that intelligent Americans mostly have conservative views ("[George W.] Bush's IQ is at least as high as John Kerry's" and "Even among the nation's smartest people, liberal elites could easily be in the minority politically") or the fallback position that, yes, maybe liberals are more intelligent than conservatives, but intelligence isn't such a good thing anyway ("The smartest people do not necessarily make the best political choices. William F. Buckley once famously declared that he would rather give control of our government to "the first 400 people listed in the Boston telephone directory than to the faculty of Harvard University."). One weakness of this latter argument is that the authorities he relies on for this point--William F. Buckley, Irving Kristol, etc.--were famous for being superintelligent. Richwine is in the awkward position of arguing that Saul Bellow's aunt (?) was more politically astute than Bellow, even though, in Kristol's words, "Saul's aunt may not have been a brilliant intellectual." Huh? We're taking Richwine's testimony on Saul Bellow's aunt's intelligence?

Richwine also gets into a tight spot when he associates conservatism with "following tradition" and liberalism with "non-traditional ideas." What is "traditional" can depend on your social setting. What it takes to be a rebel at the Columbia University faculty club is not necessarily what will get you thrown out of a country club in the Dallas suburbs. I think this might be what Tom Ball was thinking about when he referred to Red State, Blue State: political and cultural divisions mean different things in different places.

I do, however, agree with Richwine's general conclusion, which is that you're probably not going to learn much by comparing average IQ's of different groups. As Richwine writes, "The bottom line is that a political debate will never be resolved by measuring the IQs of groups on each side of the issue." African-Americans have low IQ's, on average, Jews have high IQ's on average, and both groups vote for the Democrats. Latinos have many socially conservative views but generally don't let those views get in the way of voting for Democrats.

Comment spam

All of a sudden we're getting a lot of comment spam, and so we've changed the settings so that it immediately approves comments only from "any authenticated commenters." Until we can figure out how to solve the spam problem, I guess we'll go back to approving comments a couple times a day.

I encourage youall to "authenticate" your comments (whatever that means) so they will appear immediately and people can respond right away, rather than having to wait to see your comment until it has been approved.

Aaron Swartz links to this rant from Philip Greenspun on university education. Despite Swartz's blurb, I didn't actually see any "new ideas" in Greenspun's article. (I agree with Greenspun's advice that teachers not grade their own students, but no, this isn't a new idea, it's just a good idea that's difficult enough to implement that it usually isn't done).

That's ok. New ideas are overrated. But this bit was just hilarious:

I'm on an email list of media experts for the American Statistical Association: from time to time a reporter contacts the ASA, and their questions are forwarded to us. Last week we got a question from Cari Tuna about the following pattern she had noticed:

Measured by unemployment, the answer appears to be no, or at least not yet. The jobless rate was 10.2% in October, compared with a peak of 10.8% in November and December of 1982.

But viewed another way, the current recession looks worse, not better. The unemployment rate among college graduates is higher than during the 1980s recession. Ditto for workers with some college, high-school graduates and high-school dropouts.

So how can the overall unemployment rate be lower today but higher among each group?

Several of us sent in answers. Call us media chasers or educators of the populace; whatever. Luckily I wasn't the only one to respond: I sent in a pretty lame example that I'd recalled from an old statistics textbook; whereas Xiao-Li Meng, Jeff Witmer, and others sent in more up-to-date items that Ms. Tuna had the good sense to use in her article.

There's something about this whole story that bothers me, though, and that is the implication that the within-group comparisons are real and the aggregate is misleading. As Tuna puts it:

The Simpson's Paradox in unemployment rates by education level is but the latest example. At a glance, the unemployment rate suggests that U.S. workers are faring better in this recession than during the recession of the early 1980s. But workers at each education level are worse off . . .

This discussion follows several examples where, as the experts put it, "The aggregate number really is meaningless. . . . You can't just look at the overall rate. . . ."

Here's the problem. Education categories now do not represent the same slices of the population that they did in 1976. A larger proportion of the population are college graduates (as is noted in the linked news article), and thus the comparison of college grads (or any other education category) from 1982 to the college grads today is not quite an apples-to-apples comparison. Being a college grad today is less exclusive than it was back then.
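To see the arithmetic of the composition effect, here's a toy calculation. The shares and rates below are made up purely for illustration; they are not the actual BLS numbers.

```python
# Made-up shares and unemployment rates, chosen only to illustrate how the
# aggregate rate can fall even though every group's rate rises.
early_1980s = {"college grads": (0.20, 0.04), "everyone else": (0.80, 0.12)}
recent      = {"college grads": (0.35, 0.05), "everyone else": (0.65, 0.13)}

def aggregate_rate(pop):
    # population-weighted average of the group unemployment rates
    return sum(share * rate for share, rate in pop.values())

print("early 1980s: {:.1%}".format(aggregate_rate(early_1980s)))  # 10.4%
print("recent:      {:.1%}".format(aggregate_rate(recent)))       # 10.2%
# Both groups got worse (4% -> 5% and 12% -> 13%), but enough of the
# population moved into the low-unemployment group to pull the total down.
```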

In this sense, the unemployment example is different in a key way from the other Simpson's paradox examples in the news article. In those other examples, the within-group comparison is clean, while the aggregate comparison is misleading. In the unemployment example, it's the aggregate that has a cleaner interpretation, while the within-group comparisons are a bit of a mess.

As a statistician and statistical educator, I think we have to be very careful about implying that the complicated analysis is always better. In this example, the complicated analysis can mislead! It's still good to know about Simpson's paradox, to understand how the within-group and aggregate comparisons can differ--but I think it's highly misleading in this case to imply that the aggregate comparison is wrong in some way. It's more of a problem of groups changing their meaning over time.

Regular readers of this blog are familiar with the pinch-hitter syndrome: People whose job it is to do just one thing are not always so good at that one thing. I first encountered this when noting the many silly errors introduced into my books by well-meaning copy-editors with too much time on their hands. As I wrote a few years ago:

This is a funny thing. A copy editor is a professional editor. All they do (or, at least, much of what they do) is edit, so how is it that they do such a bad job compared to a statistician, for whom writing is only a small part of the job description?

The answer certainly isn't that I'm so wonderful. Non-copy-editor colleagues can go through anything I write and find lots of typos, grammatical errors, confusing passages, and flat-out mistakes.

No, the problem comes with the copy editor, and I think it's an example of the pinch-hitter syndrome. The pinch-hitter is the guy who sits on the bench and then comes up to bat, often in a key moment of a close game. When I was a kid, I always thought that pinch hitters must be the best sluggers in baseball, because all they do (well, almost all) is hit. But of course this isn't the case--the best hitters play outfield, or first base, or third base, or whatever. If the pinch hitter were really good, he'd be a starter. So, Kirk Gibson in the 1988 World Series notwithstanding, pinch hitters are generally not the best hitters.

There must be some general social-science principle here, about generalists and specialists, roles in an organization, etc?

This idea was recently picked up by a real-life baseball statistician--Eric Seidman of Baseball Prospectus--who writes:

I wanted to talk to you about the pinch-hitter theory you presented, as I've noticed it in an abundance of situations as well.

When I read your theory it made perfect sense, although a slight modification is needed, namely in that it makes more sense as a relief-pitcher theory. In sabermetrics, we have found that pitchers perform better as relievers than they do as starters. In fact, if a starter becomes a reliever, you can expect him to lop about 1.4 runs off of his ERA and vice-versa, simply by virtue of facing batters more often. When you get to facing the batting order the 2nd and 3rd time through, relievers are almost always better options because they are fresh. Their talent levels are nowhere near those of the starters--otherwise, they would BE starters--but in that particular situation, their fresh "eyes" as it pertains to this metaphor are much more effective.

For another example, when working on my book Bridging the Statistical Gap, I found that my editor would make great changes but would miss a lot of ancillary things that I would notice upon delving back in after a week away from it. Applying that to the relief pitcher idea, the editor was still more talented when it came to editing, but his being "in too deep", the equivalent of facing the opposing batting order a few times, made my fresh eyes a bit more accurate.

I'm wondering if you have seen this written about in other areas, as it really intrigues me as a line of study, applying psychological concepts as well as those in statistics.

These are interesting thoughts--first, the idea of applying it to relief pitchers, and, second, the "fresh eyes" idea, which adds some subtlety to the concept. I'm still not quite sure what he's saying about the pitchers, though: Is he saying that because relief pitchers come in with fresh arms, they can throw harder, or is he saying that, because hitters see starters over and over again, they can improve their swing as the game goes on, whereas when the reliever comes in, the hitters are starting afresh?

Beyond this, I'm interested in Seidman's larger question, about whether this is a more general psychological/sociological phenomenon. Do any social scientists out there have any thoughts?

P.S. I seem to recall Bill James disparaging the ERA statistic--he felt that "unearned" runs count too, and they don't happen by accident. So I'm surprised that the Baseball Prospectus people use ERA rather than RA. Is it just because ERA is what we're all familiar with, so the professional baseball statisticians want to talk our language? Or is ERA actually more useful than I thought?

Scientists behaving badly

Steven Levitt writes:

My view is that the emails [extracted by a hacker from the climatic research unit at the University of East Anglia] aren't that damaging. Is it surprising that scientists would try to keep work that disagrees with their findings out of journals? When I told my father that I was sending my work saying car seats are not that effective to medical journals, he laughed and said they would never publish it because of the result, no matter how well done the analysis was. (As is so often the case, he was right, and I eventually published it in an economics journal.)

Within the field of economics, academics work behind the scenes constantly trying to undermine each other. I've seen economists do far worse things than pulling tricks in figures. When economists get mixed up in public policy, things get messier. So it is not at all surprising to me that climate scientists would behave the same way.

I have a couple of comments, not about the global-warming emails--I haven't looked into this at all--but regarding Levitt's comments about scientists and their behavior:

1. Scientists are people and, as such, are varied and flawed. I get particularly annoyed with scientists who ignore criticisms that they can't refute. The give and take of evidence and argument is key to scientific progress.

2. Levitt writes about scientists who "try to keep work that disagrees with their findings out of journals." This may or may not be ethical behavior, depending on how it's done. If I review a paper for a journal and find that it has serious errors or, more generally, that it adds nothing to the literature, then I should recommend rejection--even if the article claims to have findings that disagree with my own work. Sure, I should bend over backwards and all that, but at some point, crap is crap. If the journal editor doesn't trust my independent judgment, that's fine, he or she should get additional reviewers. On occasion I've served as an outside "tiebreaker" referee for journals on controversial articles outside of my subfield.

Anyway, my point is that "trying to keep work out of journals" is ok if done through the usual editorial process, not so ok if done by calling the journal editor from a pay phone at 3am or whatever.

I wonder if Levitt is bringing up this particular example because he served as a referee for a special issue of a journal that he later criticized. So he's particularly aware of issues of peer review.

3. I'm not quite sure how to interpret the overall flow of Levitt's remarks. On one hand, I can't disagree with the descriptive implications: Some scientists behave badly. I don't know enough about economics to verify his claim that academics in that field "constantly trying to undermine each other . . . do far worse things than pulling tricks in figures"--but I'll take Levitt's word for it.

But I'm disturbed by the possible normative implications of Levitt's statement. It's certainly not the case that everybody does it! I'm a scientist, and, no, I don't "pull tricks in figures" or anything like this. I don't know what percentage of scientists we're talking about here, but I don't think this is what the best scientists do. And I certainly don't think it's ok to do so.

What I'm saying is, I think Levitt is doing a big service by publicly recognizing that scientists sometimes--often?--engage in unethical behavior such as hiding data. But I'm unhappy with the sense of amused, world-weary tolerance that I get from reading his comment.

Anyway, I had a similar reaction a few years ago when reading a novel about scientific misconduct. The implication of the novel was that scientific lying and cheating wasn't so bad, these guys are under a lot of pressure and they do what they can, etc. etc.--but I didn't buy it. For the reasons given here, I think scientists who are brilliant are less likely to cheat.

4. Regarding Levitt's specific example--his article on car seats that was rejected by medical journals--I wonder if he's being too quick to assume that the journals were trying to keep his work out because it disagreed with previous findings.

As a scientist whose papers have been rejected by top journals in many different fields, I think I can offer a useful perspective here.

Much of what makes a paper acceptable is style. As a statistician, I've mastered the Journal of the American Statistical Association style and have published lots of papers there. But I've never successfully published a paper in political science or economics without having a collaborator in that field. There's just certain things that a journal expects to see. It may be comforting to think that a journal will not publish something "because of the result," but my impression is that most journals like a bit of controversy--as long as it is presented in their style. I'm not surprised that, with his training, Levitt had more success publishing his public health work in econ journals.

P.S. Just to repeat, I'm speaking in general terms about scientific misbehavior, things such as, in Levitt's words, "pulling tricks in figures" or "far worse things." I'm not making a claim that the scientists at the University of East Anglia were doing this, or were not doing this, or whatever. I don't think I have anything particularly useful to add on that; you can follow the links in Freakonomics to see more on that particular example.

All Meehl, all the time

Brad Evans points me to this website devoted to the publications of the great Paul Meehl.

Commenter RogerH pointed me to this article by Welton, Ades, Carlin, Altman, and Sterne on models for potentially biased evidence in meta-analysis using empirically based priors. The "Carlin" in the author list is my longtime collaborator John, so I really shouldn't have had to hear about this through a blog comment. Anyway, they write:

We present models for the combined analysis of evidence from randomized controlled trials categorized as being at either low or high risk of bias due to a flaw in their conduct. We formulate a bias model that incorporates between-study and between-meta-analysis heterogeneity in bias, and uncertainty in overall mean bias. We obtain algebraic expressions for the posterior distribution of the bias-adjusted treatment effect, which provide limiting values for the information that can be obtained from studies at high risk of bias. The parameters of the bias model can be estimated from collections of previously published meta-analyses. We explore alternative models for such data, and alternative methods for introducing prior information on the bias parameters into a new meta-analysis. Results from an illustrative example show that the bias-adjusted treatment effect estimates are sensitive to the way in which the meta-epidemiological data are modelled, but that using point estimates for bias parameters provides an adequate approximation to using a full joint prior distribution. A sensitivity analysis shows that the gain in precision from including studies at high risk of bias is likely to be low, however numerous or large their size, and that little is gained by incorporating such studies, unless the information from studies at low risk of bias is limited. We discuss approaches that might increase the value of including studies at high risk of bias, and the acceptability of the methods in the evaluation of health care interventions.

I really really like this idea. As Welton et al. discuss, their method represents two key conceptual advances:

1. In addition to downweighting questionable or possibly-biased studies, they also shift them to adjust in the direction of correcting for the bias.

2. Instead of merely deciding which studies to trust based on prior knowledge, literature review, and external considerations, they also use the data, through a meta-analysis, to estimate the amount of adjustment to do.
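To see how these two ideas can play out, here's a toy normal-approximation version: not the Welton et al. model, just a sketch in which high-risk studies get shifted by a prior mean bias and downweighted by the bias uncertainty. The estimates, standard errors, and bias prior below are all invented.

```python
import numpy as np

# (estimate, standard error) on some treatment-effect scale; invented numbers
low_risk  = [(-0.30, 0.15), (-0.20, 0.20)]
high_risk = [(-0.60, 0.10), (-0.55, 0.12)]   # suspiciously strong effects
mu_b, tau_b = -0.25, 0.15                    # prior mean and sd of the bias, as if
                                             # taken from meta-epidemiological data

ests, variances = [], []
for y, se in low_risk:
    ests.append(y)
    variances.append(se**2)
for y, se in high_risk:
    ests.append(y - mu_b)                    # shift: subtract the expected bias
    variances.append(se**2 + tau_b**2)       # downweight: add bias uncertainty

w = 1 / np.array(variances)
theta_hat = np.sum(w * np.array(ests)) / np.sum(w)
theta_se = np.sqrt(1 / np.sum(w))
print(f"bias-adjusted pooled estimate: {theta_hat:.2f} (se {theta_se:.2f})")
```

Using point estimates for the bias parameters, which the article reports is an adequate approximation to the full joint prior, keeps everything at the level of simple precision weighting.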

And, as a bonus, the article has excellent graphs. (It also has three ugly tables, with gratuitous precision such as "-0.781 (-1.002, -0.562)," but the graph-to-table ratio is much better than usual in this sort of statistical research paper, so I can't really complain.)

This work has some similarities to the corrections for nonsampling errors that we do in survey research. As such, I have one idea here. Would it be possible to take the partially-pooled estimates from any given analysis and re-express them as equivalent weights in a weighted average? (This is an idea I've discussed with John and is also featured in my "Survey weighting is a mess" paper.) I'm not saying there's anything so wonderful about weighted estimates, but it could help in understanding these methods to have a bridge to the past, as it were, and see how they compare in this way to other approaches.

Payment demanded for the meal

There's no free lunch, of course. What assumptions did Welton et al. put in to make this work? They write:

We base the parameters of our bias model on empirical evidence from collections of previously published meta-analyses, because single meta-analyses typically provide only limited information on the extent of bias . . . This, of course, entails the strong assumption that the mean bias in a new meta-analysis is exchangeable with the mean biases in the meta-analyses included in previous empirical (meta-epidemiological) studies. For example, the meta-analyses that were included in the study of Schulz et al. (1995) are mostly from maternity and child care studies, and we must doubt whether the mean bias in studies on drugs for schizophrenia (the Clozapine example meta-analysis) is exchangeable with the mean biases in this collection of meta-analyses.

Assumptions are good. I expect their assumptions are better than the default alternatives, and it's good to have the model laid out there for possible criticism and improvement.

P.S. The article focuses on medical examples but I think the methods would also be appropriate for experiments and observational studies in social science. A new way of thinking about the identification issues that we're talking about all the time.

What can search predict?

Actually, I don't have an RSS myself, but I think you get my point.

P.S. This reminds me that I have to talk with Sharad and Duncan again about their results on people's perceptions of their friends' attitudes. Last we spoke, I felt like we were closing in on a way of distinguishing between two stories: (1) people think their friends are like them, and (2) people predict their friends' attitudes from their friends' other characteristics. But we didn't completely close the deal. Sharad: if you're reading this, let's talk!

P.P.S. I'm a little irritated by one aspect of Sharad's blog, which is that the nearly contentless illustration at the top is much prettier than the ugly, sloppy bit of data graphics at the bottom. The illustration is great, but if you care that much about how things look, why not spend a few minutes on your statistical graphs? It's not just about appearances. With better graphics, you can learn more from the data. Especially if you also use multilevel models, so you can get good estimates about subsets of your population.

For observations like this:

Here's a little pet peeve of mine [Wattenberg's]: nothing rhymes with orange. You've heard that before, right? Orange is famous for its rhymelessness. There's even a comic strip called "Rhymes with Orange." Fine then, let me ask you something. What the heck rhymes with purple?

She continues, reasonably enough:

If you stop and think about it, you'll find that English is jam-packed with rhymeless common words. What rhymes with empty, or olive, or silver, or circle? You can even find plenty of one-syllable words like wolf, bulb, and beige. Yet orange somehow became notorious for its rhymelessness, with the curious result that people now assume its status is unique.

Indeed.

P.S. Also this.

DIC question

Kent Holsinger writes:

I'm fitting a moderately complicated hierarchical model including a structural equation model with latent variables to plant traits measured in a greenhouse.

Just one of these little things

In my paper with Aaron and Nate, we were getting probabilities such as 5e-85 or 9e-16 from our normal-distribution model. Sure, the probability that a vote in D.C. would be decisive was low, but 10^-85? No way. So we switched to a t_4 distribution, which smoothed these probabilities down to 2e-12 and 2e-10. Not perfect, I'm sure, but more plausible.
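To see how much the tails matter, compare the two distributions far from the mean. This is just a sketch; the cutoff of 19 standard errors is arbitrary, not the actual value from our model.

```python
from scipy import stats

z = 19.0  # an arbitrary "far out" value, in standard-error units
print("normal tail probability:", stats.norm.sf(z))    # roughly 1e-80
print("t_4 tail probability:   ", stats.t.sf(z, df=4)) # roughly 2e-5
# The normal says such an event is essentially impossible; the t_4 says
# it's merely rare, which is the behavior we wanted from the model.
```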

Just remember: don't be a slave to your model. Work with it. Taking a model too seriously is really just another way of not taking it seriously at all. (By this I mean that, if you say you really believe a probability such as 10^-85, what you're really doing is devaluing the concept of belief in probabilities, and I'm thinking that will leak back and corrupt all the other probabilistic statements you make.)

Parameterization and Bayesian Modeling

My talk at the Institut Henri Poincaré tomorrow 2pm:

Progress in statistical computation often leads to advances in statistical modeling. For example, it is surprisingly common that an existing model is reparameterized, solely for computational purposes, but then this new configuration motivates a new family of models that is useful in applied statistics. One reason why this phenomenon may not have been noticed in statistics is that reparameterizations do not change the likelihood. In a Bayesian framework, however, a transformation of parameters typically suggests a new family of prior distributions. We discuss examples in censored and truncated data, mixture modeling, multivariate imputation, stochastic processes, and multilevel models.
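The point in one line (just the standard change-of-variables fact): if $\phi$ is a new parameterization with $\theta = g(\phi)$ one-to-one, then

$$
p(y \mid \phi) = p\bigl(y \mid g(\phi)\bigr),
\qquad
p_\phi(\phi) = p_\theta\bigl(g(\phi)\bigr)\,\bigl|g'(\phi)\bigr|,
$$

so the likelihood is unchanged but the implied prior picks up a Jacobian factor, and it is often natural to instead put a convenient new family of priors directly on $\phi$--which is where the new models come from.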

Here's the first slide:

Probably. But I'd like to see an analysis closer to the raw data.

Pinch-hitter syndrome strikes again

An irritating copy editor at a journal changed all my instances of "for example" to "e.g." Because, y'know, so many more people read Latin than English. Ugh! I used to go and change all these back, but I'm just too lazy now; I only do it with my books and don't bother with the articles.

Just so you know: if you ever see "e.g." or "i.e." in something I write: no, it's not me, it's the copy editor. Can't these people serve society in some useful way, maybe get jobs as spammers or debt collectors?

This comment by Tyler Cowen on Sarah Palin's poor Scrabble strategy reminds me of my blog a few months ago with six suggested Scrabble reforms. Without further ado:

In the old days, I guess I never would've heard about this one. With the internet, all sorts of horrible bits of local news get spread around the world.

I was watching Mad Men the other day (we're still watching season 1 on DVD, so, please, no spoilers in the comments) and it struck me that just about none of the characters on the show are in their forties (on average, the unhappiest time in people's lives, according to Andrew Oswald's research). Almost everyone is in their 20s and 30s, with a few kids and a few people in their 50s and up.

This is just one show, of course. The last show I watched, The Sopranos, was full of fortysomethings. But it got me wondering what the ages are of characters on various fictional and dramatic productions. It would be fun to have a big pile of this information and then play with it in various ways.

Once you start with this, of course, there's all sorts of things you could go around tabulating. If anybody out there happens to do such an analysis (or knows of something out there already), please let me know.

Earth science / statistics postdoc in LA

Julien Emile-Geay writes about a postdoc opportunity in climate dynamics, applied statistics, or applied mathematics:

"Beyond the Hockey Stick: new approaches to paleoclimate reconstruction"

[Image: hockey stick graph]

I've updated my list of presentations

Including today's talk. See here.

What's wrong with a kernel density?

In response to my offhand remark that the kernel densities in the article by Chen and Rodden are "tacky" and would be much better as histograms, commenter Anne asks:

What's wrong with a kernel density? Too opaque a connection with the data? I [Anne] have had some unpleasant surprises using histograms lately, so I've been trying to get a feel for the alternatives.

My reply: Here are my problems with kernel densities (in this example, and more generally):

1. Annoying artifacts, such as all-positive quantities whose kernel density estimates go into the negative zone. This can be fixed, but (a) it typically isn't, and (b) when there isn't an obvious bound, you still have the issue of the kernel density including places that it shouldn't.

2. It's hard to see where the data are. As I wrote in my blog linked above, I think it's better to just see the data directly. Especially for something like vote proportions that I can understand pretty well directly. For example, when I see the little peak at 3% in the density in Figure 2 of Chen and Rodden, or the falloff after 80%, I'd rather just see what's happening there rather than trying to guess by taking the density estimate and mentally un-convolving the kernel.

3. The other thing I like about a histogram is that it contains the seeds of its own destruction--that is, an internal estimate of uncertainty, based on variation in the heights of the histogram bars. See here for more discussion of this point, in particular the idea that the goal of a histogram or density estimate is not to most accurately estimate the true superpopulation density (whatever that means in this example) but rather to get an understanding of what the data are telling you.
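To make point 1 concrete, here's a minimal sketch with simulated all-positive data: the default Gaussian kernel density puts visible mass below zero, while a histogram of the same data cannot.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.exponential(scale=0.15, size=200)   # strictly positive data, simulated

kde = gaussian_kde(x)                       # default bandwidth
mass_below_zero = kde.integrate_box_1d(-np.inf, 0.0)
print(f"KDE probability mass below zero: {mass_below_zero:.3f}")  # nonzero

counts, edges = np.histogram(x, bins=30)    # the histogram puts no mass below zero,
# and its bin-to-bin wiggliness doubles as a rough visual estimate of uncertainty.
```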

Defining ethnicity down

One theme that comes up a lot when we discuss race and politics in the United States is the way that the concept "race" itself changes over time. For example, nowadays you hear a lot about white voters, but fifty years ago, the central category was occupied by white Protestants. White Catholics were considered a separate category, not black but not fully mainstream white, sort of like Hispanics and Asians today. (This is not to say that today's commentators treat whites as a monolithic bloc--there's still lots of talk about the Catholic vote--but I think that "white" is perceived as having more meaning as a national political category today than it did in the mid-twentieth century.)

Another example is the perception of Asian Americans. I was thinking about this topic recently after seeing this offhand comment in a blog by Tom Maguire:

The Japanese are neither brown-skinned nor Muslim nor poor . . .

I'm pretty sure he's right about there being very few Muslim or poor people in Japan. (Not zero in any case but a small fraction of the total population of the country. Apparently Japan has a high rate of relative poverty, though.)

But I always thought the Japanese did have brown skin. I guess people used to say "yellow," but it always seemed more like "tan" to me. I wonder if this is some sort of modern redefinition: if Japanese are honorary whites, then their skin gets lightened too? I'm not suggesting any malign intent here on the part of Maguire, just wondering if there's some relabeling going on implicitly.

Sort of vaguely related to the idea that, until very recently, in the U.S. when people said "Asian" they meant east Asian and not, for example, south Asian. (Consider, for example, HABAW. I think she would've been referred to as HIBAW had she been south-Asian-looking. Especially considering that she had a British accent.)

P.S. That first paragraph above was pretty much of a mess. Probably because it's based on my speculations and not backed by any hard facts.

P.P.S. I can't quite bring myself to post this on 538; it doesn't quite seem of general interest. Scrolling through Maguire's blog, I noticed that he and his commenters don't have a very high impression of Nate, so I don't know what they'd think of my comments here.

P.P.P.S. I encountered Maguire's blog through a typically circuitous internet path, starting with a search of my own blog to find this graph that had been posted by Greg Mankiw and then going to the source, then to this update, to the main page of that blog, where I scrolled through a few pages of Youtube links until I found this link which caught my eye.

Hal Daume writes some reasonable things here, mocking some silly rules that have been proposed for evaluating clustering procedures.

What's interesting to me is that such a wrongheaded idea (not Hal's, but the stuff that he's criticizing) could be taken so seriously in the first place.

Perhaps it's a problem with mathematics, that it takes people to a level of abstraction where they forget their original goals. I've seen this a lot in statistics, for example when people devise extremely elaborate procedures to calculate p-values that don't correspond to any actual data collection procedure. (Here I'm thinking of calculations of the distributions of contingency tables with fixed margins.)

P.S. Scroll to the end of the comments to see that Hal's a better person than I am, in that he doesn't waste his time cleaning out the spam from his blog comments.

ENGREF, 19 avenue du Maine, 75014 Paris (Room 208)

9:30-10:15: Posterior predictive checking and generalized graphical models
(Andrew Gelman, Columbia University - Department of Statistics)

10:15-10:35: Break

10:35-11:20: Conditional predictive p-values and the double use of the data (Judith Rousseau, Paris-Dauphine - Ceremade)

11:20-12:00: Forum, questions and answers: computing Bayes factors (Chantal Guihenneuc) - the truncated normal distribution and its handling in WinBUGS (Jean-Louis Foulley, INRA-Jouy)

12:00-14:00: Lunch break (on your own)

14:00-14:45: Structural equation models: free models vs. Bayesian networks (Christian Derquenne, EDF - R&D)

14:45-15:30: A Bayesian approach to structural models
(Séverine Demeyer, LNE)

15:30-15:45: Break

15:45-16:30: Teaching Bayesian inference to beginners: the analysis of experimental data (Bruno Lecoutre, CNRS - Université de Rouen)

16:30-17:00: General discussion of the next Applibugs meeting and suggestions for future talks

Blogging explosion!

If you've been paying attention to this blog over the weekend, you'll have learned that Rush Limbaugh accused Gallup of "upping the sample to black Americans to keep [Obama] up at 50%" in the polls. I remarked that, if you want to rig the polls and you're clever--and Gallup is nothing if not clever--you can do it without resorting to racial sampling.

Alan Abramowitz adds:

It is beyond weird that Rush Limbaugh is now accusing the Gallup Poll of deliberately over-counting Democrats because the truth is that the Gallup Poll has for the past several months consistently shown a smaller Democratic advantage in party identification than other national polls.

Connecting the dots

A key principle in applied statistics is that you should be able to connect the dots between the raw data, your model, your methods, and your conclusions.

Unfortunately, this principle isn't often well understood. We've all seen it a zillion times: someone shows you a regression analysis with a counterintuitive result, but then when you ask to see where in the data this is happening, you're told: Don't worry, it's a regression, we controlled for everything. Or you'll see a regression or some other analysis backed up (if you could say that) by a couple of anecdotes. Again, though, you have to put full trust in the statistical analysis, because you can select an anecdote to support almost any point.

It is possible, however, to do better. In my own work, I try to link data to results in several ways: most obviously, with scatterplots showing data and fitted models (lots of examples in ARM) but also with graphical model checking. Your model's wrong, you know, and it can be a good idea to explore the ways it doesn't match the data used to fit it, and to explore the ways it doesn't jibe with other information you have.

Anyway, this was really all just by way of introducing a beautiful little example from Seth Masket on the topic of national unemployment rates and congressional elections. After Masket posted a graph showing zero correlation between unemployment rates and the President's party's losses in midterm elections, Ross Douthat responded skeptically in the New York Times:

In the last 50 years, there's only been one midterm election fought with unemployment above 8 percent, let alone 10. (That would be 1982, when Reagan's Republicans lost 22 House seats.) The sample size of relevant races is way too small to draw any useful generalizations, in other words, and it's better to fall back on common sense . . .

Masket responded:

I agree with you that the lack of historical cases with very high unemployment should give us some humility in predicting next year's election. . . . As it happens, the average midterm seat loss for the president's party over the past sixty years is 22 seats. So if we knew nothing else about next year's election, the Democrats losing 22 House seats would be a reasonable guess. The fact that the one case with unemployment over nine percent (1982) produced precisely the average number of seat losses suggests that unemployment really isn't a factor.

Very nice.

P.S. For another example of the power of combining models with simple numbers, and also on the topic of unemployment rates, see Greg Mankiw's useful discussion of the difficulties of evaluating interventions when n=1:

[Graph: unemployment rate with and without the recovery plan]

Updated graph is here.

Also this scary, scary picture. Here I'd prefer to go back a few years on the x-axis. The graph with the forecast pretty much had to start near 2009--that's when the with/without-recovery-plan lines come from. But the historical jobs graph would be much better going back ten years or longer. Sure, you want enough resolution so you can see the trend in the past year, but you also want enough context to have a sense of the fluctuations, so you can see how often it is that 5 million jobs disappear like this.

Harry Selker and Alastair Wood say yes.

P.S. The answer is no. The offending language is no longer in the bill (perhaps in response to Selker and Wood's article).

P.P.S. Somebody checked again, and the offending language is still there!

Some sort of update to ggplot2

Jeroen Ooms writes:

Here's a first version of a new web application for exploratory graphical analysis. It attempts to implement the layered graphics from the R package ggplot2 in a user-friendly way. This two-minute demo video demonstrates a quick how-to.

He asks for feedback, so if you have any, feel free to comment. I don't know ggplot2 but my impression is that I should really be using it. Maybe Yu-Sung and Daniel should consider using it for mrp.

Everybody's a critic

Christopher Nelson tries his hand at being a graphics curmudgeon.

Mark Blumenthal links to Rush Limbaugh accusing Gallup of "upping the sample to black Americans to keep [Obama] up at 50%" in the polls. (For the context, see the last paragraph of the transcript.)

Frank Newport of Gallup responds here. Newport denies it all, but he would, wouldn't he?

Seriously, though, it's hard to believe that Limbaugh really believes that Gallup is fudging the numbers. As a big-time radio host, he's gotta know all about marketing surveys, right? I'm just assuming he said that "upping the sample" bit as more of a joke or an off-the-wall speculation. It did raise two interesting questions in my mind, though:

1. The assumption behind Limbaugh's argument--as with many arguments about polls--is that the published poll results have an effect of their own, beyond the president's underlying popularity. For example, maybe some senator would vote for the health care bill if he read that Obama's approval rating was 51% but would vote no if he read that Obama only had 49% approval. This might very well be true--it makes sense--I just don't really know.

2. What if you were a pollster and really did want to cheat and overrepresent Democrats? How would you do it? Contra Limbaugh's suggestion, I don't think you'd oversample blacks. I'm assuming Gallup does telephone surveys, and it's not like there's a separate telephone directory for blacks. Also, as several commenters to Newport noted, the percentage of blacks among the survey respondents is easy enough to check. And, for that matter, many survey organizations (possibly including Gallup) do post-sampling weighting adjustments for race, anyway, in which case oversampling blacks won't do anything for you at all.

If you're doing a telephone poll and want to oversample Democrats, you can just call states and area codes where more Democrats live. Call New York, LA, Chicago, etc. You can even call people in Democratic-leaning white areas if you want to mix things up a bit. That'll do the trick. Bury it deep enough in the sampling algorithm and maybe nobody will notice!

P.S. I looked at Gallup's home page and was surprised not to see any link to a description of their sampling methods. Or maybe it's somewhere and I didn't see it.

P.P.S. Blumenthal sent me this helpful link.

Type M errors are all over the place

Jimmy points me to this article, "Why most discovered true associations are inflated," by J. P. Ioannidis. As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors. I completely agree with Ioannidis's point, which he seems to be making more systematically than David Weakliem and I did in our recent article on the topic.

My only suggestion beyond what Ioannidis wrote has to do with potential solutions to the problem. His ideas include: "being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results."

These are all good ideas. Here are two more suggestions:

1. Retrospective power calculations. See page 312 of our article for the classical version or page 313 for the Bayesian version. I think these can be considered as implementations of Ioannidis's ideas of caution, adjustment, and correction.

2. Hierarchical modeling, which partially pools estimated effects and reduces Type M errors as well as handling many multiple comparisons issues. Fuller discussion here (or see here for the soon-to-go-viral video version).
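A quick simulation shows the basic phenomenon (the true effect and standard error are arbitrary, not from any particular study): condition on statistical significance and the surviving estimates exaggerate the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect, se = 0.1, 0.5                       # a small effect, noisily estimated
est = rng.normal(true_effect, se, size=100_000)  # many hypothetical replications
signif = np.abs(est) > 1.96 * se                 # two-sided p < 0.05

print("power (share significant):     ", signif.mean())
print("mean |estimate| if significant:", np.abs(est[signif]).mean())
# The conditional mean is several times the true effect of 0.1;
# that ratio is the exaggeration (Type M) factor.
```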

P.S. Here's the first mention of Type M errors that I know of. The problem is important enough, though, that I suspect there are articles on the topic going back to the 1950s or earlier in the psychometric literature.

Postdoc openings here in fall, 2010 !!!

Postdoc opportunities working with Prof. Andrew Gelman in the Department of Statistics on problems related to hierarchical modeling and statistical computing, with projects including high-dimensional modeling, missing-data imputation, and parallel computing. Application areas include public opinion and voting, social networks, international development, dendrochronology, and models of cancer and drug abuse. Applicants should have experience with Bayesian methods, a willingness to program, and an interest in learning.

Applications will be considered as they arrive. The application, consisting of a cover letter, CV, and a selection of published or unpublished articles, should be emailed to asc.coordinator@stat.columbia.edu. Please also arrange for three letters of recommendation to be sent to the same email address.

This is an exciting place to work: our research group involves several faculty, postdocs, graduate students, and undergraduates working on a wide range of interesting applied problems. We also have strong links to the Earth Institute, the Center for Computational Learning Systems, and the Columbia Population Research Center, as well as to Statistics, Political Science, and other academic departments at Columbia. As a postdoc here, you will have an opportunity to work on collaborative projects on theory, application, computation, and graphics. You can talk to our current and former postdocs if you want to hear how great it is to work here. Positions are usually for two years. Columbia University is an Equal Opportunity/Affirmative Action employer.

Also, if you're finishing up your Ph.D. in statistics, have interest in public health and international development, and would like to work with me, please contact me regarding the Earth Institute postdoc. Application deadline is 1 Dec, so time to get moving on this!

Seth writes:

Is this a fair statement, do you think?
Science revolves around the discovery of new cause-effect relationships but the entire statistics literature says almost nothing about how to do this.

It's part of an abstract for a talk I [Seth] will give at the ASA conference next July. Haven't submitted the abstract yet so can revise it or leave it out.

My reply: This seems reasonable to me.

You could clarify that the EDA literature is all about discovery of new relationships but with nothing about causality, while the identification literature is all about causality but nothing about the discovery of something new.

Nate, Daniel, and I have an op-ed in the Times today, about senators' positions and state-level opinion on health care. We write:

Lawmakers' support for or opposition to reform generally has less to do with the views of their constituents and more to do with the issue of presidential popularity. . . .

For instance, Senator Blanche Lincoln, a Democrat who has been a less-than-strong supporter of the present health care bill, recently told The Times, "I am responsible to the people of Arkansas, and that is where I will take my direction." But where does she look for her cue? Hers is a poor state whose voters support health care subsidies six percentage points more than the national average. On the other hand, Mr. Obama got just 40 percent of the vote there.

Likewise, in Louisiana, where the Annenberg surveys showed health care reform to be popular but where Mr. Obama is not, the Democrats are not assured of Mary Landrieu's vote. . . .

Here's our graph that makes this point:

Statfight!

Daniel Corsi writes:

I am a PhD student in epidemiology at McMaster University and I am interested in exploring how characteristics of communities are related to child health in developing countries.

I have been using multilevel models to relate physical characteristics of communities such as the number of schools, health clinics, sanitation facilities etc to child height for age and weight for age using observational/survey data.

I have several questions with regards to the group (community-level) level predictors in these models.

I was checking out the comments at my bloggingheads conversation with Eliezer Yudkowsky, and I noticed the following, from commenter bbbeard:

My sense is that there is a fundamental sickness at the heart of Bayesianism. Bayes' theorem is an uncontroversial proposition in both frequentist and Bayesian camps, since it can be formulated precisely in terms of event ensembles. However, the fundamental belief of the Bayesian interpretation, that all probabilities are subjective, is problematic -- for its lack of rigor. . . .

Andrew Roberts writes:

I teach political science at Northwestern. I have a book coming out with U of Chicago Press called "The Thinking Student's Guide to College" and I wanted to ask you a question about one part.

I have a section where I advocate a few "neglected majors". One of them is statistics. I wrote the following (see below) about statistics, but it seems a little dull to me. I'd be curious if you would add anything that would make the major seem more attractive. (FYI, the other neglected majors are linguistics, regional studies, and sociology).

To fully understand just about any phenomenon in the world, from atoms to people to countries, you need a grasp of statistics. Statistics teaches you how to measure quantities, collect data, and then draw inferences from that information. Though this might sound boring, these tasks are necessary to explain most of the forces affecting our lives, whether the workings of markets, the movement of public opinion, or the spread of disease. Not only does a statistics major give you the skills to answer these questions, it is also extremely marketable. There is hardly a firm which could not benefit from a trained statistician, and statisticians are just as desirable for public interest groups hoping to help the disadvantaged. And if you worry that you are not the math type, statistics is considerably less demanding than a pure math major and does more to help you understand the real world in all its complexities.

"Considerably less demanding than a pure math major," huh? OK, OK . . .

My main suggestion would be to be less apologetic. No need to say "Though this might sound boring"!

Perhaps some of you have specific suggestions for Andrew Roberts for his book?

In the spirit of Christian Robert, I'd like to link to my own adaptive Metropolis paper (with Cristian Pasarica):

A good choice of the proposal distribution is crucial for the rapid convergence of the Metropolis algorithm. In this paper, given a family of parametric Markovian kernels, we develop an adaptive algorithm for selecting the best kernel that maximizes the expected squared jumped distance, an objective function that characterizes the Markov chain. We demonstrate the effectiveness of our method in several examples.

The key idea is to use an importance-weighted calculation to home in on a jumping kernel that maximizes expected squared jumped distance (and thus minimizes first-order correlations). We have a bunch of examples to show how it works and to show how it outperforms the more traditional approach of tuning the acceptance rate:

jumpingplot.png

Regarding the adaptivity issue, our tack is to recognize that the adaptation will be done in stages, along with convergence monitoring. We stop adapting once approximate convergence has been reached and consider the earlier iterations as burn-in. Given what is standard practice here anyway, I don't think we're really losing anything in efficiency by doing things this way.

Completely adaptive algorithms are cool too, but you can do a lot of useful adaptation in this semi-static way, adapting every 100 iterations or so and then stopping the adaptation when you've reached a stable point.
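Here's a toy sketch of the semi-static workflow, for a one-dimensional normal target and a random-walk proposal. To be clear, this is not the importance-weighted optimization from the paper; it just adapts the proposal scale in batches using the empirical squared jumped distance and then freezes it, and all the tuning constants are arbitrary.

```python
import numpy as np

def log_target(x):
    return -0.5 * x**2          # standard normal target, up to a constant

rng = np.random.default_rng(2)
x, scale = 0.0, 1.0
batch, n_batches, n_keep = 100, 20, 5000
best_scale, best_esjd = scale, -np.inf

# Adaptation phase: run in batches of 100 iterations, measure the empirical
# squared jumped distance for the current proposal scale, keep the best scale
# seen so far, and try a perturbed scale in the next batch.
for _ in range(n_batches):
    sq_jumps = []
    for _ in range(batch):
        prop = x + scale * rng.normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            sq_jumps.append((prop - x) ** 2)
            x = prop
        else:
            sq_jumps.append(0.0)
    if np.mean(sq_jumps) > best_esjd:
        best_scale, best_esjd = scale, np.mean(sq_jumps)
    scale = best_scale * np.exp(0.3 * rng.normal())   # crude stochastic search

# Sampling phase: freeze the proposal and treat the adaptation phase as burn-in.
scale, draws = best_scale, []
for _ in range(n_keep):
    prop = x + scale * rng.normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    draws.append(x)

print("chosen proposal scale:", round(best_scale, 2))
print("posterior mean and sd:", round(np.mean(draws), 2), round(np.std(draws), 2))
```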

The article will appear in Statistica Sinica.

In response to my note on the limited ideological constraints faced by legislators running for reelection, Alan Abramowitz writes:

I [Abramowitz] agree--although they probably have less leeway now than in the past due to growing pressure toward ideological conformity within parties, especially GOP. But one thing that struck me as very interesting in your graph is that it looks like the advantage of a moderate voting record is considerably smaller now than it used to be, down from over 4 percentage points in the 1980s to maybe 1.5 points on average now. It suggests to me that the electorate has become increasingly partisan and that fewer voters are going to defect to an incumbent from the opposing party regardless of voting record. This could reflect more concern among voters with party control of Congress itself. Along these lines, one thing I've found in the NES data is a growing correlation between presidential job evaluations and voting for both House and Senate candidates over time.

My reply: Yes, that makes sense. The trend is suggestive although (as you can see from the error bars) not statistically significant. Recently I have not had my thoughts organized enough to write any articles on this stuff, but it feels good to at least post these fragments for others to chew on.

More on risk aversion etc etc etc

A correspondent writes:

You may be interested in this article by Matthew Rabin which makes the point that you make in your article: if you are an expected utility maximizer then turning down small actuarially unfair bets (e.g. 50% win $120; 50% lose $100) implies that you would never accept a bet where you could lose $1000 (even if you might win an infinite amount of money). (But proved in more generality).

This was taught to me in the first year of my econ phd program (which I'm currently in!) as why you probably don't want to extrapolate from decisions over small bets to risk aversion in general, not as why we should throw out risk aversion and expected utility maximization completely. Of course, decision theorists do all kinds of things to try to "fix" this problem.
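For readers who haven't seen Rabin's argument, the key calibration step is just concavity, sketched here with the $120/$100 bet above. If an expected-utility maximizer turns down that 50/50 bet at every wealth level $w$, then

$$
\tfrac{1}{2}\,u(w+120) + \tfrac{1}{2}\,u(w-100) \;\le\; u(w)
\quad\Longrightarrow\quad
u(w+120) - u(w) \;\le\; u(w) - u(w-100),
$$

and concavity bounds the two sides, $120\,u'(w+120) \le u(w+120)-u(w)$ and $u(w)-u(w-100) \le 100\,u'(w-100)$, giving

$$
u'(w+120) \;\le\; \tfrac{5}{6}\,u'(w-100).
$$

So marginal utility must fall by a factor of at least 5/6 with every $220 of wealth, and that geometric decay is what makes even arbitrarily large gains worth less than avoiding a moderate loss.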

My reply: Yitzhak (as we called him in high school) wrote his paper after mine had appeared; unfortunately my article was in a statistics journal and he had not heard about it. (This was before I could publicize everything on the blog. And, even now, I think a few papers of mine manage to get out there without being noticed.)

I'm glad they teach this stuff in grad schools now--although, in a way, this still proves my point, in that the nonlinear-utility-function-for-money model is still considered such a standard that they feel the need to debunk it.

My correspondent replied: "I wouldn't call it a debunking....we still go on to use it as the workhorse model in everything we do...."

I think there are good and bad things about this "workhorse model":

The other day I commented on an article by Peter Bancel and Roger Nelson that reported evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I was pretty dismissive of the article; in fact elsewhere I gave my post the title, "Some ESP-bashing red meat for you ScienceBlogs readers out there."

Dr. Bancel was pointed to my blog and felt I wasn't giving the full story. I'll give his comments and then at the end add some thoughts of my own. Bancel wrote:

1. From what I read, 2012 is a big-budget, low-brains remake of Miracle Mile, while completely missing the point of the original. So sad.

2. Meryl Streep was totally wasted in Fantastic Mr. Fox. And I don't mean she was drunk--well, maybe she was, who knows?--but her talent went largely unused. Seems like a crime to have Meryl Streep in a movie and not make more use of what she can do.

On the other hand, everyone deserves to relax now and then. If Streep is going to be taking a break, there's no harm in her doing it in the context of a movie.

And, on the plus side, they didn't let Gilbert Gottfried or anyone who sounds like him get anywhere near the place.

Email update

Somebody in Nigeria wants to send me money. I can't give you the details, but let me say that if this does work out, I'll never have to worry about book royalties again. I feel a little guilty about dealing in blood diamonds, but if this deal works out, I can make it all right by donating a lot to charity.

I've been ranting lately about how I don't like the term "risk aversion," and I was thinking it might help to bring up this post from last year:

Jonathan Rodden and Jowei Chen sent me this article:

When one of the major parties in the United States wins a substantially larger share of the seats than its vote share would seem to warrant, the conventional explanation lies in manipulation of maps by the party that controls the redistricting process. Yet this paper uses a unique data set from Florida to demonstrate a common mechanism through which substantial partisan bias can emerge purely from residential patterns. When partisan preferences are spatially dependent and partisanship is highly correlated with population density, any districting scheme that generates relatively compact, contiguous districts will tend to produce bias against the urban party. In order to demonstrate this empirically, we apply automated districting algorithms driven solely by compactness and contiguity parameters, building winner-take-all districts out of the precinct-level results of the tied Florida presidential election of 2000. The simulation results demonstrate that with 50 percent of the votes statewide, the Republicans can expect to win around 59 percent of the seats without any "intentional" gerrymandering. This is because urban districts tend to be homogeneous and Democratic while suburban and rural districts tend to be moderately Republican. Thus in Florida and other states where Democrats are highly concentrated in cities, the seemingly apolitical practice of requiring compact, contiguous districts will produce systematic pro-Republican electoral bias.
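The mechanism is easy to see in a stripped-down setting. Here's a toy one-dimensional illustration of my own (made-up numbers, nothing to do with the authors' Florida data): a heavily Democratic urban cluster, a modestly Republican remainder, a statewide 50-50 vote, and districts built from consecutive blocks of precincts.

    # Toy illustration of districting bias from residential patterns (my own
    # made-up numbers, not the authors' Florida simulation). 100 equal-population
    # precincts lie along a line; 30 "urban" precincts are heavily Democratic and
    # the remaining 70 lean Republican, with a 50-50 statewide vote overall.
    dem_share = [0.80] * 30 + [(0.50 - 0.30 * 0.80) / 0.70] * 70  # ~0.37 elsewhere
    print(f"statewide Democratic share: {sum(dem_share) / len(dem_share):.3f}")  # 0.500

    # "Compact, contiguous" districts here are just ten consecutive blocks of ten.
    districts = [dem_share[i:i + 10] for i in range(0, 100, 10)]
    dem_seats = sum(1 for d in districts if sum(d) / len(d) > 0.5)
    print(f"Democratic seats: {dem_seats} of {len(districts)}")  # 3 of 10

The Democrats pile up 80-20 wins in the three urban districts and lose everywhere else, so 50 percent of the votes buys only 30 percent of the seats, with no intentional gerrymandering anywhere.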

My thoughts:

"Subject: Our marketing plan"

I had mixed feelings about this one.

My first reaction was disappointment that Ellis Weiner, the author of the great book, Decade of the Year, is reduced to writing one-page bits where he's flopping around like a seal on a beach trying to squeeze a few laughs out of . . . whatever. You get the idea.

My second reaction, after reading the piece--I never read this sort of thing but I was curious what Weiner's been up to lately--was how scarily accurate his description was of the book-marketing process, the way they basically want you, the author, to do all the promotion, and also the bit where the publicist says, "I sort of have my hands full, promoting twenty-three new releases this fall, but I'm really excited about working on your book..."

My third reaction was: hey, this story really wasn't very good at all. Basically, he was taking a so-so idea (I mean, really, how many readers can relate to complaints about book publicity?) and then not taking it far enough over the top. How did this end up getting published in the New Yorker, a place where I assume the best humor writers in America would kill to appear? Personal connections are part of the story, possibly. Beyond this, I suppose that magazine editors are the sort of people who would be particularly amused by a joke about the publishing industry. It's a real step down from the days of Veronica Geng, that's for sure.

P.S. I hate to be so negative, but I'm pretty sure that New Yorker writers don't spend their time reading statistics blogs. So I think I'm safe in writing this without worrying that I've hurt his feelings.

Matthew Yglesias remarks that, when staking out positions, congressmembers are not very strongly constrained by the ideologies of their constituents.

Wow, that was a lot of big words. What I meant to say was: Congressmembers and Senators can pretty much vote how they want on most issues, whatever their constituents happen to believe. Not always, of course, but a representative can take a much more liberal or conservative line than the voters in his or her district or state, and still do fine when election time comes.

Yglesias gives some examples from the U.S. Senate, and I just wanted to back him up by citing some research from the House of Representatives.

First, here's a graph (based on research with Jonathan Katz) showing that, when running for reelection, it helps for a congressmember to be a moderate--but not by much:

median.png

Being a moderate is worth about 2% of the vote in a congressional election: it ain't nuthin, but it certainly is not a paramount concern for most representatives.

Statistics for firefighters!

This is one I'd never thought about . . . Daniel Rubenson writes:

I'm an assistant professor in the Politics Department at Ryerson University in Toronto. I will be teaching an intro statistics course soon and I wanted to ask your advice about it. The course is taught to fire fighters in Ontario as part of a certificate program in public administration that they can take. The group is relatively small (15-20 students) and the course is delivered over an intensive 5 day period. It is not entirely clear yet whether we will have access to computers with any statistical software; the course is taught off campus at a training facility run by the Ontario Fire College.

Finding signal from noise

A reporter contacted me to ask my impression of this article by Peter Bancel and Roger Nelson, which reports evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I spent a few minutes looking at the article, and, well, it's about what you might expect. Very professionally done, close to zero connection between their data and whatever they actually think they're studying.

Masanao points me to this. Incidentally, Don Rubin was a psychology grad student for a while.

In the discussion of the attention-grabbing "global cooling" chapter of the new Freakonomics book, some critics have asked how it is that free-market advocates such as Levitt and Dubner can recommend climate engineering (a "controlled injection of sulfur dioxide into the stratosphere"), which seems like the ultimate in big-government solutions. True, the Freakonomics recommendation comes from a private firm, but it's hard to imagine it would just be implemented in somebody's backyard--I think you'd have to assume that some major government involvement would be necessary.

So what gives?

Ben Highton writes:

One of my colleagues thinks he remembers an essay you wrote in response to the Cox/Katz argument about using "involuntary exits" from the House (due to death, etc.) as a means to get leverage on the incumbency advantage as distinct from strategic retirement in their gerrymandering book. Would you mind sending me a copy?

My reply:

It's in our rejoinder to my article with Zaiying Huang, "Estimating incumbency advantage and its variation, as an example of a before/after study" (with discussion), JASA (2008). See page 450. Steve Ansolabehere assisted me in discussing this point.

P.S. There was a question about how this relates to David Lee's work on estimating incumbency advantage using discontinuities in the vote. My short answer is that Lee's work is interesting, but he's not measuring the effect of politicians' incumbency status. He's measuring the effect of being in the incumbent party, which in a country without strong candidate effects (India, perhaps, according to Leigh Linden) can make sense but doesn't correspond to what we think of as incumbency effects in the United States. Identification strategies are all well and good, but you have to look carefully at what you're actually identifying!

David Afshartous writes:

Regarding why one should not control for post-treatment variables (p.189, Data Analysis Using Regression and Multilevel/Hierarchical Models), the argument is very clear as shown in Figure 9.13, i.e., we would be comparing units that are not comparable, as can be seen by looking at potential outcomes z^0 and z^1, which can never both be observed. How would you respond to someone who says, "Well, what about a cross-over experiment? Wouldn't it be okay for that case?" I suppose one could reply that in a cross-over we do not have z^0 and z^1 in a strict sense, since we observe the effect of T=0 and T=1 on z at different times rather than the counterfactual for an identical time point, etc. Would you add anything further?

My reply: It could be OK; it depends on the context. One point that Rubin has made repeatedly over the past few decades is that inference depends on a model. With a clean, completely randomized design, you don't need much model to get inferences. A crossover design is more complicated. If you make some assumptions about how the treatment at time 1 affects the outcome after time 2, then you can go from there.

To put it another way, the full Bayesian analysis always conditions on all information. Whether this looks like "controlling" for an x-variable, in a regression sense, depends on the model that you're using.
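For readers who want to see the problem in miniature, here is a small simulation sketch (my own toy example, not one from the book). The intermediate variable z is affected by both the randomized treatment and an unobserved characteristic u that also affects the outcome, so units with the same z but different treatments differ systematically in u:

    import numpy as np

    # Toy simulation of why controlling for a post-treatment variable is risky
    # (illustrative numbers only). T is randomized; z is an intermediate outcome
    # affected by T and by an unobserved u; y depends on T and u. The true
    # treatment effect is 2, and z has no effect on y at all.
    rng = np.random.default_rng(0)
    n = 200_000
    T = rng.integers(0, 2, n)
    u = rng.normal(size=n)
    z = 1.0 * T + u + rng.normal(size=n)
    y = 2.0 * T + u + rng.normal(size=n)

    def coef_on_T(X):
        return np.linalg.lstsq(X, y, rcond=None)[0][1]

    ones = np.ones(n)
    print(f"y ~ T     : {coef_on_T(np.column_stack([ones, T])):.2f}")     # about 2.0
    print(f"y ~ T + z : {coef_on_T(np.column_stack([ones, T, z])):.2f}")  # about 1.5, biased

Controlling for z compares units that happen to land at the same intermediate outcome, and within those strata the treated and control units have different distributions of u, which is exactly the non-comparability shown in the figure.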

Continuing puzzlement over "Why" questions

Tyler Cowen links to a blog by Paul Kedrosky that asks why winning times in the Boston marathon have been more variable, in recent years, than winning times in New York. This particular question isn't so interesting--when I saw the title of the post, my first thought was "the weather," and, in fact, that and "the wind" are the most common responses of the blog commenters--but it reminded me of a more general question that we discussed the other day, which is how to think about Why questions.

Many years ago, Don Rubin convinced me that it's a lot easier to think about "effects of causes" than "causes of effects." For example, why did my cat die? Because she ran into the street, because a car was going too fast, because the driver wasn't paying attention, because a bird distracted the cat, because the rain stopped so the cat went outside, etc. When you look at it this way, the question of "why" is pretty meaningless.

Similarly, if you ask a question such as, What caused World War 1, the best sort of answers can take the form of potential-outcomes analyses. I don't think it makes sense to expect any sort of true causal answer here.

But, now let's get back to the "volatility of the Boston marathon" problem. Unlike the question of "why did my cat die" or "why did World War 1 start," the question, "Why have the winning times in the Boston marathon been so variable" does seem answerable.

What happens if we try to apply some statistical principles here?

Principle #1: Compared to what? We can't try to answer "why" without knowing what we are comparing to. This principle seems to work in the marathon-times example. The only way to talk about the Boston times as being unexpectedly variable is to know what "expectedly variable" is. Or, conversely, the New York times are unexpectedly stable compared to what was happening in Boston those same years. Either way, the principle holds that we are comparing to some model or another.

Principle #2: Look at effects of causes, rather than causes of effects. This principle seems to break down in the marathon example, where it seems very natural to try to understand why an observed phenomenon is occurring.

What's going on? Perhaps we can understand in the context of another example, something that came up a couple years ago in some of my consulting work. The New York City Department of Health had a survey of rodent infestation, and they found that African Americans and Latinos were more likely than whites to have rodents in their apartments. This difference persisted (albeit at a lesser magnitude) after controlling for some individual and neighborhood-level predictors. Why does this gap remain? What other average differences are there among the dwellings of different ethnic groups?

OK, so now maybe we're getting somewhere. The question on deck now is, how do the "Boston vs. NY marathon" and "too many rodents" problems differ from the "dead cat" problem?

One difference is that we have data on lots of marathons and lots of rodents in apartments, but only one dead cat. But that doesn't quite work as a demarcation criterion (sorry, forgive me for working under the influence of Popper): even if there were only one running of each marathon, we could still quite reasonably answer questions such as, "Why was the winning time so much lower in NY than in Boston?" And, conversely, if we had lots of dead cats, we could start asking questions about attributable risks, but it still wouldn't quite make sense to ask why the cats are dying.

Another difference is that the marathon question and the rodent question are comparisons (NY vs. Boston and blacks/hispanics vs. whites), while the dead cat stands alone (or swings alone, I guess I should say). Maybe this is closer to the demarcation we're looking for, the idea being that a "cause" (in this sense) is something that takes you away from some default model. In these examples, it's a model of zero differences between groups, but more generally it could be any model that gives predictions for data.

In this model-checking sense, the search for a cause is motivated by an itch--a disagreement with a default model--which has to be scratched and scratched until the discomfort goes away, by constructing a model that fits the data. Said model can then be interpreted causally in a Rubin-like, "effects of causes," forward-thinking way.

Is this the resolution I'm seeking? I'm not sure. But I need to figure this out, because I'm planning on basing my new intro stat course (and book) on the idea of statistics as comparisons.

P.S. I remain completely uninterested in questions such as, What is the cause? Is it A or is it B? (For example, what caused the differences in marathon-time variations in Boston and New York--is it the temperature, the precipitation, the wind, or something else? Of course if it can be any of these factors, it can be all of them.) I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation to draw artificial distinctions, in a way fundamentally no different from the notorious comparisons of statistical significance to non-significance.

This last point has nothing to do with causal inference and everything to do with my preference for continuous over discrete models in applications in which I've worked in social science, environmental science, and public health.

So go read the stuff on the main page now before it scrolls off your screens.

No, not me. It's somebody else. Story here.

1989

Joshua Clover's fun new book, 1989, features a blurb from Luc Sante, author some years back of the instant-classic Low Life. 1989 has some similarities to Low Life--both are about culture and politics--but Clover is much more explicit in making his connections, whereas Sante left most of his implications unsaid. I read Low Life when it came out, and I immediately felt: This-book-is-awesome-and-I-get-it-but-nobody-else-will. I think, actually, that just about everybody who read Low Life had that same reaction, which is what being a "cult classic" is all about.

1989 was a bit of a nostalgia-fest for me, at least in the chapter about rap. (I'm not really familiar enough with the other musical styles of that era, so the other chapters were harder for me to follow. I read Cauty and Drummond's "The Manual" (another cult classic, I think) several years ago but only had a vague sense of them and I've never thought of taking their music seriously as some sort of cultural indicator.)

I remember when I first heard Straight Outta Compton, how dense and echoing the sound was, an intensity comparable (in my view) to the movie Salvador.

Judea Pearl sends along this article and writes:

This research note was triggered by a discussant on your blog, who called my attention to Wooldridge's paper, in response to my provocative question: "Has anyone seen a proof that adjusting for an intermediary would introduce bias?"

It led to some interesting observations which I am now glad to share with your bloggers.

As Pearl writes, it is standard advice to adjust for all pre-treatment variables in an experiment or observational study--and I think everyone would agree on the point--but exactly how to "adjust" is not always clear. For example, you wouldn't want to just throw an instrumental variable in as another regression predictor. And this then leads to tricky questions of what to do with variables that are sort-of instruments and sort-of covariates. I don't really know what to do in such situations, and maybe Pearl and Wooldridge are pointing toward a useful way forward.
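As one small illustration of the "don't just throw the instrument in" point, here's a simulation sketch of my own (not from Pearl's note): when there is an unobserved confounder, adding the instrument as an ordinary covariate makes the bias in the treatment coefficient worse rather than better.

    import numpy as np

    # Toy sketch of bias amplification (my own illustrative numbers). z is an
    # instrument that affects the treatment T only; u is an unobserved confounder
    # of T and y. The true effect of T on y is 1.
    rng = np.random.default_rng(1)
    n = 500_000
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    T = 2.0 * z + u + rng.normal(size=n)
    y = 1.0 * T + u + rng.normal(size=n)

    X1 = np.column_stack([np.ones(n), T])
    X2 = np.column_stack([np.ones(n), T, z])
    b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y, rcond=None)[0]
    print(f"y ~ T     : coefficient on T is {b1[1]:.2f}")  # about 1.17
    print(f"y ~ T + z : coefficient on T is {b2[1]:.2f}")  # about 1.50

Conditioning on z soaks up the clean, instrument-driven variation in T and leaves the confounded part, so the regression that "adjusts" for the instrument ends up further from the truth.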

I'm curious what Rosenbaum and Rubin (both of whom are cited in Pearl's article) have to say about this paper. And, of course, WWJD.

6 cents a word

Helen DeWitt links to a blog by John Scalzi pointing out that today's science fiction magazine writers get the same rate--6 cents per word--as F. Scott Fitzgerald did for his short stories in 1920. After correcting for inflation, this means Fitzgerald was paid 20 times as much.

Scalzi writes that this "basically sucks." But I'd frame this somewhat differently. After all, this is F. Scott Fitzgerald we're talking about. I'd guess he really is worth at least 20 times as much per word as the authors of articles in Fantasy & Science Fiction etc.

P.S. As a blogger, my word rate is 0 cents, of course.

In the Applied Statistics Blog this week

1. Understanding the 'Russian Mortality Paradox' in Central Asia: Evidence from Kyrgyzstan

Short answer: alcohol and suicide.

2. Lumberjacks as a counterexample to the idea of a "risk premium"

They take lots of risks and don't get paid well for it.

3. Cell size and scale

This is a visualization you won't want to miss.

4. Three guys named Matt

5. The political philosophy of the private eye

A genre that was rendered obsolete in 1961 (but nobody realizes it).

The two blogs

Tyler Cowen writes:

Andrew Gelman will have a second blog. I don't yet understand the forthcoming principle of individuation across the two blogs.

Slipperiness of the term "risk aversion"

I don't like the term "risk aversion" (see here and here). For a long time I've been meaning to write something longer and more systematic on the topic, but every once in a while I see something that reminds me of how slippery it is.

For example, Alex Tabarrok asks, "Why are Americans more risk averse about medicine than Europeans?" It's a good question, and it's something I've wondered about myself. But I don't know what he's talking about when he says that "the stereotype is that Americans are more risk-loving" than Europeans. Huh? Americans are notorious for worrying about risks, with car seats, bike helmets, high railings on any possible place where someone could fall, Purell bottles everywhere, etc etc. The commenters on Alex's blog are all talking about drug company regulations, but it seems like a broader cultural thing to me.

But I'm bothered by the term "risk aversion." Why exactly is it appropriate to refer to strict rules on drug approvals as "risk averse"? In a general English-language use of the words, I understand it, but it gets slippery when you try to express it more formally.

Asa writes:

I took your class on multilevel models last year and have since found myself applying them in several different contexts. I am about to start a new project with a dataset in the tens of millions of observations. In my experience, multilevel modeling has been most important when the number of observations in at least one subgroup of interest is small. Getting started on this project, I have two questions:

1) Do multilevel models still have the potential to add much accuracy to predictions when n is very large in all subgroups of interest?

2) Do you find SAS, Stata, or R to be more efficient at handling multilevel/"mixed effects" models with such a large dataset (won't be needing any logit/poisson/glm models)?

My reply:

Regarding software, I'm not sure, but my guess is that Stata might be best with large datasets. Stata also has an active user community that can help with such questions.

For your first question: if n is large in all subgroups, then multilevel modeling is typically not needed; you can simply fit a separate model in each group, which is equivalent to a full-interaction model. At that point you might become interested in details within subgroups, and then you might want a multilevel model.
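As a concrete sketch of the comparison (my own illustration in Python's statsmodels rather than SAS, Stata, or R, with simulated data standing in for the real dataset), the full-interaction fit and the multilevel fit look like this:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in for a large grouped dataset: 20 groups, 2,000 obs each,
    # varying intercepts and a common slope of 2.
    rng = np.random.default_rng(0)
    J, n_per = 20, 2_000
    g = np.repeat(np.arange(J), n_per)
    a = rng.normal(0.0, 1.0, J)
    x = rng.normal(size=J * n_per)
    y = a[g] + 2.0 * x + rng.normal(size=J * n_per)
    df = pd.DataFrame({"y": y, "x": x, "g": g})

    # Full-interaction model: a separate intercept and slope in every group,
    # equivalent to running a separate regression within each group.
    full = smf.ols("y ~ C(g) * x", data=df).fit()

    # Multilevel ("mixed effects") version: group intercepts partially pooled.
    ml = smf.mixedlm("y ~ x", data=df, groups=df["g"]).fit()

    # The reference-group slope from the full-interaction fit and the common
    # slope from the multilevel fit essentially agree when n is this large.
    print(full.params["x"], ml.params["x"])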

Asa then wrote:

Yes, a "full interaction" model was the alternative I was thinking of. And yes, I can imagine the results from that model raising further questions about whats going on within groups as well.

My previous guess was that SAS would be the most efficient for multilevel modeling with big data. But I just completely wrecked my (albeit early 2000's era) laptop looping proc mixed a bunch of times with a much smaller dataset.

I don't really know about the SAS vs. Stata issue. In general, I have warmer feelings toward Stata than SAS, but, on any particular problem, who knows? I'm pretty sure that R would choke on any of these problems.

On the other hand, if you end up breaking the problem into smaller pieces anyway, maybe the slowness of R wouldn't be so much of a problem. R does have the advantage of flexibility.

Jewish Marriage Tied to Israel Trip

Aleks sends along this amusing news article by Jennifer Levitz:

A new study found that rates of marriage outside the faith were sharply curbed among young Jews who have taken "birthright" trips to Israel . . . Over the past decade, Taglit-Birthright Israel, a U.S. nonprofit founded by Jewish businessmen, has sponsored nearly 225,000 young Jewish adults for free 10-day educational tours of Israel as a way to foster Jewish identity. . . .

A study [by Brandeis University researcher Leonard Saxe and partly funded by Taglit-Birthright] showed that 72% of those who went on the trip married within the faith, compared with 46% of people who applied for the trip but weren't selected in a lottery. . . . The Brandeis study looked at 1,500 non-Orthodox Jewish adults who took Taglit trips or applied for one between 2001 and 2004.

The article also said that 10,000 people participated in these trips last summer, which suggests that the 1,500 people in the research study represent a very small fraction of the participants from 2001-2004. I have no idea if this is a random sample, or what. Also I wonder about the people who participated in the lottery, were selected, but didn't go on the trip. Excluding these people (if there are many of them) could bias the results. The news article unfortunately doesn't link to any research report.
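To illustrate the selection concern with made-up numbers (this is my own sketch, not a claim about the actual study): suppose winning the lottery had no effect at all, but among winners only the more religiously attached actually took the trip. Comparing trip-takers with lottery losers would then show a large gap anyway, while the intention-to-treat comparison of all winners with all losers would not.

    import numpy as np

    # Toy sketch of the selection worry, with invented numbers (not the study's data).
    rng = np.random.default_rng(0)
    n = 200_000
    attachment = rng.normal(size=n)                       # unobserved religious attachment
    win = rng.integers(0, 2, n).astype(bool)              # lottery outcome, random
    goes = win & (attachment + rng.normal(size=n) > 0.5)  # only some winners actually go
    inmarry = rng.random(n) < 1 / (1 + np.exp(-attachment))  # depends only on attachment

    print(f"trip-takers vs. losers : {inmarry[goes].mean():.2f} vs {inmarry[~win].mean():.2f}")
    print(f"all winners vs. losers : {inmarry[win].mean():.2f} vs {inmarry[~win].mean():.2f}")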

Null and Vetoed: "Chance Coincidence"?

Philip Stark sent along this set of calculations on the probability that the hidden message in Gov. Schwarzenegger's veto message could've occurred by chance. The message, if you haven't heard, is:

Med School Interview Questions

The questions are no big deal, but what I find interesting is that medical schools do personal interviews at all. No place where I've ever worked has interviewed grad school applicants. It's hard for me to see what you get from it that would be worth the cost. I guess there must be quite a bit of psychology literature on this question.

Constructing informative priors

Christiaan de Leeuw writes:

I write to you with a question about the construction of informative priors in Bayesian analysis. Since most Bayesians at the statistics department here are more of the 'Objective' Bayes persuasion, I wanted some outside opinions as well.

Jay Kaufman writes:

I received the following email:

Hello, my name is Lauren Schmidt, and I recently graduated from the Brain & Cognitive Sciences graduate program at MIT, where I spent a lot of time doing online research using human subjects. I also spent a lot of time being frustrated with the limitations of various existing online research tools. So now I am co-founding a start-up, HeadLamp Research, with the goal of making online experimental design and data collection as fast, easy, powerful, and painless as can be. But we need your help to come up with an online research tool that is as useful as possible!

We have a short survey (5-10 min) on your research practices and needs, and we would really appreciate your input if you are interested in online data collection.

I imagine they're planning to make money off this start-up and so I think it would be only fair if they pay their survey participants. Perhaps they can give them a share of the profits, if any exist?

Guilherme Rocha writes:

The new blog

Here. Official opening is Monday but youall get to see it earlier.

Matt Stephenson writes:

Monday 2 Nov, 5-6:30pm at the Methodology Institute, LSE. No link to the seminar on the webpage, so I'll give you the information here:

Why we (usually) don't worry about multiple comparisons

Applied researchers often find themselves making statistical inferences in settings that would seem to require multiple comparisons adjustments. We challenge the Type I error paradigm that underlies these corrections. Moreover we posit that the problem of multiple comparisons can disappear entirely when viewed from a hierarchical Bayesian perspective. We propose building multilevel models in the settings where multiple comparisons arise.

Multilevel models perform partial pooling (shifting estimates toward each other), whereas classical procedures typically keep the centers of intervals stationary, adjusting for multiple comparisons by making the intervals wider (or, equivalently, adjusting the p-values corresponding to intervals of fixed width). Thus, multilevel models address the multiple comparisons problem and also yield more efficient estimates, especially in settings with low group-level variation, which is where multiple comparisons are a particular concern.

This work is joint with Jennifer Hill and Masanao Yajima.
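For anyone who wants a feel for the partial-pooling idea before the talk, here is a minimal simulated sketch (my own illustration, with the group-level and data-level variances treated as known for simplicity; it is not the analysis in the paper):

    import numpy as np
    from scipy.stats import norm

    # Minimal sketch of partial pooling vs. classical multiple-comparisons
    # adjustment (simulated data; tau and the standard errors treated as known).
    rng = np.random.default_rng(0)
    J, tau, se = 8, 0.5, 1.0 / np.sqrt(30)     # groups, group-level sd, se of each group mean
    theta = rng.normal(0.0, tau, J)            # true group effects
    ybar = theta + rng.normal(0.0, se, J)      # observed group means

    # Multilevel estimate: shrink each group mean toward the overall mean.
    shrink = tau**2 / (tau**2 + se**2)
    pooled = shrink * ybar + (1 - shrink) * ybar.mean()

    # Classical approach: keep the raw estimates, widen the intervals (Bonferroni).
    half_plain = norm.ppf(0.975) * se
    half_bonf = norm.ppf(1 - 0.025 / J) * se
    half_multilevel = norm.ppf(0.975) * se * np.sqrt(shrink)

    print("raw means       :", np.round(ybar, 2))
    print("partially pooled:", np.round(pooled, 2))
    print(f"interval half-widths: {half_plain:.2f} unadjusted, {half_bonf:.2f} Bonferroni, "
          f"{half_multilevel:.2f} multilevel")

The multilevel intervals are centered on the shrunken estimates and are, if anything, narrower than the unadjusted classical ones, which is the point made in the abstract.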

(Here's a video version of a related talk that I gave at a meeting on statistics and neuroscience.)

P.S. My talk briefly touches upon some work done by a researcher at the London School of Economics!

P.P.S. I'm speaking at LSE on Tuesday also (on a different topic).

P.P.P.S. I'll be speaking again a couple times in London later in the academic year, but on other topics. All my talks there will be different.

Tuesday 3 Nov, 4-5:30pm in Room R505, Department of Government, LSE.

Culture wars, voting and polarization: divisions and unities in modern American politics

On the night of the 2000 presidential election, Americans sat riveted in front of their televisions as polling results divided the nation's map into red and blue states. Since then the color divide has become a symbol of a culture war that thrives on stereotypes--pickup-driving red-state Republicans who vote based on God, guns, and gays; and elitist, latte-sipping blue-state Democrats who are woefully out of touch with heartland values. But how does this fit into other ideas about America being divided between the haves and the have-nots? Is political polarization real, or is the real concern the perception of polarization?

This work is joint with David Park, Boris Shor, Joseph Bafumi, Jeronimo Cortina, and Delia Baldassarri.

(Here's a video version of the talk, from when I gave it at Google.)

I'll be interested to see if people can explain to me the relevance (or lack thereof) of this work to politics in Britain and other countries.

P.S. I'm speaking at LSE on Monday also (on a different topic).

P.P.S. I'll be speaking again a couple times in London later in the academic year, but on other topics. All my talks there will be different.
