
A Glimpse of Our Future

Jeff pointed me to this graph from congressmember Paul Ryan:

paul ryan budget.gif

Ryan is actually being generous to the Democrats here. You can't imagine how things are going to look around 2150 or so!

Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It's basically a tradeoff: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis doesn't need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display. And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate. Let's leave the dot plots, pie charts, moving zip charts, and all the rest to the folks in the marketing department and the art directors of Newsweek and USA Today. Here at this blog we're doing actual research and we want to see, and present, the hard numbers.

To get a sense of what's at stake here, consider two sorts of analyses. At one extreme are controlled experiments with a clean estimate and p-value, and well-specified regressions with robust standard errors, where the p-values really mean something. At the other extreme are descriptive data summaries--often augmented with models such as multilevel regressions chock full of probability distributions that aren't actually justified by any randomization, either in treatment assignment or data collection--with displays of all sorts of cross-classified model estimates. The problem with this latter analysis is not really the modeling--if you state your assumptions carefully, models are fine--but the display of all sorts of numbers and comparisons that are in no way statistically significant.

For example, suppose a research article includes a graph showing three lines with different slopes. It's natural for the reader to assume, if such a graph is featured prominently in the article, that the three slopes are statistically significantly different from each other. But what if no p-value is given? Worse, what if there are no point estimates or standard errors to be found? Let alone the sort of multiple comparisons correction that might be needed, considering all the graphs that might have been displayed? Now, I'm not implying any scientific misconduct here--and, to keep personalities out of this, I've refrained from linking to the article that I'm thinking about here--but it's sloppy at best and statistical malpractice at worst to foreground a comparison that has been presented with no rigorous--or even approximately rigorous--measure of uncertainty. And, no, it's not an excuse that the researchers actually "believe" their claim. Sincerity is no defense. There's a reason our forefathers developed p-values and all the rest, and let's remember those reasons.

The positive case for tables

So far I've explained my aversion to graphs as an adornment to, or really a substitute for, scientific research. I've been bothered for a while by the trend of graphical displays in journal articles, but only in writing this piece right here have I realized the real problem, which is not so much that graphs are imprecise, or hard to read, or even that they encourage us to evaluate research by its "production values" (as embodied in fancy models and graphs) rather than its fundamental contributions, but rather that graphs are inherently a way of implying results that are often not statistically significant. (And all but the simplest graphs allow so many different visual comparisons that even if certain main effects actually do pass the p-value test, many many more inevitably won't. Some techniques have been developed to display multiple-comparisons-corrected uncertainty bounds, but these are rarely included in graphs, for the understandable reason that they magnify visual clutter.)

But enough about graphs. Now I'd like to talk a bit about why tables are not merely a necessary evil but are actually a positive good.

A table lays down your results, unadorned, for the readers--and, most importantly, scientific peers--to judge. Good tables often have lots of numbers. That's fine--different readers may be interested in different things. A table is not meant to be read as a narrative, so don't obsess about clarity. It's much more important to put in the exact numbers, as these represent the most important summary of your results, estimated local average treatment effects and all the rest.

It's also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided by whatever software you have used to fit the model. Software designers have chosen their defaults for a good reason, and I'd go with that. Unnecessary rounding is risky; who knows what information might be lost in the foolish pursuit of a "clean"-looking table?

There is also the question of what words should be used for the rows and columns of the table. In tables of regressions, most of the rows represent independent variables. Here, I recommend using the variable names provided by the computer program, which are typically abbreviations in all caps. Using these abbreviations gets the reader a little closer to your actual analysis and also has the benefit that, if he or she wants to replicate your study with the same dataset, it will be clearer how to do it. In addition, using these raw variable names makes it more clear that you didn't do anything shifty such as transforming or combining your variables before putting them in your regression.

We'd do well to take a lead from our most prominent social science colleagues--the economists--who have, by and large, held the line on graphics and have insisted on tabular presentations of results in their journals. One advantage of these norms is that, when you read an econ paper, you can find the numbers that you want; the authors of these articles are laying it on the line and giving you their betas. Beyond this, the standardization is a benefit in itself: a patterned way of presenting results allows the expert readers--who, after all, represent the most important audience for journal articles--to find and evaluate the key results in an article without having to figure out new sorts of displays. Less form, more content: that's what tables are all about. If you've found something great and you want to share it with the world, sure, make a pretty graph and put it on a blog. But please, please, keep these abominations out of our scientific journals.

B-b-b-but . . .

Yes, you might reply, sure, graphics are manipulative tricks and tables are the best. But doesn't the ambitious researcher need to make graphs, just to keep up with everybody else, just to get his or her research noticed? It's the peacock's tail all over again--I don't want to waste my precious time making fancy 3-D color bar charts, but if I don't, my work will get lost in the nation's collective in-box.

To this I say, No! Stand firm! Don't bend your principles for short-term gain. We're all in this together and we all have to be strong, to resist the transformation of serious social science into a set of statistical bells and whistles. Everything up to and including ordered logistic regression is OK, and it's fine--nay, mandatory--to use heteroscedasticity-consistent standard errors. But No to inappropriate models and No to graphical displays that imply research findings where none necessarily exist.

Beyond this, even in the short term I think there are some gains from going graph-free.
And, finally, the time you save not agonizing over details of graphs can instead be used to think more seriously about your research. Undoubtedly there's a time substitution: effort spent tinkering with graphs (or, for that matter, blogging) is effort not spent collecting data, working out models, proving theorems, and all the rest. If you must make a graph, try only to graph unadorned raw data, so that you're not implying you have anything you don't. And I recommend using Excel, which has some really nice defaults as well as options such as those 3-D colored bar charts. If you're gonna have a graph, you might as well make it pretty. I recommend a separate color for each bar--and if you want to throw in a line as well, use a separate y-axis on the right side of the graph.

Damn. It felt good to get that off my chest. I hope nobody reads this, though, or I might be able to fool people with graphs much longer.

The U.S. Census has tons of data, but sometimes it can be hard to get at what you want. John Transue writes, regarding the goal of getting racial composition of U.S. counties:

This Firefox extension is incredibly useful for grabbing data from many links on the same page [convenient if you want to get files for all 50 states at once].

The data came from this Census page.

Yair used this page.

Kobi pointed me to this news article that discusses this research article by Richard Vining, Amy Steigerwalt, and Susan Navarro Smeicer which claims that the American Bar Association has a liberal bias in its evaluation of Supreme Court nominees. They write:

Manoel Galdino pointed me to a discussion on the Polmeth list on the topic of reporting p-values and regression coefficients. (The polmeth listserv doesn't seem to have a way to link to threads, but if you go here for March 2009 you can scroll down to the posts on "Displaying regression coefficients.") I don't want to go on and on about this, but in the interest of advancing the ball forward just a bit, here are a few thoughts:

From a discussion of Richard Ford and John Updike:

I can't think of a single good title among all of Updike's stories and novels. OK, I guess The Witches of Eastwick isn't a bad title. But that's about it. Nothing in the oeuvre to match the title, The Sportswriter.

Philip K. Dick was another writer who couldn't come up with a good title to save his life. Hemingway, though--he knew how to write a title. Over and over again, he came up with winners. It's a real skill.

It was only years after publishing Teaching Statistics: A Bag of Tricks that I realized I should've called it Learning Statistics: A Bag of Tricks. Maybe it was years after writing Rabbit, Run, that Updike realized he should've called it Anhedonia or whatever. (And the sequels: Rabbit Redux and the rest . . . great novels, but awful, awful titles. What was he thinking??)

Raymond Carver's titles are good, but that impresses me less since I'm not so impressed with his stories. George V. Higgins's titles were OK--not bad, mostly not great--but his novels had some classic last lines. He really knew how to sum it up, often with a character making a devastating offhand remark.

After writing this, I scanned my bookshelves. Most of the books on the shelf have good titles. Apparently it's just not that hard to do. Looked at from that perspective, there's almost something heroic in Updike's inability (or perhaps unwillingness) to come up with more than one or two good titles among dozens of books and hundreds (probably thousands) of stories and articles.

P.S. Gore Vidal is another great writer who can't seem to come up with a good title to save his life. Cheever, on the other hand, could really whip 'em off: The Swimmer, The Housebreaker of Shady Hill, and all the rest.

Alex Frankel sent in this:

A professor at Oxford University and his team have perfected a model whereby they can calculate whether the relationship will succeed. In a study of 700 couples, Professor James Murray, a maths expert, predicted the divorce rate with 94 per cent accuracy. His calculations were based on 15-minute conversations between couples who were asked to sit opposite each other in a room on their own and talk . . . Professor Murray and his colleagues recorded the conversations and awarded each husband and wife positive or negative points depending on what was said. Partners who showed affection, humour or happiness as they talked were given the maximum points, while those who displayed contempt or belligerence received the minimum. . . .

I looked up James Murray and couldn't find any article describing these results; 94% accuracy sounds pretty good to me, but it's difficult to make any comment based only on news reports. It appears, though, that Murray's main home is the University of Washington, not Oxford--at least, there seems to be a lot more info on Murray at UW than at Oxford--and he's cowritten a book on The Mathematics of Marriage, so this isn't a new area for him.

There must be a bit of a discussion of this sort of thing in the clinical psychology literature? Perhaps this would be a good topic for teaching logistic regression forecasting, better than our usual boring examples.

One thing about the news report puzzled me, though; at the end, it says:

The forecast of who would get divorced in his study of 700 couples over 12 years was 100 per cent correct, he said. But "what reduced the accuracy of our predictions was those couples who we thought would stay married and unhappy actually ended up getting divorced".

Huh?? If the accuracy was 100%, then what does he mean by "what reduced the accuracy of our predictions"? Were they hoping for 110%?

The other day I mentioned this article by Lionel Page that found a momentum effect in tennis matches; more specifically: "winning the first set has a significant and strong effect on the result of the second set. A player who wins a close first set tie break will, on average, win one game more in the second set."

tennis.png

I'd display these data with a heat map rather than with overplotted points, but you get the idea.

This looked reasonable to me, but Guy Molyeneux sent in some skeptical comments, which I'll give, followed by Page's response. Molyeneux writes:

GPU Supercomputers

[Image via Wikipedia: an example of a computer cluster]

Computer games have been the driver behind large advances in 3D graphics hardware over the past decade. It turned out that the hardware developed for rotating, projecting and rendering many triangles can also be used for other purposes, and this is the notion of "general-purpose computing on graphics processing units" or just GPGPU.

The center of this research and development community is gpgpu.org. There are several environments for developing software on such hardware: CUDA, Brook, OpenCL, libsh, and CTM.

I have looked through their list of CUDA example applications, but couldn't find any statistical applications. Some related ones in machine learning and Markov chains claim 50-fold speedups over conventional PC architectures, without the complexity of running a whole cluster of computers. Now, likelihood computation and MCMC are inherently extremely parallelizable, and such hardware could make it easier to fit sophisticated models. This would be a good topic for a computationally minded PhD thesis.
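To make the parallelism point concrete, here's a minimal sketch (my own toy example, not code from gpgpu.org or the CUDA samples) of why likelihood evaluation maps so naturally onto this hardware: each observation contributes an independent term, so the per-observation work can be farmed out to thousands of GPU threads, with only the final sum requiring any communication. The normal model and simulated data below are purely illustrative.

```python
import numpy as np

def normal_loglik(y, mu, sigma):
    """Sum of per-observation log-likelihood terms for y_i ~ Normal(mu, sigma).

    Each element of `terms` is independent of the others, which is exactly
    the structure a GPU exploits: one thread per observation, followed by
    a single parallel reduction for the sum.
    """
    terms = -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    return terms.sum()

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=1_000_000)  # simulated data, illustrative only

# An MCMC or optimization routine would repeat evaluations like these many times.
for mu in (0.5, 1.0, 1.5):
    print(mu, normal_loglik(y, mu, sigma=2.0))
```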

Main effects and interactions

We all know to look at main effects first and then look for interactions. But a former student pointed me to some disturbing advice from some statistics textbooks. I'll give his quotes and then my reactions:

Self-experimentation

Jimmy sent this along:

Still, Mr. Perry wondered whether caffeine would help him. When he retired from rowing last July, he decided to do a randomized, blinded, placebo-controlled experiment on himself.

Commenter BorisG asked why I made my pretty maps using pre-election polls rather than exit polls. I responded in the comments but then I did a search and noticed that Kos had a longer discussion of this point on his blog. So I thought maybe it would help to discuss this all further here.

To start with, I appreciate the careful scrutiny. One of my pet peeves is people assuming a number or graph is correct, just because it has been asserted. BorisG and Kos and others are doing a useful service by subjecting my maps to criticism.

Several issues are involved:

- Data availability;
- Problems with pre-election polls;
- Problems with exit polls;
- Differences between raw data and my best model-assisted estimates;
- Thresholding at 50%;
- Small sample sizes in some states.

I'll discuss each of these issues in turn.

JAMA Editors Go Nuts

This is pretty funny.

Jose Aleman points me to this conference on 18-19 June at Fordham University:

This conference is about applications of the R software and Graphics system to important policy and research problems, not about R per se. It provides an excellent opportunity to bring together researchers from various disciplines using R in their reproducible research work. We hope to provide practical help to students and researchers alike.

It says here that I'm an invited speaker. I don't actually remember being asked to do so, but if I did, then I guess I'll be there! Fordham is conveniently located near the zoo so perhaps I can somehow combine this with a family trip.

P.S. I checked more carefully and the conference is actually at the Manhattan branch of Fordham (at Lincoln Center). So no zoo trip, unfortunately!

Aleks pointed me to this dead-serious tutorial from www.usa.gov. Among the amusing bits:

"Blogs require talented writers, as blogs are just another form of writing. You can't have a good blog without a good writer, with knowledgeable opinions or information."
"How often will it be updated? The latest best practice shows that when a blog is first posted, it should be updated every day for the first 30 days (to establish a consistent relationship with the search engines). After the initial 30 days, it should be updated at least 2-3 times a week to stay high in the rankings."
"Avoid slang and arcane terms, unless you define them."
"Never use "click here" or similar terms."
"Read your link aloud--is it easy to enunciate?"

And, my favorite:

"Choose words that have as few syllables as possible."

On the upside, I learned that Montgomery County, MD, Division of Solid Waste has a blog titled "Talkin' Trash." Quite a bit snappier than "Statistical modeling, causal inference, and social science," I gotta say.

P.S. To be serious for a moment, I think they could've replaced most of their guidelines by a single bit of writing advice I once heard:

Tell 'em what they don't already know.

Sharad's blog

Sharad Goel is a brilliant guy who works at Yahoo with Duncan Watts and just started a blog on statistical topics. It's great so far and I'm sure will continue to be so.

How did white people vote?

I posted the maps at 538.

And here's what we did:

Scrabble rants

According to Carl Bialik, "za," "qi," and "zzz" were added recently to the list of official Scrabble words. I'm not so bothered by "zzz"--if somebody has two blanks to blow on this one, go for it!--but "za" and "qi"??? I don't even like "cee," let alone "qat," "xu," and other abominations. (I'm also not a big fan of "aw.")

Without further ado, here are my suggestions for reforming Scrabble.

1. Change one of the I's to an O. We've all had the unpleasant experience of having too many I's in our rack. What's the point?

2. Change one of the L's to an H. And change them both to 2-point letters. The H is ridiculously overvalued.

3. V is horrible. Change one of them to an N and let the remaining V be worth 6 points.

4. Regarding Q: Personally, I'd go the Boggle way and have a Qu tile. But I respect that Scrabble traditionalists enjoy the whole hide-the-Q game, so for them I guess I'd have to keep the Q as is.

5. Get rid of a bunch of non-English words such as qat, xu, jo, etc. Beyond this, for friendly games, adopt the Ubs rule, under which, if others aren't familiar with a word you just played, you (a) have to define it, and (b) can't use it this time--but it becomes legal in the future.

6. This brings me to challenges. When I was a kid we'd have huge fights over challenges because of their negative-sum nature: when player A challenges player B, one of them will lose his or her turn. At some point we switched to the mellower rule that, if you're challenged and the word isn't in the dictionary, you get another try--but you have to put your new word down immediately, you get no additional time to think. And if you challenge and you are wrong, you don't lose your turn. (We could've made this symmetric by saying that the challenger would have to play immediately when his or her turn came up--that seems like a reasonable rule to me--but we didn't actually go so far, as challenges were always pretty rare.)

Regarding points 1, 2, and 3 above: I know that traditionalists will say that all these bugs are actually features, that a good Scrabble player will know how to handle a surplus of I's or deal with a V. I disagree. There's enough challenge in trying to make good words without artificially making some of the rare letters too common. I mean, if you really believed that it's a good thing that there are two V's worth only 4 points each, why not go whole hog and get rid of a bunch of E's, T's, A's, N's, and R's, and replace them with B's and C's and suchlike?

P.S. Also interesting is this chart showing the frequencies of letters from several different corpuses. I'm not surprised that, for example, the frequency of letters from a dictionary is different from that of spoken words, but I was struck by the differences in letter frequencies comparing different modern written sources. For example, E represents 12.4% of all letters from a corpus of newspapers, whereas it is only 11.2% in corpuses of fiction and magazines. I wonder how much of this is explained by "the."
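A quick way to check that last question on whatever text you have lying around--this little sketch is mine, not anything from the linked chart, and the file name in the usage comment is just a placeholder:

```python
from collections import Counter
import re

def e_share(text):
    """Fraction of alphabetic characters that are 'e'."""
    letters = [c for c in text.lower() if c.isalpha()]
    return Counter(letters)["e"] / len(letters)

def e_share_without_the(text):
    """Same fraction after deleting every standalone 'the'."""
    stripped = re.sub(r"\bthe\b", "", text.lower())
    letters = [c for c in stripped if c.isalpha()]
    return Counter(letters)["e"] / len(letters)

# Usage: compare the two shares on, say, a newspaper file vs. a fiction file.
# text = open("sample.txt").read()
# print(e_share(text), e_share_without_the(text))
```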

[Image via Wikipedia: life expectancy at birth (years)]

Johannes pointed me to FindMyWorth, a website that provides another formula for the monetary value of a human life, this one conditioned on income, spending, financial growth rate, rate of return, life expectancy, and quality of life. If you live in Qatar, you're worth the most, almost $6M:

quatar.png

While one could argue a lot about the formula, the author Zeeshan-ul-hassan Usmani has set a good example of how to properly publish a working paper in this age: not only does he have the paper, he also has an interactive demonstration, graphs, data, and a 30-second "executive" summary of the methodology for all of us with attention deficit disorders. He could also add a comment section, but either way, that's the way to go!

$88 (or $110 list)

Why I don't (usually) publish with Wiley. I want to get it, though.

Whaddya think of that, Matt?

Visualizing correlation matrices

See here. It's an important issue, but their plot has two huge problems:

1. The big fat circles on the diagonal convey no information and are, to my eye, a distraction.

2. They forgot to order the variables, as a result creating a confusing pattern. Try reordering to put the highly-correlated variables together (as Tian did for Figure 8 in our article).
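Here's a minimal sketch of one way to do that reordering (my own illustration, not Tian's code): cluster the variables on 1 - |correlation| and use the dendrogram's leaf order, so that highly correlated variables end up adjacent before you draw the matrix.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def reorder_by_correlation(X):
    """Return a column order that puts highly correlated variables together."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)                   # dissimilarity: 0 when |r| = 1
    condensed = squareform(dist, checks=False)  # ignore tiny numerical asymmetries
    order = leaves_list(linkage(condensed, method="average"))
    return order, corr[np.ix_(order, order)]

# Usage: order, corr_sorted = reorder_by_correlation(data_matrix)
# then plot corr_sorted with the correspondingly reordered variable labels.
```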

They also gave the variables unreadable abbreviations. This is not specifically an error with the correlation plot but it's a common mistake that can easily be avoided.

P.S. More here from Eduardo and John.

MeTube

My red-blue talk at Google:

If you have any good questions that they forgot to ask in Mountain View, feel free to post in comments. And here's more info on the book.

Atlantic causal conference

Dylan Small writes:

We will be holding the next edition of the Atlantic Causal Conference on May 20-21 at Penn. Hope to see you at the conference in May.

It looks great! We actually organized the very first one of these conferences here at Columbia (see also here for a brief report), and I'm pleased to see it's going stronger than ever.

Whiteboard update

Jeronimo writes:

I have been using small whiteboards in my research methods class to have the students work in pairs and it has been a huge success.

I asked, "How large are the whiteboards? And why do you use these rather than simply having them work in their notebooks?" and he responded:

The whiteboards are about 8x11 inches. I like the boards because it changes the dynamic of the class. It introduces the sense of doing something different and also they can erase everything and start all over again. And I guess we don't waste a lot of paper.

I'll try it for the next course I teach.

P.S. As Seth might say, how come I have no problem with anecdotal evidence in education--the area in which I actually work--but when it comes to medicine and public health I focus on potential selection biases, insist on randomized trials, etc. In my defense, I'd point out that there has been some education research showing the benefits of working in pairs, peer instruction, and so forth--thus the "whiteboard for each pair of students" idea makes sense. But, then again, medical interventions typically make sense, whether or not they work (recall The Doctor's Dilemma).

Economist-centrism

Steven Levitt writes of Time Magazine's list of the 100 people who "shape our world," that one year they included him but that, in his opinion, "Economists have not figured very prominently on the previous lists; there has been roughly one economist in the top 100 per year."

One per hundred seems pretty good to me, considering that economists represent only 0.1% of the employed population in the United States!

I guess the real moral of the story is that, whatever people have, they will consider it as a baseline and then want more.

P.S. Of course I'm happy that Nate is ranked in the top 200, but, no, he's not an economist. He's a sabermetrician, or, if you want to use a more general term, "statistician." If you call someone an economist just because he majored in economics in college, then I'm a physicist.

The airport and the supermarket

At the airport they have different terminals for different airlines, with flights leaving from all over the place. Why not have a simpler system, where all the flights to Chicago leave from one section of the airport, all the flights to L.A. leave from another section, and so forth? Then you could buy a "ticket to Chicago"--no airline specified--and then just go to the gate and get on the next flight to the Windy City.

The analogy is the supermarket, where products are organized by what they are, not by who manufactures them. If the supermarket were like the airport, they'd have all the Procter & Gamble products in the same place, and so forth. Or imagine a bookstore where the books were arranged by publisher and you had to look at the Random House books, then the Knopf books, etc. That's what it's like going to the airport, with the extra thrill of having occasional flight delays.

One could argue that flying wastes so much fuel that anything that makes air travel more of a hassle is a good idea, and maybe that's true. If so, it's the only argument I know of in favor of the current system.

Strangled by data?

[Image via Wikipedia: Google in 1998]

A frustrated ex-Googler writes:
Yes, it's true that a team at Google couldn't decide between two blues, so they're testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can't operate in an environment like that. I've grown tired of debating such miniscule design decisions. There are more exciting design problems in this world to tackle.

So, Google observes people and their clicks to determine the color or line thickness. When your software phones back every time it is used, it's like having a microphone or camera in a car that detects every mistake, or that measures the response time.

It is easy to optimize the line thickness, but it's more difficult to optimize the overall design of the study. When your working day has 16 hours, and you spend 15 of them on optimization, there is not much time left for new designs.

The analogy carries over to statistical practice: your model is only as good as the data you're using. And the data, while plentiful and accurate, might be keeping you from solving the real problem--like looking for keys under the lamp post. Methodology can often be just as constraining as the data.

Over the past few decades, most policy programs have focused on remediation based on easily measured demographic variables, such as age, gender, income, race, education, ideology, and ability--at the expense of variables that are harder to model and measure, such as honor, talent, potential, trustworthiness, and motivation.

Following my skeptical discussion of their article on the probability of a college basketball team winning after being ahead or behind by one point at halftime, Jonah Berger and Devin Pope sent me a long and polite email (with graph attached!) defending their analysis. I'll put it all here, followed by my response. I'm still skeptical on some details, but I think that some of the confusion can be dispelled with a minor writing change, where they make clear that their 6.6% estimate is a comparison to a model.

Berger and Pope's first point was a general discussion about their methods:

I love stories and for a long time have wanted to put together a little book of my favorite statistics stories. I know this is not something that would ever reach David Sedaris levels of popularity (to say the least) but at least it would give me some good material to use at the beginning of class or for other times when I want to engage students in a way that's not too taxing for them. (In the meantime, I recommend that all of you who teach statistics or methods classes begin each of your classes, while the students are walking in, with a 5-minute discussion of whatever the latest items are on this blog.)

Anyway, I have a new story right here for ya.

John Shonder pointed me to this discussion by Justin Wolfers of this article by Jonah Berger and Devin Pope, who write:

In general, the further individuals, groups, and teams are ahead of their opponents in competition, the more likely they are to win. However, we show that through increasing motivation, being slightly behind can actually increase success. Analysis of over 6,000 collegiate basketball games illustrates that being slightly behind increases a team's chance of winning. Teams behind by a point at halftime, for example, actually win more often than teams ahead by one. This increase is between 5.5 and 7.7 percentage points . . .

This is an interesting thing to look at, but I think they're wrong. To explain, I'll start with their data, which are 6572 NCAA basketball games where the score differential at halftime is within 10 points. Of the subset of these games with one-point gaps at halftime, the team that's behind won 51.3% of the time. To get a standard error on this, I need to know the number of such games; let me approximate this by 6572/10=657. The s.e. is then .5/sqrt(657)=0.02. So the simple empirical estimate with +/- 1 standard error bounds is [.513 +/- .02], or [.49, .53]. Hardly conclusive evidence!
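The back-of-the-envelope numbers are easy to check; this snippet just redoes the arithmetic in the previous sentences (the 657-game count is the same rough one-in-ten approximation used above, not an exact figure).

```python
import math

n_games = 6572 // 10           # rough count of games with a one-point halftime gap
p_hat = 0.513                  # share of those games won by the trailing team
se = 0.5 / math.sqrt(n_games)  # binomial standard error near p = 0.5

print(round(se, 3))                                # about 0.020
print(round(p_hat - se, 2), round(p_hat + se, 2))  # roughly 0.49 to 0.53
```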

Given this tiny difference of less than 1 standard error, how could they claim that "being slightly behind increases a team's chance of winning . . . by between 5.5 and 7.7 percentage points"?? The point estimate looks too large (6.6 percentage points rather than 1.3) and the standard error looks too small.

What went wrong? A clue is provided by this picture:

Halfscore.jpg

As some of Wolfers's commenters pointed out, this graph is slightly misleading because all the data points on the right side are reflected on the left. The real problem, though, is that what Berger and Pope did is to fit a curve to the points on the right half of the graph, extend this curve to 0, and then count that as the effect of being slightly behind.

This is wrong for a couple of reasons.

First, scores are discrete, so even if their curve were correct, it would be misleading to say that being behind increases your chance of winning by 6.6 points. Being behind takes you from a differential of 0 (50% chance of winning, the way they set up the data) to 51% (+/- 2%). Even taking the numbers at face value, you're talking 1%, not their claimed 5% or more.

Second, their analysis is extremely sensitive to their model. Looking at the picture above--again, focusing on the right half of the graph--I would think it would make more sense to draw the regression line a bit above the point at 1. That would be natural but it doesn't happen here because (a) their model doesn't even try to be consistent with the point at 0, and (b) they do some ridiculous overfitting with a 5th-degree polynomial. Don't even get me started on this sort of thing.

What would I do?

I'd probably start with a plot similar to their graph above, but coding the score differential consistently as "home team score minus visiting team score." Then each data point would represent different games, and they could fit a line and see what they get. And I'd fit linear functions (on the logit scale), not 5th-degree polynomials. And I'd get more data! The big issue, though, is that we're talking about maybe a 1% effect, not a 7% effect, which makes the whole thing a bit less exciting.
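For concreteness, here's a minimal sketch of that analysis--a linear logit of the game outcome on the halftime differential--written against hypothetical game-level data. The column names are placeholders, and none of this is Berger and Pope's code.

```python
import numpy as np
import statsmodels.api as sm

def fit_halftime_logit(halftime_diff, home_win):
    """Logistic regression of home-team win on halftime score differential."""
    X = sm.add_constant(np.asarray(halftime_diff, dtype=float))
    y = np.asarray(home_win, dtype=int)
    result = sm.Logit(y, X).fit(disp=False)
    return result  # result.params and result.bse hold estimates and standard errors

# Usage, assuming a table of games with the differential coded as
# home score minus visitor score:
# res = fit_halftime_logit(games["halftime_diff"], games["home_win"])
# print(res.summary())
```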

P.S. It's cool that Berger and Pope tried to do this analysis. I also appreciate that they attempted to combine sports data with a psychological experiment, in the spirit of the (justly) celebrated hot-hand paper. I like that they cited Hal Stern. And, even discounting their exaggerated inferences, it's perhaps interesting that teams up by 1 point at halftime don't do better. This is just what happens when studies get publicized before peer review. Or, to put it another way, the peer review is happening right now! I've put enough first-draft mistakes on my own blogs that I can't hold it against others when they do the same.

P.P.S. Update here.

Support of the Null Hypothesis

Timothy Teräväinen pointed to an interesting journal, the Journal of Articles in Support of the Null Hypothesis:

In the past other journals and reviewers have exhibited a bias against articles that did not reject the null hypothesis. We seek to change that by offering an outlet for experiments that do not reach the traditional significance levels (p < .05). Thus, reducing the file drawer problem, and reducing the bias in psychological literature. Without such a resource researchers could be wasting their time examining empirical questions that have already been examined. We collect these articles and provide them to the scientific community free of cost.

I've three comments.

Branding: Perhaps more people would understand what this is about if the journal was titled, say, "Status Quo" or "Nothing new under the Sun".

Topic or theme: Only statisticians would be instinctively attracted to a standalone topic like this. JASNH would work better as a subtopic (or a folksonomic "tag") of every academic discipline, or a section of any journal. At the same time, it's good to keep all such articles in one place.

Format: I am not sure it's worth writing a whole article about a negative result. Instead of articles, some sort of shorter write-up would be more efficient--people might not want to spend too much time elaborating on support of the status quo, but other researchers would benefit from knowing what is unlikely to work.

Some statistical analysis says yes:

The HAI [Hiring Activity Index] is essentially a measure of how actively our [Criteria Corp's] customers (made up mostly of SMBs of between 10 and 500 employees) are administering pre-employment tests through our system (and presumably, therefore, hiring) . . . the HAI is the percentage of our customers who are actively hiring (administering tests) in a given month. From January 2008 (when we began tracking the HAI) to October 2008 the HAI remained very steady, within a few points of 65%. (If this seems low, consider that even in the best of times many 30 or 40 person companies will not be hiring every month.)

But as the financial markets plummeted and the unemployment rate surged in November, the HAI sunk about ten points, and by January reached its lowest level since we started tracking it, 53.28%. . . . So I [Josh Millet] was very pleasantly surprised to see a fairly strong uptick in the HAI in February, to 61.41%. It is only one data point, to be sure, but it suggests that for SMBs the hiring picture improved somewhat in February. Could it be an upwards blip in a downward trend? Of course, but the eight point jump in the HAI is the biggest we've seen since we started tracking the index. For those, like me [Millet], inclined to think that the current recession, although brutal and severe, will not be as long-lasting as some suppose, the February HAI reading is cause for hope. . . . Small and medium-sized businesses did not lead us into this recession, but they may just lead us out of it--and don't look now, but it may have already started.

I couldn't resist taking the horrible table that was posted and making a simple graph:

criteria.png

I assume they've done some simple checks with the data and made sure that this isn't some computer glitch, for example a problem with the software causing a bunch of these things to be counted twice, or some change in the calculation or the population of users so that the denominator suddenly changed?

I won't even attempt to evaluate this--as I never tire of reminding people, my last econ class was in 11th grade--I'm just throwing this out there, first as an interesting example of a Freakonomics-style index and second as potentially important economic news. Again, I'll leave it to others to judge this.

It could be an interesting and important project (an econ M.A. thesis?) for someone to put together a whole bunch of this sort of measure to get some sort of aggregate that could be useful in monitoring aspects of the economy not captured by traditional statistics.

What is Russia's GDP per capita?

$7,600 (World Bank 2007)

$9,100 (World Bank 2007)

$14,700 (PPP adjusted, World Bank 2007)

$4,500 (World Bank 2006)

$7,600 or $14,400 (gross national income: "Atlas method" or "purchasing power parity," World Bank 2007)

$12,600 (IMF 2008), $9,100 (World Bank 2007), or $12,500 (CIA 2008)

$2,637 in 2000 US dollars (World Bank 2007); that's $3,200 in 2007 dollars

$2,621 (World Bank 2006) or $8,600 (IMF)

Sure, I realize these statistics cannot be calculated exactly, and, sure, I realize there are definitional issues within a country and choices to be made when converting to other currencies. Still, there's a lot of variation here!

At the very least, this is a good example for a statistics, economics, or political science class to illustrate the difficulties of measurement.

P.S. See here (scroll down to item 3) for why we've been looking this up.

Corrected age and voting graph

newfigure11.png

(A commenter pointed out a mistake in my earlier version.)

Basketball bracket tips

I got this bit of spam in the email but it's actually sort of cool, would be an excellent topic for discussion in an intro stat class or a Bayesian class:

MEDIA ALERT: NCAA COLLEGE BASKETBALL TOURNAMENT - MARCH MADNESS NCAA College Basketball Tournament Bracket-Picking Tips. RJ Bell of Pregame.com, the top Las Vegas based sports betting authority, provides a simple blueprint to improve anyone's bracket results.

In the context of a discussion of rich and poor voters in the U.S. and other countries, Matthew Yglesias posted this graph from our Red State, Blue State book:

fig7.4.png

The commenters raised several issues that I'd like to clarify here. (In particular, it looks like we miscoded some of the GDP per capita numbers, which doesn't affect our conclusions but is a bit embarrassing.)

1. The meaning of the graph

Availability bias in action

Phil went on vacation to Panama (among other places). I said, Panama? Who goes to Panama? Phil said, What do you mean, who goes to Panama? I said, people go to Costa Rica, they go to Guatemala, who goes to Panama?

Phil replied:

According to http://www.thinkpanama.com/panama-weekly/category/panama-tourism and http://www.travelime.com/news/533/ the number of tourists that visited Panama last year was almost exactly the same as the number that visited Guatemala, 1.6M in each case.

OK.

My California trip

Monday UC Irvine: Weakly Informative Priors

Tuesday Caltech: Red State, Blue State

Wednesday Google: Red State, Blue State

Wednesday Stanford: I'm not sure yet

Thursday Berkeley: Red State, Blue State

Friday Berkeley: Weakly Informative Priors

If you're at any of these places, feel free to come and ask your toughest questions!

It looks a little silly that it's the same two talks over and over, but of course the audiences will all be different. Maybe I'll vary them a bit, just to keep things interesting. Also I'm giving a few more lectures at Berkeley for some sort of training program at the education school, but I don't think these are open to general audiences.

If you want to see the slides, the current versions are here and here. (But I think I'll work a bit on the Red State, Blue State presentation.) And if you want to see slides for a bunch of other talks, go here.

Gas tax and rebate

Ian Ayres suggests a gas tax that would start off with a rebate:

The government would offer a $500 advance tax rebate each year for every car you choose to sign up for the tax. In return, you would commit to pay an extra $1 for each gallon of gas you buy.

For obvious reasons, I like this idea--I'd like to get that extra $500. And since the government is giving out stimulus money anyway, now's the time to try it!

But I'm puzzled by their suggested implementation:

The actual tax paid would be based on miles driven and fuel economy. Thus a Chevy Impala rated at 19 m.p.g. would be charged $5.26 each 100 miles, while a Prius rated at 46 m.p.g. would be charged $2.17 per 100 miles.

Wouldn't it be simpler to just charge $1 per gallon of gas (with people who didn't get the rebate getting some sort of sticker exempting them from the tax)? Why have a complicated system based on miles per gallon when you can simply tax the gas itself?
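The quoted per-100-miles figures are in fact just the $1-per-gallon tax restated, as a two-line check shows:

```python
def tax_per_100_miles(mpg, tax_per_gallon=1.0):
    """Tax owed for 100 miles of driving at the given fuel economy."""
    return (100 / mpg) * tax_per_gallon

print(round(tax_per_100_miles(19), 2))  # 5.26, the quoted Impala charge
print(round(tax_per_100_miles(46), 2))  # 2.17, the quoted Prius charge
```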

In any case, I get Ayres's main point which is that this rebate system is more of a way to make things psychologically palatable to people than to be a realistic policy suggestion.

Perhaps another way to go on this would be to follow the "you polluted, you clean it up" policy, by which the tax is more directly tied to the cost of keeping the roads going, securing the supply of oil, cleaning the air, retrofitting coal plants to pollute less, etc. Maybe people would be less unhappy paying a higher gas tax if it were clearly going to maintaining the transport system and cleaning up the pollution it creates?

Chris Wiggins points us to this announcement for a conference next year:

Simulation has greatly advanced climate science, but not sufficiently to the profit of theory and understanding. How can simulation better advance climate science and what mathematical issues does this raise? Our hypothesis is that the development of climate science (i.e., theory and understanding) will be best served by focusing computational and intellectual resources on model and data hierarchies. By bringing together physicists, mathematicians, statisticians, engineers and climate-scientists, and focusing on several themes that reach across scales and scientific methodologies, our program will provide a framework for advancing our use of hierarchical methods in our attempt to understand the climate system.

There will be an active program of research activities, seminars and workshops throughout the March 8 - June 11, 2010 period and core participants will be in residence at IPAM for fourteen weeks. The program will open with tutorials, and will be punctuated by four major workshops and a culminating workshop.

This all makes sense to me, although, given the topic, I'm surprised that no statisticians seem to be involved. Lots of potential for interesting models and graphs.

Bed, Bath, and . . . huh??

John and a whole bunch of commenters discuss this weird article by Matt Bai, who defends political journalism (as compared to political science) by saying:

My dinnertime conversation with three Iowans may not add up to a reliable portrait of the national consensus, but it's often more illuminating than the dissertations of academics whose idea of seeing America is a trip to the local Bed, Bath & Beyond.

Would it be ok if the local Bed, Bath & Beyond were in Iowa? Would Nebraska be ok or is that not so helpful, Nebraska not being an early caucus state? Or is the key difference that Bai's conversation is over dinner rather than in a shopping center?

Setting aside all other aspects of this discussion, my question is: What kind of political scientist studies America via a trip to the local Bed, Bath & Beyond??? Not any political scientist that I've ever heard of.

(Rant continues here.)

P.S. More here.

[Image via Wikipedia: The Sumerian god Ningizzida was the patron of ...]

Groopman and Hartzband wrote an opinion piece arguing against electronic medical records. The issue is essentially analogous to a debate about the dangers of paper as compared to traditional clay tablets. Still, I can appreciate criticism that will make electronic records better. I will quote some of the criticism and respond to it.


The impact of medication errors on malpractice costs is likely to be minimal, since the vast majority of lawsuits arise not from technical mistakes like incorrect prescriptions but from diagnostic errors, where the physician makes a misdiagnosis and the correct therapy is delayed or never delivered. There is no evidence that electronic medical records lower the chances of diagnostic error.

The electronic record can encourage a physician to consider all the relevant information. Once the relevant features are recorded, automated prediction tools such as those developed at Memorial Sloan-Kettering Cancer Center can support a doctor in making a diagnosis. Such tools can reach considerably higher accuracy because they're based on considerably larger datasets than before. Related information and checklists can be provided. This way the joint knowledge of the whole medical community will complement an individual physician's schooling and experience.


All of us are conditioned to respect the printed word, particularly when it appears repeatedly on a hospital computer screen, and once a misdiagnosis enters into the electronic record, it is rapidly and virally propagated.[...]

But the propagation of mistakes is not restricted to misdiagnoses. Once data are keyed in, they are rarely rechecked with respect to accuracy. For example, entering a patient's weight incorrectly will result in a drug dose that is too low or too high, and the computer has no way to respond to such human error.

Most of what I see on my computer display is printed words. A computerized system based on a probabilistic view of diagnosis will make it easier to understand that a diagnosis is not a binary choice but a probabilistic one. By design, such a system will reveal other possible diagnoses. Once a diagnosis is entered into the record, it will be possible to check it and re-check it. The design of the system should encourage multiple checks and individual responsibility on the part of those confirming or checking.


Doctors in particular are burdened with checking off scores of boxes on the computer screen to satisfy insurance requirements, so called "pay for performance." But again, there are no compelling data to demonstrate that such voluminous documentation translates into better outcomes for their sick patients.

A statistical model can determine which boxes are more or less important, saving time that would otherwise be spent checking off what does not matter. At the same time, a good user interface would allow doctors to enter a new box if they notice something salient.


Some have speculated that the patient data collected by the Obama administration in national electronic health records will be mined for research purposes to assess the cost effectiveness of different treatments.[...]
And Americans should decide whether they want to participate in such a national experiment only after learning about the nature of the analysis of their records and who will apply the results to their health care.

This is true, and one has to be careful here not to create perverse incentives: if entering incorrect or biased data (biases emerge from self-selection too) can mean lower costs and better care for a patient, or higher costs for the doctor, such data would dangerously pollute the models. At the same time, it is possible to detect such data fraud automatically.

In summary:


  • It is important to collect the data correctly.

  • Electronic medical records make it possible to deploy predictive models widely, improving health care. It is important to build user interfaces that make use of this.

  • There will be opportunities for centers that specialize in predictive models for specific symptoms or diseases, combining the background knowledge aggregated in the medical profession over many years with modern data collection and analysis.

[Included some information from Bob Carpenter's comment]

This news article about a controversial judge was interesting on many levels, but my favorite bit was this:

"Sharon is a hard worker," said Dan Hagood, a defense lawyer and longtime friend from Dallas who served as her campaign treasurer when she ran for election to the court in 1994. "She never complains, never explains."

"Never complain, never explain" is a reasonable motto for some, I'm sure. But for a judge??? I'd think explaining is a key part of the job.

The data appear to say yes, but more work needs to be done to make sense of it all.

Understanding well-being

From America's Health Insurance Plans:


The Gallup-Healthways Well-Being Index, a unique twenty-five year partnership in research and care, is an on-going daily survey that began in January 2008. It surveys 1,000 Americans 350 days per year.

The research and methodology underlying the Well-Being Index is based on the World Health Organization definition of health as "not only the absence of infirmity and disease, but also a state of physical, mental, and social well-being."

While I can't really say what "1000 people 350 days per year" really means, here's a nice map of the aggregate measure of well-being (if you click on it, you will get a slightly larger version):

well-being.png

It's an interesting dataset and it would be interesting to see some analysis about the factors associated with well-being. If you do it using the tables that are available from the site, post a comment, and I'll add it to the entry later on.

As for the visualization--I would have preferred a continuous color scale, rather than having it collapsed into just 5 levels. Also, the boundaries between districts only have to be drawn when the color for both districts is the same (quite rarely, if you follow the advice from the previous sentence) and when there is no other border closer than n pixels (because the boundaries are less important than the colors indicating the variable of interest).

Seth Roberts has had success with self-experimentation--among other things, he's written a successful diet book on how to lose weight by eating unflavored oil or sugar water--and on his blog he reports his latest self-experiments and their effects on him.  For example, recently he wrote about the beneficial effects of fermented food.

When Seth tries a new food, or a new lifestyle change, and finds positive effects, I'm always skeptical:  maybe he's hoping for such effects and then finding them.  But often they work for others.  For example, his correspondent Tucker Max writes:

I have been reading your posts about bacteria in food, so I decided to try it on my own. I HATE Roquefort and other stinky cheeses, and I am not about to eat fermented meat, so the best thing I could find in Whole Foods was Kombucha tea. It is basically normal tea, with bacteria cultures growing in it. Sounds weird I know, but it actually tastes pretty good. . . . [I'm giving all the details to give a sense of how weird this all sounds to an outsider. --AG]

Anyway, after a week of drinking two bottles a day, I have noticed these changes:

1.  My stool is...well, better. In every way. More regular, more solid . . . [ok, enough detail here]

2.  I have more energy. Aside from subjectively feeling it, I can see the difference in my workout logs, just in this past week I've gone up more weight on exercises than I normally do.

3.  I am feeling overall better. This could very well be placebo effect/confirmation bias as it is a very subjective measurement, but I just feel better. . . .


Sure, but maybe this could all be a confirmation bias.  The toilet stuff sounds objective, but who knows what else is happening when he's doing this?  And then of course there's selection bias, that Seth is hearing about the successes.

Just to be clear:  I'm not trying to criticize what Seth is doing, and I'm not trying to shoot it down.  I'm trying to strengthen it by suggesting ways of thinking about it.  As Seth says, criticism is easy, helping people is hard.

So here's my thought.  Maybe Seth could try a real placebo, as follows:  he could make up some goofy food or behavior change (something like . . . eating fermented food!  Or, I dunno, sleeping with the bed inclined at a 10 degree angle.  Or, I dunno--Seth would be better than I at coming up with something.)  (Of course, it should be something he tries himself first and finds no adverse effects from.)  He could then make up some fairly vague story about how it helped him, then post it on his blog and see what happens.  Would people respond with stories about how helpful it was?

The great Linus Pauling conspiracy

I'm reminded of the idea I heard once that Linus Pauling knew all along that megadoses of Vitamin C have no effect, and that he altruistically sacrificed his reputation as a scientist to trumpet Vitamin C's virtues, on the theory that it would reduce the suffering of millions via the placebo effect.

In response to my entry on whether propensity score analysis could fix the Harvard Nurses study, Joseph Delaney wrote:

I am unsure about how propensity scores give any advantage over a thoughtfully constructed regression model. . . . I'm not saying that better statistical models shouldn't be used but I worry about overstating the benefits of propensity score analysis. It's an extremely good technique, no question about it, and I've published on one of its variations. But I want to be very sure that we don't miss issues of study design and bias in the process.

I agree completely. But I'd focus on that "thoughtfully constructed" part of the regression model. As we've discussed, even some of the most thoughtful researchers don't talk much at all about construction of the model when they write regression textbooks.

So I think it might be too much to expect working statisticians--those who might be employed by a long-running public health study, for example--to be using "a thoughtfully constructed regression model." Maybe all we can hope is that they use standard methods and document them well.

From this perspective, propensity scores have the advantage that in their standard implementation they allow a researcher to include dozens of background variables, which is not generally done in classical regression. As I noted in my original entry, there are other methods out there that also can handle large numbers of inputs; it doesn't necessarily have to be propensity scores.
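As a concrete, purely illustrative sketch of that standard implementation (my own toy version, not anything from the Nurses study), here is a propensity-score model fit to many background covariates and then used for inverse-probability weighting. All the variable names are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_effect(X, treatment, outcome):
    """Estimate a treatment effect by inverse-probability weighting on the propensity score."""
    treatment = np.asarray(treatment)
    outcome = np.asarray(outcome)
    ps_model = LogisticRegression(max_iter=1000).fit(X, treatment)
    e = np.clip(ps_model.predict_proba(X)[:, 1], 0.01, 0.99)  # estimated propensity scores
    weights = treatment / e + (1 - treatment) / (1 - e)       # inverse-probability weights
    treated = treatment == 1
    y_treated = np.average(outcome[treated], weights=weights[treated])
    y_control = np.average(outcome[~treated], weights=weights[~treated])
    return y_treated - y_control

# Usage: effect = ipw_effect(covariate_matrix, treat_indicator, outcome_vector)
```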

The real issue is whether a method can allow a competent user to include the relevant information. This was the point of the famous Dehejia and Wahba paper on adjustment for observational studies.

Delaney also writes:

Issues of self-selection seriously limit all observational epidemiology. The issue is serious enough that I often wonder if we should not use observational studies to estimate medication benefits (at all). It's just too misleading.

Sure, but we do have to make decisions in life, and what do you do in those settings where no randomized trial exists, or where you don't trust a generalization of the results to the general population? Almost always we need some assumption or another.

Macartan was telling me about this article by James Fearon, Jeremy Weinstein, and himself, which begins as follows:

Civil war is very common in the developing world, with harmful welfare effects when it occurs. Many fear that the devastation wrought by violent conflict destroys social capital, impedes economic development and leads to the recurrence of violence. In response, donors are injecting large amounts of aid into post-conflict countries. A significant share of this assistance is spent on "community-driven reconstruction" programs, which support the establishment of new local institutions in order to promote social reconciliation. Whether this assistance has this effect is, however, largely unknown. Can brief, foreign-funded efforts to build local institutions in fact have positive effects on local patterns of cooperation? We address this question using a randomized field experiment . . .

The answer is yes:

The outcome we examine is the amount of funding a community raises for a collective project through anonymous play in a public goods game. Our findings suggest that the community-driven reconstruction program improved community cohesion . . . Although levels of social cooperation were high across all villages in our sample, 71% of households contributed the maximum amount in treatment communities, while 62% contributed the maximum in control communities. For total payouts, which averaged about $333, treatment communities received 6.5% more on average for the community-selected public good.

And:

This effect is equivalent in magnitude to our estimate of the impact on individual contributions of quadrupling the social rate of return on a private investment.

Important stuff. No graphs, but ya gotta start somewhere...

Book reviews in academic journals

I thought that economists might be interested in my thoughts on the new book by Angrist and Pischke and, more generally, on the different perspectives that statisticians and economists have on causal inference. So I wrote them up as a short document and asked an econometrician friend where to send it. He said that the Journal of Economic Literature does book reviews so I sent it there. They returned it to me with kind words on my review but the note: "The JEL has avoided reviewing textbooks, focusing instead on research monographs. The review makes fine points about the coverage in this textbook, but neither the book nor the review are attempting to advance the state of the art."

Fair enough. So where to send the review? I asked some colleagues and they all agreed that JEL is the only economics journal that reviews books. So I guess econ textbooks just don't get reviewed!

This surprised me, given that book reviews appear in several top statistical journals, including the Journal of the American Statistical Association, the American Statistician, Biometrics, the Journal of the Royal Statistical Society, Statistics in Medicine, and Technometrics. There are also lots of places that review books in political science.

I'm surprised that there's only one place for book reviews for economists.

See here for my thoughts on the surprising stability of the economics curriculum.

A favorite example

Tim Wilson writes:

For a book I'm writing, I'm looking for good examples in which regression suggested that A caused B, whereas experimental studies showed that there was no causal relationship. Even better (at least for the sake of my example) would be if social policy changes were made based on the regression. Do you have a favorite example or two?

My reply:

Here's everybody's favorite example.

David Hillis is a biologist who has written on evolutionary trees. In response to my blog on Laura Novick's research on the perception of cladograms, Hillis writes:

It turns out that the best tree figures for students are neither of the two options she looked at, but rather the kind of trees that we use in Life: The Science of Biology. A more comprehensive study by a Univ. of Missouri education researcher, which included each of these options, clearly showed that the best comprehension by students was achieved with figures like the one in the attached file. People rarely draw trees this way for publications, however, because they are harder to draw than the ones with straight lines.

And here's an example:

hillis.png

I wonder if Laura has done research on this particular type of display.

Life in the long tail

Someone sent me an email asking if I would consider any form of advertising or sponsorship for the blog. I replied, "I wasn't planning to have any advertising or sponsorship on the blog, but I guess it's possible (if unlikely)." And he offered $1000 to sponsor us over two years (for a link in the "Research supported by..." section, where we currently list NSF, NIH, and Yahoo Research).

For $1000 it certainly wasn't worth the hassle. At this level, at least, blogs aren't big business quite yet.

Aaron Strauss spoke today on his work with Kosuke Imai on estimating the optimal order of priority and the optimal approach for contacting voters in a political campaign. They use inferences from field experiments on voter turnout and persuasion and then transfer these findings into a decision-analytic framework. I can't find a link to the latest version of their article, but a not-too-old version is here.

The talk was fascinating, and a bunch of points came up in the discussion and afterward that I wanted to set down here.

Correlation is not causation

Today's xkcd: (thanks Viktor!)
Correlation

"Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'."

I had lunch with Fred Lerdahl, a guy in the music department who does research in expectations--what motifs might be expected next in a musical piece--and I was reminded of the Bugs Bunny episode where Yosemite Sam rigs up the piano to explode when a certain note is played, then puts up the sheet music for Bugs, who annoyingly keeps playing the tune but getting the last note wrong. Yosemite gets increasingly frustrated until he finally bangs out the tune himself--causing the piano to blow up, of course.

bugs.png

Anyway, my lunch companion hadn't heard of the episode so I found it on Youtube and sent it to him. His reply:

Thanks, it's terrific! One thing, though: Bugs is supposed to hit C for the TNT to explode; on the soundtrack he hits C# and then Eb instead; but in the video he hits C both times (as does Sam, but in his case the soundtrack hits C, too). The cartoonists should have shown Bugs hitting the different notes (unless one wants to get metaphysical about it).

P.S. Fred adds that he just showed the cartoon to his wife, and she noticed that the dynamite is attached not to C but to B (that is, to one key to the left of the exploding note).

Real suspense and fake suspense

The other day I was reading a story in the New Yorker that had what I consider the now-standard pattern of starting the reader with no information about the key characters, so that it takes a while to figure out who the narrator is and how he relates to the scene. (After a few pages I got the sense that he was a well-off doctor in his fifties or sixties on a vacation with his wife and some friends.)

Anyway, here's my beef. I've always found this sort of style annoying, in comparison to the more traditional opening ("Once upon a time there was a well-off doctor in his sixties named James. One day he went on a vacation with his wife and some friends . . ."). At the same time, I've been conditioned to think that the "New Yorker"-style opening is better, more true to life--after all, in real life, people aren't generally introduced to you with a "Once upon a time"!

But then I was thinking that maybe this New Yorker style isn't so natural. These stories are generally told from one character's perspective--and, from that perspective, you would actually know someone's name, age, etc. It's not so natural at all to have to spend the first part of a story figuring out who's talking to whom.

My new take on this is that this style is a cheat, a way of creating a feeling of mystery and suspense without doing the work to create actual mystery and suspense. Actual mystery is when there's a situation you should be able to understand but you don't: there are some missing pieces that you're trying to figure out. Actual suspense is when you want to know what happens next. Fake mystery and suspense is when you're just confused and don't know what's happening.

For example, the movie North by Northwest is actually mysterious and suspenseful. But not because it's a cheat and everyone's in a fog and you don't know who's who; it's because you're in the position of a character who knows who he is, but he doesn't know what's going on around him. That's a little different, in my opinion. Similarly with, say, John Le Carre: there's lots of things that, as a reader, you don't understand, but you're clear right away on who's saying what.

Or, for that matter, Mister New Yorker, John Updike, who begins a story with, "The Maples had moved just the day before to West Thirteenth Street, and that evening they had Rebecca Cune over, because now they were so close." Lots of hidden meaning there, but none of this artificial confusion where you're basically thrown into someone's brain at a random moment and not given any background. Following John Updike (or, for that matter, John O'Hara), I think the real challenge is giving the right amount of background--not too much, and not too little. Zero is not usually a serious option, in my opinion.

But, if you're writing a story that really has no mystery and no suspense, then starting by giving the reader no information can be a good way to give the illusion of depth.

P.S. Just to be clear, I'm not complaining about the "start in the middle" approach where the story begins and then you use flashbacks or other revelations to give a sense of how things all got started. That makes a lot of sense to me. What I'm bothered by is the particular trick of not identifying anything explicitly about the main characters so that the first part of the story involves the reader having to figure out the basics.

P.P.S. Sorry for ranting again. Yes, I know, I know, nobody's forcing me to read this story. But these questions of style interest me.

P.P.P.S. These issues also arise when writing statistics books.

No. See here.

I posted the pretty maps at 538. (I'd post them here, but Jeff was complaining that I was crossposting too much.) But one thing that people do like here is R code, so here's some:

> M1 <- glmer (rvote ~ z.inc2*z.state.income.full + (1|inc2) + (1 + z.inc2 | stnum),
    family=binomial(link="logit"))
> display (M1)
glmer(formula = rvote ~ z.inc2 * z.state.income.full + (1 | inc2) +
    (1 + z.inc2 | stnum), family = binomial(link = "logit"))
                           coef.est coef.se
(Intercept)                -0.06     0.05
z.inc2                      0.52     0.07
z.state.income.full        -0.49     0.07
z.inc2:z.state.income.full -0.27     0.11

Error terms:
Groups Name Std.Dev. Corr
stnum (Intercept) 0.20
z.inc2 0.30 0.62
inc2 (Intercept) 0.06
Residual 1.00
---
number of obs: 20510, groups: stnum, 49; inc2, 5
AIC = 27668.9, DIC = 27652.9
deviance = 27652.9

I used the estimates from this model to extract McCain's estimated two-party vote share for each income category within each state. I then took a weighted average for each state (weighting by the number of respondents in each income category) and did one final adjustment by shifting the estimates for each category so that the average came out to the same as the actual vote outcome within each state.

I won't show you this R code because it's too damn ugly; I'm embarrassed.
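
But for the record, here's a rough sketch of the kind of thing it does. This is illustrative only, not the actual code; the data frame "cells" (one row per state-by-income category, with the number of respondents n) and the named vector "actual" of state vote outcomes are stand-ins I'm making up here:

# Illustrative sketch, not the actual code.  Assumes a data frame 'cells'
# with one row per state x income category (columns stnum, inc2, z.inc2,
# z.state.income.full, n) and a vector 'actual' of each state's actual
# two-party outcome, named by state.
library(lme4)   # already loaded above for glmer()

# model-based estimate for each state x income cell
cells$p.hat <- predict(M1, newdata = cells, type = "response",
                       allow.new.levels = TRUE)

# weighted average within each state, weighting by respondents per category
state.avg <- tapply(cells$p.hat * cells$n, cells$stnum, sum) /
             tapply(cells$n, cells$stnum, sum)

# shift every category in a state by a constant so the weighted average
# matches the actual state outcome
shift <- actual[names(state.avg)] - state.avg
cells$p.adj <- cells$p.hat + shift[as.character(cells$stnum)]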

When writing my book with Jennifer, I learned to be super-careful in my use of causal language. For example, when describing a regression coefficient, instead of saying "the effect of x on y," I trained myself to say, "the average difference in y, comparing people who differed by one unit in x." Or, in a multiple regression, "the average difference in y, comparing people who differed by one unit in x while being identical in all other predictors."

At first it's a struggle to speak this way, but eventually, I have found, this constraint has improved my thinking.

Application to the studies that purport to show that "real-life voters must also have based their choice of candidate on looks"

Yesterday I discussed an article that claimed (misleadingly, in my opinion) that people decide how to vote based on candidates' physical appearance.

Let's try to describe the study using Jennifer's non-causal approach. OK, here goes:

Winning politicians are judged to be more attractive, on average, than losing politicians.

Or, if there is some controlling for background variables:

Comparing two political candidates, one who won and one who didn't, but who are the same age, sex, and ..., the winner was, on average, judged to be more attractive than the loser.

At first glance, this might not seem to give us anything beyond the usual summary. But I find its precision helpful. Once the results are expressed as a difference, it's clear that there's no direct relevance to the question of how people vote; rather, it's a statement about a way in which successful and unsuccessful politicians differ. Which, among other things, perhaps makes it clearer that there are a lot of ways this could happen.

More generally

The "comparisons" way of describing regressions has helped me in other ways. Iain Pardoe and I wrote an article on average predictive comparisons, in which we focused on the question of what does it mean to compare two people who differ on one input variable while being identical on all the others. Among other things, this helped clarify for me the distinction between inputs and predictors in a regression model. (For example, in a model with age, sex, and age*sex, there are four predictors--the three items just listed, along with the constant term--but only two inputs: age and sex. It's a challenge to try to compare two people who differ in age but are identical in sex and age*sex--but people do this sort of thing all the time when they look at regression coefficients.)

The interpretation of coefficients as comparisons also helped clarify my thinking regarding the scaling of regression inputs. Now my default is to rescale continuous inputs to have standard deviation 1/2, which makes a comparison of one unit comparable to the difference between 0 and 1 for a binary variable. (Actually, I have to admit that I'm starting to wish, for comparability with standard deviations in other examples, that I'd set the default of rescaling to have a standard deviation of 1, and rescaled binary inputs to be +/-1. I don't know if I have it in me to shift everything in this way, though.)
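
In case it's helpful, here's what that default rescaling looks like (made-up data again):

# Rescale a continuous input to have standard deviation 1/2, so that a
# one-unit comparison is roughly comparable to a 0-to-1 change in a binary input
age <- rnorm(500, mean = 45, sd = 12)     # made-up input
z.age <- (age - mean(age)) / (2*sd(age))
sd(z.age)   # 0.5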

My recommendation

When describing comparisons and regressions, try to avoid "effect" and other causal terms (except in clearly causal scenarios) and instead write or speak in descriptive terms. It might seem awkward at first, but give it a try for a week. In the amorphous world of applied statistics, it can be oddly satisfying to speak precisely.

And is there anything out there that can serve as a reasonable substitute?

I feel I have to respond to this item that people keep pointing me to:

John Antonakis and Olaf Dalgas presented photos of pairs of competing candidates in the 2002 French parliamentary elections to hundreds of Swiss undergrads, who had no idea who the politicians were. The students were asked to indicate which candidate in each pair was the most competent, and for about 70 per cent of the pairs, the candidate rated as looking most competent was the candidate who had actually won the election. The startling implication is that the real-life voters must also have based their choice of candidate on looks, at least in part. [emphasis added]

Nooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

This came up a couple of years ago, when, in response to a similar study, I wrote:

It's a funny result: at first it seems impressive--70% accuracy!--but then again it's not so impressive given that you can predict something on the order of 90% of races just based on incumbency and the partisan preferences of the voters in the states and districts [at least in the U.S.; I don't know about France]. If 90% of the races are essentially decided a year ahead of time, what does it mean to say that voters are choosing 70% correctly based on the candidates' looks?

I can't be sure what's happening here, but one possibility is that the more serious candidates (the ones we know are going to win anyway) are more attractive. Maybe you have some goofy-looking people who decide to run in districts where they don't have a chance, whereas the politicians who really have a shot at being in congress take the time to get their hair cut, etc.

Anyway, the point of this note is just that some skepticism is in order. It's fun to find some scientific finding that seems to show the shallowness of voters, but watch out! I guess it pleases the cognitive scientists to think that something as important and seemingly complicated as voting is just some simple first-impression process. Just as, at the next level, it pleases biologists to think that something as important and seemingly complicated as psychology is just some simple selfish-gene thing.

And see here for a discussion of some research by Atkinson, Enos, and Hill on this topic.

Just one more thing

From the news article:

"These findings suggest that voters are not appropriately weighting performance-based information on political candidates when undertaking one of democracy's most important civic duties," the researchers said.

No, no, no. Unless you want to take a very weak interpretation of "suggest." Or, to put it another way, sure, I have no doubt that "voters are not appropriately weighting performance-based information on political candidates"--but I don't see the personal-appearance study as even close to definitive on this point.

I'm as cynical as the next guy, but this sort of thing is going a step too far, even for me.

Religion, income, and voting

More pretty graphs from the 2008 Pew surveys, interspersed with discussion and concluding with the line, "One thing sophistication can give you is an appreciation for the simple things in life."

Below is one of the graphs; click on the link above for more.

pewreligrelatt.png

I received the following email:

As a psychologist teaching and using Bayesian statistics, I've been pleased to see some of my colleagues endorsing Bayesian data analysis. But I've been very chagrined to see them champion Bayes factors for null-hypothesis testing, instead of parameter estimation. My question is simple: Are there any articles that head-on challenge the Bayes-factor approach to null-hypothesis testing, and instead favor parameter estimation?

Perhaps the most straight-forward example against Bayes factors for null hypotheses was given by Stone (1997), Statistics and Computing, 7, 263-264. He showed a simple case in which the BF prefers the null but the estimated posterior excludes the null value. I realize that the two approaches are asking different questions --- I've just never really been convinced that the answer provided by a null/alternative comparison really tells us anything we want to know, because no matter what it says, I always want to do the estimation anyway.

My reply: You won't be surprised that I agree with the above perspective. Here's my article with Rubin (from Sociological Methodology 1995) where we bang on Bayes factors for 8 straight pages. I really like this article.

Temporary grave of an American machine-gunner ... (image via Wikipedia)

We often hear that life is precious. But how precious? Can we really try to ascribe cold monetary values to a warm life? When you don't consciously estimate, you run the risk of underestimating it. Every living moment we take chances: it's unsafe to eat, it's unsafe to work, it's unsafe to drive. And whenever we trade risk of death off for time or money, we reveal the value we ascribe to our own life. With this, we can also contemplate the cost of economic disasters and measure it in human lives.

Here is a chart by Johannes Ruf (using a bibliography prepared by Bernhard Ganglmair), listing a number of papers that estimate the value of life from such trade-offs:
value-of-life.png

So, allowing some tolerance for inflation and deflation, and taking the average of all of the above, we arrive at about 4 million dollars. If we assume that the average life expectancy is 78 years and that half of the day is waking time, the value comes to about $12 for a waking hour of life--arbitrary, but the right order of magnitude. Additional complexity could be added to this model to improve it.
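
Here's the back-of-the-envelope arithmetic:

# rough check of the $12-per-waking-hour figure
value.of.life <- 4e6              # dollars, rough average from the chart above
waking.hours  <- 78 * 365 * 12    # 78 years, about 12 waking hours a day
value.of.life / waking.hours      # about $12 per waking hour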

I will now address what we can do with these numbers.

Bilmes and Stiglitz estimate the cost of the Iraq war at 3 trillion dollars, while US military casualties currently number about 4250. So, at roughly 4 million dollars per life, the cost of the lives lost is about 17 billion dollars, but the economic cost borne by the United States is 3 trillion. What is the true cost of the Iraq war in human lives? Three trillion divided by 4 million comes to about 750,000 lives. This is the true casualty count, which accounts for people having to work on stuff that explodes instead of spending time with their families. On the Iraqi side, about 100,000 civilian lives have been lost, but it's hard to estimate the full cost of the war to Iraq--the Lancet study claims numbers considerably larger than this.
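
And the same arithmetic for the war figures (all values rough):

# rough check of the Iraq-war numbers
value.of.life <- 4e6    # dollars per life, as above
war.cost      <- 3e12   # Bilmes and Stiglitz estimate, in dollars
us.deaths     <- 4250   # US military casualties
us.deaths * value.of.life   # direct cost of lives lost: about 17 billion dollars
war.cost / value.of.life    # economic cost expressed in lives: about 750,000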

Andrew Gelman has also written about this--when is the health risk from radon high enough to justify the cost of measurement or remediation? I'd again like to acknowledge Johannes and Bernhard's help with the research, but all flaws are solely my own. The difficulties of estimation shouldn't stop us from studying the problem--maybe better awareness of this will help save a million lives in the future.

State of Rationality

I received the following email:

I am working on a paper for a political science course in which I must discuss a "what if": Pennsylvania transformed into a state of rationality. Everything is the same except that all the citizens, all the candidates for state office, all the state legislators, and all the lobbyists in the state behave rationally in an economic sense. Of these groups, which one is most likely to be the most politically powerful?

I am not sure how to exactly get started and thought I would see if you might have any suggestions or thoughts on the subject.

Sounds like a good assignment to me. I only teach statistics and methods courses, so I never think about this sort of interesting political-science homework problem.

In honor of Darwin's 200th birthday, some research by psychologist Laura Novick on the presentation of evolutionary trees ("cladograms"):

cladograms.gif

Her research shows that students are much better at understanding the diagram on the left than the one on the right. She calls the one on the left a "tree" and the one on the right a "ladder," which confuses me a bit: the one on the right looks more like tree branches to me.

6 percent . . . not bad!

Tyler Cowen links to this report that "economists comprised only 6 percent of guest appearances discussing stimulus on cable news, Sunday shows."

That sounds pretty good to me! I can't imagine that political scientists make up anything close to 6% of the TV commenters on political topics. I doubt it's 0.6%. My recommendation to economists: quit complaining and treasure the 6% you have!

Related: Why are there so few economists in elected office?:

There were 139,000 economists employed in the United States, which represented 0.1% of the employed population. 0.1% of 535 is about 1/2, so with at least two economists in Congress, the profession is hardly unrepresented. . . . even throwing in economics professors and various other practicing economists, I still don't think it would add up to the half-million that would be necessary to reach 2/535 of the employed population. . . . perhaps Congress would indeed be better if it included more economists--but rather to note that people with this sort of job are a small minority in the U.S. (In contrast, there were 720,000 physicians, 170,000 dentists, 2.1 million nurses, and 1.7 million health technicians in the U.S.)

To put it another way, without reference to economists (or to the 2.1 million "mathematical and computer scientists" out there): the Statistical Abstract lists 260,000 psychologists. Certainly Congress would be better off with a few psychologists, who might understand how citizens might be expected to react to various policies. . . . and what about the 114,000 biologists? A few of these in Congress might advance the understanding of public health. And then there are the 290,000 civil engineers--I'd like to have a few of them around also. I'd also like some of the 280,000 child care workers and 620,000 pre-K and kindergarten teachers to give their insight on deliberations on family policy. And the 1.1 million police officers and 340,000 prison guards will have their own perspectives on justice issues. . . .

Of lotteries and the practice of science

Seth points to this wonderful suggestion by Tim Harford on "how to enjoy the thrill of the lottery without the fool's bet":

Choose your numbers, but don't buy a ticket. You'll win almost every week - the fear that your number might actually come up is an adrenaline rush to beat them all.

I love this. But now I want to return to Seth, who draws a connection to what scientists do. I don't quite agree with what Seth writes--I think he gets his argument tangled up--but it's interesting, so let me repeat it and then follow up with my own comments. Seth writes:

It is the average [lottery] consumer who is gullible and makes the whole thing work . . . Scientists are no less gullible. Self-experimentation, like Harford's advice, takes advantage of that gullibility. Because scientists essentially play the lottery in their research -- devote considerable resources (their careers) to looking for discoveries in one specific way (scientists are hemmed in by many rules, which also slow them down) -- this leaves a great deal to be discovered by research that doesn't cost a lot and can be done quickly. All of my interesting self-experimental discoveries have involved treatments that conventional scientists couldn't study because their research has to be expensive. Could a conventional scientist study the effect of seeing faces in the morning? No, because you couldn't get funding. And all research must require funding. (Research without funding is low status.) In practice, this means you can't take risks and you can't do very much. Like the lottery, this is a poor bet.

Let me untangle this. Seth is saying that the typical scientist is like a lottery player whereas, by doing self-experimentation, Seth is more like Tim Harford's reverse lottery player, going for the near-sure thing rather than investing time in the hope of a hypothetical breakthrough.

It's funny that Seth says this, because I've always told him the opposite: conventional scientists such as myself are the plodders, squeezing out little research results each year, publishing in journals and getting grants, whereas Seth has always seemed to me to be the gambler, stepping away from the near-sure thing of the scientific treadmill and risking something like 10 years of his life on self-experimentation--it was about 10 years after he began that he started to get useful results. I've always admired Seth for his gamble.

Right now I can see that Seth views self-experimentation as a grind-it-out way to make discovery after discovery, but 20 years ago, not so much. Conversely, I don't think of conventional scientists as staking their careers on the chance of making a single big discovery. Rather, we take no risks at all! To paraphrase Paul Erdos, a scientist is a machine for turning hard work into little bits of publishable research.

P.S. I don't buy Seth's claim that "research without funding is low status." My impression is that people seek funding because they feel their research is important and they want help getting it done faster. I don't see that status has anything to do with it.

Predictions that are too good to be true?

Chris Masse pointed me to this blog by Panos Ipeirotis, who argues that some online prediction markets give probabilities that are too good to be true.

In Red State, Blue State, we talked about how, in recent years, the Democrats have been winning the rich states, even while richer voters lean Republican.

What happened in 2008? Exit polls were made available immediately--as of election night. The next step is to go to individual-level data, which we recently obtained from the Pew Research Center's pre-election polls.

Here's the income and voting pattern at the national level:

pewincome2.png

Republicans did better among upper-income voters--except possibly for the over-200,000's. (The highest income category from the Pew surveys is "$150,000+", so we can't do a direct comparison at the top.)

Red and blue states

Now let's look at red, blue, and purple states (which we define, following our book, as those states where George W. Bush won by more than 10 points in both his campaigns, those where he lost by more than 10 points both times, and the states in between):

pewrbpinc2.png

As in previous elections, income predicts Republican vote more strongly in red than in blue states. (For this and the following graphs, I'm switching the x-axis from numerical incomes to income categories.)

Or, to put it another way, the red-state/blue-state divide is happening among the rich (actually, the upper middle class, since surveys don't tell us much about the truly rich) more than the poor.

Andy Sutter writes:

It's been a while (~2 years?) since I was last reading your blog semi-regularly and submitted a comment or two, but I was reading something today that made me recall those days.

At the time, I was curious about why social scientists present data as charts of regression coefficients, since I'd never seen such a presentation in the physical sciences.

Hidden assumptions about the economy

On the front page of Sunday's New York Times, the primest of prime real estate, Hiroko Tabuchi writes:

As recession-wary Americans adapt to a new frugality, Japan offers a peek at how thrift can take lasting hold of a consumer society, to disastrous effect. . . . Today, years after the recovery, even well-off Japanese households use old bath water to do laundry, a popular way to save on utility bills. Sales of whiskey, the favorite drink among moneyed Tokyoites in the booming '80s, have fallen to a fifth of their peak. And the nation is losing interest in cars; sales have fallen by half since 1990.

How is this "disastrous"? Using bath water to do laundry makes sense to me. Unfortunately our apartment is not set up to do this, but why not? Cars are much better made than they used to be, probably most people in Japan who want a car badly enough have one already, so it makes sense that car sales would fall--people can continue driving their dependable old cars. Finally, I have nothing against whiskey, but is it really "disastrous" that sales have fallen to a fifth of their peak? Fads and all that.

Sure, I can see that this is all evidence that Japan's economy is far from booming, but I'm a bit disturbed to see frugality treated as a "disaster" in itself.

What really bothers me, though, is that the assumptions in the article are completely unstated. I'd be happier if the reporter had written something like this:

You might think that it's a good thing that the Japanese have become more energy-efficient and less into trendy conspicuous consumption: even well-off Japanese households use old bath water to do laundry, a popular way to save on utility bills, and sales of whiskey, the favorite drink among moneyed Tokyoites in the booming '80s, have fallen to a fifth of their peak.

Even the notorious Japanese tendency to buy new cars and appliances every two years, whether they need it or not, has abated. The nation is losing interest in cars--sales have fallen by half since 1990--and people are sticking with old-fashioned television sets rather than snapping up expensive flat-screen TVs.

But this frugal behavior is having a disastrous effect [or, is symptomatic of an underlying economic disaster]. . . .

This puts the assumptions front and center, at which point they could quote experts on both sides of the issue or whatever.

P.S. Just to be clear: my point here is not that a newspaper reporter wrote something I might disagree with, but rather that sometimes people seem trapped within their unstated assumptions. (Yes, I'm sure that happens to me too.)

Bill Gates on the state of education

"A New Bill Gates" (image by jurvetson via Flickr)

Bill Gates' talk at TED covers two topics: medical research for the developing world (first 8 minutes), and education for the USA (the last 10 minutes). He has an interesting slide about the impact of different factors on a teacher's performance, which was obtained through statistical analysis of explanatory factors for the improvement in students' scores:

education-performance.png

Thus, a master's degree actually hurts performance, and seniority is irrelevant as a factor. But a master's degree and seniority are the only two factors that will increase a teacher's pay.

Now, Gates is pushing a lot for gathering and analyzing data. So there might be opportunities for those interested in doing research in education to get grants from the Gates foundation.

Helen DeWitt, commenting on friends/colleagues/acquaintances who ask her for reference letters, writes of "a mythical entity: a reference that can just be dashed off in half an hour and popped in the post / fired off in an e-mail. There is no such thing."

Jenny Davidson follows up with:

I [Jenny] do not know why someone thinks that it is possible to write a good letter of recommendation without a HUGE amount of supplementary paperwork . . .

What's my experience? I get asked for a fair number of letters of recommendation or evaluation, and I take about 15 minutes to write such a letter and email it off (to someone who prints it on letterhead paper and mails it). From the remarks above, I suspect that it's considered a norm to spend more time than that, but I think it's a bit of an arms race: your letter has to be long so it can compete with other people's letters. So by writing short letters, I'm doing my part to make the process more sane. There's a well-known statistician who always writes letters for his students saying essentially that they're the second coming of Cauchy; he's recognized for doing this. As long as people have the expectation that my letters will be short, everything should work out fine.

Yair showed me this. It's simply amazing. Click on the link RIGHT AWAY and be awed. Great examples and all the code you'll ever need.

This one's for Zacky

I'm working on a project involving the evaluation of social service innovations, and the other day one of my colleagues remarked that in many cases, we really know what works, the issue is getting it done. This reminded me of a fascinating article by Atul Gawande on the use of checklists for medical treatments, which in turn made me think about two different paradigms for improving a system, whether it be health, education, services, or whatever.

The first paradigm--the one we're taught in statistics classes--is of progress via "interventions" or "treatments." The story is that people come up with ideas (perhaps from fundamental science, as we non-biologists imagine is happening in medical research, or maybe from exploratory analysis of existing data, or maybe just from somebody's brilliant insight), and then these get studied (possibly through randomized clinical trials, but that's not really my point here; my real focus is on the concept of the discrete "intervention"), and then some ideas are revealed to be successful and some are not (with allowances taken for multiple testing or hierarchical structure in the studies), and the successful ideas get dispersed and used widely. There's then a secondary phase in which interventions can get tested and modified in the wild.

The second paradigm, alluded to by my colleague above, is that of the checklist. Here the story is that everyone knows what works, but for logistical or other reasons, not all these things always get done. Improvement occurs when people are required (or encouraged or bribed or whatever) to do the 10 or 12 things that, together, are known to improve effectiveness. This "checklist" paradigm seems much different than the "intervention" approach that is standard in statistics and econometrics.

The two paradigms are not mutually exclusive. For example, the items on a checklist might have had their effectiveness individually demonstrated via earlier clinical trials--in fact, maybe that's what got them on the checklist in the first place. Conversely, the procedure of "following a checklist" can itself be seen as an intervention and be evaluated as such.

And there are other paradigms out there, such as the self-experimentation paradigm (in which the generation and testing of new ideas go together) and the "marketplace of ideas" paradigm (in which more efficient systems are believed to evolve and survive through competitive pressures).

I just think it's interesting that the intervention paradigm, which is so central to our thinking in statistics and econometrics (not to mention NIH funding), is not the only way to think about process improvement. A point that is obvious to nonstatisticians, perhaps.
