January 2010 Archives

Hal Daume pointed me to this plan by some marathon-running dude named Matt to quit drinking caffeine. Here's Matt's motivation:

I [Matt] try hard to stay away from acid-forming foods and to eat by the principles of Thrive, where energy comes not from stimulation but from nourishment. I want to maximize the energy I have available to create an exciting life, and coffee, in the long-term, only robs me of this energy.

I've tried hard to quit coffee in the past--I even went a month without coffee a while back. But I keep coming back to it. I come back to it because I have this idea that it helps me think better. I enjoy reading books and doing math more when I drink coffee, and I think I come up with better ideas when I'm caffeinated. But I know that's not true. The type of thinking coffee helps me with is a very linear kind, a proficiency at checking items off a list or even of recombining old ideas in a new way. This isn't real creativity. Real creativity is nonlinear, the creation of truly new ideas that haven't yet been conceived, not simply the reordering of old ones.

What's cool about Matt's project is that he's randomizing: some days he'll drink caffeinated coffee, some days decaf, and other days a mix. (To be precise, his wife is doing the randomizing, and she gets to choose the mix.) Each week, he alters the proportions to include more and more decaf--that way he can transition to fully-decaffeinated coffee, but in a way that is slightly unpredictable, so that he's never quite sure what he's getting on any given day.
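As a sketch, a randomization scheme like the one described above might look like this (my hypothetical code, not Matt's actual protocol; the number of weeks and the linear ramp are my assumptions):

```python
import random

def brew_schedule(weeks=8, days_per_week=7):
    """Each week the chance that a given day's coffee is decaf ramps up,
    so the drinker can never be sure what's in the cup on any given day."""
    schedule = []
    for week in range(weeks):
        p_decaf = (week + 1) / weeks  # proportion of decaf rises each week
        for _ in range(days_per_week):
            schedule.append("decaf" if random.random() < p_decaf else "caffeinated")
    return schedule

schedule = brew_schedule()
print(schedule[:7])   # week 1: decaf probability only 1/8
print(schedule[-7:])  # final week: decaf probability 1, fully transitioned
```

The key feature is that the day-to-day assignment stays random even as the weekly proportion drifts deterministically toward all-decaf.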

Also, of course, he's making all this public, which I guess will make it tougher for him to break his self-imposed rules.

This is an interesting example in which randomization is used for something other than the typical statistical reason of allowing unbiased comparisons of treatment groups.

I was also amused by his method of having his wife randomize. I remember thinking about this when Seth was telling me about one of his self-experiments, where I worried that expectation effects could be large--Seth knows what he's doing to himself (in this case, I believe it was some choice of which oil he was drinking every day) and I was thinking that this could have a huge effect on his self-measurements. I spent a while trying to think of a way that Seth could randomize his treatment, but it wasn't easy--Seth was living alone at the time, and there wasn't anyone who could conveniently do it for him--and for reasons having to do with the effects that Seth was expecting to see, a simple randomization wouldn't work. (Seth was expecting results to last over several days, so a randomization by day wouldn't do the trick. But randomizing weeks wouldn't do either, because then you're losing independence of the daily measurements, if Seth guesses (or thinks he can guess) the new treatment on the day of the switch.) It would've been so, so easy to do it using a friend, but not at all easy to do alone.

A prediction


What it takes


From a recent email exchange with a collaborator on a paper that a bunch of us are working on:

Yes, it's definitely a methodology paper. But, given that we don't have any theorems or simulation studies, the motivation for the methodology has to come from the application, no?

A few days ago, I suggested that we could invert the usual forecast-the-election-from-the-economy rule and instead use historical election returns to make inferences about past economic trends.

Bob Erikson is skeptical. He writes:

It is an interesting idea but I don't think the economics-vote connection is strong enough to make it work. At best economics explains no more than "half the variance" and often less. Like I [Bob] am on record as saying the economy has little to do with midterm elections (AJPS 1990), unlike prez elections.

Damn. It's such a cute idea, though, I still want to give it a try.

Some thoughts on final exams


I just finished grading my final exams--see here for the problems and the solutions--and it got me thinking about a few things.

#1 is that I really really really should be writing the exams before the course begins. Here's the plan (as it should be):
- Write the exam
- Write a practice exam
- Give the students the practice exam on day 1, so they know what they're expected to be able to do, once the semester is over.
- If necessary, write two practice exams so that you have more flexibility in what should be on the final.

The students didn't do so well on my exam, and I totally blame myself: they didn't have a sense of what to expect. I'd given them weekly homework, but those questions were a bit different from the exam questions.

My other thought on exams is that I like to follow the principles of psychometrics and have many short questions testing different concepts, rather than a few long, multipart essay questions. When a question has several parts, the scores on these parts will be positively correlated, thus increasing the variance of the total.
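The psychometric point--that positively correlated part scores inflate the variance of the exam total, compared to the same number of independent short questions--is easy to check with a quick simulation. This is my own illustration, not anything from the course materials; the factor structure and coefficients are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
n_students, n_parts = 10_000, 5

# Independent short questions: each part's score is its own draw.
indep = rng.normal(size=(n_students, n_parts))

# Multi-part question: all parts share a common "got the setup right"
# factor, so the part scores are positively correlated.
common = rng.normal(size=(n_students, 1))
correlated = 0.7 * common + 0.3 * rng.normal(size=(n_students, n_parts))

print(indep.sum(axis=1).var())       # close to n_parts * 1 = 5
print(correlated.sum(axis=1).var())  # much larger, despite each part
                                     # having *smaller* variance than 1
```

Since Var(sum) = sum of variances + 2 * sum of covariances, the positive covariances between parts are what push the total's variance up.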

More generally, I think there's a tradeoff in effort. Multi-part essay questions are easier to write but harder to grade. We tend to find ourselves in a hurry when it's time to write an exam, but we end up increasing our total workload by writing these essay questions. Better, I think, to put in the effort early to write short-answer questions that are easier to grade and, I believe, provide a better evaluation of what the students can do. (Not that I've evaluated that last claim; it's my impression based on personal experience and my casual reading of the education research literature. I hope to do more systematic work in this area in the future.)

I just graded the final exams for my first-semester graduate statistics course that I taught in the economics department at Sciences Po.

I posted the exam itself here last week; you might want to take a look at it and try some of it yourself before coming back here for the solutions.

And see here for my thoughts about this particular exam, this course, and final exams in general.

Now on to the exam solutions, which I will intersperse with the exam questions themselves:

Kevin Spacey famously said that the greatest trick the Devil ever pulled was convincing the world he didn't exist. When it comes to The Search for Certainty, a new book on the philosophy of statistics by mathematician Krzysztof Burdzy, the greatest trick involved was getting a copy into the hands of Christian Robert, who trashed it on his blog and then passed it on to me.

The flavor of the book is given from this quotation from the back cover: "Similarly, the 'Bayesian statistics' shares nothing in common with the 'subjective philosophy of probability.'" We actually go on and on in our book about how Bayesian data analysis does not rely on subjective probability, but . . . "nothing in common," huh? That's pretty strong.

Rather than attempt to address the book's arguments in general, I will simply do two things. First, I will do a "Washington read" (as Yair calls it) and see what Burdzy says about my own writings. Second, I will address the question of whether Burdzy's arguments will have any effect on statistical practice. If the answer to the latter question is no, we can safely leave the book under review to the mathematicians and philosophers, secure in the belief that it will do little mischief.

This is pretty funny. And, to think that I used to work there. This guy definitely needs a P.R. consultant. I've seen dozens of these NYT mini-interviews, and I don't think I've ever seen someone come off so badly. The high point for me was his answering a question about pay cuts by saying that he's from Philadelphia. I don't know how much of this is sheer incompetence and how much is coming from the interviewer (Deborah Solomon) trying to string him up. Usually she seems pretty gentle to her interview subjects. My guess is what happened is her easygoing questions lulled Yudof into a false sense of security, he got too relaxed, and he started saying stupid things. Solomon must have been amazed by what was coming out of his mouth.

P.S. The bit about the salary was entertaining too. I wonder if he has some sort of deal like sports coaches do, so that even if they fire him, they have to pay out X years on his contract.

Ban Chuan Cheah writes:

I'm trying to learn propensity score matching and used your text as a guide (pg 208-209). After creating the propensity scores, the data is matched and after achieving covariate balance the treatment effect is estimated by running a regression on the treatment variable and some other covariates. The standard error of the treatment effect is also reported - in the book it is 10.2 (1.6).

We all know, following the research of Rosenstone, Hibbs, Erikson, and others, that economic conditions can predict vote swings at state and national levels.

But, what about the reverse? Could we deduce historical economic conditions from election returns? Instead of forecasting elections from the economy, we could hindcast the economy from elections.

Would this make sense as a way of studying local and regional economic conditions in the U.S. in the 1800s, for example? I could imagine that election data are a lot easier to come by than economic data.

P.S. Don't forget that there have been big changes over time in our impressions of the ability of presidents to intervene successfully in the economy.

Patterson update


I went to the library and took a look at a book by James Patterson. It was pretty much the literary equivalent of a TV cop show. I couldn't really see myself reading it all the way through, but it was better-written than I'd expected. It's hard for me to see why Patterson wants to keep doing it (even if his coauthors are doing most of the work at this point). But I suppose that, once you're on the bestseller list, it's a bit addictive and you want to stay up there.

Today I faced some tedious work on a project that must be finished by the end of the week, so my procrastination methods reached new heights of creativity. For the first time, I clicked on the "Most Popular" tab at the top of the NY Times website. This gives me another opportunity for procrastination, by typing this blog post, because I noticed something surprising: There's not much overlap between the 10 "most e-mailed" and the 10 "most blogged" recent stories. Only 3 stories are on both "top 10" lists...which is to say, 7 of the most e-mailed stories are not among those that drew the attention of the most bloggers, and 7 of the most-blogged stories didn't make the cut for most emailers. I don't know if this is typical -- maybe this is an unusual week -- but I find it surprising. If a story seems like the kind of thing that would interest your friends, wouldn't it also be a good one to blog about? Does the difference simply reflect demographics? Perhaps bloggers are younger, and are interested in different stories than non-bloggers?

It's not 1933, it's 1930


A major storyline of the 2008 election was that it was the Great Depression all over again: George W. Bush was the hapless Herbert Hoover and Barack Obama was the FDR figure, coming in on a wave of popular resentment to clean things up. The stock market crash made the parallels pretty direct. One could continue the analogy, with Bill Clinton playing the Calvin Coolidge role, mindlessly stoking the paper economy and complicit in the rise of the stock market as a national sport. Public fascination with various richies seemed very 1920s-ish, and we had lots of candidates for the "Andrew Mellon" of the 2000s. Obama's decisive victory echoed Roosevelt's in 1932.

But history doesn't really repeat itself--or if it does, it's not always quite the repetition that was expected. With his latest plan of a spending freeze (on the 17% of the federal budget that is not committed to the military, veterans, homeland security and international affairs, Social Security, or Medicare), Obama is being labeled by many liberals as the second coming of Herbert Hoover--another well-meaning technocrat who can't put together a political coalition to do anything to stop the slide. Conservatives, too, may have switched from thinking of Obama as a scary realigning Roosevelt to viewing him as a Hoover from their own perspective--as a well-meaning fellow who took a stock market crash and made it worse through a series of ill-timed government interventions.

I can see the future debates already: was Obama a Hoover who dithered while the economy burned, too little and too late (the Krugman version), or a Hoover who hindered the ability of the economy to recover on its own by pushing every button he could find on the national console (the Chicago-school version)?

In either storyline, it's 1930, not 1932: rather than being three years into a depression, we're still just getting started and we're still in the Hoover-era position of seeing things fall apart but not quite being ready to take the next step.

Anyway, I'm not claiming to offer any serious political or economic analysis here, just pointing out that the 1932 election was a full three years after the 1929 stock market crash, so Obama's stepping into the story at a different point than when Roosevelt stepped in to his.

Or maybe we're still on track for Obama to "do a Reagan," ride out the recession in the off-year election and sit tight as the economy returns in years 3 and 4.

Tufte recommendation


A former student writes:

I'm going to get a Tufte book. Do you recommend "The Visual Display of Quantitative Information" or "Envisioning Information"?

My reply: My favorite is his second book, Envisioning Information. His first book was his breakthrough but the second book is the one that I learned the most from, myself.

P.S. I don't know if this counts as a 3-star thread.

What can search predict?


You've all heard about how you can predict all sorts of things, from movie grosses to flu trends, using search results. I earlier blogged about the research of Yahoo's Sharad Goel, Jake Hofman, Sebastien Lahaie, David Pennock, and Duncan Watts in this area. Since then, they've written a research article.

Here's a picture:


And here's their story:

We [Goel et al.] investigate the degree to which search behavior predicts the commercial success of cultural products, namely movies, video games, and songs. In contrast with previous work that has focused on realtime reporting of current trends, we emphasize that here our objective is to predict future activity, typically days to weeks in advance. Specifically, we use query volume to forecast opening weekend box-office revenue for feature films, first month sales of video games, and the rank of songs on the Billboard Hot 100 chart. In all cases that we consider, we find that search counts are indicative of future outcomes, but when compared with baseline models trained on publicly available data, the performance boost associated with search counts is generally modest--a pattern that, as we show, also applies to previous work on tracking flu trends.

The punchline:

We [Goel et al.] conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries may provide a useful guide to the near future.

I like how they put this. My first reaction upon seeing the paper (having flipped through the graphs and not read the abstract in detail) was that it was somewhat of a debunking exercise: Search volume has been hyped as the greatest thing since sliced bread, but really it's no big whoop, it adds almost no information beyond a simple forecast. But then my thought was that, no, this is a big whoop, because, in an automatic computing environment, it could be a lot easier to gather/analyze search volume than to build those baseline models.

Sharad's paper is cool. My only suggestion is that, in addition to fitting the separate models and comparing them, they do the comparison on a case-by-case basis: what percentage of the individual cases is predicted better by model 1, model 2, or model 3, and what is the distribution of the differences in performance? I think they're losing something by only doing the comparisons in aggregate.
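To make the case-by-case suggestion concrete, here's a hypothetical sketch. The outcomes and the two models' predictions are simulated stand-ins, not Goel et al.'s data; the point is just the shape of the comparison:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)                               # hypothetical outcomes
pred_baseline = y + rng.normal(scale=1.0, size=n)    # baseline model's predictions
pred_search = y + rng.normal(scale=0.9, size=n)      # search-augmented model's predictions

err_base = np.abs(pred_baseline - y)
err_search = np.abs(pred_search - y)

# Aggregate comparison: one number per model, hides case-level variation.
print("mean abs error:", err_base.mean(), err_search.mean())

# Case-by-case comparison: in what fraction of cases does each model win,
# and how is the performance difference distributed?
frac_search_wins = (err_search < err_base).mean()
diff = err_base - err_search
print("search model better in fraction of cases:", frac_search_wins)
print("quantiles of the difference:", np.quantile(diff, [0.1, 0.5, 0.9]))
```

A model that's modestly better in aggregate might win most cases by a hair, or win half the cases by a lot; only the case-level distribution distinguishes those stories.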

It also might be good if they could set up some sort of dynamic tracker that could perform the analysis in this paper automatically, for thousands of outcomes. Then in a year or so they'd have tons and tons of data. That would take this from an interesting project to something really cool.

Alex Lundry sent along this presentation. As some of you know, I hate videos, so I didn't actually look at this, but it seems to combine two of my main interests, so I thought it might interest some of you too. If you like it (or you don't), feel free to say so in the comments.

The man with the golden gut


Seth links to this fascinating article by Jonathan Mahler about the popular novelist James Patterson:

Last year, an estimated 14 million copies of his books in 38 different languages found their way onto beach blankets, airplanes and nightstands around the world. Patterson may lack the name recognition of a Stephen King, a John Grisham or a Dan Brown, but he outsells them all. Really, it's not even close. (According to Nielsen BookScan, Grisham's, King's and Brown's combined U.S. sales in recent years still don't match Patterson's.) This is partly because Patterson is so prolific: with the help of his stable of co-authors, he published nine original hardcover books in 2009 and will publish at least nine more in 2010.

Patterson has written in just about every genre -- science fiction, fantasy, romance, "women's weepies," graphic novels, Christmas-themed books. He dabbles in nonfiction as well. In 2008, he published "Against Medical Advice," a book written from the perspective of the son of a friend who suffers from Tourette's syndrome.

More than Grisham, King, and Brown combined: that really is pretty impressive. The sixty-something Patterson has written 35 New York Times #1 best sellers but doesn't seem to have too much of a swelled head:

A new kind of spam


As a way of avoiding work, I check the comments on this blog and decide which to approve and which to send to the spam folder. (Lots of stuff gets sent directly to spam; these are almost 100% classified correctly and I basically never need to check there.)

There are different kinds of spam, but I can typically spot it: the comment is close to content-free, with a link to a site that is selling something. I don't mind if you're a statistical consultant and you link to your consulting site, but, no, if you submit a comment with a link to some discount DVD site or whatever, yes, you're going straight to the spam filter.

Today, though, I got a new kind of spam: it looked just like the usual stuff, but there was no URL, either in the message or in the regular URL field. I can't figure out why somebody would bother to do this.

Following up on our recent discussion (see also here) about estimates of war deaths, Megan Price pointed me to this report, where she, Anita Gohdes, and Patrick Ball write:

Several media organizations including Reuters, Foreign Policy and New Scientist covered the January 21 release of the 2009 Human Security Report (HSR) entitled, "The Shrinking Cost of War." The main thesis of the HSR authors, Andrew Mack et al., is that "nationwide mortality rates actually fall during most wars" and that "today's wars rarely kill enough people to reverse the decline in peacetime mortality that has been underway in the developing world for more than 30 years." . . . We are deeply skeptical of the methods and data that the authors use to conclude that conflict-related deaths are decreasing. We are equally concerned about the implications of the authors' conclusions and recommendations with respect to the current academic discussion on how to count deaths in conflict situations. . . .

The central evidence that the authors provide for "The Shrinking Cost of War" is delivered as a series of graphs. There are two problems with the authors' reasoning.

From blogging legend Phil Nugent:


If Scott Brown wins, I [Nugent] suspect that it will have less to do with a massive swing to the right in the bosom of liberalism than with a tendency there to vote against the repulsive and inept candidate in favor of the one who seems Kennedyesque, no matter whether he belongs to the Kennedys' party or not. On the other hand, if the election comes down to a squeaker that finds Coakley victorious, it'll probably be because the last minute media explosion, complete with the sight of all those gleeful Republicans turning cartwheels in the end zone, alerted voters to the strategic importance of holding their noses and voting for the monster over the centerfold.

Andrew Sullivan links to this amusing study [link fixed]. The whole blog is lots of fun--I've linked to it before--and it illustrates an important point in statistics, which I've given as the title of this blog entry.

P.S. I'm not trying to say that statistical methodology is a waste of time. Good methods--and I include good graphical methods in this category--allow us to make use of more data. If all you can do is pie charts and chi-squared tests (for example), you won't be able to do much.

Alan Turing is said to have invented a game that combines chess and middle-distance running. It goes like this: You make your move, then you run around the house, and the other player has to make his or her move before you return to your seat. I've never played the game but it sounds like fun. I've always thought, though, that the chess part has got to be much more important than the running part: the difference in time between a sprint and a slow jog is small enough that I'd think it would always make sense just to do the jog and save one's energy for the chess game.

But when I was speaking last week at the University of London, Turing's chess/running game came up somehow in conversation, and somebody made a point which I'd never thought of before, that I think completely destroys the game. I'd always assumed that it makes sense to run as fast as possible, but what if you want the time to think about a move? Then you can just run halfway around the house and sit for as long as you want.

It goes like this. You're in a tough spot and want some time to think. So you make a move where the opponent's move is pretty much obvious, then you go outside and sit on the stoop for an hour or two to ponder. Your opponent makes the obvious move and then has to sit and wait for you to come back in. Sure, he or she can plan ahead, but with less effectiveness than you because of not knowing what you're going to do when you come back in.

So . . . I don't know if anyone has actually played Turing's running chess game, but I think it would need another rule or two to really work.

This looks interesting; too bad I'm not around to hear it:

Book titles


My collaborators and I have had some successes and some failures; here are some stories, with the benefit of (varying degrees of) hindsight.

"Bayesian Data Analysis." We thought a lot about this one. It was my idea to use the phrase "data analysis": the idea was that "inference" is too narrow (being only one of the three data analysis steps of model-building, inference, and model checking) and "statistics" is too broad (seeing as it also includes design and decision making as well as data analysis). I hadn't thought of the way that BDA sounds like EDA but that came out well, even though the first edition of BDA was pretty weak on the EDA stuff--we fit more of that into the second edition (in chapter 6 and even in the cover). Beyond this, I was never satisfied with "Bayes" in the title--it seemed, and still seems, too jargony and not descriptive enough for me. I'd prefer something like "Data Analysis Using Probability Models" or even "Data Analysis Using Generative Models" (to use a current buzzword that, yes, may be jargon but is also descriptive). But we eventually decided (correctly, I think) that we had to go with Bayes because it's such a powerful brand name. Every once in a while I see the phrase "Bayesian data analysis" used generically, not in reference to our book, and when this happens it always makes me happy; I think the statistical world is richer to have this phrase rather than the formerly-standard "Bayesian inference" (which, as noted above, misses some big issues).

"Teaching Statistics: A Bag of Tricks." Should've been called "Learning Statistics: A Bag of Tricks." Only a few people want to teach statistics; lots of people want to learn it. And, ultimately, a book of teaching methods is really a book of learning methods. Also, many people have told me that they've bought the book and read it. I actually think it's had more effect from people reading it than from people using it in their classes. Sort of like one of those golf books that people put by their bedside and read even if they don't get around to practicing and following all the instructions.

"Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives." The title seems fine, but something went wrong in the promotion of this book. Xiao-Li and I collected some excellent articles and put a huge amount of effort into editing them. I think the book is great but it hasn't sold a lot. Perhaps we should've structured it slightly differently so it could've been used as a course book? And of course we shouldn't have published with Wiley, who are notorious for pricing their books too high. (I notice they now charge $132 (!) for Feller's famous book on probability theory.) Why did we go with Wiley? At the time, Xiao-Li and I thought it would be difficult to find a publisher so we didn't really try shopping it around. In retrospect, we didn't fully realize how great our book was; we were satisfied just to get it out there without thinking clearly about what would happen next.

"Data Analysis Using Regression and Multilevel/Hierarchical Models." The awkward "Multilevel/Hierarchical" thing is Phil's fault: I wanted to go with "multilevel" (because I felt, and still feel, that "hierarchical" can be seen as implying nested models, and it was very important for me in this book to go beyond the simple identification of multilevel models with simple hierarchical designs and data structures), but Phil pointed out that "hierarchical" is a much more standard word than "multilevel" (for example, "hierarchical model" gets four times as many Google hits as "multilevel model"). So I did the awkward thing and kept both words. (And Jennifer was fine with this too.) Also we needed to put Regression in there because a multilevel model is really just regression with a discrete predictor. And Data Analysis for the reasons described above. The book has sold well so the title doesn't seem to have hurt it any.

"Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do." I think this was a mistake. First, as some people have pointed out and as we realized even at the time, we don't actually say why Americans vote the way they do. I really wish we had chosen our other candidate subtitle, "How Americans are Polarized and How They're Not." Beyond this, I'm actually down on the whole Red State, Blue State thing. Sure, it's grabby, but I fear it makes the book seem less serious. Given that we didn't become the next Freakonomics and we didn't sell a zillion copies, if I could go back in time I'd give it a more serious title, such as, hmmm..., "Geographic and Demographic Polarization in American Politics"--no, that's too serious-sounding. Maybe "Democrats and Republicans: Who They Are, Where They Live, and Where They Stand on the Issues." Or "American Voters, Red and Blue: Who They Are, Where They Live, and Where They Stand on the Issues." Something that is a bit grabby but conveys more of our research content. (Many people were misled by our title into thinking the book was merely a retread of our Red State, Blue State article, but really it was full of original research that, to this date, has still only appeared in the book.)

"A Quantitative Tour of the Social Sciences." I can't imagine a better title for this one. And I love the book, too. In addition to having wonderful content, it has a great cover that was contributed by a blog commenter (who I still have to send a free book to; sorry!). We've gotta do a better job of promoting it, but I'm not quite sure how. Here's a nice review.

I have a few more books in (various stages of) the pipeline, but I'll hold off telling you their titles until they're closer to done.

I remember many years ago being told that political ideologies fall not along a line but on a circle: if you go far enough to the extremes, left-wing communists and right-wing fascists end up looking pretty similar.

I was reminded of this idea when reading Christian Robert and George Casella's fun new book, "Introducing Monte Carlo Methods with R."

I do most of my work in statistical methodology and applied statistics, but sometimes I back up my methodology with theory or I have to develop computational tools for my applications. I tend to think of this sort of ordering:

Probability theory - Theoretical statistics - Statistical methodology - Applications - Computation

Seeing this book, in which two mathematical theorists write all about computation, makes me want to loop this line in a circle. I knew this already--my own single true published theorem is about computation, after all--but I tend to forget. In some way, I think that computation--more generally, numerical analysis--has taken some of the place in academic statistics that was formerly occupied by theorem-proving. I think it's great that many of our more mathematically minded probabilists and statisticians can follow their theoretical physicist colleagues and work on computational methods. I suspect that applied researchers such as myself will get much more use out of theory as applied to computation, as compared to traditionally more prestigious work on asymptotic inference, uniform convergence, mapping the rejection regions of hypothesis tests, M-estimation, three-armed bandits, and the like.

Don't get me wrong--I'm not saying that computation is the only useful domain for statistical theory, or anything close to that. There are lots of new models to be built and lots of limits to be understood. Just, for example, consider the challenges of using sample data to estimate properties of a network. Lots of good stuff to do all around.

Anyway, back to the book by Robert and Casella. It's a fun book, partly because they resist the impulse to explain everything or to try to be comprehensive. As a result, reading the book requires the continual solution of little puzzles (as befits a book that introduces its chapters with quotations from detective novels). I'm not sure if this was intended, but it makes it a much more participatory experience, and I think for that reason it would also be an excellent book for a course on statistical computing.

Charles Warne writes:

A colleague of mine is running logistic regression models and wants to know if there's any sort of a test that can be used to assess whether a coefficient of a key predictor in one model is significantly different to that same predictor's coefficient in another model that adjusts for two other variables (which are significantly related to the outcome). Essentially she's wanting to statistically test for confounding, and while my initial advice was that a single statistical test isn't really appropriate since confounding is something that we make an educated judgement about given a range of factors, she is still keen to see if this can be done. I read your 2006 article with Hal Stern "The difference between 'significant' and 'not significant' is not itself statistically significant" which included the example (p. 328) where evidence for a difference between the results of two independent studies was assessed by summing the squares of the standard errors of each and taking the square root to give the standard error of the difference (se=14). My question is whether this approach can be applied to my colleague's situation, given that both logistic regression models are based on the same sample of individuals and therefore are not independent? Is there an adjustment that can be used to produce more accurate standard errors for non-independent samples or should I not be applying this approach at all? Is there a better way this problem could be tackled?

My reply: No, you wouldn't want to take the two estimates and treat them as if they were independent. My real question, though, is why your colleague wants to do this in the first place. It's not at all clear what question such an analysis would be answering.

P.S. Warne adds:

My final exam


I'm not particularly proud of this one, but I thought it might interest some of you in any case. It's the final exam for the course I taught this fall to the economics students at Sciences Po. Students were given two hours.



Thinking about Erma Bombeck, I'm reminded of the whole "overexposure" phenomenon. Some people get overexposed but it's still ok. The classic example is Michael Jackson: no matter what, people still think Billie Jean and the rest are cool. And somehow Dave Barry managed to hit the stratosphere without getting that "overexposed" vibe. But Bombeck had more of the classic pattern: at first, she was this exciting new thing--I remember when we got The Grass Is Always Greener Over The Septic Tank out of the library--then, somewhere along the way, she became tacky. I guess it would make sense to go reread The Grass is Always Greener and see if it's still funny. I think I'd still think Art Buchwald's old columns are funny, but who knows.

And then there's Erle Stanley Gardner. I have no sense whether he was "overexposed" or just had his deserved period of popularity which naturally ended.

Boris writes, regarding the recent U.S. Senate election (in which moderate Republican Scott Brown narrowly beat liberal Democrat Martha Coakley in usually reliably-Democratic Massachusetts):

I [Boris] disagree with Josh Tucker that the election isn't that consequential. First, the pivotal Senator will now be a Republican, not a Democrat. The parties put a lot of pressure on moderate members of Congress to vote one way or the other; it's often unsuccessful, but it's a pretty powerful source of influence. Second, that pivotal Senator will be Brown, not Snowe (if my prediction proves accurate). Finally, this pivotality will exist on every issue, not just health care reform, which probably just expired in its current form. Not too shabby as a consequential election, right?

Based upon his voting record in the Massachusetts State Senate as well as the Votesmart surveys of MA state legislators (including his own from 2002), I [Boris] estimate that Brown is to the left of the leftmost Republican in the Senate, Olympia Snowe of Maine, and to the right of the rightmost Democrat in the Senate, Ben Nelson of Nebraska. Just as important, Brown stands to become the pivotal member of the Senate--that is, the 60th least liberal (equivalently, the 40th most conservative)--a distinction previously held by Nelson.

More here.

I posted a note on the other blog about the difference between internal and external coherence of political ideology. The basic idea is that a particular person or small group can have an ideology (supporting positions A, B, C, and D, for example) that is perfectly internally coherent--that is, all these positions make sense given the underlying ideology--while being incoherent with other ideologies (for example, those people who support positions A, B, not-C, and not-D). What's striking to me is how strongly people can feel that their beliefs on a particular issue flow from their being a liberal, or a conservative, or whatever, even though others with similar opinions will completely disagree with them on that issue.

Stephen Dubner reports on an observational study of bike helmet laws, a study by Christopher Carpenter and Mark Stehr that compares bicycling and accident rates among children in states that did and did not have helmet laws. In reading the data analysis, I'm reminded of the many discussions Bob Erikson and I have had about the importance, when fitting time-series cross-sectional models, of figuring out where your identification is coming from (this is an issue that's come up several times on this blog)--but I have no particular reason to doubt the estimates, which seem plausible enough. The analysis is clear enough, so I guess it would be easy enough to get the data, fit a hierarchical model, and, most importantly, make some graphs of what's happening before and after the laws, to see what's going on in the data.

Beyond this, I had one more comment, which is that I'm surprised that Dubner found it surprising that helmet laws seem to lead to a decrease in actual bike riding. My impression is that when helmet laws are proposed, this always comes up: the concern that if people are required to wear helmets, they'll just bike less. Hats off to Carpenter and Stehr for estimating this effect in this clever way, but it's certainly an idea that's been discussed before. In this context, I think it would be useful to think in terms of sociology-style models of default behaviors as well as economics-style models of incentives.

I read this report by Matthew Yglesias that Blue Cross/Blue Shield is "covertly backing far-right efforts to get health reform declared unconstitutional." I don't want to get into a discussion about whether these efforts are really "far-right"--I know next to nothing about the politics of the health reform battle.

What I really wanted to convey here was my first reaction upon seeing this, which was: Blue Cross/Blue Shield?? I remember this organization from the 70s, when it was my vague impression that Blue Cross was synonymous with "health insurance." I've always thought of it as a quasi-public organization, a sort of default health plan. I mean, sure, they're a private organization, so I assume that, just like the gas company and the electric company and the phone company, they're probably top-heavy with overpaid executives who don't do anything while earning ten times what they'd get on the federal scale. Whatever. That's the system we have here: people who work for quasi-public companies get a soft deal.

I was surprised, though, to hear about Blue Cross doing such strong lobbying. Sort of similar to the reaction I had seeing the percentage of political contributions from employees at Harvard etc. that went to the Democrats. I mean, sure, employees of Harvard have the right to give to whoever they want, but, still, there's something funny about a quasi-public institution such as Harvard (or Blue Cross) leaning so strongly on one side of the debate.

I don't really know if I should think of any of this as a problem; it just seems strange to think of Blue Cross as sponsoring a covert political agenda. It almost sounds like something from one of those '60s parody spy movies, where the bad guys aren't the Russians or ex-Nazis or whatever, but . . . Blue Cross!

I like paperback books that fit in my pocket. Unfortunately, about 25 years ago they pretty much stopped printing books in that size. Usually the closest you can get are those big floppy "trade paperbacks" or, in the case of the occasional Stephen King-type bestseller, a thick-as-a-brick paperback with big printing and fat pages.

It's not my place to question book marketers. My best theory is that book prices went up, for whatever reason, and then people wanted to feel like they're getting their money's worth: instead of a little pocket book for $2.95, you get the trade paperback for $16.95. Personally, I'd prefer the little book--whether or not I'm paying $16.95--but probably others feel differently. It's sort of like the way they'll sell you 50 aspirins in a bottle that would hold 200, and so forth.

Anyway, I pretty much have to get my pocket books used. I was in a used bookstore the other day and bought Killing Time (1961) by Donald E. Westlake, an author whom I've referred to before as the master of the no-redeeming-social-value thriller. This book was pretty good, and, on top of that, it actually had some redeeming social value.

I'll get back to this point in a moment, but first I wanted to say that one of the funnest things about reading a book from fifty years ago is to get a sense of how things used to be. Killing Time takes place in a small East Coast town which is dominated by a few local bigwigs. I imagine there used to be a lot of places like this in the old days but not so much any more, now that not so many people work in factories, and local ties are weaker. It reminded me of when I watched a bunch of Speed Racer cartoons with Phil in a movie theater in the early 90s. These were low-budget Japanese cartoons from the 60s that we loved as kids. From my adult perspective, the best parts were during the characters' long drives, where you could see Japanese industrial scenes in the background.

OK, now back to the "redeeming social value" thing. In Killing Time, Westlake takes the traditional Philip Marlowe private eye scenario and turns it inside out. The main character of the book (named Smith--make of that what you will) follows the standard pattern: he's outwardly cynical, just wanting to live his life and get by, but underlying this he has a philosophy of government that you might call "realistic idealism" or "idealistic realism." In the book, some reformers from the state capital come to town with the goal of exposing corruption, but private eye Smith doesn't want to go along with this: in his view, the reformers are naive, society has a balance, and it's best to keep things on an even keel. There's a crucial scene about two-thirds of the way through the book, though, where I suddenly realized (through the words of another character) how Smith's apparent cynicism is an extreme form of idealism. And then when I got to the end of the book, I had a sense of the explosive internal contradictions inherent in the standard "private eye" view of the world.

What I can't figure out is how anybody could write a private eye story with a straight face after reading the Westlake book. To me, it really closes the door on the genre. It's the Watchmen of private eye novels.

P.S. An interesting thing about Westlake is that he has not, I believe, ever had a breakout bestseller. I don't know what it takes to get such success, but I don't think it ever happened to him. He had many books made into movies, though, so I'm sure he did just fine financially.

P.P.S. Don't get me wrong, it's not like I'm saying Westlake is some sort of unrecognized literary master. He has great plots and settings and charming characters, but nothing I've ever read of his has the emotional punch of, say, Scott Smith's A Simple Plan (to choose a book whose plot would fit well into the Westlake canon).

It's the Gatsby seminar in the Computational Neuroscience Unit at University College London, Mon 18 Jan at 4pm:

Creating structured and flexible models: some open problems

A challenge in statistics is to construct models that are structured enough to be able to learn from data but not so strong as to overwhelm the data. We introduce the concept of "weakly informative priors" which contain important information but less than may be available for the given problem at hand. We also discuss some related problems in developing general models for taxonomies and deep interactions. We consider how these ideas apply to problems in social science and public health. If you don't walk out of this talk a Bayesian, I'll eat my hat.

P.S. Link updated.

Nate does Bayes


The classical statisticians among you can call it a measurement-error model. Whatever.

Bayesian statistics then and now


The following is a discussion of articles by Brad Efron and Rob Kass, to appear in the journal Statistical Science. I don't really have permission to upload their articles, but I think (hope?) this discussion will be of general interest and will motivate some of you to read the others' articles when they come out. (And thanks to Jimmy and others for pointing out typos in my original version!)

It is always a pleasure to hear Brad Efron's thoughts on the next century of statistics, especially considering the huge influence he's had on the field's present state and future directions, both in model-based and nonparametric inference.

Three meta-principles of statistics

Before going on, I'd like to state three meta-principles of statistics which I think are relevant to the current discussion.

First, the information principle, which is that the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use. Good methods make use of more information. This can come in different ways: in my own experience (following the lead of Efron and Morris, 1973, among others), hierarchical Bayes allows us to combine different data sources and weight them appropriately using partial pooling. Other statisticians find parametric Bayes too restrictive: in practice, parametric modeling typically comes down to conventional models such as the normal and gamma distributions, and the resulting inference does not take advantage of distributional information beyond the first two moments of the data. Such problems motivate more elaborate models, which raise new concerns about overfitting, and so on.
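To make the partial-pooling idea concrete, here is a minimal sketch of the precision-weighted estimate in a normal hierarchical model. The numbers, and the simplifying assumption that the group-level mean mu and scale tau are known, are mine for illustration only:

```python
# Partial pooling in a normal hierarchical model: each group estimate
# y_j (with standard error sigma_j) is shrunk toward the group-level
# mean mu, weighted by the precisions 1/sigma_j^2 and 1/tau^2.
# Illustrative numbers only; in practice mu and tau are estimated too.

def partial_pool(y, sigma, mu, tau):
    """Posterior means of the group parameters, given mu and tau."""
    pooled = []
    for y_j, s_j in zip(y, sigma):
        w_data = 1.0 / s_j**2    # precision of the group's own estimate
        w_prior = 1.0 / tau**2   # precision of the group-level distribution
        pooled.append((w_data * y_j + w_prior * mu) / (w_data + w_prior))
    return pooled

# A noisy group (sigma = 15) is pulled most of the way toward mu = 5,
# while a precisely estimated group (sigma = 2) barely moves.
estimates = partial_pool(y=[28.0, 8.0, -3.0], sigma=[15.0, 10.0, 2.0],
                         mu=5.0, tau=5.0)
```

With tau estimated from the data rather than fixed, this becomes the usual hierarchical-Bayes compromise between complete pooling (tau = 0) and no pooling (tau = infinity).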

As in many areas of mathematics, theory and practice leapfrog each other: as Efron notes, empirical Bayes methods have made great practical advances but "have yet to form into a coherent theory." In the past few decades, however, with the work of Lindley and Smith (1972) and many others, empirical Bayes has been folded into hierarchical Bayes, which is part of a coherent theory that includes inference, model checking, and data collection (at least in my own view, as represented in chapters 6 and 7 of Gelman et al, 2003). Other times, theoretical and even computational advances lead to practical breakthroughs, as Efron illustrates in his discussion of the progress made in genetic analysis following the Benjamini and Hochberg paper on false discovery rates.

My second meta-principle of statistics is the methodological attribution problem, which is that the many useful contributions of a good statistical consultant, or collaborator, will often be attributed to the statistician's methods or philosophy rather than to the artful efforts of the statistician himself or herself. Don Rubin has told me that scientists are fundamentally Bayesian (even if they don't realize it), in that they interpret uncertainty intervals Bayesianly. Brad Efron has talked vividly about how his scientific collaborators find permutation tests and p-values to be the most convincing form of evidence. Judea Pearl assures me that graphical models describe how people really think about causality. And so on. I'm sure that all these accomplished researchers, and many more, are describing their experiences accurately. Rubin wielding a posterior distribution is a powerful thing, as is Efron with a permutation test or Pearl with a graphical model, and I believe that (a) all three can be helping people solve real scientific problems, and (b) it is natural for their collaborators to attribute some of these researchers' creativity to their methods.

The result is that each of us tends to come away from a collaboration or consulting experience with the warm feeling that our methods really work, and that they represent how scientists really think. In stating this, I'm not trying to espouse some sort of empty pluralism--the claim that, for example, we'd be doing just as well if we were all using fuzzy sets, or correspondence analysis, or some other obscure statistical method. There's certainly a reason that methodological advances are made, and this reason is typically that existing methods have their failings. Nonetheless, I think we all have to be careful about attributing too much from our collaborators' and clients' satisfaction with our methods.

My third meta-principle is that different applications demand different philosophies. This principle comes up for me in Efron's discussion of hypothesis testing and the so-called false discovery rate, which I label as "so-called" for the following reason. In Efron's formulation (which follows the classical multiple comparisons literature), a "false discovery" is a zero effect that is identified as nonzero, whereas, in my own work, I never study zero effects. The effects I study are sometimes small but it would be silly, for example, to suppose that the difference in voting patterns of men and women (after controlling for some other variables) could be exactly zero. My problems with the "false discovery" formulation are partly a matter of taste, I'm sure, but I believe they also arise from the difference between problems in genetics (in which some genes really have essentially zero effects on some traits, so that the classical hypothesis-testing model is plausible) and in social science and environmental health (where essentially everything is connected to everything else, and effect sizes follow a continuous distribution rather than a mix of large effects and near-exact zeroes).

To me, the false discovery rate is the latest flavor-of-the-month attempt to make the Bayesian omelette without breaking the eggs. As such, it can work fine if the implicit prior is ok, it can be a great method, but I really don't like it as an underlying principle, as it's all formally based on a hypothesis testing framework that, to me, is more trouble than it's worth. In thinking about multiple comparisons in my own research, I prefer to discuss errors of Type S and Type M rather than Type 1 and Type 2 (Gelman and Tuerlinckx, 2000, Gelman and Weakliem, 2009, Gelman, Hill, and Yajima, 2009). My point here, though, is simply that any given statistical concept will make more sense in some settings than others.
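For readers unfamiliar with the terms: a Type S error is an estimate with the wrong sign, and a Type M error is an exaggerated magnitude. A small simulation (my own illustration, with made-up numbers, in the spirit of the papers cited above) shows how bad both can be when only "statistically significant" estimates from a noisy study get reported:

```python
import random

def retrodesign(true_effect, se, z_crit=1.96, n_sims=100_000, seed=1):
    """Simulate Type S (wrong sign) and Type M (exaggeration factor)
    errors among estimates that reach 'statistical significance',
    for a study whose estimate is distributed N(true_effect, se)."""
    random.seed(seed)
    signif = []
    for _ in range(n_sims):
        est = random.gauss(true_effect, se)
        if abs(est) > z_crit * se:       # only significant results survive
            signif.append(est)
    type_s = sum(e * true_effect < 0 for e in signif) / len(signif)
    type_m = (sum(abs(e) for e in signif) / len(signif)) / abs(true_effect)
    return type_s, type_m

# A small true effect measured noisily: the published (significant)
# estimates are a biased, sign-error-prone subset.
type_s, type_m = retrodesign(true_effect=0.1, se=1.0)
```

In this regime the significant estimates frequently have the wrong sign and greatly overstate the magnitude of the true effect, which is the practical point of thinking in terms of Type S and Type M rather than Type 1 and Type 2.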

For another example of how different areas of application merit different sorts of statistical thinking, consider Rob Kass's remark: "I tell my students in neurobiology that in claiming statistical significance I get nervous unless the p-value is much smaller than .01." In political science, we're typically not aiming for that level of uncertainty. (Just to get a sense of the scale of things, there have been barely 100 national elections in all of U.S. history, and political scientists studying the modern era typically start in 1946.)

Progress in parametric Bayesian inference

I also think that Efron is doing parametric Bayesian inference a disservice by focusing on a fun little baseball example that he and Morris worked on 35 years ago. If he would look at what's being done now, he'd see all the good statistical practice that, in his section 10, he naively (I think) attributes to "frequentism." Figure 1 illustrates with a grid of maps of public opinion by state, estimated from national survey data. Fitting this model took a lot of effort which was made possible by working within a hierarchical regression framework--"a good set of work rules," to use Efron's expression. Similar models have been used recently to study opinion trends in other areas such as gay rights in which policy is made at the state level, and so we want to understand opinions by state as well (Lax and Phillips, 2009).

I also completely disagree with Efron's claim that frequentism (whatever that is) is "fundamentally conservative." One thing that "frequentism" absolutely encourages is for people to use horrible, noisy estimates out of a fear of "bias." More generally, as discussed by Gelman and Jakulin (2007), Bayesian inference is conservative in that it goes with what is already known, unless the new data force a change. In contrast, unbiased estimates and other unregularized classical procedures are noisy and get jerked around by whatever data happen to come by--not really a conservative thing at all. To make this argument more formal, consider the multiple comparisons problem. Classical unbiased comparisons are noisy and must be adjusted to avoid overinterpretation; in contrast, hierarchical Bayes estimates of comparisons are conservative (when two parameters are pulled toward a common mean, their difference is pulled toward zero) and less likely to appear to be statistically significant (Gelman and Tuerlinckx, 2000).

Another way to understand this is to consider the "machine learning" problem of estimating the probability of an event on which we have very little direct data. The most conservative stance is to assign a probability of ½; the next-conservative approach might be to use some highly smoothed estimate based on averaging a large amount of data; and the unbiased estimate based on the local data is hardly conservative at all! Figure 1 illustrates our conservative estimate of public opinion on school vouchers. We prefer this to a noisy, implausible map of unbiased estimators.
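The spectrum of conservatism can be made concrete with a toy calculation (all counts below are hypothetical):

```python
# Three estimates of a probability from 2 local successes in 3 local
# trials, against a large background dataset with overall rate 0.30.
# All numbers are made up for illustration.
local_s, local_n = 2, 3
bg_s, bg_n = 4500, 15000

p_most_conservative = 0.5                          # ignore the data entirely
p_smoothed = (local_s + bg_s) / (local_n + bg_n)   # heavily smoothed toward 0.30
p_unbiased = local_s / local_n                     # local data only: about 0.67
```

The unbiased estimate jumps to 0.67 on the strength of three observations--hardly a conservative choice.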

Of course, frequentism is a big tent and can be interpreted to include all sorts of estimates, up to and including whatever Bayesian thing I happen to be doing this week--to make any estimate "frequentist," one just needs to do whatever combination of theory and simulation is necessary to get a sense of the method's performance under repeated sampling. So maybe Efron and I are in agreement in practice, that any method is worth considering if it works, but it might take some work to see if something really does work.

Comments on Kass's comments

Before writing this discussion, I also had the opportunity to read Rob Kass's comments on Efron's article.

I pretty much agree with Kass's points, except for his claim that most of Bayes is essentially maximum likelihood estimation. Multilevel modeling is only approximately maximum likelihood if you follow Efron and Morris's empirical Bayesian formulation in which you average over intermediate parameters and maximize over hyperparameters, as I gather Kass has in mind. But then this makes "maximum likelihood" a matter of judgment: what exactly is a hyperparameter? Things get tricky with mixture models and the like. I guess what I'm saying is that maximum likelihood, like many classical methods, works pretty well in practice only because practitioners interpret the methods flexibly and don't do the really stupid versions (such as joint maximization of parameters and hyperparameters) that are allowed by the theory.

Regarding the difficulties of combining evidence across species (in Kass's discussion of the DuMouchel and Harris paper), one point here is that this works best when the parameters have a real-world meaning. This is a point that became clear to me in my work in toxicology (Gelman, Bois, and Jiang, 1996): when you have a model whose parameters have numerical interpretations ("mean," "scale," "curvature," and so forth), it can be hard to get useful priors for them, but when the parameters have substantive interpretations ("blood flow," "equilibrium concentration," etc.), then this opens the door for real prior information. And, in a hierarchical context, "real prior information" doesn't have to mean a specific, pre-assigned prior; rather, it can refer to a model in which the parameters have a group-level distribution. The more real-worldy the parameters are, the more likely this group-level distribution can be modeled accurately. And the smaller the group-level error, the more partial pooling you'll get and the more effective your Bayesian inference is. To me, this is the real connection between scientific modeling and the mechanics of Bayesian smoothing, and Kass alludes to some of this in the final paragraph of his comment.

Hal Stern once said that the big divide in statistics is not between Bayesians and non-Bayesians but rather between modelers and non-modelers. And, indeed, in many of my Bayesian applications, the big benefit has come from the likelihood. But sometimes that is because we are careful in deciding what part of the model is "the likelihood." Nowadays, this is starting to have real practical consequences even in Bayesian inference, with methods such as DIC, Bayes factors, and posterior predictive checks, all of whose definitions depend crucially on how the model is partitioned into likelihood, prior, and hyperprior distributions.

On one hand, I'm impressed by modern machine-learning methods that process huge datasets, and I agree with Kass's remarks emphasizing how important it can be for statistical methods to work under minimal assumptions; on the other hand, I appreciate his concluding point that statistical methods are most powerful when they are connected to the particular substantive question being studied. I agree that statistical theory is far from settled, and I agree with Kass that developments in Bayesian modeling are a promising way to move forward.

This story is pretty funny. "Distractions in the classroom," indeed. They take nursery school pretty seriously down there in Texas, huh?

Where's Ripley on the web?


Related to our discussion of influential statisticians, I looked up Brian Ripley, who has long been an inspiration to me. (Just to take one example, the final chapter of his book on spatial processes had an example of simulation-based model checking that had a big influence on my ideas in that area.)

I was stunned to find that his webpage hasn't been updated since 2002, and it links to a "list of recent and forthcoming papers" that, believe it or not, hasn't been updated since 1997! I can't figure it out, especially given that Ripley is so computer-savvy and still appears to be active in the computational statistics community. Perhaps someone can explain?

P.S. No rude comments, please. Thank you.

P.P.S. Somebody pointed out that you can search for B D Ripley's recent papers using Google. Here's what's been going on since 2002. Aside from the R stuff, he seems to have been focusing on applied work. Perhaps he could be persuaded to write an article for a statistics journal discussing what he's learned from these examples. I find that working with applied collaborators gives me insights that I never would've had on my own, and I'd be interested in hearing Ripley's thoughts on his own successes and struggles on applied problems.

First the scientific story, then the journalist, then my thoughts.

Part 1: The scientific story

From the Daily News:

Spanking makes kids perform better in school, helps them become more successful: study

The research, by Calvin College psychology professor Marjorie Gunnoe, found that kids smacked before age 6 grew up to be more successful . . . Gunnoe, who interviewed 2,600 people about being smacked, told the [London] Daily Mail: "The claims that are made for not spanking children fail to hold up. I think of spanking as a dangerous tool, but then there are times when there is a job big enough for a dangerous tool. You don't use it for all your jobs."

From the Daily Mail article:

Professor Gunnoe questioned 2,600 people about being smacked, of whom a quarter had never been physically chastised. The participants' answers then were compared with their behaviour, such as academic success, optimism about the future, antisocial behaviour, violence and bouts of depression. Teenagers in the survey who had been smacked only between the ages of two and six performed best on all the positive measures. Those who had been smacked between seven and 11 fared worse on negative behaviour but were more likely to be academically successful. Teenagers who were still smacked fared worst on all counts.

Part 2: The journalist

Po Bronson (whose life and career are eerily similar to the slightly older and slightly more famous Michael Lewis) writes about this study in Newsweek:

Unfortunately, there's been little study of [kids who haven't been spanked], because children who've never been spanked aren't easy to find. Most kids receive physical discipline at least once in their life. But times are changing, and parents today have numerous alternatives to spanking. The result is that kids are spanked less often overall, and kids who've never been spanked are becoming a bigger slice of the pie in long-term population studies.

One of those new population studies underway is called Portraits of American Life. It involves interviews of 2,600 people and their adolescent children every three years for the next 20 years. Dr. Marjorie Gunnoe is working with the first wave of data on the teens. It turns out that almost a quarter of these teens report they were never spanked.

So this is a perfect opportunity to answer a very simple question: are kids who've never been spanked any better off, long term?

Gunnoe's summary is blunt: "I didn't find that in my data." . . . those who'd been spanked just when they were young--ages 2 to 6--were doing a little better as teenagers than those who'd never been spanked. On almost every measure.

A separate group of teens had been spanked until they were in elementary school. Their last spanking had been between the ages of 7 and 11. These teens didn't turn out badly, either.

Compared with the never-spanked, they were slightly worse off on negative outcomes, but a little better off on the good outcomes. . . .

Gunnoe doesn't know what she'll find, but my thoughts jump immediately to the work of Dr. Sarah Schoppe-Sullivan, whom we wrote about in NurtureShock. Schoppe-Sullivan found that children of progressive dads were acting out more in school. This was likely because the fathers were inconsistent disciplinarians; they were emotionally uncertain about when and how to punish, and thus they were reinventing the wheel every time they had to reprimand their child. And there was more conflict in their marriage over how best to parent, and how to divide parenting responsibilities.

I [Bronson] admit to taking a leap here, but if the progressive parents are the ones who never spank (or at least there's a large overlap), then perhaps the consistency of discipline is more important than the form of discipline. In other words, spanking regularly isn't the problem; the problem is having no regular form of discipline at all.

I couldn't find a copy of Gunnoe's report on the web. Her local newspaper (the Grand Rapids Press) reports that she "presented her findings at a conference of the Society for Research in Child Development," but the link only goes to the conference website, not to any manuscript. Following the link for Marjorie Gunnoe takes me to this page at Calvin College, which describes itself as "the distinctively Christian, academically excellent liberal arts college that shapes minds for intentional participation in the renewal of all things."

Gunnoe is quoted in the Grand Rapids Press as saying:

"This in no way should be thought of as a green light for spanking . . . This is a red light for people who want to legally limit how parents choose to discipline their children. I don't promote spanking, but there's not the evidence to outlaw it."

I'm actually not sure why these results, if valid, should not be taken as a "green light" for spanking, but I guess Gunnoe's point is that parental behaviors are situational, and you might not want someone reading her article and then hitting his or her kid for no reason, just for its as-demonstrated-by-research benefits.

Unsurprisingly, there's lots of other research on the topic of corporal punishment. A commenter at my other blog found a related study of Gunnoe's, from 1997. It actually comes from an entire issue of the journal that's all about discipline, including several articles on spanking.

Another commenter linked to several reports of research, including this from University of New Hampshire professor Murray Straus:


(I don't know who is spanked exactly once, but maybe this is #times spanked per week, or something like that. I didn't search for the original source of the graph.)

I agree with the commenter that it would be interesting to see Gunnoe and Straus speaking on the same panel.

Part 3: My thoughts

I can't exactly say that Po Bronson did anything wrong in his writeup--he's knowledgeable in this area (more than I am, certainly) and has thought a lot about it. He's a journalist who's written a book on child-rearing, and this is a juicy topic, so I can't fault him for discussing Gunnoe's findings. And I certainly wouldn't suggest that this topic is off limits just because nothing has really been "proved" on the causal effects of corporal punishment. Research in this area is always going to be speculative.

Nonetheless, I'm a little bothered by Bronson's implicit acceptance of Gunnoe's results and his extrapolations from her more modest claims. I get a bit uncomfortable when a reporter starts to give explanations for why something is happening, when that "something" might not really be true at all. I don't see any easy solution here--Bronson is even careful enough to say, "I admit to taking a leap here." Still, I'm bothered by what may be a too-easy implicit acceptance of an unpublished research claim. Again, I'm not saying that blanket skepticism is a solution either, but still . . .

It's a tough situation to be in, to report on headline-grabbing claims when there's no research paper to back them up. (I assume that if Bronson had a copy of Gunnoe's research article, he could send it to various experts he knows to get their opinions.)

P.S. I altered the second-to-last paragraph above in light of Jason's comments.

Jim Madden writes:

I have been developing interactive graphical software for visualizing hierarchical linear models of large sets of student performance data that I have been allowed access to by the Louisiana Department of Education. This is essentially pre/post test data, with student demographic information (birthday, sex, race, ses, disability status) and school associated with each record. (Actually, I can construct student trajectories for several years, but I have not tried using this capability yet.) My goal is to make the modeling more transparent to audiences that are not trained in statistics, and in particular, I am trying to design the graphics so that nuances and uncertainties are apparent to naive viewers.

Andrew Gelman, Jingchen Liu, and Sophia Rabe-Hesketh are looking for two full-time postdocs, one to be based at the Department of Statistics, Columbia University, and the other at the Graduate School of Education, University of California, Berkeley.

The positions are funded by a grant entitled "Practical Tools for Multilevel/Hierarchical Modeling in Education" (Institute of Education Sciences, Department of Education). The project addresses modeling and computational issues that stand in the way of fitting larger, more realistic models of variation in social science data, in particular the problem that (restricted) maximum likelihood estimation often yields estimates on the boundary, such as zero variances or perfect correlations. The proposed solution is to specify weakly informative prior distributions - that is, prior distributions that will affect inferences only when the data provide little information about the parameters. Existing software for maximum likelihood estimation can then be modified to perform Bayes posterior modal estimation, and these ideas can also be used in fully Bayesian computation. In either case, the goal is to be able to fit and understand more complex, nuanced models. The postdocs will contribute to all aspects of this research and will implement the methods in C and R (Columbia postdoc) and Stata (Berkeley postdoc).
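As a toy illustration of the boundary problem the grant describes (my own construction, not the project's actual method): when the group-level estimates happen to coincide, the marginal likelihood of the group-level standard deviation tau peaks at the boundary tau = 0, while adding a weakly informative prior (here a Gamma(2, 2) kernel, chosen only for illustration) moves the posterior mode into the interior.

```python
# Sketch: maximum likelihood vs. posterior-mode estimation of a group-level
# sd when the data put the MLE on the boundary. All numbers are made up.
import numpy as np

J = 8
y = np.zeros(J)       # J group estimates, all identical -> degenerate case
sigma = 1.0           # known within-group standard error

taus = np.linspace(0, 2, 2001)

def log_lik(tau):
    # marginal normal likelihood of tau, with mu fixed at the sample mean (0);
    # residuals are all zero, so only the log-determinant term remains
    return -0.5 * J * np.log(2 * np.pi * (sigma**2 + tau**2))

ll = log_lik(taus)
with np.errstate(divide="ignore"):
    # add the log Gamma(2, 2) prior kernel: log tau - 2 * tau
    log_post = ll + np.log(taus) - 2 * taus

tau_mle = taus[np.argmax(ll)]        # 0.0: the estimate sticks to the boundary
tau_mode = taus[np.argmax(log_post)] # > 0: the mode is pulled off the boundary
print(tau_mle, tau_mode)
```

The likelihood is monotonically decreasing in tau here, so any likelihood-based routine returns exactly zero; the prior's log tau term vetoes the boundary without overwhelming the data.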

Both locations are exciting places to work with research groups comprising several faculty, postdocs, and students working on a wide range of interesting applied problems. The Columbia and Berkeley groups will communicate with each other on a regular basis, and there will be annual workshops with outside panels of experts.

Applicants should have a Ph.D. in Statistics or a related area and have strong programming skills, preferably with some experience in C/R/Stata. Experience with Bayesian methods and good knowledge of hierarchical Bayesian and/or "classical" multilevel modeling would be a great advantage.

The expected duration of the positions is 2 years with an approximate start date of September 1, 2010.

Please submit a statement of interest, not exceeding 5 pages, and your CV to either Andrew Gelman and Jingchen Liu (asc.coordinator@stat.columbia.edu) or Sophia Rabe-Hesketh (Graduate School of Education, 3659 Tolman Hall, Berkeley, CA 94720-1670, sophiarh@berkeley.edu), stating whether you would also be interested in the other position. Applications will be considered as they arrive but should be submitted no later than April 1st.

This is in addition to, but related to, our other postdoctoral positions. A single application will work for all of them.

Alex Tabarrok has posts on the amusing story of Westerners overestimating the Soviet economy. For example, here's a graph from the legendary Paul Samuelson textbook (from 1961):


Tabarrok points out that it's even worse than it looks: "in subsequent editions Samuelson presented the same analysis again and again except the overtaking time was always pushed further into the future so by 1980 the dates were 2002 to 2012. In subsequent editions, Samuelson provided no acknowledgment of his past failure to predict and little commentary beyond remarks about 'bad weather' in the Soviet Union."

The bit about the bad weather is funny. If you've had bad weather in the past, maybe the possibility of future bad weather should be incorporated into the forecast, no?

As Tabarrok and his commenters point out, this mistake can't simply be attributed to socialist sympathies of the center-left Samuelson: For one thing, various other leftist economists did not think that the Soviets were catching up to us; and, for another, political commentators on the right at the time were all telling us that the communists were about to overwhelm us militarily.

I don't really have anything to add here, I just agree with Alex that it's a funny graph.

This is just to say


They are always saying to check the temperature of your oven. Well, "they" aren't kidding. I checked with an oven thermometer, and it was 30 degrees (C) higher than labeled. We were suspicious that something was going on, but who'd ever think it could be off by 30 degrees??

Alex Tabarrok links to these amusing partial Google searches found by Dan Ariely:



Ariely pretty much takes these at face value, labeling them "What Boyfriends and Girlfriends Search for on Google" and writing: "This shows Google's remarkable power as a source of data on a range of human behaviors, emotions, and opinions. It gives us insights into what people might care the most about concerning a given topic. . . ."

I followed a link in Ariely's comments to a blog whose entire content is partial Google searches. Seems like a bit of a niche market to me, but the results were so weird (for example, one of the top ten searches for "my rob" is "my robot friend is pregnant") that I started to get skeptical.

So I tried the simplest thing I could think of on my own computer, and here's what came out:


(Click to see a larger version of this image.)

My second choice was est-ce-que, which also yielded some strange results.

So my current thought is not to take these Google partial searches so seriously. I wonder if the algorithm purposely spits out wacky searches in order to make the search function more fun to play with.

Maybe some of the Google employees who read this blog can enlighten us (anonymously, if necessary) about how seriously we should interpret these?

P.S. Ariely's blog is pretty cool--a mix of some basic intro stuff (as is appropriate, since the blog is attached to his popular book) and some deeper ideas too. When Predictably Irrational came out, we received from his publicist several emails, a copy of the book (which Juli reviewed), and a suggestion that we could interview Ariely or he could guest blog for us. We said yes on both but never heard back. I can understand it: publicists get busy, and we did get a free book out of it. But, Dan, if you're reading this, get in touch: we'd still be glad to have you guest blog for us!

P.P.S. Our earlier discussion of googlefights as a teaching tool.

Tyler Cowen adds to the always-popular "Europe vs. America" debate. At stake is whether western European countries are going broke and should scale back their social spending (for example, here in Paris they have free public preschool starting at 3 years old, and it's better than the private version we were paying for in New York), or whether, conversely, the U.S. should ramp up spending on public goods, as Galbraith suggested back in the 1950s when he wrote The Affluent Society.

Much of the debate turns on statistics, oddly enough. I'm afraid I don't know enough about the topic to offer any statistical contributions to the discussion, but I wanted to bring up one thing which I remember people used to talk about but doesn't seem to have come up in the current discussion (unless I've missed something, which is quite possible).

Here's my question. Shouldn't we be impressed by the performance of the U.S. economy, given that we've spent several zillion dollars more on the military than all the European countries combined, but our economy has continued to grow at roughly the same rate as Europe's? (Cowen does briefly mention "military spending" but only in a parenthetical, and I'm not sure what he was referring to.) From the other direction, I guess you could argue that in the U.S., military spending is a form of social spending--it's just that, instead of providing health care etc. for everyone, it's provided just for military families, and instead of the government supporting some modern-day equivalent of a buggy-whip factory, it's supporting some company that builds airplanes or submarines. Anyway, this just seemed somewhat relevant to the discussion.

P.S. OK, there's one place where I can offer a (very small) bit of statistical expertise.

Someone who wishes to remain anonymous writes:

I was consulting with an academic group, and I noticed and pointed out what I believed was a clear and obvious mistake in something they were planning to publish on. Now we can always be wrong, but here the mistake was mathematical and had been recently published by a well-known author--not me. So I am very sure about it, and after the consulting came to an end I sent a very blunt email along with a warning that they would not want their error to be pointed out in a letter to the editor. They were arguing that the error was subtle and maybe not really an error, or at least not one that was well or widely understood. (If they accept it is an error they would have to develop a new method.) Unfortunately/fortunately I noticed a recent paper from them was listed in the references of a statistical methods paper I was just asked to review. I checked the paper and the error is there with not even a warning about any uncertainty or mixed opinions about there possibly being an error. There was no signed non-disclosure clause. What should I do now?

If I can't find someone else to write the letter to the editor, I will, but I am wondering how often others run into this and what ideas they have.

P.S. To preserve anonymity, some of the details have been falsified in unimportant ways.

My reply:

Many years ago I was involved in a project--not as a coauthor but just as a very peripheral member of a research group--where the data collection didn't go so well, the experimenters got a lot fewer participants than they had hoped for, and when all was said and done, there didn't seem to be much difference between treated and control groups. We were all surprised--the treatment made a lot of sense and we all thought it would work. After it didn't, we had lots of theories as to why that could've been, but we didn't really know. One other thing: there were before and after measures (of course), and both the treatment and the control groups showed strong (and statistically significant) improvement.

I drifted away from this project, but later I heard that the leader of the study had published a paper just with the treatment group data. Now, if all you had were the treatment data--if it were a simple before/after study--that would be fine: there would be questions about causality but you go with what you've got.

I didn't follow up on this, partly because I don't know the full story here. The last I saw of the data, there wasn't much going on, but it may very well be that completely proper data analysis led to cleaner results which were ultimately publishable. To say it again for emphasis: I don't really know what was happening here; I only heard some things second-hand. In any collaboration, it's good to have a certain level of communication, trust, and sense of common purpose which just wasn't there in that project at all.

Anyway, back to the original question: I don't see why you can't yourself write and submit a letter to the editor. First run it by the authors of the article to see what they say, then send it to the journal. That would seem like the best option for all concerned. Ideally it won't be perceived as a moral issue but just as a correction. As the author of a published paper with a false theorem, I'm all in favor of corrections!

A while ago, this blog had a discussion of short English words that have no rhymes. We've all heard of "purple" (which, in fact, rhymes with the esoteric but real word hirple) and "orange" in this context, but there are others. This seems a bit odd, which I guess is why some of these words are famous for having no rhyme. Naively, and maybe not so naively, one might expect that at least some new words would be created to take advantage of the implied gaps in the gamut of two-syllable words. Is there something that prevents new coinages from filling the gaps? Why do we have blogs and vegans and wikis and pixels and ipods, but not merkles and rilvers and gurples?

I have a hypothesis, which is more in the line of idle speculation. Perhaps some combinations are automatically disfavored because they interfere with rapid processing of the spoken language. I need to digress for just a moment to mention a fact that supposedly baffled early workers in speech interpretation technology: in spoken language, there are no pauses or gaps between words. If you say a typical sentence --- let's take the previous sentence for example --- and then play it back really slowly, or look at the sound waveform on the screen, you will find that there are no gaps between most of the words. It's not "I -- need -- to -- digress," it's "Ineed todigressforjusta momento..." Indeed, unless you make a special effort to enunciate clearly, you may well use the final "t" in "moment" as the "t" in "to": most people wouldn't say the t twice. But with all of these words strung together, how is it that our minds are able to separate and interpret them, and in fact to do this unconsciously most of the time, to the extent that we feel like we are hearing separate words?

My thought --- and, as I said, it is pure speculation --- is that perhaps there is an element of "prefix coding" in spoken language, or at least in spoken English (but presumably others too). "Prefix coding" is the assignment of a code such that no symbol in the code is the start (prefix) of another symbol in the code. Hmm, that sentence only means something if you already know what it means. Try this. Suppose I want to compose a language based on only two syllables, "ba" and "fee". Using a prefix code, it's possible to come up with a rule for words in this language, such that I can always tell where one word ends and another begins, even with no gaps between words. ("Huffman coding" provides the most famous way of doing this.) For instance, suppose I have words bababa, babafee, feeba, bafee, and feefeefee. No matter how I string these together, it turns out there is only one possible breakdown into words: babafeefeebabafeefeefeefeefeebabababa can only be parsed one way, so there's no need for word breaks. In fact, as soon as you reach the end of one word, you know you have done so; no need to "go backwards" from later in the message, to try out alternative parses.
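The claim about the toy ba/fee vocabulary can be checked mechanically: a prefix-free word set guarantees a unique parse of any concatenated message. Here's a small sketch (the dynamic program and the contrasting ambiguous vocabulary are my own illustration):

```python
# Check that the post's made-up vocabulary is prefix-free, and that a
# concatenated message then has exactly one possible split into words.

def is_prefix_free(words):
    """True if no word in the set is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

def count_parses(message, words):
    """Count the ways `message` can be split into a sequence of words."""
    ways = [1] + [0] * len(message)   # ways[i] = parses of message[:i]
    for i in range(1, len(message) + 1):
        for w in words:
            if i >= len(w) and message[i - len(w):i] == w:
                ways[i] += ways[i - len(w)]
    return ways[-1]

vocab = ["bababa", "babafee", "feeba", "bafee", "feefeefee"]
message = "bafee" + "feeba" + "feefeefee" + "bababa"

print(is_prefix_free(vocab))             # True
print(count_parses(message, vocab))      # 1: only one way to hear it

# By contrast, a vocabulary that is not prefix-free can be ambiguous:
print(count_parses("aaa", ["a", "aa"]))  # 3 distinct parses
```

The unique-parse property is exactly what lets a listener segment the stream on the fly, with no need to backtrack.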

English doesn't quite work like this. For example, the syllable string see-thuh-car-go-on-the-ship can be interpreted as "see the cargo on the ship" or "see the car go on the ship". But it took me several tries to come up with that example! To a remarkable degree, you don't need pauses between the words, especially if the sentence also has to make sense.

So, maybe words that rhyme with "circle" or "empty" are disfavored because they would interfere with a quasi-"prefix coding" character of the language? Suppose there were a word "turple" for example. It would start with a "tur" sound, which is one of the more common terminal sounds in English (center, mentor, enter, renter, rater, later...). A string of syllables that contains "blah-blah-en-tur-ple-blah" could be split in more than one place...maybe that's a problem. Of course, you'll say "but there are other words that start with "tur", why don't those cause a problem, why just "turple"?" But there aren't all that many other common "tur" words --- surprisingly few, actually --- turn, term, terminal. "Turple" would be the worst, when it comes to parsing, because its second syllable --- pul --- is a common starting syllable in rapidly spoken English (where many pl words, like please and plus and play, start with an approximation of the sound).

So...perhaps I'm proposing nonsense, or perhaps I'm saying something that has been known to linguists forever, but that's my proposal: some short words tend to evolve out of the language because they interfere with our spoken language interpretation.

Coblogger John Sides quotes a probability calculation by Eric Lawrence that, while reasonable on a mathematical level, illustrates a sort of road-to-error-is-paved-with-good-intentions sort of attitude that bothers me, and that I see a lot of in statistics and quantitative social science.

I'll repeat Lawrence's note and then explain what bothers me.

Here's Lawrence:

In today's Wall Street Journal, Nate Silver of 538.com makes the case that most people are "horrible assessors of risk." . . . This trickiness can even trip up skilled applied statisticians like Nate Silver. This passage from his piece caught my [Lawrence's] eye:
"The renowned Harvard scholar Graham Allison has posited that there is greater than a 50% likelihood of a nuclear terrorist attack in the next decade, which he says could kill upward of 500,000 people. If we accept Mr. Allison's estimates--a 5% chance per year of a 500,000-fatality event in a Western country (25,000 casualties per year)--the risk from such incidents is some 150 times greater than that from conventional terrorist attacks."

Lawrence continues:

Here Silver makes the same mistake that helped to lay the groundwork for modern probability theory. The idea that a 5% chance a year implies a 50% chance over 10 years suggests that in 20 years, we are certain that there will be a nuclear attack. But . . . the problem is analogous to the problem that confounded the Chevalier de Méré, who consulted his friends Pascal and Fermat, who then derived several laws of probability. . . . A way to see that this logic is wrong is to consider a simple die roll. The probability of rolling a 6 is 1/6. Given that probability, however, it does not follow that the probability of rolling a 6 in 6 rolls is 1. To follow the laws of probability, you need to factor in the probability of rolling 2 6s, 3 6s, etc.

So how can we solve Silver's problem? The simplest way turns the problem around and solves for the probability of not having a nuclear attack. Then, preserving the structure of yearly probabilities and the decade range, the problem becomes P(no nuclear attack in ten years) = .5 = some probability p raised to the 10th power. After we muck about with logarithms and such, we find that our p, which denotes the probability of an attack not occurring each year, is .933, which in turn implies that the annual probability of an attack is .067.
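Lawrence's arithmetic is quick to verify directly (assuming, as he does, the same independent probability of no attack each year):

```python
# Solve p**10 = 0.5 for the annual probability p of *no* attack,
# then the implied annual attack probability is 1 - p.
p_no_attack = 0.5 ** (1 / 10)
annual_risk = 1 - p_no_attack
print(round(annual_risk, 3))   # 0.067, vs. the naive 0.05 per year

# The die-roll analogy: chance of at least one 6 in six rolls.
p_six = 1 - (5 / 6) ** 6
print(round(p_six, 3))         # 0.665, not 1
```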

But does that make a difference? The difference in probability is less than .02. On the other hand, our revised annual risk is a third larger. . . .

OK, so Lawrence definitely means well; he's gone to the trouble to write this explanatory note and even put in some discussion of the history of probability theory. And this isn't a bad teaching example. But I don't like it here. The trouble is that there's no reason at all to think of the possibility of a nuclear terrorist attack as independent in each year. One could, of course, go the next step and try a correlated probability model--and, if the correlations are positive, this would actually increase the probability in any given year--but that misses the point too. Silver is making an expected-value calculation, and for that purpose, it's exactly right to divide by ten to get a per-year estimate. Beyond this, Allison's 50% has got to be an extremely rough speculation (to say the least), and I think it confuses rather than clarifies matters to pull out the math. Nate's approximate calculation does the job without unnecessary distractions. Although I guess Lawrence's comment illustrates that Nate might have done well to include a parenthetical aside to explain himself to sophisticated readers.

This sort of thing has happened to me on occasion. For example, close to 20 years ago I gave a talk on some models of voting and partisan swing. To model votes that were between 0 and 1, we first did a logistic transformation. After the talk, someone in the audience--a world-famous statistician who I respect a lot (but who doesn't work in social science)--asked about the transformation. I replied that, yeah, I didn't really need to do it: nearly all the vote shares were between 0.2 and 0.8, and the logit was close to linear in that range; we just did logit to be on the safe side. [And, actually, in later versions of this research, we ditched the logit as being a distraction that hindered the development of further sophistication in the aspects of the model that really did matter.] Anyway, my colleague responded to my response by saying, No, he wasn't saying I should use untransformed data. Rather, he was asking why I hadn't used a generalized linear model; after all, isn't that the right thing to do with discrete data? I tried to explain that, while election data are literally discrete (there are no fractional votes), in practice we can think of congressional election data as continuous. Beyond this, a logit model would have an irrelevant-because-so-tiny sqrt(p(1-p)/n) error term which would require me to add an error term to the model anyway, which would basically take me back to the model I was already starting with. This point completely passed him by, and I think he was left with the impression that I was being sloppy. Which I wasn't, at all. In retrospect, I suppose a slide on this point would've helped; I'd just assumed that everyone in the audience would automatically understand the irrelevance of discrete-data models to elections with hundreds of thousands of votes.
I was wrong and hadn't realized the accumulation of insights that any of us gather when working within an area of application, insights which aren't so immediately available to outsiders--especially when they're coming into the room thinking of me (or Nate Silver, as above) as an "applied statistician" who might not understand the mathematical subtleties of probability theory.
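Incidentally, the claim that the logit is nearly linear for vote shares between 0.2 and 0.8 is easy to check numerically (my own quick check, not anything from the original talk):

```python
# Compare logit(x) to a straight line over the range of typical vote shares.
import numpy as np

x = np.linspace(0.2, 0.8, 601)
logit = np.log(x / (1 - x))

# correlation between transformed and untransformed shares
r = np.corrcoef(x, logit)[0, 1]
print(r > 0.99)   # True: over this range the transformation is nearly linear
```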

P.S. Conflict-of-interest note: I post on Sides's blog and on Silver's blog, so I'm conflicted in all directions! On the other hand, neither of them pays me (nor does David Frum, for that matter; as a blogger, I'm doing my part to drive down the pay rates for content providers everywhere), so I don't think there's a conflict of interest as narrowly defined.

Frank Hansen writes:

Life expectancy is an outcome with possibly many confounding factors like genes or lifestyle, other than cost.

I used the 2007 oecd data on system resources to construct a health care system score. Here is the graph of that score against per capita cost:


The other day I saw some kids trying to tell knock-knock jokes. The only one they really knew was the one that goes: Knock knock. Who's there? Banana. Banana who? Knock knock. Who's there? Banana. Banana who? Knock knock. Who's there? Orange. Orange who? Orange you glad I didn't say banana?

Now that's a fine knock-knock joke, among the best of its kind, but what interests me here is that it's clearly not a basic k-k; rather, it's an inspired parody of the form. For this to be the most famous knock-knock joke--in some circles, the only knock-knock joke--seems somehow wrong to me. It would be as if everybody were familiar with Duchamp's Mona-Lisa-with-a-moustache while never having heard of Leonardo's original.

Here's another example: Spinal Tap, which lots of people have heard of without being familiar with the hair-metal acts that inspired it.

The poems in Alice's Adventures in Wonderland and Through the Looking Glass are far far more famous now than the objects of their parody.

I call this the Foghorn Leghorn category, after the Warner Brothers cartoon rooster ("I say, son . . . that's a joke, son") who apparently was based on a famous radio character named Senator Claghorn. Claghorn has long been forgotten, but, thanks to reruns, we all know about that silly rooster.

And I think "Back in the USSR" is much better known than the original "Back in the USA."

Here's my definition: a parody that is more famous than the original.

Some previous cultural concepts

Objects of the class "Whoopi Goldberg"

Objects of the class "Weekend at Bernie's"



Benjamin Kay writes:

I wonder if you saw Bruce Reed's article "The Year of Running Dangerously: In a tough economy, incumbency is the one job nobody wants," about the recent flurry of retirement announcements in the Senate but also less well publicized ones in the House. My understanding is that there is a known effect on retirement from census-driven redistricting. We also happen to be in a census year, but I haven't read any journalists discussing that as a factor. Do you have any insight into the relative explanatory decomposition of partisan politics, redistricting-related concerns, and simple economy-driven unpopularity in these retirement decisions?

My reply: Retirement rates definitely go up in redistricting years (see, for example, figure 3 here), but that would be 2012, not 2010, I believe. The census is this year, but I don't think they're planning to redraw the district lines in time for the 2010 elections.

In any case, I imagine that somebody has studied retirement rates, to see if they tend to go up in marginal seats in bad economies. Overall, retirement rates are about the same in marginal seats as any other (at least, that's what Gary and I found when we looked at it in the late 1980s), but I could imagine that things vary by year. The data are out there, so I imagine somebody has studied this.

P.S. I've never understood why anybody would want to retire from a comfortable white-collar job. But now, spending a year on sabbatical and relaxing most of the time, I can really understand the appeal of retirement.



I learned from a completely reliable source that the letter to the editor I published in the Journal of Theoretical Biology was largely in error.

I have to admire the thousands of anonymous Wikipedians who catch the mistakes of poseurs such as myself, and I'm looking forward to the forthcoming correction in the journal. The editors of JTB must be pretty embarrassed to have published a letter that was so wrong--but I guess it makes sense that they could be bamboozled by a statistician from a fancy Ivy League college.

Oh well, at least it's better that I learn the error of my ways now than that I live the rest of my life under the illusion that I knew what I was doing all this time!

P.S. I wish the anonymous Wikipedia editor had contacted me directly regarding my mistakes. As it is, I may never learn exactly how my criticisms have been "already addressed and corrected."

P.P.S. Due to the dynamic nature of Wikipedia, the above is now out of date. (Latest version is here.) Take a look at the comments below.

I'm giving a short course (actually, more like a series of lectures) in Leuven on 17 Feb.

Influential statisticians


Seth lists the statisticians who've had the biggest effect on how he analyzes data:

1. John Tukey. From Exploratory Data Analysis I [Seth] learned to plot my data and to transform it. A Berkeley statistics professor once told me this book wasn't important!

2. John Chambers. Main person behind S. I [Seth] use R (open-source S) all the time.

3. Ross Ihaka and Robert Gentleman. Originators of R. R is much better than S: Fewer bugs, more commands, better price.

4. William Cleveland. Inventor of loess (local regression). I [Seth] use loess all the time to summarize scatterplots.

5. Ronald Fisher. I [Seth] do ANOVAs.

6. William Gosset. I [Seth] do t tests.

My data analysis is 90% graphs, 10% numerical summaries (e.g., means) and statistical tests (e.g., ANOVA). Whereas most statistics texts are about 1% graphs, 99% numerical summaries and statistical tests.

I think this list is pretty reasonable, but I have a few comments:

1. Just to let youall know, I wasn't the Berkeley prof who told Seth that EDA wasn't important. I've even published an article about EDA. That said, Tukey's book isn't perfect. I mean, really, who cares about the January temperature in Yuma?

2, 3. I agree that S and R are hugely important. But if they weren't invented, maybe we'd just be using APL or Matlab?

4. Cleveland also made important contributions to statistical graphics.

5. I've written an article about Anova too, but at this point I think of Fisher's version of Anova as an excellent lead-in to hierarchical models and not such a great tool in itself. I think that psychology researchers will be better off when they forget about sums of squares, mean squares, and F tests, and instead focus on coefficients, variance components, and scale parameters.

6. I don't really do t-tests.

P.S. I wouldn't even try to make my own list. As a statistician myself, I've been influenced by so many many statisticians that any such list would run to the hundreds of names. I suppose if I had to make such a list about which statisticians have had the biggest effect on how I analyze data, it might go something like:

1. Rubin: He taught me applied statistics and clearly has had the largest influence on me (and, maybe, on many readers of my books)

2. Laplace/Lindley/etc.: The various pioneers of hierarchical modeling and applied Bayesian statistics

3. Gauss: Least squares, error models, etc etc

4. Cleveland: Crisp, clean graphics for data analysis. Although maybe if Cleveland had never existed, I'd have picked this up from somewhere else

5. Fisher: He's gotta be there, since he's had such a big influence on the statistical practice of the twentieth century

6. Jaynes: Not the philosophy of Bayes stuff, but just one bit--an important bit--in his book where he demonstrated the principle of setting up a model, taking it really seriously, looking hard to see where it doesn't fit the data, and then looking deeply at the misfit to see what it reveals about how the model could be improved.

But I'm probably missing some big influences that I'm forgetting right now.

Good timing


I was at the store today and bought some pants. Everything was 50-75% off, so I bought four pairs instead of just two. Then I came home and read this. Cool!

P.S. I'm amused by the level of passion in many of Tyler's blog commenters. Although, really, who am I to talk, given that I get passionate on the subject of hypothesis tests for contingency tables.

New kinds of spam


I got the following bizarre email, subject-line "scienceblogs.com/appliedstatistics/":


After looking at your website, it is clear that you share the same concerns about Infections as we do here at Infection.org. Our website is dedicated to sharing the various up to date information regarding Infection, and we would love to share it to you and your readers.

I would like to discuss possible partnership opportunities with you. Please contact me if you are interested. Thank you.

June Smith
Assistant Editor

Just the "Assistant Editor," huh? I'm assuming that when Instapundit and Daily Kos got this email, it came directly from the top. On the other hand, the real nobody-blogs probably get a request from the Deputy Assistant Editor or an intern or someone like that. . . .

P.S. Spam is a kind of infection, so in this way I guess the message makes a certain kind of sense.

The newest way to slam a belief you disagree with--or maybe it's not so new--is to call it "religious." For example, "Market Fundamentalism is a quasi-religious faith that unregulated markets will somehow always produce the best possible results," and so is global warming ("The only difference between the religious right and the religious left, is that the religious right worships a man, and the religious left worships . . . Mother Nature"). As is evidence-based medicine ("as religious as possible . . . just another excuse, really--to sneer at people"). And then there's the religion of Darwinism.

I encountered an extreme example of this sort of thing recently, from columnist Rod Dreher, who writes disapprovingly of "(Climate) science as religion"--on a religious website called Beliefnet (which has, under the heading "My Faith," the options Christianity, Buddhism, Catholic, Hinduism, Mormon, Judaism, Islam, Holistic, and Angels). Dreher actually appears to be a supporter of climate science here; he's criticizing a dissent-suppressing attitude that he sees, not the actual work that's being done by the scientists in the field.

Maybe it's time to retire use of the term "religion" to mean "uncritical belief in something I disagree with." Now that actual religious people are using the term in this way, it would seem to have no meaning left.


Perhaps I'm a little sensitive about this because back when I started doing statistics, people often referred to Bayesianism as a religion. At one point, when I was doing work on Bayesian predictive checking, one of my (ultra-classical) colleagues at Berkeley said that he was not a Bayesian, but that if he were, he'd go the full subjective route--so he didn't understand what I was doing.

One of my Berkeley colleagues who studied probability--really, a brilliant guy--commented once that "of course" he was a Bayesian, but he was puzzled by how Bayesian inference worked in an example he'd seen. My feeling was: Bayes is a method, not a religion! Can't we evaluate it based on how it works?

And, a few years ago, someone from the computer science department came over and gave a lecture in the stat dept at Columbia. His talk was fascinating, but he irritated me by saying how his method gave all the benefits of Bayesian inference "without having to believe in it." I don't believe in logistic regression either, but it sure is useful!

Update on Universities and $


We discussed this here; Kevin Carey replies here.

70 Years of Best Sellers


I ran across this book, by Alice Payne Hackett, in the library last month: it lists the bestselling fiction and nonfiction books for every year from 1895 through 1965 (the year the book was written) and also the books that have sold the most total copies during that period. The nonfiction lists start in 1917. It's fun to read these, but the classifications confuse me a bit: apparently Charlie Brown, Pogo, and the Bible are considered nonfiction. You could maybe make an argument for one or two of these, but it's hard to imagine that anyone could put all three of them in the nonfiction category!

Hackett writes:

The authors who have had the most titles on the seventy annual lists are Mary Roberts Rinehart with eleven, Sinclair Lewis with ten, Zane Grey and Booth Tarkington with nine each, and Louis Bromfield, Winston Churchill (the American novelist), George Barr McCutcheon, Gene Stratton Porter, Frank Yerby, Edna Ferber, Daphne du Maurier, and John Steinbeck with eight each.

And here are the top selling books in the United States during 1895-1965:

The Pocket Book of Baby and Child Care (The Common Sense Book of Baby and Child Care), by Benjamin Spock, 1946 . . . 19 million copies sold
Better Homes and Gardens Cook Book, 1930 . . . 11 million
Pocket Atlas, 1917 . . . 11 million
Peyton Place, by Grace Metalious, 1956 . . . 10 million
In His Steps, by Charles Monroe Sheldon, 1897 . . . 8 million
God's Little Acre, by Erskine Caldwell, 1933 . . . 8 million
Betty Crocker's New Picture Cookbook, 1950 . . . 7 million
Gone With the Wind, by Margaret Mitchell, 1937 . . . 7 million
How to Win Friends and Influence People, by Dale Carnegie, 1937 . . . 6.5 million
Lady Chatterley's Lover, by D. H. Lawrence, 1932 . . . 6.5 million
101 Famous Poems, compiled by R. J. Cook, 1916 . . . 6 million
English-Spanish, Spanish-English Dictionary, compiled by Carlos Castillo and Otto F. Bond, 1948 . . . 6 million
The Carpetbaggers, by Harold Robbins, 1961 . . . 5.5 million
Profiles in Courage, by John F. Kennedy, 1956 . . . 5.5 million
Exodus, by Leon Uris, 1958 . . . 5.5 million
Roget's Pocket Thesaurus, 1923 . . . 5.5 million
I, the Jury, by Mickey Spillane, 1947 . . . 5.5 million
To Kill a Mockingbird, by Harper Lee, 1960 . . . 5.5 million
The Big Kill, by Mickey Spillane, 1951 . . . 5 million
Modern World Atlas, 1922 . . . 5 million
The Wonderful Wizard of Oz, by L. Frank Baum, 1900 . . . 5 million
The Catcher in the Rye, by J. D. Salinger, 1951 . . . 5 million
My Gun is Quick, by Mickey Spillane, 1950 . . . 5 million
One Lonely Night, by Mickey Spillane, 1951 . . . 5 million
The Long Wait, by Mickey Spillane, 1951 . . . 5 million
Kiss Me, Deadly, by Mickey Spillane, 1952 . . . 5 million
Tragic Ground, by Erskine Caldwell, 1944 . . . 5 million
30 Days to a More Powerful Vocabulary, by Wilfred J. Funk and Norman Lewis, 1942 . . . 4.5 million
Vengeance is Mine, by Mickey Spillane, 1950 . . . 4.5 million
The Pocket Cook Book, by Elizabeth Woody, 1942 . . . 4.5 million
Return to Peyton Place, by Grace Metalious, 1959 . . . 4.5 million
Never Love a Stranger, by Harold Robbins, 1948 . . . 4.5 million
Thunderball, by Ian Fleming, 1965 . . . 4 million
1984, by George Orwell, 1949 . . . 4 million
The Ugly American, by William J. Lederer and Eugene L. Burdick, 1958 . . . 4 million
A Message to Garcia, by Elbert Hubbard, 1898 . . . 4 million
Hawaii, by James A. Michener, 1959 . . . 4 million

This is so much fun, just typing these in. I hardly know when to stop. OK, here are the next few:

Journeyman, by Erskine Caldwell, 1935 . . . 4 million
The Greatest Story Ever Told, by Fulton Oursler, 1949 . . . 4 million
Kids Say the Darndest Things!, by Art Linkletter, 1957 . . . 4 million
The Radio Amateur's Handbook, 1926 . . . 4 million
Diary of a Young Girl, by Anne Frank, 1952 . . . 3.5 million
From Here to Eternity, by James Jones, 1951 . . . 3.5 million
Goldfinger, by Ian Fleming, 1959 . . . 3.5 million
Lolita, by Vladimir Nabokov, 1958 . . . 3.5 million
Trouble in July, by Erskine Caldwell, 1940 . . . 3.5 million
Lost Horizon, by James Hilton, 1935 . . . 3.5 million
Butterfield 8, by John O'Hara, 1935 . . . 3.5 million
The American Woman's Cook Book, ed. by Ruth Berolzheimer, 1939 . . . 3.5 million
Duel in the Sun, by Niven Busch, 1944 . . . 3.5 million
Georgia Boy, by Erskine Caldwell, 1943 . . . 3.5 million
Four Days, by American Heritage and U.P.I., 1964 . . . 3.5 million

And those are all the ones that, as of 1965, had at least 3.5 million recorded sales. (Hackett, annoyingly, reports sales figures to the last digit (for example, 19,076,822 for Dr. Spock), but I've rounded, following the rules of good taste.) Of all these, I'd say that five would or could be considered great literature (Chatterley, Catcher in the Rye, 1984, From Here to Eternity, and Lolita). Not such a bad total, considering all the possibilities. I've read ten of the books on the above list (not counting the Betty Crocker cookbook, which is where I got my recipe for biscuits), with From Here to Eternity being my favorite. I once tried to read Mockingbird but with no success. Of all the books above that I haven't read, I'd guess that I'd enjoy the John O'Hara the most. Also, Kids Say the Darndest Things, which someone (Phil?) once told me actually is very funny. Most of the books on the list sound vaguely familiar, but many only vaguely. For example, I recall the name "Erskine Caldwell" but know nothing about his books beyond what I can imagine from the titles. Maybe I once read something on his work in a compilation of reviews by Edmund Wilson, or something like that? "Duel in the Sun" was made into a movie, Mickey Spillane was famous for suspense thrillers where the hero shot the girl, Kennedy "authored" rather than wrote Profiles in Courage. And so on.

I had a great time just going through the titles and authors. Here's the end of the list, all the books that are listed as having sold a (meaninglessly precise) 1,000,000 copies:

Anthony Adverse, by Hervey Allen, 1933
Brave Men, by Ernie Pyle, 1944
Etiquette, by Emily Post, 1922
The Fire Next Time, by James Baldwin, 1963
A Heap O'Livin', by Edgar Guest, 1916
Little Black Sambo, by Helen Bannerman, 1899
The Moneyman, by Thomas B. Costain, 1947
Pollyanna Grows Up, by Eleanor H. Porter, 1915
Short Story Masterpieces, edited by Robert Penn Warren and Albert Erskine, 1958
The Simple Life, by Charles Wagner, 1901
Stiletto, by Harold Robbins, 1960
Twixt Twelve and Twenty, by Pat Boone, 1957
The Web of Days, by Edna L. Lee, 1947
Will Rogers, by Patrick J. O'Brien, 1935
Youngblood Hawke, by Herman Wouk, 1962

These really are obscure; most of them I'd never heard of before. Seeing that Pat Boone title reminds me of "Pimples and Other Problems, by Mary Robinson," a title which I made up for an assignment in high school in which we were supposed to list and summarize 20 books that we had read. For some reason, my friends and I were getting bored as we got near the end, and so we filled out our lists with made-up books. Fictional fiction, as it were. Although I seem to recall that the Robinson book was more of a book of nonfictional reminiscences, in the manner of Erma Bombeck but with an adolescent focus. We all thought it was funny that we padded our lists, but now that I'm a teacher myself, I realize what a pain it is to grade papers. Our teacher probably flipped through our assignments at lightspeed.

The writer with the most titles on the combined list (of all books selling at least a million copies, in all editions) was Erle Stanley Gardner, with 91 (!). It's funny, I don't remember seeing any of these in the public library when I was a kid. I wonder if the librarians considered them too hard-boiled for public consumption. His bestseller was The Case of the Lucky Legs, which came out in 1934 and, through 1965, had sold 3,499,948 copies. The Case of the Sulky Girl, from 1933, had sold 3.2 million copies, and it continues from there.

This is just so much fun. The analogy, I suppose, is with the weekly movie grosses that I've been told are now a national obsession (maybe not anymore?) or the TV ratings, which I remember reading about regularly in the newspaper thirty years ago. Back in 1965, books had some of the central position that movies (and video games?) have now in our culture. (TV seems to have come in and gone out; lots and lots of people watch TV, but I don't get a sense that people care too much anymore what are the top 10 shows in the Nielsens.) Movies are OK, but I'm still much more interested in books, which is one reason I so much enjoyed flipping through 70 Years of Best Sellers. (A sequel, "80 Years . . .," came out 10 years later, but that seems to be the end of the line.) The book concludes with a list of references, various books and articles about bestsellers, many of which look like they'd be fun to read.

P.S. This list is fun too. The numbers are much larger (it has A Tale of Two Cities, at 200 million copies, as the bestselling book not published by the government or a religious group, with the Boy Scout Handbook, Lord of the Rings, and one of the Harry Potter books following up). The numbers on this Wikipedia list come from all different sources and I'm sure that some are overestimates; beyond this, I guess that lots and lots of books have been sold in the forty years since 1965. The Wikipedia list is admittedly incomplete; for example, it doesn't include the aforementioned Perry Mason in its list of bestselling series. It does, however, note that "the Perry Rhodan series has sold more than 1 billion copies." I'd never heard of this one at all, but, following the link, I see that it's some sort of German serial, which I guess puts it in the same category as Mexican comic books and other things that I've vaguely heard about but have never seen. Once you start thinking about things like that--books that blur the boundary between literature and pop entertainment--I guess you can pile up some big numbers.

P.P.S. What are the bestselling statistics books? (Not counting introductory textbooks, which don't quite count, since students don't quite choose to buy them.) The first ones I can think of are Fisher's Statistical Methods for Research Workers, Snedecor and Cochran's Statistical Methods, and Feller volume 1 (counting all editions in each case), but all these were published long ago and probably had most of their sales back in the days before book sales were so high (sales for all books are continuing to increase, in advance of the big crash that's coming some day soon). When thinking of total sales, maybe I should be thinking of books that have come out more recently. Exploratory Data Analysis? The Visual Display of Quantitative Information (yes, that's a statistics book)? I wouldn't quite count Freakonomics or Supercrunchers or Fooled by Randomness or the collected works of Malcolm Gladwell (or, for that matter, Red State, Blue State); these books are all about statistics but I wouldn't quite call them "statistics books." Generalized Linear Models, maybe? Everybody has that one, but in lifetime sales maybe it's not in the Snedecor and Cochran class. I'd hate to think that the all-time #1 is How to Lie with Statistics (or, worse, Statistics for Dummies), but maybe so. Or maybe there's something huge and obvious that I'm forgetting?

And the all-time #1 political science book is, what, Machiavelli? Or Hobbes, maybe? At least until Sarah Palin's memoir comes out.

Different sorts of survey bias


Fascinating blog by Nate Silver on different ways a survey organization can be biased (or not). Issues of question wording, and of which questions to ask in a survey, come up from time to time.

Recidivism statistics


From the news today:

A man charged with trying to kill a Danish cartoonist was arrested last year in an alleged plot to harm U.S. Secretary of State Hillary Clinton, officials said. [...] The suspect was one of four people arrested last summer in Nairobi in an alleged plot to harm Clinton during her tour of African countries, the newspaper Politiken reported. The suspect was released from a Kenyan jail in September because of a lack of evidence and returned to Denmark, where he had been living, Sky News reported Sunday.

Just a few days ago, CNN reported:

That announcement led to questions about how many other former Guantanamo detainees may be planning to carry out terrorist attacks.

Pentagon officials have not released updated statistics on recidivism, but the unclassified report from April says 74 individuals, or 14 percent of former detainees, have turned to or are suspected of having turned to terrorism activity since their release.

Of the more than 530 detainees released from the prison between 2002 and last spring, 27 were confirmed to have engaged in terrorist activities and 47 were suspected of participating in a terrorist act, according to Pentagon statistics cited in the spring report.

More at Wikisource.

These rates are actually lower than for the general prison population, where about 65% of released prisoners are expected to be rearrested within 3 years. The numbers seem lower in recent years, about 58%. More at Wikipedia.
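As a quick sanity check, the counts and the percentage in the CNN quote are internally consistent (treating "more than 530" as exactly 530):

```python
# Sanity check on the Pentagon figures quoted above.
confirmed, suspected, released = 27, 47, 530  # from the CNN quote
total = confirmed + suspected
rate = total / released       # "more than 530" treated as exactly 530,
                              # so this slightly overstates the rate
print(total)                  # 74
print(round(100 * rate, 1))   # 14.0 -- matches the reported 14 percent
```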

David Verbeeten writes:

A half-decade of blogging


We started this blog in October, 2004, as a way for people in my research group to share ideas, and for us to promote and elicit comments on our work. I soon came to regret that we hadn't started a year or so earlier; it appears that, up to 2003, all the blogs linked to all the other blogs. Starting in 2004 or so, the bigtime blogs mostly stopped updating their blogroll. (Luckily for us, Marginal Revolution didn't get the memo.) On the other hand, we benefited from late entry in having a sense of what we wanted the blog to be like. If I'd started blogging in 2002 or 2003, I suspect I would've been like just about everybody else and spewed out my political opinions on everything. By the end of 2004, I'd seen enough blogs that did that, and I realized we could make more of a contribution by keeping it more focused, keeping political bloviating, sarcasm, and academic gossip to a minimum.

Just to be clear: I'm not slamming those other kinds of blogs. Political opinions are great, and I think we really can learn from seeing ideas we agree with (or disagree with) expressed well and with emotion. Sarcasm is great too; it's what makes a peanut-butter-and-sandpaper sandwich worth eating, or something like that. And, hey, I love academic gossip; it's even more fun than reading about celebrities. These just aren't the best ways for me personally to contribute to the discourse.

When I started with the blog, I figured that if we were ever low on material, I could just link to my old articles, one at a time. But it's rarely come to this; in fact, I don't always get around to blogging my new articles and books right when they come out. The #1 freebie is that things I used to put in one-on-one emails or in referee reports, now I put on the blog so everyone can see. Much more efficient, I think. The only bad thing about the blog--other than the time it takes up--is that now I get occasional emails from people informing me of developments in the sociobiology of human sex ratios. A small price to pay, I'd say.

MCMC model selection question


Robert Feyerharm writes in with a question, then I give my response. Youall can play along at home by reading the question first and then guessing what I'll say. . . .

I have a question regarding model selection via MCMC I'd like to run by you if you don't mind.

One of the problems I face in my work involves finding best-fitting logistic regression models for public health data sets typically containing 10-20 variables (= 2^10 to 2^20 possible models). I've discovered several techniques for selecting variables and estimating beta parameters in the literature, for example the reversible jump MCMC.

However, RJMCMC works by selecting a subset of candidate variables at each point. I'm curious, as an alternative to trans-dimensional jumping would it be feasible to use MCMC to simultaneously select variables and beta values from among all of the variables in the parameter space (not just a subset) using the regression model's AIC to determine whether to accept or reject the betas at each candidate point?

Using this approach, a variable would be dropped from the model if its beta parameter value settles sufficiently close to zero after N iterations (say, -.05 < βk < .05). There are a few issues with this approach: Since the AIC isn't a probability density, the Metropolis-Hastings algorithm could not be used here as far as I know. Also, AIC isn't a continuous function (it "jumps" to a lower/higher value when the number of model variables decreases/increases), and therefore a smoothing function is required in the vicinity of βk=0 to ensure that the MCMC algorithm properly converges. I've run a few simulations and this "backwards elimination" MCMC seems to work, though it converges to a solution very slowly.

Anyways, if you have time I would greatly appreciate any input you may have. Am I rehashing an idea that has already been considered and rejected by MCMC experts?
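To make the proposal concrete before I respond, here's a minimal sketch of the scheme as I understand it: a random-walk Metropolis sampler that targets exp(-AIC/2) as a pseudo-posterior, with the parameter count k determined by which coefficients are away from zero. This is my illustration, not Feyerharm's code; the simulated data, the 0.05 threshold, and the step size are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: x1 predicts y, x2 is pure noise (hypothetical example).
n = 500
X = rng.normal(size=(n, 2))
logits = 1.5 * X[:, 0]                  # true model uses only the first variable
y = rng.random(n) < 1 / (1 + np.exp(-logits))

def aic(beta, X, y, tol=0.05):
    """AIC with effective parameter count = number of betas 'far' from zero."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))  # logistic log-likelihood
    k = np.sum(np.abs(beta) > tol)
    return 2 * k - 2 * loglik

def metropolis_aic(X, y, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis treating exp(-AIC/2) as a pseudo-posterior."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    cur = aic(beta, X, y)
    chain = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        prop = beta + step * rng.normal(size=beta.shape)
        new = aic(prop, X, y)
        # accept with probability min(1, exp(-(new - cur) / 2))
        if np.log(rng.random()) < -(new - cur) / 2:
            beta, cur = prop, new
        chain[t] = beta
    return chain

chain = metropolis_aic(X, y)
post = chain[2500:]                       # discard burn-in
print(np.mean(np.abs(post) > 0.05, 0))    # "inclusion" frequency per variable
```

Note that the discontinuity Feyerharm mentions shows up directly here: k jumps whenever a coefficient crosses the 0.05 threshold, so the pseudo-posterior has a discrete step there, which is exactly why he suggests a smoothing function near βk=0.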

Simple ain't easy


Let's start 2010 with an item having nothing to do with statistical modeling, causal inference, or social science.

Jenny Diski writes:

'This is where we came in' is one of those idioms, like 'dialling' a phone number, which has long since become unhooked from its original practice, but lives on in speech habits like a ghost that has forgotten the why of its haunting duties. The phrase is used now to indicate a tiresome, repetitive argument, a rant, a bore. But throughout my [Diski's] childhood in the 1950s and into the 1970s, it retained its full meaning: it was time to leave the cinema - although, exceptionally, you might decide to stay and see the movie all over again - because you'd seen the whole programme through. It seems very extraordinary now, and I don't know how anyone of my generation or older ever came to respect cinema as an art form, but back then almost everyone wandered into the movies whenever they happened to get there, or had finished their supper or lunch, and then left when they recognised the scene at which they'd arrived. Often, one person was more attentive than the other, and a nudge was involved: 'This is where we came in.' . . .

Interesting. It's been awhile since I've come to a movie in the middle and sat through the next showing until reaching the point where I came in. Maybe this is not allowed anymore?

The real reason I wanted to discuss Diski's article, though, was because of an offhand remark she makes, dissing an academic author's attempt to write for a popular audience:

Skerry isn't really one to let go of jargon. In the preface he explains how to read his book, not as most books are doomed to be read, from beginning to end, but differently and 'in keeping with the multiplicity of voices that make up the text'. It gets quite scary: 'The temporal structure of these chapters goes from the present-tense narrative of my research trip in Chapter 1 to the achronological, "cubist" structure of Chapter 3 . . .

"Skerry" sounds like the name of a fictional character, but he's actually the author of the book under review.

My real point, though, is that I suspect that Skerry was not intentionally writing in jargon; it's just hard to write clearly. Harder than many readers realize, and maybe harder than professional writer Diski realizes. My guess is that Skerry was trying his best but he just doesn't know any better.

I had a similar discussion with Seth on this awhile ago (sorry, I can't find the link to the discussion), where he was accusing academics of deliberately writing obscurely, to make their work seem deeper than it really is, with me replying that we'd all like to write clearly but it's not so easy to do so.

There are some fundamental difficulties here, the largest of which, I think, is that the natural way to explain a confusing point is to add more words--but if you add too many words, it's hard to follow the underlying idea. Especially given that writing is one-dimensional; you can't help things along with intonation, gestures, and facial expressions. (There's the smiley-face and its cousin, the gratuitous exclamation point (which happened to be remarked upon by Alan Bennett in that same issue of the LRB), but that's slim pickings considering all the garnishes available for augmenting face-to-face spoken conversation.)

P.S. Here's my advice on how to write research articles. I don't really get into the jargon thing. Writing clearly and with minimal jargon is so difficult that I wasn't ready to try to give advice on the topic.

Normative vs. descriptive


Following a link from Rajiv Sethi's blog, I encountered this blog by Eilon Solan, who writes:

One of the assumptions of von-Neumann and Morgenstern's utility theory is continuity: if the decision maker prefers outcome A to outcome B to outcome C, then there is a number p in the unit interval such that the decision maker is indifferent between obtaining B for sure and a lottery that yields A with probability p and C with probability 1-p.

When I [Solan] teach von-Neumann and Morgenstern's utility theory I always provide criticism to their axioms. The criticism to the continuity axiom that I use is when the utility of C is minus infinity: C is death. In that case, one cannot find any p that would make the decision maker indifferent between the above two lotteries.

The funny thing is, this is an example I've used (see section 6 of this article from 1998) to demonstrate that you can, completely reasonably, put dollars and lives on the same scale. As I wrote:

We begin this demonstration by asking the students what is the dollar value of their lives---how much money would they accept in exchange for being killed? They generally answer that they would not be killed for any amount of money. Now flip it around: suppose you have the choice of (a) your current situation, or (b) a probability p of dying and a probability (1-p) of gaining $1. For what value of p are you indifferent between (a) and (b)? Many students will answer that there is no value of p; they always prefer (a). What about p=10^{-12}? If they still prefer (a), let them consider the following example.

To get a more precise value for p, it may be useful to consider a gain of $1000 instead of $1 in the above decision. To see that $1000 is worth a nonnegligible fraction of a life, consider that people will not necessarily spend that much for air bags for their cars. Suppose a car will last for 10 years; the probability of dying in a car crash in that time is of the order of 10*40,000/280,000,000 (the number of car crash deaths in ten years divided by the U.S. population), and if an air bag has a 50% chance of saving your life in such a crash, this gives a probability of about 7*10^{-4} that the bag will save your life. Once you have modified this calculation to your satisfaction (for example, if you do not drive drunk, the probability of a crash should be adjusted downward) and determined how much you would pay for an air bag, you can put money and your life on a common utility scale. At this point, you can work your way down to the value of $1 (as illustrated in a different demonstration). This can all be done with a student volunteer working at the blackboard and the other students making comments and checking for coherence.

The student discussions can be enlightening. For example, one student, Julie, was highly risk averse: when given the choice between (a) the current situation, and (b) a 0.000 01 probability of dying and a 0.999 99 probability of gaining $10,000, she preferred (a). Another student in the class pointed out that 0.000 01 is approximately the probability of dying in a car crash in any given three-week period. After correcting for the fact that Julie does not drive drunk, and that she drives less than the average American, perhaps this is her probability of dying in a car crash, with herself as a driver, in the next six months. By driving, she is accepting this risk; is the convenience of being able to drive for six months worth $10,000 to her?

This demonstration is especially interesting to students because it shows that they really do put money and lives on a common scale, whether they like it or not.
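The airbag arithmetic can be spelled out directly. The $300 airbag price below is a hypothetical number inserted for illustration, not a figure from the article:

```python
# Back-of-envelope version of the airbag calculation in the quoted passage.
years = 10
deaths_per_year = 40_000
population = 280_000_000
p_fatal_crash = years * deaths_per_year / population   # prob. of dying in a crash over 10 years
p_bag_saves_life = 0.5 * p_fatal_crash                 # airbag assumed 50% effective
airbag_price = 300                                     # hypothetical willingness to pay
implied_value_of_life = airbag_price / p_bag_saves_life
print(round(p_bag_saves_life, 6))     # 0.000714 -- the "about 7*10^{-4}" in the text
print(round(implied_value_of_life))   # 420000
```

So a student who would pay $300 for the airbag is implicitly valuing her life at roughly $400,000, which is the common scale the demonstration is after.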

So . . . is this a violation of the continuity axiom, or not? In a way, it is, because people's stated preferences in these lotteries do not satisfy the axiom. In a way, it's not, because people can be acting in a way consistent with the axiom without realizing it. From this perspective, the axiom (and the associated mathematics) are valuable because they give us an opportunity to confront our inconsistencies.

In that sense, the opposition isn't really normative vs. descriptive, but rather descriptive in two different senses.

(Regular readers of this blog will know that I have big problems with the general use of utility theory in either the normative or the descriptive sense, but that's another story. Here I'm talking about a circumscribed problem where I find utility theory to be helpful.)

About this Archive

This page is an archive of entries from January 2010 listed from newest to oldest.
