May 9, 2008
Bayesian prediction with high-order interactions!!
Longhai Li did a really cool Ph.D. thesis (under the supervision of Radford Neal) on computing for models with deep interactions. The website with everything about this software, including the R packages, documentation, and references, is here and here. Here's a quick description (from the website):
This R package is used in two situations. The first is to predict the next outcome based on the previous states of a discrete sequence. The second is to classify a discrete response based on a number of discrete covariates. In both situations, we use Bayesian logistic regression models that consider the high-order interactions. The time arising from using high-order interactions is reduced greatly by our compression technique that represents a group of original parameters as a single one in MCMC step. In this version, we use log-normal prior for the hyperparameters. When it is used for the second situation --- classification, we consider the full set of interaction patterns up to a specified order.
And here's the research paper (by Longhai and Radford). I wonder if they've achieved some of my goals in wanting weakly informative priors for models with interactions. That Cauchy thing rings a bell.
P.S. to Longhai: I don't recommend keeping your software in two places. Won't it be a pain to keep both sites up-to-date? Or maybe it's done automatically, I don't know.
Posted by Andrew at 9:42 PM | Comments (0) | TrackBack (0)
Databases and R
Alex Reed writes,
Continue reading "Databases and R"
Posted by Andrew at 9:21 PM | Comments (1) | TrackBack (0)
The flow of time in fiction
Hey, this looks interesting . . . Inderjeet Mani, of Mitre Corporation, will be here on Monday, speaking on Interpreting Fictional Narrative: Crossing Some Ancient Frontiers. Here's the abstract:
While progress has been made on computational understanding of the flow of time in non-fictional genres, there has been little attention paid to time in literary texts. I will discuss a new project that examines the intersection between computational linguistics and narratology. I argue that understanding time in fiction requires not only the construction of timelines, but also a grasp of how characters, and readers' attitudes towards them, evolve. Accordingly, one needs to represent the goals and outcomes of characters' actions, superimposing a model of plot as an additional layer on top of the timeline. The theory models narrative progression in terms of changes in an ideal reader's emotional reactions to particular characters as the plot unfolds. In addition to examining samples from well-known literary works, I will discuss progress to date on an annotation scheme for plot and character evaluations.
Perhaps someone from the Classics department can come and comment on how this relates to the ancient theories of tragedy, comedy, etc. I also wonder how his theories work with explicitly time-organized fiction such as that of Jonathan Coe and Richard Ford (and I guess we could throw Wordsworth in there too), as compared to more straightforward narrative.
The talk will be 1:30 PM, Monday May 12th in the Back Open Conference Area of the CS Building. (enter the CS Building within Mudd and ask the receptionist to direct you back).
Posted by Andrew at 5:54 PM | Comments (0) | TrackBack (0)
Campaign contributions and policies
Michael Franc looked at Federal Election Commission data on campaign contributions and found some interesting things:
Through May 1, the Democratic presidential field has suctioned up a cool $5.7 million from the more than 4,000 donors who list their occupation as “CEO.” The Republicans’ take was only $2.3 million. Chief financial officers, general counsels, directors, and chief information officers also break the Democrats’ way by more than two-to-one margins. . . .
I'm not actually sure where these numbers come from. When I queried the FEC database (looking up "ceo" from 01/01/2008 to 05/01/2008), the total contributions (not just for presidential candidates) were only $45,124. So I must be doing something wrong here in my query. In any case, I guess it makes sense that most of the contributions have gone to Democrats so far, since (a) the Democratic primary has been much more competitive than the Republican, and (b) the Democrats are favored to win this year.
Franc continues:
In this upside-down campaign season when populist GOP campaigners like John McCain and Mike Huckabee surprised the pundits with their primary victories or, in the case of Ron Paul, their fundraising prowess, it almost makes sense that the party of the country club set has been winning the fundraising race among the common man. . . . This trend extends to the saloons, where the Democrats carry the bartenders and the Republicans the waitresses. . .
The bit about the bartenders and waitresses caught my eye. But when I looked it up, I found no contributions from either group this year. Going to the entire database, I did find some "waitress" contributions between 1998 and 2005, but they were mostly to Democrats. Also a few bartender contributions since 1998, again mostly to Democrats. So I'm not really sure about that. I emailed Franc to ask for his data source so I hope to learn more.
Setting aside the data difficulties, I think Franc makes an important point in the conclusion of his article:
Continue reading "Campaign contributions and policies"
Posted by Andrew at 9:04 AM | Comments (2) | TrackBack (0)
May 8, 2008
Unalphabetize!
I dream of a day when a journalist such as Ezra Klein, when seeing a graph such as this from Rob Goodspeed,

will immediately say, Hey! Why are these items in alphabetical order? That just confuses things. (It's not like they need to be in alphabetical order so that we can look up "faith" in the index or whatever.)
I have no substantive comment on the graph except that it seems unfair to McCain in that his page has fewer total words, which as displayed in the graph makes him look less substantive overall. I mean, maybe it's just a choice for him to focus on just a few issues.
P.S. I'm not knocking Goodspeed, who put in the work to make the graph, or Klein, who went to the trouble of finding it. I'm just saying that in the ideal world, an irrelevantly alphabetized graph would JUMP OUT OF THE PAGE as something not quite right, in the way that a typo or grammatical error does now. But, hey, my job is education, right? So here's my try.
P.P.S. Howard Wainer has called this the Alabama First error and wrote an article on the topic in Chance in 2001.
Posted by Andrew at 1:27 PM | Comments (8) | TrackBack (0)
The candy weighing demonstration, or, the unwisdom of crowds
My favorite statistics demonstration is the one with the bag of candies. I've elaborated upon it since including it in the Teaching Statistics book and I thought these tips might be useful to some of you.
Preparation
Buy 100 candies of different sizes and shapes and put them in a bag (the plastic bag from the store is fine). Get something like 20 large full-sized candy bars, 20 or 30 little things like mini Snickers bars and mini Peppermint Patties. And then 50 or 60 really little things like tiny Tootsie Rolls, lollipops, and individually-wrapped Life Savers. Count and make sure it's exactly 100.
You also need a digital kitchen scale that reads out in grams.
Also bring a sealed envelope inside of which is a note (details below). When you get into the room, unobtrusively put the note somewhere, for example between two books on a shelf or behind a window shade.
Setup
Hold up the bag of candy and the scale and write the following on the board:
Each pair of students should:
1. Pull 5 candies out of the bag
2. Weigh the candies
3. Write down the weight
4. Put the candies back in the bag!!
5. Pass the scale and bag to your neighbors
6. Silently multiply the weight of the 5 candies by 20.
(And, as Frank Morgan told me once, remember to read aloud everything you write on the board. Don't write silently.)
The students should work in pairs. Explain that their goal is to estimate the total weight of all the candies in the bag. They can choose their 5 candies using any method--systematic sampling, random sampling, whatever. Whichever pair guesses closest to the true weight gets the whole bag!
Demonstrate how to zero the scale, give the scale and the bag of candies to a pair of students in the front row, and let them go.
Action
The demo will proceed silently while the rest of the class proceeds. So do whatever you were going to do in class. Take a look to make sure the scale and bag are moving slowly through the room. After about 30 or 40 minutes, it will reach the back and the students will be done.
At this point, ask the pairs, one at a time, to call out their estimates. Write them on the board. They will be numbers like 3080, 2400, 4340, and so forth. Once all the numbers are written, make a crude histogram (for example, bins from 2000-3000 grams, 3000-4000, 4000-5000, etc.). This represents the sampling distribution of the estimates.
Now call up two students from the class (but not from the same pair) to look at all the estimates. Ask them what their best guess is, having seen this information. Ask the class if they agree with these two students. Now give the bag to the two students in the front of the room and have them weigh it.
Punch line
The weight of all 100 candies will be something like 1658. It's always, always, always lower than all of the individual guesses on the board. Write this true weight as a vertical bar on the histogram that you've drawn. This is a great way to illustrate the concepts of bias and standard error of an estimator.
Now call out to the students who are sitting near where you hid the envelope: "Um, uh, what's that over there . . . is it an envelope??? Really? What's inside? Could you open it up?" A student opens it and reads out what's written on the sheet inside: "Your guesses are all too high!"
Aftermath
Now's the time to talk about sampling. Large candies are easy to see and to grab, while small candies fall through the gaps between the large ones and end up at the bottom of the bag. You can draw analogies to doing a random sample by going to the mall or by sending out an email survey and seeing who responds. Ask, "How could you do a random sample?" It won't be obvious to the students that the way to do a random sample is to number each of the candies from 1 to 100 and pick numbers at random. Also, as noted above, this is an example you can use later in the semester to illustrate bias and standard error.
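If you want to see the bias before trying this in class, here's a minimal simulation sketch in Python. Everything specific in it is an assumption for illustration: the candy weights are invented, and "grabbing by hand" is modeled as picking each candy with probability proportional to its weight, which is just one simple stand-in for the way big candies are easier to see and grab. The names `BAG`, `grab`, and `simulate` are hypothetical.

```python
import random
import statistics

# Hypothetical bag: the counts follow the recipe above, but the weights in grams are invented.
BAG = [50.0] * 20 + [17.0] * 25 + [5.0] * 55  # 100 candies, 1700 g in total

def grab(bag, k, rng, size_biased):
    """Pick k distinct candies from the bag.

    size_biased=False: a simple random sample (every candy equally likely).
    size_biased=True: an assumed model of grabbing by hand, in which each draw
    picks a remaining candy with probability proportional to its weight.
    """
    if not size_biased:
        return rng.sample(bag, k)
    remaining = list(bag)
    picked = []
    for _ in range(k):
        i = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        picked.append(remaining.pop(i))
    return picked

def simulate(n_pairs=2000, seed=1):
    """Return the true total plus the mean and spread of each kind of estimate."""
    rng = random.Random(seed)
    result = {"true_total": sum(BAG)}
    for label, biased in (("random", False), ("grabby", True)):
        # Each pair weighs 5 candies and multiplies by 20, as in step 6.
        est = [20 * sum(grab(BAG, 5, rng, biased)) for _ in range(n_pairs)]
        result[label + "_mean"] = statistics.fmean(est)
        result[label + "_sd"] = statistics.stdev(est)
    return result

if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.0f}")
```

The gap between each mean and the true total is the bias of that sampling method, and the standard deviation of the estimates plays the role of the standard error of a single pair's guess. Under these assumptions the random-sample estimates should center near the true total while the "grabby" ones land well above it, which is the pattern the histogram on the board is meant to show.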
P.S. My feeling about describing these demos is the same as what Penn and Teller say about why they show audiences how they do their tricks: it's even cooler when you know how it works.
P.P.S. Remember--it's crucial that the candies in the bag be of varying sizes, with a few big ones and lots of little ones!
Posted by Andrew at 10:04 AM | Comments (10) | TrackBack (0)
May 7, 2008
Speak clearly
When you leave a voice mail, please say your name and phone number slowly and clearly. Thank you.
Posted by Andrew at 10:29 AM | Comments (4) | TrackBack (0)
Eight Americas?
Ben Goldacre links to this article by Christopher Murray et al.:
Continue reading "Eight Americas?"
Posted by Andrew at 6:50 AM | Comments (1) | TrackBack (0)
May 6, 2008
Can you trust a dataset where more than half the values are missing?
Rick Romell of the Milwaukee Journal Sentinel pointed me to the National Highway Traffic Safety Administration’s data on fatal crashes. Rick writes,
In 2006, for example, NHTSA classified 17,602 fatal crashes as being alcohol-related and 25,040 as not alcohol-related. In most of the crashes classified as alcohol-related, no actual blood-alcohol-concentration test of the driver was conducted. Instead, the crashes were determined to be alcohol-related based on multiple imputation. If I read NHTSA’s reports correctly, multiple imputation is used to determine BAC in about 60% of drivers in fatal crashes.
He goes on to ask, "Can actual numbers be accurately estimated when data are missing in 60% of the cases?" and provides this link to the imputation technique the agency now uses and this link to an NHTSA technical report on the transition to the currently-used technique.
My quick thought is that the imputation model isn't specifically tailored to this problem and I'm sure it's making some systematic mistakes, but I figure that the NHTSA people know what they're doing, and if the imputed values made no sense, they would've done something about it. That said, it would be interesting to see some confidence-building exercises to give a sense that the imputations make sense. (Or maybe they did this already; I didn't look at the report in detail.)
Posted by Andrew at 12:02 AM | Comments (6) | TrackBack (0)
May 5, 2008
Martian inferences
Benjamin Kay points to this:
Continue reading "Martian inferences"
Posted by Andrew at 12:32 AM | Comments (0) | TrackBack (0)
May 3, 2008
Motivations for political contributions
I came across this paper by Sanford Gordon, Catherine Hafer, and Dimitri Landa, who write:
Do individuals give political contributions simply because they derive an expressive or other consumption benefit from doing so? Or are they attempting to influence policy outcomes? If the consumption view is correct, then political donations are just another means by which citizens participate in the political process (unequal to be sure), and need not imply improper or undemocratic influence. In contrast, donation decisions that are driven by an investment motivation, especially when they are made on behalf of small but economically powerful minority interests, naturally raise concerns about the possibility of an undemocratic exchange of policy for dollars.
We [Gordon et al.] propose a strategy to distinguish investment and consumption motives for political contributions by examining the behavior of individual corporate executives. If executives expect contributions to yield policies beneficial to company interests, those whose compensation varies directly with corporate earnings should contribute more than those whose compensation comes largely from salary alone. We find a robust relationship between giving and the sensitivity of pay to company performance, and show that the intensity of this relationship varies across groups of executives in ways that are consistent with instrumental giving but not with alternative, taste-based, accounts. Together with earlier findings, our results suggest that contributions are often best understood as purchases of "good will" whose returns, while positive in expectation, are contingent and rare.
The empirical part of the paper looks cool--I have no experience looking at this sort of data and so can't really say anything beyond "it's cool." (Well, I will say that I'd like to see a scatterplot to make it clear at a glance what their data are saying.) But I do have some thoughts on the general framework. They consider political contributions as "consumption" or "investment"--which, as far as I know, follows the mainstream of the discipline, but I have a problem with this approach.
I just don't really see the clear distinction between "consumption" and "investment" in this context.
If someone is contributing from an "expressive or other consumption benefit," presumably this person is giving to the candidate whose policies he or she favors. (Perhaps there are some people who give to the other side for reputational reasons, for example an oil company executive who happens to be a Democrat might give to a Republican so he won't stand out in the crowd, or a college professor might donate to Obama to fit in, even if he's actually a McCain supporter. Or maybe it could go the other way too, that someone would donate $20 to the other side just to get a reputation for being unorthodox. But I imagine this sort of thing represents only a very tiny minority of contributions.) Conversely, someone who's donating as an investment probably thinks that his or her candidate is good for the country as a whole. As the authors note, the translation of unequal financial resources to unequal political resources is a potential distortion of the democratic process--I just don't understand this distinction, especially in light of the fact that voting and small-dollar political contributions are rational to the extent that the voter or contributor believes that his or her preferred candidate will benefit the general good.
Posted by Andrew at 9:31 PM | Comments (4) | TrackBack (0)
May 2, 2008
Election as trial by combat?
The 2008 Democratic primary brings to mind a similar contest in 1972, where an experienced champion faced an exciting young challenger. I'm speaking, of course, of the world chess championship, where Bobby Fischer, down 2 games to zero, destroyed Boris Spassky and unequivocally established himself as the best player in the world.

The Clinton-Obama contest has led to confusion: Obama has basically won the election in the sense of being on track to get more than half of the delegates. In that case, how can Hillary Clinton retain the support of 40% of Democrats nationwide? And how did she manage to win Pennsylvania?
Continue reading "Election as trial by combat?"
Posted by Andrew at 12:19 PM | Comments (7) | TrackBack (0)
May 1, 2008
Pretty polling plots
John Sides presents some data backing up the standard political science view that news blips are not so important in determining election outcomes in two-candidate races.
Posted by Andrew at 3:47 AM | Comments (0) | TrackBack (0)
Through the looking glass
Ubs links to a Wall Street Journal column by John Yoo on problems with the Democrats' presidential nominating procedure. Before going into the details of how Yoo makes a botch of election history, Ubs writes, "I'm not accusing Yoo of being ignorant of history. I know he's a well-educated man, and his words in this column strongly suggest he knows exactly what he's talking about. In spite of that, he somehow manages to turn history upside-down so that it seems to mean exactly the opposite. How one does that, other than out of ignorance, I don't know. Outright deceit? A lawyerly disregard for anything but advocacy? I'm definitely accusing him of something, I'm just not sure what."
My take on this is slightly different: I'm guessing that Yoo is like a lot of people who, once they take a side on an issue, quickly slip toward the assumption that all the facts automatically support their position. As a statistician, I'd like to think I'm particularly aware of the general issue of discordant evidence. (To take Yoo's example, just because a particular nominating system might be bad, you don't have to think that it's bad in all cases--this is what seems to have led him astray in his discussion of the 1824 election, as Ubs discusses in detail.) In contrast, a lawyer may be trained more to brush aside or not even notice details that contradict his main story. Perhaps this is even more true of a lawyer such as Yoo who is famous for writing opinions that are kept secret.
The unwillingness to accept discordant evidence is not unique to lawyers, of course. Hal Stern once told me about how, in the classic book on racetrack betting, Dr. Z's examples were set up so his system always won. As Hal pointed out, no system will win all the time--all that's required is that it beat the track's 18% edge or whatever--but in a narrative it's disturbing to see counterexamples (unless they're clearly swallowed up into an "it's all right at the end" narrative).
Anyway, that's just a longwinded way of saying that I don't think Yoo was necessarily being deceptive or malicious here. First, I think he probably is somewhat ignorant of the details of elections from the early 1800s (after all, so am I, and I'm a political science professor specializing in American politics); second, he may be falling into the unfortunate but common habit of just assuming that his argument, if correct, must hold in 100% of cases.
But, why?
The more interesting question to me, though, is something that Ubs doesn't ask, which is why did Yoo write this Wall Street Journal column at all? With all his notoriety, wouldn't he be better off keeping his head down rather than writing partisan articles that bring his name further to attention? After all, he's not an expert on elections (at least, I can't find any research by him on the topic), so presumably he could've recommended that someone else write that article. Why would he stick his head up like this and make himself a target?
Here my theory is that Yoo has fully gone through the mirror at this point and has emerged as a political activist. As an academic researcher, you have to be careful of what you say, lest it affect the reputation of your scholarly efforts. Thus the endless qualifications that I and others resort to in all our published work.
To elaborate further: I'm not talking about mistakes. Researchers of all levels of ability make mistakes. Yoo's example seems different--the issue is not so much that he made some errors in his column, but that he stuck his neck out by writing a column on a topic where he's not an expert, and then made the mistakes. It just seems so unnecessary to me.
But, and here the metaphor of the "looking glass" comes in: All of us who are applied researchers have mirror images in the public sphere, where our work--or distorted versions of our work--become more widely known. Many of us want to publicize our work--to write Wall Street Journal op-eds, as it were--partly just to make our work more widely known, partly to present our work the way we think it should be presented, and partly to position ourselves to be more likely to promote our future work. But in doing that we have to protect our research reputations. At some point, though, the publicity or advocacy becomes the point, rather than the research itself. For Yoo, perhaps his reputation as a researcher is so politicized at this point that there's nothing left to protect. At this point, he might as well go for it and develop a name for himself as a freelance editorial-page writer?
As a researcher, I envy newspaper columnists' opportunity to have their writings immediately read by millions of people. At the same time, I assume they envy my ability to spend as much time on in-depth research projects as I would like. On the occasions that I try to write something for a broad readership, I'm careful to protect my viability (as Bill Clinton might say) as a researcher. I wonder if Yoo has decided that the choice has already been made for him.
Continue reading "Through the looking glass"
Posted by Andrew at 12:26 AM | Comments (7) | TrackBack (0)
April 30, 2008
Congratulations, Joel
Joel Beal, also known in our research group as the New Kid, is the valedictorian. Joel did some excellent work on our red-blue project, although we were too disorganized to make full use of him. We'd give him a project on Monday, then on Tuesday he'd return with a bunch of graphs and ask us for more to do. I guess we could've called him the Original Mitch. Econ undergraduate R.A.'s rule.
Posted by Andrew at 2:04 AM | Comments (0) | TrackBack (0)
Another salvo in the ongoing battle over standardizing regression coefficients
Sander Greenland doesn't like the automatic rescaling of regression coefficients (for example, my pet idea of scaling continuous inputs to have a standard deviation of 0.5, to put them on a scale comparable to binary predictors) because he prefers interpretable units (years, meters, kilograms, whatever). Also he points out that data-based rescaling (such as I recommend) creates problems in comparing models fit to different datasets.
OK, fine. I see his points. But let's go out into the real world, where people load data into the computer and fit models straight out of the box. (That's "out of the box," not "outside the box.")
Here's something I saw recently, coefficients (and standard errors) from a fitted regression model:
coefficient for "per-capita GDP": -.079 (.170)
coefficient for "secondary school enrollment": -.001 (.006)
Now you tell me that these have easy interpretations. Sure. I'd rather have seen these standardized. Then I'd be better able to interpret the results. Nobody's stopping you from doing a more careful rescaling, a la Greenland, but that's not the default we're starting from.
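Here's a minimal sketch, in Python, of the rescaling I'm talking about: center a continuous input and divide by two standard deviations, so that it ends up with a standard deviation of 0.5. The data are made up purely for illustration (they are not the numbers from the regression above), and the function names `rescale_half_sd` and `ols_slope` are hypothetical.

```python
import statistics

def rescale_half_sd(values):
    """Center at the mean and divide by two standard deviations, so the result has sd 0.5."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / (2 * sd) for v in values]

def ols_slope(x, y):
    """Slope from a one-predictor least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up illustrative data: an input on a large raw scale and some outcome.
gdp = [1200.0, 3400.0, 5600.0, 8900.0, 15000.0, 22000.0, 31000.0, 45000.0]
outcome = [2.1, 2.0, 1.7, 1.8, 1.2, 1.1, 0.9, 0.4]

raw_slope = ols_slope(gdp, outcome)           # change in outcome per raw unit of input
scaled_gdp = rescale_half_sd(gdp)
scaled_slope = ols_slope(scaled_gdp, outcome)  # change in outcome per 2 sd of input

if __name__ == "__main__":
    print(f"raw slope:    {raw_slope:.6g}")
    print(f"scaled slope: {scaled_slope:.6g}")
```

The fit itself doesn't change; the rescaled coefficient is just the raw one multiplied by twice the input's standard deviation. It now describes a comparison of roughly low to high values of the input, which is about what the coefficient of a binary predictor describes.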
Posted by Andrew at 12:18 AM | Comments (6) | TrackBack (0)
April 29, 2008
Congestion pricing
OK, here's a blind item . . . I was talking with a colleague about a certain academic journal, traditionally ranked #2 in a social science field that is associated with government and politics . . . my colleague told me that said journal had recently converted to electronic submissions and that the journal's editors, expressing concern about the increasing volume of submissions, had decided to slow things down by deliberately sitting on each submission for a month. So, you send them a paper, they wait a month, then they send to reviewers. Reviewers send in their report, the editors wait a month, then they send you the report. You send in your revision, they wait a month, then they send back to reviewers. And so forth.
To me, this seems self-defeating--it would take me more trouble to keep track of the one-month delays than to just review the damn paper. Also, this is the first time I heard of a journal discouraging submissions. My impression is that even the top journals--and their #2 counterparts--find top-quality submissions to be few and far between. On the other hand, they must really be overwhelmed by the workload if they feel the need to resort to such wacky tactics.
Any suggestions? My thought would be to split the journal into 3 or 4 parts with separate editorial staffs for each.
P.S. I've been told that charging $ for submissions (as is done in economics) is a nonstarter--a lot of the people who might submit articles don't make a lot of money and can't easily spare a nonrefundable $50 or whatever to submit.
Posted by Andrew at 12:42 AM | Comments (23) | TrackBack (0)
April 28, 2008
Not enough discrimination?
Aleks pointed me to this article by Stan Liebowitz on the recent financial crisis:
Continue reading "Not enough discrimination?"
Posted by Andrew at 9:50 AM | Comments (7) | TrackBack (0)
April 26, 2008
Teaching skills, not concepts
Dan sends along this article which reports a study saying that math is more effectively taught using drills instead of story problems. Speaking as a teacher (and without actually reading the report of the study), I'd say this is plausible. After 20 years of teaching, I've come to the conclusion that teaching skills works better than teaching concepts (or, should I say, trying to teach concepts).
Continue reading "Teaching skills, not concepts"
Posted by Andrew at 9:33 PM | Comments (8) | TrackBack (0)
What would Rosenstone say?
I can understand Paul Krugman's frustration over the level of discourse in the Democratic primary election campaign, but I don't know of any evidence to support the implicit claim in his last sentence: "unless Democrats can get past this self-inflicted state of confusion, there’s a very good chance that they’ll snatch defeat from the jaws of victory this fall." I pretty much take the general view of political scientists that general election outcomes are pretty much determined by fundamentals--that the voters will get the information needed to realize roughly where Obama (or Clinton) and McCain stand on the key issues and vote accordingly. (See here and here for our evidence, including the picture below.)

Posted by Andrew at 3:57 PM | Comments (0) | TrackBack (0)
Is a 65-hour story better than a 3-hour story?
Jane Dark writes here about movies taking only 100 minutes whereas, on TV, "The Wire is about 65 hours long, divided graciously into five location-based chapters. Movies are now the short form, television the long form." I've never seen The Wire (we live on the 7th floor, no reception) so I can't comment on this example, but the discussion reminds me of the fractal nature of soap operas: in any couple of episodes, so much is happening, but then if you tune in a year or two later, everything's still at the same place. Presumably this is to make things interesting to people who watch every day, while still allowing people to miss an episode.
I'd also comment, regarding length, that single novels are generally agreed to be better than series novels. There are exceptions, sure, and you could argue that some sets of novels (for example, Charles Dickens or Anne Tyler) have enough common themes that they function as series. But Dark is specifically talking about the ability to develop character over the long form. For some reason, you don't usually see novelists doing this (again, you have exceptions such as Richard Ford, John Updike, and Philip Roth). One reason, perhaps, is that part of the fun of a work of literature is the chance to meet new characters. Much as we'd like to see our favorites reappear in future books, there's something that seems to be missing in a mere continuation. So I think there is something missing in Dark's argument.
We live in an age of literary abundance. There are so many great storytellers out there, we don't need to rely on a few characters over and over again, as we have to do in a bedtime-story world in which one's limited power of invention invariably results in the same few characters and formulations shuffled around like a deck of cards.
P.S. It appears that in 1997 Jane Dark saw 52 more movies than I did, so I defer to her expertise.
P.P.S. She also amusingly analogizes Dubai to Michael Jackson, loosely adapting the economic theory that free money corrupts the soul (with which I generally agree, but it doesn't stop me from taking government grants, on the theory (which I sincerely believe to be true in this case) that I'll do things differently).
P.P.P.S. Hey, I like bread and water. If it's good bread, that is.
P.P.P.P.S. Jenny points me to this.
Posted by Andrew at 7:39 AM | Comments (3) | TrackBack (0)
April 25, 2008
70,000 Assyrians
One of my favorite instances of numeracy in literature is William Saroyan's story, "70,000 Assyrians," which I read in the collection, Bedside Tales. The story is typical charming early-Saroyan: it starts out with him down-and-out, waiting on line for a cheap haircut, then he converses with the barber, asking if he, like Saroyan, is Armenian. No, he replies, he's Assyrian. Saroyan says how sad it is that the Assyrians, like the Armenians, no longer have their own country, but that they can hope for better. The barber says, sadly, that the Assyrians cannot even hope, because they have been so depleted, there are only 70,000 of them left in the world.
This is the numeracy: 70,000 is a large number, a huge number of people. It's crowds and crowds and crowds--enough for an entire society, and then some. But not enough for a country, or not enough in a hostile part of the world where other people are busy trying to wipe you out. The idea that 70,000 is a lot, but not enough--that's numeracy. People can be numerate with dollars--for example, $70,000 is a lot of money but it can't buy you a nice apartment in Manhattan--but it's my impression and others' that people have more difficulty with other sorts of large numbers. That's why this Saroyan story made an impression on me.
Posted by Andrew at 12:00 AM | Comments (2) | TrackBack (0)
April 24, 2008
Praying for a Recession: The Business Cycle and Protestant Religiosity in the United States
In the course of commenting on our article on religion, income, and voting, David Beckworth links to this interesting paper on religiosity and the business cycle:
Mainline Protestant denominations--which tend to have higher income earners--do well in terms of growth during economic booms while evangelical Protestants denominations--which tend to have lower income earners--actually struggle. (During economic downturns the outcomes are reversed--evangelicals Protestant denominations thrive.) In general, I [Beckworth] find mainline Protestants to have a strong procyclical component to their religiosity while evangelicals have a strong countercyclical component. These findings can be explained by again appealing to the labor-leisure choice explained by economic theory.
I haven't had a chance to really look this over, but it seems important, and it reminds me of Robert Putnam's comment that, although we think of religious attendance and denomination as fixed demographic descriptors of people, it's pretty common for people to change denominations--even to switch between Protestant and Catholic. Putnam also said there was evidence that people switch religions to match their political beliefs.
Regarding Beckworth's paper itself, what it really needs (from my perspective) are some graphs that directly map the data to the findings. Regressions are great, but I need some scatterplots to really be convinced. And then the challenge is to map the graphs to the regression estimates. This takes work--sometimes a lot of work--but the payoff is a new level of confidence building, a step beyond mere statistical significance.
Posted by Andrew at 10:07 AM | Comments (1) | TrackBack (0)
The opiate of the elites
In case you didn't see our graph-laden Vox EU article, here it is. The Obama reference is already a bit stale but the content is still fresh, I hope . . .
Barack Obama attracted attention recently by describing small-town Americans who were “bitter” at economic prospects and who “cling to guns or religion” in frustration. This statement, made during the height of the Democratic nomination battle, has received a lot of attention, but it represents a common view. For example, Senator Jim Webb of Virginia wrote, “Working Americans have been repeatedly seduced at the polls by emotional issues such as the predictable mantra of ‘God, guns, gays, abortion and the flag’ while their way of life shifted ineluctably beneath their feet.” And this perspective is not limited to Democrats. For example, conservative columnist David Brooks associates political preference with cultural values that are modern and upscale (“sun-dried tomato concoctions”) or more traditional (“meatloaf platters”).
All these claims fit generally into the idea of religion as the opiate of the masses, the idea that social issues distract lower-income voters from their natural economic interests. But there is an opposite view, associated with political scientist Ronald Inglehart, of post-materialism—the idea that, as people and societies get richer, their concerns shift from mundane bread-and-butter issues to cultural and spiritual concerns.
Which story better describes how Americans vote? Who are the values voters? Are they the poor (as implied by the “opiate of the masses” storyline) or the rich (as would be predicted by “post-materialism”)?
Continue reading "The opiate of the elites"
Posted by Andrew at 12:37 AM | Comments (8) | TrackBack (0)
April 23, 2008
Objects of the class "Whoopi Goldberg"

I'm talking about actors who are undeniably talented but are almost always in bad movies, or at least movies that aren't worthy of their talent. Sure, Whoopi was in The Color Purple, but that's it. Other examples: Martin Short. Michael Keaton (well, I liked Mr. Mom and Johnny Dangerously, but they're still not worthy of his talent).
Do they have bad taste, or just bad luck?
What's the opposite? William Holden. (I can't think of any more recent examples of mediocre actors who've appeared in several great movies, but I'm sure there are some.)
P.S. This goes in some sort of series with Objects of the class "Weekend at Bernie's" (which, as the commenters said, include Heathers and Zelig as well).
Posted by Andrew at 7:59 AM | Comments (15) | TrackBack (0)
April 22, 2008
The case of the disappearing Smiths
Sam Roberts writes,
In 1984, according to the Social Security Administration, nearly 3.4 million Smiths lived in the United States. In 1990, the census counted 2.5 million. By 2000, the Smith population had declined to fewer than 2.4 million.
Where did all the Smiths go from 1984 to 1990? I can believe it flatlined after 1990, but it's hard to believe that the count could have changed so much in 6 years.
Perhaps it's the difference between the SSA and Census methods of counting?
Posted by Andrew at 5:14 PM | Comments (5) | TrackBack (0)
Researcher incentives and empirical methods
Bob Erikson pointed me to this paper by Edward Glaeser:
Economists are quick to assume opportunistic behavior in almost every walk of life other than our own. Our empirical methods are based on assumptions of human behavior that would not pass muster in any of our models. The solution to this problem is not to expect a mass renunciation of data mining, selective data cleaning or opportunistic methodology selection, but rather to follow Leamer's lead in designing and using techniques that anticipate the behavior of optimizing researchers. In this essay, I [Glaeser] make ten points about a more economic approach to empirical methods and suggest paths for methodological progress.
This is a great point. The paper itself has an unusual format: the ten key points are made in pages 3-5, and then they are expanded upon in the rest of the paper. I think Glaeser's specific analyses are limited by his focus on classical statistical methods (least-squares regression, p-values, and so forth), but his main points are important, and I'll repeat them here:
Continue reading "Researcher incentives and empirical methods"
Posted by Andrew at 7:27 AM | Comments (6) | TrackBack (0)
Interesting spam
I usually don't like spam, but this message I got the other day from Ed Tranham was pretty good:
Continue reading "Interesting spam"
Posted by Andrew at 6:54 AM | Comments (2) | TrackBack (0)
April 21, 2008
Rockumentaries are the best
I first twigged to this when that Theremin movie came out. That was really cool. Then the Brian-Wilson-o-mentary, which was excellent also (although it could've used some interviews with skeptics who said that Brian is no big deal). The Keith-Richards-produced Chuck Berry movie was fascinating in a different way. And Monterey Pop, Gimme Shelter, Don't Look Back. Even that doc a few years ago about the Funk Brothers--that was really pretty lame, but it was still great. I'm left to conclude that all rockumentaries are the best. I think there are a few zillion more I haven't seen.
Posted by Andrew at 8:31 AM | Comments (4) | TrackBack (0)
April 20, 2008
More on the ever-popular topic of names
Some interesting facts here:
1 in every 25 Americans is named Smith, Johnson, Williams, Brown, Jones, Miller or Davis . . . In 1984, according to the Social Security Administration, nearly 3.4 million Smiths lived in the United States. . . . By 2000, the Smith population had declined to fewer than 2.4 million.
Also there's a list of the 5000 most common last names in America.
Posted by Andrew at 2:38 AM | Comments (3) | TrackBack (0)
Rubik’s cube proof cut to 25 moves

See here. It took me 3 weeks the first time, about 1 week the second time. I remember setting my alarm to 5am so I could work on the cube for two hours in the morning before going to school. Eventually I got my time down to a little over 2 minutes (which is just about the longest I can concentrate on anything). There were two kinds of cube solvers: those who held the cube in a stationary orientation and spun the edges around, and those who kept turning the cube around in their hands to get just the right orientation for each move. I was of this second type, which I think kept my efficiency down. One of my math professors in college told me that he'd solved the cube in theory--he taught abstract algebra--but had never bothered to do it in practice. This impressed me to no end. A guy down the hall from me had a 4x4x4 cube, which at one point we tried to see if we could solve using only 3x3x3 operators. I don't think we succeeded.
It's been years since I've done the cube. Last time I tried and tried and tried and got stuck. If I ever want to do it again, I think I'll have to figure out some operators again from scratch.
Posted by Andrew at 12:52 AM | Comments (1) | TrackBack (0)
April 18, 2008
Coalition dynamics
I hate to publicize this sort of thing, but two different people forwarded it to me, so I thought I should comment. It's a paper by Peter Klimek, Rudolf Hanel, and Stefan Thurner:
The quality of governance of institutions, corporations and countries depends on the ability of efficient decision making within the respective boards or cabinets. Opinion formation processes within groups are size dependent. It is often argued - as now e.g. in the discussion of the future size of the European Commission - that decision making bodies of a size beyond 20 become strongly inefficient. We report empirical evidence that the performance of national governments declines with increasing membership and undergoes a qualitative change in behavior at a particular group size.
I admire the goal of doing empirical analysis, and the graphs are great, but I agree with the Arxiv blogger that their mathematical model of "a critical value of around 19-20 members" is "somewhat unconvincing" (except that I'd remove the "somewhat"). Do people really believe this sort of thing? It seems like numerology to me.
The problem with counting countries
Another problem, to my mind, is the reference to the number of countries in the European Union. I understand that these are sovereign states, but I don't think it makes sense to count them equally. Applying a model in which all voters are equal doesn't make sense to me.
P.S.
I am unhappy with the authors' attempts to imply that their work is relevant to actual politics. That said, I like the rest of the paper--it's a fun model, and you have to start somewhere. After all, I wrote a paper on coalitions myself that had no empirical relevance. So I can hardly object to this sort of academic exercise.
Posted by Andrew at 9:53 PM | Comments (9) | TrackBack (0)
Who are the "values voters"?
Larry Bartels wrote an excellent op-ed on rich and poor voters, ringing many of the bells that we strike in our forthcoming book. Bartels writes:
Do small-town, working-class voters cast ballots on the basis of social issues? Yes, but less than other voters do. Among these voters, those who are anti-abortion were only 6 percentage points more likely than those who favor abortion rights to vote for President Bush in 2004. The corresponding difference for the rest of the electorate was 27 points, and for cosmopolitan voters it was a remarkable 58 points. Similarly, the votes cast by the cosmopolitan crowd in 2004 were much more likely to reflect voters’ positions on gun control and gay marriage.
Small-town, working-class voters were also less likely to connect religion and politics. Support for President Bush was only 5 percentage points higher among the 39 percent of small-town voters who said they attended religious services every week or almost every week than among those who seldom or never attended religious services. The corresponding difference among cosmopolitan voters (34 percent of whom said they attended religious services regularly) was 29 percentage points.
It is true that American voters attach significantly more weight to social issues than they did 20 years ago. It is also true that church attendance has become a stronger predictor of voting behavior. But both of those changes are concentrated primarily among people who are affluent and well educated, not among the working class.
Well put, and nicely backed up by statistical evidence.
One little thing . . .
Continue reading "Who are the "values voters"?"
Posted by Andrew at 12:50 AM | Comments (4) | TrackBack (0)
April 17, 2008
Occam
Regarding my anti-Occam stance ("I don't count 'Occam's Razor,' or 'Ockham's Razor,' or whatever, as a justification. You gotta do better than digging up a 700-year-old quote."), David Gillman writes:
I was at your talk at MIT yesterday, and something bothered me until I realized just now that your reason for rejecting Occam's Razor was wrong, from a Bayesian point of view. A priori what's the probability that something somebody says will be remembered for 800 years? I figure it's machine learning people who want your models to be simple, but Occam's answer to that would be that you aren't a machine.
He also says,
If somebody quotes ancient wisdom and you disagree with them, Occam's Razor says don't blame the ancient wisdom, because the person is probably misappropriating it.
Good point.
Posted by Andrew at 2:52 AM | Comments (8) | TrackBack (0)
The rich-poor voting gap in rural areas
For some reason David posted this on his other blog rather than here . . .
David writes,
We can see a steady decline of Republican support among rural poor voters starting in 1972. Even with a big jump in 2000, support for the Republican presidential candidate was less than 50 percent. So, Obama, it looks like poor rural Americans have no problem voting for Democrats.
I'm not quite sure why 2004 isn't included here too, but in any case, the sample size of rural voters is pretty small in each year, so you don't want to over-interpret the jumps from year to year.
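To put a rough number on "pretty small," here is a back-of-the-envelope sketch in Python. The sample size of 100 rural poor respondents per year is a made-up illustration, not a figure from David's post or from the survey, and the formula assumes a simple random sample, which real surveys are not.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion p based on n respondents."""
    return math.sqrt(p * (1.0 - p) / n)

def jump_se(p1, n1, p2, n2):
    """Standard error of the change between two independent yearly proportions."""
    return math.sqrt(proportion_se(p1, n1) ** 2 + proportion_se(p2, n2) ** 2)

if __name__ == "__main__":
    n = 100  # hypothetical number of rural poor respondents in one year
    print("se of one year's share: %.3f" % proportion_se(0.5, n))
    print("se of a year-to-year jump: %.3f" % jump_se(0.5, n, 0.5, n))
```

Under those assumed numbers, a single year's Republican share has a standard error of about 5 percentage points and a year-to-year change about 7, so a jump of 10 points or so would be well within the noise.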
Posted by Andrew at 12:54 AM | Comments (2) | TrackBack (0)
April 16, 2008
Nutsy Squirrel
Posted by Andrew at 12:00 PM | Comments (2) | TrackBack (0)
Open space means longer search times
The comments on this entry--yes, I prefer a "sewer" to an "airport," at least when it comes to train stations--prompt me to elaborate on my comments on the Chicago public library, which was widely praised when it was built, enough so that I visited it one day when I was living there. It was stunningly difficult to get to the books--they were hidden on the fourth floor, I believe, and in some small section of low, widely-separated shelves. They didn't understand the concept of book density--the goal of minimizing the travel time between book A and book B.
I had a similar experience yesterday when visiting my friend at MIT. He had a huge office, which at first impressed me, but then I realized that these huge offices and impressive spaces lead to one of MIT's notorious problems: the need to take long walks through featureless corridors. I think it would be a better place with all the spaces sized down by half. That's the Bell Labs style. Bell Labs did it right, and their choice was particularly impressive given that they had tons of extra space and tons of extra money, so they could've built huge offices if they'd wanted to.
Posted by Andrew at 12:10 AM | Comments (4) | TrackBack (0)
April 15, 2008
Statistical software for blind people
What's out there? I have a few desires:
1. A speech-oriented statistics package--a front-end to something like Stata or R with voice commands and spoken output. For example:
User: Regress income on height and sex.
Computer: [repeats, to make sure no misunderstanding] Regress income on height and sex.
User: Yes
Computer: There is no "income" variable
U: What variables do we have?
C: height, sex, weight, occupation, earnings, age---
U: [interrupts] Set y to earnings
C: Set y to earnings
U: Yes
C: Regression of earnings on height and sex. The intercept is 3.4 with a standard error of 1.2. The slope for height is . . .
U: Add the interaction of height and sex
C: Add the interaction of height and sex
U: Yes
C: Regression of earnings on height, sex, and height times sex. The intercept is . . .
It would be good to have lots of functions here, but I imagine we could start with regressions and simple statistics and then see what else is useful.
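As a very rough sketch of the text-handling half of such a front-end (leaving out speech recognition and the actual model fitting), something like the following Python could parse a "Regress y on x1 and x2" command and check the variable names before confirming. The function names and the variable list are hypothetical, chosen to match the dialogue above; this is not an existing package.

```python
import re

# Hypothetical dataset columns, copied from the dialogue above.
VARIABLES = ["height", "sex", "weight", "occupation", "earnings", "age"]

def parse_regress(command):
    """Parse 'Regress y on x1, x2 and x3' into (outcome, predictors), or None."""
    match = re.match(r"^\s*regress\s+(\w+)\s+on\s+(.+?)\.?\s*$", command, re.IGNORECASE)
    if match is None:
        return None
    outcome = match.group(1).lower()
    # Split the predictor list on commas and on the word "and".
    parts = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", match.group(2))
    predictors = [p.lower() for p in parts if p]
    return outcome, predictors

def respond(command, variables=VARIABLES):
    """Return what the computer would say back for a regression command."""
    parsed = parse_regress(command)
    if parsed is None:
        return "Sorry, I did not understand that."
    outcome, predictors = parsed
    missing = [name for name in [outcome] + predictors if name not in variables]
    if missing:
        return 'There is no "%s" variable' % missing[0]
    return "Regression of %s on %s." % (outcome, " and ".join(predictors))

if __name__ == "__main__":
    print(respond("Regress income on height and sex."))
    print(respond("Regress earnings on height and sex."))
```

The confirm-and-repeat step in the dialogue would sit between parsing and responding; a real system would hand the parsed pieces to R or Stata rather than just echoing them.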
2. A statistical graphics program that uses touch and sound to convey information. A scatterplot or two-dimensional intensity graph could be conveyed with a setup where, as you move a mouse (or a pen, or your hand) over a pad, the computer makes louder sounds where there are more data. I'm thinking of something that sounds like rain, with individual drops for single data points and various sounds of heavy rain or rushing water where there are lots of data.
I'm sure lots more could be done here, for example using some combinations of pitch, timing, chirps, etc., to convey different patterns in data.
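One way the "louder where there are more data" idea might be sketched in Python: count the points near the cursor and turn that count into a loudness between 0 and 1. The radius, the log scaling, and the function names are all assumptions made for illustration, and the actual sound generation is left out.

```python
import math

def local_density(points, x, y, radius=0.1):
    """Count the data points within `radius` of the cursor position (x, y)."""
    return sum(1 for (px, py) in points if math.hypot(px - x, py - y) <= radius)

def volume(count, max_count):
    """Map a point count to a loudness from 0.0 (silence) to 1.0 (loudest)."""
    if max_count <= 0 or count <= 0:
        return 0.0
    # Log scale, so a single point (one raindrop) is still audible
    # next to a region with many points (heavy rain).
    return min(1.0, math.log1p(count) / math.log1p(max_count))

if __name__ == "__main__":
    # Hypothetical scatterplot data on the unit square.
    points = [(0.50, 0.50), (0.52, 0.50), (0.90, 0.90)]
    count = local_density(points, 0.5, 0.5)
    print(count, volume(count, max_count=len(points)))
```

The log scale is just one choice; pitch or drop rate could carry the same information instead of loudness.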
Does anyone know what's out there? A quick web search yields this for SPSS and this, which claims to let you hear images, and this screen reader. But what I think we should really be doing is creating some software that is so cool that sighted people will want to use it too.
Posted by Andrew at 11:37 AM | Comments (4) | TrackBack (0)
Redundancy and efficiency
Walking through Penn Station in New York, I remembered how much I love its open structure. By "open," I don't mean bright and airy. I mean "open" in a topological sense. The station has three below-ground levels--the uppermost has ticket counters (and, what is more relevant nowadays, ticket machines), some crappy stores and restaurants, and a crappy waiting area. The middle level has Long Island Rail Road ticket counters, some more crappy stores and restaurants, and entrances to the 7th and 8th Avenue subway lines. The lower level has train tracks and platforms. There are stairs, escalators, and elevators going everywhere. As a result, it's easy to get around, there are lots of shortcuts, and the train loads fast--some people come down the escalators and elevators from the top level, others take the stairs from the middle level.
The powers-that-be keep threatening to spend a couple billion dollars upgrading the station. I hope that never happens, because I know that it will all become much more organized and airportlike, with "gates," long lines, and only one way to get from point A to point B. Something horrible like that new Chicago public library (not so new now, I guess--it was built around 1990) that was so pretty and so nonfunctional.
Posted by Andrew at 9:53 AM | Comments (6) | TrackBack (0)
April 14, 2008
p-values blah blah blah
Karl Ove Hufthammer points me to this paper by Raymond Hubbard and R. Murray Lindsay, "Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing."
I agree that p-values are a problem, but not quite for the same reasons as Hubbard and Lindsay. I was thinking about this a couple days ago when talking with Jeronimo about fMRI experiments and other sorts of elaborate ways of making scientific connections. I have a skepticism about such studies that I think many scientists share: the idea that a questionable idea can suddenly become scientific by being thrown in the same room with gene sequencing, MRIs, power-law fits, or other high-tech gimmicks. I'm not completely skeptical--after all, I did my Ph.D. thesis on medical imaging--but I do have this generalized discomfort with these approaches.
Consider, for example, the notorious implicit association test, famous for being able to "assess your conscious and unconscious preferences" and tell if you're a racist. Or consider the notorious "baby-faced politicians lose" study.
From a statistical point of view, I think the problem is with the idea that science is all about rejecting the null hypothesis. This is what researchers in psychology learn, and I think it can hinder scientific understanding. In the "implicit association test" scenario, the null hypothesis is that people perceive blacks and whites identically; differences from the null hypothesis can be interpreted as racial bias. The problem, though, is that the null hypothesis can be wrong in so many different ways.
To return to the main subject, an alarm went off in my head when I read the following sentence in the abstract to Hubbard and Lindsay's paper: "p values exaggerate the evidence against [the null hypothesis]." We're only on page 1 (actually, page 69 of the journal, but you get the idea) and already I'm upset. In just about any problem I've studied, the null hypothesis is false; we already know that! They describe various authoritative-seeming Bayesian articles from the past several decades, but all of them seem to be hung up on this "null hypothesis" idea. For example, they include the notorious Jeffreys (1939) quote: "What the use of P implies … is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred. This seems a remarkable procedure." OK, sure, but I don't believe that the hypothesis "may be true." The question is whether the data are informative enough to reject the model.
Any friend of the secret weapon is a friend of mine
OK, now the positive part. I agree with just about all the substance of Hubbard and Lindsay's recommendations and follow them in practice: interval estimates, not hypothesis tests; and comparing intervals of replications (the "secret weapon"). More generally, I applaud and agree with their effort to place repeated studies in a larger context; ultimately, I think this leads to multilevel modeling (also called meta-analysis in the medical literature).
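For concreteness, here is a minimal Python sketch of what comparing intervals across replications can look like: compute the same estimate, with a rough plus-or-minus 2 standard error interval, separately for each replication and line the results up. The data are simulated with a made-up true effect of 0.3, purely for illustration; nothing here comes from Hubbard and Lindsay's paper.

```python
import random
import statistics

def mean_interval(values):
    """Return (estimate, lower, upper): the mean and a rough +/- 2 se interval."""
    n = len(values)
    estimate = statistics.mean(values)
    se = statistics.stdev(values) / n ** 0.5
    return estimate, estimate - 2 * se, estimate + 2 * se

def simulate_replications(n_studies=6, n_per_study=50, true_effect=0.3, seed=1):
    """Fake data: each replication is a list of noisy measurements of one effect."""
    rng = random.Random(seed)
    return [[rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
            for _ in range(n_studies)]

if __name__ == "__main__":
    for i, data in enumerate(simulate_replications(), start=1):
        est, lo, hi = mean_interval(data)
        print("replication %d: %.2f  [%.2f, %.2f]" % (i, est, lo, hi))
```

Reading down the list of intervals shows how much the estimate moves from one replication to the next, which a single p-value does not.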
P.S. This is minor, but I'm vaguely offended by referring to Ronald Fisher as "Sir" Ronald Fisher in an American journal. We don't have titles here! I guess it's better than calling him Lord Fisher-upon-Tyne or whatever.
P.P.S. I don't know if I agree that "An ounce of replication is worth a ton of inferential statistics." More data are fine, but sometimes it's worth putting in a little effort to analyze what you have. Or, to put it more constructively, the best inferential tools are those that allow you to analyze more data that have already been collected.
Posted by Andrew at 11:00 AM | Comments (8) | TrackBack (0)
April 13, 2008
R.I.P. Minghui Yu
Rachel wrote this note about our Ph.D. student who unexpectedly and tragically died recently.

Posted by Andrew at 12:58 PM | Comments (3) | TrackBack (0)
April 11, 2008
My talk at MIT on Monday
I'm speaking Monday 14 April at 4:30 on weakly informative prior distributions and models with interactions. I'll try to make things accessible to a general audience of people who might not know much about statistics in general or Bayesian methods in particular.
Posted by Andrew at 12:00 PM | Comments (7) | TrackBack (0)
The decline of the white working class and the rise of a mass upper middle class
Richard Florida links to this article by Ruy Teixeira and Alan Abramowitz:
Dramatic shifts have taken place in the American class structure since the World War II era. Consider education levels. Incredible as it may seem today, in 1940 three-quarters of adults 25 and over were high school dropouts (or never made it as far as high school), and just 5 percent had a four-year college degree or higher. . . . by 2007, it was down to only 14 percent. . . . In 1940, only about 32 percent of employed US workers held white collar jobs (professional, managerial, clerical, sales). By 2006, that proportion had almost doubled to 60 percent . . . we [Teixeira and Abramowitz] discuss these shifts in the class structure and analyze their political implications, primarily by focusing on the decline of the white working class.
Yu-Sung made some graphs (to appear in our book) that extend earlier estimates of Brooks and Manza and show some of the trends in voting over the past fifty years:

Professionals (doctors, lawyers, and so forth) and routine white collar workers (clerks, etc.) used to support the Republicans more than the national average, but over the past half-century they have gradually moved through the center and now strongly support the Democrats. Business owners have moved in the opposite direction, from close to the national average to being staunch Republicans; and skilled and unskilled workers have moved from strong Democratic support to near the middle.
These shifts are consistent with the oft-noted cultural differences between Red and Blue America. Doctors, nurses, lawyers, teachers, and office workers seem today like prototypical liberal Democrats, while businessmen and hardhats seem like good representatives of the Republican party. The dividing points were different 50 years ago. The Republicans still have the support of most of the high-income voters, but these are conservatives of a different sort. As E. J. Dionne noted in analyzing poll data from 2004, the Democrats' strength among well-educated voters is strongest among those with household incomes under $75,000---"the incomes of teachers, social workers, nurses, and skilled technicians, not of Hollywood stars, bestselling authors, or television producers, let alone corporate executives."
We tried to take our analysis further by regressing on income within occupation groups, but we didn't find anything exciting; there wasn't much evidence of different rich/poor voting gaps in different occupation categories. The Teixeira and Abramowitz article adds something to this picture because they talk about how the relative sizes of these different groups are changing.
Posted by Andrew at 7:41 AM | Comments (1) | TrackBack (0)
Dentists named Dennis, Georgias who move to Georgia, free will has nothing to do with it, confusion about conditional probabilities
John Shonder points me to this article on the work of Brett Pelham, who's been featured here before. The news article states,
In studies involving Internet telephone directories, Social Security death index records and clinical experiments, Brett Pelham, a social psychologist, and colleagues have found in the past six years that Johnsons are more likely to wed Johnsons, women named Virginia are more likely to live in (and move to) Virginia, and people whose surname is Lane tend to have addresses that include the word “lane,” not “street.”
They didn't mention my favorite, which is that there are almost twice as many
