July 2008 Archives

On April Fool's Day I posted my article, "Why I don't like Bayesian statistics." At the time, some commenters asked for my responses to the criticisms that I'd raised.

My original article will appear, in slightly altered form, in the journal Bayesian Analysis, with discussion and rejoinder. Here's the article, which begins as follows:

Bayesian inference is one of the more controversial approaches to statistics. The fundamental objections to Bayesian methods are twofold: on one hand, Bayesian methods are presented as an automatic inference engine, and this raises suspicion in anyone with applied experience. The second objection to Bayes comes from the opposite direction and addresses the subjective strand of Bayesian inference. This article presents a series of objections to Bayesian inference, written in the voice of a hypothetical anti-Bayesian statistician. The article is intended to elicit elaborations and extensions of these and other arguments from non-Bayesians and responses from Bayesians who might have different perspectives on these issues.

And here's the rejoinder, which begins:

In the main article I presented a series of objections to Bayesian inference, written in the voice of a hypothetical anti-Bayesian statistician. Here I respond to these objections along with some other comments made by four discussants.

You'll have to wait until the journal issue comes out to read the discussions, by Jose Bernardo, Joe Kadane, Larry Wasserman, and Stephen Senn. And thanks to Bayesian Analysis editor Brad Carlin for putting this all together.

See here.

Thinking like a scientist

| 1 Comment

I spoke at the University Commons retirement community about the Red State, Blue State book--a great audience, it was lots of fun. Anyway, in the talk, around the time I told them Pauline Kael's non-quote and Michael Barone's actual quote, I had the occasion to mention that I tell my students that, in any research project, you need to answer the following four questions:

1. What's your evidence?
2. How does this fit in with what else you know?
3. What have you found beyond what people thought before?
4. How did all those smart people who came before get things wrong?

(Item 4 is the topic of chapter 3 of our book.)

My proposed budget cut

| 4 Comments

Politicians always promise to cut government waste but the experts always say it can't be done. But I came across an example today. A bunch of trucks came by to tear up and repave our street. But our street is just fine. The city clearly has money to burn in this budget line.

Fund the answer you want...

| 3 Comments

Aleks sends in this article by David Michaels:

More than 90 percent of the 100-plus government-funded studies performed by independent scientists found health effects from low doses of BPA, while none of the fewer than two dozen chemical-industry-funded studies did. . .

"An ounce of replication..."

| 5 Comments

I was looking through this old blog entry and found an exchange I like enough to repost. Raymond Hubbard and R. Murry Lindsay wrote,

An ounce of replication is worth a ton of inferential statistics.

I questioned this, writing:

More data are fine, but sometimes it's worth putting in a little effort to analyze what you have. Or, to put it more constructively, the best inferential tools are those that allow you to analyze more data that have already been collected.

Seth questioned my questioning, writing:

I'd like to hear more about why you don't think an ounce of replication is worth a ton of inferential statistics. That has been my experience. The value of inferential statistics is that they predict what will happen. Plainly another way to figure out what will happen is to do it again.

To which I replied:

I'm not sure how to put replication and inferential statistics on the same scale . . . but a ton is 32,000 times an ounce. To put it in dollar terms, for example, I think that in many contexts, $32,000 of data analysis will tell me more than $1 worth of additional data. Often the additional data are already out there but haven't been analyzed.

I think it's fun to take this sort of quotation literally and see where it leads. It's a rhetorical strategy that I think works well for me, as a statistician.

NYT vs WSJ on gender issues

| 19 Comments

Aleks sends in a striking example of a news story presented in two completely different ways:

I [Aleks] was looking at the NYT and WSJ today, and one particular discrepancy struck me. The NYT story, "Math Scores Show No Gap for Girls," by Tamar Lewin, says:
Three years after the president of Harvard, Lawrence H. Summers, got into trouble for questioning women’s “intrinsic aptitude” for science and engineering — and 16 years after the talking Barbie doll proclaimed that “math class is tough” — a study paid for by the National Science Foundation has found that girls perform as well as boys on standardized math tests. . . . “Now that enrollment in advanced math courses is equalized, we don’t see gender differences in test performance,” said Marcia C. Linn of the University of California, Berkeley, a co-author of the study. “But people are surprised by these findings, which suggests to me that the stereotypes are still there.” . . . Although boys in high school performed better than girls in math 20 years ago, the researchers found, that is no longer the case. . . . The researchers looked at the average of the test scores of all students, the performance of the most gifted children and the ability to solve complex math problems. They found, in every category, that girls did as well as boys. . . .

The NYT story had absolutely no mention of the girl/boy variance whatsoever. Compare to the WSJ version (girl/boy variance in the headline), "Boys' Math Scores Hit Highs and Lows," by Keith Winstein:

Girls and boys have roughly the same average scores on state math tests, but boys more often excelled or failed, researchers reported. The fresh research adds to the debate about gender difference in aptitude for mathematics, including efforts to explain the relative scarcity of women among professors of science, math and engineering.

In the 1970s and 1980s, studies regularly found that high-school boys tended to outperform girls. But a number of recent studies have found little difference. . . . [The recent study] didn't find a significant overall difference between girls' and boys' scores. But the study also found that boys' scores were more variable than those of girls. More boys scored extremely well -- or extremely poorly -- than girls, who were more likely to earn scores closer to the average for all students. . . . The study found that boys are consistently more variable than girls, in every grade and in every state studied. That difference has "been a concern over the years," said Marcia C. Linn, a Berkeley education professor and one of the study's authors. "People didn't pay attention to it at first when there was a big difference" in average scores, she said. But now that girls and boys score similarly on average, researchers are taking notice, she said.

Here's some context from a few years back (I looked it up, because I wasn't sure exactly what Summers said, and the NYT article referred to him). From the NYT a few years ago:

Dr. Summers cited research showing that more high school boys than girls tend to score at very high and very low levels on standardized math tests, and that it was important to consider the possibility that such differences may stem from biological differences between the sexes. Dr. Freeman said, "Men are taller than women, that comes from the biology, and Larry's view was that perhaps the dispersion in test scores could also come from the biology."

What's amazing is that the two newspapers quote the same researcher but with two nearly opposite points. I assume she made both points to both newspapers, but the NYT reporter ran with the "stereotypes are still there" line and the WSJ reporter ran with "researchers are taking notice." It must be frustrating to Linn to have only part of her story reported in each place. (Yeah, yeah, I know that newspapers have space constraints. It still must be frustrating.)
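Both headlines can be true at once. Here is a minimal sketch in R, assuming normally distributed scores, equal-sized groups, and a hypothetical boy/girl variance ratio of 1.15 (an illustrative number, not one taken from the study). With equal means and slightly unequal variances, the averages match while the tails do not.

```r
# Illustrative only: normal scores on a standardized scale, equal means.
# The variance ratio of 1.15 is an assumed value, not a figure from the study.
variance_ratio <- 1.15
sd_girls <- 1
sd_boys <- sqrt(variance_ratio)

# Share of a group scoring above a cutoff (cutoff measured in girls' sd units)
tail_share <- function(cutoff, sd) {
  pnorm(cutoff, mean = 0, sd = sd, lower.tail = FALSE)
}

cutoffs <- c(1, 2, 3)
ratio_in_upper_tail <- tail_share(cutoffs, sd_boys) / tail_share(cutoffs, sd_girls)

# Boys per girl above each cutoff; by symmetry the same ratios apply
# below -1, -2, and -3.
print(data.frame(cutoff = cutoffs, boys_per_girl = round(ratio_in_upper_tail, 2)))
```

The further out the cutoff, the larger the imbalance, which is how a story about averages and a story about extremes can both be drawn from the same study.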

Our own Kenny Shirley spoke at the Bayes meeting on this stuff. As is often the case in applied work, what's interesting here isn't so much the model--which is enough to get the job done--but how it fits into the larger policy goals (which in this case involve quantifying uncertainty, a natural fit for Bayesian methods).

Question wording effects

| No Comments

John Sides points to a news report of a hit-and-run driver who "struck and slightly injured a pedestrian while driving his sports car in downtown Washington" and then said, "I didn’t know I hit him…I feel terrible…[But] he’s not dead, that’s the main thing." He was fined $50.

It often seems to happen this way, that punishments for reckless driving are much less severe than the effect of the crime itself. (Even being "slightly injured" in a car crash has gotta be a personal loss of much more than $50, not even counting hospital costs.) This is particularly striking given that not every offender is caught, so you might think that punishments would be higher for their deterrent value.

Why are the punishments so low? One reason is that many of the legislators who write the laws and judges who decide sentencing are themselves dangerous drivers at times, and I suspect that it can be easier for them to identify with the criminals than the victims. (If Gary Larson were writing the laws, there'd probably be a death penalty for running over a dog.)

But I think there's something deeper going on, having to do with retrospective and prospective decision analysis. In the driving example, it goes like this.

1. Suppose somebody (e.g., Dick Cheney) is driving dangerously but nobody is hurt, or not seriously. Then the response is that no serious harm was done--it's just one of those things--so no point in having a big punishment.

2. Suppose somebody (e.g., Ted Kennedy or Laura Bush) is driving dangerously and seriously injures or kills someone. Then the response is that it's a terrible tragedy but very bad luck, so what is gained by seriously punishing the driver.

The issue is that deaths and serious injuries are also rare--even if you drive recklessly, it's extremely unlikely that you'll kill someone in any given outing. So you're stuck between punishing the almosts and might-have-beens or really laying down the hammer on the serious cases. No option seems quite right. Although I guess in this case the pedestrian will do all right because he'll probably sue the driver for a couple of million dollars.

Alone in the car at night

| 3 Comments

I drove a car for 30 miles yesterday. I hadn't driven a car so far without passengers in over 15 years, and boy did it feel weird. All these cars on the road with people sitting perfectly still holding their steering wheels and having to remember not to go off the road. Driving, I feel a bizarre mixture of complete control and no control at all.

driving.jpg

Steve Kass writes:

Under the headline “Rise in TB Is Linked to Loans From I.M.F.”, Nicholas Bakalar writes for the New York Times today that “The rapid rise in tuberculosis cases in Eastern Europe and the former Soviet Union is strongly associated with the receipt of loans from the International Monetary Fund, a new study has found.”

The study, led by Cambridge University researcher David Stuckler, was published in PLoS Medicine . . . After reading the paper and looking at much of the source data, I [Kass] agree with William Murray, an IMF spokesman also quoted in the article: “This is just phony science.”

Some fun shootin'-fish-in-a-barrel follows. But, hey, it was published in PLoS Medicine, it must be correct, right?

Stanley Chin writes:

I had the usual stats training in grad school, and after some years as a practicing statistician and economist find myself increasingly approaching problems from a Bayesian perspective -- never more so than in a problem that was brought to me as an external consultant. My question is brief, the set up is a little long -- the question is in the subject line, can you recommend any reading in Bayesian approaches to quality control sampling?

I just baked three loaves of bread and then saw these nutrition notes posted by Seth. It's oddly entertaining to read, even though I don't understand anything about it. Sort of like the feeling you get from reading a John Le Carre novel--it all seems so real!

But I completely disagree with Seth's comment that "among academics to write clearly is low status, to write mumbo-jumbo is high status." What Seth is missing here is that it's difficult to write clearly. My impression is that people write mumbo-jumbo because that's what they know how to do; writing clearly takes a lot of practice. It's often surprisingly difficult to get people to state in writing exactly what they did (for example, in fitting a model to data). It takes continual effort to express oneself clearly and directly. Language is inherently nonalgorithmic. It might be that high-status people write mumbo-jumbo, but I suspect that's just because they're not putting in the immense effort required to write clearly. Lots of low-status academics write mumbo-jumbo also (as I know from reviewing many hundreds of submissions to academic journals).

Ben Hinchliffe writes,

In the paper "Tools for Bayesian Data Analysis in R", you mention the need for a flexible computing format that allows for the manipulation and summarization of simulations of a Bayesian probability model. I work for a company (Blue Reference, Inc.) that has developed a software environment that enables the creation of dynamic, interactive documents for reproducible research and teaching using Microsoft Office and R. After two-and-a-half years of development work, we are now ready to release Inference for R and focus on its implementation in reproducible research and dynamic document teaching practices.

For an introduction to Inference for R, visit our website at www.InferenceForR.com and view the 2-minute overview screencast.

For a sense of the scope of our Inference project, selectively view the collection of screencasts, postings, documents and screenshots at www.inference.us. For a hands-on assessment of the capabilities of Inference, download a copy of our release candidate at www.InferenceForR.com.

I don't know anything about these people but I thought it might interest some of you.

See here for some pretty pictures (from our forthcoming Red State, Blue State book) that display the distributions of voters, House members, and senators on a common scale.

Animated adiposity

| 2 Comments

Rebecca sends in this animated graph and writes, "all the white states initially are a bit deceptive, but even so, it's pretty striking, and the animation is very effective." I think I'd prefer a time series of the national average, along with a color-coded animated map showing each state relative to the national average in each year.

You need one of these before you can do this wonderful demonstration. What's amazing to me is that the entry has 34 comments. I mean, what's there to say about kitchen scales?

Alex Tabarrok has an interesting discussion of saving strategies. Alex writes:

There are people who don't save much because they have very low incomes, their behavior does not seem to be in error, especially when we take into consideration the various welfare programs that will cover people in their old age. . . . So let's focus on people with moderate to high incomes. . . . Over confidence and in particular the idea that we are special and will live a long life suggests the error is saving too much. . . . Availability bias probably also suggests we save too much - we see people who saved too little in the street but the ones who saved too much are dead and gone. . . . I do not know which error is more prevalent but if we are to be neither spendthrift nor miser we need to recognize both types of error.

My guess is that Alex is a little too optimistic about people's savings strategies, given all the credit card debt out there. Also, as some of his commenters note, it's easy for people to get used to a particular spending pattern, and it's easier to ramp it up than to scale it down. So, for psychological purposes, it might be better to plan for a gradually increasing standard of living than something completely flat over time.

But I'm sympathetic with Alex's general point that both kinds of errors are relevant. It reminds me of when I asked the students in my decision analysis class to raise their hands if they'd never missed a flight. I then said to them: You go to the airport too early! A retrospective rather than a prospective analysis but still essentially correct, I think.

Rey De Castro writes:

I have a longitudinal data set that needs imputation, but the problem doesn't seem to resemble a typical imputation situation. So I'm casting about for a reasonably defensible approach that I can implement without tremendous custom-programming effort. My question concerns Bayesian approaches to imputation.

The Situation: I have longitudinal data for each of a group of schoolchildren. Each observation in the series is a multilevel class indicator of several canonical locations (i.e., indoor-home, indoor-school, outdoors, commuting) where the child reported being present during a particular 15-minute interval. Essentially, it's a series giving each child's location over time at 15-minute intervals. There are ~100 children, and each child's series is very long: ~2000 observations.

Alan Lenarcic sent along this. He writes, "The amount of strange Windows settings you have to set is a little daunting."

The American (League) Dynasty

| 15 Comments

Every year, the best players (or at least many of the best players) from Major League Baseball's American League play their counterparts in the National League in the All-Star Game. They played last night; the American League won in the 15th inning. Here's who won, from 1965 (when I was born) to the present, with 1965 at the left and 2008 at the right.

NNNNNNNNNNNNNNNNNNANNANAAAAAANNNAAAAATAAAAAA

The "T" indicates a tie (in 2002): unlike regular games, there is no requirement that the All-Star Game continue until somebody wins, and pitchers are reluctant to pitch too many innings and potentially hurt themselves.

I was born into an era in which the National League won every game. Now, the American League wins (or, at least, doesn't lose) every game. This is happening in a sport where even bad teams beat good teams occasionally, so it's really mystifying. It would be possible to explain a small edge for one league or the other, that persists for a few years --- the league with the best pitcher will have an advantage, for example, and that pitcher can play year after year --- but these effects can't come close to explaining the long runs in favor of one league or the other. Predicting next year's winner to be the same as this year's winner would have correctly predicted 80% of the games in my lifetime...and that's if we pretend the National League won the tie game in 2002. (If we pretend the American League won it, it's 84%).

What would be a reasonable statistical model for baseball All-Star games, and why isn't it something close to coin flips?
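The persistence figures quoted above can be checked directly from the string of results. The sketch below, in R, recodes the tie either way, computes how often "same as last year" picks the winner, and then simulates the longest run you would see in 44 fair coin flips. The coin-flip benchmark is an assumed null model for comparison, not a proposed answer to the question.

```r
# Winners from 1965 to 2008, copied from the post ("T" = the 2002 tie).
results <- "NNNNNNNNNNNNNNNNNNANNANAAAAAANNNAAAAATAAAAAA"

# Share of years in which "same as last year" picks the winner,
# after recoding the tie as a win for `tie_as`.
persistence_accuracy <- function(results, tie_as) {
  winners <- strsplit(results, "")[[1]]
  winners[winners == "T"] <- tie_as
  mean(winners[-1] == winners[-length(winners)])
}

print(persistence_accuracy(results, tie_as = "N"))
print(persistence_accuracy(results, tie_as = "A"))

# Longest run of the same winner in a sequence
longest_run <- function(winners) max(rle(winners)$lengths)

# Benchmark: longest run in 44 independent fair coin flips (assumed null model)
set.seed(1)
sim_runs <- replicate(10000, longest_run(sample(c("A", "N"), 44, replace = TRUE)))
print(quantile(sim_runs, c(0.5, 0.95, 0.99)))
```

Comparing the observed runs with the simulated quantiles gives a rough sense of how far the record is from independent coin flips.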

R is too strongly typed

| 6 Comments

I fit a multilevel model in R and called it M2, then innocently put together some coefficients in order to make a prediction:

a.hat <- fixef(M2)["(Intercept)"] + fixef(M2)["u.full"]*u + ranef(M2)$county

Then I tried

a.hat[26]

and got the following response:

Error in `[.data.frame`(a.hat, 26) : undefined columns selected

OK, OK, I had to go back and change the original line to:

a.hat <- fixef(M2)["(Intercept)"] + fixef(M2)["u.full"]*u + unlist(ranef(M2)$county)

This is a pain, because sometimes it's "as.vector," sometimes it's "as.numeric," sometimes it's something else. It's so hard sometimes to just access the data. R is so strongly typed now that I have to waste a lot of time simply extracting things from objects that I already have sitting in memory.
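The error can be reproduced without fitting a model. This is a small sketch in base R in which a one-column data frame stands in for ranef(M2)$county. The numbers and the names county_effects, b0, and b1 are made up for illustration. Arithmetic with a data frame returns a data frame, and single-bracket indexing on a data frame selects columns, which is where "undefined columns selected" comes from.

```r
# Stand-in for ranef(M2)$county: a one-column data frame of county intercepts.
# (Made-up numbers; the column name mimics the "(Intercept)" column.)
county_effects <- data.frame("(Intercept)" = c(-0.3, 0.1, 0.2),
                             row.names = c("county1", "county2", "county3"),
                             check.names = FALSE)
u <- c(0.5, -0.2, 0.1)   # hypothetical county-level predictor
b0 <- 1.5                # stand-ins for the two fixef() values
b1 <- 0.7

# Arithmetic with a data frame returns a data frame, so [i] selects a column
a.hat.df <- b0 + b1 * u + county_effects
print(class(a.hat.df))
result <- tryCatch(a.hat.df[3], error = function(e) conditionMessage(e))
print(result)

# Pulling out the column first gives a plain numeric vector
a.hat <- b0 + b1 * u + county_effects[["(Intercept)"]]
print(a.hat[3])
```

Extracting the column by name with [[ ]] is one alternative to unlist() that avoids guessing among as.vector, as.numeric, and the rest.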

Dept of silly graphs

| 5 Comments

Bill Harris points to this:

directv.png

Bill writes:

I've always felt that Joe Queenan has gone straight downhill since "If You're Talking to Me, Your Career Must Be in Trouble," but, following this link from Fabio Rojas, I see an interesting recent article from Queenan. I didn't know he did serious stuff too. (Yes, I know that Queenan's claims are debatable--in particular, I'm not sure where he would fit Stravinsky's work (up to the mid-1920s) in his system--but he makes interesting points.) Mainly, I'm just interested to see that he's writing something closer to his earlier standards.

Sandra McBride writes:

My current model (in Winbugs) runs very slowly on my very slow laptop, so I am getting a new desktop. Here are some questions in the hopes of speeding up my model:

Is Winbugs or Openbugs multithreaded? (Then I'd buy a quad core rather than a duo core)

When using from within R, is Openbugs faster than Winbugs in general?

My reply: I don't know, but I've heard that JAGS is the fastest. I'm not sure if R2WinBUGS is set up to run JAGS, but if not, it could be done (right, Yu-Sung)? Also, I would think that in a parallelized implementation of R, it would be straightforward for R2WinBUGS to run 4 chains and send one to each of 4 processors; however, I don't know that this has actually been implemented.

In any case, I'm hoping that not too far in the future we'll have a version of Bugs that is much faster, at least for the sorts of hierarchical regression models that Bugs currently chokes on.
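The one-chain-per-processor idea can be sketched with R's parallel package. The example below does not touch R2WinBUGS, WinBUGS, OpenBUGS, or JAGS. The function run_chain is a hypothetical toy sampler (random-walk Metropolis for a standard normal target) standing in for whatever would launch a single chain.

```r
library(parallel)

# Toy stand-in for one MCMC chain: random-walk Metropolis targeting N(0, 1).
# In real use this function would instead launch one chain of the actual model.
run_chain <- function(seed, n_iter = 2000) {
  set.seed(seed)
  draws <- numeric(n_iter)
  current <- rnorm(1, sd = 5)            # overdispersed starting point
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1)
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log_ratio) current <- proposal
    draws[i] <- current
  }
  draws
}

# mclapply() forks processes, which is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 4L
chains <- mclapply(1:4, run_chain, mc.cores = n_cores)

# Discard the first half of each chain and compare the chain means
kept <- sapply(chains, function(x) x[(length(x) / 2 + 1):length(x)])
print(colMeans(kept))
```

Because the chains are independent, this part of the computation splits cleanly across processors, and the speedup is limited mainly by the slowest chain.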

"Frenchman"?

| 7 Comments

Do people really still use this word? From the context, Cowen appears to be using it to mean "French person" rather than "French man," so maybe he is being ironic? I admit to some nostalgia for various old-fashioned ethnic descriptors that aren't exactly offensive but still don't really get used anymore, such as Chinaman, Jewess, Turk. Something like an old Sam Spade novel where "the Turk" comes out of an alley with a knife, or whatever. Recently I've been hearing Latinos (Hispanics) refer to themselves as "Spanish," which is kind of cool.

OK, time to get back to work.

Popularity and readability

| 13 Comments

Seth had this discussion where he quoted Nassim Taleb's Black Swan book, to which someone commented that Taleb's books are "unreadable," to which Seth responded:

If The Black Swan is “so unreadable” why has it been so popular?

Now this is an interesting question. Not so much about The Black Swan (which I liked) but about the more general question of whether a bestseller must be readable. Obviously, readability helps, but are popular books "readable"? I can think of two issues:

1. Books such as "A Brief History of Time" or, to take Michael Kinsley's famous example, "Deadly Gambits," which people buy but never get around to reading.

2. Books which seem supremely readable when they come out but don't age well. A lot of bestsellers are like that, I imagine. If you go back to a bestseller list from decades ago, I think you'd see books that would not be so easy to read today. What I'm getting at is that "readability" is not just a property of the book, it also depends a lot on the reader.

Vector autoregression

| 2 Comments

Yefin Dain writes:

Integrate this, pal

| 3 Comments

socparticles.png

I copied this image over here, certain that I'd be able to add a witty remark of my own, but I give up.

538

| 2 Comments

Julie Rehmeyer has a nice article up about Nate Silver's election models. A nice motivator for all the quantitatively minded students out there.

Opening Day

| 2 Comments

Nathan Yau writes,

I recently put up a visualization showing the spread of walmarts over time, ... I'm wondering if you know of any other "opening dates" data (starbucks, for example)? I'm itching to put some more data into my code.

Here are some hilarious (if you're a statistician) sketches from Stephen Senn:

Robustnik "These are the three laws of robustics. First law: get a computer. Second law: get a bigger computer. Third law: what you really need is a much bigger computer." Favourite reading: I Robust, by Isaac Azimuth.

Frequency Freak
"Did you randomise? OK: so far so good. Now what would you have said if the third value from the left had been the second from the right. Hold on a minute. Are you sure you haven't looked at this question before?" Favourite reading: Casino Royale.

Bog Bayesian
"All you need is Bayes. It's the answer to everything. If only Adolf and Neville could have exchanged utility functions at Munich we could have saved the world a whole lot of bother round about the middle of the last century." Favourite reading: The Hindsight Saga.

Subset Surfer
"OK, so the egg's rotten but parts of it are excellent." Favourite reading: Europe on $5 a day.

Gibbs Sampler
"First catch your likelihood. Take one Super Cray, a linear congruential generator, any prior you like and if the whole thing isn't done to a turn within three days my name's not Gary Rhodes." Favourite reading: Mrs Beaton

Complete Consultant
"First we test the randomisation. Then we look for homogeneity between centres. Then we run the Shapiro-Wilks over it and if you like we'll throw in a Kolmogorov-Smirnov at no extra cost. Then we test for homogeneity of variance and look for outliers and even if that's OK we'll do a Mann-Whitney anyway just to be on the safe side. All this will be fully documented in a report with our company logo on every page." Favourite reading: The Whole Earth Catalogue.

Mr Mathematics
"I just don't see the problem. All you have to do is define the null hypothesis precisely, define the alternative hypothesis precisely, choose your type I error rate and use the most powerful test." Favourite reading: Brave New World.

Bootstrapper
"Look, this is the way to build the football team of the future. You choose a player. You put him back in the pool. You choose again. Do that long enough and if you don't eventually get a team which has Becks in it three times my name's not Sven Goran Erikson." Favourite reading: Bradley's Shakksperrr.

Unconditional Inferencer
"It's true that all the engines are on fire and the captain has just died from a heart attack but there's no need to worry because averaged over all flights air travel is very safe." Favourite reading: Grimm's Fairy Tales

And many more:

In the Playroom today, I came across a book called "A Design for Scholarship," a collection of speeches from 1935-1936 by Isaiah Bowman, president of Johns Hopkins University. Flipping through, I came across this quote:

If you wish to live in bovine contentment, the University is no place for you.

Things sure have changed, huh?

The Graduate Junction

| 2 Comments

Esther Dingley sent an email about this site which is intended to help graduate students share research ideas. I'm not sure where it falls in the spectrum from Facebook to Wikipedia, but perhaps it will be useful. Looking up some of my own research interests, I found nothing for "statistics" or "political science," but there was a group for "social networks."

More graphical propaganda

| 3 Comments

John Sides reproduces this graph showing Kenyan election results:

kenyaexitpoll.PNG

What a horrible graph! The re-coloring and re-ordering of the wedges make the differences between "official results" and "poll" seem much greater than they are.

As in my earlier example of PDA (propaganda data analysis), I have no comments on the merits of the case (for example, what can you learn from a poll taken six months after the election)--I'm just weighing in on the graphical presentation.

John Kastellec writes:

Let's say you wanted to estimate a multilevel model with an interaction in the individual-level model, say:

Pr(y=1) = logit^{-1}(B0 + B1X + B2Z + B3XZ)

and you wanted to allow the interaction effect to vary by group. Would the correct procedure be to allow all the coefficients to vary by group, then interpret the main and interactive effects as you would normally (i.e. for each group)?

Yup.
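To spell out what "allow all the coefficients to vary by group" means, here is one way to write the model. The notation is an assumption for illustration: j[i] is the group containing observation i, and the multivariate normal group-level distribution is the usual default rather than something stated in the question.

```latex
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(
  \beta_{0j[i]} + \beta_{1j[i]} x_i + \beta_{2j[i]} z_i + \beta_{3j[i]} x_i z_i
\right),
\qquad
(\beta_{0j}, \beta_{1j}, \beta_{2j}, \beta_{3j})^{\top}
  \sim \mathrm{N}(\mu_\beta, \Sigma_\beta).
```

Within group j, the coefficient on x at a given value of z is then \beta_{1j} + \beta_{3j} z on the logit scale, which is the usual interaction interpretation applied group by group. In lmer-style formula notation this corresponds roughly to y ~ x*z + (1 + x*z | group).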

David Ross writes,

Greg Mankiw writes,

Cass Sunstein and Justin Wolfers say we don't really know whether or not capital punishment deters crime.

Maybe so, but it does solve the problem of recidivism.

He links to a news article that refers to an excellent article by Wolfers and Donohue. But I don't think Mankiw is correct about capital punishment solving recidivism. A key aspect of the death penalty in the U.S. is how rare it is for prisoners to actually be executed. I don't see how you solve the problem of recidivism by executing on the order of a hundred people a year. And, given that already our best estimate is that a person who is sentenced to death has a two-thirds chance of having that sentence reversed by a higher court, it's hard for me to believe that the rate of executions can be increased very much.

Henry presents another example of more educated voters being more ideological:

Graph of inequality by political information

The above graph (from Larry Bartels) shows the probability that liberals or conservatives agree with the statement that income inequality between rich and poor people has increased. The two groups diverge in their attitudes as they get more information.

Democrats can get things wrong, too

The above is an example where conservatives with high information levels get things wrong. Just for balance, here's an example (also from Larry Bartels) where Democrats are the ones in error. The example is in chapter 8 of our forthcoming Red State, Blue State book:

Even objective features of the economy are viewed through partisan filters. For example, a survey was conducted in 1988, at the end of Ronald Reagan’s second term, asking various questions about the government and economic conditions, including, “Would you say that compared to 1980, inflation has gotten better, stayed about the same, or gotten worse?” Amazingly, over half of the self-identified strong Democrats in the survey said that inflation had gotten worse and only 8% thought it had gotten much better, even though the actual inflation rate dropped from 13% to 4% during Reagan’s eight years in office.

Regroove

| 2 Comments

Stephen Burt's recent article on Philip K. Dick was quoted with approval by Jenny Davidson, but I wasn't impressed. For one thing, Jack Isidore regrooves tires, he doesn't retread them. Also, I don't think Burt did a good job at addressing how funny Dick's books are--even Scanner, which is so serious, is also hilarious. Finally, I don't get the bit at the end of Burt's essay where he speculates on other science fiction writers whose work could be collected in the Library of America. Maybe Dick would be better characterized with authors such as James Jones who create recognizable worlds using whatever literary tools they happen to have at hand.

P.S. Link above fixed.

In response to some of the questions about our graphs on state liberalism/conservatism:

- A lot of surveys don't include Alaska and Hawaii. I guess in the days of face-to-face surveys these places were too far to go to, and even for telephone surveys you have to deal with time zones.

- I can't remember the sample sizes, but in the small states they're not huge, so you can't take seriously the exact ordering of all the states in the graphs. When David gets back in town we can take a look at the uncertainty in these estimates.

- Could we look at dispersions as well as averages within each state? Yes, but I don't know that we'd get much out of this; dispersion measures are notoriously noisy.

- We show positive numbers as conservative and negative numbers as liberal because the number line goes from left to right.

- Yes, it would be interesting to look at other issue dimensions such as foreign policy.

- Some people asked what exactly was in our scales. From page 195 of our red-state, blue-state book:

Andrew Sullivan links to this news article which links to this research article by Stamos Karamouzis and Dee Wood Harper called "An Artificial Intelligence System Suggests Arbitrariness of Death Penalty":

Chris Weiss writes with a question about propensity score matching with multilevel data:

Here's the title/abstract for my talk at the R conference in August:

Many statistical methods of all sorts have tuning parameters. How can default settings for such parameters be chosen in a general-purpose computing environment such as R? We consider the example of prior distributions for logistic regression.

Logistic regression is an important statistical method in its own right and also is commonly used as a tool for classification and imputation. The standard implementation of logistic regression in R, glm(), uses maximum likelihood and breaks down under separation, a problem that occurs often enough in practice to be a serious concern. Bayesian methods can be used to regularize (stabilize) the estimates, but then the user must choose a prior distribution. We illustrate a new idea, the "weakly informative prior," and implement it in bayesglm(), a slight alteration of the existing R function. We also perform a cross-validation to compare the performance of different prior distributions using a corpus of datasets.

The title is "Bayesian generalized linear models and an appropriate default prior," and it's based on this paper with Aleks, Grazia, and Yu-Sung.
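To make the separation problem in the abstract concrete, here is a minimal Python sketch. It is not the bayesglm() implementation: the toy data, the function name `fit_slope`, the plain gradient-ascent fitting, and the Cauchy prior with scale 2.5 are all assumptions chosen for illustration. With perfectly separated data, the maximum likelihood slope keeps growing the longer the optimizer runs, while adding the log-density of a heavy-tailed prior pulls the estimate to a finite value.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_slope(x, y, prior_scale=None, steps=500, lr=0.1):
    """Gradient ascent for a one-coefficient logistic regression (no intercept).

    If prior_scale is given, the gradient of a Cauchy(0, prior_scale)
    log-density is added, so the result is a posterior mode rather than
    a maximum likelihood estimate. (Hypothetical helper for illustration.)
    """
    b = 0.0
    for _ in range(steps):
        # Gradient of the Bernoulli log-likelihood with respect to b.
        grad = sum(xi * (yi - sigmoid(b * xi)) for xi, yi in zip(x, y))
        if prior_scale is not None:
            # d/db of -log(1 + (b / s)^2) is -2b / (s^2 + b^2).
            grad -= 2.0 * b / (prior_scale ** 2 + b ** 2)
        b += lr * grad
    return b


# Toy data with complete separation: y is 1 exactly when x is positive.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

# Maximum likelihood: the estimate depends on how long we iterate.
ml_short = fit_slope(x, y, steps=500)
ml_long = fit_slope(x, y, steps=50000)

# Posterior mode under the assumed Cauchy prior: the estimate settles down.
map_short = fit_slope(x, y, prior_scale=2.5, steps=500)
map_long = fit_slope(x, y, prior_scale=2.5, steps=50000)

print("ML slope after 500 and 50000 steps:", ml_short, ml_long)
print("Posterior mode after 500 and 50000 steps:", map_short, map_long)
```

Running more iterations moves the maximum likelihood slope further from zero, which is the breakdown the abstract refers to, whereas the two posterior-mode runs agree closely. The sketch drops the intercept and uses a single predictor only to keep the arithmetic easy to follow.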

Patent absurdity

Jouni writes,

Here is a link (see also here) to a patent on Bayesian linear regression. Yes, they call their algorithm an "invention."

A simple yet powerful Bayesian model of linear regression is disclosed for methods and systems of machine learning. Unlike previous treatments that have either considered finding hyperparameters through maximum likelihood or have used a simple prior that makes the computation tractable but can lead to overfitting in high dimensions, the disclosed methods use a combination of linear algebra and numerical integration to work a full posterior over hyperparameters in a model with a prior that naturally avoids overfitting. The resulting algorithm is efficient enough to be practically useful. The approach can be viewed as a fully Bayesian version of the discriminative regularized least squares algorithm.

Now, hurry up and patent Bayesian nonlinear regression before they do it.

Jouni continues:

Maybe we should all be submitting our papers to the patent office instead of journals? Perhaps they would be more easily accepted?

It's all fun and games until they sue your a$$. . . .

In our blog we get useful comments about R programming, data sources, the philosophy of science, and even suggestions for book covers. But every now and then we get mentioned by big blogs, and then I'm reminded what real blog commenters are like.

Sudhir Venkatesh mentioned us in the Freakonomics blog. Among the 27 comments were:

A legal mystery

Maybe someone can explain this to me?

Our publisher is putting together our new book (no, not Red State, Blue State, I'm talking about our next book, A Quantitative Tour of the Social Sciences), and we need a cover design. Now. Any ideas? Free book to the person with the best idea. And anybody with a particularly good idea, I'll take to lunch. (Or maybe Jeronimo, my coeditor, will take you to lunch if you're in Houston...)

Some background: The book has sections on history, economics, sociology, political science, and psychology, and each section has a different author (or set of authors). It's not a statistics book; rather, it's a set of discussions and case studies, giving the reader (most likely a student of one of the social sciences) a sense of how to think like a historian, economist, sociologist, etc. It's based on a course I created for our Quantitative Methods in Social Science program at Columbia. Anyway, there will be plenty of time for book promotion later; now, I'm just trying to give you enough information to come up with a good cover design for us.

Here's the table of contents:
