October 2008 Archives

Sure, I knew it was a desert. But I didn't realize that so few people lived there.

Let's get conjugate

David Shor writes:

I'm working on a projection system on election night, and came across a case where I have a binomial distribution with an unknown number of trials.

Is there a good conjugate prior in such a situation?

My reply: There are some articles on this by Adrian Raftery in the late 1980s. You can find references in Bayesian Data Analysis, including a homework assignment in chapter 3, I believe.
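To show the shape of the problem, here is a brute-force alternative to a conjugate analysis. It's a minimal grid-posterior sketch, not Raftery's method, and it assumes that the success probability p is known and that N has a flat prior up to a cap `n_max`. The function name, the cap, and the counts are all hypothetical.

```python
import math

def posterior_n(counts, p, n_max):
    """Grid posterior for the binomial number of trials N.

    Assumes: each count y ~ Binomial(N, p) independently, p is known,
    and N has a flat prior on max(counts)..n_max (a hypothetical cap).
    Returns a dict mapping each candidate N to its posterior probability.
    """
    n_min = max(counts)
    log_post = {}
    for n in range(n_min, n_max + 1):
        # Binomial log-likelihood summed over the observed counts
        log_post[n] = sum(
            math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1 - p)
            for y in counts
        )
    # Normalize on the log scale to avoid underflow
    top = max(log_post.values())
    weights = {n: math.exp(lp - top) for n, lp in log_post.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

post = posterior_n([12, 15, 9], p=0.3, n_max=200)
mode = max(post, key=post.get)
print(mode)
```

With a flat prior, the posterior mode is just the maximum-likelihood N. The references above address the harder, more realistic case in which p is unknown as well.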

Tom Knapp writes:

I have four questions and one correction about your article about scaling regression inputs in Statistics in Medicine:

I just received by email a request to review a manuscript called "Acute Inflammatory Proteins Constitute the Organic Matrix of Prostatic Corpora Amylacea and Calculi in Men with Prostate Cancer." The abstract is below:

Phoenix Suns shooters

Yair sends in this plot of the week:

suns-wings.png

He writes:

This displays the smoothed distribution of shots taken by wing players for the Phoenix Suns in the '07-'08 regular season (Matt Barnes played for the GS Warriors that year). Raja Bell seems like the perfect wing player for the Suns, because he plays defense and then basically sits at the 3-pt line waiting for Steve Nash to give him the ball for a good shot. Leandro Barbosa is similar, but he drives a bit more (especially when Nash is off the floor). Grant Hill didn't fit this mold because he has no 3-pt shot; he is more of a mid-range guy. From this standpoint, Matt Barnes (their free-agent pickup) looks like he could be a better fit. Of course, this plot says nothing about whether he actually hits the threes, but at least his heart is in the right place. Then again, if their offensive system changes because of the new coach, all bets are off.

Pretty graphs, huh? The color scheme seems good for a team called the Suns.

Greenspan said (on the topic of the present financial crisis):

"The whole intellectual edifice, however, collapsed in the summer of last year because the data inputted into the risk management models generally covered only the past two decades — a period of euphoria."

2004/2008

How is the 2008 election different from 2004, beyond the (currently predicted) national swing of about 4 percentage points (enough to move from Kerry's 49% of the vote to 53% for Obama)?

Here's a graph of Obama's predicted share of the two-party vote in each state (based on Nate Silver's recent poll aggregation) compared to Kerry's in 2004:

2004_2008.png

I then fit a simple linear regression; here's a map of the residuals, showing where Obama is doing particularly well or poorly, compared to last time:

2004_2008_map.png
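The regression-and-residuals step can be sketched in a few lines. This is a minimal sketch that assumes two arrays of state-level two-party vote shares. The numbers below are made up for illustration; they are not the actual 2004 results or Nate Silver's aggregates, and `swing_residuals` is a hypothetical helper name.

```python
import numpy as np

def swing_residuals(kerry_2004, obama_2008):
    """Least-squares fit of 2008 share on 2004 share; returns (intercept, slope, residuals).

    Positive residuals mark states where Obama runs ahead of what the 2004
    vote alone predicts; negative residuals mark states where he lags.
    """
    x = np.asarray(kerry_2004, dtype=float)
    y = np.asarray(obama_2008, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # highest-degree coefficient first
    residuals = y - (intercept + slope * x)
    return intercept, slope, residuals

# Hypothetical two-party shares for five states (illustration only)
kerry = [0.40, 0.45, 0.50, 0.55, 0.60]
obama = [0.45, 0.48, 0.56, 0.58, 0.64]
a, b, res = swing_residuals(kerry, obama)
print(round(b, 3), np.round(res, 3))
```

Mapping `res` by state gives the kind of residual map shown above.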

See here for further discussion and more graphs.

See here for more (including the link to the article by Nate Silver, Aaron Edlin, and myself describing what we did).

decisive1.png

decisive2.png

[Typo in caption to figure 1 fixed, thanks to commenters.]

Bill Harris writes:

When I taught a graduate course at UW last year, I followed this sequence:

- Student reading assignment
- Student homework on the reading
- Lecture and peer instruction on the reading
- Homework graded and returned

Many reported they'd much prefer something like

- Student reading assignment
- Lecture and peer instruction on the reading
- Student homework on the reading
- Homework graded and returned

Do you have any pointers to evidence as to which sequence works best? I had been concerned that the latter approach involved students in three sets of work each week:

- Reading the new material to prepare for class
- Reviewing the previous week's material to do the homework
- Reviewing the material from two weeks ago to understand the feedback on the returned homework

but I guess there could be advantages in that. Thoughts?

I'm embarrassed to admit I don't have any thoughts on this at all, but, yes, there must be some research on the topic. Can anybody help here?

Red State, Blue State this week

Good Roads Everywhere

This formula is so, so important. It tells you that when you have two sources of variation, only the larger one matters (unless the variances are very close to each other). It comes up all the time in multilevel modeling.
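The formula itself isn't reproduced in the post, so take this as an assumption: the usual rule for two independent sources of variation is that standard deviations add in quadrature, $\sigma_{\text{total}} = \sqrt{\sigma_1^2 + \sigma_2^2}$. A quick numerical check, using a hypothetical helper `total_sd`, shows how little the smaller source contributes.

```python
import math

def total_sd(sd_1, sd_2):
    """SD of the sum of two independent sources of variation (assumed formula)."""
    return math.sqrt(sd_1**2 + sd_2**2)

# Hold the larger source at 1 and shrink the smaller one.
for sd_small in (1.0, 0.5, 0.3, 0.1):
    total = total_sd(1.0, sd_small)
    print(f"{sd_small:.1f} -> total {total:.3f} "
          f"({100 * (total - 1.0):.1f}% above the larger source alone)")
```

When the two sources are equal, the total is about 41% larger than either one. When the smaller source is 0.3, the total is only about 4% larger than the larger source by itself.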

Bill Richardson and Dick Williams

I was reading a book by William Manchester--he's great, by the way, just like George V. Higgins said--and then I started thinking about his alternative name. I'm thinking that "Henry Birmingham" is the best match.

P.S. Or maybe "Rich Williamson" is better. But I think that the Dick/Bill parallel is best. "Rich" is more like "Will."

Ted Dunning writes:

Google Analytics normally does a pretty good job of dealing with statistical issues. For instance, the Google Website Optimizer product does a correct logistic regression complete with error bars and (apparently) Bayesian analysis of how likely one setting is to actually be better than another.

But their demo of their latest visualization product is worth a write-up. They seem to ascribe volumes of meaning to variations in small-count statistics.

Check out the video.

As Aleks knows, I can't bear to watch videos. I like the idea of dynamic graphics, but I can't stand the lack of control that comes from watching a video. I like to read something that I can see all of at once.

But the Google tool looks pretty cool. Also, I didn't know they did Bayesian logistic regression. I wonder what prior distribution they use? This is a topic that my colleagues and I have thought about.
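I don't know what Google actually does or which prior it uses. As a hedged illustration of "how likely one setting is to actually be better than another," here is the simplest Bayesian version. It swaps in a beta-binomial model for logistic regression and assumes independent uniform Beta(1, 1) priors on each conversion rate. The helper name and the counts are hypothetical.

```python
import numpy as np

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000, seed=0):
    """Monte Carlo estimate of Pr(rate_B > rate_A) under Beta(1, 1) priors.

    With a Beta(1, 1) prior and binomial data, each rate's posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = np.random.default_rng(seed)
    rate_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=draws)
    rate_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=draws)
    return float(np.mean(rate_b > rate_a))

# Small counts: the apparent difference is weak evidence.
print(prob_b_beats_a(conv_a=3, n_a=40, conv_b=5, n_b=40))
# Same observed rates with 100 times the data: far more decisive.
print(prob_b_beats_a(conv_a=300, n_a=4000, conv_b=500, n_b=4000))
```

The contrast between the two calls is Ted's small-counts point. The same observed difference means much less when it rests on a handful of conversions.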

Ted continues:

I hate BIC blah blah blah

It's all in chapter 6 of Bayesian Data Analysis. Anyway, Sam Gershman wrote to me:

The election is coming up so this is our last DC event . . . I'll be speaking on Red State, Blue State this Mon, 27 Oct, at the New America Foundation. The event will be from 12.15-1.45, and there will be a discussion by David Frum. Frank Micciche of the New America Foundation will moderate. Info is here.

Below is the description of the event. (My coauthors won't be present at the talk but they will be implicitly there, as I'm presenting our joint research.)

"Binky Urban"

J. Robert Lennon writes about the end of the publishing industry, a story in which the improbably named "Binky Urban" plays a role. The most interesting aspect, to me, is the difference between having a paying job and not. It's gotta be so difficult to do your work in a setting where you feel you need to make money from it in order for you to keep doing it.

Why Model?

Stan pointed me to a short article "Why Model?" by J. M. Epstein. The default principle, both in statistics and in machine learning, is to predict. Any act of statistical fitting that involves likelihood is inherently predictive in its nature.

Visualization is in no way different from predictive modeling - it's just that the (sometimes implicit) model is transparent and interpretable. Visualization is not the only type of interpretable model: even a table with regression coefficients is interpretable, a decision tree is an interpretable model, a list of typical cases is an interpretable model. A 2D scatter plot that nicely shows the difference in outcomes is a model, because the two dimensions used by the plot indeed help distinguish the outcomes.

Most priors are grounded purely in the desire to capture the truth; as such, they are predictive priors. But interpretable models involve priors that are grounded not in prediction but in the human cost of interpretation. The more difficult a parameter is to interpret, the lower the prior probability it should be given.

In summary, while most mathematical treatment of statistical modeling tends to be focused purely on prediction, there is a good reason why the cost of interpretation should be considered. Epstein's list of why interpretability matters should motivate us to care:

I've always wanted to write something for the Wichita Eagle . . .

P.S. My proposed title was, "What's the Matter with Kansas? Nothing--and the data prove it." I don't mind the revision but I would always always write "data" as plural!

Mathematics.

Statistics.

Some differences:

- Tao uses more words. This makes sense: he's busy explaining this stuff to himself as well as to his readers. To a statistician, these ideas are so basic that it's hard for us to really elaborate. (Also, I had a word limit.)

- Tao emphasizes that a confidence interval is not a probability interval. In my experience, confidence intervals are always treated as probability intervals anyway, so I don't spend time with the distinction.

- I emphasize that a poll is a snapshot, not a forecast.

- Tao says that the number of polled voters is fixed in advance. I don't think this is exactly true, what with nonresponse.

- Tao fills his blog entry with Wikipedia links. Wikipedia is ok but I'm not so thrilled with it; I'm happy with people looking things up in it if they want but I won't encourage it.

But we're basically saying the same thing. I like how I put it, but I'm sure a lot of people prefer Tao's style. Luckily there's room on the web for both!
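For readers who want the arithmetic behind both explanations, here is the textbook approximate 95% margin of error for a poll proportion, $1.96\sqrt{\hat p(1-\hat p)/n}$. This minimal sketch assumes simple random sampling and plugs in the number of people who actually responded, which, given nonresponse, isn't necessarily fixed in advance. The function name and poll numbers are hypothetical.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes simple random sampling; n is the realized number of respondents.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A 1,000-respondent poll showing 52% support: roughly +/- 3 percentage points.
print(round(100 * margin_of_error(0.52, 1000), 1))
```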

See discussion here.

ANOVA and the mixed-model muddle

Rick DeShon writes,

As I read through your discussion paper on the analysis of variance published in the Annals of Statistics in 2005, I became a bit confused about the connections between your notion of parameter batches and prior work on the topic of fixed and random effects. Specifically, I wonder how your approach connects to Nelder's "great mixed model muddle?"

My talks in Toronto

I finished Personal Days

Great ending. And, now that it's over, it reminds me even more of Jonathan Coe. Just one thing is bugging me now: what did the people in that office actually do for work? I mean, I know that it's on purpose that we're not told, but I'm still curious.

I just started the last section of Ed Park's Personal Days. This final section appears to be a long rambling letter of the unreliable-narrator type, such as concludes The Rotter's Club. It reminds me of a particularly asinine passage in the incredibly overrated Gödel, Escher, Bach, which for some horrible reason I remember after nearly thirty years. Hofstadter writes about how, when you read a book, you know you're coming to the end, which affects your expectations, unlike in real-life stories or in a movie of indeterminate length, where the end can come as a surprise. The natural solution for a book would be to pad it with an indeterminate number of empty pages--not completely blank, of course (that would be too obvious), but filled with sentences that are clearly different from the main story. Hofstadter fatuously concluded that this would be impossible: to be convincing, the fake story would have to be close enough to the real one that, essentially, it would be part of the main narrative. But that's completely wrong: it would be easy enough to just have an only barely related story at the end, and then when the main story really did end, for example on page 240, the author could just have a paragraph saying, "This is the end of the story. The rest is padding," or something like that. I mean, you're not expecting the reader to look too carefully at the end matter: either it's really part of the book and the reader wouldn't want to lose the suspense, or it's fake matter, in which case the reader would still like to preserve the suspense of the story's actual length.

But that's not what I was planning to write about. What does Personal Days remind me of (besides its being a remake of Then We Came to the End)? The similarly alphabetically structured Kafkaesque office-nightmare story Forlesen, for one thing. Although, oddly enough, Gene Wolfe was a Republican when he wrote that story, I think. The focus is different, though: the office takes up almost all of Forlesen's lifetime, but his family is ultimately what is central and nobody in the office is real to him; in Personal Days, only the office is real, and the characters have no families.

My favorite things in Personal Days so far are the management-speak in the Jilliad and the goofy three-syllable restaurant names.

I pretty much couldn't keep the characters straight, even when I was reading the book. But I suspect this is part of the point. We'll see how I feel when I'm all done.

P.S. I am still training myself in writing with precision: two paragraphs above where it says "My favorite things," I originally had the sloppier "The best things." On the other hand, editing a blog entry is almost the definition of a waste of time. On the other other hand, I like to think this keeps me in practice for more important writing efforts.

P.P.S. I think I am ideally qualified to use the term Kafkaesque, having never read anything by Kafka except the first two pages of that story they give you to read in high school, where Gregor Samsa wakes up as a bug. I've read too much Orwell to be comfortable with "Orwellian."

P.P.P.S. Can blogs do hypertext? The Hofstadter digression in the first paragraph above belonged just where it did, but it's a distraction from my main points. I'd like to be able to enter it as some sort of clickable sidebar (without going to the trouble of setting it up as its own blog entry, which I just don't want to do).

Our Cato event from last month will be on Book TV on C-Span2 this Sat, 18 Oct, 7pm, and Mon, 20 Oct, 6am. My presentation has gotten a bit slicker since then, but it's still good stuff, and you also get interesting discussions by Brink Lindsey and Michael McDonald.

Hey, I was right!

See here.

Ed Park is a Democrat

I'm about halfway through Personal Days and I'm pretty sure Ed Park is a Democrat. Or something like that, maybe a Green Party member or whatever, but certainly not a Republican. Why? Is it just statistical reasoning, given that he's a youngish writer who lives in NYC? I think it's more than that; there's something about the book that screams "Democrat." Not that a Republican wouldn't make fun of corporate culture, but it would be done in a more affectionate, Christopher Buckley-style way.

I'm not saying every artistic-type writer is a Democrat. For example, I don't know anything about David Foster Wallace's politics, but based on what I've read of him, he could've been a Republican. He probably wasn't, but he had that elitist thing going on.

David Mamet, he's a famous Democrat-turned-Republican, but I think it's fair to say that all along he could've been either. Updike's in the middle of the road, Gore Vidal is to the left of the Democrats but I could picture him as a Republican, sort of. . . .

OK, this is getting pretty pointless . . . clearly it's getting too close to the election for me . . . I'll have to finish Personal Days and tell Jeff whether I recommend it. Caroline read one page and said, hey, isn't this just like that other book you read about those people in an office? I said, yeah, but it's a great theme, surely big enough to hold two good books. I showed her the scene with Grime's typos (I'd been laughing aloud at that), but she didn't quite see the point. Perhaps it was only funny after the pages and pages of implicit setup.

Howard Wainer writes:

On September 22, 2008, the New York Times carried the first of three articles about a report, commissioned by the National Association for College Admission Counseling, that was critical of the current college admission exams, the SAT and the ACT. The commission was chaired by William R. Fitzsimmons, the dean of admissions and financial aid at Harvard.

The report was reasonably wide-ranging and drew many conclusions while offering alternatives. Although well-meaning, many of the suggestions only make sense if you say them fast.

Tyler McCormick, Matt Salganik, and Tian Zheng just wrote this article on using the scale-up method to estimate the size of people's social networks from responses to questions such as "How many people do you know named Kevin?" They build on earlier work by Bernard, Killworth, McCarty, et al., and by Zheng, Salganik, and Gelman. This new paper is great; it takes these methods from the "cool" stage to the "useful" stage.
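The basic idea behind the scale-up method can be sketched as follows. Suppose a respondent knows $y_k$ people in subpopulations of known size $N_k$, and $N$ is the total population. The simple classic estimator of that respondent's network size is then $d = N \sum_k y_k / \sum_k N_k$. This is not the more elaborate model in the new paper. The helper name and the subpopulation sizes below are hypothetical.

```python
def scale_up_degree(known_counts, subpop_sizes, total_pop):
    """Basic scale-up estimate of one respondent's network size.

    known_counts[k]: how many people the respondent reports knowing in group k
    subpop_sizes[k]: size of group k in the population (assumed known)
    total_pop: total population size
    """
    if len(known_counts) != len(subpop_sizes):
        raise ValueError("need one count per subpopulation")
    return total_pop * sum(known_counts) / sum(subpop_sizes)

# Hypothetical: three first-name groups (e.g. "Kevin") with made-up sizes.
sizes = [400_000, 250_000, 350_000]
counts = [2, 1, 1]
print(scale_up_degree(counts, sizes, total_pop=300_000_000))  # 1200.0
```

The respondent knows 4 people out of groups that together make up 1/300 of the population, so the scale-up guess is 4 × 300 = 1200 acquaintances.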

Red Blue at NYU

I'll be speaking Tues 14 Oct (that's tomorrow) 10am on Red State, Blue State at NYU, at 802 Kimmel Center, 60 Washington Square South. Pat Egan will discuss, and then there will be time for discussion. The talk will be open to the public.

I recently became aware of two papers by David van Dyk on a new approach to Gibbs sampling using incompatible conditional distributions. This seems similar to the parameter expansion or redundant parameter idea developed by C. Liu, J. Liu, Meng, Rubin, van Dyk, and others, but perhaps a bit more generalizable and thus usable in routine problems.

Here's the theoretical paper (with Taeyoung Park).

And here's the more applied paper (which has a logistic regression example), with Hosung Kang.

This looks great, although I'm still not sure exactly how to apply this to our problems. Maybe we're getting closer, though...

Feedback

Aleks sent along this article that suggests that debate-watchers are influenced by crowd noise and feedback graphics:

So-called Bayesian methods

Seth points me to these papers:

John P. A. Ioannidis, Effect of Formal Statistical Significance on the Credibility of Observational Associations, Am. J. Epidemiol. 2008 168: 374-383.

Hormuzd A. Katki, Invited Commentary: Evidence-based Evaluation of p Values and Bayes Factors. Am. J. Epidemiol. 2008 168: 384-388.

John P. A. Ioannidis, The Author Responds to "Evaluating p Values and Bayes Factors", Am. J. Epidemiol. 2008 168: 389-390.

I do not, do not, do not have the energy now to comment on these. Let me just say that what is labeled in the above articles as "Bayesian" is not the only way to do Bayesian statistics. I refer you to Bayesian Data Analysis for exposition of what I consider the more reasonable Bayesian approach, which is based on modeling rather than hypothesis testing and never involves computing the posterior probability that the null hypothesis is true.

I can't stop people from doing these other things and I wouldn't even try. But I would like them to be aware of this other, more direct approach. This paper may also help.
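Here is a small, hedged illustration of the modeling-based alternative. It is not the analysis in the papers above. With a normal estimate and a normal prior on the effect, the result is a posterior distribution for the effect size, summarized by an interval, rather than a probability that the effect is exactly zero. The prior scale, the data, and the helper name are assumptions for illustration.

```python
import math

def normal_posterior(estimate, std_error, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean, s.d., and 95% interval for an effect theta.

    Model: estimate ~ Normal(theta, std_error^2), theta ~ Normal(prior_mean, prior_sd^2).
    """
    precision = 1 / std_error**2 + 1 / prior_sd**2
    post_sd = math.sqrt(1 / precision)
    post_mean = (estimate / std_error**2 + prior_mean / prior_sd**2) / precision
    interval = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)
    return post_mean, post_sd, interval

# A "statistically significant" estimate of 0.5 (s.e. 0.2) under an assumed N(0, 0.5^2) prior
mean, sd, (lo, hi) = normal_posterior(0.5, 0.2, prior_sd=0.5)
print(round(mean, 3), round(sd, 3), (round(lo, 3), round(hi, 3)))
```

The prior pulls the estimate partway toward zero. The interval then tells you which effect sizes are plausible, and no null hypothesis has to be assigned a probability.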

A colleague asks,

How do you deal with the following from Alan Abramowitz and Ruy Teixeira's Brookings paper:
Indeed, just how far the Democrat party fell in the white working class' eyes over this time period can be seen by comparing the average white working class (whites without a four year college degree) vote for the Democrats in 1960-64 (55 percent) to their average vote for the Democrats in 1968-72 (35 percent). That's a drop of 20 points. The Democrats were the party of the white working class no longer. . . . Al Gore . . . lost white working class voters in the 2000 election by 17 points. And the next Democratic presidential candidate, John Kerry, did even worse, losing these voters by a whopping 23 points in 2004. One could reasonably ascribe the worsening deficit for Democrats in 2004 to the role of national security and terrorism after 9/11 but the very sizeable 2000 deficit cannot be explained on that basis. Apparently, the successes of the Clinton years, which included a strong economy that delivered solid real wage growth for the first time since 1973, did not succeed in restoring the historic bond between the white working class and the Democrats.

My reply: When you slice things by income, you see a clear pattern of Republicans doing better among the rich of all races (except maybe Asians, but I don't particularly trust those numbers what with small sample size):

national.png

Compared to earlier years, Democrats have lost among less well-educated voters and gained among more educated voters, but their income profile hasn't changed so much. As E.J. Dionne has noted, the Democrats' strength among well-educated voters is greatest among those with household incomes below $75,000--"the incomes of teachers, social workers, nurses, and skilled technicians, not of Hollywood stars, bestselling authors, or television producers, let alone corporate executives."

So a quick answer is that I don't necessarily see a machinist, say, as having more street-cred than a social worker with a graduate degree who makes the same amount of money. As Larry Bartels has pointed out, it's not so easy to identify exactly what is meant by "working class." There have been changes, but remember that the difference in voting between rich and poor has been as large in the past 10 years as it's ever been; see page 47 of the red-blue book. Yes, it's different rich and poor people than before, but it's still there. It's a mistake to think there was a past golden era of class-based voting. Geographic factors were important in voting decades ago, and they are now as well.

See here for my earlier comments on the Teixeira and Abramowitz article.

Finally, David Park made this graph of the trend since the 1950s in the rich-poor voting gap in presidential elections (the Republican vote share among the upper third of income minus the Republican vote share among the lower third). The gray dots represent all voters; the black dots represent whites only (yes, I know, they should be white dots...).

whites.png
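The gap plotted here is a simple difference of proportions. This minimal sketch assumes individual survey records with an income value and a 0/1 indicator of voting Republican, splits income into thirds at the sample tertiles, and ignores survey weights. The records are simulated, not the survey data behind the graph, and `rich_poor_gap` is a hypothetical helper.

```python
import numpy as np

def rich_poor_gap(income, votes_rep):
    """Republican share in the top income third minus the share in the bottom third."""
    income = np.asarray(income, dtype=float)
    votes_rep = np.asarray(votes_rep, dtype=float)
    low_cut, high_cut = np.quantile(income, [1 / 3, 2 / 3])
    bottom = votes_rep[income <= low_cut]
    top = votes_rep[income > high_cut]
    return top.mean() - bottom.mean()

# Simulated voters: Republican probability rises from 0.40 to 0.60 with income rank.
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10.5, sigma=0.7, size=30_000)
rank = income.argsort().argsort() / (income.size - 1)  # 0 = poorest, 1 = richest
votes = rng.random(income.size) < 0.40 + 0.20 * rank
print(round(rich_poor_gap(income, votes), 3))
```

Running the same calculation separately for all voters and for whites only, election by election, would give the two series of dots.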

The rich-poor voting gap among whites has in recent elections been a bit below its 1970s-1990s peak, but it's far from zero. And, what with increasing diversity in the minority population, it's not so clear that "whites" is as useful a category as it once was.

P.S. More here.

A new cost of living index

Boris passed this along. We've struggled with cost of living indexes (see here, here and here), so maybe this will be helpful.

Red-blue roundtable

Here's a fun discussion (still developing, it'll be going through Thursday, I think) on red and blue America, featuring pollster John Zogby, journalist Bill Bishop, consultant Valdis Krebs, and myself, moderated by Tom Nissley at Amazon.com.

My strategy is to make my points using graphs.

I gotta read this article:

The game theoretic study of coalitions focuses on settings in which commitment technologies are available to allow groups to coordinate their actions. Analyses of such settings focus on two questions. First, what are the implications of the ability to make commitments and form coalitions for how games are played? Second, given that coalitions can form, which coalitions should we expect to see forming? I [Humphreys] examine classic cooperative and new noncooperative game theoretic approaches to answering these questions. Classic approaches have focused especially on the first question and have produced powerful results. However, these approaches suffer from a number of weaknesses. New work attempts to address these shortcomings by modeling coalition formation as an explicitly noncooperative process. This new research reintroduces the problem of coalitional instability characteristic of cooperative approaches, but in a dynamic setting. Although in some settings, classic solutions are recovered, in others this new work shows that outcomes are highly sensitive, not only to bargaining protocols, but also to the forms of commitment that can be externally enforced. This point of variation is largely ignored in empirical research on coalition formation. I close by describing new agendas in coalitional analysis that are being opened up by this new approach.

And also this. And then relate all this to my research on coalition formation as a prisoner's dilemma.

Head over to the Red State, Blue State blog for my post on my new measure of the ideology of Senator Barack Obama (and other prominent IL Democrats), based on his service as an Illinois state senator (from Hyde Park). It comes from a new research project of mine on state legislative ideology.

Amazon, U.S.A.

Amazon.com has this cool website showing which sorts of political books people are buying in which states:

amazon.png

What struck me was the similarity of this to the "voting patterns of the rich" map from our book:

3maps.png

I wonder what data from Wal-Mart would look like. Maybe like one of the two lower maps? I'm not sure, though, since, even at Wal-Mart, buyers of political books are more politically active and thus maybe more like "rich people" in their red-blue divisions.

There's a lot going on for those of you in the NY/NJ area.

1. On Monday morning I'm doing an activity on the Electoral College. But you can't come to that unless you're a 4th grader in Zacky's school.

2. Monday 4.30pm at room 801 International Affairs Building (at Columbia), I'm speaking on Red State, Blue State in an event cosponsored by the Columbia Journalism School, with discussions by Nicholas Lemann and Thomas Edsall and moderated by Sharyn O'Halloran.

3. Monday 7pm at the Princeton Club in midtown Manhattan, I'm speaking and signing books. You can only go to this one if you're a member of the club, I think.

4. Tuesday 4.30pm at Robertson Hall at the Woodrow Wilson School at Princeton University, there's an event sponsored by the New York and New Jersey chapters of the American Association for Public Opinion Research, featuring Joe Lenski, Chris Achen, Larry Hugick, and myself. After the panel there will be lots of time for informal discussion as well.

Bayes, Bayesians

I can't remember who said this first, and I can't remember if I've already put this on the blog, but the following definition may be helpful:

Every statistician uses Bayesian inference when it is appropriate (that is, when there is a clear probability model for the sampling of parameters). A Bayesian statistician is someone who will use Bayesian inference for all problems, even when it is inappropriate.

I am a Bayesian statistician myself (for the usual reason that, even when inappropriate, Bayesian methods seem to work well).

(The above is perhaps inspired by the saying that any fool can convict a guilty man; what distinguishes a great prosecutor is the ability to convict an innocent man.)

Cool historical maps

Hey, see here for info on a site that has cool interactive electoral vote maps with good historical details. Here's the map for the most important of all presidential elections:

1860.png

Cool.

She writes "sox" instead of "socks." What's that all about? Is this an accepted alternative spelling? (I wouldn't quite recommend the book, but it is also interesting in other ways.)

Why do swing states matter?

Hey, I got quoted in the Weekly Reader! Much cooler than the Annals of Statistics.

This is funny. It reminds me of when I was asked to help design a study, and I told the researcher I was upset to be involved in the design. Why? Because the #1 thing that statisticians like to say is, "Sorry, the analysis is really difficult because you screwed up the design." So, if you ask me to help with the design, I lose my best alibi!

Jeff pointed me to this paper by Brandon "not Larry" Bartels on using multilevel modeling for time series cross-sectional data. I agree with Bartels's recommendations, which are:

- Use a multilevel model to allow intercepts to vary by groups. This is more reliable than estimating intercepts by least squares or not allowing the intercepts to vary at all.
- Also allow slopes to vary. (Bartels doesn't emphasize this so strongly but I think this is important advice also.)
- Include as group-level predictors the group-level averages of important individual-level predictors. This will in many settings capture some of the otherwise unexplained group-level variation, as Joe Bafumi and I discuss.

Bartels also recommends representing individual-level predictors by their deviation from group averages. This is ok but I don't think it's necessary. It depends on the context. For example, if you have a predictor that is 1 if you're African American and 0 otherwise, I wouldn't want to subtract that from its state average. In that case you'd be better off including the individual predictor and state % African American as two predictors in the model. In other settings, Bartels's recommendation to center the predictor for each group makes more sense. Either way, this doesn't affect his main recommendation to fit a multilevel model, including important predictors in their group averages as well.
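The two ways of handling an individual-level predictor discussed above differ only in data preparation. This minimal sketch uses hypothetical arrays and helper names. `group_means` gives the group-level average, to be included as a group-level predictor, and `within_group_deviation` gives the group-centered version Bartels recommends. The multilevel fit itself isn't shown.

```python
import numpy as np

def group_means(x, group):
    """Return each observation's group average of x (same length as x)."""
    x = np.asarray(x, dtype=float)
    _, idx = np.unique(np.asarray(group), return_inverse=True)
    sums = np.bincount(idx, weights=x)
    counts = np.bincount(idx)
    return (sums / counts)[idx]

def within_group_deviation(x, group):
    """Group-mean centering: x minus its group average."""
    return np.asarray(x, dtype=float) - group_means(x, group)

# Hypothetical indicator for being African American, with state labels.
black = [1, 0, 0, 1, 0, 0, 0]
state = ["AL", "AL", "AL", "NY", "NY", "NY", "NY"]
pct_black = group_means(black, state)  # state average, as a group-level predictor
print(pct_black)
print(within_group_deviation(black, state))
```

For a 0/1 indicator like this one, the advice above is to keep the raw indicator and add `pct_black` as a second predictor, rather than using the centered version.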

Individual and group-level predictors

Finally, I recommend my 2006 Technometrics paper, "Multilevel (hierarchical) modeling: what it can and cannot do," which begins:

Multilevel (hierarchical) modeling is a generalization of linear and generalized linear modeling in which regression coefficients are themselves given a model, whose parameters are also estimated from data. We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. The multilevel model is highly effective for predictions at both levels of the model, but could easily be misinterpreted for causal inference.

In particular, see the discussion in Section 2.4 of my paper on the interpretation of a group-level predictor. You have to be careful about calling such coefficients "effects" or interpreting them causally.

Just to let you know things are busy around here . . .

Juan Morales writes:

I am currently fitting a multilevel model to data of fruit removal rates which I model using binomial distributions for the number of removed fruit out of total fruits available. I would like to estimate the proportion of variance explained and the amount of pooling at each level (tree, forest stand and so on). You show how to do such things for the radon example and mention that something similar could be done for generalized linear models using deviances. Has this been done somewhere?

My reply: R-squared for multilevel linear models is discussed in our book and in my paper with Pardoe. I think it would make sense to do this with logistic regression also (perhaps using the latent variable formulation with residual s.e. of 1.6, as Jennifer and I discuss in chapter 5). But I haven't done it yet. A good research paper, I think!
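As the reply says, this hasn't been worked out, so here is just one hedged way the latent-variable idea might start. In that formulation, the binary or binomial outcome comes from a continuous latent variable whose residual s.d. is about 1.6. The estimated group-level s.d.'s for trees and stands can then be set against that fixed residual on a common logit scale. The helper name and the s.d. values are hypothetical.

```python
LATENT_RESIDUAL_SD = 1.6  # approximate latent-scale residual s.d. for logistic regression

def latent_variance_shares(level_sds):
    """Share of total latent-scale variance attributable to each level.

    level_sds: dict mapping level name -> estimated group-level s.d.
    on the logit scale. The observation-level residual is fixed at 1.6.
    """
    variances = {name: sd**2 for name, sd in level_sds.items()}
    variances["residual"] = LATENT_RESIDUAL_SD**2
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}

# Hypothetical estimates for a fruit-removal model
shares = latent_variance_shares({"tree": 0.8, "stand": 0.5})
for level, share in shares.items():
    print(f"{level}: {share:.2f}")
```

This gives only a variance partition. A full analogue of multilevel R-squared and pooling factors would need more work.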

Rajeev sends a link to this paper on hierarchical modeling for evaluating multi-site interventions:

This article discusses the evaluation of programs implemented at multiple sites. Two frequently used methods are pooling the data or using fixed effects (an extreme version of which estimates separate models for each site). The former approach ignores site effects. The latter incorporates site effects but lacks a framework for predicting the impact of subsequent implementations of the program (e.g., would a new implementation resemble Riverside?). I present a hierarchical model that lies between these two extremes. Using data from the Greater Avenues for Independence demonstration, I demonstrate that the model captures much of the site-to-site variation of the treatment effects but has less uncertainty than estimating the treatment effect separately for each site. I also show that when predictive uncertainty is ignored, the treatment impact for the Riverside sites is significant, but when predictive uncertainty is considered, the impact for these sites is insignificant. Finally, I demonstrate that the model extrapolates site effects with reasonable accuracy when the site being predicted does not differ substantially from the sites already observed. For example, the San Diego treatment effects could have been predicted based on their site characteristics, but the Riverside effects are consistently underpredicted.

Seems like a good idea to me. Remember, interactions are important!
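The middle ground between complete pooling and separate site-by-site estimates can be sketched with the standard normal hierarchical model. This is not the paper's model. The between-site s.d. `tau` is simply assumed known here; a full analysis would estimate it. Each site's estimate is shrunk toward the overall mean, and a new site's predictive s.d. includes the between-site variation. That is the predictive-uncertainty point in the abstract. The site estimates and the helper name are hypothetical.

```python
import numpy as np

def partial_pool(estimates, std_errors, tau):
    """Normal-normal partial pooling with known between-site s.d. tau.

    Returns (mu_hat, se_mu, site_means, site_sds, new_site_sd):
    the precision-weighted overall mean and its standard error,
    each site's posterior mean and s.d. given mu_hat,
    and the predictive s.d. for the effect at a new, unobserved site.
    """
    y = np.asarray(estimates, dtype=float)
    s = np.asarray(std_errors, dtype=float)
    w = 1.0 / (s**2 + tau**2)  # weights for estimating the overall mean
    mu_hat = np.sum(w * y) / np.sum(w)
    se_mu = np.sqrt(1.0 / np.sum(w))
    # Shrink each site toward mu_hat; noisier sites are shrunk more
    prec = 1.0 / s**2 + 1.0 / tau**2
    site_means = (y / s**2 + mu_hat / tau**2) / prec
    site_sds = np.sqrt(1.0 / prec)
    # A new site differs from mu by tau, plus uncertainty in mu itself
    new_site_sd = np.sqrt(tau**2 + se_mu**2)
    return mu_hat, se_mu, site_means, site_sds, new_site_sd

# Hypothetical treatment-effect estimates (and standard errors) for four sites
y_sites = [1.2, 0.4, 0.6, 0.1]
mu, se, means, sds, new_sd = partial_pool(y_sites, [0.3, 0.3, 0.4, 0.5], tau=0.3)
print(round(mu, 2), np.round(means, 2), round(new_sd, 2))
```

Site-level predictors, including interactions with treatment, would enter through a regression for the site means in place of the single `mu_hat`.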
