
Someone pointed me to this note by John Langford:

Graduating students in Statistics appear to be at a substantial handicap compared to graduating students in Machine Learning, despite being in substantially overlapping subjects.

The problem seems to be cultural. Statistics comes from a mathematics background which emphasizes large publications slowly published under review at journals. Machine Learning comes from a Computer Science background which emphasizes quick publishing at reviewed conferences. This has a number of implications:

1. Graduating statistics PhDs often have 0-2 publications while graduating machine learning PhDs might have 5-15.

2. Graduating ML students have had a chance for others to build on their work. Stats students have had no such chance.

3. Graduating ML students have attended a number of conferences and presented their work, giving them a chance to meet people. Stats students have had fewer chances of this sort.

In short, Stats students have had relatively few chances to distinguish themselves and are heavily reliant on their advisors for jobs afterwards. This is a poor situation, because advisors have a strong incentive to place students well, implying that recommendation letters must always be considered with a grain of salt.

This problem is more or less prevalent depending on which Stats department students go to. In some places the difference is substantial, and in other places not.

One practical implication of this is that when considering graduating stats PhDs for hire, some amount of affirmative action is in order. At a minimum, this implies spending extra time getting to know the candidate and what the candidate can do.

My first thought is that if CS graduate students really have 5-15 publications, then, yeah, maybe I would like to hire these people--at least they seem to have learned some useful skills! I don't know how different the fields really are--I remember, about 10 or 15 years ago, one of my senior colleagues in the statistics department telling me that one of my papers must not be very good, because I had published only one paper on the topic. (His reasoning: if it were really such a good idea, it should be possible to get several papers out of it.)

Beyond this, I've always assumed that Ph.D. students in certain fields (computer science, also I'm thinking of economics and physics) are simply smarter, on average, than students in more run-of-the-mill fields such as statistics, math, political science, and sociology. I don't mean that this is always true, but I've always thought (with no particular hard evidence) that it was true on the average. Of course, it's important what you learn at these programs too. Top MBA programs are incredibly hard to get into, but that doesn't mean I want to hire an MBA to do research for me.

But, as I've written before, the cool thing about a postdoc (compared to a faculty position, or for that matter compared to admissions to college or a graduate program) is that you're hired based on what you can do, not based on how "good" you are in some vaguely defined sense. I like to hire people who know how to fit models and communicate with other researchers, and my postdocs have included a psychologist, an economist, and a computer scientist, along with several statisticians.

Where the Starbucks and Walmarts are

consumer1.png

consumer2.png

consumer3.png

P.S. The above graph is wrong (see comment by Alex F. below). Corrected graphs are here.

What should we call our book? A possible title is:

"Red State, Blue State, Rich State, Poor State: How Americans are Polarized and How They're Not"

Or maybe,

"The Red State, Blue State Paradox: ..."

We've been told that a subtitle is a good idea, but it would be good for the main title to be crisp.

Perhaps we have to think outside the box and forget about the red/blue thing, I dunno.

Any suggestions?

Thanks in advance. We'll give a free ice cream cone to anybody who comes up with a good idea!

P.S. The book is intended for a general audience. It'll be coming out around Labor Day.

P.P.S. One concern is that I don't know of a lot of popularly successful books with 8-word titles (and that's not even counting the subtitle). One to three words would be best, I'd think.

Ecolanguage

Lee Arnold, a plumber from LA, has developed a wonderful graphical way of communicating the behavior of complex systems, inspired by Odum's work, which he calls Ecolanguage. Here is how he explains banking:

He also tackles some spicier political topics, such as Gore's Assault on Reason, or Social Security. Finally, he uses this graphic language to explain heavier philosophical issues such as semiotics in New Chart.

While the symbols have been used before, Lee's animations and narration make it a great way of communicating quantitative ideas. He should find a publisher and produce a DVD!

Should I try a slower delivery?

At the Harvard 50th anniversary celebration a few months ago, they showed a video of Fred Mosteller's TV lectures on statistics from 1960 or so. The thing that struck me was that Fred was speaking reallllly reallllly slowly. He was a slow talker in real life, but this was so slow that I'm pretty sure he was doing it on purpose, maybe following some specific advice to go slow. Fred was a great teacher (as I remember from being his T.A.), and it made me wonder if I should speak more slowly also. Probably the answer is yes.

Also, when you speak slowly, you have to think more carefully about what to keep in your lecture and what to leave out, which is probably a good idea too.

You are what you spend

I just ran into this article by W. Michael Cox and Richard Alm on the comparison of incomes and spending of rich and poor:

The share of national income going to the richest 20 percent of households rose from 43.6 percent in 1975 to 49.6 percent in 2006 . . . families in the lowest fifth saw their piece of the pie fall from 4.3 percent to 3.3 percent.

Income statistics, however, don’t tell the whole story of Americans’ living standards. Looking at a far more direct measure of American families’ economic status — household consumption — indicates that the gap between rich and poor is far less than most assume.

The top fifth of American households earned an average of $149,963 a year in 2006. As shown in the first accompanying chart, they spent $69,863 on food, clothing, shelter, utilities, transportation, health care and other categories of consumption. The rest of their income went largely to taxes and savings.

The bottom fifth earned just $9,974, but spent nearly twice that — an average of $18,153 a year. How is that possible? . . . lower-income families have access to various sources of spending money that doesn’t fall under taxable income. These sources include portions of sales of property like homes and cars and securities that are not subject to capital gains taxes, insurance policies redeemed, or the drawing down of bank accounts. While some of these families are mired in poverty, many (the exact proportion is unclear) are headed by retirees and those temporarily between jobs, and thus their low income total doesn’t accurately reflect their long-term financial status.

So, bearing this in mind, if we compare the incomes of the top and bottom fifths, we see a ratio of 15 to 1. If we turn to consumption, the gap declines to around 4 to 1. . . .

Let’s take the adjustments one step further. Richer households are larger — an average of 3.1 people in the top fifth, compared with 2.5 people in the middle fifth and 1.7 in the bottom fifth. If we look at consumption per person, the difference between the richest and poorest households falls to just 2.1 to 1.
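The arithmetic behind those quoted ratios is easy to check in R, just plugging in the numbers given in the article:

income <- c(top = 149963, bottom = 9974)
spending <- c(top = 69863, bottom = 18153)
hh.size <- c(top = 3.1, bottom = 1.7)   # average household size

income["top"] / income["bottom"]       # about 15 to 1
spending["top"] / spending["bottom"]   # about 4 to 1
(spending["top"]/hh.size["top"]) / (spending["bottom"]/hh.size["bottom"])   # about 2.1 to 1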

This would be a good example for an intro statistics class when the topic of measurement comes up. The challenge for a stat class is to focus on measurement issues--how to design a survey to estimate people's income, assets, and spending patterns, or how to design an experiment or observational study to estimate the effects of changes in income on spending.

From the economics perspective, the example confuses me--on one hand, it makes sense to use consumption, not income, as a measure of well-being. On the other hand, if I were given the choice between two options:

(a) Earning $100,000 next year and spending $50,000, or
(b) Earning $40,000 next year and spending $60,000,

I'd prefer option (a). So I don't really know how to think about this. This sort of thing always confuses me in discussions of the utility of money (which I teach in my decision analysis class): it's good to have more money, but, usually, it's not money that brings joy, it's the things that money buys that do it.

In the example above, it would certainly make sense to adjust income for taxes and transfer payments and probably for household size (even if not by simply dividing by the number of people). It's harder for me to think about whether to adjust for savings or for non-cash benefits such as health insurance.

Two sides to the IRB story

1. This article by Carl Elliott reminded me why institutional review boards (IRBs) are needed.

2. This site (via Seth) reminds me of why IRBs can be a bad thing.

For me, IRBs are typically a waste of time, nothing more; for some people they are a (potential) protection against health hazards and exploitation; and for others they are a barrier to research progress.

Well, she's the expert . . .

From Meet the Press, here's Doris Kearns Goodwin:

Well, look, just as these politicians on the campaign trail are borrowing and absorbing patterns and evolving, so too speechwriters. They look at the best speeches in history. It's inevitable that those patterns are going to get in their heads. And you know, we can't make too much of this. This is the spoken word. It's different from the written word, and it becomes part of what's in there. As you said, there's not that much in their heads anymore that's coming in that's new. So all that's in there is what was there before.

Howard pointed me to this cool page of artwork by Chris Jordan. Here's one example:

Plastic Cups, 2008 60x90"

Depicts one million plastic cups, the number used on airline flights in the US every six hours.

jordan1.jpg

Partial zoom:

jordan2.jpg

Detail at actual print size:

jordan3.jpg

Several more are at Jordan's website.

Causal inference workshop

Liz Stuart writes:

We are pleased to announce the next Mid-Atlantic Causal Inference Workshop, to be held at Johns Hopkins Bloomberg School of Public Health on Monday and Tuesday May 19-20, ending at noon on May 20.

I was reading this otherwise-excellent article by Elizabeth Kolbert and came across this:

Like neoclassical economics, much democratic theory rests on the assumption that people are rational. Here, too, empirical evidence suggests otherwise. Voters, it has been demonstrated, are influenced by factors ranging from how names are placed on a ballot to the jut of a politician’s jaw. . . . A 2005 study, conducted by psychologists at Princeton, showed that it was possible to predict the results of congressional contests by using photographs. Researchers presented subjects with fleeting images of candidates’ faces. Those candidates who, in the subjects’ opinion, looked more “competent” won about seventy per cent of the time.

I can't really comment on the bit about democratic theory, but I do want to put in a word about this study of candidates' faces. It's a funny result: at first it seems impressive--70% accuracy!--but then again it's not so impressive given that you can predict something on the order of 90% of races just based on incumbency and the partisan preferences of the voters in the states and districts. If 90% of the races are essentially decided a year ahead of time, what does it mean to say that voters are choosing correctly 70% of the time based on the candidates' looks?

I can't be sure what's happening here, but one possibility is that the more serious candidates (the ones we know are going to win anyway) are more attractive. Maybe you have some goofy-looking people who decide to run in districts where they don't have a chance, whereas the politicians who really have a shot at being in congress take the time to get their hair cut, etc. More discussion here (see also the comments).

Anyway, the point of this note is just that some skepticism is in order. It's fun to find some scientific finding that seems to show the shallowness of voters, but watch out! I guess it pleases the cognitive scientists to think that something as important and seemingly complicated as voting is just some simple first-impression process. Just as, at the next level, it pleases biologists to think that something as important and seemingly complicated as psychology is just some simple selfish-gene thing.

Basketball statistics

I don't know anything about basketball (except that the players are shorter than they say they are, and I don't really even know that); nonetheless, in the recent-but-still-grand tradition of blogging . . .

I was lucky to see most of the talk that Josh Tenenbaum gave in the psychology department a couple weeks ago. He was talking about some experiments that he, Charles Kemp, and others have been doing to model people's reasoning about connectedness of concepts. For example, they give people a bunch of questions about animals (is a robin more like a sparrow than a lion is like a tiger, etc.), and then they use this to construct an implicit tree structure of how people view animals. (The actual experiments were interesting and much more sophisticated than simply asking about analogies; I'm just trying to give the basic idea.) Here's a link to some of this work.

My quick thought was that Tenenbaum, Kemp, et al. were using real statistics to model people's "folk statistics" (by which I mean the mental structures that people use to model the world). I have a general sense that folk statistical models are more typically treelike or even lexicographical, whereas reality (for social phenomena) is more typically approximately linear and additive. (I'm thinking here of Robyn Dawes's classic paper on the robust beauty of additive models, and similar work on clinical vs. statistical prediction.) Anyway, the method is interesting. I wondered whether, in the talk, Tenenbaum might have been slightly blurring the distinction between normative and descriptive, in that people might actually think in terms of discrete models, but actual social phenomena might be better modeled by continuous models. So, in that sense, even if people are doing approximate Bayesian inference in their brains, it's not quite the Bayesian inference I would do, because people are working with a particular set of discrete, even lexicographic, models, which are not what I suspect are good descriptions of most of the phenomena I study (although they might work for problems such as classifying ostriches, robins, platypuses, etc.).

Near the end of his talk, Tenenbaum did give an example where the true underlying structure was Euclidean rather than tree-like (it was a series of questions about the similarity of U.S. cities), and, indeed, there he could better model people's responses using an underlying two-dimensional model (roughly but not exactly corresponding to the latitude-longitude positions of the cities) than a tree model, which didn't fit so well.

I sent Tenenbaum my above comment about real and folk statistics, and he replied:

I'd expect that for either the real world or the mind's representations of the world, some domains would be better modeled in a more discrete way and others in a more continuous way. In some cases those will match up - I talked about these correspondences towards the end of the talk, not sure if you were still there - while in other cases they might not. It would be interesting to think about both kinds of errors: domains which our best scientific understanding suggests are fundamentally continuous while the naive mind treats them as more discrete, and domains which our best scientific understanding suggests are discrete while the naive mind treats them as more continuous. I expect both situations exist.

Also, the "naive mind" is quite an idealization here. The kind of mental representation that someone adopts, and in particular whether it's more continuous or discrete, is likely to vary with expertise, culture, and other experiential factors.

My reply:

I think the discrete/continuous distinction is a big one in statistics and not always recognized. Sometimes when people argue about Bayes/frequentist or parametric/nonparametric or whatever, I think the real issue is discrete/continuous. And I wouldn't be surprised if this is true in psychology (for example, in my sister's work on how children think about essentialism).

Tenenbaum replied to this with:

While the focus for most of my talk emphasized tree-structured representations, towards the end I talked about a broader perspective, looking at how people might use different forms of representations to make inferences about different domains. Even the trees have a continuous flavor to them, like phylogenetic trees in biology: edge length in the graph matters for how we define the prior over distributions of properties on objects.

I'll buy that.

On a less serious note . . .

This reminds me of all sorts of things from children's books, such as pictures of animals that include "chicken" and "bird" as separate and parallel categories, or stories in which talking cats and dogs go fishing and catch and eat real fish! (The most bizarre of all these, to me, are the Richard Scarry stories in which the sentient characters include a cat, a dog, and a worm, and they go fishing. My naive view of the "great chain of being" would put fish above worms, but I guess Scarry had a different view.)

This looks like it could be interesting.

In the article, "Activists and partisan realignment in the United States," published in 2003 in the American Political Science Review, Gary Miller and Norman Schofield point out that the states won by the Democrats and Republicans in recent elections are almost the opposite of the result of the election of 1896:

1896a.png

Miller and Schofield describe this as a complete reversal of the parties' positions. In their story, in 1896 the parties competed on social (racial) issues, with the Republicans on the left and the Democrats on the right. Then the parties gradually moved around in the two dimensional social/economic issue space, until from the 1930s through the 1960s, the parties primarily competed on economic issues. Since then, in the Miller/Schofield story, the parties continued to move until now they compete primarily on social issues, but now with the Democrats on the left and the Republicans on the right.

It's an interesting argument but I have some problems with it. First off, it was my impression that the 1896 election was all about economic issues, with the Democrats supporting cheap money and easy credit (W. J. Bryan's "cross of gold" speech) and the Republicans representing big business. At least in that election, it was the Democrats on the left on economic issues and the Republicans on the right.

Getting to recent elections, the evidence from surveys and from roll call votes is that the Democrats and Republicans are pretty far apart on economic issues, again with the D's on the left and the R's on the right. So, from that perspective, it's not the parties that have changed positions, it's the states that have moved. The industrial northeastern and midwestern states have moved from supporting conservative economic policies to a more redistributionist stance. Which indeed is something of a mystery, and it's related to attitudes on social issues, but I certainly wouldn't say that economic issues don't matter anymore. According to Ansolabehere, Rodden, and Snyder, social issues are more important now in voting than they were 20 years ago, but economic issues are still voters' dominant concern.

1896 vs. 2000 by counties within each state

Here are some more pretty pictures. First, within 6 selected states, a scatterplot of Bush vote share in 2000 vs. McKinley vote share in 1896. There are completely different patterns in different states! Nothing like as clean a pattern as the statewide plot above.

1896b.png

And here's another plot, this time showing each county as an ellipse, with the size of the ellipse proportional to the population of the county (more precisely, the voter turnout) in the two elections.

1896c.png

Nowadays the Democrats clearly do better in the big cities (in these graphs, the large-population counties). In 1896 the pattern wasn't so clear. I'd be interested to know what Jonathan Rodden thinks of all this. . .

Using simulation to do statistical theory

We were looking at some correlations--within each state, the correlations between income and different measures of political ideology--and we wanted to get some sense of sampling variability. I vaguely remembered that the sample correlation has a variance of approximately 1/n--or was that 0.5/n, I couldn't remember. So I did a quick simulation:


> corrs <- rep (NA, 1000)
> for (i in 1:1000) corrs[i] <- cor (rnorm(100),rnorm(100))
> mean(corrs)
[1] -0.0021
> sd(corrs)
[1] 0.1

Yeah, 1/n, that's right. That worked well. It was quicker and more reliable than looking it up in a book.
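For comparison, the analytic approximation--the sample correlation of two independent variables has variance of roughly 1/n, so its sd is roughly 1/sqrt(n)--gives the same answer:

> 1/sqrt(100)
[1] 0.1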

Tied presidential elections

Apropos of this discussion, here's a list of all the U.S. presidential elections that were decided by less than 1% of the vote:

1880
1884
1888
1960
1968
2000

Funny, huh? Other close ones were 1844 (decided by 1.5% of the vote), 1876 (3%), 1916 (3%), 1976 (2%), 2004 (2.5%).

Four straight close elections in the 1870s-80s, five close elections since 1960, and almost none at any other time.

Electability blah blah blah

The predictability of election outcomes from fundamental variables suggests that different presidential candidates from the same party don't differ much in the votes they will receive in the general election. It's better to be a moderate than an extremist, and it's better to be a better campaigner etc., but all these things together probably only count for a couple of percentage points of the vote.

Steven Rosenstone wrote about this in 1984 in his book, Forecasting Presidential Elections, and I don't think the elections since then have given any reason to doubt Rosenstone's logic.

Now, don't get me wrong: a couple of percentage points of the vote can make a big difference--just look at the tied elections of 1960, 1968, 1976, and 2000, as well as the very close election of 2004. Also, who knows how things will go with the unprecedented "woman or young black guy vs. old white guy" dynamic. But, based on past elections, I'd say the whole "electability" thing is overrated. Once Election Day comes around, people will find a reason to vote for the party they want to support.

Just to let you know . . .

We're busy finishing our book during these next couple of weeks. So if I'm slow in responding to messages, just wait. You might hear from me in mid-March.

Lee Sigelman writes,

In the latest issue of The Political Methodologist, James S. Krueger and Michael S. Lewis-Beck examine the current standing of the time-honored but oft-dismissed-as-passe ordinary least squares regression model in political science research. . . . Krueger and Lewis-Beck report that . . . The OLS regression model accounted for 31% of the statistical methods employed in these articles. . . . “Less sophisticated” statistical methods — those that would ordinarily be covered before OLS in a methods course — accounted for 21% of the entries. . . . Just one in six or so of the articles that reported an OLS-based analysis went on to report a “more sophisticated” one as well. . . . OLS is not dead. On the contrary, it remains the principal multivariate technique in use by researchers publishing in our best journals. Scholars should not despair that possession of quantitative skills at an OLS level (or less) bars them from publication in these top outlets.

I have a few thoughts on this:

1. I don't like the term OLS ("ordinary least squares"). I prefer the term "linear regression" or "linear model." Least squares is an optimization problem; what's important (in the vast majority of cases I've seen) is the model. For example, if you still do least squares but you change the functional form of the model so it's no longer linear, that's a big deal. But if you keep the linearity and change to a different optimization problem (for example, least absolute deviation), that generally doesn't matter much. It might change the estimate, and that's fine, but it's not changing the key part of the model. (There's a little R illustration of this point below, after these comments.)

2. I like simple methods. Gary and I once wrote a paper that had no formulas, no models, only graphs. It had 10 graphs, many made of multiple subgraphs. (Well, we did have one graph that was based on some fitted logistic regressions--an early implementation of the secret weapon--but the other 9 didn't use models at all.) And, contrary to Cosma's comment on John's entry, our findings were right, not just published. The purpose of the graphical approach was not simply to convey results to the masses, and certainly not because it was all that we knew how to do. It just seemed like the best way to do this particular research. Since then, we've returned to some of these ideas using models, but I think we learned a huge amount from these graphs (along with others that didn't make it into the paper).

3. Sometimes simple methods can be justified by statistical theory. I'm thinking here of our approach of splitting a predictor at the upper quarter or third and the lower quarter or third. (Although, see the debate here.)

4. Other times, complex models can be more robust than simple models and easier to use in practice. (Here I'm thinking of bayesglm.)

5. Sometimes it helps to run complicated models first, then when you understand your data well, you can carefully back out a simple analysis that tells the story well. Conversely, after fitting a complicated model, you can sometimes make killer graphs.
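Here's the little R illustration promised under point 1: swapping the optimization criterion (least squares vs. least absolute deviation) changes the estimates only a little, while changing the functional form of the model is the big change. It's just a sketch with simulated data, using the quantreg package for the least-absolute-deviation fit:

# Same linear model, two different optimization criteria.
library(quantreg)   # rq() fits least-absolute-deviation (median) regression

set.seed(123)
n <- 100
x <- rnorm(n)
y <- 1 + 2*x + 0.5*x^2 + rnorm(n)   # data with a bit of curvature

fit.ls  <- lm(y ~ x)              # linear model, least squares
fit.lad <- rq(y ~ x, tau = 0.5)   # same linear model, least absolute deviation
coef(fit.ls)    # these two are typically close to each other . . .
coef(fit.lad)   # . . . it's the same linear model either way

fit.quad <- lm(y ~ x + I(x^2))    # changing the functional form is the big change
coef(fit.quad)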

A couple of colleagues sent me a copy of an article by Steve Sailer in The American Conservative called "Value Voters" and subtitled, "The best indicator of whether a state will swing Red or Blue? The cost of buying a home and raising a family." It's here online, and an excerpt is on Sailer's blog.

The article gives a new (to me) take on the red-state, blue-state paradox: the point that Republicans do well in lower-income states, even though they do better among richer voters. Sailer points out that the strong Republican states in the south and middle of the country have a lower cost of living (which of course goes along with them being low income). But, more to the point, many of the rich, Democratic-leaning metropolitan areas have housing costs that are even higher. Sailer attributes some of this to what he calls the Dirt Gap--coastal cities such as NY, Boston, and LA are bounded by water, which limits their potential for growth, as compared to inland cities such as Dallas or St. Louis: "The supply of suburban land available for development is larger in Red State cities, so the price is lower."

Sailer notes that Republicans do better among married voters, and he has the following impressive graph:

sailer.png

Even excluding D.C., the correlation is high. (Sailer has some discussion about why he's only looking at white women here; I don't follow all his reasoning, but you can read his article to get the details.)

We were talking about Sailer's article around the office today, and the point was made that the above graph illustrates that Bush did better in states where people are more culturally conservative (in this case, as measured by the chance of women marrying at a young age).

I'm not quite sure how this relates to our finding that the Republican advantage among high-income voters is large in poor states and small in rich states. One suggestion was that, in poor states, your high income gets you a larger house, whereas in a rich state, even with a high income you're not living in a palace. Sailer might say that cultural conservatives will move out of rich states, even if they're high income, because they want a nice house with a yard for their kids to play in. We've tried to look into some of these things, but it's a challenge to analyze data on moving. It's easier to analyze cross-sectional surveys, so that's what we spend most of our time doing.

To get back to the main point, Sailer is making a geographic argument, that Democrats do better in coastal states because families are less likely to live in coastal metropolitan areas, because housing there is so expensive, because of the geography: less nearby land for suburbs. This makes a lot of sense, although it doesn't really explain why the people without kids want to vote for Democrats and people with kids want to vote for Republicans. I can see that more culturally conservative people are voting Republican, and these people are more likely to marry and have kids at younger ages--but in that sense the key driving variable is the conservatism, not the marriage or the kids.

I think Sailer's arguments are interesting but I can't quite follow him all the way to his conclusion, where he says that "the late housing bubble . . . reduced the affordability of family formation, which should help the Democrats in the long run." I just don't see where the data are showing this. I'm not saying he's wrong--I've certainly heard it said, for example, that the postwar boom helped the Republicans, and conservative causes generally, by moving millions of people up into the middle class--but it just seems like a stretch. I also don't follow his claim that, if the Republican party moves to restrict immigration, it "could then position itself as the party of more weddings and more babies." Immigrants have weddings and babies too, right?

That said, the point about affordability of housing seems important, and it's not always captured in standard cost-of-living measures (see here). And it's interesting to see these correlations between demography and voting.

I'll try to clarify my recent entry on unintended consequences by focusing on a less politically-loaded example.

Millions of people in south Asia are exposed to high levels of arsenic in their drinking water. It's a natural contaminant (something to do with the soil chemistry) but it's become an increasingly important problem in the past decades because people have been digging millions of deep (~ 100 feet) tubewells. The background is that the surface water is often contaminated, and international organizations have been encouraging the locals to dig these tubewells which draw clean water from hundreds of feet below ground. Unfortunately, some of that water is contaminated with arsenic. A true unintended consequence. But what to do next?

There are various solutions out there, including a low-cost device for purifying surface water. My connection to this is that I've been involved in a project to give information to people in Bangladesh about where and how deep to dig to find arsenic-free deep water. In some places you have to drill hundreds of feet deep, and this can be expensive (relative to Bangladeshis' incomes). So we're setting up an insurance system for people there, so they can pay a little bit more but be assured of eventually getting a safe well, or their money back. The idea is to provide incentives for well-drillers also, to set up an ongoing system where there is trust and so that safe wells can be installed.

More unintended consequences?

Two concerns about unintended consequences arise. First, on the physical level, there is a concern that, if people build wells taking clean water from deep aquifers, they'll start using that water more and more (just as we in the developed world flush our toilets with fresh water, etc), leading to changes in the water flow that might bring arsenic down there or have other bad consequences. I don't know enough to evaluate this concern so I'm just trusting my colleagues on this.

The second concern is something I mentioned to my collaborators the other day: should we really be offering this insurance scheme at all? The goal of the program is to get people to dig deeper wells than they otherwise would've done, by setting up incentives for customers and well-drillers to get together. (I should explain that this is intended to be a revenue-neutral, "at cost," system: not a subsidy for Bangladeshis to dig wells, but not a moneymaker for us, either. The money would be made by the drillers, and this would provide an incentive for the program to continue.)

Anyway, I asked my collaborators whether maybe we shouldn't be doing this program at all, since we're trying to get people to do something they wouldn't do themselves.

One of my colleagues replied that, no, it was a good idea, and for us not to do it would be "paternalistic" in that we're saying that we know what's best for the locals. We can offer the insurance and they can decide. But, wait! I said. If we really want to be non-paternalistic, we wouldn't get involved at all, right?

Defaults

It seems that these debates come down to the choice of the default. If the default is to do our insurance program, then it's paternalistic to consider not doing it. But if the default is for us to stop messing around in Bangladesh, then it's paternalistic to try to motivate them to dig deep wells. (The unintended consequence of the mid-1990s intervention--encouraging moderately deep tube wells--is cautionary, but it's not clear that this should be a message that we shouldn't get involved.)

Melissa Lafsky, writing at Freakonomics, discusses how biofuels, which have been proposed as an environmentally-friendly alternative energy source, have been estimated to create more pollution than drilling for more oil. And then, of course, climate change is itself a huge unintended consequence of industrialization. I just have a couple of comments.

1. Alex Tabarrok wrote:

The law of unintended consequences is what happens when a simple system tries to regulate a complex system. The political system is simple, it operates with limited information (rational ignorance), short time horizons, low feedback, and poor and misaligned incentives. Society in contrast is a complex, evolving, high-feedback, incentive-driven system. When a simple system tries to regulate a complex system you often get unintended consequences.

I like this description but it doesn't quite fit either of the examples here. To start with, climate change was an unanticipated consequence of industrialization. But industrialization was not designed to regulate the climate (schemes such as cloud-seeding aside). So maybe Alex's paragraph is more of a description of perverse unintended consequences.

To take the other example: Yes, biofuels were proposed to regulate climate change, so the first half of Alex's description works. But the second part isn't quite appropriate, because the unintended consequences were discovered in advance. According to the quoted report, "Prior analyses made an accounting error." So in this case it doesn't sound like a problem in anticipating feedback.

2. This brings me to my second point, which is that the problem seems to have been discovered before the massive shift to biofuels actually happened, so the problem "for the next 93 years" won't really happen. According to the article, "scientists [are] already calling for government reform on biofuel policies." So this is more of an anticipated than an actual unintended consequence.

Meta-analysis question

Brant Inman writes:

More jobs for the Bayesians

Peter Green et al. point us to the following opportunity:

In his American Psychologist article reviewing studies of personality profiles and political affiliation, and the followup article here, John Jost writes,

Compared with liberals and moderates, conservatives score significantly higher on psychological instruments designed to measure epistemic needs for order, structure, simplicity, certainty, and closure, and they score significantly higher on instruments designed to measure the intensity of existential concerns such as fear of death and perceptions of a dangerous world. In terms of basic personality dimensions, liberals (and leftists) score significantly higher on Openness to New Experiences, and their greater open-mindedness manifests itself in terms of creativity, curiosity, novelty, diversity, and interest in travel. By contrast, conservatives (and rightists) score higher on Conscientiousness, and they are generally more orderly, organized, duty-bound, conventional, and more likely to follow rules. The evidence strongly contradicts the commonly held assumption that political orientation is “consistently and strikingly unrelated to personality and temperament factors."

Like much social science, the above statement seems either obviously true or a ridiculous distortion, depending on how you look at it. But that's the purpose of doing research, to try to evaluate such hypotheses. There are usually many different interpretations, but it's good to establish the facts.

Taboo research?

In any case, what interests me here, beyond the importance of the topic itself, are the political reactions to such work. Jost's article is called "The End of the End of Ideology" and it describes how studies of personality characteristics, political orientation, and authoritarianism were popular in the 1950s but had fallen into disfavor for several decades after. One claim that Jost makes is that much of the opposition to this research has been, implicitly or explicitly, political: findings such as, "conservatives have more authoritarian personalities, and liberals have more openness" are not value-neutral and do not fit into the mainstream of modern political science. (In contrast, a psychologist can feel more free to do such work, since psychologists do not have the same occupational inclination toward treating different political orientations symmetrically.)

The short version of the argument is: the data show correlations between personality types and political ideologies, but these results don't fit well into the usual framework of political science, so this line of research is less well developed than it should be. It's what Steven Pinker would call a taboo question. In this case, it's people on the right who object to this research, who find it silly.

More recently, there's been research on genetics and political behavior (see here for some of James Fowler's work in this area), and I imagine there's some resistance from people on the political left, considering the general association of genetics with racism or, more generally, social determinism (the idea that our positions in society are basically determined by our genes and thus (a) aren't anybody's fault, and (b) can't easily be fixed).

Political cover?

I'm hoping that these two strands of research can provide political cover for each other. The people who study personality types can connect their work to genetics (or "human nature," for the nonbelievers in evolution) to placate the conservatives, and the people who study genetics can discuss the personality research to keep the liberals at bay.

David Afshartous writes,

Verbeke & Molenberghs (2000; Linear Mixed Models for Longitudinal Data; Section 23.2, p.392) discuss an analytic procedure for power calculations. Specifically, for the case of testing a general linear hypothesis, under the alternative hypothesis the test statistic distribution can be approximated by a noncentral F distribution (supported by simulation results of Helms 1992). And power calculations are thus obtained by incorporating the appropriate quantile from the null F distribution (as opposed to a null chi-squared distribution that doesn't account for variability introduced in estimating variance components).

What is your opinion of this approach versus your suggested simulation approach (p.437 of recent book)? Perhaps the analytic method above is not as appealing due to the dependence of the results on the method to estimate the denominator degrees of freedom in the null F distribution (albeit lower dependence for larger sample sizes)?

My reply: I have mixed feelings about power calculations in general. The topic of statistical power is hugely important (see here for my recent thoughts on the perils of underpowered studies), but I have real problems with the standard "NIH-style" power calculations. Briefly, the problem is that the calculations are set up as if there is a certain power goal, and you get the data necessary to reach it, but realistically it's often the other way around, with a sample size set from practical considerations and then a power calculation set up to get the answer you need. We have a whole chapter on power calculations in our book but I don't really know if I like the idea at all.

In answer to the specific question: I hate thinking about these F distributions--the last time I thought hard about them was for my 1992 paper with Rubin--so I prefer simulation despite its occasional awkwardness.
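To give a flavor of what I mean by the simulation approach--this is just a generic sketch with made-up numbers for a simple two-group comparison, not the multilevel example from the book--you simulate data under an assumed effect size, fit the model, and see how often you'd detect the effect:

power.sim <- function(n, effect, sims = 1000) {
  signif <- rep(NA, sims)
  for (s in 1:sims) {
    x <- rbinom(n, 1, 0.5)                  # treatment indicator
    y <- 0.5 + effect*x + rnorm(n, 0, 1)    # assumed data-generating process
    fit <- lm(y ~ x)
    signif[s] <- summary(fit)$coefficients["x", "Pr(>|t|)"] < 0.05
  }
  mean(signif)   # estimated power
}
power.sim(n = 100, effect = 0.5)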

See here for a link to a useful article on power calculations by Russ Lenth. (The link is from June, 2005; many of you probably weren't reading this blog back then...)

More discreteness, please

Justin Wolfers presents this graph that he (along with Eric Bradlow, Shane Jensen, and Adi Wyner) made comparing the career trajectory of Roger Clemens to other comparable pitchers:

clemens.jpg

The point is that Clemens did unexpectedly well in the later part of his career (better earned run average, allowed fewer walks+hits) compared to other pitchers with long careers. This in turn suggests that maybe performance-enhancing drugs made a difference. Justin writes:

To be clear, we don’t know whether Roger Clemens took steroids or not. But to argue that somehow the statistical record proves that he didn’t is simply dishonest, incompetent, or both. If anything, the very same data presented in the report — if analyzed properly — tends to suggest an unusual reversal of fortune for Clemens at around age 36 or 37, which is when the Mitchell Report suggests that, well, something funny was going on.

I can't comment on the steroids thing at all, but I will say that I'd like more information than is in the graphs. For one thing, Clemens is clearly not a typical pitcher and never has been. At the very least, you'd like to see the comparison of his trajectory with all the other individual trajectories, not simply the average. For another, the graphs above seem to be relying way too much on the quadratic fit. At least for the average of all the other pitchers, why not show the actual averages? Far be it from me to criticize this analysis (especially since I am friends with all four of the people who did it!)--this is just a recreational activity, and I'm sure these guys have better things to do than correct ERA's for A.L./N.L. effects, etc.--but I think you do want to have some comparisons of the entire distribution, as well as a sense of how much the "unusual reversal around ages 36 or 37" is an artifact of the fitted model.
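To show the kind of graph I have in mind--individual trajectories in the background, raw age-by-age averages as dots, and the quadratic fit drawn on top so you can see how much work the curve is doing--here's a sketch on made-up data:

# Made-up data: 20 pitchers, ages 22-42, with a U-shaped average trajectory.
set.seed(2)
ages <- 22:42
n.pitchers <- 20
era <- outer(rep(1, n.pitchers), 3 + 0.01*(ages - 30)^2) +
  matrix(rnorm(n.pitchers*length(ages), 0, 0.5), nrow = n.pitchers)

matplot(ages, t(era), type = "l", lty = 1, col = "gray",
        xlab = "Age", ylab = "ERA")          # individual trajectories
avg <- colMeans(era)
points(ages, avg, pch = 16)                  # raw age-by-age averages
quad <- lm(avg ~ ages + I(ages^2))
lines(ages, fitted(quad), lwd = 2)           # quadratic fit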

P.S. to Justin, Eric, Shane, and Adi: Now youall have permission to be picky about my analyses in return. . . .

P.P.S. Nathan made this plot showing data from the 16 most recent Hall of Fame pitchers.

Color tile visual illusion

Since I'm taking things from BoingBoing now, check these out:

squares.jpg

To quote: "The 'blue' tiles on the top face of the left cube are the same color as the "yellow" tiles in the top of the right cube."

No statistical content here at all. But maybe you could draw an analogy to hypothesis testing, the idea that two studies could give identical results, but one could be statistically significant and the other non-significant, if the two studies were embedded in different experimental designs. Or in meta-analysis, two different studies could be interpreted differently if surrounded by different sets of other studies in a hierarchical model. In that case, though, the perceptions of difference would be real and the fact that the two studies were, in isolation, "the same color," would miss the point.

P.S. Regarding Bill's comment below: the illusion is not new, it's just a cool presentation of a result that is well-known among vision researchers.

This reminds me of the following story:

We have a high school student working here one day a week on some of our research projects. It's great to have him around. Anyway, he was telling us a few weeks ago that Hillary Clinton spoke at his school and he met her. I asked him what she was like. He said that she was really old. Also, really short. He said she's supposed to be 5'7" (see here for some competing estimates) but that in his judgment she couldn't have been 5'4" in heels. I told him that Isiah Thomas probably isn't really 6'1" either.

Dead heat

Gary sent along this news article from the Syracuse Post-Standard:

Dead heat: Obama and Clinton split the Syracuse vote 50-50

by Mike McAndrew

In the city of Syracuse, the strangest thing happened in Tuesday's Democratic presidential primary.

Sen. Hillary Clinton and Sen. Barack Obama received the exact same number of votes, according to unofficial Board of Election results.

Clinton: 6,001.

Obama: 6,001.

"Wow, that is odd," said Jay Biba, Clinton's Central New York campaign coordinator. "I never heard of that in my life."

The odds of Clinton and Obama tying were less than one in 1 million, said Syracuse University mathematics Professor Hyune-Ju Kim.
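Just as a sanity check on that quoted one-in-a-million figure: under the simplest possible model--each of the 12,002 votes independently going to either candidate with probability 1/2, which is surely wrong in detail--the chance of an exact 6,001-6,001 split is

dbinom(6001, size = 12002, prob = 0.5)   # about 0.007, roughly 1 in 140

so exact ties in elections of this size are not as improbable as they might sound.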

The Fenimore Cooper of sociobiology

Oddly enough, I've received two unrelated emails attaching articles shooting down hypotheses of the notorious Satoshi Kanazawa: a paper by Kevin Denny in the Journal of Theoretical Biology:

Recently Kanazawa (2005) proposed a generalization of the Trivers–Willard hypothesis which states that parents who possess any heritable trait that increases male reproductive success at a greater rate than female reproductive success will have more male offspring. . . . This note shows that analysing the same data somewhat differently leads to very different conclusions.

and one by Vittorio Girotto and Katya Tentori in Mind & Society:

According to Kanazawa (Psychol Rev 111:512–523, 2004), general intelligence, which he considers as a synonym of abstract thinking, evolved specifically to allow our ancestors to deal with evolutionary novel problems while conferring no advantage in solving evolutionary familiar ones. We present a study whereby the results contradict Kanazawa’s hypothesis by demonstrating that performance on an evolutionary novel problem (an abstract reasoning task) predicts performance on an evolutionary familiar problem (a social reasoning task).

These, on top of other debunkings of this work by Volscho, Freese, and others, make me think that Kanazawa is actually serving a useful role in the fields of biology and sociology by evoking such interesting rebuttals.

P.S. I probably should stop bringing this stuff up--it's just that I got those two emails one right after the other. As David Weakliem and I discuss in our paper, Kanazawa's work is not particularly interesting in itself except as an example of genuine statistical challenges that arise in the estimation of very small effects. Basically, the multiple comparisons problem in action, but with a twist in that Kanazawa has been successful enough at getting his ideas out there that he's attracted debunkers. Presumably, there's lots of stuff like this out there in the scientific literature that nobody even notices. In studying these problems, I'd like to think that I'm contributing to the search for better methods of estimating small effects, not simply making fun of the errors of non-statisticians.

It's also interesting to me that biologists and economists seem to fall for this stuff, while sociologists and psychologists see the flaws right away. Presumably because sociologists and psychologists have lots of experience studying small effects in the context of individual variation.

Primary impressions

From a recently overheard conversation:

Friend 1: Who did you vote for?

Friend 2: How about you?

1: I slightly preferred Obama but I voted for Clinton because I wanted to make my wife happy--she's really excited about Hillary.

2: I can't stand Hillary but I don't know if that's real or whether I've just been manipulated by the media.

1: Hmm . . . what don't you like about her?

2: Y'know, that $100,000 she made in three hours investing in cattle futures, all those sleazy people they hang out with, they pardoned that guy Marc Rich . . . but, yeah, probably every politician has some sleazy connections.

1: It's just part of the game. Obama's younger, maybe he hasn't made all these contacts yet, but it'll happen.

2: Yeah, sure. I'm sure McCain has lots of crooked friends too. . . . There's also the war. That's a legitimate reason to not want to vote for Hillary--she supported the war. It's not enough of a reason for me to hate her, though.

1: Especially since you supported the war yourself.

2: No I didn't.

1: Yeah, I remember having a long conversation with you back in 2003. I opposed the war and you supported it.

2: No way! I was torn about the war but I opposed it.

1: That's not what I remember.

2: No, I opposed it.

1: There was a study that found that lots of people say now that they opposed the war, lots more than actually opposed the war. I think you're one of those people.

2: No, you just don't remember what I said to you back then.

1: I understand what you're saying, but I think you're the one who's misremembering.

etc etc.

P.S. Rebecca adds: thought you might find Matt's very brief but insightful take on the Hillary electability question interesting. I think he's right that these results might give us some purchase on how she'd fare in a general election. . .

On the subject of how Hillary will do in a general election, the following results stood out at me from Super Tuesday:

Alaska - 74 - 25 Obama
Idaho - 80 - 17 Obama
Kansas - 74 - 26 Obama
Colorado - 67 - 32 Obama

Yikes! She is not liked out West. I mean, those were Democrats.

Why Welfare States Persist

My review of “Why Welfare States Persist,” by Clem Brooks and Jeff Manza, for Political Science Quarterly:

Why do welfare states persist? Because they are popular, argue Clem Brooks and Jeff Manza in their new book, a statistical study of the connections between public opinion and policies in 16 rich countries in Europe and elsewhere.

Rich capitalist democracies around the world differ widely in their welfare states--their systems of government-provided social support--despite having comparable income levels. Brooks and Manza report that welfare state spending constituted 27% of GDP in “social democratic countries” such as Sweden and 26% of GDP in “Christian democratic countries” such as Germany, but only 17% in “liberal democracies” such as the United States and Japan. These differences are correlated with differences in income inequality and poverty rates between countries.

Cycles in closeness of elections

Dave Wascha writes,

Course evaluations and butterfly ballots

I recently taught a short course, and, at the end, the students in the class filled out evaluation forms where they filled in the little circles. I just received copies of the forms in the mail. Amazingly (to me), 4 of the 25 forms in the class were filled out in error, with people getting the direction of the questions reversed (filling in "strongly disagree" where they meant "strongly agree," etc.) Three of the four people caught themselves and scribbled over their mistakes (this wouldn't work for a machine reader, though), but one never seemed to notice at all!

Perhaps this will be a less important issue now that everything is moving online, but just a reminder that it's good to provide some confirmation of people's choices. Especially in areas more important than teaching evaluations.

It's all over but the normalizin'

Ted Dunning writes:

You advocated recently [article to appear in Statistics in Medicine] the normalization of variables to have average deviation of 1/2 in order to match that of a {0,1} binary variable.

This recommendation will disturb lots of people for obvious reasons which may make your recommendation sell better.

But have you considered normalizing the binary variable to {-1, 1} instead of {0,1} before adjusting the mean to zero? This has the same effect but leaves larger communities happier, particularly because much of the applied modeling community has always normalized their binary variables to this range.

My reply: I actually went back and forth on this for awhile. In most of the regression analyses in political science, economics, sociology, epidemiology, etc., that I've seen, it's standard to code binary variables as 0/1. But, yeah, the other way to go would've been to standardize by dividing by 1 sd and then give the recommendation to code binary variables as +/- 1. Maybe that would've been a better idea. I was trying to decide which way would disturb people less, but maybe I guessed wrong!
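For anyone who wants to see the two conventions side by side, here's a quick illustration with simulated data: rescaling a continuous input to sd 0.5 puts it on the scale of a 0/1 binary input, while rescaling to sd 1 puts it on the scale of a -1/+1 binary input.

set.seed(1)
z <- rbinom(1000, 1, 0.5)     # roughly balanced binary input, 0/1 coding
sd(z)                         # about 0.5
sd(2*z - 1)                   # -1/+1 coding: about 1

x <- rnorm(1000, 10, 3)                 # a continuous input
x.half <- (x - mean(x)) / (2*sd(x))     # sd 0.5, comparable to 0/1 coding
x.unit <- (x - mean(x)) / sd(x)         # sd 1, comparable to -1/+1 coding
c(sd(x.half), sd(x.unit))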

I'm doing some work related to Geologic Carbon Sequestration (also called Geologic Capture and Storage): the idea is to capture carbon dioxide from power plants or other industrial sources, and pump it deep underground into geologic formations that will trap it for centuries or millennia. It sounds desperate, and I initially had substantial misgivings, but upon looking into it more I think it is a good idea and we should get started. But there are still some major political, legal, and economic issues that have to be resolved. I'm working on an issue that is on the border between the technical and regulatory spheres:

The State of California will soon be called on to decide whether sequestration should be allowed in some specific places. The government will have to assess the risks and decide whether a particular spot is "safe enough." Any site that is likely to be proposed will be considered by experts to be highly likely to retain almost all of the CO2 that is pumped into it...but of course, the experts could be wrong. It's very hard to characterize the subsurface---there could be faults or old boreholes that you don't know about---so maybe the CO2 could leak out. If it does, bad things can happen.

According to Andrew Sullivan, a political commentator named Michael Graham wrote,

I am so confident of both a Patriots win today and a Romney win in Massachusetts on Tuesday that I made this pledge on the air Friday: 'If the NY Giants beat the Patriots in the Super Bowl, I will cast my Super Duper Tuesday primary vote for (shudder) John McCain.'

But . . . the Patriots were favored by 14 points, and if you look up "football" in the index of Bayesian Data Analysis, you'll see that football point spreads are accurate to within a standard deviation of 14 points, with the discrepancy being approximately normally distributed. So, a 14-point underdog has something like a 15% chance of winning. It's funny how people don't get this sort of thing.
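(For anyone who wants to check the arithmetic: if the favorite's actual margin is roughly normal around the point spread with an sd of 14 points, then a 14-point favorite loses whenever the outcome falls more than one sd below its mean:

pnorm(0, mean = 14, sd = 14)   # = pnorm(-1), about 0.16

which is in the ballpark of the 15% figure above.)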

On the other hand, his pledge is nonenforceable so it's no big deal.

We had a brief discussion here about prior distributions for clinical trials, motivated by a question from Sanjay Kaul. In that discussion, I asked what Sander Greenland would say. Well, here are Sander's comments:

My view is that if you are going to do a Bayesian analysis, you ought to do an informative ("subjective") Bayesian analysis, which means you should develop priors (note the plural) from context. I don't believe in reference or default priors except as frequentist shrinkage devices, and even then they need to be used with careful attention to proper contextual scaling. In my view the frequentist result is the noninformative analysis (I attach a recent Bayes primer I wrote on invitation from a clinical journal which explains why), and there is no contextual value added (and much harm possible) in Bayesian attempts to imitate it.

I think the idea of noncontextual default priors is like the idea of a one-size-fits-all shoe: Damaging except in the minority of cases it accidentally fits. Yet it seems even nominally Bayesian statisticians dislike the idea of procedures that must be fit to each context. In contrast, I think the ability to take advantage of expert contextual information is the chief argument for Bayesian over frequentist procedures (note that modern high-level applied frequentism has largely morphed into "machine learning," meaning fully automated procedures which have their place, but not in clinical-trial analysis).

On a separate issue that applies to Andrew's reply: Any regression procedure (frequentist or Bayesian) based on scaling the covariates using their sample distribution confounds accidental features of the sample with their effects, which is to say it adds more noise to an already noisy system, and should be avoided. So it is with the default prior recommended in the Gelman et al. preprint.

For "conservative" Bayesian analysis I recommend instead contextual "covering" priors (from "covering all opinions" or some might say "covering your ass") which assign high relative probability to all values that anyone involved in the topic would take remotely seriously. Your alternative from the power calculation provides an example of a non-null point that needs to be covered that way and gives an idea of the minimum range of high probability density for the prior. I discourage uniform priors for unbounded parameters because they unrealistically cut off suddenly, a property they transmit to the posterior (meaning that the posterior will cut off even if the data screams "big effect"!).

After proper contextual (not SD) scaling, I think a not bad very vague prior to start off in most clinical and epidemiologic contexts is the logistic for a log odds ratio or log hazard ratio, which is just a generalization of Laplace's law of succession and assigns over 95% prior probability to the ratio being between 1/40 and 40 (if the effect were larger than this there would hardly be a question of subtle analysis). This corresponds to adding 2 prior observations (as opposed to about 1 for the Gelman et al. default). I do believe in examining results under different priors, and one can go from the logistic inward, skewing as desired by altering the number of observations added to the treated and untreated. One can use non-null centers but I find them a little scary and prefer skewing instead unless the null is improbable to all observers (e.g., as is often the case with age and sex coefficients when using priors for confounder control).
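
(A quick check of the 1/40-to-40 figure, in R: a standard logistic prior on the log odds-ratio scale does put a bit over 95% of its mass in that range.)

exp(qlogis(c(0.025, 0.975)))          # central 95% interval on the odds-ratio scale: roughly 0.026 to 39
plogis(log(40)) - plogis(log(1/40))   # prior probability that the odds ratio is between 1/40 and 40: about 0.95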

Also Sander had a couple of technical notes regarding our paper:

1) The Witte-Greenland-Kim 1998 article you cited was rapidly superseded by the attached Witte et al. 2000 paper, which takes advantage of a then new SAS feature and works better than the 1998 proc. So you ought to sub the 2000 paper for the 1998 one.

2) Note that the log-F prior gets heavier tails as the A (half the F degrees-of-freedom) parameter drops, so another and computationally easier way to accomplish what you wanted would be to reduce A below 1/2 to the point that it roughly matched the Cauchy and then rescale, instead of rescaling the Cauchy to match the add-1/2 prior. See my SIM paper for details on use of the rescaled log-F (which I called generalized conjugate in my Biometrics 2003 paper).

3) My IJE paper shows how to use all this in logistic and Cox regressions.

In recognition of the crudeness of all these models in health and social sciences (which you acknowledge), I have a hard time seeing the need for the more elaborate R calculations you use, when I can just use ordinary software with added records and an offset term to get the same effects (see the section on the offset method in the IJE paper).

My reply (beyond, of course, Thanks for all the suggestions) is:

- I think that what you (Sander) are calling an informative or "subjective" prior is similar to what I'm calling "weakly informative," in that you're adding information but not putting in everything you might currently believe about the science of a problem. I agree that, ideally, priors adapted to individual problems are better than default priors. But I think default priors are better than no priors or flat priors (which are themselves a default, just not always a very good default).

- I have to look at all your articles... It makes sense that similar functionality could be gained using different functional forms or parametric families of models.

- Regarding your comment on the "elaborate R calculations": Actually, bayesglm is more robust and easier to run than glm. Whether or not the programming was elaborate, it's transparent to the user. With classical glm (which, of course, most users also don't understand) you get separation and instability all the time. I have no problem with ordinary software etc... but if you're using R, then bayesglm _is_ "ordinary software"--it's part of the freely downloadable "arm" package. You might as well say, "Who needs fancy linear regression software, when I can just use ordinary matrix inversion to solve least squares problems?" If the method is good (which we can argue about, of course), then it's a perfectly natural step to program it directly so that the user (including me) can just run it, thus automating the steps of calculating an offset term, etc.
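
For what it's worth, here's a minimal sketch of the kind of situation I have in mind (simulated data with complete separation; the variable names are made up):

library(arm)                      # provides bayesglm()
set.seed(1)
x <- rnorm(50)
y <- as.numeric(x > 0)            # complete separation: x perfectly predicts y
fit_glm <- glm(y ~ x, family = binomial)         # huge, unstable coefficient, plus warnings
fit_bayes <- bayesglm(y ~ x, family = binomial)  # default weakly informative t prior stabilizes the fit
coef(fit_glm)
coef(fit_bayes)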

Sander responded to my last comment as follows:

Most epidemiologists simply will not use anything -- I mean will not use ANYTHING -- that is not a regular proc in one of a few major packages: SAS, Stata, maybe SPSS. Some will not even allow publication of their study results except as verified through a major package (for a reason why see the attached, in which the authors could not bring themselves to come out and say upfront "we goofed because we trusted the S-Plus defaults"). So, however good your R procs may be, their outputs just aren't about to enter my field except through occasional forays as you may make, and I'll wager the situation isn't all that different in many other fields. Hence I developed pure-data augmentation procedures that everyone can use with SAS (and thus pioneer new kinds of errors). You can stabilize the SAS/Stata results using data priors with fractional A (say 0.5) if that's your only goal -- my article from AJE (2000) is on that topic, more or less, and about a very common problem in epi.

I'm perfectly happy with the idea that new ideas will start in R and other developer-friendly packages and gradually work their way to Stata, SAS, etc. I'm hoping that bayesglm (or something better) will eventually be implemented in SAS, so that the SAS die-hards will use it also! But I agree that it's a useful contribution to develop work-arounds that work in existing packages.

Meanwhile, Sanjay wrote:

Thank you for your note and the papers that lucidly lay out your perspective. I agree with you that the choice of priors should be driven by context. In the index example, the "context" is the investigator's estimate of what they felt to be a clinically important difference, i.e., a 25% relative risk difference (the so-called 'delta'). This should ideally be based on prior evidence and/or clinical/biological plausibility. Unfortunately, this number is often driven by trial feasibility issues (inverse square relation between 'delta' and sample size).

So, one could justify constructing a prior based on the investigator's expectation of a 25% benefit by centering the distribution on a RR of 0.75 (equivalent to a relative risk reduction of 25%), with a very small probability (say 2.5%) of RR > 1.0 and of RR < 0.56. Thus the 95% CI of this prior distribution (characterized as ENTHUSIASTIC) would be 0.56 to 1.0 RR, or a mean ln(RR) of -0.288 and sd ln(RR) of 0.147. The other choice of prior would be based on a mean RR of 1.0 (consistent with a null effect) and a very small probability of a benefit beyond RR < 0.75. Thus the 95% CI of this prior distribution (characterized as SKEPTICAL) would be 0.75 to 1.33 RR, or a mean ln(RR) of 0 and sd ln(RR) of 0.147.

Thus, one can construct posteriors from a range of priors that span the spectrum of beliefs in a "contextualized" manner. One can construct different priors based on different estimates of clinically important differences, which will vary from physician to physician, patient to patient, disease to disease, and outcome to outcome (for example, less severe outcomes requiring a larger delta and vice versa), thereby preserving the critical role of "context" in interpretation.

To which Sander replied:

As a matter of application, neither of the priors you mention below are what I would call "covering priors", that is neither of them cover the range of opinions on the matter; instead, they represent different opinions in the range. Those priors may represent what some individuals want to use, given their strong prior opinions (whether enthusiastic or skeptical), but I doubt those priors and their results are what a consensus or policy formulation panel would want. If those two priors represent the extremes of views held by credible parties or stakeholders, one covering prior might have a 95% central interval of roughly 1/2 to sqrt(2), thus encompassing each extreme view but focused between those extremes. The posterior from that prior would be of more interest to me than those from the extremes. And I would not leave out the ordinary frequentist result, which is the limiting result from a normal prior allowed to expand indefinitely, and hence the limit of a normal covering prior as the range of opinions covered expands indefinitely (the normality is not essential, although some shape restriction is needed). Context determines what is adequate covering by the prior.

I pretty much agree with Sander except that I don't know that there is always an "ordinary frequentist result"; at least, such a thing doesn't really exist for logistic regression with separation.
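
As a numerical footnote to this exchange: the log-scale prior parameters quoted above can be recovered directly from the stated 95% risk-ratio intervals, and the same arithmetic gives a mean and sd for a covering prior running from roughly 1/2 to sqrt(2). A minimal sketch in R (the function name is mine):

rr_interval_to_prior <- function(lo, hi) {
  # normal prior on log(RR) whose central 95% interval is (lo, hi) on the RR scale
  c(mean = (log(lo) + log(hi)) / 2,
    sd = (log(hi) - log(lo)) / (2 * qnorm(0.975)))
}
rr_interval_to_prior(0.56, 1.00)    # "enthusiastic": mean -0.29, sd 0.15
rr_interval_to_prior(0.75, 1.33)    # "skeptical": mean 0.00, sd 0.15
rr_interval_to_prior(0.5, sqrt(2))  # covering prior of roughly 1/2 to sqrt(2)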

I showed Jamie Galbraith these graphs on changes in income inequality by state:

ineqscatters.png

and he wrote:

I think it's consistent with my two-part take on the U.S. income distribution. The very rich are geographically very concentrated, not just in the rich states but in the rich counties within the rich states, and their income-share experience is governed by the rise of the stock market since the early 1970s.

The income experience of the poor in poor states is governed mainly by federal social welfare policies -- and especially by the rising real value of the Social Security benefit, since 1972. Assuming SS is counted as income, that could take care of it, though food stamps may also play a role, and in recent years the EITC has also been very significant for the bottom decile. There may also have been a significant rise in the minimum wage in the 1960s under Johnson; in real terms the peak year would have been 1970.

In richer states, the bottom 10 percent may have had some income support from the state, which would mean that the expansion of SS and other federal benefits in the early 70s was a smaller share of their income. Also the federal minimum wage is much less important in rich states than in poor ones. I have not done any quantitative work on this, but New York and other rich states did have "Home Relief" before the Feds came in. And other things, such as public housing. I'd look first at the differential effect of the minimum wage in rich and poor states. I read somewhere that the latest increase had zero effect on New York, but will reach about a million women here in Texas.

This is interesting. I'd noticed the patterns of increasing income among the poor in the poor states and the rich in the rich states, but I really had no thought of where this could've been coming from.

Jamie also pointed me to this paper by himself and Travis Hale on developments in income inequality in the past forty years:

In this note we report on the evolution of between-sector wage inequality in the United States from 1969 to 2006. Our calculations take advantage of the new NAICS sectoral classifications, merging these with the earlier SIC scheme to achieve a single unified series. We compare this measure to the standard CPS-based Gini coefficient of household income inequality, showing that the evolution of the two series is very close. We show that between-sector variations dominate between-state variations in determining the evolution of inequality. The high importance of between-sector variations in driving overall U.S. pay inequality raises important questions about the standard invocation of education and training as a remedy for inequality, since the choice of specialization has become a speculative decision, whose income prospects depend heavily on the ebb and flow of sectoral economic fortunes.

This seems really important. I'm curious how it relates to the trends in income inequality in other countries, compared to the U.S.

Bill Harris writes,

I stumbled across this project today and thought it might be related to a comment I posted last summer here.

WinEdt tip

Masanao writes,

In WinEdt, if you just highlight the lines you want to indent and press tab it automatically indents the block for you. If the space is set to 4 as it is by default, you can go to options->preference->tab and set the number to 2.

Hey, thanks!

Bob Shapiro pointed me to this source of data. From the official announcement:

Social Explorer, in association with the Association of Religious Data Archives, releases maps and reports at the county level that provide counts of adherents and congregations of most denominations in the United States for 1980, 1990 and 2000, including Catholics, many Protestant denominations, both evangelical and mainline, Mormons, Muslims and Jews, etc. Based on the Religious Congregations and Membership Study, this is the most complete census available on religious congregations and their members. . . . Want to know where the Baha'i or the Church of Jesus Christ of Latter Day Saints (Mormons) are concentrated by state or county? Social Explorer can tell you. There are well over 100 denominations reported for each decade.

Exemplary statistical graphics

Chris "last author" Zorn points us to this, commenting: "Too many pie charts, but aside from that..."

Michael Braun writes,

My particular problem involves a Dirichlet process mixture, but I think that the issue I describe also applies to finite mixtures where label switching can be a problem.

I am trying to construct a reasonably "close to optimal" adaptive scaling algorithm for a random walk Metropolis update, within a Gibbs sampler, for a model that includes a Dirichlet process mixture.
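
For readers who haven't seen this sort of thing, here is a minimal sketch of one common adaptive-scaling heuristic for a single parameter: adjust the log of the proposal sd in batches, nudging it up when the acceptance rate is above a target and down when it is below (in the spirit of Roberts and Rosenthal's adaptive MCMC). It illustrates only the generic scaling piece, not the Dirichlet-process or label-switching complications in Braun's question; the function and argument names are made up.

adapt_rw <- function(logpost, theta0, n_iter = 5000, batch = 50, target = 0.44) {
  theta <- theta0
  log_s <- 0                       # log of the random-walk proposal sd
  draws <- numeric(n_iter)
  acc <- 0
  for (i in 1:n_iter) {
    prop <- theta + exp(log_s) * rnorm(1)
    if (log(runif(1)) < logpost(prop) - logpost(theta)) {
      theta <- prop
      acc <- acc + 1
    }
    draws[i] <- theta
    if (i %% batch == 0) {         # every 'batch' iterations, nudge the proposal scale
      delta <- min(0.05, 1 / sqrt(i / batch))
      log_s <- log_s + if (acc / batch > target) delta else -delta
      acc <- 0
    }
  }
  draws
}
draws <- adapt_rw(function(th) dnorm(th, mean = 3, sd = 2, log = TRUE), theta0 = 0)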

Some philosophy for ya

Jason Anastasopoulos writes,

I thought your blog readers might be interested in a philosophy professor at NYU whose work I discovered a few weeks ago. He's one of the few that writes about the philosophy of probability and specifically on Bayesian theory. Here's a link to his research page at NYU.

This looks interesting. I have a few points of disagreement or at least comment; maybe when I have time in March, I'll write more on this. Until then, if you read the above papers and have questions about the foundations of probability, I'd suggest taking a look at chapter 1 of Bayesian Data Analysis, where we consider these issues from our own perspective, using examples from record linkage and football betting.

P.S. Jason adds:

There's some really fascinating work going on in the philosophy and history of science on probability as well. If you haven't already, you should check out Ian Hacking, a philosopher of science who has written extensively on probability theory. I would recommend "The Taming of Chance" and "The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference."

In a comment to an entry linking to my paper on splitting a predictor at the upper quarter or third and the lower quarter or third, MV links to this article by Frank Harrell on problems caused by categorizing continuous variables:

1. Loss of power and loss of precision of estimated means, odds, hazards, etc.

2. Categorization assumes that the relationship between the predictor and the response is flat within intervals; this assumption is far less reasonable than a linearity assumption in most cases

. . .

12. A better approach that maximizes power and that only assumes a smooth relationship is to use a restricted cubic spline . . .

My reply:

I agree that it is typically more statistically efficient to use continuous predictors. But, if you are discretizing, our paper shows why it can be much more efficient to use three groups (thus, comparing "high" vs. "low", excluding "middle"), rather than simply dichotomizing into high/low.

As discussed in the paper, we specify the cutpoints based on the proportion of data in each category of the predictor, x. We're not estimating the cutpoints based on the outcome, y. (This handles points 7, 8, 9, and 10 of the Harrell article.)

We're not assuming that the regression function is flat within intervals or discontinuous between intervals. We're just making direct summaries and comparisons. That's actually the point of our paper, that there are settings where these direct comparisons can be more easily interpretable.

Just to be clear: I'm not recommending that discretized predictors be used for articles in the New England Journal of Medicine or whatever, in an area where regression is a well understood technique. I completely agree with Harrell that it's generally better to keep variables as continuous rather than try to get cute with discretization. On the other hand, when you have your results, it can be helpful to explain them with direct comparisons. The point of our paper is that, if you're going to do such direct comparisons, it's typically more efficient to use the upper and lower third or quarter, rather than the upper and lower half.
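
As a rough illustration of the efficiency point (a simulation sketch, not the calculation in the paper): simulate a linear relation between y and a continuous predictor x, and compare the z-statistic of the high-versus-low comparison when "high" and "low" are defined as halves, thirds, or quarters of x.

set.seed(1)
frac <- c(half = 1/2, third = 1/3, quarter = 1/4)
zstats <- replicate(2000, {
  x <- rnorm(1000)
  y <- 0.5 * x + rnorm(1000)
  sapply(frac, function(f) {
    hi <- y[x > quantile(x, 1 - f)]
    lo <- y[x < quantile(x, f)]
    (mean(hi) - mean(lo)) / sqrt(var(hi) / length(hi) + var(lo) / length(lo))
  })
})
rowMeans(zstats)   # thirds and quarters give a noticeably larger z than halves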

Friend Sense

The Human Social Dynamics group at Yahoo! Research (including Duncan Watts, Sharad Goel, and others; Juli and David and I are working with them too) has just launched a Facebook application that measures how much people know about their friends' attitudes and beliefs. It is surprisingly fun to play--even a little addictive--especially if your friends are playing as well. Check it out here.

sharad.gif

Robin Hanson suggested here an experimental design in which patients, instead of being randomly assigned to particular treatments, are randomly given restrictions (so that each patient would have only n-1 options to consider, with the one option removed at random). I asked some experts about this design and got the following responses.

Eric Bradlow wrote:

I think "exclusion", more generally, in Marketing has been done in the following ways:

[1] A fractional design -- each person only sees a subset of the choices, items, or attributes of a product (intentionally) on the part of the experimenter. Of course, this is commonly done to reduce complexity of the task while trading off the ability to estimate a full set of interactions. The challenge here, and I wrote a paper about this in JMR in 2006, is that people infer the values of the missing attributes and do not, despite instructions, ignore them. Don Rubin actually wrote an invited discussion on my piece. So, random exclusion on the part of the experimenter is done all of the time.

[2] A second way exclusion is sometimes done is prior to the choice or consumption task, you let the respondent remove "unacceptable" alternatives. There was a paper by Seenu Srinivasan of Stanford on this. In this manner, the respondent eliminates "dominated/would never choose alternatives". This is again done for the purposes of reducing task complexity.

[3] A third set of studies I have seen, and Eric Johnson can comment on the psychology of this much more than I can, is something that Dan Ariely (now of Duke, formerly of MIT) and colleagues have done, which seems closest to this post. In these sets of studies, alternatives are presented and then "start to shrink and/or vanish". What is interesting is that the alternatives he does this to are not the preferred ones, and it has a dramatic effect on people's preferences. I always found these studies fascinating.

[4] A fourth set of related work, of which Eric Johnson has great fame, is a "mouse-lab" like experiment where you allow people to search alternatives until they want to stop. This then becomes a sequential search problem; however, people exclude alternatives when they want to stop.

So, Andy, I agree with your posting that:

(a) Marketing researchers have done some of this.

(b) Depending on who is doing the excluding, one will have to model this as a two-step process, where the first step is a self-selection (an observational-study-like likelihood piece, if one is going to be model-based).

The aforementioned Eric Johnson then wrote:

I think there are at least two important thoughts here:

(1) random inclusion for learning... Decision-making research has changed the way we think about preferences: they are discovered (or constructed), not 'read' from a table (thus Eric B.'s point 3).

A related point is that a random option can discover a preference (gee, I never thought I liked ceviche...), so there may be value in adding random options to the respondent... The late Hillel Einhorn wrote about 'making mistakes to learn.'

(2) "New Wave' choice modeling often consists of generating the experimental design on the fly: Adaptive conjoint. By definition, these models use the results from one choice to eliminate a bunch of possible options and focus on those that have the most information. Olivier Toubia at Columbia Marketing is a master of this.

To elaborate on Eric B.'s points:

Consumer Behavior research shows that elimination is a major part of choice for consumers, probably determining much of the variance in what is chosen. Make choice easier, learning harder.

There is an interesting tradeoff for both the individual and larger publics here: you try an option you are likely not to like (a treatment which may well not work). If you are surprised, then you (or subsequent patients) benefit for a long time. Since this is an intertemporal choice, people may not experiment enough.

Finally, Dan "Decision Science News" Goldstein added:

I've never seen a firm implement such a design in practice, neither when I worked in industry, nor when I judged "marketing effectiveness" competitions.

My own thoughts are, first, that there are a lot of interesting ideas in experimental design beyond the theory in the textbooks. It would be worth thinking systematically about this (someday). Second, I want to echo Eric Johnson's comment about preferences being constructed, not "read off a table" from some idealized utility function. Utility theory is beautiful, but it distresses me that people think it fits reality in even an approximate way.

Random term limits

We had this discussion today about how most congressmembers are in safe seats, where voters can only make a difference in the primary elections for the rare open seats. One solution to the problem is term limits, but Bob Erikson pointed out that then there's a lame-duck problem, with congressmembers who are about to be term-limited no longer being moderated by the voters.

So here's my solution: each year, select some congressmembers to be term limited. You can adjust the probability to match the turnover rate you want; for example if you do 40 a year, you'll cycle through all of them every 10 years or so, on average. Then schedule a special election (or else do the lottery in February or so, to give candidates time to run in the primary).

I'm sure there are lots of reasons why this is a bad idea, but I kind of like it. When some really great congressmember gets term limited out, he or she can perhaps consider some appointive office or contribute to government in some other way.

P.S. Ted Dunning suggests a different solution (see comments below): move the district lines randomly after every election. I like this, since it seems cleaner to implement from a constitutional perspective. Also has the advantage of allowing the districts to equalize population more frequently, and, beyond this, it puts less pressure on each redistricting to be super-balanced, since the lines would be redrawn every two years.

Why do I co-author papers?

Tyler Cowen asks, "Why are there so many co-authored papers?" I do it because I think adding coauthors makes my work better. That's why I've put coauthors on all my books and most of my articles. More than once, I've written an article or most of a book and added a coauthor, just because I think I'll like the result more if someone else's input is included.

One advantage of working in statistics, political science, and public health is that the norm in these fields is for the first author to be the person who actually writes the paper. So if I add a coauthor, I don't have to worry that people will think he or she did all the work. Conversely, if I suggest an idea to a student, who then does the work and writes it up (under my guidance), he or she can and should be first author.

The field of economics is a bit more difficult because there the norm seems to be alphabetical order. Sometimes I've unfairly benefited from this, sometimes I've lost out.

Beamer / Powerpoint

In response to my presentation linked to here, Ken writes, "Andrew is using LaTeX and the beamer package to produce the presentations. Much better than Powerpoint, especially for equations, and it can directly include .eps from R. A good alternative to beamer is powerdot."

I agree that beamer is great--I've been using it ever since Jouni told me about it, a few years ago--but it's awkward for a presentation with lots of images, since first I have to convert each graph to .pdf, then I have to spend lots of time in trial-and-error moving and resizing the figures using the numbers inside the \includegraphics call to get the pictures in the right place. When I have a lot of pictures, I actually use Powerpoint! See here, for example. (This particular presentation actually looks cooler in "real life": when I converted to pdf for convenient downloading, the software added a white border--which I hate, but I can't get rid of--to each slide.)
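
(One small workflow note for R users: you can skip the conversion step by writing figures straight to .pdf from R; the file name below is just an example.)

pdf("figure1.pdf", width = 5, height = 4)   # open a PDF graphics device
plot(rnorm(100), rnorm(100), pch = 20)
dev.off()                                   # close the device, writing the file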

Here.

Bill DuMouchel wrote:

I recently came across your paper, "A default prior distribution for logistic and other regression models," where you suggest the student-t as a prior for the coefficients. My application involves drug safety data and very many predictors (hundreds or thousands of drugs might be associated with an adverse event in a database). Rather than a very weakly informative prior, I would prefer to select the t-distribution scale parameter (call it tau) to shrink the coefficients toward 0 (or toward some other value in a fancier model) as much as can be justified by the data. So I want to fit a simple hierarchical model where tau is estimated. Is there an easy modification of your algorithm to adjust tau at every iteration and to ensure convergence to the MLE of tau (or maximum posterior estimate if we add a prior for tau)? And do you know of any arguments for why regularization by cross-validation would really be any better than fitting tau by a hierarchical model, especially if the goal is parameter interpretation rather than pure prediction?

I replied:

We also have a hierarchical version that does what you want, except that the distribution for the coeffs is normal rather than t. (I couldn't figure out how to get the EM working for a hierarchical t model. The point is that the EM for the t model uses the formulation of a t as a mixture of normals, i.e., it's essentially already a hierarchical normal.)

We're still debugging the hierarchical version, hope to have something publicly available (as an R package) soon.

Regarding your question about cross-validation: yes, I think a hierarchical model would be better. The point of the cross-validation in our paper was to evaluate priors for unvarying parameters, which would not be modeled hierarchically.

Bill then wrote:

I did have my heart set on a hierarchical model for t rather than normal, because I wanted to avoid over-shrinking very large coefficients while still "tuning" the prior scale parameter to the data empirically. (Although my worry about over-shrinking might be less urgent if I use prior information to create "batches" that can have their own centers of shrinkage, as in your in-progress hierarchical bayesglm program.)

Lee Edlefsen and I [Bill D.] are working on a drug adverse events dataset with about 3 million rows and three thousand predictors, using logistic regression and some extensions of LR, and with thousands of different response events to fit. Plus the potential non-repeatability of MCMC results would be a real turnoff for the FDA regulators and pharma industry researchers.

An EM question

I have a question for Chuanhai or Xiao-Li or someone like that: is it possible to do EM with two levels of latent variables in the model? In the usual formulation, there are data y, latent parameters z, and hyperparameters theta, and EM gives you the maximum likelihood (or posterior mode) estimate of theta, conditional on y and averaging over z. This can commonly be done fairly easily because z commonly has (or can be approximated with) a simple distribution given y and theta. This scenario describes regression with fixed Student-t priors, or regression with normal priors with unknown mean and variance.

But what about regression with t priors with unknown center and scale? There are now two levels of latent variables. Can an EM, or approximate EM, be constructed here? As Bill and I discussed in our emails, Gibbs is great, and it's much easier to set up and program than EM, but it's harder to debug. There's something nice about a deterministic algorithm, especially if it's built with bells and whistles that go off when something goes wrong.
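
To make the one-level case concrete, here is a minimal sketch of EM for the location and scale of a t distribution with fixed degrees of freedom, using the scale-mixture-of-normals representation (the function name and toy data are mine; the two-level case asked about above is exactly what this does not cover):

em_t <- function(y, nu = 4, n_iter = 100) {
  mu <- median(y)
  sigma2 <- mad(y)^2
  for (i in 1:n_iter) {
    # E-step: expected mixing precisions given the current (mu, sigma2)
    w <- (nu + 1) / (nu + (y - mu)^2 / sigma2)
    # M-step: weighted mean and weighted variance
    mu <- sum(w * y) / sum(w)
    sigma2 <- mean(w * (y - mu)^2)
  }
  c(mu = mu, sigma = sqrt(sigma2))
}
y <- 2 + 0.5 * rt(500, df = 4)
em_t(y, nu = 4)   # should recover roughly location 2 and scale 0.5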

Simple methods are great, and "simple" doesn't always mean "stupid" . . .

Here's the mini-talk I gave a couple days ago at our statistical consulting symposium. It's cool stuff: statistical methods that are informed by theory but can be applied simply and automatically to get more insights into models and more stable estimates. All the methods described in the talk derived from my own recent applied research.

For more on the methods, see the full-length articles:

Scaling regression inputs by dividing by two standard deviations

A default prior distribution for logistic and other regression models

Splitting a predictor at the upper quarter or third and the lower quarter or third

A message for the graduate students out there

Research is fun. Just about any problem has subtleties when you study it in depth (God is in every leaf of every tree), and it's so satisfying to abstract a generalizable method out of a solution to a particular problem.

P.S. On the other hand, many of Tukey's famed quick-and-dirty statistical methods don't seem so great to me anymore. They were quick in the age of pencil-and-paper computation, and sometimes dirty in the sense of having unclear or contradictory theoretical foundations. (In particular, his stem-and-leaf plots and his methods for finding gaps and clusters in multiple comparisons seem silly from the perspective of the modern era, however clever and useful they may have been at the time he proposed them.)

P.P.S. Don't get me wrong, Tukey was great, I'm not trying to shoot him down. I wrote the above P.S. just to remind myself of the limitations of simple methods, that even the great Tukey tripped up at times.

Notation for crossover designs

David Afshartous writes,

Robin Hanson writes,

To make sense of social complexity we would ideally want to add lots of randomization to people's real choices, and then collect lots of data on what happens to them. But this seems a lot to ask of people. For example, people who eat at a restaurant might be willing to tell you how they felt later after eating there, but they'd be reluctant to eat a random item from the menu even one percent of the time.

Would people be more willing to have a few of their options randomly excluded? For example, would people mind much if on a menu of one hundred items one of the items was randomly excluded each time - "sorry we are out of that today"? Data about choices under such reduced menus would still have a key randomization component.

This idea occurred to me while talking to a cancer doctor who thought he could get thousands of cancer patients to agree to release data on their progress, but who would be more reluctant to accept a random treatment. Once standard drugs have failed, there are about twenty alternative drugs a patient could try, which they usually pick based on the side effects etc. Patients probably wouldn't mind much having one of these options taken off the menu.

My thoughts:

I think I'd eat a random item 1% of the time as part of an experiment--after all, 1% of the time would correspond to three lunches per year.

To get to your main proposal: I think if you exclude one item, you'll get a study that is a mix of experiment and observational study, which could probably be analyzed more robustly than purely observational data, but which would require more information than the analysis of a pure experiment.

This sounds like something that marketing researchers might have studied too.

P.S. See here for much more from the marketing researchers.

Here's Kaiser Fung's presentation at our consulting mini-symposium. It was interesting to hear about the challenges of in-house consulting at Sirius Satellite Radio.

Our statistical consulting mini-symposium yesterday was great. I wish we'd been able to video it. There was lively discussion of the connections between statistical consulting and research, and the different aspects of consulting in academic, corporate, and legal environments.

I'll be posting everyone's slides. Here's David Rindskopf's contribution:

Rindskopf’s Rules for Statistical Consulting

Some of these rules are universal, while others apply only in particular situations: Informal academic consulting, formal academic consulting, or professional consulting. Hopefully the context will be apparent for each rule.

Communication with the Client:

(1) In the beginning, mostly (i) listen and (ii) ask questions that guide the discussion.

(2) Your biggest task is to get the client to discuss the research aims clearly; next is design, then measurement, and finally statistical analysis.

(3) Don’t give recommendations until you know what the problem is. Premature evaluation of a consulting situation is a nasty disease with unpleasant consequences.

(4) Don’t believe the client about what the problem is. Example: If the client starts by asking “How do I do a Hotelling’s T?” (or any other procedure), never believe (without strong evidence) that he/she really needs to do a Hotelling’s T.

Exception: If a person stops you in the hall and says “Have you got a minute?” and asks how to do Hotelling’s T, tell them and hope they’ll go away quickly and not be able to find you later. (I’ve had this happen, and if I ask enough questions I inevitably find that it’s the wrong test, answers the wrong question, and is for the wrong type of data.)

Adapting to the Client and His/Her Field

(5) Assess the client’s level of knowledge of measurement, research design, and statistics, and talk at an appropriate level. Make adjustments as you gain more information about your client.

(6) Sometimes the “best” or “right” statistical procedure isn’t really the best for a particular situation. The client may not be able to do a complicated analysis, or understand and write up the results correctly. Journals may reject papers with newer methods (I know it’s hard to believe, but it happens in many substantive journals). In these cases you have to be prepared to do more “traditional” analyses, or use methods that closely approximate the “right” ones. (Turning lemons into lemonade: Use this as an opportunity to write a tutorial for the best journal in their field. The next study can then use this method.) A similar perspective is represented in the report of the APA Task Force on Statistical Significance; see their report: Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

Professionalism (and self-protection)

(7) If you MUST do the right (complicated) analysis, be prepared to do it, write a few tutorial paragraphs on it for the journal (and the client), and write up the results section.

(8) Your goal is to solve your client’s problems, not to criticize. You can gently note issues that might prevent you from giving as complete a solution as desired. Corollary: Your purpose is NOT to show how brilliant you are; keep your ego in check.

Time Estimation, Charging for Your Time, etc.

(9) If a person stops you in the hall and asks if you have a minute, make him/her stand on one leg while asking the question and listening to your answer. If they ask for five minutes, it’s really a half-hour they need (or more).

(10) Corollary: Don’t charge by the job unless you really know what you’re doing or are really desperate. Not only do people (including you) underestimate how long it will take, but (a la Parkinson’s Law) the job will expand to include everything that comes into the client’s mind as the job progresses. If you think you know enough, write down all of the tasks, estimate how much time each will take, and double it. Also let the client know that if they make changes they’ll pay extra (Examples: “Whoops, I left out some data; can you redo the analyses?”, or “Let’s try a crosstab by astrological sign, and favorite lotto number, and...”)

(11) Charge annoying people a higher hourly rate. If you don’t want to work for them at all, charge them twice your usual rate to discourage them from hiring you (at least if they do hire you, you’ll be rewarded well.)

Sanjay Kaul writes,

I would like to hear about your perspective on specification of prior for a Bayesian analysis of a clinical trial. The trial was designed to have 80% power to detect a 25% relative difference in outcomes (presumably reflecting the investigator’s estimate of a “clinically important” difference - MCID).

The results suggested an 18% treatment benefit for the new agent (95% CI 2 to 32%), P = 0.028.

One can estimate that the results MIGHT be compatible with a clinically important treatment effect (25% difference lies within the 95% CI). However, one can explicitly calculate the probability of a 25% effect based on the Bayesian analysis.

1. What prior is appropriate in this setting? Non-informative; skeptical; enthusiastic

If skeptical, do we center the distribution on null difference with 2.5% probability of the treatment effect exceeding MCID (risk ratio <0.75)?

If enthusiastic, do we center the distribution on 25% difference with 2.5% probability of the treatment effect exceeding null difference (risk ratio >1.0)?

2. Do we base the prior on the power-calculation assumptions, i.e., 80% of the "area under the curve" contains the interval from no difference to a 25% difference (RR of 1 to 0.75), with 10% in each tail (using the symmetry assumptions)?

My reply: It doesn't seem right to base the prior distribution on the power calculation. The standards for power calculations are much more relaxed than for Bayesian priors: for the power calculation, you make a reasonable assumption and compute the power conditional on that assumption. But the analysis itself isn't supposed to rely on the assumed model. Once you have the data, the power calculation is irrelevant.

I suppose the best prior distribution would be based on a multilevel model (whether implicit or explicit) based on other, similar experiments. A noninformative prior could be ok but I prefer something weakly informative to avoid your inferences being unduly affected by extremely unrealistic possibilities in the tail of the distribution.

I'm curious what Sander Greenland would say here.

P.S. Sanjay Kaul adds:

I [Kaul] should have clarified that my previous example is a REAL CLINICAL TRIAL designed and analyzed with frequentist methods and I am now attempting to analyze the data using Bayesian methods.

For the uniform prior, I use a log odds mean of zero and sd of 2. What values do you recommend for a "weakly informative" prior, especially when there is no previous information available?

My reply: I'd set it up as a logistic regression and then use a Cauchy prior distribution centered at 0 with scale 2.5, as described here.
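
In R this looks something like the following (the data frame and variable names are placeholders; setting prior.df = 1 makes the t prior a Cauchy):

library(arm)
exp(qcauchy(c(0.25, 0.75), location = 0, scale = 2.5))   # half the prior mass is between odds ratios of about 0.08 and 12
fit <- bayesglm(outcome ~ treatment, family = binomial(link = "logit"), data = trial,
                prior.mean = 0, prior.scale = 2.5, prior.df = 1)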

Once a midwesterner . . .

Lee Sigelman refers to himself as a "midwesterner" even though he lives in D.C. This reminds me that we want to redo our geography-and-voting analyses looking at the states where people come from (rather than where people currently live). A bunch of surveys ask this, I think.

More generally, I assume that some sociologists have looked at the question of how people define themselves by region. I know there's been lots of research on people's racial, ethnic, and national self-definition. I remember that, about 15 years ago, Michael Hout gave a talk in our seminar: "How 4 million Irish immigrants became 40 million Irish Americans." Contrary to expectations, it wasn't about prolific breeding, it was about how people of mixed background choose to classify themselves. (Maybe things are different now, in the era of Cablinasians.)

See here.

Mini-Symposium: Statistical Consulting

When: January 24, 2008, from 3pm to 5pm

Where: Applied Statistics Playroom*

Sponsored by the New York City chapter of the American Statistical Association and the Columbia University Statistics Department, ISERP, and Applied Statistics Center.

Agenda

* Before 3pm: Casual conversation. This is a good time to meet new people or catch up with others.

* 3pm to 5pm:

o Brief lecture by Andrew Gelman: Some Recent Progress in Simple Statistical Methods.

o Panel discussion on statistical consulting with Naihua Duan (New York State Psychiatric Institute), Mimi Kim (Albert Einstein College of Medicine), Eva Petkova (New York University), Andrew Gelman (Columbia University), Kaiser Fung (Sirius Satellite Radio), and David Rindskopf (CUNY Graduate Center).

o The panel members will speak briefly, discuss questions, and facilitate a general discussion about statistical consulting.

* After 5pm: End of the formal part of the symposium. People can continue a group discussion, leave, or break into smaller groups.

Topics to be discussed include:

* Providing statistical solutions within the range of understandability;
* Handling the trade-offs between doing the analyses yourself and teaching others to perform all or parts of the analyses themselves;
* Managing expectations and building long-term relationships;
* Deciding how much to cater to the norm within disciplines;
* Balancing the goals of co-authorship in conjunction with money-making.

* The Applied Statistics Playroom is 707 International Affairs Building, Columbia University, at 118 St. and Amsterdam Ave., near the 116 St. #1 train. Snacks will be provided.

P.S. See here, here, and here for slides of some of the presentations.

Does jittering suck?

Antony Unwin saw this scatterplot (see here for background):

views.png

and had some comments and suggestions. I'll show his plots below, but first I want to talk about jittering. I wonder if the main problem with my original graph above is that it is too small.

In any case, the jittering makes it look weird, but I wonder whether it would be better if it were jittered a bit more, so that the clusters of points blur into each other completely. (Since the data are integers, we could just jitter by adding random U(-.5,.5) numbers to each point in the x and y directions.)
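
In R this amounts to something like the following (x and y here stand for the two integer-valued responses in the scatterplot):

plot(x + runif(length(x), -0.5, 0.5),
     y + runif(length(y), -0.5, 0.5),
     pch = 20, cex = 0.3)
# jitter(x, amount = 0.5) does the same thing for a single variable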

But Antony says:

Jittering makes strong assumptions, which are rarely mentioned. It is bad for small cell sizes (you can get odd patterns unless you specifically adjust your jittering to account for cell size and how many do that?) and is bad for large cell sizes (because of overplotting and because you get solid blocks which can hardly be distinguished from one another). In fairness I should declare myself as an anti-jittering fundamentalist and say that there are hardly any circumstances when I think jittering is useful. Jittering is a legacy from the days when you could only plot points. Area plots should always be the first choice.

Maybe he's right. A gray-scale plot using image() might be a better way to go in a situation like this one with many hundreds of data points.
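
For instance, something along these lines (again with x and y standing in for the two integer-valued responses):

xs <- min(x):max(x)
ys <- min(y):max(y)
counts <- table(factor(x, levels = xs), factor(y, levels = ys))
image(xs, ys, unclass(counts),
      col = gray(seq(1, 0, length.out = 30)),   # white = no data, black = the largest count
      xlab = "x", ylab = "y")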

As we've discussed before, the Republican party gets more support from the rich than from the poor, especially in poor states. (In poor states such as Mississippi, rich people are much more Republican than poor people; in rich states such as Connecticut, rich people are only slightly more Republican than the poor.)

Rich voter, poor voter

The next step is to look at time trends. Here we use the National Election Studies pooled into 20-year intervals. First, the difference between rich and poor voters in rich, middle-income, and poor states. As you can see, the gap in voting between rich and poor voters has increased, but especially in the poor states:

gapseries.png

I don't know exactly how this is related, but in the past 25 years, income inequality has actually been increasing faster in the rich states than in the poor states.

Rich state, poor state

Next we look at things from the other direction, comparing the voting patterns of rich and poor states, but looking separately at rich, middle-income, and poor voters. As you can see, within each income category, there didn't use to be any large systematic differences in voting patterns between rich and poor states until recently. Even now, the rich-state, poor-state difference shows up mostly among high-income voters, somewhat among middle-income voters, and not at all among the poor:

gapseries2.png

Thus, the familiar "red America, blue America" pattern, the "culture war" between red and blue states, is really something happening at the higher range of incomes.

P.S.: whites-only analysis

In response to some of the commenters below, I did an analysis with just whites (88% of the total dataset). Removing the minorities reduces the differences by about half. Here's the new version of our first picture:

gapseries_white.png

And here's the second picture:

gapseries2_white.png

Among whites, the red-state, blue-state divide is still strongest among the rich but it's no longer zero for the poor.

Stephen Dubner and Steven Levitt wrote this Freakonomics column, which concludes, "if there is any law more powerful than the ones constructed in a place like Washington, it is the law of unintended consequences." What I'm wondering is, what sort of law is this? Obviously it's not a real "law" like the law of gravity, or even one of those social-science laws like Gresham's law or the statement that democracies usually don't fight each other. But it's supposed to be more than just a joke in the manner of Murphy's law, right?

I've remarked previously that unintended consequences often were actually intended, but Dubner and Levitt's examples seem actually unintended. So these seem like real examples, but I don't know what it takes for this to be a "law." Surely there must be dozens of other examples of intended consequences that actually happened? Or unintended consequences which, although unfortunate, were minor compared to the intended consequences? The Freakonomics article was interesting; now I want to hear a statement of the law itself...

P.S. Interesting comments below. Also, Alex Tabarrok has further elaboration:

The law of unintended consequences is what happens when a simple system tries to regulate a complex system. The political system is simple, it operates with limited information (rational ignorance), short time horizons, low feedback, and poor and misaligned incentives. Society in contrast is a complex, evolving, high-feedback, incentive-driven system. When a simple system tries to regulate a complex system you often get unintended consequences.

Schools' report cards anger NYC parents

Kenny passed on this link, which is related to a project that Jennifer and I are involved in, on comparing New York City public schools:

Thanks to heavy parent involvement and high test scores, Public School 321 in Park Slope, a yuppie neighborhood in Brooklyn, is considered a gem of New York City's public school system. In the eyes of New York's Department of Education, however, P.S. 321 deserved just a B in the city's first-ever school report cards, which are based largely on how students score on standardized tests. Such accountability efforts — widespread since the advent of the federal No Child Left Behind Act — have raised the hackles of parents and educators across the country. . . .

James Liebman, chief accountability officer for New York City schools, devised the grading system for the city's 1.1 million-pupil school system. Liebman said standardized tests are a good measure of whether students have learned what they should know. "If children can't read and they can't do math, then the educational system and their school have failed them," he said. . . . Liebman pointed to a Quinnipiac University poll in which voters said the grades were fair by a margin of 61 to 27 percent. "It's a system to provide information to parents to make their own judgments," he said.

I've talked with Jim about the school evaluations but I don't know exactly how they finally decided to do it. One of the challenges in doing this sort of rating is that the evidence seems to show that teachers, rather than schools, have the biggest effects on test scores. To first approximation, the effect of the school seems to be pretty much the average of its teacher effects.

Regarding criticisms of the evaluations: one way the evaluations themselves can be evaluated is to apply them retroactively and see how well they predict future performance, to estimate the answer to the following question: if you were to send your kid to a highly-graded or poorly-graded school, how much different would you expect his or her test scores to be in a year, or two years, or whatever.

Beyond this, I think one of the motivations for getting these evaluations out there is to put some pressure on the schools. I have to say that I think our own teaching at the university level would be improved if our students had to take standardized tests after each of our courses and we were confronted with evidence on how much (or how little) they learned.

I had various course titles floating around: my course at Columbia this spring is officially called Applied Statistics, and I had promised people that it would cover Bayesian statistics. At Harvard they asked me to teach Statistical Computing, but I wanted to focus on applied Bayesian methods. So I'm putting it all together in the title given above.

If you're interested in taking the class, let me know if you have any questions or just show up to the first few lectures; it's Wed Fri 9:00-10:30 at Columbia (if you're in New York), or Mon 11:30-2:30 (if you're in Boston).

Motivation:

Statistical computing is to statistics as statistics is to science: necessary but a distraction from the main event. I hate computing, yet I do it all the time. For those of us in this position, it makes sense to spend a bit more time thinking harder about how to compute efficiently. Learning statistical computation is an investment in becoming a more effective practitioner and researcher.

Overview:

We will cover topics in Bayesian computation, statistical graphics, and software validation, as well as special topics that interest the class.

There will be some homework (writing programs and making graphs in R) and a final project to be done in pairs.

The (tentative) syllabus is below.

The speed-dating data

Somebody writes,

I am looking for interesting, unusual datasets for a data analysis class I am teaching, and I heard by email from Ray Fisman that you have a sanitized version of the data from his speed dating experiment.

Indeed, the data are here; we use them in a homework assignment in our book. The data were collected by Ray Fisman and Sheena Iyengar, an economist and a psychologist at the business school here, and they summarized their findings in this paper:

We study dating behavior using data from a Speed Dating experiment where we generate random matching of subjects and create random variation in the number of potential partners. Our design allows us to directly observe individual decisions rather than just final matches. Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.

What I really want to do with these data is what I suggested to Ray and Sheena several years ago when they first told me about the study: a multilevel model that allows preferences to vary by person, not just by sex. Multilevel modeling would definitely be useful here, since you have something like 10 binary observations and 6 parameters to estimate for each person.
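
A sketch of what such a model might look like with lme4 (the variable names are placeholders for whatever is in the sanitized dataset; in current versions of lme4 the call is glmer rather than lmer):

library(lme4)
# decision = 1 if the rater said yes to the partner; the predictors are the rater's
# ratings of the partner; 'rater' indexes the person making the choices
fit <- glmer(decision ~ attractive + intelligence + ambition +
               (1 + attractive + intelligence + ambition | rater),
             family = binomial, data = speeddating)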

I'm hoping that some pair of students analyzes these data as a project in my class this spring. I suspect that we could learn some interesting things. Also, once the model has been fitted successfully once, Ray, Sheena, and others would be able to fit it to other similar datasets easily enough.

Finally, let me thank Ray and Sheena again for making their data available to all.

There's been a lot of talk about the recent New Hampshire primaries. Now it's time to hear from the experts, in particular, Michael Herron, Walter Mebane, and Jonathan Wand, the political scientists who, among other things, did the definitive estimate of the Florida vote from 2000. Their punchline: "with respect to Hillary Clinton’s surprise victory in the Democratic Primary and the notable differences across vote tabulation technologies in Clinton’s and others’ levels of support, our results are consistent with these differences being due entirely to the fact that New Hampshire wards that use Accuvote optical scan machines typically have voters with different political preferences than wards that use hand counted paper ballots."

Here's their paper, and here's the executive summary:

We [Herron, Mebane, and Wand] address concerns that the reported vote counts of candidates running in the 2008 New Hampshire Presidential Primaries were affected by the vote tabulating technologies used across New Hampshire.

• In the Democratic Primary, Hillary Clinton was more successful in New Hampshire wards that used Accuvote optical scan vote tabulating technology than was Barack Obama, receiving 4.3 more percentage points of the vote there (40.2% for Clinton versus 35.9% for Obama). In contrast, Clinton did worse than Obama in wards that counted paper ballots by hand, trailing by 6.1 percentage points (33.7% versus 39.8%).

• In the Republican Primary, Mitt Romney trailed John McCain by 3.6 points in Accuvote wards and by 15 points in wards that counted ballots by hand.

• In New Hampshire the choice of vote tabulation technology is made ward by ward, and electronic technology was used in wards that typically differ demographically and politically from wards that count ballots by hand. Wards that selected electronic tabulation are disproportionately from the southeast part of New Hampshire, and they tend to be more densely populated and more affluent. Accuvote and hand count wards have also typically produced divergent voting patterns in elections prior to the 2008 primary. It is plausible that most or all of the observed differences between vote tabulation technologies in the votes candidates received reflect such background differences and not anything inherent in the tabulation methods.

• Using a subset of New Hampshire wards that have similar demographic features and voting histories but differ in their vote tabulation technologies, we find no significant relationship between a ward’s use of vote tabulating technology and the votes or vote shares received by most of the leading candidates who competed in the 2008 New Hampshire Presidential Primaries. Among Clinton, Edwards, Kucinich, Obama and Richardson in the Democratic primary and Giuliani, Huckabee, Paul, Romney and McCain in the Republican primary, we observe a significant difference only in the votes counted for Edwards, and that difference is small (a deficit of between 0.6 and 3.4 percent in the hand-counted votes).

• With respect to Hillary Clinton’s surprise victory in the Democratic Primary and the differences across vote tabulation technologies in Clinton’s and others’ votes, our results are consistent with these differences being due entirely to the fact that New Hampshire wards that use Accuvote optical scan machines have voters with different political preferences than wards that use hand counted paper ballots.

The Irrelevance of "Probability"?

Seth forwarded me this article [link fixed, I hope] from Nassim Taleb:

Version numbers

As regular readers know, I'm a big fan of Doug Bates's lmer() function, which comes with his lme4 package in R. Anyway, I had an exchange with another lmer() user which included the following funny bit:

Using the version from CRAN (lme 0.99875-9) everything works; however using lme4_0.999375-0 it does not! I was running lme4_0.999375-0...

I'm thinking that maybe it's time to go to version 1.0 already!
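In case anyone else runs into this sort of mismatch, here's a quick way to check which versions you are actually running. packageVersion() and sessionInfo() are standard R functions; the first line assumes the lme4 package is installed.

# Check which version of lme4 (and which version of R) is actually in use
library(lme4)            # assumes the lme4 package is installed
packageVersion("lme4")   # prints the installed package version
sessionInfo()            # lists the R version and all attached package versions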

Huh?

I don't know what this is about, but on the other hand I'll link to irrelevant things if they seem irrelevant enough. . . Tim Penn writes,

I [Penn] thought I would just alert you to a little project I’m working on using the blog. I have some photos I took twenty years ago of Russian rock icon Viktor Tsoy. Scott Page told me the other day he was just chatting with someone about Tsoy’s band Kino, so I make no assumptions now about how well known they are outside Russia by well-informed people like yourself.

Anyway, I am trying to spread the word and draw people who like Kino to the blog over the next few weeks as I put more of the pictures up. But also I’m trying to tap into networks which are more knowledgeable about that old underground culture than I am.

There is a taster up there at the moment, which 3quarksdaily linked to the other day. If you are a Led Zep fan, there is sweetener in there too. Hope you’ll find it interesting and will share it with any Russian friends or Russophiles around Columbia. I think students should have an obligation to know about him, but then I’m biased. I have enough material to create a single-track flash video of the clandestine concert stills, and will do that once I feel able.

Today I put up another non-scientific stab at non-linearity. It starts with surfing and ends with Monty Python. There’s a fantastic scandal behind one of the hyperlinks at the point where I’m standing in the Desert Inn in Las Vegas. Yes I knew that man.

Naturally, I've heard of neither Viktor Tsoy nor Scott Page. (Yes, I'm sure I could search for them on the web but that would be cheating.)

Selection bias in measuring polarization

A lot of people are concerned with political polarization--the idea that people are becoming divided into opposing camps that don't communicate with each other. (Not everyone worries about polarization--some people see it as healthy political competition, others worry about opposite problems such as political apathy and non-participation, and you even used to hear people say that there wasn't a dime's worth of difference between the two parties.)

Anyway, polarization can be measured in various ways: one approach is to ask people who they talk with, and find out the extent to which people mostly associate with people similar to them. Another method is to look at people's positions on the issues and see if most people have extreme positions. Regarding this latter approach, Jeremy Freese points out a potential source of measurement bias:

Occasionally social scientists become interested in whether Americans are becoming “more polarized” in their opinions. The obvious strategy for considering this question is to take a bunch of survey items that have been asked of comparable samples in the past and now, and to look at whether people hold more divergent views now than they did then. . . . [But] people buying survey time are typically interested in questions that vary. If they are asking a question that doesn’t vary, it’s for some reason, like perhaps because it has been asked repeatedly in the past. . . . So, items that would provide evidence of polarization — consensus then, divergence now — are disproportionately less likely to be part of the universe of available items for comparison over time, while items that provide evidence of no polarization — divergence then, consensus now — are disproportionately more likely. And thus researchers claim to be producing findings about the world of public opinion when the patterns in their data actually reflect the world of public opinion surveys.

It's an interesting issue--selection bias in the questions rather than the usual worries about the survey respondents--and it's something that Delia and I thought about when using the National Election Studies to analyze trends in issue polarization. These concerns are real, although I don't know that the problem is as severe as Freese suggests, because the inferences will in any case be conditional on whatever questions you happen to be studying--so the researcher has to justify which issues he or she is looking at.
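To see how strong Freese's selection effect can be, here's a toy simulation of the mechanism (my own sketch, with arbitrary numbers, not his analysis): items are more likely to be re-asked if they showed disagreement the first time around, so the comparable items show an apparent decline in disagreement even when there is no real trend.

set.seed(1)
n_items <- 10000
disagree_then <- runif(n_items)               # how much each item varied back then
disagree_now  <- runif(n_items)               # how much it varies now (no true trend)
reasked <- rbinom(n_items, 1, disagree_then)  # items that varied then are re-asked more often

mean(disagree_now - disagree_then)                              # all items: about zero
mean(disagree_now[reasked == 1] - disagree_then[reasked == 1])  # re-asked items: apparent drop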

It happens all the time

Here I just want to point out that these measurement issues are not unique to the study of polarization. For example, is the Supreme Court drifting to the left, drifting to the right, or roughly staying the same? These things can be measured, but only with difficulty, because any such measure depends on the composition of the docket in each year.

Or you could even ask simpler questions about median voters. For example, when I wrote about why it can be rational to vote (because you can feel that having your preferred candidate win would likely make a big difference to many millions of people), some people replied that it's somewhat naive to feel that _your_ preferred candidate will be so great: if approximately half the people preferred Bush and half preferred Kerry, then what makes you so sure that your views are more valid than those of the other 50% of the population? One difficulty with that argument is that the answer depends on the reference set. For example, suppose you live in Texas. If you voted for Kerry, who are you to say that your judgment is better than that of the 61% who supported Bush? On the other hand, if you voted for Bush, who are you to say that your judgment is better than that of the (presumably) vast majority of people around the world who hate the guy? What it means to be in the "center" depends on your reference set. I'm sure there are many other examples of this sort of selection bias in measurements.

Back to polarization

The way that Delia and I actually measured polarization was through correlations between issue attitudes. The idea is that, if the population is becoming more polarized, this should show up as increasing coherence in issue positions, so that if I know where you stand on abortion, I'm more likely to be able to figure out where you stand on social security (for example). You can see our results here: they seem consistent with Fiorina's theory that voters are sorting into parties more than they are polarizing on the issues.
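For readers who want to try this with their own data, here's a bare-bones version of the correlation idea, with simulated data and made-up variable names (this is not our actual NES analysis): compute the correlations among issue attitudes within each year and see whether they grow over time.

set.seed(2)
# Simulated survey data: issue attitudes on numeric scales, by survey year
issues <- data.frame(
  year            = rep(c(1984, 2004), each = 500),
  abortion        = rnorm(1000),
  social_security = rnorm(1000),
  taxes           = rnorm(1000)
)

# Correlation matrix of issue attitudes within each year; rising correlations
# would indicate increasingly bundled ("coherent") issue positions
by(issues[, c("abortion", "social_security", "taxes")], issues$year,
   function(d) round(cor(d), 2))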

One other amusing thing (well, it's amusing if you're a statistician, maybe)

Polarization is a property of a population, not of individuals. It doesn't mean anything (usually) to say that "I am polarized" but you can talk about a group of people being polarized into different subgroups, or polarized along some dimensions. The polarization of a group cannot be expressed as a sum or average of polarizations of individuals. It's an interesting example, because many (most?) of the things we measure in this way tend to be individual properties that we simply aggregate up (for example, the percentage of people who support candidate X, or the average age of people in a group, or whatever). In statistical terms, polarization is a property of the distribution, not of the random variable.
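Here's a tiny toy illustration of that last point (my own example, not from any survey): two simulated populations with essentially the same average position, only one of which is polarized, where the polarization shows up only in the shape of the whole distribution.

set.seed(3)
moderate  <- rnorm(1000, 0, 1)                            # one centrist population
polarized <- c(rnorm(500, -2, 0.5), rnorm(500, 2, 0.5))   # two separated camps

mean(moderate); mean(polarized)          # both averages are close to zero
quantile(moderate,  c(0.1, 0.5, 0.9))    # positions bunched around the center
quantile(polarized, c(0.1, 0.5, 0.9))    # mass pushed out toward the two extremes
plot(density(polarized), main = "Polarization lives in the distribution")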

Math Awareness Month

Statistics postdoc at Michigan

I hate to advertise the competition, but this looks like it could be interesting:

The most common county names in America

David and I were looking at comparisons of county-level election results (cool scatterplots to come) and, just for laughs, we made a ranking of the most common county names in America. Take a guess as to what they are. . . . The top 15 are listed below:

Lingzhou Michael Xue writes in with two questions:

A sighting of the unicorn

Richard Barker sent in this photograph and the following note:

barker.png

Matt just pointed me to your article: You can load a die but you can't bias a coin. You might be interested in the attached, a photo of a bent NZ 50c coin that I had pressed in the Physics lab here a few years ago because I got bored using flat coins in classroom demonstrations where everyone knows what Pr(heads) is. Fortunately that particular style of coin is no longer legal tender, so I am unlikely to be prosecuted for defacing Her Majesty's coinage.

In discussing this with Matt this afternoon we conjured up a counterexample where the coin is completely pressed into a sphere. Then it has Pr(heads) = 1. If the pressing is not quite complete it will be a little less than one, so we claim the statement in the title of your article is not true. We think you can bias a coin.

At about 300 flips it looks as though Pr(Heads) is about 0.55.

When I first bent the coin I did some experiments letting the coin land on the ground. On soft carpet it was not obviously biased, but it was on a hard surface. On hard surfaces, most of the time it bounces up and starts spinning on its edge. When this happens it then always lands heads up.

Yeah, sure, he's right. We were thinking of weighting a coin, but if you bend it enough, then it is no longer set to land "heads" for half of its rotation. And bouncing, sure, then anything can happen. We were always assuming you catch it in the air!

Finally, we were addressing the concept of the "biased coin," which, by analogy to the "loaded die," looks just like a regular coin but actually has probabilities other than 50/50 when caught in the air. In that sense, the bent coin is not a full counterexample since it clearly looks funny.
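As a rough check of how much those 300 flips actually tell us (my calculation, assuming the reported 0.55 corresponds to something like 165 heads in 300 flips):

# Exact binomial test and confidence interval for 165 heads in 300 flips
binom.test(x = 165, n = 300, p = 0.5)
# The 95% interval runs from roughly 0.49 to 0.61, so an observed proportion of
# 0.55 after 300 flips is suggestive of a bias but not yet a precise estimate of it.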

Cosma Shalizi (of the CMU statistics dept) and I had an exchange about the role of measure theory in the statistics Ph.D. program. I have to admit I'm not quite sure what "measure theory" is but I think it's some sort of theoretical version of calculus of real variables. I had commented that we're never sure what to do with our qualifying exam, and Cosma wrote,

I think we have a pretty good measure-theoretic probability course, and I wish more of our students went on to take the non-required sequel on stochastic processes (because that's the one I usually teach). I do think it's important for statisticians to understand that material, but I also think it's actually easier for us to teach someone how a martingale works than it is to teach them to be interested in scientific questions and to not get a freaked out, "but what do I calculate?" response when confronted with an open research problem. Here it's been suggested that we replace our qualifying exams with having the student prepare a written review of some reasonably-live topic from the literature and take an oral exam on it, which would be more work for us but come a lot closer to testing what the students actually need to know.

I replied,

I agree that it's hard to teach how to think like a scientist, or whatever. But I don't think of the alternatives as "measure theory vs. how-to-think-like-a-scientist" or even "measure theory vs. statistics". I think of it as "measure theory vs. economics" or "measure theory vs. CS" or "measure theory vs. poli sci" or whatever. That is, sure, all other things being equal, it's better to know measure theory (or so I assume, not ever having really learned it myself, which didn't stop me from proving 2 published theorems, one of which is actually true). But, all other things being equal, it's better to know economics (by this, I mean economics, not necessarily econometrics), and all other things being equal, it's better to know how to program. Etc. I don't see why measure theory gets to be the one non-statistical topic that gets privileged as being so required that you get kicked out of the program if you can't do it.

Cosma then shot back with:

I also don't think of the alternatives as "measure theory vs. how-to-think-like-a-scientist" or even "measure theory vs. statistics". My feeling --- I haven't, sadly, done a proper experiment! --- is that it's easier to, say, take someone whose math background is shaky and teach them how a generating-class argument works in probability than it is to take someone who is very good at doing math homework problems and teach them the skills and attitudes of independent research.

You say, "I think of it as "measure theory vs. economics" or "measure theory vs. CS" or "measure theory vs. poli sci" or whatever." I'm more ambitious; I want our students to learn measure-theoretic probability, and scientific programming, and whatever substantive field they need for doing their research, and, of course, statistical theory and methods and data analysis. Because I honestly think that if someone is going to engage in building stochastic models for parts of the world, they really ought to understand how probability _works_, and that is why measure theory is important, rather than for its own sake. (I admit to some background bias towards the probabilist's view of the world.) At the same time it seems to me a shame (to use no stronger word) if someone, in this day and age, gets a ph.d. in statistics and doesn't know how to program beyond patching together scripts in R.

P.S. I think measure theory should be part of the Ph.D. statistics curriculum but I don't think it should be a required part of the curriculum. Not unless other important topics such as experimental design, sample surveys, statistical computing and graphics, stochastic modeling, etc etc are required also. It's sad to think of someone getting a Ph.D. in statistics and not knowing how to work with mixed discrete/continuous variables (see Nicolas's comment below) but it seems equally sad to see Ph.D.'s who don't know what Anova is, who don't know the basic principles of experimental design (for example, that it's more effective to double the effect size than to double the sample size), who don't know how to analyze a cluster sample, and so forth.

Unfortunately, not all students can do everything, and any program only gets some finite number of applicants. If you restrict your pool to those who want to do (or can put up with) measure theory, you might very well lose some who could be excellent statistical researchers. It would be sort of like not admitting Shaq to your basketball program because he can't shoot free throws.
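As an aside on the design principle mentioned in the P.S. above (that doubling the effect size beats doubling the sample size), here is a quick illustration with arbitrary numbers, not tied to any particular study: because power depends on the effect size times the square root of the sample size, doubling the effect does far more than doubling n.

# Two-sample t-test power: baseline, double the sample size, double the effect size
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power   # baseline
power.t.test(n = 40, delta = 0.5, sd = 1, sig.level = 0.05)$power   # double n per group
power.t.test(n = 20, delta = 1.0, sd = 1, sig.level = 0.05)$power   # double the effect size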
