Results matching “R”


Update on Mankiw's work incentives

Tyler Cowen links to a blog post by Greg Mankiw with further details on his argument that his anticipated 90% marginal tax rate will reduce his work level.

Having already given my thoughts on Mankiw's column, I merely have a few things to add/emphasize.

Greg Mankiw writes (link from Tyler Cowen):

Without any taxes, accepting that editor's assignment would have yielded my children an extra $10,000. With taxes, it yields only $1,000. In effect, once the entire tax system is taken into account, my family's marginal tax rate is about 90 percent. Is it any wonder that I [Mankiw] turn down most of the money-making opportunities I am offered?

By contrast, without the tax increases advocated by the Obama administration, the numbers would look quite different. I would face a lower income tax rate, a lower Medicare tax rate, and no deduction phaseout or estate tax. Taking that writing assignment would yield my kids about $2,000. I would have twice the incentive to keep working.

First, the good news

Obama's tax rates are much lower than Mankiw had anticipated! According to the above quote, his marginal tax rate is currently 80% but threatens to rise to 90%.

But, in October 2008, Mankiw calculated that Obama would tax his marginal dollar at 93%. What we're saying, then, is that Mankiw's marginal tax rate is currently thirteen percentage points lower than he'd anticipated two years ago. In fact, Mankiw's stated current marginal tax rate of 80% is three points lower than the tax rate he expected to pay under a McCain administration! And if the proposed new tax laws are introduced, Mankiw's marginal tax rate of 90% is still three percentage points lower than he'd anticipated, back during the 2008 election campaign. I assume that, for whatever reason, Obama did not follow through on all his tax-raising promises.

To frame the numbers more dramatically: According to Mankiw's calculations, he is currently keeping almost three times the proportion of his income that he was expecting to keep under the Obama administration (and 18% more than he was expecting to keep under a hypothetical McCain administration). If the new tax plans are put into effect, Mankiw will still keep 43% more of his money than he was expecting to keep, only two years ago. (For those following along at home, the calculations are (1-0.80)/(1-0.93)=2.9, (1-0.80)/(1-0.83)=1.18, and (1-0.90)/(1-0.93)=1.43.)

Given that Mankiw currently gets to keep 20% of his money--rather than the measly 7% he was anticipating--it's no surprise that he's still working!

Now, the bad news

I don't think Mankiw has fully thought this through.

Steven Levitt writes:

How to think about Lou Dobbs

I was unsurprised to read that Lou Dobbs, the former CNN host who crusaded against illegal immigrants, had actually hired a bunch of them himself to maintain his large house and his horse farm. (OK, I have to admit I was surprised by the part about the horse farm.)

But I think most of the reactions to this story missed the point. Isabel Macdonald's article that broke the story was entitled, "Lou Dobbs, American Hypocrite," and most of the discussion went from there, with some commenters piling on Dobbs and others defending him by saying that Dobbs hired his laborers through contractors and may not have known they were in the country illegally.

To me, though, the key issue is slightly different. And Macdonald's story is relevant whether or not Dobbs knew he was hiring illegals. My point is not that Dobbs is a bad guy, or a hypocrite, or whatever. My point is that, in his setting, it would take an extraordinary effort to not hire illegal immigrants to take care of his house and his horses.

That's the point. Here's Lou Dobbs--a man who has the money, the inclination, and every incentive to not hire illegals--and he hires them anyway. It doesn't matter to me whether he knew about it or not, whether he hired contractors in a wink-and-nod arrangement to preserve his plausible deniability, or whether he was genuinely innocent of what was going on. Either way, he did it--even though he, more than most people, had every incentive not to.

For Lou Dobbs, as for so many other American individuals and corporations, going without illegal immigrants is like trying to live a zero-emissions lifestyle: it might sound like a good idea but it's too much work to actually do!

This does not mean that Dobbs's goal of reducing illegal immigration is a bad idea--but it does suggest that his attacks on illegal immigrants and their U.S. employers are simplistic, at best.

Martin Lindquist writes that he and others are trying to start a new ASA section on statistics in imaging. If you're interested in being a signatory to its formation, please send him an email.

Bayes jumps the shark

John Goldin sends in this, from an interview with Alan Dershowitz:

Cameron McKenzie writes:

I ran into the attached paper [by Dave Marcotte and Sara Markowitz] on the social benefits of prescription of psychotropic drugs, relating a drop in crime rate to an increase in psychiatric drug prescriptions. It's not my area (which is psychophysics) but I do find this kind of thing interesting. Either people know much more than I think they do, or they are pretending to, and either is interesting. My feeling is that it doesn't pass the sniff test, but I wondered if you might (i) find the paper interesting and/or (ii) perhaps be interested in commenting on it on the blog. It seems to me that if we cumulated all econometric studies of crime rate we would be able to explain well over 100% of the variation therein, but perhaps my skepticism is unwarranted.

My reply:

I know what you mean. The story seems plausible but the statistical analysis seems like a stretch. I appreciate that the authors included scatterplots of their data, but the patterns they find are weak enough that it's hard to feel much confidence in their claim that "about 12 percent of the recent crime drop was due to expanded mental health treatment." The article reports that the percentage of people with mental illness getting treatment increased by 13 percentage points (from 20% to 33%) during the period under study. For this to have caused a 12 percent reduction in crime, you'd have to assume that nearly all the medicated people stopped committing crimes. (Or you'd have to assume that the potential criminals were more likely to be getting treated.) But maybe the exact numbers don't matter. The 1960s/1970s are over, and nowadays there is little controversy about the idea of using drugs and mental illness treatments as a method of social control. And putting criminals on Thorazine or whatever seems a lot more civilized than throwing them in prison. For example, if you put Tony Hayward or your local strangler on mind-numbing drugs and have them do community service with some sort of electronic tag to keep them out of trouble, they'd be making a much more useful contribution to society than if they're making license plates and spending their days working out in the prison yard.

P.S. It looks like I was confused on this myself. See Kevin Denny's comment below.

New Sentences For The Testing Of Typewriters (from John Lennon):

Fetching killjoy Mavis Wax was probed on the quay.

"Yo, never mix Zoloft with Quik," gabs Doc Jasper.

One zany quaff is vodka mixed with grape juice and blood.

Zitty Vicki smugly quipped in her journal, "Fay waxes her butt."

Hot Wendy gave me quasi-Kreutzfeld-Jacob pox.

Jack's pervy moxie quashed Bob's new Liszt fugue.

I backed Zevy's qualms over Janet's wig of phlox.

Tipsy Bangkok panjandrums fix elections with quivering zeal.

Mexican juntas, viewed in fog, piqued Zachary, killed Rob.

Jaywalking Zulu chieftains vex probate judge Marcy Quinn.

Twenty-six Excedrin helped give Jocko quite a firm buzz.

Racy pics of bed hijinx with glam queen sunk Val.

Why Paxil? Jim's Bodega stocked no quince-flavor Pez.

Wavy-haired quints of El Paz mock Jorge by fax.

Two phony quacks of God bi-exorcize evil mojo.

After noticing these remarks on expensive textbooks and this comment on the company that bribes professors to use their books, Preston McAfee pointed me to this update (complete with a picture of some guy who keeps threatening to sue him but never gets around to it).

The story McAfee tells is sad but also hilarious. Especially the part about "smuck." It all looks like one more symptom of the imploding market for books. Prices for intro stat and econ books go up and up (even mediocre textbooks routinely cost $150), and the publishers put more and more effort into promotion.

McAfee adds:

I [McAfee] hope a publisher sues me about posting the articles I wrote. Even a takedown notice would be fun. I would be pretty happy to start posting about that, especially when some of them are charging $30 per article.

Ted Bergstrom and I used state Freedom of Information acts to extract the journal price deals at state university libraries. We have about 35 of them so far. Like textbooks, journals have gone totally out of control. Mostly I'm focused on journal prices rather than textbooks, although of course I contributed a free text. People report liking it and a few schools, including Harvard and NYU, used it, but it fizzled in the marketplace. I put it in flatworld.org to see if things like testbanks make a difference; their model is free online, cheap ($35) printed. The beauty of free online is it limits the sort of price increases your book experienced.

Here is a link to the FOIA work, which also has some discussion of the failed attempts to block us.

By the way, I had a spoof published in "Studies in Economic Analysis", a student-run journal that was purchased by Emerald Press. Emerald charges about $35 for reprints. I wrote them a take-down notice since SEA didn't bother with copyright forms so I still owned the copyright. They took it down but are not returning any money they collected on my article, pleading a lack of records. These guys are the schmucks of all schmucks.

Displaying a fitted multilevel model

Elissa Brown writes:

I'm working on some data using a multinomial model (3 categories for the response & 2 predictors--1 continuous and 1 binary), and I've been looking and looking for some sort of nice graphical way to show my model at work. Something like a predicted probabilities plot. I know you can do this for the levels of Y with just one covariate, but is this still a valid way to describe the multinomial model (just doing a pred plot for each covariate)? What's the deal, is there really no way to graphically represent a successful multinomial model? Also, is it unreasonable to break down your model into a binary response just to get some ROC curves? This seems like cheating. From what I've found so far, it seems that people just avoid graphical support when discussing their fitted multinomial models.

My reply:

It's hard for me to think about this sort of thing in the abstract with no context. We do have one example in chapter 6 of ARM where we display data and fitted model together in a plot--it's from our storable votes project--but maybe it's not quite general enough for your problem. I'm sure, though, that there is a good solution, and likely it's a solution that's worth programming and writing up in a journal article. I certainly agree that it's a bad idea to break up your response into binary just to use some convenient binary-data tools. If you must dichotomize your data, please throw out the middle third or half.
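For what it's worth, here is a minimal sketch of the kind of predicted-probability plot described above, using multinom() from the nnet package. The data frame dat below--with a three-category response y, a continuous predictor x, and a binary predictor z coded 0/1--is a hypothetical stand-in, not the actual data:

```r
# Predicted probabilities for each response category as x varies, z held at 0
library(nnet)
fit <- multinom(y ~ x + z, data = dat)
newd <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100), z = 0)
probs <- predict(fit, newdata = newd, type = "probs")   # 100 x 3 matrix
matplot(newd$x, probs, type = "l", lty = 1,
        xlab = "x (z held at 0)", ylab = "Predicted probability")
legend("topright", legend = colnames(probs), col = 1:ncol(probs), lty = 1)
```

Repeating the plot with z = 1, or making a separate panel for each covariate, gives the "pred plot for each covariate" idea from the question.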

After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they're administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages--which packages refer to which others, and so forth.

I just hope they set up their system so that my own packages ("R2WinBUGS", "r2jags", "arm", and "mi") get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output.

P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it's good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don't do that:

- print() doesn't give enough information
- summary() gives everything to a zillion decimal places and gives useless things like p-values
- plot() gives a bunch of residual and diagnostic plots but no graphs of the fitted model and data.

I like display() because it gives the useful information that's in summary() but without the crap. I like coefplot() too, but it still needs a bit of work to be generally useful. And I'd also like to have a new function that automatically plots the data and fitted lines.
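For readers who haven't seen these functions, here's a minimal sketch; it assumes the arm package is installed and uses R's built-in cars data as a toy example:

```r
library(arm)
fit <- lm(dist ~ speed, data = cars)  # toy regression on a built-in dataset
display(fit)   # coefficients and standard errors, sensibly rounded, no p-values
coefplot(fit)  # graph of the estimated coefficients with uncertainty bars
```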

Partly in response to my blog on the Harlem Children's Zone study, Mark Palko wrote this:

Talk of education reform always makes me [Palko] deeply nervous. Part of the anxiety comes from having spent a number of years behind the podium and having seen the disparity between the claims and the reality of previous reforms. The rest comes from being a statistician and knowing what things like convergence can do to data.

Convergent behavior violates the assumption of independent observations used in most simple analyses, but educational studies commonly, perhaps even routinely, ignore the complex ways that social norming can cause the nesting of student performance data.

In other words, educational research is often based on the idea that teenagers do not respond to peer pressure. . . .

and this:

My lecture for Greg's class today (taken from chapters 5-6 of ARM).

Also, after class we talked a bit more about formal modeling. If I have time I'll post some of that discussion here.

There are never 70 distinct parameters

Sam Seaver writes:

I'm a graduate student in computational biology, and I'm relatively new to advanced statistics, and am trying to teach myself how best to approach a problem I have.

My dataset is a small sparse matrix of 150 cases and 70 predictors, it is sparse as in many zeros, not many 'NA's. Each case is a nutrient that is fed into an in silico organism, and its response is whether or not it stimulates growth, and each predictor is one of 70 different pathways that the nutrient may or may not belong to. Because all of the nutrients do not belong to all of the pathways, there are thus many zeros in my matrix. My goal is to be able to use the pathways themselves to predict whether or not a nutrient could stimulate growth, thus I wanted to compute regression coefficients for each pathway, with which I could apply to other nutrients for other species.

There are quite a few singularities in the dataset (summary(glm) reports that 14 coefficients are not defined because of singularities), and I know the pathways (and some nutrients) I can remove because they are almost empty, but I would rather not because these pathways may apply to other species. So I was wondering if there are complementary and/or alternative methods to logistic regression that would give me a coefficient of a kind for each pathway?

My reply:

If you have this kind of sparsity, I think you'll need to add some prior information or structure to your model. Our paper on bayesglm suggests a reasonable default prior, but it sounds to me that you'll have to go further.

To put it another way: give up the idea that you're estimating 70 distinct parameters. Instead, think of these coefficients as linked to each other in a complex web.

More generally, I don't think it ever makes sense to think of a problem with a lot of loose parameters. Hierarchical structure is key. One of our major research problems now is to set up general models for structured parameters, going beyond simple exchangeability.
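To make the bayesglm suggestion concrete, here's a minimal sketch using its default priors; the pathway matrix X and the growth indicator below are simulated stand-ins for the real 150 x 70 data:

```r
library(arm)
set.seed(1)
X <- matrix(rbinom(150 * 70, 1, 0.1), nrow = 150)  # sparse pathway membership
colnames(X) <- paste0("pathway", 1:70)
growth <- rbinom(150, 1, 0.4)                      # stimulates growth or not
fit <- bayesglm(growth ~ X, family = binomial(link = "logit"))
display(fit)  # every pathway gets a regularized coefficient estimate
```

That's only the default-prior starting point; with this much sparsity, the next step would be a hierarchical model that links the pathway coefficients to each other.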

Sociotropic Voting and the Media

Stephen Ansolabehere, Marc Meredith, and Erik Snowberg write:

The literature on economic voting notes that voters' subjective evaluations of the overall state of the economy are correlated with vote choice, whereas personal economic experiences are not. Missing from this literature is a description of how voters acquire information about the general state of the economy, and how that information is used to form perceptions. In order to begin understanding this process, we [Ansolabehere, Meredith, and Snowberg] asked a series of questions on the 2006 ANES Pilot about respondents' perceptions of the average price of gas and the unemployment rate in their home state.

We find that questions about gas prices and unemployment show differences in the sources of information about these two economic variables. Information about unemployment rates comes from media sources, and is systematically biased by partisan factors. Information about gas prices, in contrast, comes only from everyday experiences.

This is no surprise, perhaps, but, still, I think this sort of work is important. The connection between economic conditions and political outcomes is hugely important, and increasingly recognized as such. But a key piece is the connection between perceptions, political ideology, and economic reality.

Someone who works in statistics in the pharmaceutical industry (but prefers to remain anonymous) sent me this update to our discussion on the differences between approvals of drugs and medical devices:

The 'substantial equivalence' threshold is very outdated. Basically the FDA has to follow federal law and the law is antiquated and leads to two extraordinarily different paths for device approval.

You could have a very simple but first-in-kind device with an easy to understand physiological mechanism of action (e.g. the FDA approved a simple tiny stent that would relieve pressure from a glaucoma patient's eye this summer). This device would require a standard (likely controlled) trial at the one-sided 0.025 level. Even after the trial it would likely go to a panel where outside experts (e.g. practicing & academic MDs and statisticians) hear evidence from the company and FDA and vote on its safety and efficacy. FDA would then rule, considering the panel's vote, on whether to approve this device.

On the other hand you could have a very complex device with uncertain physiological mechanism declared equivalent to a device approved before May 28, 1976 and it requires much less evidence. And you can have a device declared similar to a device that was similar to a device that was similar to a device on the market before 1976. So basically if there was one type I error in this chain, you now have a device that's equivalent to a non-efficacious device. For these no trial is required, no panel meeting is required. The regulatory burden is tens of millions of dollars less expensive and we also have substantially less scientific evidence.

But the complexity of the device has nothing to do with which path gets taken. Only its similarity to a device that existed before 1976.

This was in the WSJ just this morning.

You can imagine there was nothing quite like the "NanoKnife" on the market in 1976. But it's obviously very worth a company's effort to get their new device declared substantially equivalent to an old one. Otherwise they have to spend the money for a trial and risk losing that trial. Why do research when you can just market!?

So this unfortunately isn't a scientific question -- we know what good science would lead us to do. It's a legal question and the scientists at FDA are merely following U.S. law which is fundamentally flawed and leads to two very different paths and scientific hurdles for device approval.

David Rohde writes:

Racism!

Last night I spoke at the Columbia Club of New York, along with some of my political science colleagues, in a panel about politics, the economy, and the forthcoming election. The discussion was fine . . . until one guy in the audience accused us of bias based on what he imputed as our ethnicity. One of the panelists replied by asking the questioner what of all the things we had said was biased, and the questioner couldn't actually supply any examples.

It makes sense that the questioner couldn't come up with a single example of bias on our part, considering that we were actually presenting facts.

At some level, the questioner's imputation of our ethnicity and accusation of bias isn't so horrible. When talking with my friends, I engage in casual ethnic stereotyping all the time--hey, it's a free country!--and one can certainly make the statistical argument that you can guess people's ethnicities from their names, appearance, and speech patterns, and in turn you can infer a lot about people's political attitudes from their occupations, ethnicities, and so on. Still, I think it was a pretty rude comment and pretty pointless. How was he expecting us to respond? Maybe he thought we'd break down under the pressure and admit that we were all being programmed by our KGB handlers??

Then, later on, someone asked a truly racist question--a rant, really--that clearly had a close relation to his personal experiences even while having essentially zero connection to the real world as we understand it statistically.

I've seen the polls and I know that there are a lot of racists out there, of all stripes. Still, I don't encounter this sort of thing much in my everyday life, and it was a bit upsetting to see it in the flesh. Blog commenters come to life, as it were. (Not this blog, though!)

P.S. Yes, I realize that women and minorities have to deal with this all the time. This was the first time in my professional life that I've been accused of bias based on my (imputed) ethnicity, but I'm sure that if you're a member of a traditionally-disparaged group, it happens all over. So I'm not complaining, exactly, but it still upsets me a bit.

"Who owns Congress"

Curt Yeske pointed me to this. Wow--these graphs are really hard to read!

The old me would've said that each of these graphs would be better replaced by a dotplot (or, better still, a series of lineplots showing time trends).

The new me would still like the dotplots and lineplots, but I'd say it's fine to have the eye-grabbing but hard-to-read graphs as is, and then to have the more informative statistical graphics underneath, as it were. The idea is, you'd click on the pretty but hard-to-read "infovis" graphs, and this would then reveal informative "full Cleveland" graphs. And then if you click again you'd get a spreadsheet with the raw numbers.

That I'd like to see, as a new model for graphical presentation.

U-Haul statistics

Very freakonomic (and I mean that in the best sense of the word).

Rob Kass writes:

Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I [Kass] suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.

In my comments, I pretty much agree with everything Rob says, with a few points of elaboration:

Kass describes probability theory as anchored upon physical randomization (coin flips, die rolls and the like) but being useful more generally as a mathematical model. I completely agree but would also add another anchoring point: calibration. Calibration of probability assessments is an objective, not subjective process, although some subjectivity (or scientific judgment) is necessarily involved in the choice of events used in the calibration. In that way, Bayesian probability calibration is closely connected to frequentist probability statements, in that both are conditional on "reference sets" of comparable events . . .

In a modern Bayesian approach, confidence intervals and hypothesis testing are both important but are not isomorphic; they represent two different steps of inference. Confidence statements, or posterior intervals, are summaries of inference about parameters conditional on an assumed model. Hypothesis testing--or, more generally, model checking--is the process of comparing observed data to replications under the model if it were true. . . .

Kass discusses the role of sampling as a model for understanding statistical inference. But sampling is more than a metaphor; it is crucial in many aspects of statistics. . . .

The only two statements in Kass's article that I clearly disagree with are the following two claims: "the only solid foundation for Bayesianism is subjective," and "the most fundamental belief of any scientist is that the theoretical and real worlds are aligned." . . . Claims of the subjectivity of Bayesian inference have been much debated, and I am under no illusion that I can resolve them here. But I will repeat my point made at the outset of this discussion that Bayesian probability, like frequentist probability, is except in the simplest of examples a model-based activity that is mathematically anchored by physical randomization at one end and calibration to a reference set at the other. . . . a person who is really worried about subjective model-building might profitably spend more effort thinking about assumptions inherent in additive models, logistic regressions, proportional hazards models, and the like. Even the Wilcoxon test is based on assumptions . . .

Like Kass, I believe that philosophical debates can be a good thing, if they motivate us to think carefully about our unexamined assumptions. Perhaps even the existence of subfields that rarely communicate with each other has been a source of progress in allowing different strands of research to be developed in a pluralistic environment, in a way that might not have been so easily done if statistical communication had been dominated by any single intolerant group. . . .

He doesn't trust the fit . . . r=.999

I received the following question from an education researcher:

Sam Jessup writes:

I am writing to ask you to recommend papers, books--anything that comes to mind that might give a prospective statistician some sense of what the future holds for statistics (and statisticians). I have a liberal arts background with an emphasis in mathematics. It seems like this is an exciting time to be a statistician, but that's just from the outside looking in. I'm curious about your perspective on the future of the discipline.

Any recommendations? My favorite is still the book, "Statistics: A Guide to the Unknown," first edition. (I actually have a chapter in the latest (fourth) edition, but I think the first edition (from 1972, I believe) is still the best.)

Sanjay Kaul writes:

By statute ("the least burdensome" pathway), the approval standard for devices by the US FDA is lower than for drugs. Before a new drug can be marketed, the sponsor must show "substantial evidence of effectiveness" as based on two or more well-controlled clinical studies (which literally means 2 trials, each with a p value of <0.05, or 1 large trial with a robust p value <0.00125). In contrast, the sponsor of a new device, especially those that are designated as high-risk (Class III) device, need only demonstrate "substantial equivalence" to an FDA-approved device via the 510(k) exemption or a "reasonable assurance of safety and effectiveness", evaluated through a pre-market approval and typically based on a single study.

What does "reasonable assurance" or "substantial equivalence" imply to you as a Bayesian? These are obviously qualitative constructs, but if one were to quantify them, how would you go about addressing it?

A question for psychometricians

Don Coffin writes:

A colleague of mine and I are doing a presentation for new faculty on a number of topics related to teaching. Our charge is to identify interesting issues and to find research-based information for them about how to approach things. So, what I wondered is, do you know of any published research dealing with the sort of issues about structuring a course and final exam in the ways you talk about in this blog post? Some poking around in the usual places hasn't turned anything up yet.

I don't really know the psychometrics literature but I imagine that some good stuff has been written on principles of test design. There are probably some good papers from back in the 1920s. Can anyone supply some references?

In the context of a discussion of Democratic party strategies, Matthew Yglesias writes:

Given where things stood in January 2009, large House losses were essentially inevitable. The Democratic majority elected in 2008 was totally unsustainable and was doomed by basic regression to the mean.

I'd like to push back on this, if for no other reason than that I didn't foresee all this back in January 2009.

Regression to the mean is a fine idea, but what's the "mean" that you're regressing to? Here's a graph I made a couple years ago, showing the time series of Democratic vote share in congressional and presidential elections:

[Graph: Democratic share of the vote in congressional and presidential elections over time]

Take a look at the House vote in 2006 and 2008. Is this a blip, just begging to be slammed down in 2010 by a regression to the mean? Or does it represent a return to form, back to the 55% level of support that the Democrats had for most of the previous fifty years? It's not so obvious what to think--at least, not simply from looking at the graph.

What I'm saying is this. As an ear-to-the-ground political pundit, Yglesias might well have a sense of political trends beyond what I have up here in my ivory tower. (I really mean this; I'm not being sarcastic. I don't know much about the actual political process or the politicians who participate in it.) And I can well believe that, in January 2009, Yglesias was already pretty sure that the Democrats were heading for electoral trouble. But, if so, I think it's more than "regression to the mean"; he'd have had to have some additional information giving him a sense of what that mean actually is.

P.S. Yglesias responds:

I [Yglesias] think historically Democrats averaged over 50% of the vote because of weird race dynamics in the South, but nowadays we should expect both parties to average 50% of the vote over the long term.

To which I wrote:

Could be. On the other hand, various pundits have been saying that in future years, the race dynamics of blacks and Latinos will give the Democrats a permanent advantage. And in many ways it seems gravity-defying for the Republicans to be at 50% with such conservative economic policies. In any case, you may be right. I just have to admit it's not something that I saw as of Jan 2009. As a matter of fact, I clearly remember looking at that graph I made in Nov 2008 and trying to decide whether it represented an exciting new trend, an anti-Bush blip, or a reversion to the pre-1994 pattern of 55%/45% voting. At the time, I decided I had no idea.

Yglesias then shot back with:

Well then let me go on record now then as hypothesizing that the long-run 1994- trend will average 50/50.

The winner's curse

If an estimate is statistically significant, it's probably an overestimate of the magnitude of your effect.

P.S. I think youall know what I mean here. But could someone rephrase it in a more pithy manner? I'd like to include it in our statistical lexicon.
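Here's a minimal simulation sketch of the claim; all the numbers are made up for illustration. When the true effect is small relative to the standard error, the estimates that happen to reach statistical significance systematically overstate it:

```r
set.seed(1)
true_effect <- 0.1
se <- 0.5
est <- rnorm(1e5, mean = true_effect, sd = se)  # many replications of the study
significant <- abs(est) > 1.96 * se             # "statistically significant"
mean(est[significant & est > 0])                # roughly 1.2, versus a true 0.1
```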

Where do our taxes go?

Mark Palko links to a blog by Megan McArdle which reproduces a list entitled, "What You Paid For: 2009 tax receipt for a taxpayer earning $34,140 and paying $5,400 in federal income tax and FICA (selected items)."

McArdle writes, "isn't it possible that the widespread support for programs like Social Security and Medicare rests on the fact that most people don't realize just how big a portion of your paycheck those programs consume?" But, as Palko points out, the FICA and Medicare withholdings are actually already right there on your W-2 form. So the real problem is not a lack of information but that people aren't reading their W-2 forms more carefully. (Also, I don't know if people are so upset about their withholdings for Social Security and Medicare, given that they'll be getting that money back when they retire.)

I'm more concerned about the list itself, though. I think a lot of cognitive-perceptual effects are involved in what gets a separate line item, and what doesn't. For example, I see the FBI but not the CIA, the NSA, or weapons procurement. There's a line for "salary and benefits for members of Congress" but nothing for the courts system or the White House. And so on. So, while I agree with McArdle that "more information is generally better," I'm not quite sure how to get there. I'd be very, very suspicious of the choice of items that happens to end up included on the hypothetical itemized tax bill, especially if it's really true that people don't notice those boxes on their W-2 form with FICA and Medicare payments. I also seem to recall seeing some glossy government documents with charts showing where the money comes from and where it goes. Maybe there's some place other than a W-2 form to put this information where people will notice it.

Why Development Economics Needs Theory?

Robert Neumann writes:

in the JEP 24(3), page 18, Daron Acemoglu states:

Why Development Economics Needs Theory

There is no general agreement on how much we should rely on economic theory in motivating empirical work and whether we should try to formulate and estimate "structural parameters." I (Acemoglu) argue that the answer is largely "yes" because otherwise econometric estimates would lack external validity, in which case they can neither inform us about whether a particular model or theory is a useful approximation to reality, nor would they be useful in providing us guidance on what the effects of similar shocks and policies would be in different circumstances or if implemented in different scales. I therefore define "structural parameters" as those that provide external validity and would thus be useful in testing theories or in policy analysis beyond the specific environment and sample from which they are derived. External validity becomes a particularly challenging task in the presence of general equilibrium and political economy considerations, and a major role of economic theory is in helping us overcome these problems or at the very least alerting us to their importance.

Leaving aside the equilibrium debate, what do you think of his remark that the external validity of estimates refers to an underlying model. Isn't it the other way around?

My reply: This reminds me a lot of Heckman's argument of why randomized experiments are not a gold standard. I see the point but, on the other hand, as Don Green and others have noted, observational studies have external validity problems too! Whether or not a model is motivated by economic theory, you'll have to make assumptions to generalize your inferences beyond the population under study.

When Acemoglu writes, "I therefore define 'structural parameters' as those that provide external validity," I take him to be making the point that Bois, Jiang, and I did in our toxicology article from 1996: When a parameter has a generalizable meaning (in our context, a parameter that is "physiological" rather than merely "phenomenological"), you can more usefully incorporate it in a hierarchical model. We used statistical language and Acemoglu is using econometric language but it's the same idea, I think, and a point worth making in as many languages as it takes.

I don't know that I completely agree with Acemoglu about "theory," however. Theory is great--and we had it in abundance in our toxicology analysis--but I'd think you could have generalizable parameters without formal theory, if you're careful enough to define what you're measuring.

Joe Blitzstein and Xiao-Li Meng write:

An effectively designed examination process goes far beyond revealing students' knowledge or skills. It also serves as a great teaching and learning tool, incentivizing the students to think more deeply and to connect the dots at a higher level. This extends throughout the entire process: pre-exam preparation, the exam itself, and the post-exam period (the aftermath or, more appropriately, afterstat of the exam). As in the publication process, the first submission is essential but still just one piece in the dialogue.

Viewing the entire exam process as an extended dialogue between students and faculty, we discuss ideas for making this dialogue induce more inspiration than perspiration, and thereby making it a memorable deep-learning triumph rather than a wish-to-forget test-taking trauma. We illustrate such a dialogue through a recently introduced course in the Harvard Statistics Department, Stat 399: Problem Solving in Statistics, and two recent Ph.D. qualifying examination problems (with annotated solutions). The problems are examples of "nano-projects": big picture questions split into bite-sized pieces, fueling contemplation and conversation throughout the entire dialogue.

This is just wonderful and it should be done everywhere, including, I hope, in my own department. I am so tired of arguments about what topics students should learn, long lists of seemingly-important material that appears on a syllabus, is taught in a class, and is never used again, and so forth.

(The exam problems described in the article are a bit on the theoretical side for my taste, but I presume the same ideas would apply to applied statistics as well.)

P.S. I have fond memories of my own Ph.D. qualifying exam, which I took a year before Xiao-Li took his. It was an intense 12-day experience and I learned a huge amount from it.

John Christie sends along this. As someone who owns neither a car nor a mobile phone, it's hard for me to relate to this one, but it's certainly a classic example for teaching causal inference.

Statistics and the end of time

Wayne Folta sends in this. It seems nuts to me (although I was happy to see that no mention was made of this horrible argument of a related sort). But I know nothing about theoretical physics so I suppose it's all possible. I certainly have no sense of confidence in anything I'd say about the topic.

Data visualization marathon

A 24-hour student data visualization competition. The funny thing is, the actual graphics on the webpage are pretty ugly. But maybe they're going for the retro, clip-art cool look.

Decision science vs. social psychology

Dan Goldstein sends along this bit of research, distinguishing terms used in two different subfields of psychology. Dan writes:

Intuitive calls included not listing words that don't occur 3 or more times in both programs. I [Dan] did this because when I looked at the results, those cases tended to be proper names or arbitrary things like header or footer text. It also narrowed down the space of words to inspect, which means I could actually get the thing done in my copious free time.

I think the bar graphs are kinda ugly, maybe there's a better way to do it based on classifying the words according to content? Also the whole exercise would gain a new dimension by comparing several areas instead of just two. Maybe that's coming next.

Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon's gimmick is that he uses past performance to calibrate the reports' judgments based on "solid," "likely," "leaning," and "toss-up" categories, and then he uses the calibrated versions of the current predictions to make his forecast.

As I wrote a few weeks ago in response to Nate's forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don't see the point of using only the expert forecasts and no other data.

Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate's description of his own method could be a bit confusing in that it downplays the national-swing estimate which is so crucial to having it all work). Maybe Nate can throw Gordon's information in too.

Note to John Barnard: Yes, I know, this is more politics stuff. But the forecasting principles apply more generally, I think.

Somebody I know sent me a link to this news article by Martin Robbins describing a potential scientific breakthrough. I express some skepticism but in a vague enough way that, in the unlikely event that the research claim turns out to be correct, there's no paper trail showing that I was wrong. I have some comments on the graphs--the tables are horrible, no need to even discuss them!--and I'd prefer if the authors of the paper could display their data and model on a single graph. I realize that their results reached a standard level of statistical significance, but it's hard for me to interpret their claims until I see their estimates on some sort of direct real-world scale. In any case, though, I'm sure these researchers are working hard, and I wish them the best of luck in their future efforts to replicate their findings.

I'm sure they'll have no problem replicating, whether or not their claims are actually true. That's the way science works: Once you know what you're looking for, you'll find it!

Correlation, prediction, variation, etc.

Hamdan Azhar writes:

"Genomics" vs. genetics

John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write:

54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . .

In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . .

The message is that the simple approach of predicting child's height using a regression model given parents' average height performs much better than the method they have based on combining 54 genes.

They also find that, if you start with the prediction based on parents' heights and then throw the genetic profile information into the model, you can do better--but not much better. Parents' height + genetic profile is only very slightly better, as a predictor, than parents' height alone.

I have a few thoughts on this study.

1. The most important point, I think, is that made by Delaney: The predictive power of parents' heights on child height is, presumably, itself mostly genetic in this population. Thus, the correct interpretation of the study is not that genetics doesn't predict height, but that the particular technique described in the paper doesn't work well. Galton's predictor also uses a combination of genes.

2. How exactly did the researchers combine those 54 genes to get their predictor? I looked at their paper but couldn't follow all the details. Here's what they write:

The genomic profile, based on 54 recently identified loci, was computed as the sum of the number of height-increasing alleles carried by a person, similar to Weedon et al. This profile explained 3.8% of the sex- and age-adjusted variation of height in the Rotterdam Study (Figure 2a). We also estimated the upper explanatory limit of the 54-loci allelic profile by defining the profile as a weighted sum of height-increasing alleles, with weights proportional to the effects estimated in our own data using a multivariable model.
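Here is how I read the two profile definitions in that passage, written as a minimal sketch; the allele matrix and the (sex- and age-adjusted) height vector are hypothetical stand-ins, not their data:

```r
# 'alleles': n x 54 matrix of height-increasing allele counts (0, 1, or 2)
# 'height': sex- and age-adjusted heights; both objects are hypothetical
profile_unweighted <- rowSums(alleles)
summary(lm(height ~ profile_unweighted))$r.squared  # the quantity reported as 3.8%

# Weighted profile: weights proportional to per-locus effects estimated
# from the same data with a multivariable model
w <- coef(lm(height ~ alleles))[-1]
profile_weighted <- as.vector(alleles %*% w)
summary(lm(height ~ profile_weighted))$r.squared    # their "upper explanatory limit"
```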

Is it possible that a savvier use of this genetic information could give a much better predictor? I have no idea.

3. The 5748 people in the study come from "a prospective cohort study that started in 1990 in Ommoord, a suburb of Rotterdam, among 10 994 men and women aged 55 and over." In this homogeneous population (?), maybe these 54 genes don't discriminate so well. But maybe things would look different if they were studying a more diverse group.

P.S. Usually I like to list all the authors of any articles I cite--but this one has 12 authors. C'mon!

An interesting education and statistics blog

Just in case you didn't notice it on the blogroll.

Tapen Sinha writes:

Living in Mexico, I have been witness to many strange (and beautiful) things. Perhaps the strangest happened during the first outbreak of A(H1N1) in Mexico City. We had our university closed, football (soccer) was played in empty stadiums (or should it be stadia) because the government feared a spread of the virus. The Metro was operating and so were the private/public buses and taxis. Since the university was closed, we took the opportunity to collect data on facemask use in the public transport systems. It was a simple (but potentially deadly!) exercise in first hand statistical data collection that we teach our students (Although I must admit that I did not dare sending my research assistant to collect data - what if she contracted the virus?). I believe it was a unique experiment never to be repeated.

The paper appeared in the journal Health Policy. From the abstract:

At the height of the influenza epidemic in Mexico City in the spring of 2009, the federal government of Mexico recommended that passengers on public transport use facemasks to prevent contagion. The Mexico City government made the use of facemasks mandatory for bus and taxi drivers, but enforcement procedures differed for these two categories. Using an evidence-based approach, we collected data on the use of facemasks over a 2-week period. In the specific context of the Mexico City influenza outbreak, these data showed mask usage rates mimicked the course of the epidemic and gender difference in compliance rates among metro passengers. Moreover, there was not a significant difference in compliance with mandatory and voluntary public health measures where the effect of the mandatory measures was diminished by insufficiently severe penalties.

what is = what "should be" ??

This hidden assumption is a biggie.

A simple semigraphic display

John Tukey wrote about semigraphic displays. I think his most famous effort in that area--the stem-and-leaf plot--is just horrible. But the general idea of viewing tables as graphs is good, and it's been a success at least since the early 1900s, when Ramanujan famously intuited the behavior of the partition number by seeing a table of numbers and implicitly reading it as a graph on the logarithmic scale.

To return to the present, Steve Roth sent me a link to these table/graphs that he made:

James O'Brien writes:

How would you explain, to a "classically-trained" hypothesis-tester, that "It's OK to fit a multilevel model even if some groups have only one observation each"?

I [O'Brien] think I understand the logic and the statistical principles at work in this, but I'm having trouble being clear and persuasive. I also feel like I'm contending with some methodological conventional wisdom here.

My reply: I'm so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small.

One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a "classical hypothesis testing" approach which will do as well as the multilevel model. It would just take a lot of work. I'd rather put that effort into statistical modeling and data visualization instead.
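For example, here's a minimal simulated sketch using lmer() from lme4 (the numbers are invented): half the groups have a single observation, and the multilevel model still gives partially pooled estimates for them.

```r
library(lme4)
set.seed(1)
n_groups <- 20
sizes <- c(rep(1, 10), rep(10, 10))          # half the groups are singletons
group <- rep(seq_len(n_groups), times = sizes)
alpha <- rnorm(n_groups, 0, 1)               # true group effects
y <- alpha[group] + rnorm(length(group), 0, 1)
fit <- lmer(y ~ 1 + (1 | group))
head(coef(fit)$group, 10)                    # estimates for the singleton groups
```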

If you are in a situation where someone really doesn't want to do the multilevel model, you could perhaps ask your skeptical colleague what his or her goals are in your particular statistical modeling problem. Then you can go from there.

Doug Hibbs on the fundamentals in 2010

Hibbs, one of the original economy-and-elections guys, writes:

The number of House seats won by the president's party at midterm elections is well explained by three pre-determined or exogenous variables: (1) the number of House seats won by the in-party at the previous on-year election, (2) the vote margin of the in-party's candidate at the previous presidential election, and (3) the average growth rate of per capita real disposable personal income during the congressional term. Given the partisan division of House seats following the 2008 on-year election, President Obama's margin of victory in 2008, and the weak growth of per capita real income during the first 6 quarters of the 111th Congress, the Democrats' chances of holding on to a House majority by winning at least 218 seats at the 2010 midterm election will depend on real income growth in the 3rd quarter of 2010. The data available at this writing indicate that the Democrats will win 211 seats, a loss of 45 from the 2008 on-year result that will put them in the minority for the 112th Congress.

Hibbs clarifies:

Although this essay features some predictions about likely outcomes of the 2010 election for the US House of Representatives, the underlying statistical model is meant to be structural or causal and is not targeted on forecasting accuracy.

The model presented in this essay is designed to explain midterm House election outcomes in terms of systematic predetermined and exogenous factors rather than to deliver optimal predictions. For that reason the model does not include trend terms or polling measurements of the public's political sentiments and voting intentions of the sort populating forecasting equations.

I defer to Hibbs entirely on the political economy, but I would like to make one small methodological point. Hibbs writes:

Most statistical models of aggregate House election outcomes focus exclusively on vote shares going to the major parties. . . . But aggregate votes are mainly of academic interest. What really matters politically is the partisan division of seats, and that is the object of attention here.

I think Hibbs is missing the point here. Even if your sole goal is to forecast seats, I think the most efficient way to do this is to forecast national vote trends, and then apply the national swing to each district, correcting for incumbency and uncontestedness where appropriate. See here for further discussion of this point. Or you could go even further and use the fundamentals (for example, local economic conditions and demographic trends) to modify your vote forecast at the regional and state levels.
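Here's a minimal sketch of the votes-then-seats logic; the previous district vote shares and the national swing below are invented for illustration, and a real version would also adjust for incumbency and uncontested seats:

```r
set.seed(1)
prev_dem_share <- rnorm(435, mean = 0.55, sd = 0.12)  # hypothetical previous vote shares
national_swing <- -0.06                               # hypothetical forecast national swing
predicted_share <- prev_dem_share + national_swing    # uniform swing applied to each district
sum(predicted_share > 0.5)                            # forecast number of Democratic seats
```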

I mean, sure, it's ok to forecast seats directly. It's simple, clear, and less effort than forecasting votes and then doing the district-by-district work of transmuting vote swings to expected seat swings. But it's nothing to be proud of--it's certainly not better than modeling votes, then seats.

But I don't want to end with that criticism, which is (as noted above) minor. The real point is the connection between the economy and the vote, and on that topic Hibbs has interesting things to say.

John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report:

[Bar graph from the Republican Party report: average federal spending, historical vs. projected]

Klein (following Alexander Hart) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is.

I agree with Klein and Hart that, if you're gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don't see the point of using a graphical display that hides it.

The solution: Ditch the bar graph entirely and replace it by a lineplot, in particular, a time series with year-by-year data. The time series would have several advantages:

1. Data are placed in context. You'd see every year, instead of discrete averages, and you'd get to see the changes in the context of year-to-year variation.

2. With the time series, you can use whatever y-axis works with the data. No need to go to zero.
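Here's a minimal sketch of that kind of plot; the spending series is simulated noise around 20%, purely to show the format:

```r
set.seed(1)
years <- 1970:2020
spending <- 20 + as.numeric(arima.sim(model = list(ar = 0.8), n = length(years)))
plot(years, spending, type = "l",
     xlab = "Year", ylab = "Federal spending (% of GDP)")  # y-axis need not start at 0
```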

P.S. I like the double-zeroes on the y-axis. How better to convey precision than to write "17.00%"? I assume this is just the default software option; I don't see that anyone would write "17.00%" on purpose.

P.P.S. If you want to see a really ugly graph from that report, check out this one from page 12:

[Pie chart from page 12 of the report]

P.P.P.S. Feel free to link in comments to ugly graphs made by Democrats. I'm sure there are a lot of those too!

P.P.P.P.S. John found this lineplot of the data (from Kevin Drum) which indeed shows much more context, in no more space, than the yucky bar graphs.

Lowess is great

I came across this old blog entry that was just hilarious--but it's from 2005 so I think most of you haven't seen it.

It's the story of two people named Martin Voracek and Maryanne Fisher who in a published discussion criticized lowess (a justly popular nonlinear regression method).

Curious, I looked up "Martin Voracek" on the web and found an article in the British Medical Journal whose title promised "trend analysis." I was wondering what statistical methods they used--something more sophisticated than lowess, perhaps?

They did have one figure, and here it is:

[Figure from the BMJ article: scatterplots with fitted straight lines]

Voracek and Fisher, the critics of lowess, fit straight lines to clearly nonlinear data! It's most obvious in their leftmost graph. Voracek and Fisher get full credit for showing scatterplots, but hey . . . they should try lowess next time! What's really funny in the graph are the little dotted lines indicating inferential uncertainty in the regression lines--all under the assumption of linearity, of course. (You can see enlarged versions of their graphs at this link.)

As usual, my own house has some glass-based construction and so it's probably not so wise of me to throw stones, but really! Not knowing about lowess is one thing, but knowing about it, then fitting a straight line to nonlinear data, then criticizing someone else for doing it right--that's a bit much.
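To make the contrast concrete, here's a minimal sketch on simulated nonlinear data:

```r
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)
plot(x, y)
abline(lm(y ~ x), lty = 2)    # the straight-line fit misses the structure
lines(lowess(x, y), lwd = 2)  # lowess follows the nonlinearity
```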

Data Thief

John Transue sends along a link to this software for extracting data from graphs. I haven't tried it out but it could be useful to somebody out there?

I sent Deborah Mayo a link to my paper with Cosma Shalizi on the philosophy of statistics, and she sent me the link to this conference which unfortunately already occurred. (It's too bad, because I'd have liked to have been there.) I summarized my philosophy as follows:

I am highly sympathetic to the approach of Lakatos (or of Popper, if you consider Lakatos's "Popper_2" to be a reasonable simulation of the true Popperism), in that (a) I view statistical models as being built within theoretical structures, and (b) I see the checking and refutation of models to be a key part of scientific progress. A big problem I have with mainstream Bayesianism is its "inductivist" view that science can operate completely smoothly with posterior updates: the idea that new data causes us to increase the posterior probability of good models and decrease the posterior probability of bad models. I don't buy that: I see models as ever-changing entities that are flexible and can be patched and expanded, and I also feel strongly (based on my own experience and on my understanding of science) that some of our most important learning comes when we refute our models (in relevant ways). To put it another way: unlike many Bayesians, I believe that a model check--a hypothesis test--can be valuable, even when (or especially when) there is no alternative at hand.

I also think that my philosophical approach fits well with modern Bayesian data analysis, which is characterized not just by the calculation of posterior probabilities but by a three-step process: (1) model building, (2) inference conditional on an assumed model, (3) model checking, then returning back to step (1) as needed, either to expand the model or to ditch it and start anew.

I think that the association of Popperian falsification with classical statistical methods, and the association of inductive reasoning with Bayesian inference, is unfortunate, and I'd like to (a) convince the Popperians that Bayesian methods allow one to be a most effective Popperian, and (b) convince the Bayesians of the problems with formal inductive reasoning. (See the second column of page 177 here.)

Mayo and I then had an email exchange, which I'll repeat here. I'm hoping this will lead to clearer communications between philosophers and applied statisticians. (As Cosma and I discuss in our paper, philosophy is important for statisticians: it can influence how we use and interpret our methods.)

Mayo:

Here's my discussion of this article for the Journal of the Royal Statistical Society:

I will comment on this paper in my role as applied statistician and consumer of Bayesian computation. In the last few years, my colleagues and I have felt the need to fit models predicting survey responses given multiple discrete predictors, for example estimating voting given ethnicity and income within each of the fifty states, or estimating public opinion about gay marriage given age, sex, ethnicity, education, and state. We would like to be able to fit such models with ten or more predictors--for example, religion, religious attendance, marital status, and urban/rural/suburban residence in addition to the factors mentioned above.

There are (at least) three reasons for fitting a model with many predictive factors and potentially a huge number of interactions among them:

1. Deep interactions can be of substantive interest. For example, Gelman et al. (2009) discuss the importance of interactions between income, religion, religious attendance, and state in understanding how people vote.

2. Deep interactions can increase predictive power. For example, Gelman and Ghitza (2010) show how the relation between voter turnout and the combination of sex, ethnicity, education, and state has systematic patterns that would not be captured by main effects or even two-way interactions.

3. Deep interactions can help correct for sampling problems. Nonresponse rates in opinion polls continue to rise, and this puts a premium on post-sampling adjustments. We can adjust for known differences between sampling and population using poststratification, but to do so we need reasonable estimates of the average survey response within narrow slices of the population (Gelman, 2007).

Our key difficulty--familiar in applied statistics but not always so clear in discussions of statistical computation--is that, while we have an idea of the sort of model we would like to fit, we are unclear on the details. Thus, our computational task is not merely to fit a single model but to try out many different possibilities. My colleagues and I need computational tools that are:

(a) able to work with moderately large datasets (aggregations of surveys with total sample size in the hundreds of thousands);
(b) able to handle complicated models with tens of thousands of latent parameters;
(c) flexible enough to fit models that we haven't yet thought of;
(d) fast enough that we can fit model after model.

We all know by now that hierarchical Bayesian methods are a good way of estimating large numbers of parameters. I am excited about the article under discussion, and others like it, because the tools therein promise to satisfy conditions (a), (b), (c), (d) above.
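
As an aside, for readers who haven't seen poststratification (point 3 above) in action: once you have an estimated mean response for each demographic cell, the poststratified estimate is just a population-weighted average of the cell estimates. Here's a toy sketch in R with made-up numbers; real applications use many more cells, and the cell estimates come from a hierarchical model rather than being plucked out of the air:

set.seed(1)
cells <- expand.grid(ethnicity = c("white", "black", "hispanic", "other"),
                     income    = c("low", "mid", "high"))
cells$N     <- sample(1000:5000, nrow(cells))   # census count for each cell
cells$y_hat <- plogis(rnorm(nrow(cells)))       # stand-in for model-based cell estimates
sum(cells$N * cells$y_hat) / sum(cells$N)       # poststratified population estimate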

Aleks points me to this article showing some pretty maps by Eric Fisher showing where people of different ethnicity live within several metro areas within the U.S. The idea is simple but effective; in the words of Cliff Kuang:

Fisher used a straightforward method borrowed from Rankin: Using U.S. Census data from 2000, he created a map where one dot equals 25 people. The dots are then color-coded based on race: White is pink; Black is blue; Hispanic is orange, and Asian is green.

The results for various cities are fascinating: Just like every city is different, every city is integrated (or segregated) in different ways.

New York is shown below.

No, San Francisco is not "very, very white"

But I worry that these maps are difficult for non-experts to read. For example, Kuang writes the following:

San Francisco proper is very, very white.

This is an understandable mistake coming from someone who, I assume, has never lived in the Bay Area. But what's amazing is that Kuang made the above howler after looking at the color-coded map of the city!

For those who haven't lived in S.F., here are the statistics:

The city of San Francisco is 45% non-Hispanic white, 14% Hispanic, 7% black, and 31% Asian (with the remaining 3% being Native American, Pacific Islander, or reporting multiple races).

"Very, very white," it ain't.

I'm not trying to pick on Kuang here--I'm sure it's not easy to write on deadline. My point is that even a clean graph like Fisher's--a graph that I love--can still easily be misread. I remember this from when I was learning how to present graphs in a talk. It always helps to point to one of the points or lines and explain exactly what it is.

And now, here's the (amazing) graph of the New York area:

[Figure: dot map of the New York area, color-coded by ethnicity (NewYorkB.jpg)]

Bob Erikson, one of my colleagues at Columbia who knows much more about American politics than I do, sent in the following screed. I'll post Bob's note, followed by my comments.

Bob writes:

Monday morning many of us were startled by the following headline:
White House strenuously denies NYT report that it is considering getting aggressive about winning the midterm elections.

At first I [Bob] thought I was reading the Onion, but no, it was a sarcastic comment on the blog Talking Points Memo. But the gist of the headline appears to be correct. Indeed, the New York Times reported that

White House advisers denied that a national ad campaign was being planned. 'There's been no discussion of such a thing at the White House'

What do we make of this? Is there some hidden downside to actually running a national campaign? Of course, money spent nationally is not spent on targeted local campaigns. But that is always the case. What explains the Democrats' trepidation about mounting a campaign and actually trying to woo voters?

Last year we discussed an important challenge in causal inference: The standard advice (given in many books, including ours) for causal inference is to control for relevant pre-treatment variables as much as possible. But, as Judea Pearl has pointed out, instruments (as in "instrumental variables") are pre-treatment variables that we would not want to "control for" in a matching or regression sense.

At first, this seems like a minor modification, with the new recommendation being to apply instrumental variables estimation using all pre-treatment instruments, and to control for all other pre-treatment variables. But that can't really work as general advice. What about weak instruments or covariates that have some instrumental aspects?

I asked Paul Rosenbaum for his thoughts on the matter, and he wrote the following:

In section 18.2 of Design of Observational Studies (DOS), I [Rosenbaum] discuss "seemingly innocuous confounding" defined to be a covariate that predicts a substantial fraction of the variation in treatment assignment but without obvious importance to the outcomes under study.

The word "seemingly" is important: it may not be innocuous, but only seem so. The example is drawn from a study (Silber, et al. 2009, Health Services Research 44: 444-463) of the timing of the discharge of premature babies from neonatal intensive care units (NICUs). Although all babies must reach a certain level of functional maturity before discharge, there is variation in discharge time beyond this, and we were interested in whether extra days in the NICU were of benefit to the babies who received them. (The extra days are very costly.) It is a long story, but one small part of the story concerns two "seemingly innocuous covariates," namely the day of the week on which a baby achieves functional maturity and the specific hospital in the Kaiser family of hospitals. A baby who achieves maturity on a Thursday goes home on Friday, but a baby who achieves maturity on Saturday goes home on Tuesday, more or less. It would, of course, be ideal if the date of discharge were determined by something totally irrelevant, but is it true that day-of-the-week is something totally irrelevant?

Should you adjust for the day of the week? A neonatologist argued that day of the week is not innocuous: a doc will keep a baby over the weekend if the doc is worried about the baby, but will discharge promptly if not worried, and the doc has information not in the medical record. Should you adjust for the day of the week? Much of the variation in discharge time varied between hospitals in the same chain of hospitals, although the patient populations were similar. Perhaps each hospital's NICU has its own culture. Should you adjust for the hospital?

The answer I suggest in section 18.2 of Design of Observational Studies is literally yes-and-no. We did analyses both ways, showing that the substantive conclusions were similar, so whether or not you think day-of-the-week and hospital are innocuous, you still conclude that extra days in the NICU are without benefit (see also Rosenbaum and Silber 2009, JASA, 104:501-511). Section 18.2 of DOS discusses two techniques, (i) an analytical adjustment for matched pairs that did not match for an observed covariate and (ii) tapered matching which does and does not match for the covariate. Detailed references and discussion are in DOS.

Tyler Cowen links approvingly to this review by B. R. Myers. Unlike Cowen, I haven't read the book in question--so far, I've only read the excerpt that appeared in the New Yorker--but I can say that I found Myers's review very annoying. Myers writes:

Brendan Nyhan gives the story.

Here's Sarah Palin's statement introducing the now-notorious phrase:

The America I know and love is not one in which my parents or my baby with Down Syndrome will have to stand in front of Obama's "death panel" so his bureaucrats can decide, based on a subjective judgment of their "level of productivity in society," whether they are worthy of health care.

And now Brendan:

Palin's language suggests that a "death panel" would determine whether individual patients receive care based on their "level of productivity in society." This was -- and remains -- false. Denying coverage at a system level for specific treatments or drugs is not equivalent to "decid[ing], based on a subjective judgment of their 'level of productivity in society.'"

Seems like an open-and-shut case to me. The "bureaucrats" (I think Palin is referring to "government employees") are making decisions based on studies of the drug's effectiveness:

Dan Corstange writes:

Who sells their votes? Clientelism and vote buying are pervasive electoral practices in developing-world democracies and autocracies alike. I [Corstange] argue that buyers, regardless of regime type, prefer cheap voters, but that parties operating in uncompetitive environments are better able to price discriminate than those operating in competitive elections. I use an augmented list experiment to examine vote selling at the microlevel in Lebanon, in which both types of environment existed in its 2009 elections. I find that just over half of the electorate sold their votes, which is more than double the proportion willing to admit it. The evidence further shows that voters with low reservation prices are most likely to sell, and that monopsonistic buyers are better able to price discriminate among sellers than are dueling machines.

My comments:

This is a fascinating paper. I particularly like the speculations in the conclusion--it's always interesting to think of wider implications. In the abstract, I would rephrase slightly, change "I find that just over half of the electorate sold their votes" to "I estimate..."

Also, I have to admit there's something about list experiments that makes me just slightly uneasy. It's too late for this study, but maybe in a future study of this sort, you could try varying conditions in which each item (not just the vote-buying question) is removed from the list. This might offer some sort of calibration.
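
For readers who haven't seen a list experiment: in the basic (non-augmented) version, control respondents report how many of a few innocuous items apply to them, treatment respondents get the same list plus the sensitive item, and the difference in mean counts estimates the prevalence of the sensitive behavior. A quick simulation sketch (not Corstange's augmented design or his data):

set.seed(99)
n <- 1000
treat     <- rbinom(n, 1, 0.5)    # half the sample gets the sensitive item added to the list
innocuous <- rbinom(n, 3, 0.4)    # how many of 3 innocuous items apply
sensitive <- rbinom(n, 1, 0.55)   # true (unobserved) vote selling
count <- innocuous + treat * sensitive              # the only thing each respondent reports
mean(count[treat == 1]) - mean(count[treat == 0])   # estimated rate of vote selling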

P.S. The paper has nice graphs. But I strongly, strongly, strongly recommend rotating them 90 degrees so I can read them without turning my computer sideways. I'm not the only one who reads papers online! Also, I recommend labeling your line directly rather than using a legend.

I can't escape it

I received the following email:

Ms. No.: ***

Title: ***

Corresponding Author: ***

All Authors: ***

Dear Dr. Gelman,

Because of your expertise, I would like to ask your assistance in determining whether the above-mentioned manuscript is appropriate for publication in ***. The abstract is pasted below. . . .

My reply:

I would rather not review this article. I suggest ***, ***, and *** as reviewers.

I think it would be difficult for me to review the manuscript fairly.

Brendan pointed me to this news article by David Pogue promoting a website called Hipmunk, a sleek competitor to Travelocity, Expedia, Kayak, and the like.

Coincidentally, I had to buy a flight right now, so I followed the link and found that, indeed, Hipmunk is about a zillion times easier to use and more impressive than Expedia or even Kayak. It's awesome. The others aren't even close. The display was so clean and effective, I felt like ordering a few flights just for fun.

That's the good news. Now the bad news. I wasn't just playing around with the site. There was actually a flight I wanted to buy--an itinerary I'd looked into yesterday but hadn't saved or booked. I effortlessly set up the request in Hipmunk, scanned its impressive graphical display, and . . . couldn't find the flight I wanted! Oh no! The last ticket must've been sold!

Just to check, though, I went on good old ugly Expedia. And my flight was right there! So I bought it.

So, just a quick memo to whoever runs Hipmunk: Your interface is great, but I suggest you scrape Expedia to find more flights. You could start with evening nonstops from RDU to LGA. (Or, if my flight was actually on Hipmunk all the time, you just need a better interface.)

The relevant screenshots are below (right-click to see the full images):

NSF crowdsourcing

I have no idea what this and this are, but Aleks passed these on, and maybe some of you will find them interesting.

Electability and perception of electability

Mark Palko writes:

We've heard a lot recently about the Republican voters going with less electable candidates (last night in particular), but I [Palko] wonder whether this is less a question of putting less weight on electability and more of having a different perception of electability. Is this really a case of primary voters who supported O'Donnell saying "I'd rather be right than be president" or do a large percentage of them believe she has a better than even chance in November?

My reply:

It's not so horrible for people to engage in non-strategic voting! Beyond the immediate probabilities of this candidate winning the Senate election in November, primary challenges keep incumbents accountable. The thing I don't really understand is why there aren't more such challenges. I suppose they're unlikely enough to succeed that it's not usually worth doing it and risking your political career.

But, yes, I'm pretty sure that O'Donnell's voters overestimated the chance that she'd win in November. That's just human nature.

The real question here is why little Delaware has 2 seats in the U.S. Senate. . . .

There's a lot of free advice out there. As I wrote a couple years ago, it's usually presented as advice to individuals, but it's also interesting to consider the possible total effects if the advice is taken.

For example, Nassim Taleb has a webpage that includes a bunch of one-line bits of advice (scroll to item 132 on the linked page). Here's his final piece of advice:

If you dislike someone, leave him alone or eliminate him; don't attack him verbally.

I'm a big Taleb fan (search this blog to see), but this seems like classic negative-sum advice. I can see how it can be a good individual strategy to keep your mouth shut, bide your time, and then sandbag your enemies. But it can't be good if lots of people are doing this. Verbal attacks are great, as long as there's a chance to respond. I've been in environments where people follow Taleb's advice, saying nothing and occasionally trying to "eliminate" people, and it's not pretty. I much prefer for people to be open about their feelings. Or, if you want to keep your dislikes to yourself, fine, but don't go around eliminating people!

On the other hand, maybe I'm missing the point. Taleb attacks people verbally all the time, so maybe his advice is tongue in cheek, along the lines of "do as I say, not as I do."

As noted above, I think Taleb is great, but I'm really down on this sort of advice where people are advised to be more strategic, conniving, etc. In my experience, this does not lead to a pleasant equilibrium where everybody is reasonably savvy. Rather, it can lead to a spiral of mistrust and poor communication.

P.S. Taleb's other suggestions seem more promising.

Here's a good one if you want to tell your students about question wording bias. It's fun because the data are all on the web--the research is something that students could do on their own--if they know what to look for. Another win for Google.

Here's the story. I found the following graph on the front page of the American Enterprise Institute, a well-known D.C. think tank:

[Figure: AEI graph of poll results on support for the war in Afghanistan (aei.png)]

My first thought was that they should replace this graph by a time series, which would show so much more information. I did a web search and, indeed, looking at a broad range of poll questions over time gives us a much richer perspective on public opinion about Afghanistan than is revealed in the above graph.

I did a quick google search ("polling report afghanistan") and found this. The quick summary is that roughly 40% of Americans favor the Afghan war (down from about 50% from 2006 through early 2009).

The Polling Report page also features the Quinnipiac poll shown in the above graph; here it reports that, as of July 2010, 48% think the U.S. is "doing the right thing" by fighting the war in Afghanistan and 43% think the U.S. should "not be involved." This phrasing seems to elicit more support--I guess people don't want to think that the U.S. is not doing the right thing.

OK, so we have 40% support, or maybe 48% support . . . how did the AEI get the 58% support highlighted on its graph?

Steven Hayward at the American Enterprise Institute wrote an article, sure to attract the attention of people such as myself, entitled, "The irrelevance of modern political science," in which he discusses some silly-sounding papers presented at the recent American Political Science Association and then moves to a larger critique of quantitative political science:

Stephanie Evergreen writes:

Now that September has arrived, it's time for us to think about teaching. Here's something from Andrew Heckler and Eleanor Sayre. Heckler writes:

The article describes a project studying the performance of university level students taking an intro physics course. Every week for ten weeks we took 1/10th of the students (randomly selected only once) and gave them the same set of questions relevant to the course. This allowed us to plot the evolution of average performance in the class during the quarter. We can then determine when learning occurs: For example, do they learn the material in a relevant lecture or lab or homework? Since we had about 350 students taking the course, we could get some reasonable stats.

In particular, you might be interested in Figure 10 (page 774) which shows student performance day-by-day on a particular question. The performance does not change directly after lecture, but rather only when the homework was due. [emphasis added] We could not find any other studies that have taken data like this, and it has nice potential to measure average effects of instruction.

Note also Figure 9 which show a dramatic *decrease* in student performance--almost certainly due to interference from learning a related topic.

I love this kind of thing. The results are not a huge surprise, but what's important to me about this kind of study is the active measurement that's involved, which can be difficult to set up but, once it's there, allows the opportunity to discover things about teaching and learning that I think would be nearly impossible to find out through our usual informal processes of evaluation. Some time I'm hoping to do this sort of project with our new introductory statistics course. (Not this semester, though; right now we're still busy trying to get it all working.)

Update on marathon statistics

Frank Hansen updates his story and writes:

Here is a link to the new stuff. The update is a little less than half way down the page.

1. used display() instead of summary()

2. include a proxy for [non] newbies -- whether I can find their name in a previous Chicago Marathon.

3. graph actual pace vs. fitted pace (color code newbie proxy)

4. estimate the model separately for newbies and non-newbies.

some incidental discussion of sd of errors.

There are a few things unfinished but I have to get to bed, I'm running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it's the day of the Bears home opener too.
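
For anyone who wants to try something similar on their own race data, here's a rough sketch of steps 1-4 in R, with simulated data and invented variable names (Frank's actual analysis is at the link above):

library(arm)
set.seed(42)
marathon <- data.frame(age    = rnorm(500, 40, 10),
                       newbie = rbinom(500, 1, 0.4))   # stand-in for the newbie proxy (step 2)
marathon$pace <- 8 + 0.03 * marathon$age + 0.5 * marathon$newbie + rnorm(500)
fit <- lm(pace ~ age + newbie, data = marathon)
display(fit)                            # step 1: display() instead of summary()
plot(fitted(fit), marathon$pace,        # step 3: fitted vs. actual pace,
     col = ifelse(marathon$newbie == 1, "red", "blue"))   # color-coded by the proxy
fit_new <- lm(pace ~ age, data = subset(marathon, newbie == 1))   # step 4: separate models
fit_old <- lm(pace ~ age, data = subset(marathon, newbie == 0))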

Ross Ihaka to R: Drop Dead

Christian Robert posts these thoughts:

I [Ross Ihaka] have been worried for some time that R isn't going to provide the base that we're going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too. Some of these were inherited from S and some are peculiar to R.

One of the worst problems is scoping. Consider the following little gem.

f = function() {
    if (runif(1) > .5)
        x = 10
    x
}

The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It's ugly and it makes optimisation really difficult. This isn't the only problem, even weirder things happen because of interactions between scoping and lazy evaluation.

In light of this, I [Ihaka] have come to the conclusion that rather than "fixing" R, it would be much more productive to simply start over and build something better. I think the best you could hope for by fixing the efficiency problems in R would be to boost performance by a small multiple, or perhaps as much as an order of magnitude. This probably isn't enough to justify the effort (Luke Tierney has been working on R compilation for over a decade now). . . .

If we're smart about building the new system, it should be possible to make use of multi-cores and parallelism. Adding this to the mix might just make it possible to get a three order-of-magnitude performance boost with just a fraction of the memory that R uses.
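
To see the scoping problem in action, here's a quick demonstration built around Ihaka's example (the wrapper is mine, not his):

x <- "global"        # a global x sitting in the workspace
f <- function() {
    if (runif(1) > .5)
        x <- 10      # sometimes this creates a local x ...
    x                # ... otherwise the lookup falls through to the global one
}
replicate(4, f())    # a mix of 10s and "global"s, varying from run to run
rm(x)
f()                  # with no global x, this errors about half the time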

I don't know what to think about this. Some of my own recent thoughts on R are here. Although I am developing some R packages, overall I think of myself as more of a user than a developer. I find those S4 data types to be an annoyance, and I'm not happy at the "bureaucratic" look of so many R functions. If R could be made 100 times faster, that would be cool. When writing ARM, I was careful to write code in what I considered a readable way, which in many instances involved looping rather than vectorization and the much-hated apply() function. (A particular difficulty arises when dealing with posterior simulations, where scalars become matrices, matrices become two-way arrays, and so forth.) In my programming, I've found myself using notational conventions where the structure in the program should be, and I think this is a common problem in R. (Consider the various objects such as rownames, rows, row.names, etc etc.) And anyone who's worked with R for awhile has had the frustration of having to take a dataset and shake it to wring out all the layers of structure that are put there by default. I'll read in some ascii data and then be going through different permutations of functions such as as.numeric(), as.vector(), as.character() to convert data from "levels" into numbers or strings.

Don't get me wrong. R is great. I love R. And I recognize that many of its problems arise from its generality. I think it's great that Ross Ihaka and others are working to make things even better.

Yesterday at the sister blog, Nate Silver forecast that the Republicans have a two-thirds chance of regaining the House of Representatives in the upcoming election, with an expected gain of 45 House seats.

Last month, Bafumi, Erikson, and Wlezien released their forecast that gives the Republicans an 80% chance of takeover and an expected gain of 50 seats.

As all the above writers emphasize, these forecasts are full of uncertainty, so I treat the two predictions--a 45-seat swing or a 50-seat swing--as essentially identical at the national level.

And, as regular readers know, as far back as a year ago, the generic Congressional ballot (those questions of the form, "Which party do you plan to vote for in November?") was also pointing to big Republican gains.

As Bafumi et al. point out, early generic polls are strongly predictive of the election outcome, but they need to be interpreted carefully. The polls move in a generally predictable manner during the year leading up to an election, and so you want to fit a model to the polls when making a forecast, rather than just taking their numbers at face value.

Methods

Having read Nate's description of his methods and also the Bafumi, Erikson, and Wlezien paper, my impression is that the two forecasting procedures are very similar. Both of them use national-level information to predict the nationwide vote swing, then use district-level information to map that national swing onto a district level. Finally, both methods represent forecasting uncertainty as a probability distribution over the 435 district-level outcomes and then summarize that distribution using simulations.

I'll go through the steps in order.

1. Forecast national vote share for the two parties from a regression model using the generic ballot and other information including the president's party, his approval rating, and recent economic performance.

2. Map the estimated national swing to district-by-district swings using the previous election results in each district as a baseline, then correcting for incumbency and uncontested elections.

2a. Nate also looks at local polls and expert district-by-district forecasts from the Cook Report and CQ Politics and, where these polls and forecasts differ from the adjusted-uniform-swing model above, he compromises between the different sources of available information. He also throws in other district-level information including data on campaign contributions.

3. Fit the model to previous years' data and use the errors in those retrospective fits to get an estimate of forecast uncertainty. Using simulation, propagate that uncertainty to get uncertain forecasts of elections at the district and national levels.
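
To make steps 2 and 3 concrete, here's a stylized simulation sketch with made-up numbers (no incumbency or uncontested-seat corrections, and nothing like either team's actual model):

set.seed(2010)
n_sims   <- 10000
prev_dem <- runif(435, .25, .75)   # stand-in for last election's Democratic share by district
swing    <- -.06                   # made-up point forecast of the national swing (step 1)
seats <- replicate(n_sims, {
    nat  <- rnorm(1, swing, .02)                 # uncertainty in the national forecast
    dist <- prev_dem + nat + rnorm(435, 0, .04)  # step 2: map the swing onto districts, plus noise
    sum(dist > .5)                               # Democratic seats won in this simulation
})
quantile(seats, c(.05, .5, .95))   # step 3: simulation-based forecast interval for seats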

Step 2 is important and is indeed done by the best political-science forecasters. As Kari and I discuss, to a large extent, local and national swings can be modeled separately, and it is a common mistake for people to look at just one or the other.

The key difference between Nate's forecast and the others seems to be step 2a. Even if, as I expect, step 2a adds little to the accuracy of the national forecast, I think it's an excellent idea--after all, the elections are being held within districts. And, as Nate notes, when the local information differs dramatically from the nationally-forecast trend, often something interesting is going on. And these sorts of anomalies should be much more easily found by comparing to forecasts than by looking at polls in isolation.

GLM - exposure

Bernard Phiri writes:

I came across this blog by Jonathan Weinstein that illustrated, once again, some common confusion about ideas of utility and risk. Weinstein writes:

When economists talk about risk, we talk about uncertain monetary outcomes and an individual's "risk attitude" as represented by a utility function. The shape of the function determines how willing the individual is to accept risk. For instance, we ask students questions such as "How much would Bob pay to avoid a 10% chance of losing $10,000?" and this depends on Bob's utility function.

This is (a) completely wrong, and (b) known to be completely wrong. To be clear: what's wrong here is not that economists talk this way. What's wrong is the identification of risk aversion with a utility function for money. (See this paper from 1998 or a more formal argument from Yitzhak in a paper from 2000.)

It's frustrating. Everybody knows that it's wrong to associate a question such as "How much would Bob pay to avoid a 10% chance of losing $10,000?" with a utility function, yet people do it anyway. It's not Jonathan Weinstein's fault--he's just calling this the "textbook definition"--but I guess it is the fault of the people who write the textbooks.

P.S. Yes, yes, I know that I've posted on this before. It's just sooooooo frustrating that I'm compelled to write about it again. Unlike some formerly recurring topics on this blog, I don't associate this fallacy with any intellectual dishonesty. I think it's just an area of confusion. The appealing but wrong equation of risk aversion with nonlinear utility functions is a weed that's grown roots so deep that no amount of cutting and pulling will kill it.

P.P.S. To elaborate slightly: The equation of risk aversion with nonlinear utility is empirically wrong (people are much more risk averse for small sums than could possibly make sense under the utility model) and conceptually wrong (risk aversion is an attitude about process rather than outcome).

P.P.P.S. I'll have to write something more formal about this some time . . . in the meantime, let me echo the point made by many others that the whole idea of a "utility function for money" is fundamentally in conflict with the classical axiom of decision theory that preferences should depend only on outcomes, not on intermediate steps. Money's value is not in itself but rather in what it can do for you, and in the classical theory, utilities would be assigned to the ultimate outcomes. (But even if you accept the idea of a "utility of money" as some sort of convenient shorthand, you still can't associate it with attitudes about risky gambles, for the reasons discussed by Yitzhak and myself and which are utterly obvious if you ever try to teach the subject.)

P.P.P.P.S. Yes, I recognize the counterargument: that if this idea is really so bad and yet remains so popular, it must have some countervailing advantages. Maybe so. But I don't see it. It seems perfectly possible to believe in supply and demand, opportunity cost, incentives, externalities, marginal cost and benefits, and all the rest of the package--without building it upon the idea of a utility function that doesn't exist. To put it another way, the house stands up just fine without the foundations. To the extent that the foundations hold up at all, I suspect they're being supported by the house.
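
To put rough numbers on the "empirically wrong" point above, here's a back-of-the-envelope calculation in R, assuming a constant-relative-risk-aversion utility function and a hypothetical $20,000 of wealth (the numbers are mine, not from the papers linked above):

crra <- function(w, g) if (abs(g - 1) < 1e-8) log(w) else w^(1 - g) / (1 - g)
wealth <- 20000
# how curved must utility be to turn down a 50/50 gamble of +$110 / -$100?
indiff <- function(g) 0.5 * crra(wealth + 110, g) + 0.5 * crra(wealth - 100, g) - crra(wealth, g)
g_star <- uniroot(indiff, c(0.5, 50))$root
g_star   # comes out around 18, far larger than typical empirical estimates
# with that much curvature, the certainty equivalent of a 50/50 gamble of
# -$5,000 / +$1,000,000 is strongly negative:
ce <- function(g, lose, win) {
    eu <- 0.5 * crra(wealth + win, g) + 0.5 * crra(wealth - lose, g)
    ((1 - g) * eu)^(1 / (1 - g)) - wealth
}
ce(g_star, 5000, 1e6)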

Fighting Migraine with Multilevel Modeling

Hal Pashler writes:

Ed Vul and I are working on something that, although less exciting than the struggle against voodoo correlations in fMRI :-) might interest you and your readers. The background is this: we have been struck for a long time by how many people get frustrated and confused trying to figure out whether something they are doing/eating/etc is triggering something bad, whether it be migraine headaches, children's tantrums, arthritis pains, or whatever. It seems crazy to try to do such computations in one's head--and the psychological literature suggests people must be pretty bad at this kind of thing--but what's the alternative? We are trying to develop one alternative approach--starting with migraine as a pilot project.

We created a website that migraine sufferers can sign up for. The users select a list of factors that they think might be triggering their headaches (e.g., drinking red wine, eating stinky cheese, etc.--the website suggests a big list of candidates drawn from the migraine literature). Then, every day the user is queried about how much they were exposed to each of these potential triggers that day, as well as whether they had a headache. After some months, the site begins to analyze the user's data to try to figure out which of these triggers--if any--are actually causing headaches.

Our approach uses multilevel logistic regression as in Gelman and Hill and/or Gelman and Little (1997), and we use parametric bootstrapping to obtain posterior predictive confidence intervals to provide practical advice (rather than just ascertain the significance of effects). At the start, the population-level hyperparameters on individual betas are uninformative (uniform), but as we get data from an adequate number of users (we're not there quite yet), we will be able to pool information across users to provide appropriate population-level priors on the regression coefficients for each possible trigger factor for each person. The approach is outlined in this FAQ item.

Looks cool to me.
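
For concreteness, here's a minimal sketch of this kind of model in lme4, with simulated data and made-up trigger names (not Pashler and Vul's actual code or priors):

library(lme4)
set.seed(123)
n_users <- 40; n_days <- 90
diary <- expand.grid(user = factor(1:n_users), day = 1:n_days)
diary$wine   <- rbinom(nrow(diary), 1, 0.3)   # daily exposure to candidate trigger 1
diary$cheese <- rbinom(nrow(diary), 1, 0.2)   # daily exposure to candidate trigger 2
b_wine <- rnorm(n_users, 0.8, 0.5)            # user-specific effect of trigger 1
diary$headache <- rbinom(nrow(diary), 1,
                         plogis(-2 + b_wine[diary$user] * diary$wine + 0.3 * diary$cheese))
# multilevel logistic regression: trigger effects partially pooled across users
fit <- glmer(headache ~ wine + cheese + (1 + wine | user),
             family = binomial, data = diary)
summary(fit)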

Cyrus writes:

I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different methods to fit a random effects logit model with 2 random intercepts---one corresponding to "family" and another corresponding to "community" (labeled "mom" and "cluster" in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper: Improved estimation procedures for multilevel models with binary response: a case-study, by G Rodriguez, N Goldman.

(I say "replicating in part" because we didn't include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R's lme4 package, glmmPQL in R's MASS package, and Stata's xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood.

I was shocked to discover that glmer's default setting is adaptive Gaussian quadrature with 1 integration point (that is, a Laplace approximation). In addition, and even more shocking, glmer **does not allow** more than one integration point to be used for models with more than one random effect. With only one random effect, you can increase the number of integration points. But with multiple random effects (like what we had, which was 2), when you try to specify multiple integration points (with the nAGQ setting), you get an error saying that it can't do it. At first, I kinda shrugged, but then when I saw the difference that it made, alarm bells started going off.

I made a table showing results from estimates using glmer's Laplace approximation, and then also results from glmmPQL, which uses penalized quasi-likelihood, and then results from Stata's xtmelogit using first its own Laplace approximation and then a 7-pt Gaussian adaptive quadrature. Note that in Stata, the 7-pt quadrature fit is the default. This is a high-dimensional "brute force" method for fitting a complex likelihood.

Now, when we look at the results for the random effects standard deviation estimates, we see that the Laplace approximations in R (via glmer) and Stata (model 3 in the Table) are identical. In addition, taking the 7-pt quadrature estimate from xtmelogit to be the best guess of the bunch, the Laplace approximation-based estimates are HORRIBLY biased downward. That is, the estimated random effect variance is way too small. The PQL estimates are also biased downward, but not so badly. This is in line with what Rodriguez and Goldman found, using a 20-point quadrature and an MCMC fit as benchmarks (they produced nearly identical results).

The upshot is that glmer's default is REALLY BAD; and this default is the only option when you have more than one random effect. My suggestion then is to either use Stata, which has a very sensible 7-point quadrature default (although as a brute force method, it is slow) or fit the model with MCMC.

If I understand, you are using glmer? If so, have you (1) looked into this or (2) checked what glmer is giving you against xtmelogit or BUGS? I would trust the xtmelogit or BUGS answers more.

My reply:

1. I am working with Sophia Rabe-Hesketh, who's one of the people who programmed gllamm in Stata. Sophia has been telling me for a while that the method used by gllamm is better than the Laplace method that is currently used in glmer.

2. Just some terminology: I'd prefer to talk about different "approximations" rather than different "estimation methods." To me, there's just one estimation method here--hierarchical Bayes (or, as Sophia would say, marginal likelihood inference)--and these various approaches to point estimates of variance parameters are approximations to get around the difficulties of doing the integrals and computing the marginal likelihood.

3. I also prefer to use the term "varying intercepts" (and slopes) rather than "random effects," because the term "random effects" is not so clearly defined in the literature (as Jennifer and I discuss in our book). This doesn't change the substance of your comment but it makes it easier for me to follow.

4. In my project with Sophia (and with Jingchen Liu, also cc-ed here), we're working on several parallel tracks:
(a) Adding a prior distribution on the variance parameters to stabilize the point estimates and keep them away from the boundary of parameter space (which, in the context of scalar variance parameters, implies that we keep the estimates away from zero).
(b) Adding a few steps of the Metropolis algorithm to capture some of the uncertainty in the variance parameters and also fix some of that bias you're talking about. It's my intuition that a low number (e.g., 10) of Metropolis steps will clean up a lot of problems. Although, right now, that's just an intuition; we haven't tried it yet.
(c) Working in R (using glmer) and also separately in Stata (using gllamm).

5. Last but not least, it sounds like you've encountered an important principle of research: The way to really learn a subject is to teach it! That's why college professors (at least at a place like Columbia) know so much: we're always teaching new things.
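
P.S. For readers who want to poke at this on their own machines, here's a minimal sketch of the kind of comparison Cyrus describes, on simulated data with placeholder variable names (the real Rodriguez-Goldman example has more predictors and real survey data):

library(lme4)
library(MASS)
set.seed(1)
dat <- expand.grid(kid = 1:4, mom = 1:10, cluster = 1:30)
dat$cluster <- factor(dat$cluster)
dat$mom     <- interaction(dat$cluster, dat$mom)   # unique family ids, nested in communities
dat$x <- rnorm(nrow(dat))
u_cluster <- rnorm(nlevels(dat$cluster), 0, 1)
u_mom     <- rnorm(nlevels(dat$mom), 0, 1)
dat$y <- rbinom(nrow(dat), 1,
                plogis(0.5 * dat$x + u_cluster[dat$cluster] + u_mom[dat$mom]))
# Laplace approximation (with more than one varying intercept, glmer forces nAGQ = 1)
m_laplace <- glmer(y ~ x + (1 | cluster) + (1 | mom), family = binomial, data = dat)
# penalized quasi-likelihood, with mom nested in cluster
m_pql <- glmmPQL(y ~ x, random = ~ 1 | cluster/mom, family = binomial, data = dat)
VarCorr(m_laplace)     # estimated standard deviations of the varying intercepts, Laplace
nlme::VarCorr(m_pql)   # and under PQL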

Hadley Wickham's talk for Monday 13 Sept at noon in the statistics dept:

As the volume of data increases, so too does the complexity of our models. Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. There are very many well-known techniques for visualising data, but far fewer for visualising models. In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models.

Hey--this is one of my favorite topics!

We're doing a new thing here at the Applied Statistics Center, throwing monthly Friday afternoon mini-conferences in the Playroom (inspired by our successful miniconference on statistical consulting a couple years ago).

This Friday (10 Sept), 1-5pm:

Come join us this Friday, September 10th for an engaging interdisciplinary discussion of risk perception at the individual and societal level, and the role it plays in current environmental, social, and health policy debates. All are welcome!

"Risk Perception in Environmental Decision-Making"

Elke Weber, Columbia Business School

"Cultural Cognition and the Problem of Science Communication"

Dan Kahan, Yale Law School

Discussants include:

Michael Gerrard, Columbia Law School

David Epstein, Department of Political Science, Columbia University

Andrew Gelman, Department of Statistics, Columbia University

The future of R

Some thoughts from Christian, including this bit:

We need to consider separately

1. R's brilliant library

2. R's not-so-brilliant language and/or interpreter.

I don't know that R's library is so brilliant as all that--if necessary, I don't think it would be hard to reprogram the important packages in a new language.

I would say, though, that the problems with R are not just in the technical details of the language. I think the culture of R has some problems too. As I've written before, R functions used to be lean and mean, and now they're full of exception-handling and calls to other packages. R functions are spaghetti-like messes of connections in which I keep expecting to run into syntax like "GOTO 120."

I learned about these problems a couple years ago when writing bayesglm(), which is a simple adaptation of glm(). But glm(), and its workhorse, glm.fit(), are a mess: They're about 10 lines of functioning code, plus about 20 lines of necessary front-end, plus a couple hundred lines of naming, exception-handling, repetitions of chunks of code, pseudo-structured-programming-through-naming-of-variables, and general buck-passing. I still don't know if my modifications are quite right--I did what was needed to the meat of the function but no way can I keep track of all the if-else possibilities.

If R is redone, I hope its functions return to the lean-and-mean aesthetic of the original S (but with better graphics defaults).

Details here.

P.S. No update on Ed Park or Vin Scully.

1. I remarked that Sharad had a good research article with some ugly graphs.

2. Dan posted Sharad's graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots.

3. I commented on Dan's site that, in this case, I'd much prefer a well-designed lineplot. I wrote:

There's a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place.

I think that's what's happening here. You're seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box.

(Graph 2--which I think you think is the kind of thing that Gelman would like--indeed is the kind of thing that I think the R gurus like, but I don't like it at all. It looks clean without actually being clean. Sort of like those modern architecture buildings from the 1930s-1960s that look all sleek and functional but really aren't so functional at all.)

The big problem with your graphs above is that they place two logical dimensions (the model and the scenario) on the same physical dimension (the y-axis). I find this sort of ABCABCABCABC pattern hard to follow. Instead, you want to be able to compare AAAA, BBBB, CCCC, while still being able to make the four separate ABC comparisons.

How to do this? I suggest a lineplot.

Here's how my first try would go:

On the x-axis, put Music, Games, Movies, and Flu, in that order. (Ordering is important in allowing you to see patterns that otherwise might be obscured; see the cover of my book with Jennifer for an example.)

On the y-axis, put the scale. I'll assume you know what you're doing here, so keep with the .4 to 1 scale. But you only need labels at .4, .6, .8, 1.0. The intermediate labels are overkill and just make the graph hard to follow.

Now draw three lines, one for Search, one for Baseline, and one for Combined. Color the lines differently and label each one directly on the plot (not using a legend).

The resulting graph will be compact, and the next step is for you to replicate your study under different conditions, with a new graph for each. You can put these side by side and make some good comparisons.
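
In R, that first try might look something like this (the numbers here are made up, standing in for Sharad's actual estimates):

domains <- c("Music", "Games", "Movies", "Flu")
vals <- rbind(Search   = c(.72, .68, .81, .62),   # made-up values, one per domain
              Baseline = c(.65, .60, .74, .55),
              Combined = c(.78, .74, .86, .70))
cols <- c("blue", "red", "darkgreen")
matplot(1:4, t(vals), type = "l", lty = 1, lwd = 2, col = cols,
        xlim = c(1, 4.7), ylim = c(.4, 1), axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:4, labels = domains)
axis(2, at = c(.4, .6, .8, 1))
text(4.05, vals[, 4], rownames(vals), col = cols, adj = 0)   # label each line directly, no legend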

4. Sharad took my advice and made such a lineplot (see the Addendum at the end of Dan's blog).

5. Kaiser agrees with me and presents an excellent visualization showing why the lineplot is better. (Kaiser's picture is so great that I'll save it for its own entry here, for those of you who don't click through on all the links.)

6. David Smith posts that I prefer the dotplot. Nooooooooooooooooooooooo!!!!!!!!!!!

The China Study: fact or fallacy?

Alex Chernavsky writes:

I recently came across an interesting blog post, written by someone who is self-taught in statistics (not that there's anything wrong with that).

I have no particular expertise in statistics, but her analysis looks impressive to me. I'd be very interested to find out the opinion of a professional statistician. Do you have any interest in blogging about this subject?

My (disappointing, I'm sure) reply: This indeed looks interesting. I don't have the time/energy to look at it more right now, and it's too far from any areas of my expertise for me to give any kind of quick informed opinion. It would be good for this sort of discussion to appear in a nutrition journal where the real experts could get at it. I expect there are some strong statisticians who work in that field, although I don't really know for sure.

P.S. I suppose I really should try to learn more about this sort of thing, as it could well affect my life more than a lot of other subjects (from sports to sex ratios) that I've studied in more depth.

QB2

Dave Berri writes:

Saw you had a post on the research I did with Rob Simmons on the NFL draft. I have attached the article. This article has not officially been published, so please don't post this on-line.

The post you linked to states the following: "On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias."

Two points: First of all, we did not look at touchdowns per game (that is not a per play stat). More importantly -- as this post indicates -- we did far more than just look at data after five years.

We did mention the five year result, but directly below that discussion (and I mean, directly below), the following sentences appear.

Our data set runs from 1970 to 2007 (adjustments were made for how performance changed over time). We also looked at career performance after 2, 3, 4, 6, 7, and 8 years. In addition, we also looked at what a player did in each year from 1 to 10. And with each data set our story looks essentially the same. The above stats are not really correlated with draft position.

This analysis was also updated and discussed in this post (posted on-line last May). Hopefully that post will also help you see the point Rob and I are making.

I'm out of my depth on this football stuff so I'll leave it to you, the commenters.

The $900 kindergarten teacher

Paul Bleicher writes:

This simply screams "post-hoc, multiple comparisons problem," though I haven't seen the paper.

A quote from the online news report:

The findings revealed that kindergarten matters--a lot. Students of kindergarten teachers with above-average experience earn $900 more in annual wages than students of teachers with less experience than average. Being in a class of 15 students instead of a class of 22 increased students' chances of attending college, especially for children who were disadvantaged . . . Children whose test scores improved to the 60th percentile were also less likely to become single parents, more likely to own a home by age 28, and more likely to save for retirement earlier in their work lives.

I haven't seen the paper either. $900 doesn't seem like so much to me, but I suppose it depends where you stand on the income ladder.

Regarding the multiple comparisons problem: this could be a great example for fitting a multilevel model. Seriously.
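
By which I mean something like the following sketch: instead of testing each outcome separately and then worrying about multiple comparisons, put the (standardized) effect estimates into a hierarchical model and let them partially pool toward each other. The numbers below are made up, and the between-outcome variance is estimated by a crude method of moments; in a real analysis I'd fit the full model in lmer or Bugs:

theta_hat <- c(3.2, 0.4, 2.8, -0.6, 1.9)   # made-up standardized effect estimates, one per outcome
se        <- c(1.0, 0.9, 1.1, 1.0, 0.8)    # made-up standard errors
tau2 <- max(0, var(theta_hat) - mean(se^2))          # crude estimate of between-outcome variance
mu   <- weighted.mean(theta_hat, 1 / (se^2 + tau2))  # estimated common mean
pooled <- mu + (tau2 / (tau2 + se^2)) * (theta_hat - mu)   # partially pooled estimates
round(cbind(raw = theta_hat, pooled = pooled), 2)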
