Results matching “R”

Carol Cronin writes:

The new Wolfram Statistics Course Assistant App was released today for the iPhone, iPod touch, and iPad. Optimized for mobile devices, it helps students understand concepts such as mean, median, mode, standard deviation, probabilities, data points, random integers, random real numbers, and more.

To see some examples of how you and your readers can use the app, I'd like to encourage you to check out this post on the Wolfram|Alpha Blog.

If anybody out there with an iPhone etc. wants to try this out, please let me know how it works. I'm always looking for statistics-learning tools for students. I'm not really happy with the whole "mean, median, mode" thing (see above), but if the app has good things, then an instructor could pick and choose what to recommend, I assume.

P.S. This looks better than the last Wolfram initiative we encountered.

Another silly graph

Somebody named Justin writes:

Check this out for some probably bad statistics and bad graphs. It looks like they tallied the most frequent names of CEOs, professions, geography, etc. What inference they are trying to make from this, I have no clue.

I agree this is pretty horrible and isn't the sort of graph that I would make. But, readers, please! You don't have to just send me the bad stuff!

Lottery probability update

It was reported last year that the national lottery of Israel featured the exact same 6 numbers (out of 45) twice in the same month, and statistics professor Isaac Meilijson of Tel Aviv University was quoted as saying that "the incident of six numbers repeating themselves within a month is an event of once in 10,000 years."

I shouldn't mock when it comes to mathematics--after all, I proved a false theorem once! (Or, to be precise, my collaborator and I published a false claim that we thought we'd proved and thus thought was a theorem.)

So let me retract the mockery and move, first to the mathematics and then to the statistics.

Under the headline, "A Raise Won't Make You Work Harder," Ray Fisman writes:

To understand why it might be a bad idea to cut wages in recessions, it's useful to know how workers respond to changes in pay--both positive and negative changes. Discussion on the topic goes back at least as far as Henry Ford's "5 dollars a day," which he paid to assembly line workers in 1914. The policy was revolutionary at the time, as the wages were more than double what his competitors were paying. This wasn't charity. Higher-paid workers were efficient workers--Ford attracted the best mechanics to his plant, and the high pay ensured that employees worked hard throughout their eight-hour shifts, knowing that if their pace slackened, they'd be out of a job. Raising salaries to boost productivity became known as "efficiency wages."

So far, so good. Fisman then moves from history and theory to recent research:

How much gift exchange really matters to American bosses and workers remained largely a matter of speculation. But in recent years, researchers have taken these theories into workplaces to measure their effect on employee behavior.

In one of the first gift-exchange experiments involving "real" workers, students were employed in a six-hour library data-entry job, entering title, author, and other information from new books into a database. The pay was advertised as $12 an hour for six hours. Half the students were actually paid this amount. The other half, having shown up expecting $12 an hour, were informed that they'd be paid $20 instead. All participants were told that this was a one-time job--otherwise, the higher-paid group might work harder in hopes of securing another overpaying library gig.

The experimenters checked in every 90 minutes to tabulate how many books had been logged. At the first check-in, the $20-per-hour employees had completed more than 50 books apiece, while the $12-an-hour employees barely managed 40 each. In the second 90-minute stretch, the no-gift group maintained their 40-book pace, while the gift group fell from more than 50 to 45. For the last half of the experiment, the "gifted" employees performed no better--40 books per 90-minute period--than the "ungifted" ones.

The punchline, according to Fisman:

The goodwill of high wages took less than three hours to evaporate completely--hardly a prescription for boosting long-term productivity.

What I'm wondering is: How seriously should we use an experiment on one-shot student library jobs (or another study, in which short-term employees were rewarded "with a surprise gift of thermoses") to make general conclusions such as "Raises don't make employees work harder"?

What I'm worried about here isn't causal identification--I'm assuming these are clean experiments--but the generalizability to the outside world of serious employment.

Fisman writes:

All participants were told that this was a one-time job--otherwise, the higher-paid group might work harder in hopes of securing another overpaying library gig.

This seems like a direct conflict between the goals of internal and external validity, especially given that one of the key reasons to pay someone more is to motivate them to work harder to secure continuation of the job, and to give them less incentive to spend their time looking for something new.

I'm not saying that the study Fisman cited is useless, just that I'm surprised that he's so careful to consider internal validity issues yet seems to have no problem extending the result to the whole labor force.

These are just my worries. Ray Fisman is an excellent researcher here at the business school at Columbia--actually, I know him and we've talked about statistics a couple times--and I'm sure he's thought about these issues more than I have. So I'm not trying to debunk what he's saying, just to add a different perspective.

Perhaps Fisman's b-school background explains why his studies all seem to be coming from the perspective of the employer: it's the employer who decides what to do with wages (perhaps "presenting the cut as a temporary measure and by creating at least the illusion of a lower workload") and the employees who are the experimental subjects.

Fisman's conclusion:

If we can find other ways of overcoming the simmering resentment that naturally accompanies wage cuts, workers themselves will be better for it in the long run.

The "we" at the beginning of the sentence does not seem to be the same as the "workers" at the end of the sentence. I wonder if there is a problem with designing policies in this unidirectional fashion.

Rechecking the census

Sam Roberts writes:

The Census Bureau [reported] that though New York City's population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated.

Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase.

How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn?

That does seem a bit suspicious. So the newspaper did its own survey:

A common reason for plagiarism is laziness: you want credit for doing something but you don't really feel like doing it--maybe you'd rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you.

Interestingly enough, we see that in many responses to plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn't credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work.

As I wrote last year, I like to think that directness and openness are virtues in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces.

Wegman

Which brings us to Ed Wegman, whose defense of plagiarism in that Computational Statistics and Data Analysis paper is as follows (from this report by John Mashey):

(a) In 2005, he and his colleagues needed "some boilerplate background on social networks" for a high-profile report for the U.S. Congress. But instead of getting an expert on social networks for this background, or even simply copying some paragraphs (suitably cited) from a textbook on the topic, he tasked a Ph.D. student, Denise Reeves, to prepare the boilerplate. Reeves was no expert: her knowledge of social networks came from having taken a short course on the topic. Reeves writes the boilerplate "within a few days" and Wegman writes "of course, I took that to be her original work."

(b) Wegman gave this boilerplate to a second student, Walid Sharabati, who included it in his Ph.D. dissertation "with only minor amendments." (I think he's saying Sharabati copied it.)

(c) Sharabati was a coauthor of the Computational Statistics and Data Analysis article. He took the material he'd copied from Reeves's report and stuck it into the CSDA article.

Now let's apply our theme of the day, laziness:

Deviance as a difference

Peng Yu writes:

On page 180 of BDA2, deviance is defined as D(y, \theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference between the log-likelihood of the full model and that of the base model (times 2) (see the equation on the wiki webpage). The English word 'deviance' implies a difference from a standard (in this case, the base model). I'm wondering what the rationale is for your definition of deviance, which consists of only one term rather than two.

My reply:

Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.
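To spell out why, using the definitions quoted above: writing the McCullagh and Nelder deviance relative to the saturated (full) model,

D*(y, \hat{\theta}) = -2 [ \log p(y|\hat{\theta}) - \log p(y|\hat{\theta}_{sat}) ] = D(y, \hat{\theta}) - D(y, \hat{\theta}_{sat}),

the saturated-model term is the same constant for every model fit to the same data. So for any two models A and B,

D*(y, \hat{\theta}_A) - D*(y, \hat{\theta}_B) = D(y, \hat{\theta}_A) - D(y, \hat{\theta}_B),

and either definition gives the same deviance comparisons.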

My new writing strategy

In high school and college I would write long assignments using a series of outlines. I'd start with a single sheet where I'd write down the key phrases, connect them with lines, and then write more and more phrases until the page was filled up. Then I'd write a series of outlines, culminating in a sentence-level outline that was roughly one line per sentence of the paper. Then I'd write. It worked pretty well. Or horribly, depending on how you look at it. I was able to produce 10-page papers etc. on time. But I think it crippled my writing style for years. It's taken me a long time to learn how to write directly--to explain clearly what I've done and why. And I'm still working on the "why" part. There's a thin line between verbosity and terseness.

I went to MIT and my roommate was a computer science major. He wrote me a word processor on his Atari 800, which did the job pretty well. For my senior thesis I broke down and used the computers on campus. I formatted it in troff, which worked out just fine.

In grad school I moved toward the Latex approach of starting with the template and an outline (starting with the Introduction and ending with Discussion and References), then putting in paragraphs here and there until the paper was done. I followed the same approach for my first few books.

Blogging was different. When I blog I tend to start at the beginning and just keep writing until I'm done. I've learned that it's best to write an entry all at once--it's hard to come back a day or a week later to fill in any gaps. I think this has helped my writing style and my writing efficiency. The only trouble is that my entries tend to be story-like rather than article-like. In a story you begin with the motivation and then gradually reveal what's happening. When I'm blogging I commonly start at one place but then, once I'm halfway through, I realize I want to go somewhere else. In contrast, in a proper article you jump right in and say the key point right away, and everything gets structured from there. I've tried to improve my blog-writing by contracting introductory paragraphs into introductory sentences.

I've been blogging for over six years, and it's affected my writing. More and more I write articles from beginning to end. It's worked for me to use Word rather than Latex. Somehow in Word, as in the blogging window, it's easy for me to just get started and write, whereas in Latex everything's just too structured. Really what's relevant here, though, is the style not the software.

Sometimes, though, I have a complicated argument to make and it helps to outline it first. In that case I'll write the outline and then use it as the basis for an article.

But recently I came up with a new strategy--the best of both worlds, perhaps. I write the outline but then set it aside and write the article from scratch, from the beginning, not worrying about the outline. The purpose of the outline is to get everything down so I don't forget any key ideas. Having the outline gives me the freedom to write the article without worrying that I might be missing something--I can always check the outline at the end.

Jay Ulfelder asks:

I have a question for you about what to do in a situation where you have two measures of your dependent variable and no prior reasons to strongly favor one over the other.

Here's what brings this up: I'm working on a project with Michael Ross where we're modeling transitions to and from democracy in countries worldwide since 1960 to estimate the effects of oil income on the likelihood of those events' occurrence. We've got a TSCS data set, and we're using a discrete-time event history design, splitting the sample by regime type at the start of each year and then using multilevel logistic regression models with parametric measures of time at risk and random intercepts at the country and region levels. (We're also checking for the usefulness of random slopes for oil wealth at one or the other level and then including them if they improve a model's goodness of fit.) All of this is being done in Stata with the gllamm module.

Our problem is that we have two plausible measures of those transition events. Unsurprisingly, the results we get from the two DVs differ, sometimes not by much but in a few cases to a non-trivial degree. The conventional solution to this problem seems to be to pick one version as the "preferred" measure and then report results from the other version in footnotes as a sensitivity analysis (invariably confirming the other results, of course; when's the last time you saw a sensitivity analysis in a published paper that didn't back up the "main" findings?). I just don't like that solution, though, because it sweeps under the rug some uncertainty that's arguably as informative as the results from either version alone. At the same time, it seems a little goofy just to toss both sets of results on the table and then shrug in cases where they diverge non-trivially.

Do you know of any elegant solutions to this problem? I recall seeing a paper last year that used Bayesian methods to average across estimates from different versions of a dependent variable, but I don't think that paper used multilevel models and am assuming the math required is much more complicated (i.e., there isn't a package that does this now).

My reply:

My quick suggestion would be to add the two measures and then use the sum as the outcome. If it's a continuous measure there's no problem (although you'd want to prescale the measures so that they're roughly on a common scale before you add them). If they are binary outcomes you can just fit an ordered logit.
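Here's a minimal sketch of that last suggestion, with made-up data and variable names (and ignoring the multilevel, event-history structure of Jay's actual setup): sum the two binary codings of the transition event to get a 0/1/2 outcome and fit an ordered logit to it.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical data: two alternative codings of the same transition event,
# plus one predictor of interest. None of this is the authors' actual data.
rng = np.random.default_rng(0)
n = 500
oil_income = rng.normal(size=n)
transition_v1 = rng.binomial(1, 0.2, size=n)  # first coding of the DV
transition_v2 = rng.binomial(1, 0.2, size=n)  # second coding of the DV

df = pd.DataFrame({
    "oil_income": oil_income,
    # Summing the two codings gives an ordered outcome: 0, 1, or 2 of the
    # measures say a transition occurred.
    "transitions": transition_v1 + transition_v2,
})

# Ordered logit treats the 0/1/2 sum as an ordinal outcome rather than
# forcing a choice between the two codings.
model = OrderedModel(df["transitions"], df[["oil_income"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```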

Jay liked my suggestion but added:

One hitch for our particular problem, though: because we're estimating event history models, the alternate versions of the DV (which is binary) also come with alternate versions of a couple of the IVs: time at risk and counts of prior events. I can't see how we could accommodate those differences in the framework you propose. Basically, we've got two alternate universes (or two alternate interpretations of the same universe), and the differences permeate both sides of the equation. Sometimes I really wish I worked in the natural sciences...

My suggestion would be to combine the predictors in some way as well.

Here and here, for example.

I just hope they're using our survey methods and aren't trying to contact the zombies face-to-face!

Jon Goldhill points us to a new search engine, Zanran, which is for finding data and statistics. Goldhill writes:

It's useful when you're looking for a graph/table rather than a single number. For example, if you look for 'teenage birth rates in the united states' in Zanran you'll see a series of graphs. If you check in Google, there's plenty of material - but you'd have to open everything up to see if it had any real numbers. (I hope you'll appreciate Zanran's preview capability as well - hovering over the icons gives a useful preview of the content.)

Literary blurb translation guide

I'm a few weeks behind in my New Yorker reading and so just recently read this fascinating article by Ryan Lizza on the current administration's foreign policy. He gives some insights into the transformation of Obama from antiwar candidate to a president conducting three wars.

Speaking as a statistician, though, what grabbed my eye was a doctrine of journalist/professor/policymaker Samantha Power. Lizza writes:

In 2002, after graduating from Harvard Law School, she wrote "A Problem from Hell," which surveyed the grim history of six genocides committed in the twentieth century. Propounding a liberal-interventionist view, Power argued that "mass killing" on the scale of Rwanda or Bosnia must be prevented by other nations, including the United States. She wrote that America and its allies rarely have perfect information about when a regime is about to commit genocide; a President, therefore, must have "a bias toward belief" that massacres are imminent.

From a statistical perspective, this sounds completely wrong! If you want to argue that it's a good idea to intervene, even if you're not sure, or if you want to argue that it's wise to intervene, even if the act of intervention will forestall the evidence for genocide that would be the motivation for intervention, that's fine. It's a cost-benefit analysis and it's best to lay out the costs and benefits as clearly as possible (within the constraints established by military and diplomatic secrecy). But to try to shade the probabilities to get the decision you want . . . that doesn't seem like a good idea at all!

To be fair, the above quote predates the Iraq WMD fiasco, our most notorious recent example of a "bias toward belief" that influenced policy. Perhaps Power has changed her mind on the virtues of biasing one's belief.

P.S. Samantha Power has been non-statistical before.

P.P.S. Just in case anyone wants to pull the discussion in a more theoretical direction: No, Power's (and, for that matter, Cheney's) "bias toward belief" is not simply a Bayesian prior. My point here is that she's constructing a belief system (a prior) based not on a model of what's happening or even on a subjective probability but rather on what she needs to get the outcome she wants. That's not Bayes. In Bayes, the prior and the utility function are separate.

Why no Wegmania?

A colleague asks:

When I search the web, I find the story [of the article by Said, Wegman, et al. on social networks in climate research, which was recently bumped from the journal Computational Statistics and Data Analysis because of plagiarism] only on blogs, USA Today, and UPI. Why is that? Any idea why it isn't reported by any of the major newspapers?

Here's my answer:

1. USA Today broke the story. Apparently this USA Today reporter put a lot of effort into it. The NYT doesn't like to run a story that begins, "Yesterday, USA Today reported..."

2. To us it's big news because we're statisticians. [The main guy in the study, Edward Wegman, won the Founders Award from the American Statistical Association a few years ago.] To the rest of the world, the story is: "Obscure prof at an obscure college plagiarized an article in a journal that nobody's ever heard of." When a Harvard scientist paints black dots on white mice and says he's curing cancer, that's news. When Prof. Nobody retracts an article on social networks, that's not so exciting. True, there's the global warming connection. I think it's possible the story will develop further. If these statisticians get accused of lying to Congress, that could hit the papers.

Basically, plagiarism is exciting to academics but not so thrilling to the general public if no celebrities are involved. I expect someone at the Chronicle of Higher Education will pick up the story, though.

3. One more thing: newspapers like to report things that are clearly news: earthquakes, fires, elections, arrests, . . . If criminal charges come up or if someone starts suing, then I could see the court events as a hook on which to hang a news story.

Any other thoughts?

Baby name wizards

The other day I noticed a car with the improbable name of Nissan Rogue, from Darien, Connecticut (at least that's what the license plate frame said). And, after all, what could be more "rogue"-like than a suburban SUV?

I can't blame the driver of the car for this one; I'm just amused that the marketers at Nissan thought this was an appropriate name for the car.

Duncan Watts gave his new book the above title, reflecting his irritation with those annoying people who, upon hearing of the latest social science research, reply with: Duh-I-knew-that. (I don't know how to say Duh in Australian; maybe someone can translate that for me?) I, like Duncan, am easily irritated, and I looked forward to reading the book. I enjoyed it a lot, even though it has only one graph, and that graph has a problem with its y-axis. (OK, the book also has two diagrams and a graph of fake data, but that doesn't count.)

Before going on, let me say that I agree wholeheartedly with Duncan's central point: social science research findings are often surprising, but the best results cause us to rethink our world in such a way that they seem completely obvious, in retrospect. (Don Rubin used to tell us that there's no such thing as a "paradox": once you fully understand a phenomenon, it should not seem paradoxical any more. When learning science, we sometimes speak of training our intuitions.) I've jumped to enough wrong conclusions in my applied research to realize that lots of things can seem obvious but be completely wrong. In his book, Duncan does a great job at describing several areas of research with which he's been involved, explaining why this research is important for the world (not just a set of intellectual amusements) and why it's not as obvious as one might think at first.

I encountered this news article, "Chicago school bans some lunches brought from home":

At Little Village, most students must take the meals served in the cafeteria or go hungry or both. . . . students are not allowed to pack lunches from home. Unless they have a medical excuse, they must eat the food served in the cafeteria. . . . Such discussions over school lunches and healthy eating echo a larger national debate about the role government should play in individual food choices. "This is such a fundamental infringement on parental responsibility," said J. Justin Wilson, a senior researcher at the Washington-based Center for Consumer Freedom, which is partially funded by the food industry. . . . For many CPS parents, the idea of forbidding home-packed lunches would be unthinkable. . . .

If I had read this two years ago, I'd be at one with J. Justin Wilson and the outraged kids and parents. But last year we spent a sabbatical in Paris, where . . . kids aren't allowed to bring lunches to school. The kids who don't go home for lunch have to eat what's supplied by the lunch ladies in the cafeteria. And it's just fine. Actually, it was more than fine because we didn't have to prepare the kids' lunches every day. When school let out, the kids would run to the nearest boulangerie and get something sweet. So they didn't miss out on the junk food either.

I'm not saying the U.S. system or the French system is better, nor am I expressing an opinion on how they do things in Chicago. I just think it's funny how a rule which seems incredibly restrictive from one perspective is simply, for others, the way things are done. I'll try to remember this story next time I'm outraged at some intolerable violation of my rights.

P.S. If they'd had the no-lunches-from-home rule when I was a kid, I definitely would've snuck food into school. In high school the wait for lunchtime was interminable.

Statistics plagiarism scandal

See more at the Statistics Forum (of course).

Ryan King writes:

This involves causal inference, hierarchical setup, small effect sizes (in absolute terms), and will doubtless be heavily reported in the media.

The article is by Manudeep Bhuller, Tarjei Havnes, Edwin Leuven, and Magne Mogstad and begins as follows:

Does internet use trigger sex crime? We use unique Norwegian data on crime and internet adoption to shed light on this question. A public program with limited funding rolled out broadband access points in 2000-2008, and provides plausibly exogenous variation in internet use. Our instrumental variables and fixed effect estimates show that internet use is associated with a substantial increase in reported incidences of rape and other sex crimes. We present a theoretical framework that highlights three mechanisms for how internet use may affect reported sex crime, namely a reporting effect, a matching effect on potential offenders and victims, and a direct effect on crime propensity. Our results indicate that the direct effect is non-negligible and positive, plausibly as a result of increased consumption of pornography.

How big is the effect?

Jake Porway writes:

We launched Openpaths the other week. It's a site where people can privately upload and view their iPhone location data (at least until an Apple update wipes it out) and also download their data for their own use. More than just giving people a neat tool to view their data with, however, we're also creating an option for them to donate their data to research projects at varying levels of anonymity. We're still working out the terms for that, but we'd love any input and to get in touch with anyone who might want to use the data.

I don't have any use for this personally but maybe it will interest some of you.

From the webpage:

Mark Chaves sent me this great article on religion and religious practice:

After reading a book or article in the scientific study of religion, I [Chaves] wonder if you ever find yourself thinking, "I just don't believe it." I have this experience uncomfortably often, and I think it's because of a pervasive problem in the scientific study of religion. I want to describe that problem and how to overcome it.

The problem is illustrated in a story told by Meyer Fortes. He once asked a rainmaker in a native culture he was studying to perform the rainmaking ceremony for him. The rainmaker refused, replying: "Don't be a fool, whoever makes a rain-making ceremony in the dry season?"

The problem is illustrated in a different way in a story told by Jay Demerath. He was in Israel, visiting friends for a Sabbath dinner. The man of the house, a conservative rabbi, stopped in the middle of chanting the prayers to say cheerfully: "You know, we don't believe in any of this. But then in Judaism, it doesn't matter what you believe. What's important is what you do."

And the problem is illustrated in yet another way by the Divinity School student who told me not long ago that she was having second thoughts about becoming an ordained minister in the United Church of Christ because she didn't believe in God. She also mentioned that, when she confided this to several UCC ministers, they told her not to worry about it since not believing in God wouldn't make her unusual among UCC clergy.

This last story reminds me of the saying, "It doesn't matter if you believe in God. What matters is if God believes in you."

Also, on a more serious note, I had a friend of a friend who joined a Roman Catholic religious order--she became a nun--because, according to my friend, this person was "looking for Sister Right." (In addition to everything else, she was a lesbian.) A couple of years later she quit, I believe. I have the impression that the generally positive press received by nuns etc. in our culture gives certain naive and idealistic people false expectations of what they can achieve in such a position. (This is true of academia too, I'm sure!)

Here's Chaves's summary:

Religious congruence refers to consistency among an individual's religious beliefs and attitudes, consistency between religious ideas and behavior, and religious ideas, identities, or schemas that are chronically salient and accessible to individuals across contexts and situations. Decades of anthropological, sociological, and psychological research establish that religious congruence is rare, but much thinking about religion presumes that it is common. The religious congruence fallacy [emphasis added] occurs when interpretations or explanations unjustifiably presume religious congruence.

This reminds me of a corresponding political congruence fallacy. My impression is that many people have a personal feeling of political congruence--they feel that all their political views form a coherent structure--even though the perceived-congruent views of person X will only partially overlap the perceived-congruent views of person Y. For example, X can be a Democrat and support legalized gambling, while Y is a Republican who supports legalized gambling, while persons A and B are a Democrat and a Republican who oppose gambling. Four positions, but each has a story of why they are coherent. (For example, X supports gambling as a way of raising tax money, Y supports gambling because he opposes the nanny state, A opposes gambling as a tax on the poor, and B opposes gambling as immoral.)

I've felt for a while that this phenomenon, in which each of us can frame our particular beliefs as being coherent, creates problems for politics. People are just too damn sure of themselves.

On another point, Chaves's discussion of placebo effects reminded me of my irritation with the research on the medical effects of so-called intercessory prayer (person A prays for person B, with B being unaware of the prayer). Every once in a while someone does a study on intercessory prayer which manages to reach the statistical significance threshold and gets published (I can only imagine that secular journal editors bend over backward to accept such papers and are terrified of appearing anti-religion) and gets mentioned in the more credulous or sensationalist quarters of the popular press.

What irritates me about these intercessory prayer studies is not that I care so much about prayer but because such studies seem to me to be a pseudo-scientific effort to remove the part of prayer that can actually work. It's plausible enough from a scientific (i.e., non-supernatural) perspective that if A prays for B with B's knowledge, this could make B feel better. I doubt it could fix a broken heart valve but perhaps it could be calming enough that a certain heart attack might never happen. This makes sense and is, to my mind, perfectly consistent with a religious interpretation--why couldn't God work through the mechanism of friendship and caring? To me, the studies on intercessory prayer, by trying to isolate the supernatural aspect, end up removing the most interesting part of the story. In the language of Chaves's article, I'd call this an example of the congruence fallacy, the idea that the way to prove the effectiveness of prayer is to treat it as some sort of button-pushing.

I mentioned this above point to Chaves and he wrote:

I agree! Though this is maybe insulting only to those who self-consciously think of themselves as religious while also explicitly rejecting any sort of supernaturalism, and this type of person has become rarer in American society, which is part of the story behind the collapse of liberal Protestantism and the rise of religious "nones." I think it's easier for Jews than Christians to achieve and feel comfortable with this sort of self-conscious liberal religiosity, perhaps because of the ethnic identity aspects of being Jewish.

Interesting. I hadn't thought of that.

Chaves also pointed me to this article by Wendy Cadge.

Finally, regarding the WWJD bracelet etc. (no, that's not the same WWJD as our motto here in the Applied Statistics Center!), there's something Chaves implies but doesn't say, which is that presumably the wearing of the bracelet is, in economists' jargon, "endogenous": the bracelet is intended to be part of a commitment device, so the "treatment" is not really the bracelet-wearing but rather the entire constellation of thoughts and behaviors associated with the decision to live a better life.

A couple of things in this interview by Andrew Goldman of Larry Summers irritated me.

I'll give the quotes and then explain my annoyance.

Stan will make a total lifetime profit of $0, so we can't be sued!

The revised paper
plot13.pdf

Slightly improved figures
figure13.pdf

And just the history part from my thesis - that some find interesting.
(And to provide a selfish wiki meta-analysis entry pointer)
JustHistory.pdf

I have had about a dozen friends read this or earlier versions - they split into finding it interesting (and pragmatic) versus incomprehensible.

The reason for that may or may not point to ways to make it clearer.

K?

About 15 years ago I ran across this book and read it, just for fun. Rhoads is a (nonquantitative) political scientist and he's writing about basic economic concepts such as opportunity cost, marginalism, and economic incentives. As he puts it, "welfare economics is concerned with anything any individual values enough to be willing to give something up for it."

The first two-thirds of the book is all about the "economist's view" (personally, I'd prefer to see it called the "quantitative view") of the world and how it applies to policy issues. The quick message, which I think is more generally accepted now than in the 1970s when Rhoads started working on this book, is that free-market processes can do better than governmental rules in allocating resources. Certain ideas that are obvious to quantitative people--for example, we want to reduce pollution and reduce the incentives to pollute, but it does not make sense to try to get the level of a pollutant all the way down to zero if the cost is prohibitively high--are not always so obvious to others. The final third of Rhoads's book discusses difficulties economists have had when trying to carry their dollar-based reasoning over to the public sector. He considers the logical tangles with the consumer-is-always-right philosophy and also discusses how economists sometimes lose credibility on topics where they are experts by pushing oversimplified ideas in non-market-based settings.

I like the book a lot. Very few readers will agree with Rhoads on all points but that isn't really the point. He explains the ideas and the historical background well, and the topics cover a wide range, from why it makes sense to tax employer-provided health insurance to various ways in which arguments about externalities have been used to motivate various silly (in his opinion, and mine) government subsidies. I also enjoyed the bits of political science that Rhoads tosses in throughout (for example, his serious discussion in chapter 11 of direct referenda, choosing representatives by lot, and various other naive proposals for political reform).

During the 25 years since the publication of Rhoads's book, much has changed in the relation between economics and public policy. Most notably, economists have stepped out of the shadows. No longer mere technicians, they are now active figures in the public debate. Paul Volcker, Alan Greenspan, and to a lesser extent Lawrence Summers have become celebrities in a way that has been rare among government economic officials. (Yes, Galbraith and Friedman were famous in an earlier era but as writers on economics. They were not actually pulling the levers of power at the time that they were economic celebrities.) And microeconomics, characterized by Rhoads as the ugly duckling of the field, has come into its own with Freakonomics and the rest.

Up until the financial crash of 2008--and even now, still--economists have been riding high. And they'd like to ride higher. For example, a few years ago economist Matthew Kahn asked why there aren't more economists in higher office--and I suspect many other prominent economists have thought the same thing. I looked up the numbers of economists in the employed population, and it turned out that they were in fact overrepresented in Congress. This is not to debate the merits of Kahn's argument--perhaps Congress would indeed be better if it included more economists--but rather to note that economists have moved from being a group with backroom influence to wanting more overt power.

So, with this as background, Rhoads's book is needed now more than ever. It's important for readers of all political persuasions to understand the power and generality of the economist's view. Rhoads's son Chris recently informed me that his father is at work on a second edition, so I pulled my well-worn copy of the first edition off the shelf. I hope the comments below will be useful during the preparation of the revision.

What follows is not intended as any sort of a review; it is merely a transcription and elaboration of the post-it notes that I put in, fifteen years ago, noting issues that I had. (In case you're wondering: yes, the notes are still sticky.)

- On page 102, Rhoads explains why economists think that price controls and minimum wage laws are bad for low-income Americans: "It is striking that there is almost no support for any of these price control measures even among the most equity-conscious economists. . . . The real issue is, in large measure, ignorance." This could be, but I'd also guess (although I haven't had a chance to check the numbers) that price controls and the minimum wage are more popular among low-income than high-income voters. This does not exactly contradict Rhoads's claim--after all, poorer people might well be less well informed about economic principles--but it makes me wonder. The political scientist in me suspects that a policy that is supported by poorer people and opposed by richer people might well be a net benefit to people on the lower end of the economic distribution. Rhoads points out that there are more economically efficient forms of transfer--for example, direct cash payments to the poor--but that's not so relevant if such policies aren't about to be implemented because of political resistance.

Later on, Rhoads approvingly quotes an economist who writes, "Rent controls destroy incentives to maintain or rehabilitate property, and are thus an assured way to preserve slums." This may have sounded reasonable when it was written in 1970 but seems naive from a modern-day perspective. Sure, you want a good physical infrastructure, you don't want the pipes to break, etc., but what really makes a neighborhood a slum is crime. Rent control can give people a stake in their location (as with mortgage tax breaks, through the economic inefficiency of creating an incentive to not move). There might be better policies to encourage stability--or maybe increased turnover in dwellings is actually preferable--but the path from "incentives to maintain or rehabilitate property" to "slums" is far from clear.

- On page 139, Rhoads writes: "Most of the costs of business safety regulation fall on consumers." Again, this might be correct, but my impression is that the strongest opposition to these regulations comes from business operators, not from consumers. Much of this opposition perhaps arises from costs that are not easily measured in dollars: for example, filling out endless forms, worrying about rules and deadlines. This sort of paperwork load is a constant cost that is borne by managers, not consumers. Anyway, my point is the same as above: as a political scientist, I'm skeptical of the argument that consumers bear most of the costs, given that business operators are (I think) the ones who really oppose these regulations. I'm not arguing that any particular regulation is a good idea, just saying that it seems naive to me to take economists' somewhat ideologically-loaded claims at face value here.

- On page 217, Rhoads quotes an economics journalist who writes, "Through its tax laws, government can help create a climate for risk-taking. It ought to prey on the greed in human nature and the industriousness in the American character. Otherwise, stand aside." I have a few thoughts on these lines which perhaps sound a bit different now than in 1980 when they first appeared. Most obviously, a natural consequence of greed + industriousness is . . . theft. There's an even larger problem with this attitude, though, even setting aside moral hazard (those asymmetrical bets in which the banker gets rich if he wins but the taxpayer covers any loss). Even in a no-free-lunch environment in which risks are truly risky, why is "a climate for risk-taking" supposed to be a good thing? This seems a leap beyond the principles of economic efficiency that came in the previous chapters, and I have some further thoughts about this below.

- On page 20, Rhoads criticizes extreme safety laws and writes, "There would be nothing admirable about a society that watched the quality of its life steadily decline in hot pursuit of smaller and smaller increments of life extension." He was ahead of his time in considering this issue. Nowadays with health care costs crowding out everything else, we're all aware of this tradeoff as expressed, for example, in these graphs showing the U.S. spending twice as much on health as other countries with no benefit in life expectancy. It turned out, though, that the culprit was not safety laws but rather the tangled mixture of public and private care that we have in this country. This example suggests that the economist's view of the world can be a valuable perspective without always offering a clear direction for improvement.

Another example from Rhoads's book is nuclear power plants. Some economists argue on free-market grounds that the civilian nuclear industry should be left to fend for itself without further government support, while others argue on efficiency grounds that nuclear power is safe and clean and should be subsidized (see p. 230). Ultimately I agree with Rhoads that this comes down to costs and benefits (and I definitely think like an economist in that way) but in the meantime there is a clash of the two fundamental principles of free markets on one side and efficiency on the other. (The economists who support nuclear power on efficiency grounds cannot simply rely on the free market because of existing market-distorting factors such as safety regulations, fossil fuel subsidies, and various complexities in the existing energy supply system.)

- Finally, when economists talk about fundamental principles, they often bring in their value judgments for free. For example, on page 168 Rhoads quotes an economics writer who doubts that "we need the government to subsidize high-brow entertainment--theater, ballet, opera and television drama . . . Let people decide for themselves whether they want to be entertained by the Pittsburgh Steelers or the local symphony." Well, sure, we definitely don't need subsidies for any of these things. The question is not of need but rather of discretionary spending, given that money is indeed being disbursed as part of the political process. But what I really wonder is: what does this guy (not Rhoads, but the writer he quotes) have against the local symphony? The Pittsburgh Steelers are already subsidized! (Everybody knows this. I just did a quick search on "pittsburgh steelers subsidy" and came across this blog by Skip Sauer with this line: "Three Rivers Stadium in Pittsburgh still was carrying $45 million in debt at the time of its demolition in 2001.")

I hope that in his revision, Rhoads will elaborate on the dominant perspectives of different social science fields. Crudely speaking, political scientists speak to princes, economists speak to business owners, and sociologists speak to community organizers. If we're not careful, we political scientists can drift into a "What should the government do?" attitude which presupposes that the government's goals are reasonable. Similarly, economists have their own cultural biases, such as preferring football to the symphony and, more importantly, viewing risk taking as a positive value in and of itself.

In summary, I think The Economist's View of the World is a great book and I look forward to the forthcoming second edition. I think it's extremely important to see the economist's perspective with its strengths and limitations in a single place.

I followed a link from Tyler Cowen to this bit by Daniel Kahneman:

Education is an important determinant of income -- one of the most important -- but it is less important than most people think. If everyone had the same education, the inequality of income would be reduced by less than 10%. When you focus on education you neglect the myriad other factors that determine income. The differences of income among people who have the same education are huge.

I think I know what he's saying--if you regress income on education and other factors, and then you take education out of the model, R-squared decreases by 10%. Or something like that. Not necessarily R-squared, maybe you fit the big model, then get predictions for everyone putting in the mean value for education and look at the sd of incomes or the Gini index or whatever. Or something else along those lines.
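For concreteness, here's a sketch of the kind of calculation I imagine lies behind a statement like that, with all numbers and variable names invented by me (I don't know what Kahneman actually computed): fit the regression, then compare the spread of predicted incomes with education as observed versus with everyone assigned the same education.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data; none of these numbers are Kahneman's.
rng = np.random.default_rng(1)
n = 10_000
education = rng.normal(14, 3, size=n)   # years of schooling
other = rng.normal(size=n)              # stand-in for the "myriad other factors"
log_income = 9 + 0.08 * education + 0.5 * other + rng.normal(scale=0.6, size=n)

X = sm.add_constant(np.column_stack([education, other]))  # col 0: const, col 1: education
fit = sm.OLS(log_income, X).fit()

# Predicted incomes with education as observed vs. fixed at its mean.
pred_observed = fit.predict(X)
X_equalized = X.copy()
X_equalized[:, 1] = education.mean()    # "if everyone had the same education"
pred_equalized = fit.predict(X_equalized)

print("sd of predicted log income, education as observed:", pred_observed.std())
print("sd of predicted log income, education equalized:  ", pred_equalized.std())
# The proportional drop in this spread (or in a Gini index computed the same
# way) is one way to read "inequality would be reduced by less than 10%."
```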

My problem is with the counterfactual: "If everyone had the same education . . ." I have a couple problems with this one. First, if everyone had the same education, we'd have a much different world and I don't see why the regressions on which he's relying would still be valid. Second, is it even possible for everyone to have the same education? I majored in physics at MIT. I don't think it's possible for everyone to do this. Setting aside budgetary constraints, I don't think that most college-age kids could handle the MIT physics curriculum (nor do I think I could handle, for example, the courses at a top-ranked music or art college). I suppose you could imagine everyone having the same number of years of education, but that seems like a different thing entirely.

As noted, I think I see what Kahneman is getting at--income is determined by lots of other factors than education--but I'm a bit disappointed that he could be so casual with the causality. And without the causal punch, his statement doesn't seem so impressive to me. Everybody knows that education doesn't determine income, right? Bill Gates never completed college, and everybody knows the story of humanities graduates who can't find a job.

This entry was posted by Phil Price.

A colleague is looking at data on car (and SUV and light truck) collisions and casualties. He's interested in causal relationships. For instance, suppose car manufacturers try to improve gas mileage without decreasing acceleration. The most likely way they will do that is to make cars lighter. But perhaps lighter cars are more dangerous; how many more people will die for each mpg increase in gas mileage?

There are a few different data sources, all of them seriously deficient from the standpoint of answering this question. Deaths are very well reported, so if someone dies in an auto accident you can find out what kind of car they were in, what other kinds of cars (if any) were involved in the accident, whether the person was a driver or passenger, and so on. But it's hard to normalize: OK, I know that N people who were passengers in a particular model of car died in car accidents last year, but I don't know how many passenger-miles that kind of car was driven, so how do I convert this to a risk? I can find out how many cars of that type were sold, and maybe even (through registration records) how many are still on the road, but I don't know the total number of miles. Some types of cars are driven much farther than others, on average.
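If the passenger-mile denominator were available (it usually isn't, which is exactly his complaint), the standard fix is to model deaths as counts with log exposure as an offset, which turns them into rates; here's a minimal sketch with made-up numbers:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Made-up fleet-level data: deaths by car model, with a (usually unavailable)
# estimate of total passenger-miles as the exposure.
df = pd.DataFrame({
    "deaths":          [12, 45, 8, 30],
    "passenger_miles": [2.0e9, 9.0e9, 1.5e9, 4.0e9],
    "curb_weight_kg":  [1200, 2200, 1100, 1800],
})

# Poisson regression with log(exposure) as an offset models the death *rate*
# per passenger-mile rather than the raw count.
X = sm.add_constant(df[["curb_weight_kg"]])
fit = sm.GLM(df["deaths"], X,
             family=sm.families.Poisson(),
             offset=np.log(df["passenger_miles"])).fit()
print(fit.summary())
```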

Most states also have data on all accidents in which someone was injured badly enough to go to the hospital. This lets you look at things like: given that the car is in an accident, how likely is it that someone in the car will die? This sort of analysis makes heavy cars look good (for the passengers in those vehicles; not so good for passengers in other vehicles, which is also a phenomenon of interest!) but perhaps this is misleading: heavy cars are less maneuverable and have longer stopping distances, so perhaps they're more likely to be in an accident in the first place. Conceivably, a heavy car might be a lot more likely to be in an accident, but less likely to kill the driver if it's in one, compared to a lighter car that is better for avoiding accidents but more dangerous if it does get hit.

Confounding every question of interest is that different types of driver prefer different cars. Any car that is driven by a disproportionately large fraction of men in their late teens or early twenties is going to have horrible accident statistics, whereas any car that is selected largely by middle-aged women with young kids is going to look pretty good. If 20-year-old men drove Volvo station wagons, the Volvo station wagon would appear to be one of the most dangerous cars on the road, and if 40-year-old women with 5-year-old kids drove Ferraris, the Ferrari would seem to be one of the safest.

There are lots of other confounders, too. Big engines and heavy frames cost money to make, so inexpensive cars tend to be light and to have small engines, in addition to being physically small. They also tend to have less in the way of safety features (no side-curtain airbags, for example). If an inexpensive car has a poor safety record, is it because it's light, because it's small, or because it's lacking safety features? And yes, size matters, not just weight: a bigger car can have a bigger "crumple zone" and thus lower average acceleration if it hits a solid object, for example. If large, heavy cars really are safer than small, light cars, how much of the difference is due to size and how much is due to weight? Perhaps a large, light car would be the best, but building a large, light car would require special materials, like titanium or aluminum or carbon fiber, which might make it a lot more expensive...what, if anything, do we want to hold constant if we increase the fleet gas mileage? Cost? Size?

And of course the parameters I've listed above --- size, weight, safety features, and driver characteristics --- don't begin to cover all of the relevant factors.

So: is it possible to untangle the causal influence of various factors?

Most people who are involved in this research topic appear to rely on linear or logistic regression, controlling for various explanatory variables, and make various interpretations based on the regression coefficients, r-squared values, etc. Is this the best that can be done? And if so, how does one figure out the right set of explanatory variables?
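To make the default approach concrete, here's the sort of model I read that as describing, on hypothetical crash-level data with invented variable names. The fitting is the easy part; the hard question is whether the controls really remove the driver-selection confounding described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical crash-level records; every variable here is made up.
rng = np.random.default_rng(2)
n = 5_000
crashes = pd.DataFrame({
    "fatality":       rng.binomial(1, 0.03, size=n),
    "vehicle_weight": rng.normal(1600, 300, size=n),
    "vehicle_size":   rng.normal(4.5, 0.5, size=n),
    "side_airbags":   rng.binomial(1, 0.5, size=n),
    "driver_age":     rng.integers(16, 80, size=n),
    "driver_male":    rng.binomial(1, 0.5, size=n),
})

# The usual approach: logistic regression of the outcome on vehicle
# characteristics plus driver characteristics as controls. The coefficient on
# vehicle_weight has a causal reading only if the controls really capture who
# drives which car and how -- which is exactly what's in doubt here.
fit = smf.logit(
    "fatality ~ vehicle_weight + vehicle_size + side_airbags"
    " + driver_age + driver_male",
    data=crashes,
).fit(disp=False)
print(fit.summary())
```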

This is a "causal inference" question, and according to the title of this blog, this blog should be just the place for this sort of thing. So, bring it on: where do I look to find the right way to answer this kind of question?

(And, by the way, what is the answer to the question I posed at the end of this causal inference discussion?)

I was checking the Dilbert blog (sorry! I was just curious what was up after the events of a few weeks ago) and saw this:

I had a couple of email exchanges with Jan-Emmanuel De Neve and James Fowler, two of the authors of the article on the gene that is associated with life satisfaction which we blogged the other day. (Bruno Frey, the third author of the article in question, is out of town according to his email.) Fowler also commented directly on the blog.

I won't go through all the details, but now I have a better sense of what's going on. (Thanks, Jan and James!) Here's my current understanding:

1. The original manuscript was divided into two parts: an article by De Neve alone published in the Journal of Human Genetics, and an article by De Neve, Fowler, Frey, and Nicholas Christakis submitted to Econometrica. The latter paper repeats the analysis from the Adolescent Health survey and also replicates with data from the Framingham heart study (hence Christakis's involvement).

The Framingham study measures a slightly different gene and uses a slightly different life-satisfaction question compared to the Adolescent Health survey, but De Neve et al. argue that they're close enough for the study to be considered a replication. I haven't tried to evaluate this particular claim but it seems plausible enough. They find an association with a p-value of exactly 0.05. That was close! (For some reason they don't control for ethnicity in their Framingham analysis--maybe that would pull the p-value to 0.051 or something like that?)

2. Their gene is correlated with life satisfaction in their data and the correlation is statistically significant. The key to getting statistical significance is to treat life satisfaction as a continuous response rather than to pull out the highest category and call it a binary variable. I have no problem with their choice; in general I prefer to treat ordered survey responses as continuous rather than discarding information by combining categories.

3. But given their choice of a continuous measure, I think it would be better for the researchers to stick with it and present results as points on the 1-5 scale. From their main regression analysis on the Adolescent Health data, they estimate the effect of having two (compared to zero) "good" alleles as 0.12 (+/- 0.05) on a 1-5 scale. That's what I think they should report, rather than trying to use simulation to wrestle this into a claim about the probability of describing oneself as "very satisfied."

They claim that having the two alleles increases the probability of describing oneself as "very satisfied" by 17%. That's not 17 percentage points, it's 17%, thus increasing the probability from 41% to 1.17*41% = 48%. This isn't quite the 46% that's in the data but I suppose the extra 2% comes from the regression adjustment. Still, I don't see this as so helpful. I think they'd be better off simply describing the estimated improvement as 0.1 on a 1-5 scale. If you really really want to describe the result for a particular category, I prefer percentage points rather than percentages.
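Just to spell out the arithmetic behind that distinction (this is only a restatement of the numbers above):

0.41 x 1.17 ≈ 0.48, a gain of about 7 percentage points, whereas
0.41 + 0.17 = 0.58 is what "an increase of 17 percentage points" would have meant.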

4. Another advantage of describing the result as 0.1 on a 1-5 scale is that it is more consistent with intuitive notions of 1% of variance explained. It's good they have this 1% in their article--I should present such R-squared summaries in my own work, to give a perspective on the sizes of the effects that I find.

5. I suspect the estimated effect of 0.1 is an overestimate. I say this for the usual reason, discussed often on this blog, that statistically significant findings, by their very nature, tend to be overestimates. I've sometimes called this the statistical significance filter, although "hurdle" might be a more appropriate term.

6. Along with the 17% number comes a claim that having one allele gives an 8% increase. 8% is half of 17% (subject to rounding) and, indeed, their estimate for the one-allele case comes from their fitted linear model. That's fine--but the data aren't really informative about the one-allele case! I mean, sure, the data are perfectly consistent with the linear model, but the nature of leverage is such that you really don't get a good estimate on the curvature of the dose-response function. (See my 2000 Biostatistics paper for a general review of this point.) The one-allele estimate is entirely model-based. It's fine, but I'd much prefer simply giving the two-allele estimate and then saying that the data are consistent with a linear model, rather than presenting the one-allele estimate as a separate number.

7. The news reports were indeed horribly exaggerated. No fault of the authors but still something to worry about. The Independent's article was titled, "Discovered: the genetic secret of a happy life," and the Telegraph's was not much better: "A "happiness gene" which has a strong influence on how satisfied people are with their lives, has been discovered." An effect of 0.1 on a 1-5 scale: an influence, sure, but a "strong" influence?

8. There was some confusion with conditional probabilities that made its way into the reports as well. From the Telegraph:

The results showed that a much higher proportion of those with the efficient (long-long) version of the gene were either very satisfied (35 per cent) or satisfied (34 per cent) with their life - compared to 19 per cent in both categories for those with the less efficient (short-short) form.

After looking at the articles carefully and having an email exchange with De Neve, I can assure you that the above quote is indeed wrong, which is really too bad because it was an attempted correction of an earlier mistake. The correct numbers are not 35, 34, 19, 19. Rather, they are 41, 46, 37, 44. A much less dramatic difference: changes of 4% and 2% rather than 16% and 15%. The Telegraph reporter was giving P(gene|happiness) rather than P(happiness|gene). What seems to have happened is that he misread Figure 2 in the Human Genetics paper. He then may have got stuck on the wrong track by expecting to see a difference of 17%.

9. The abstract for the Human Genetics paper reports a p-value of 0.01. But the baseline model (Model 1 in Table V of the Econometrica paper) reports a p-value of 0.02. The lower p-values are obtained by models that control for a big pile of intermediate outcomes.

10. In section 3 of the Econometrica paper, they compare identical to fraternal twins (from the Adolescent Health survey, it appears) and estimate that 33% of the variation in reported life satisfaction is explained by genes. As they say, this is roughly consistent with estimates of 50% or so from the literature. I bet their 33% has a big standard error, though: one clue is that the difference in correlations between identical and fraternal twins is barely statistically significant (at the 0.03 level, or, as they quaintly put it, 0.032). They also estimate 0% of the variation to be due to common environment, but again that 0% is gonna be a point estimate with a huge standard error.

I'm not saying that their twin analysis is wrong. To me the point of these estimates is to show that the Adolescent Health data are consistent with the literature on genes and happiness, thus supporting the decision to move on with the rest of their study. I don't take their point estimates of 33% and 0% seriously but it's good to know that the twin results go in the expected direction.

11. One thing that puzzles me is why De Neve et al. only studied one gene. I understand that this is the gene that they expected to relate to happiness and life satisfaction, but . . . given that it only explains 1% of the variation, there must be hundreds or thousands of genes involved. Why not look at lots and lots? At the very least, the distribution of estimates over a large sample of genes would give some sense of the variation that might be expected. I can't see the point of looking at just one gene, unless cost is a concern. Are other gene variants already recorded for the Adolescent Health and Framingham participants?

12. My struggles (and the news reporters' larger struggles) with the numbers in these articles makes me feel, even more strongly than before, the need for a suite of statistical methods for building from simple comparisons to more complicated regressions. (In case you're reading this, Bob and Matt3, I'm talking about the network of models.)

As researchers, we should aim for transparency. This goal is sometimes hindered by scientific journals' policies of brevity: you can end up having to remove lots of the details that make a result understandable.

13. De Neve concludes the Human Genetics article as follows:

There is no single "happiness gene." Instead, there is likely to be a set of genes whose expression, in combination with environmental factors, influences subjective well-being.

I would go even further. Accepting their claim that between one-third and one-half of the variation in happiness and life satisfaction is determined by genes, and accepting their estimate that this one gene explains as much as 1% of the variation, and considering that this gene was their #1 candidate (or at least a top contender) for the "happiness gene" . . . my guess is that the number of genes that influence subjective well-being is very large indeed! The above disclaimer doesn't seem disclaimery-enough to me, in that it seems to leave open the possibility that this "set of genes" might be just three or four. Hundreds or thousands seems more like it.

I'm reminded of the recent analysis that found that the simple approach of predicting child's height using a regression model given parents' average height performs much better than a method based on combining 54 genes.

14. Again, I'm not trying to present this as any sort of debunking, merely trying to fit these claims in with the rest of my understanding. I think it's great when social scientists and public health researchers can work together on this sort of study. I'm sure that in a couple of decades we'll have a much better understanding of genes and subjective well-being, but you have to start somewhere. This is a clean study that can be the basis for future research.

Hmmm . . . could I publish this as a letter in the Journal of Human Genetics? Probably not, unfortunately.

P.S. You could do this all yourself! This and my earlier blog on the happiness gene study required no special knowledge of subject matter or statistics. All I did was tenaciously follow the numbers and pull and pull until I could see where all the claims were coming from. A statistics student, or even a journalist with a few spare hours, could do just as well. (Why I had a few spare hours to do this is another question. The higher procrastination, I call it.) I probably could've done better with some prior knowledge--I know next to nothing about genetics and not much about happiness surveys either--but I could get pretty far just tracking down the statistics (and, as noted, without any goal of debunking or any need to make a grand statement).

P.P.S. See comments for further background from De Neve and Fowler!

A couple years ago we had an amazing all-star session at the Joint Statistical Meetings. The topic was new approaches to survey weighting (which is a mess, as I'm sure you've heard).

Xiao-Li Meng recommended shrinking weights by taking them to a fractional power (such as square root) instead of trimming the extremes.

Rod Little combined design-based and model-based survey inference.

Michael Elliott used mixture models for complex survey design.

And here's my introduction to the session.

John Johnson writes at the Statistics Forum.

Robert Birkelbach:

I am writing my Bachelor Thesis in which I want to assess the reading competencies of German elementary school children using the PIRLS2006 data. My levels are classrooms and the individuals. However, my dependent variable is a multiple imputed (m=5) reading test. The problem I have is, that I do not know, whether I can just calculate 5 linear multilevel models and then average all the results (the coefficients, standard deviation, bic, intra class correlation, R2, t-statistics, p-values etc) or if I need different formulas for integrating the results of the five models into one because it is a multilevel analysis? Do you think there's a better way in solving my problem? I would greatly appreciate if you could help me with a problem regarding my analysis -- I am quite a newbie to multilevel modeling and especially to multiple imputation. Also: Is it okay to use frequentist models when the multiple imputation was done bayesian? Would the different philosophies of scientific testing contradict each other?

My reply:

I recommend doing 5 separate analyses, pushing them all the way thru to the end, then combining them using the combining-imputation rules given in the imputation chapter of our book. Everything should go fine.
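For concreteness, here's a minimal sketch of those combining rules in R, with made-up numbers. I'm assuming you've already fit the same multilevel model to each of the five completed datasets and pulled out the coefficient of interest and its standard error from each fit; the same recipe applies to any scalar summary:

    # Rubin's combining rules, applied to a single coefficient from m = 5 fits.
    # The estimates and standard errors below are invented for illustration.
    est <- c(0.42, 0.45, 0.40, 0.47, 0.43)   # coefficient estimate from each fit
    se  <- c(0.11, 0.12, 0.10, 0.11, 0.12)   # its standard error from each fit
    m   <- length(est)

    q_bar <- mean(est)                # pooled point estimate
    u_bar <- mean(se^2)               # within-imputation variance
    b     <- var(est)                 # between-imputation variance
    t_var <- u_bar + (1 + 1/m) * b    # total variance of the pooled estimate
    df    <- (m - 1) * (1 + u_bar / ((1 + 1/m) * b))^2   # approximate degrees of freedom

    c(estimate = q_bar, se = sqrt(t_var), df = df)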

I took the above headline from a news article in the (London) Independent by Jeremy Laurance reporting a study by Jan-Emmanuel De Neve, James Fowler, and Bruno Frey that reportedly just appeared in the Journal of Human Genetics.

One of the pleasures of blogging is that I can go beyond the usual journalistic approaches to such a story: (a) puffing it, (b) debunking it, (c) reporting it completely flatly. Even convex combinations of (a), (b), (c) do not allow what I'd like to do, which is to explore the claims and follow wherever my exploration takes me. (And one of the pleasures of building my own audience is that I don't need to endlessly explain background detail as was needed on a general-public site such as 538.)

OK, back to the genetic secret of a happy life. Or, in the words of the authors of the study, a gene that "explains less than one percent of the variation in life satisfaction."

"The genetic secret" or "less than one percent of the variation"?

Perhaps the secret of a happy life is in that one percent??

I can't find a link to the journal article, which appears (based on the listing on De Neve's webpage) to be single-authored, but I did find this Googledocs link to a technical report from January 2010 that seems to have all the content. Regular readers of this blog will be familiar with earlier interesting research of Fowler and Frey working separately; I had no idea that they have been collaborating.

De Neve et al. took responses to a question on life satisfaction from a survey that was linked to genetic samples. They looked at a gene called 5HTT which, according to their literature review, has been believed to be associated with happy feelings.

I haven't taken a biology class since 9th grade, so I'll give a simplified version of the genetics. You can have either 0, 1, or 2 alleles of the gene in question. Of the people in the sample, 20% have 0 alleles, 45% have 1 allele, and 35% have 2. The more alleles you have, the happier you'll be (on average): The percentage of respondents describing themselves as "very satisfied" with their lives is 37% for people with 0 alleles, 38% for those with one allele, and 41% for those with two alleles.

The key comparison here comes from the two extremes: 2 alleles vs. 0. People with 2 alleles are 4 percentage points (more precisely, 3.6 percentage points) more likely to report themselves as very satisfied with their lives. The standard error of this difference in proportions is sqrt(.41*(1-.41)/862+.37*(1-.37)/509) = 0.027, so the difference is not statistically significant at a conventional level.
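To spell out that calculation in R (the 862 and 509 are the group sizes from the formula above):

    # Difference in the proportion "very satisfied," two alleles vs. zero.
    p2 <- 0.41; n2 <- 862   # two "good" alleles
    p0 <- 0.37; n0 <- 509   # zero "good" alleles
    diff <- p2 - p0                              # about 0.04
    se   <- sqrt(p2*(1-p2)/n2 + p0*(1-p0)/n0)    # about 0.027
    c(difference = diff, se = se, z = diff/se)   # z of roughly 1.5: not statistically significant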

But in their abstract, De Neve et al. reported the following:

Having one or two alleles . . . raises the average likelihood of being very satisfied with one's life by 8.5% and 17.3%, respectively?

How did they get from a non-significant difference of 4% (I can't bring myself to write "3.6%" given my aversion to fractional percentage points) to a statistically significant 17.3%?

A few numbers that I can't figure out at all!

Here's the summary from Stephen Adams, medical correspondent of the Daily Telegraph:

The researchers found that 69 per cent of people who had two copies of the gene said they were either satisfied (34) or very satisfied (35) with their life as a whole.

But among those who had no copy of the gene, the proportion who gave either of these answers was only 38 per cent (19 per cent 'very satisfied' and 19 per cent 'satisfied').

This leaves me even more confused! According to the table on page 21 of the De Neve et al. article, 46% of people who had two copies of the gene described themselves as satisfied and 41% described themselves as very satisfied. The corresponding percentages for those with no copies were 44% and 37%.

I suppose the most likely explanation is that Stephen Adams just made a mistake, but it's no ordinary confusion because his numbers are so specific. Then again, I could just be missing something big here. I'll email Fowler for clarification but I'll post this for now so you loyal blog readers can see error correction (of one sort or another) in real time.

Where did the 17% come from?

OK, so setting Stephen Adams aside, how can we get from a non-significant 4% to a significant 17%?

- My first try is to use the numerical life-satisfaction measure. Average satisfaction on a 1-5 scale is 4.09 for the 0-allele people in this sample and 4.25 for the 2-allele people, and the difference has a standard error of 0.05. Hey--a difference of 0.16 with a standard error of 0.05--that's statistically significant! So it doesn't seem just like a fluctuation in the data.

- The main analysis of De Neve et al., reported in their Table 1, appears to be a least-squares regression of well-being (on that 1-5 scale), using the number of alleles as a predictor and also throwing in some controls for ethnicity, sex, age, and some other variables. They include error terms for individuals and families but don't seem to report the relative sizes of the errors. In any case, the controls don't seem to do much. Their basic result (Model 1, not controlling for variables such as marital status which might be considered as intermediate outcomes of the gene) yields a coefficient estimate of 0.06.

They then write, "we summarize the results for 5HTT by simulating first differences from the coefficient covariance matrix of Model 1. Holding all else constant and changing the 5HTT gene of all subjects from zero to one long allele would increase the reporting of being very satisfied with one's life in this population by about 8.5%." Huh? I completely don't understand this. It looks to me that the analyses in Table 1 are regressions on the 1-5 scale. So how can they transfer these to claims about "the reporting of being very satisfied"? Also, if it's just least squares, why do they need to work with the covariance matrix? Why can't they just look at the coefficient itself? (For reference, I sketch after this list what such a simulation usually looks like.)

- They report (in Table 5) that whites have higher life satisfaction responses than blacks but lower numbers of alleles, on average. So controlling for ethnicity should increase the coefficient. I still can't see it going all the way from 4% to 17%. But maybe this is just a poverty of my intuition.

- OK, I'm still confused and have no idea where the 17% could be coming from. All I can think of is that the difference between 0 alleles and 2 alleles corresponds to an average difference of 0.16 in happiness on that 1-5 scale. And 0.16 is practically 17%, so maybe when you control for things the number jumps around a bit. Perhaps the result of their "first difference" calculations was somehow to carry that 0.16 or 0.17 and attribute it to the "very satisfied" category?
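For reference, here's what "simulating first differences from the coefficient covariance matrix" usually means in practice: draw coefficient vectors from a multivariate normal with mean equal to the estimates and covariance equal to the estimated covariance matrix, compute average predicted outcomes under the two settings of the predictor, and take the difference, once per draw. This is the generic recipe, not necessarily what De Neve et al. did; the fitted model 'fit', the data frame 'data', and the predictor name 'alleles' below are placeholders:

    library(MASS)

    # Generic first-difference simulation for a fitted regression 'fit' on data
    # frame 'data', comparing alleles = 1 to alleles = 0.  For a plain linear
    # model this just reproduces draws of the coefficient, which is the point of
    # my "why not just look at the coefficient?" question above.
    sim_first_diff <- function(fit, data, n_sims = 1000) {
      draws <- mvrnorm(n_sims, coef(fit), vcov(fit))  # simulated coefficient vectors
      d0 <- d1 <- data
      d0$alleles <- 0
      d1$alleles <- 1
      m0 <- model.matrix(formula(fit), d0)
      m1 <- model.matrix(formula(fit), d1)
      rowMeans(draws %*% t(m1) - draws %*% t(m0))     # one average difference per draw
    }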

1% of variance explained

One more thing . . . that 1% quote. Remember? "the 5HTT gene explains less than one percent of the variation in life satisfaction." This is from page 14 of the De Neve, Fowler, and Frey article. 1%? How can we understand this?

Let's do a quick variance calculation:

- Mean and sd of life satisfaction responses (on the 1-5 scale) among people with 0 alleles: 4.09 and 0.8
- Mean and sd of life satisfaction responses (on the 1-5 scale) among people with 2 alleles: 4.25 and 0.8
- The difference is 0.16 so the explained variance is (0.16/2)^2 = 0.08^2
- Finally, R-squared is explained variance divided by total variance: (0.08/0.8)^2 = 0.01.

A difference of 0.16 on a 1-5 scale ain't nothing (it's approximately the same as the average difference in life satisfaction, comparing whites and blacks), especially given that most people are in the 4 and 5 categories. But it only represents 1% of the variance in the data. It's hard for me to hold these two facts in my head at the same time. The quick answer is that the denominator of the R-squared--the 0.8--contains lots of individual variation, including variation in the survey response. Still, 1% is such a small number. No surprise it didn't make it into the newspaper headline . . .

Here's another story of R-squared = 1%. Consider a 0/1 outcome with about half the people in each category. For example, half the people with some disease die in a year and half live. Now suppose there's a treatment that increases survival rate from 50% to 60%. The unexplained sd is 0.5 and the explained sd is 0.05, hence R-squared is, again, 0.01.
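For anyone who wants to check the arithmetic in these two little R-squared stories:

    # 1. The gene example: a 0.16 difference on the 1-5 scale, total sd about 0.8.
    explained_sd <- 0.16 / 2     # half the 0-vs-2-allele gap, as in the calculation above
    total_sd <- 0.8
    (explained_sd / total_sd)^2  # 0.01

    # 2. The survival example: a treatment that moves the rate from 50% to 60%.
    explained_sd <- 0.05         # half the 10-percentage-point difference
    total_sd <- 0.5              # sd of a roughly 50/50 binary outcome
    (explained_sd / total_sd)^2  # again 0.01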

Summary (for now):

I don't know where the 17% came from. I'll email James Fowler and see what he says. I'm also wondering about that Daily Telegraph article but it's usually not so easy to reach newspaper journalists so I'll let that one go for now.

P.S. According to his website, Fowler was named the most original thinker of the year by The McLaughlin Group. On the other hand, our sister blog won an award by the same organization that honored Peggy Noonan. So I'd call that a tie!

P.P.S. Their data come from the National Survey of Adolescent Health, which for some reason is officially called "Add Health." Shouldn't that be "Ad Health" or maybe "Ado Health"? I'm confused where the extra "d" is coming from.

P.P.P.S. De Neve et al. note that the survey did not actually ask about happiness, only about life satisfaction. We all know people who appear satisfied with their lives but don't seem so happy, but the presumption is that, in general, things associated with more life satisfaction are also associated with happiness. The authors also remark upon the limitations of using a sample of adolescents to study life satisfaction. Not their fault--as is appropriate, they use the data they have and then discuss the limitations of their analysis.

P.P.P.P.S. De Neve and Fowler have a related paper with a nice direct title, "The MAOA Gene Predicts Credit Card Debt." This one, also from Add Health, reports: "Having one or both MAOA alleles of the low efficiency type raises the average likelihood of having credit card debt by 14%." For some reason I was having difficulty downloading the pdf file (sorry, I have a Windows machine!) so I don't know how to interpret the 14%. I don't know if they've looked at credit card debt and life satisfaction together. Being in debt seems unsatisfying; on the other hand you could go in debt to buy things that give you satisfaction, so it's not clear to me what to expect here.

P.P.P.P.P.S. I'm glad Don Rubin didn't read the above-linked article. Footnote 9 would probably make him barf.

P.P.P.P.P.P.S. Just to be clear: The above is not intended to be a "debunking" of the research of De Neve, Fowler, and Frey. It's certainly plausible that this gene could be linked to reported life satisfaction (maybe, for example, it influences the way that people respond to survey questions). I'm just trying to figure out what's going on, and, as a statistician, it's natural for me to start with the numbers.

P.^7S. James Fowler explains some of the confusion in a long comment.

Bechdel wasn't kidding

Regular readers of this blog know about the Bechdel test for movies:

1. It has to have at least two women in it
2. Who talk to each other
3. About something besides a man

Amusing, huh? But I only really got the point the other day, when I was on a plane and passively watched parts of the in-flight movie. It was something I'd never heard of (of course) and it happened to be a chick flick--even without the soundtrack, it was clear that the main character was a woman and much of it was about her love life. But even this movie failed the Bechdel test miserably! I don't even think it passed item #1 above, but if it did, it certainly failed #2.

If even the chick flicks are failing the Bechdel test, then, yeah, we're really in trouble. And don't get me started on those old Warner Brothers cartoons. They're great but they feature about as many female characters as the average WWII submarine. Sure, everybody knows this, but it's still striking to think about just how unbalanced these things are.

Howard Wainer writes in the Statistics Forum:

The Chinese scientific literature is rarely read or cited outside of China. But the authors of this work are usually knowledgeable of the non-Chinese literature -- at least the A-list journals. And so they too try to replicate the alpha finding. But do they? One would think that they would find the same diminished effect size, but they don't! Instead they replicate the original result, even larger. Here's one of the graphs:

How did this happen?

Full story here.

Another stereotype demolished

I've heard from various sources that when you give a talk in an econ dept, they eat you alive: typically the audience showers you with questions and you are lucky to get past the second slide in your presentation. So far, though, I've given seminar talks in three economics departments--George Mason University a few years ago, Sciences Po last year, and Hunter College yesterday--and all three times the audiences have been completely normal. They did not interrupt unduly and they asked a bunch of good questions at the end. n=3, sure. But still.

Shocking but not surprising

Much-honored playwright Tony Kushner was set to receive one more honor--a degree from John Jay College--but it was suddenly taken away from him on an 11-1 vote of the trustees of the City University of New York. This was the first rejection of an honorary degree nomination since 1961.

The news article focuses on one trustee, Jeffrey Wiesenfeld, an investment adviser and onetime political aide, who opposed Kushner's honorary degree, but to me the relevant point is that the committee as a whole voted 11-1 to ding him.

"I'm sickened," Kushner said, "that this is happening in New York City. Shocked, really." I can see why he's shocked, but perhaps it's not so surprising that it's happening in NYC. Recall the famous incident from 1940 in which Bertrand Russell was invited and then uninvited to teach at City College. The problem that time was Russell's views on free love (as they called it back then). There seems to be a long tradition of city college officials being willing to risk controversy to make a political point.

P.S. I was trying to imagine what these 11 trustees could've been thinking . . . my guess is it was some sort of group-dynamics thing. They started talking about it and convinced each other that the best thing to do would be to set Kushner's nomination aside. I bet if they'd had to decide separately most of them wouldn't have come to this conclusion. And I wouldn't be surprised if, five minutes after walking away from that meeting, most of those board members suddenly thought, Uh oh--we screwed up on this one! As cognitive psychologists have found, this is one of the problems with small-group deliberation: a group of people can be led to a decision which is not anywhere near the center of their positions considered separately.

A statistician rereads Bill James

Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball's greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I'm curious about.

Here's how it begins:

A friend asks the above question and writes:

This article left me thinking - how could the IRS not notice that this guy didn't file taxes for several years? Don't they run checks and notice if you miss a year? If I write a check out of order, there's an asterisk next to the check number in my next bank statement showing that there was a gap in the sequence.

If you ran the IRS, wouldn't you do this: SSNs are issued sequentially. Once a SSN reaches 18, expect it to file a return. If it doesn't, mail out a postage paid letter asking why not with check boxes such as Student, Unemployed, etc. Follow up at reasonable intervals. Eventually every SSN should be filing a return, or have an international address. Yes this is intrusive, but my goal is only to maximize tax revenue. Surely people who do this for a living could come up with something more elegant.

My response:

I dunno, maybe some confidentiality rules? The other thing is that I'm guessing the IRS gets lots of pushback when they hassle rich and influential people. I'm sure it's much less effort for them to go after the little guy, even though that's less cost-effective. And behind this is a lack of societal consensus that the IRS are good guys. They're enforcing a law that something like a third of the people oppose! But I agree: given that we need taxes, I think we should go after the cheats.

Perhaps some informed readers out there can supply more context.

Of beauty, sex, and power: Statistical challenges in estimating small effects.

Thurs 5 May at 11am at Roosevelt House, at 47-49 East 65th Street (north side of East 65th street, between Park and Madison Avenues).

Whassup with glm()?

We're having problems with starting values in glm(). A very simple logistic regression with just an intercept and a very simple starting value (beta=5) blows up.
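Here's a minimal sketch of the kind of call we mean, with made-up data; the exact behavior depends on your version of R:

    set.seed(1)
    y <- rbinom(100, 1, 0.5)

    glm(y ~ 1, family = binomial)             # fine: converges to logit(mean(y))
    glm(y ~ 1, family = binomial, start = 5)  # same model, but started at beta = 5:
                                              # the fitted probabilities start out near 1,
                                              # the working weights are tiny, and the first
                                              # IRLS step can overshoot so badly that the
                                              # iterations diverge or the fit fails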

Statistics ethics question

A graduate student in public health writes:

I have been asked to do the statistical analysis for a medical unit that is delivering a pilot study of a program to [details redacted to prevent identification]. They are using a prospective, nonrandomized, cohort-controlled trial study design.

The investigator thinks they can recruit only a small number of treatment and control cases, maybe less than 30 in total. After I told the Investigator that I cannot do anything statistically with a sample size that small, he responded that small sample sizes are common in this field, and he sent me an example of an analysis that someone had done on a similar study.

So he still wants me to come up with a statistical plan. Is it unethical for me to do anything other than descriptive statistics? I think he should just stick to qualitative research. But the study she mentions above has 40 subjects and apparently had enough power to detect some effects. This is a pilot study after all so the n does not have to be large. It's not randomized though so I would think it would need a larger n because of the weak design.

My reply:

My first, general, recommendation is that it always makes sense to talk with any person as if he is completely ethical. If he is ethical, this is a good idea, and if he is not, you don't want him to think you think badly of him. If you are worried about a serious ethical problem, you can ask about it by saying something like, "From the outside, this could look pretty bad. An outsider, seeing this plan, might think we are being dishonest etc. etc." That way you can express this view without it being personal. And maybe your colleague has a good answer, which he can tell you.

To get to your specific question, there is really no such thing as a minimum acceptable sample size. You can get statistical significance with n=5 if your signal is strong enough.
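To make the n=5 point concrete, here's a toy example with made-up numbers:

    # Tiny sample, huge signal relative to noise: the p-value is still tiny.
    treatment <- c(9.8, 10.1, 10.3, 9.9, 10.2)
    control   <- c(1.1, 0.9, 1.3, 0.8, 1.0)
    t.test(treatment, control)   # n = 5 per group, p-value far below 0.05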

Generally, though, the purpose of a pilot study is not to get statistical significance but rather to get experience with the intervention and the measurements. It's ok to do a pilot analysis, recognizing that it probably won't reach statistical significance. Also, regardless of sample size, qualitative analysis is appropriate and necessary in any pilot study.

Finally, of course they should not imply that they can collect a larger sample size than they can actually do.

Chris Masse writes:

I know you hate the topic, but during this debate (discussing both sides), there were issues raised that are of interest to your science.

Actually I just don't have the patience to watch videos. But I'll forward it on to the rest of you. I've already posted my thoughts on the matter here. ESP is certainly something that a lot of people want to be true.

John Sides followed up on a discussion of his earlier claim that political independents vote for president in a reasonable way based on economic performance. John's original post led to the amazing claim by New Republic writer Jonathan Chait that John wouldn't "even want to be friends with anybody who" voted in this manner.

I've been sensitive to discussions of rationality and voting ever since Aaron Edlin, Noah Kaplan, and I wrote our article on voting as a rational choice: why and how people vote to improve the well-being of others.

Models of rationality are controversial in politics, just as they are in other fields ranging from economics to criminology. On one side you have people trying to argue that all behavior is rational, from lottery playing to drug addiction to engaging in email with exiled Nigerian royalty. Probably the only behavior that nobody has yet claimed is rational is blogging, but I bet that's coming too. From the other direction, lots of people point to strong evidence of subject-matter ignorance in all fields ranging from demography to the Federal budget to demonstrate that, even if voters think they're being rational, they can't be making reasoned decisions in any clear sense.

Here's what I want to add. In the usual debates, people argue about whether a behavior is rational or not. Or, at a more sophisticated level, people might dispute how rational or irrational a given action is. But I don't think this is the right way of thinking about it.

People have many overlapping reasons for anything they do. For a behavior to be "rational" does not mean that a person does it as the result of a reasoned argument but rather that some aspects of that behavior could be modeled as such. This comes up in section 5.2 of my article with Edlin and Kaplan: To model a behavior as rational does not compete with more traditional psychological explanations; it reinforces them.

For example, voter turnout is higher in elections that are anticipated to be close. This has a rational explanation--if an election is close, it's more likely that you will cast the deciding vote--and also a process explanation: if an election is close, candidates will campaign harder, more people will talk about the election, and a voter is more likely to want to be part of the big story. These two explanations work together; they don't compete: it's rational for you to vote, and it's also rational for the campaigns to try to get you to vote, to make the race more interesting to increase your motivation level.

I don't anticipate that this note will resolve some of the debates about participation of independents in politics but I hope that this clarifies some of the concerns about the "rationality" label.

P.S. John is better at engaging journalists than I am. When Chait wrote something that I didn't like and then responded to my response, I grabbed on a key point in his response and emphasized our agreement, thus ending the debate (such as it was), rather than emphasizing our remaining points of disagreement. John is better at keeping the discussion alive.

Peter Huber's most famous work derives from his paper on robust statistics published nearly fifty years ago in which he introduced the concept of M-estimation (a generalization of maximum likelihood) to unify some ideas of Tukey and others for estimation procedures that were relatively insensitive to small departures from the assumed model.

Huber has in many ways been ahead of his time. While remaining connected to the theoretical ideas from the early part of his career, his interests have shifted to computational and graphical statistics. I never took Huber's class on data analysis--he left Harvard while I was still in graduate school--but fortunately I have an opportunity to learn his lessons now, as he has just released a book, "Data Analysis: What Can Be Learned from the Past 50 Years."

The book puts together a few articles published in the past 15 years, along with some new material. Many of the examples are decades old, which is appropriate given that Huber is reviewing fifty years of the development of his ideas. (I used to be impatient with statistics books that were full of dead examples but then I started to realize this was happening to me! The 8 schools experiments are almost 35 years old. The Electric Company is 40. The chicken brains are over 20 years old. The radon study is 15 years old, the data from the redistricting study are from the 1960s and 1970s, and so on. And of course even my more recent examples are getting older at the rate of one year per year and don't keep so well once they're out of the fridge. So at this point in my career I'd like to make a virtue of necessity and say that it's just fine to work with old examples that we really understand.)

OK. As noted, Huber is modern--a follower of Tukey--in his treatment of computing and graphics as central to the statistical enterprise. His ISP software is R-like (as we would say now; of course ISP came first), and the principle of interactivity was important. He also has worked on various graphical methods for data exploration and dimension reduction; although I have not used these programs myself, I view them as close in spirit to the graphical tools that we now use to explore our data in the context of our fitted models.

Right now, data analysis seems dominated by three approaches:
- Machine learning
- Bayes
- Graphical exploratory data analysis
with some overlap, of course.

Many other statistical approaches/methods exist (e.g., time series/spatial, generalized estimating equations, nonparametrics, even some old-fashioned extensions of Fisher, Neyman, and Pearson), but they seem more along the lines of closed approaches to "inference" rather than open-ended tools for "data analysis."

I like Huber's pluralistic perspective, which ranges from contamination models to object-oriented programming, from geophysics to data cleaning. His is not a book to turn to for specific advice; rather, I enjoyed reading his thoughts on a variety of statistical issues and reflecting upon the connections between Huber's strategies for data analysis and his better-known theoretical work.

Huber writes:

Too much emphasis is put on futile attempts to automate non-routine tasks, and not enough effort is spent on facilitating routine work.

I really like this quote and would take it a step further: If a statistical method can be routinized it can be used much more often and its limitations better understood.

Huber also writes:

The interpretation of the results of goodness-of-fit tests must rely on judgment of content rather than on P-values.

This perspective is commonplace today but, as Huber writes, "for a traditional mathematical statistician, the implied primacy of judgment over mathematical proof and over statistical significance clearly goes against the grain." The next question is where the judgment comes from. One answer is that an experienced statistician might work on a few hundred applied problems during his or her career, and that will impart some judgment. But what advice can we give to people without such a personal history? My approach has been to impart as much of the lessons I have learned into methods in my books, but Huber is surely right that any collection of specific instructions will miss something.

It is an occupational hazard of all scholars to have an incomplete perspective on work outside their own subfield. For example, section 5.2 of the book in question contains the following disturbing (to me) claim: "Bayesian statistics lacks a mechanism for assessing goodness-of-fit in absolute terms. . . . Within orthodox Bayesian statistics, we cannot even address the question whether a model Mi, under consideration at stage i of the investigation, is consonant with the data y."

Huh? Huh? Also please see chapter 6 of Bayesian Data Analysis and my article, "A Bayesian formulation of exploratory data analysis and goodness-of-fit testing," which appeared in the International Statistical Review in 2003. (Huber's chapter 5 was written in 2000 so too soon for my 2003 paper, but the first edition of our book and our paper on posterior predictive checks had already appeared several years before.)

Just to be clear: I'm not faulting Huber for not citing my work. The statistics literature is huge and ever-expanding. It's just unfortunate that such a basic misunderstanding--the idea that Bayesians can't check their models--persists.

I like what Huber writes about approximately specified models, and I think he'd be very comfortable with our formulation of Bayesian data analysis, from the very first page of our book, as comprising three steps: (1) Model building, (2) Inference, (3) Model checking. Step 3 is crucial to making steps 1 and 2 work. Statisticians have written a lot about the problems with inference in a world in which models are tested--and that's fine, such biases are a worthy topic of study--but consider the alternative, in which models were fit without ever being checked. This would be horrible indeed.

Here's a quote that is all too true (from section 5.7, following a long and interesting discussion of a decomposition of a time series in physics):

For some parts of the model (usually the less interesting ones) we may have an abundance of degrees of freedom, and a scarcity for the interesting parts.

This reminds me of a conversation I've had with Don Rubin in the context of several different examples. Like many (most?) statisticians, I tend to try to model the data. Don, in contrast, prefers to set up a model that matches what the scientists in the particular field of application are studying. He doesn't worry so much about fit to the data and doesn't do much graphing. For example, in the schizophrenics' reaction-time example (featured in the mixture-modeling chapter of Bayesian Data Analysis), we used the model Don recommended: a mixture of normal distributions with a fixed lag between them. Looking at the data and thinking about the phenomenon, a fixed lag didn't make sense to me, but Don emphasized that the psychology researchers were interested in an average difference, and so from his perspective it didn't make sense to try to do any further modeling on these data. He said that if we wanted to model the variation of the lag, that would be fine, but it would make sense to gather more data rather than knocking ourselves out on this particular small data set. In a field such as international relations, this get-more-data approach might not work, but in experimental psychology it seems like a good idea. (And I have to admit that I have not at all kept up with whatever research has been done in eye-tracking and schizophrenia in the past twenty years.)
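Schematically, that model says each observation comes from one of two normal components that differ only by a fixed lag. Here's a little R sketch of the implied density, with invented parameter values; this is just the general shape of the model, not the exact parameterization in the book:

    # Two-component normal mixture with a fixed lag tau: with probability lambda
    # an observation is "delayed" by tau, otherwise it comes from the baseline
    # component.  Parameter values below are made up for illustration.
    mixture_density <- function(y, mu, tau, sigma, lambda) {
      (1 - lambda) * dnorm(y, mean = mu, sd = sigma) +
        lambda * dnorm(y, mean = mu + tau, sd = sigma)
    }

    curve(mixture_density(x, mu = 5.5, tau = 0.6, sigma = 0.2, lambda = 0.1),
          from = 4.5, to = 7, xlab = "log reaction time", ylab = "density")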

This all reminds me of another story from when I was teaching at Berkeley. Phil Price and I had two students working with us on hierarchical modeling to estimate the distributions of home radon in U.S. counties. One day, one of the students simply quit. Why? He said he just wasn't comfortable with the Bayesian approach. I was used to the old-style Berkeley environment so just accepted it: the kid had been indoctrinated and I didn't have the energy to try to unbrainwash him. But Phil was curious. Having just completed a cross-validation demonstrating how well Bayes was working in this example, he asked the student what he would do in our problem instead of a hierarchical model: if the student had a better idea, Phil said, we'd be happy to test it out. The student thought for a moment and said, well, I suppose Professor X (one of my colleagues from down the hall at the time) would say that the solution is to gather more data. At this point Phil blew up. Gather more data! We already have measurements from 80,000 houses! Could you tell us how many more measurements you think you'd need? The student had no answer to this but remained steadfast in his discomfort with the idea of performing statistical inference using conditional probability.

I think that student, and others like him, would benefit from reading Huber's book and realizing that even a deep theoretician saw the need for using a diversity of statistical methods.

Also relevant to those who worship the supposed purity of likelihoods, or permutation tests, or whatever, is this line from Huber's book:

A statistician rarely sees the raw data themselves--most large data collections in the sciences are being heavily preprocessed already in the collection stage, and the scientists not only tend to forget to mention it, but sometimes they also forget exactly what they had done.

We're often torn between modeling the raw raw data or modeling the processed data. The latter choice can throw away important information but has the advantage, not only of computational convenience but also, sometimes, conceptual simplicity: processed data are typically closer to the form of the scientific concepts being modeled. For example, an economist might prefer to analyze some sort of preprocessed price data rather than data on individual transactions. Sure, there's information in the transactions but, depending on the context of the analysis, this behavioral story might distract from the more immediate goals of the economist. Other times, though, the only way to solve a problem is to go back to the raw data, and Huber provides several such examples in his book.

I will conclude with a discussion of a couple of Huber's examples that overlap with my own applied research.

Radon. In section 3.8, Huber writes:

We found (through exploratory data analysis of a large environmental data set) that very high radon levels were tightly localized and occurred in houses sitting on the locations of old mine shafts. . . . The issue here is one of "data mining" in the sense of looking for a rare nugget, not one of looking, like a traditional statistician, "for a central tendency, a measure of variability, measures of pairwise association between a number of variables." Random samples would have been useless, too: either one would have missed the exceptional values altogether, or one would have thrown them out as outliers.

I'm not so sure. Our radon research was based on two random samples, one of which, as noted above, included 80,000 houses. I agree that if you have a nonrandom sample of a million houses, it's a good idea to use it for some exploratory analysis, so I'm not at all knocking what Huber has done, but I think he's a bit too quick to dismiss random samples as "useless." Also, I don't buy his claim that extreme values, if found, would've been discarded as outliers. The point about outliers is that you look at them, you don't just throw them out!

Aggregation. In chapter 6, Huber deplores that not enough attention is devoted to Simpson's paradox. But then he demonstrates the idea with two fake-data examples. If a problem is important, I think it should be important enough to appear in real data. I recommend our Red State Blue State article for starters.

Survey data. In section 7.2, Huber analyzes data from a small survey of the opinions of jurors. When I looked at the list of survey items, I immediately thought of how I would reverse some of the scales to put everything in the same direction (this is basic textbook advice). Huber ends up doing this too, but only after performing a singular value decomposition. That's fine but in general I'd recommend doing all the easy scalings first so the statistical method has a chance to discover something new. More generally, methods such as singular value decomposition and principal components analyses have their limitations--they can work fine for balanced data such as in this example but in more complicated problems I'd go with item-response or ideal-point models. In general I prefer approaches based on models rather than algorithms: when a model goes wrong I can look for the assumption that was violated, whereas when an algorithm spits out a result that doesn't make sense, I'm not always sure how to proceed. This may be a matter of taste or emphasis more than anything else; see my discussion on Tukey's philosophy.
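The scale reversal I have in mind is a one-liner. Here's a toy sketch in R, with made-up juror responses and item names, just to show the preprocessing step before any decomposition:

    # Made-up 1-5 responses from 6 jurors on 4 items; items 2 and 4 are worded
    # in the opposite direction, so reverse them before any decomposition.
    responses <- data.frame(
      item1 = c(4, 5, 3, 4, 5, 2),
      item2 = c(2, 1, 3, 2, 1, 4),   # reverse-worded
      item3 = c(5, 4, 4, 3, 5, 2),
      item4 = c(1, 2, 2, 3, 1, 4)    # reverse-worded
    )
    reverse_items <- c("item2", "item4")
    responses[reverse_items] <- 6 - responses[reverse_items]   # flips 1<->5, 2<->4

    # then, for example, a singular value decomposition of the centered responses
    sv <- svd(scale(as.matrix(responses), center = TRUE, scale = FALSE))
    sv$d   # singular values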

The next example in Huber's book is the problem of reconstructing maps. I think he'd be interested in the work of Josh Tenenbaum and his collaborators on learning structured models such as maps and trees. Multidimensional scaling is fine--Huber gives a couple of references from 1970--but we can do a lot more now!

In conclusion, I found Huber's book to be enjoyable and thought provoking. It's good to have a sense of what a prominent theoretical statistician thinks about applied statistics.

Is that what she said?

Eric Booth cozies up to this article by Chloe Kiddon and Yuriy Brun (software here). I think they make their point in a gentle yet forceful manner.

I was invited by the Columbia University residence halls to speak at an event on gay marriage. (I've assisted my colleagues Jeff Lax and Justin Phillips in their research on the topic.) The event sounded fun--unfortunately I'll be out of town that weekend so can't make it--but it got me thinking about how gay marriage and other social issues are so relaxing to think about because there's no need for doubt.

About half of Americans support same-sex marriage and about half oppose it. And the funny thing is, you can be absolutely certain in your conviction, from either direction. If you support, it's a simple matter of human rights, and it's a bit ridiculous to suppose that if gay marriage is allowed, it will somehow wreck all the straight marriages out there. Conversely, you can oppose on the clear rationale of wanting to keep marriage the same as it's always been, and suggest that same-sex couples can be free to get together outside of marriage, as they always could. (Hey, it was good enough for Abraham Lincoln and his law partner!)

In contrast, the difficulty of expressing opinions about the economy, or about foreign policy, is that you have to realize at some level that you might be wrong.

For example, even Paul Krugman must occasionally wonder whether maybe the U.S. can't really afford another trillion dollars of debt, and even William Beach (he of the 2.8% unemployment rate forecast, later updated to a still-implausible point forecast of 4.3%) must occasionally wonder whether massive budget cuts will really send the economy into nirvana.

Similarly, even John McCain must wonder on occasion whether it would've been better to withdraw from Iraq in 2003, or 2004, or 2005. And even a firm opponent of the war such as the Barack Obama of early 2008 must have occasionally thought that maybe the invasion wasn't such a bad idea on balance.

I don't really have anything more to say on this. I just think it's interesting how there can be so much more feeling of certainty about social policy.

Data mining and allergies

With all this data floating around, there are some interesting analyses one can do. I came across "The Association of Tree Pollen Concentration Peaks and Allergy Medication Sales in New York City: 2003-2008" by Perry Sheffield. There they correlate pollen counts with anti-allergy medicine sales - and indeed find that two days after high pollen counts, the medicine sales are the highest.


Of course, it would be interesting to play with the data to see *what* tree is actually causing the sales to increase the most. Perhaps this would help the arborists decide what trees to plant. At the moment they seem to be following a rather sexist approach to tree planting:


Ogren says the city could solve the problem by planting only female trees, which don't produce pollen like male trees do.

City arborists shy away from females because many produce messy - or in the case of ginkgos, smelly - fruit that litters sidewalks.

In Ogren's opinion, that's a mistake. He says the females only produce fruit because they are pollinated by the males.

His theory: no males, no pollen, no fruit, no allergies.


Follow the discussion (originated by Mike Jordan) at the Statistics Forum.

Zero is zero

Nathan Roseberry writes:

I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn't return what I was looking for. Is it considered a best practice to always include zero on the axis for bar charts? Has this been written in a book?

My reply:

The idea is that the area of the bar represents "how many" or "how much." The bar has to go down to 0 for that to work. You don't have to have your y-axis go to zero, but if you want the axis to go anywhere else, don't use a bar graph, use a line graph. Usually line graphs are better anyway.

I'm sure this is all in a book somewhere.
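Here's a toy illustration of the rule in R, with made-up numbers:

    # A bar's length is what encodes the value, so bars must start at zero.
    # If you want to zoom in on a narrow range, use points or a line instead.
    values <- c(A = 96, B = 98, C = 97, D = 99)

    barplot(values, ylim = c(0, 100))        # fine: bars anchored at zero

    # Zooming a bar chart to c(95, 100) would exaggerate the differences;
    # a dot/line plot handles the zoom honestly:
    plot(values, type = "b", ylim = c(95, 100), xaxt = "n",
         xlab = "", ylab = "value")
    axis(1, at = seq_along(values), labels = names(values))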

Asymmetry in Political Bias

Tyler Cowen points to an article by Riccardo Puglisi, who writes:

Controlling for the activity of the incumbent president and the U.S. Congress across issues, I find that during a presidential campaign, The New York Times gives more emphasis to topics on which the Democratic party is perceived as more competent (civil rights, health care, labor and social welfare) when the incumbent president is a Republican. This is consistent with the hypothesis that The New York Times has a Democratic partisanship, with some "anti-incumbent" aspects . . . consistent with The New York Times departing from demand-driven news coverage.

I haven't read the article in question but the claim seems plausible to me. I've often thought there is an asymmetry in media bias, with Democratic reporters--a survey a few years ago found that twice as many journalists identify as Democrats as Republicans--biasing their reporting by choosing which topics to focus on, and Republican news organizations (notably Fox News and other Murdoch organizations) biasing in the other direction by flat-out attacks.

I've never been clear on which sort of bias is more effective. On one hand, Fox can create a media buzz out of nothing at all; on the other hand, perhaps there's something more insidious about objective news organizations indirectly creating bias by their choice of what to report.

But I've long thought that this asymmetry should inform how media bias is studied. It can't be a simple matter of counting stories or references to experts and saying that Fox is more biased or the Washington Post is more biased or whatever. Some of the previous studies in this area are interesting but to me don't get at either of the fundamental sorts of bias mentioned above. You have to look for bias in different ways to capture these multiple dimensions. Based on the abstract quoted above, Puglisi may be on to something; maybe this could be a useful start toward getting at the big picture.

Hierarchical ordered logit or probit

Jeff writes:

How far off is bglmer and can it handle ordered logit or multinom logit?

My reply:

bglmer is very close. No ordered logit but I was just talking about it with Sophia today. My guess is that the easiest way to fit a hierarchical ordered logit or multinomial logit will be to use Stan. For right now I'd recommend using glmer/bglmer to fit the ordered logits in order (e.g., 1 vs. 2,3,4, then 2 vs. 3,4, then 3 vs. 4). Or maybe there's already a hierarchical multinomial logit in MCMCpack or somewhere?
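Here's a rough sketch of that stopgap in R, with invented data and variable names--each binary split fit as its own varying-intercept logit:

    library(lme4)

    # Made-up data: ordered response y in 1:4, one predictor x, 20 groups.
    set.seed(123)
    dat <- data.frame(group = rep(1:20, each = 25), x = rnorm(500))
    dat$y <- with(dat, pmin(4, pmax(1,
               round(2.5 + 0.5 * x + rep(rnorm(20, sd = 0.5), each = 25) + rnorm(500)))))

    # Continuation-ratio stopgap: a sequence of binary logits with varying intercepts.
    f1 <- glmer(as.numeric(y >= 2) ~ x + (1 | group), family = binomial, data = dat)   # 1 vs. 2,3,4
    f2 <- glmer(as.numeric(y >= 3) ~ x + (1 | group), family = binomial,
                data = subset(dat, y >= 2))                                            # 2 vs. 3,4
    f3 <- glmer(as.numeric(y >= 4) ~ x + (1 | group), family = binomial,
                data = subset(dat, y >= 3))                                            # 3 vs. 4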

"The ultimate left-wing novel"

Tyler Cowen asks what is the ultimate left-wing novel? He comes up with John Steinbeck and refers us to this list by Josh Leach that includes social-realist novels from around 1900. But Cowen is looking for something more "analytically or philosophically comprehensive."

My vote for the ultimate left-wing novel is 1984. The story and the political philosophy fit together well, and it's also widely read (which is an important part of being the "ultimate" novel of any sort, I think; it wouldn't do to choose something too obscure). Or maybe Gulliver's Travels, but I've never actually read that, so I don't know if it qualifies as being left-wing. Certainly you can't get much more political than 1984, and I don't think you can get much more left-wing either. (If you get any more left-wing than that, you start to loop around the circle and become right-wing. For example, I don't think that a novel extolling the brilliance of Stalin or Mao would be considered left-wing in a modern context.)

Native Son (also on Leach's list) seems like another good choice to me, but I'm sticking with 1984 as being more purely political. For something more recent you could consider something such as What a Carve Up by Jonathan Coe.

P.S. Cowen's correspondent wrote that "the book needs to do two things: justify the welfare state and argue the limitations of the invisible hand." But I don't see either of these as particularly left-wing. Unless you want to argue that Bismarck was a left-winger.

P.P.S. Commenters suggest Uncle Tom's Cabin and Les Miserables. Good choices: they're big novels, politically influential, and left-wing. There's probably stuff by Zola etc. too. I still stand by 1984. Orwell was left-wing and 1984 was his novel. I think the case for 1984 as a left-wing novel is pretty iron-clad.

[Image: GR_GraficFIN-web.jpg]
This gets my vote for the worst statistical graphic I (Phil) have seen this year. If you've got a worse one, put a link in the comments. "Credit" for this one goes to "Peter and Maria Hoey (Source: Tommy McCall/Environmental Law Institute)."

My talk at Berkeley on Wednesday

Something on Applied Bayesian Statistics

April 27, 4:10-5 p.m., 1011 Evans Hall

I will deliver one of the following three talks:
1. Of beauty, sex, and power: Statistical challenges in estimating small effects
2. Why we (usually) don't worry about multiple comparisons
3. Parameterization and Bayesian modeling
Whoever shows up on time to the seminar gets to vote, and I'll give the talk that gets the most votes.

My talk at Stanford on Tuesday

Of Beauty, Sex, and Power: Statistical Challenges in Estimating Small Effects.

Tues 26 Apr, 12-1 in the Graham Stuart Lounge, 4th Floor, Encina West.

These are based on raw Pew data, reweighted to adjust for voter turnout by state, income, and ethnicity. No modeling of vote on age, education, and ethnicity.

[Graph: edu.png]

I think our future estimates based on the 9-way model will be better, but these are basically OK, I think. All but six of the dots in the graph are based on sample sizes greater than 30.

I published these last year but they're still relevant, I think. There's lots of confusion when it comes to education and voting.

My NOAA story

I recently learned we have some readers at the National Oceanic and Atmospheric Administration so I thought I'd share an old story.

About 35 years ago my brother worked briefly as a clerk at NOAA in their D.C. (or maybe it was D.C.-area) office. His job was to enter the weather numbers that came in. He had a boss who was very orderly. At one point there was a hurricane that wiped out some weather station in the Caribbean, and his boss told him to put in the numbers anyway. My brother protested that they didn't have the data, to which his boss replied: "I know what the numbers are."

Nowadays we call this sort of thing "imputation" and we like it. But not in the raw data! I bet nowadays they have an NA code.

Details here.

Arrow's other theorem

I received the following email from someone who'd like to remain anonymous:

Lately I [the anonymous correspondent] witnessed that Bruno Frey has published two articles in two well-known refereed journals on the Titanic disaster that try to explain survival rates of passengers on board.

The articles were published in the Journal of Economic Perspectives and Rationality & Society. While looking up the name of the second journal where I stumbled across the article I even saw that they put the message in a third journal, the Proceedings of the National Academy of Sciences United States of America.

To say it in Sopranos like style - with all due respect, I know Bruno Frey from conferences, I really appreciate his take on economics as a social science and he has really published more interesting stuff that most economists ever will. But putting the same message into three journals gives me headaches for at least two reasons:

1) When building a track record and scientific reputation, it's publish or perish. What about young scholars who may have interesting stuff to say but get rejected for (sometimes) obscure reasons, especially if they have innovative ideas that run against the mainstream? Meanwhile acceptance is granted to papers with identical messages in three journals, which causes both congestion in the review procedures and biases acceptance, assuming that for the two of the three articles that are not entirely unique, two other manuscripts will be rejected, from an editorial point of view, to preserve exclusivity by sticking to low or constant acceptance rates. Do you see this as a problem? Or is the main point against this argument that if the other papers had the quality, they would be published?

2) As an author one usually gets asked whether "the results are published in another journal" (and therefore not original) or whether "the paper is under review at another journal." In their case the answer should be no to both questions, since each paper reports different results and uses different methods. But if you check the descriptive statistics in the papers, they are awkwardly similar. At what point does the overlap between these questions and the content become a real problem for authors? Have you ever heard of double publications that were not authorized reprints or translations into other languages (which usually should not be problematic, as, by the way, Frey's publication list shows) and had to be withdrawn? It barely ever happens, I guess.

Best regards and thank you for providing an open forum to discuss stuff like that.

I followed the links and read the abstracts. The three papers do indeed seem to describe similar work. But the abstracts are in remarkably different styles. The Rationality and Society abstract is short and doesn't say much. The Journal of Economic Perspectives abstract is long with lots of detail but, oddly, no conclusions! This abstract has the form of a movie trailer: lots of explosions, lots of drama, but no revealing of the plot. Finally, here's the PNAS abstract, which tells us what they found:

To understand human behavior, it is important to know under what conditions people deviate from selfish rationality. This study explores the interaction of natural survival instincts and internalized social norms using data on the sinking of the Titanic and the Lusitania. We show that time pressure appears to be crucial when explaining behavior under extreme conditions of life and death. Even though the two vessels and the composition of their passengers were quite similar, the behavior of the individuals on board was dramatically different. On the Lusitania, selfish behavior dominated (which corresponds to the classical homo economicus); on the Titanic, social norms and social status (class) dominated, which contradicts standard economics. This difference could be attributed to the fact that the Lusitania sank in 18 min, creating a situation in which the short-run flight impulse dominated behavior. On the slowly sinking Titanic (2 h, 40 min), there was time for socially determined behavioral patterns to reemerge. Maritime disasters are traditionally not analyzed in a comparative manner with advanced statistical (econometric) techniques using individual data of the passengers and crew. Knowing human behavior under extreme conditions provides insight into how widely human behavior can vary, depending on differing external conditions.

Interesting. My only quibble here is with the phrase "selfish rationality," which comes up in the very first sentence. As Aaron Edlin, Noah Kaplan, and I have stressed, rationality doesn't have to imply selfishness, and selfishness doesn't have to imply rationality. One can achieve unselfish goals rationally. For example, if I decide not to go on a lifeboat, I can still work to keep the peace and to efficiently pack people onto existing lifeboat slots. I don't think this comment of mine affects the substance of the Frey et al. papers; it's just a slight change of emphasis.

Regarding the other question, of how could the same paper be published three times, my guess is that a paper on the Titanic can partly get published for its novelty value: even serious journals like to sometimes run articles on offbeat topics. I wouldn't be surprised if the editors of each journal thought: Hey, this is fun. We don't usually publish this sort of thing, but, hey, why not? And then it appeared, three times.

How did this happen? Arrow's theorem. Let me explain.

Handbook of Markov Chain Monte Carlo

Galin Jones, Steve Brooks, Xiao-Li Meng and I edited a handbook of Markov Chain Monte Carlo that has just been published. My chapter (with Kenny Shirley) is here, and it begins like this:

Convergence of Markov chain simulations can be monitored by measuring the diffusion and mixing of multiple independently-simulated chains, but different levels of convergence are appropriate for different goals. When considering inference from stochastic simulation, we need to separate two tasks: (1) inference about parameters and functions of parameters based on broad characteristics of their distribution, and (2) more precise computation of expectations and other functions of probability distributions. For the first task, there is a natural limit to precision beyond which additional simulations add essentially nothing; for the second task, the appropriate precision must be decided from external considerations. We illustrate with an example from our current research, a hierarchical model of trends in opinions on the death penalty in U.S. states.
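The multiple-chain check behind that first task is the familiar potential scale reduction factor. Here is a toy R sketch of it, for a single scalar summary and unsplit chains; this is a simplified illustration, not the code from the chapter:

```r
# Toy R-hat: m chains of length n, one column per chain
rhat <- function(sims) {
  n <- nrow(sims)
  B <- n * var(colMeans(sims))          # between-chain variance
  W <- mean(apply(sims, 2, var))        # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the marginal variance
  sqrt(var_plus / W)                    # approaches 1 as the chains mix
}

set.seed(1)
good <- cbind(rnorm(1000), rnorm(1000))       # two chains targeting the same distribution
bad  <- cbind(rnorm(1000), rnorm(1000, 2))    # two chains stuck in different places
c(rhat(good), rhat(bad))                      # roughly 1.0 vs. roughly 1.7
```

A real implementation would monitor every quantity of interest, not just one scalar.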

To read all the other chapters, you'll have to buy the book!

One more time-use graph

Evan Hensleigh sent me this redesign of the cross-national time use graph:

excess_sleep_economist.png

Here was my version:

times3.png

And here was the original:

uglyassgraph.gif

Compared to my graph, Evan's has better fonts, and that's important--good fonts can make a display look professional. But I'm not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don't like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don't like his horizontal axis at all! He removed the units of hours and put + and - on the edges so that the axes run into each other. What was the point of that? It's bad news. Also I don't see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we would probably come up with something better than either of us would make alone.

Matthew Yglesias shares this graph from the Economist:

uglyassgraph.gif

I hate this graph. OK, sure, I don't hate hate hate hate it: it's not a 3-d exploding pie chart or anything. It's not misleading, it's just extremely difficult to read. Basically, you have to go back and forth between the colors and the labels and the countries and read it like a table. OK, so here's the table:

Average Hours Per Day Spent in Each Activity

           Work,   Unpaid  Eating, Personal
Country    study    work  sleeping   care   Leisure  Other

France       4        3       11       1       2       2
Germany      4        3       10       1       3       3
Japan        6        2       10       1       2       2
Britain      4        3       10       1       3       3
USA          5        3       10       1       3       2
Turkey       4        3       11       1       3       2

Hmm, that didn't work too well. Let's try subtracting the average from each column (for these six countries, the averages (unweighted by population) are 4.6 hours on paid work and study, 3.1 hours on unpaid work, 10.2 hours eating and sleeping, etc.; a little R snippet doing this calculation appears below the table):

% Excess Hours Per Day Spent in Each Activity
(compared to avg over all countries)

           Work,   Unpaid  Eating, Personal
Country    study    work  sleeping   care   Leisure  Other

France     -10%       0%    +10%    +50%     -20%    -20%
Germany    -10%       0%      0%    -10%     +10%    +20%
Japan      +40%     -20%      0%      0%     -20%      0%
Britain      0%       0%      0%    -10%     +10%    +10%
USA          0%       0%      0%    -20%     +10%      0%
Turkey     -10%      10%      0%    -20%     +10%    -10%
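(In case anyone wants to reproduce or extend this, here's a minimal R version of the calculation. The input hours are the rounded values from the first table, so the resulting percentages only roughly match the ones shown above.)

```r
# Average hours per day, as read off the Economist graph (rounded)
hours <- rbind(France  = c(4, 3, 11, 1, 2, 2),
               Germany = c(4, 3, 10, 1, 3, 3),
               Japan   = c(6, 2, 10, 1, 2, 2),
               Britain = c(4, 3, 10, 1, 3, 3),
               USA     = c(5, 3, 10, 1, 3, 2),
               Turkey  = c(4, 3, 11, 1, 3, 2))
colnames(hours) <- c("Work/study", "Unpaid work", "Eat/sleep",
                     "Personal care", "Leisure", "Other")

# Percent excess relative to the unweighted six-country average
avg <- colMeans(hours)
excess_pct <- round(100 * (sweep(hours, 2, avg, "/") - 1))
excess_pct
```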

OK, the Japanese spend more time at work and the French spend more time grooming. Beyond that, I don't see these numbers as particularly "stereotype confirming" (in Yglesias's words). But I'm not fully up on my pop culture. What is the stereotype about Turkish people? I have the impression that in Dashiell Hammett's day they were called "Turks" and the detective was likely to be waylaid by one of them in a dark alley (this counts as "other activities," I believe), but I'm sure there are some new stereotypes I'm not aware of. Blogging counts as "unpaid work," right?

Anyway, my first thought was that the above ugly graph should be redone as a line plot. Here's what I came up with after an hour of work (yeah, yeah, I must have a lot of real work to do if I'm willing to put in this level of procrastination. On the upside, I'm pretty high on the procrastination ladder if I spend an hour on an R script as a way of taking a break!):

times.png Click to see the full-sized version.

I could've done this a little better--in particular, the text is hard to read--but it's basically what I was envisioning. [See P.S. below for something better.] Also, I don't really know what to make of the ordering of the countries or the ordering of the categories on the x-axis--I just copied what the Economist graph did.

Why do I like my display better? I like it because you can directly compare within a country--to see which activities are done more and which are done less, compared to the average. And you can also compare between countries to see where people spend more time on any particular activity. This between-country comparison would be clearer if we put all the lines on the same graph, but that looks a bit busy to me and I'm happier with the separate line plots. If you had data on a lot of countries I could see batching them (for example, the lines for northern European countries on one plot, the lines for Southern European countries on another, and other plots for English-speaking countries, east Asian countries, south Asian countries, Middle Eastern/North Africa, sub-Saharan Africa, and Latin America).

I can see where the Economist's graphics designers were coming from with their plots. In any country, the categories add to 24 hours, and the circle plot enforces that constraint. (They could've made pie charts but everyone knows how bad that is.) But there are a lot of categories so they needed colors and a legend. And the circle arcs are hard to compare so they needed to put in the exact numbers. The result, though, doesn't work for me. I mean, sure, maybe it was fine--Matthew Yglesias is more in the target audience of the Economist than I am, and he liked the graph--but I think it could've been much better. And I'm sure that if a graphics designer worked with me on it, the graph could be better still.

At some point this would represent a bit too much effort spent on one particular graph in a weekly newspaper. But if we have enough good examples of these, they could represent a template that could be used all over.

P.S. I was dissatisfied with my graph above because the labels were hard to read. So I spent another hour to make this:

times3.png

Wow! All the information is there, it's clear and readable, and I got it in under 600 x 250 resolution in a png. I like it.

P.P.S. Here's the R code I used to make the graphs.

P.P.P.S. See here for yet another version.

The R code for those time-use graphs

By popular demand, here's my R script for the time-use graphs:
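For readers who just want the gist, here is a minimal sketch of this kind of small-multiple line plot, not the exact script: a base-graphics loop over countries, plotting each country's hours relative to the six-country average. The hours below are rounded values read off the Economist graph, so the output will only roughly match times3.png.

```r
# Rounded hours per day by country and activity, read off the Economist graph
hours <- rbind(France  = c(4, 3, 11, 1, 2, 2),
               Germany = c(4, 3, 10, 1, 3, 3),
               Japan   = c(6, 2, 10, 1, 2, 2),
               Britain = c(4, 3, 10, 1, 3, 3),
               USA     = c(5, 3, 10, 1, 3, 2),
               Turkey  = c(4, 3, 11, 1, 3, 2))
activities <- c("Work", "Unpaid", "Sleep", "Care", "Leisure", "Other")
excess <- sweep(hours, 2, colMeans(hours))   # hours minus the six-country average

png("times_sketch.png", width = 600, height = 250)
par(mfrow = c(2, 3), mar = c(2.5, 3, 2, 1), mgp = c(1.8, 0.4, 0), tck = -0.02)
for (j in 1:nrow(excess)) {
  plot(1:ncol(excess), excess[j, ], type = "l", ylim = range(excess),
       xaxt = "n", xlab = "", ylab = "Hours vs. average",
       main = rownames(excess)[j])
  abline(h = 0, col = "gray")   # reference line at the cross-country average
  axis(1, at = 1:ncol(excess), labels = activities, cex.axis = 0.7)
}
dev.off()
```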

Catherine Rampell highlights this stunning Gallup Poll result:

6 percent of Americans in households earning over $250,000 a year think their taxes are "too low." Of that same group, 26 percent said their taxes were "about right," and a whopping 67 percent said their taxes were "too high."

OK, fine. Most people don't like taxes. No surprise there. But get this next part:

And yet when this same group of high earners was asked whether "upper-income people" paid their fair share in taxes, 30 percent said "upper-income people" paid too little, 30 percent said it was a "fair share," and 38 percent said it was too much.

30 percent of these upper-income people say that upper-income people pay too little, but only 6 percent say that they personally pay too little. And 38 percent say that upper-income people pay too much, but 67 percent say they personally pay too much.

Free $5 gift certificate!

I bought something online and got a gift certificate for $5 to use at BustedTees.com. The gift code is TP07zh4q5dc and it expires on 30 Apr. I don't need a T-shirt so I'll pass this on to you.

I assume it only works once. So the first person who follows up on this gets the discount. Enjoy!

The mysterious Gamma (1.4, 0.4)

A student writes:

I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly, I have read that you suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) as the prior distribution for the precision hyperparameter of a normal distribution.

I would very much appreciate it if you had the time to point me to this publication of yours. The reason is that I have used this prior distribution (Gamma(1.4, 0.4)) in a study which we are now revising for publication, and a reviewer questions the choice of the distribution (claiming that it is too informative!).

I am well aware that in recent publications (Prior distributions for variance parameters in hierarchical models, Bayesian Analysis; Data Analysis Using Regression and Multilevel/Hierarchical Models) you suggest modeling the precision as pow(standard deviation, -2) and using either a uniform or a half-Cauchy distribution. However, since our model was fitted before I saw these publications, I would very much like to find your earlier recommendation (which works fine!).

My reply: I've never heard of a Gamma (1.4, 0.4) distribution. I have no idea where this came from! But I can believe that it might work well--it would depend on the application.

Gamma (1.4, 0.4)?? Perhaps this was created by matching some moments or quantiles???
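For what it's worth, it's easy to check what such a prior implies. A quick look in R, assuming the shape/rate (BUGS-style) parameterization that the student presumably means:

```r
# Implied mean, sd, and central 95% interval of a Gamma(shape = 1.4, rate = 0.4)
# prior on the precision, plus the corresponding interval for the standard deviation
shape <- 1.4; rate <- 0.4
c(mean = shape / rate, sd = sqrt(shape) / rate)    # mean 3.5, sd about 3
prec_int <- qgamma(c(0.025, 0.975), shape, rate)   # 95% interval for the precision
prec_int
rev(1 / sqrt(prec_int))                            # implied interval for the sd scale
```

Whether that counts as "too informative" depends entirely on the scale of the data, which is really the point: there's nothing special about these particular numbers.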

The following is an essay into a topic I know next to nothing about.

As part of our endless discussion of Dilbert and Charlie Sheen, commenter Fraac linked to a blog by philosopher Edouard Machery, who tells a fascinating story:

How do we think about the intentional nature of actions? And how do people with an impaired mindreading capacity think about it?

Consider the following probes:

The Free-Cup Case

Joe was feeling quite dehydrated, so he stopped by the local smoothie shop to buy the largest sized drink available. Before ordering, the cashier told him that if he bought a Mega-Sized Smoothie he would get it in a special commemorative cup. Joe replied, 'I don't care about a commemorative cup, I just want the biggest smoothie you have.' Sure enough, Joe received the Mega-Sized Smoothie in a commemorative cup. Did Joe intentionally obtain the commemorative cup?

The Extra-Dollar Case

Joe was feeling quite dehydrated, so he stopped by the local smoothie shop to buy the largest sized drink available. Before ordering, the cashier told him that the Mega-Sized Smoothies were now one dollar more than they used to be. Joe replied, 'I don't care if I have to pay one dollar more, I just want the biggest smoothie you have.' Sure enough, Joe received the Mega-Sized Smoothie and paid one dollar more for it. Did Joe intentionally pay one dollar more?

You surely think that paying an extra dollar was intentional, while getting the commemorative cup was not. [Indeed, I do--AG.] So do most people (Machery, 2008).

But Tiziana Zalla and I [Machery] have found that if you had Asperger Syndrome, a mild form of autism, your judgments would be very different: You would judge that paying an extra-dollar was not intentional, just like getting the commemorative cup.

I'm not particularly interested in the Asperger's angle (except for the linguistic oddity that most people call it Asperger's but in the medical world it's called Asperger; compare, for example, the headline of the linked blog to its text), but I am fascinated by the above experiment. Even after reading the description, it seems to me perfectly natural to think of the free cup as unintentional and the extra dollar as intentional. But I also agree with the implicit point that, in a deeper sense, the choice to pay the extra dollar isn't really more intentional than the choice to take the cup. It just feels that way.

To engage in a bit of introspective reasoning (as is traditional in the "heuristics and biases" field), I'd say the free cup just happened, whereas in the second scenario Joe had to decide to pay the dollar.

But that's not really it. The passive/active division correctly demarcates the free cup and extra dollar examples, but Machery presents other examples where both scenarios are passive, or where both scenarios are active, and you can get perceived intentionality or lack of intentionality in either case. (Just as we learned from classical decision theory and the First Law of Robotics, to not decide is itself a decision.)

Machery's explanation (which I don't buy)

Leslie McCall spoke in the sociology department here the other day to discuss changes in attitudes about income inequality as well as changes in attitudes about attitudes about income inequality. (That is, she talked about what survey respondents say, and she talked about what scholars have said about what survey respondents say.)

On the plus side, the talk was interesting. On the downside, I had to leave right at the start of the discussion so I didn't have a chance to ask my questions. So I'm placing them below.

I can't find a copy of McCall's slides so I'll link to this recent op-ed she wrote on the topic of "Rising Wealth Inequality: Should We Care?" Her title was "Americans Aren't Naive," and she wrote:

Understanding what Americans think about rising income inequality has been hampered by three problems.

First, polls rarely ask specifically about income inequality. They ask instead about government redistributive policies, such as taxes and welfare, which are not always popular. From this information, we erroneously assume that Americans don't care about inequality. . . . Second, surveys on inequality that do exist are not well known. . . . Third . . . politicians and the media do not consistently engage Americans on the issue. . . .

It is often said that Americans care about opportunity and not inequality, but this is very misleading. Inequality can itself distort incentives and restrict opportunities. This is the lesson that episodes like the financial crisis and Great Recession convey to most Americans.

What follows is not any attempt at an exposition, appreciation, or critique of McCall's work but rather just some thoughts that arose, based on some notes I scrawled during her lecture:

1. McCall is looking at perceptions of perceptions. This reminds me of our discussions in Red State Blue State about polarization and the perception of polarization. The idea is that, even if American voters are not increasingly polarized in their attitudes, there is a perception of polarization, and this perception can itself have consequences (for example, in the support offered to politicians on either side who refuse to compromise).

2. McCall talked about meritocracy and shared a quote from Daniel Bell (whom she described as "conservative," which surprised me, but I guess it would be accurate to call him the most liberal of the neoconservatives) about how meritocracy could be good or bad, with bad meritocracy associated with meritocrats who abuse their positions of power and degrade those below them on the social ladder.

At this point I wanted to jump up and shout James "the Effect" Flynn's point that meritocracy is a self-contradiction. As Flynn put it:

The case against meritocracy can be put psychologically: (a) The abolition of materialist-elitist values is a prerequisite for the abolition of inequality and privilege; (b) the persistence of materialist-elitist values is a prerequisite for class stratification based on wealth and status; (c) therefore, a class-stratified meritocracy is impossible.

Flynn also points out that the promotion and celebration of the concept of "meritocracy" is also, by the way, a promotion and celebration of wealth and status--these are the goodies that the people with more merit get:

People must care about that hierarchy for it to be socially significant or even for it to exist. . . . The case against meritocracy can also be put sociologically: (a) Allocating rewards irrespective of merit is a prerequisite for meritocracy, otherwise environments cannot be equalized; (b) allocating rewards according to merit is a prerequisite for meritocracy, otherwise people cannot be stratified by wealth and status; (c) therefore, a class-stratified meritocracy is impossible.

In short, when people talk about meritocracy they tend to focus on the "merit" part (Does Kobe Bryant have as much merit as 10,000 schoolteachers? Do doctors have more merit than nurses? Etc.), but the real problem with meritocracy is that it's an "ocracy."

This point is not in any way a contradiction or refutation of McCall. I just think that, to the extent that debates over "just deserts" are a key part of her story, it would be useful to connect to Flynn's reflections on the impossibility of a meritocratic future.

3. I have a few thoughts on the competing concepts of opportunity vs. redistribution, which were central to McCall's framing.

a. Loss aversion. Opportunity sounds good because it's about gains. In contrast, I suspect that, when we think about redistribution, losses are more salient. (Redistribution is typically framed as taking from group A and giving to group B. There is a vague image of a bag full of money, and of course you have to take it from A before giving it to B.) So to the extent there is loss aversion (and I think there is), redistribution is always gonna be a tough sell.

b. The path from goal to policy. If you're going to cut taxes, what services do you plan to cut? If you plan to increase services, who's going to pay for it? Again, economic opportunity sounds great because you're not taking it from anybody. This is not just an issue of question wording in a survey; I think it's fundamental to how people think about inequality and redistribution.

I suspect the cognitive (point "a" above) and political (point "b") framing are central to people's struggles in thinking about economic opportunity. The clearest example is affirmative action, where opportunity for one group directly subtracts from opportunity for others.

4. As I remarked during McCall's talk, I was stunned that more than half the people did not think that family or ethnicity helped people move up in the world. We discussed the case of George W. Bush, who certainly benefited from family connections but can't really be said to have moved up in the world--for him, being elected president was just a way to stand still, intergenerationally speaking. As well as being potentially an interesting example for McCall's book-in-progress, the story of G. W. Bush illustrates some of the inherent contradictions in thinking about mobility in a relative sense. Not everyone can move up, at least not in a relative sense.

5. McCall talked about survey results on Americans' views of rich people and, I think, of corporate executives. This reminds me of survey data from 2007 on Americans' views of corporations:

Nearly two-thirds of respondents say corporate profits are too high, but, according to a Pew research report, "more than seven in ten agree that 'the strength of this country today is mostly based on the success of American business' - an opinion that has changed very little over the past 20 years." People like business in general (except for those pesky corporate profits) but they love individual businesses, with 95% having a favorable view of Johnson and Johnson (among those willing to give a rating), 94% liking Google, 91% liking Microsoft, . . . I was surprised to find that 70% of the people were willing to rate Citibank, and of those people, 78% had a positive view. I don't have a view of Citibank one way or another, but it would seem to me to be the kind of company that people wouldn't like, even in 2007. Were banks ever popular? I guess so.

The Pew report broke things down by party identification (Democrat or Republican) and by "those who describe their household as professional or business class; those who call themselves working class; and those who say their family or household is struggling."

Republicans tend to like corporations, with little difference between the views of professional-class and working-class Republicans. For Democrats, though, there's a big gap, with professionals having a generally more negative view, compared to the working class. Follow the link for some numbers and some further discussion for some fascinating patterns that I can't easily explain.

6. In current debates over the federal budget, liberals favor an economic stimulus (i.e., deficit spending) right now, while conservatives argue that, not only should we decrease the deficit, but that our entire fiscal structure is unsustainable, that we can't afford the generous pensions and health care that's been promised to everyone. The crisis in the euro is often taken by fiscal conservatives as a signal that the modern welfare state is a pyramid scheme, and something has to get cut.

When the discussion shifts to the standard of living of the middle class, though, we get a complete reversal. McCall's op-ed was part of an online symposium on wealth inequality. One thing that struck me about the discussions there was the reversal of the usual liberal/conservative perspectives on fiscal issues.

Liberals who are fine with deficits at the national level argue that, in the words of Michael Norton, "the expansion of consumer credit in the United States has allowed middle class and poor Americans to live beyond their means, masking their lack of wealth by increasing their debt." From the other direction, conservatives argue that Americans are doing just fine, with Scott Winship reporting that "four in five Americans have exceeded the income their parents had at the same age."

From the left, we hear that America is rich but Americans are broke. From the right, the story is the opposite: America (along with Europe and Japan) is broke, but individual Americans are doing fine.

I see the political logic to these positions. If you start from the (American-style) liberal perspective favoring government intervention in the economy, you'll want to argue that (a) people are broke and need the government's help, and (b) we as a society can afford it. If you start from the conservative perspective favoring minimal government intervention, you'll want to argue that (a) people are doing just fine as they are, and (b) anyway, we can't afford to help them.

I won't try to adjudicate these claims: as I've written a few dozen times in this space already, I have no expertise in macroeconomics (although I did get an A in the one and only econ class I ever took, which was in 11th grade). I bring them up in order to demonstrate the complicated patterns between economic ideology, political ideology, and views about inequality.

This one was so beautiful I just had to repost it:

From the New York Times, 9 Sept 1981:

IF I COULD CHANGE PARK SLOPE

If I could change Park Slope I would turn it into a palace with queens and kings and princesses to dance the night away at the ball. The trees would look like garden stalks. The lights would look like silver pearls and the dresses would look like soft silver silk. You should see the ball. It looks so luxurious to me.

The Park Slope ball is great. Can you guess what street it's on? "Yes. My street. That's Carroll Street."

-- Jennifer Chatmon, second grade, P.S. 321

This was a few years before my sister told me that she felt safer having a crack house down the block because the cops were surveilling it all the time.

We were having so much fun on this thread that I couldn't resist linking to this news item by Adrian Chen. The good news is that Scott Adams (creator of the Dilbert comic strip) "has a certified genius IQ" and that he "can open jars with [his] bare hands." He is also "able to lift heavy objects." Cool!

In all seriousness, I knew nothing about this aspect of Adams when I wrote the earlier blog. I was just surprised (and remain surprised) that he was so impressed with Charlie Sheen for being good-looking and being able to remember his lines. At the time I thought it was just a matter of Adams being overly-influenced by his direct experience, along with some satisfaction in separating himself from the general mass of Sheen-haters out there. But now I wonder if something more is going on, that maybe he feels that he and Sheen are on the same side in a culture war.

In any case, the ultimate topic of interest here is not Sheen or Adams but rather more general questions of what it takes for someone to root for someone. I agree with some of the commenters on the earlier thread that it's not about being a good guy or a bad guy. Lots of people rooted for the Oakland Raiders (sorry, I'm showing my age here), maybe partly because of their reputation as bad boys. And Charlie Sheen is definitely an underdog right now.

P.S. Amazingly enough, Chen includes a link to a Dilbert strip mocking the very behavior that Adams was doing. Not a big deal but it's a bit odd.

P.P.S. No, I'm not Dilbert-obsessed! It just happened that I was reading Gawker (sorry!) and the Scott Adams entry caught my eye.

P.P.P.S. My favorite part of this whole story is Russell's-paradox-evoking thread centered around Adams's self-contradicting statement, "You're talking about Scott Adams. He's not talking about you."

Bayesian statistical pragmatism

Rob Kass's article on statistical pragmatism is scheduled to appear in Statistical Science along with some discussions. Here are my comments.

I agree with Rob Kass's point that we can and should make use of statistical methods developed under different philosophies, and I am happy to take the opportunity to elaborate on some of his arguments.

I'll discuss the following:
- Foundations of probability
- Confidence intervals and hypothesis tests
- Sampling
- Subjectivity and belief
- Different schools of statistics

Happy tax day!

Your taxes pay for the research funding that supports the work we do here, some of which appears on this blog and almost all of which is public, free, and open-source. So, to all of the taxpayers out there in the audience: thank you.

This announcement might be of interest to some of you. The application deadline is in just a few days:

The National Center for Complementary and Alternative Medicine at the National Institutes of Health is seeking an additional experienced statistician to join our Office of Clinical and Regulatory Affairs team. www.usajobs.gov is accepting applications through April 22, 2011 for the general announcement and April 21 for status (typically current federal employee) candidates. To apply to this announcement or for more information, click on the links provided below or the USAJobs link provided above and search for NIH-NCCAM-DE-11-448747 (external) or NIH-NCCAM-MP-11-448766 (internal).

You have to be a U.S. citizen for this one.

NYC 1950

Coming back from Chicago we flew right over Manhattan. Very impressive as always, to see all those buildings so densely packed. But think of how impressive it must have seemed in 1950! The world had a lot less of everything back in 1950 (well, we had more oil in the ground, but that's about it), so Manhattan must have just seemed amazing. I can see how American leaders of that period could've been pretty smug. Our #1 city was leading the world by so much, it was decades ahead of its time, still impressive even now after 60 years of decay.

A few years ago Larry Bartels presented this graph, a version of which later appeared in his book Unequal Democracy:

larry2.png

Larry looked at the data in a number of ways, and the evidence seemed convincing that, at least in the short term, the Democrats were better than Republicans for the economy. This is consistent with Democrats' general policies of lowering unemployment, as compared to Republicans lowering inflation, and, by comparing first-term to second-term presidents, he found that the result couldn't simply be explained as a rebound or alternation pattern.

The question then arose, why have the Republicans won so many elections? Why aren't the Democrats consistently dominating? Non-economic issues are part of the story, of course, but lots of evidence shows the economy to be a key concern for voters, so it's still hard to see how, with a pattern such as shown above, the Republicans could keep winning.

Larry had some explanations, largely having to do with timing: under Democratic presidents the economy tended to improve at the beginning of the four-year term, while gains under Republicans tended to occur in years 3 and 4--just in time for the next campaign!

See here for further discussion (from five years ago) of Larry's ideas from the perspective of the history of the past 60 years.

Enter Campbell

Jim Campbell recently wrote an article, to appear this week in The Forum (the link should become active once the issue is officially published), claiming that Bartels is all wrong--or, more precisely, that Bartels's finding of systematic differences in performance between Democratic and Republican presidents is not robust and goes away when you control for the economic performance leading into a president's term.

Here's Campbell:

Previous estimates did not properly take into account the lagged effects of the economy. Once lagged economic effects are taken into account, party differences in economic performance are shown to be the effects of economic conditions inherited from the previous president and not the consequence of real policy differences. Specifically, the economy was in recession when Republican presidents became responsible for the economy in each of the four post-1948 transitions from Democratic to Republican presidents. This was not the case for the transitions from Republicans to Democrats. When economic conditions leading into a year are taken into account, there are no presidential party differences with respect to growth, unemployment, or income inequality.

For example, using the quarterly change in GDP measure, the economy was in free fall in Fall 2008 but in recovery during the third and fourth quarters of 2009, so this counts as Obama coming in with a strong economy. (Campbell emphasizes that he is following the lead of Bartels and counting a president's effect on the economy to not begin until year 2.)

It's tricky. Bartels's claims are not robust to changes in specifications, but Campbell's conclusions aren't completely stable either. Campbell finds one thing if he controls for previous year's GNP growth but something else if he controls only for GNP growth in the 3rd and 4th quarter of the previous year. This is not to say Campbell is wrong but just to say that any atheoretical attempt to throw in lags can result in difficulty in interpretation.
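To make the specification issue concrete: the two versions differ only in which lag enters the regression. Here's a toy illustration in R with simulated data and hypothetical variable names; this is not Campbell's or Bartels's actual analysis, just the shape of the comparison.

```r
# Simulated annual data: growth, president's party, and two candidate lag controls
set.seed(2011)
n <- 60
dat <- data.frame(party         = rep(c("D", "R"), length.out = n),
                  lag_full_year = rnorm(n),
                  lag_q3q4      = rnorm(n))
dat$growth <- 0.5 * dat$lag_q3q4 + rnorm(n)

# Same question, two lag controls; the estimated "party effect" need not agree
coef(summary(lm(growth ~ party + lag_full_year, data = dat)))["partyR", ]
coef(summary(lm(growth ~ party + lag_q3q4,      data = dat)))["partyR", ]
```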

I'm curious what Doug Hibbs thinks about all this; I don't know why, but to me Hibbs exudes an air of authority on this topic, and I'd be inclined to take his thoughts on these matters seriously.

What struck me the most about Campbell's paper was ultimately how consistent its findings are with Bartels's claims. This perhaps shouldn't be a surprise, given that they're working with the same data, but it did surprise me because their political conclusions are so different.

Here's the quick summary, which (I think) both Bartels and Campbell would agree with:

- On average, the economy did a lot better under Democratic than Republican presidents in the first two years of the term.

- On average, the economy did slightly better under Republican than Democratic presidents in years 3 and 4.

These two facts are consistent with the Hibbs/Bartels story (Democrats tend to start off by expanding the economy and pay the price later, while Republicans are more likely to start off with some fiscal or monetary discipline) and also consistent with Campbell's story (Democratic presidents tend to come into office when the economy is doing OK, and Republicans are typically only elected when there are problems).

But the two stories have different implications regarding the finding of Hibbs, Rosenstone, and others that economic performance in the last years of a presidential term predicts election outcomes. Under the Bartels story, voters are myopically chasing short-term trends, whereas in Campbell's version, voters are correctly picking up on the second derivative (that is, the trend in the change of the GNP from beginning to end of the term).

Consider everyone's favorite example: Reagan's first term, when the economy collapsed and then boomed. The voters (including Larry Bartels!) returned Reagan by a landslide in 1984: were they suckers for following a short-term trend or were they savvy judges of the second derivative?

I don't have any handy summary here--I don't see a way to declare a winner in the debate--but I wanted to summarize what seem to me to be the key points of agreement and disagreement in these very different perspectives on the same data.

One way to get leverage on this would be to study elections for governor and state economies. Lots of complications there, but maybe enough data to distinguish between the reacting-to-recent-trends and reacting-to-the-second-derivative stories.

P.S. See below for comments by Campbell.

Jonathan Chait writes:

Parties and candidates will kill themselves to move the needle a percentage point or two in a presidential race. And again, the fundamentals determine the bigger picture, but within that big picture political tactics and candidate quality still matters around the margins.

I agree completely. This is the central message of Steven Rosenstone's excellent 1983 book, Forecasting Presidential Elections.

So, given that Chait and I agree 100%, why was I so upset at his recent column on "The G.O.P.'s Dukakis Problem"?

I'll put the reasons for my displeasure below the fold because my main point is that I'm happy with Chait's quote above. For completeness I want to explain where I'm coming from but my take-home point is that we're mostly in agreement.

At the Statistics Forum, we highlight a debate about how statistics should be taught in high schools. Check it out and then please leave your comments there.

Scott "Dilbert" Adams has met Charlie Sheen and thinks he really is a superbeing. This perhaps relates to some well-known cognitive biases. I'm not sure what this one's called, but the idea is that Adams is probably overweighting his direct impressions: he saw Sheen-on-the-set, not Sheen-beating-his-wife. Also, everybody else hates Sheen, so Adams can distinguish himself by being tolerant, etc.

I'm not sure what this latter phenomenon is called, but I've noticed it before. When I come into a new situation and meet some person X, who everybody says is a jerk, and then person X happens to act in a civilized way that day, then there's a real temptation to say, Hey, X isn't so bad after all. It makes me feel so tolerant and above-it-all. Perhaps that's partly what's going on with Scott Adams here: he can view himself as the objective outsider who can be impressed by Sheen, not like all those silly emotional people who get hung up on the headlines. From here, though, it just makes Adams look silly, to be so impressed that Sheen didn't miss a line of dialogue, etc. The logical next step is the story of how he met John Edwards and was impressed at how statesmanlike he was.

A while ago I was cleaning out the closet and found some old unread magazines. Good stuff. As we've discussed before, lots of things are better read a few years late.

Today I was reading the 18 Nov 2004 issue of the London Review of Books, which contained (among other things) the following:

- A review by Jenny Diski of a biography of Stanley Milgram. Diski appears to want to debunk:

Milgram was a whiz at devising sexy experiments, but barely interested in any theoretical basis for them. They all have the same instant attractiveness of style, and then an underlying emptiness.

Huh? Michael Jordan couldn't hit the curveball and he was reportedly an easy mark for golf hustlers but that doesn't diminish his greatness on the basketball court.

She also criticizes Milgram for being "no help at all" for solving international disputes. OK, fine. I haven't solved any international disputes either. Milgram, though, . . . he conducted an imaginative experiment whose results stunned the world. And then in his afterlife he must suffer the indignity of someone writing that his findings are useless because people still haven't absorbed them. I agree with Diski that some theory might help, but it hardly seems to be Milgram's fault that he was ahead of his time.

- A review by Patrick Collinson of a biography of Anne Boleyn. Mildly interesting stuff, and no worse for being a few years delayed. Anne Boleyn isn't going anywhere.

- An article by Charles Glass on U.S. in Afghanistan. Apparently it was already clear in 2004 that it wasn't working. Too bad the policymakers weren't reading the London Review of Books. For me, though, it's even more instructive to see this foretold six years ago.

- A review by Wyatt Mason of a book by David Foster Wallace. Mason reviews in detail a story with a complicated caught-in-a-dream plot which the critic James Wood, writing for the New Republic, got completely wrong. Wood got a key plot point backwards and as a result misunderstands the story and blames Wallace for creating an unsympathetic character.

Again, the time lag adds an interesting twist. I was curious as to whether Wood ever acknowledged Mason's correction, or apologized to Wallace for misreading his story, so I Googled "james wood david foster wallace." What turned up was a report by James Yeh of a lecture by Wood at the 92nd St. Y on Wallace after the author's death. Discussing a later book by Wallace, Wood said, "Wallace gives you the key, overexplaining the hand, instead of actually being enigmatic, like Beckett."

I dunno: After reading Wood's earlier review, maybe Wallace felt he had to overexplain. Damned if you do, etc.

- A review by Hugh Pennington of some books about supermarkets that contains the arresting (to me) line:

Consumption [of chicken] in the US has increased steadily since Herbert Hoover's promise of 'a chicken in every pot' in 1928; it rose a hundredfold between 1934 and 1994, from a quarter of a chicken a year to half a chicken a week.

A hundredfold--that's a lot! I thought it best to look this one up so I Googled "chicken consumption usda" and came up with this document by Jean Buzby and Hodan Farah, which contains this delightfully-titled graph:

chicken.png

OK, so it wasn't a hundredfold increase, actually only sixfold. People were eating way more than a quarter of a chicken a year in 1934. And chicken consumption did not increase steadily since 1928. The curve is flat until the early 1940s.
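To be fair, the quote's own numbers do multiply out to roughly a hundredfold; it's the quarter-of-a-chicken-a-year starting point that's off. A quick arithmetic check:

```r
# Pennington's own figures: a quarter of a chicken per year in 1934 vs.
# half a chicken per week in 1994
(0.5 * 52) / 0.25   # about 104 -- so "a hundredfold" follows from his figures;
                    # the implausible part is the quarter-chicken-a-year start
```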

This got me curious: who is Hugh Pennington, exactly? In that issue of the LRB, it says he "sits on committees that advise the World Food Programme and the Food Standards Agency." I guess he was just having a bad day, or maybe his assistant gave him some bad figures. Too bad they didn't have Google back in 1994 or he could've looked up the numbers directly. "A hundredfold" . . . didn't that strike him as a big number??
