Recently in Sociology Category

I think I'm starting to resolve a puzzle that's been bugging me for a while.

Pop economists (or, at least, pop micro-economists) are often making one of two arguments:

1. People are rational and respond to incentives. Behavior that looks irrational is actually completely rational once you think like an economist.

2. People are irrational and they need economists, with their open minds, to show them how to be rational and efficient.

Argument 1 is associated with "why do they do that?" sorts of puzzles. Why do they charge so much for candy at the movie theater, why are airline ticket prices such a mess, why are people drug addicts, etc. The usual answer is that there's some rational reason for what seems like silly or self-destructive behavior.

Argument 2 is associated with "we can do better" claims such as why we should fire 80% of public-school teachers or Moneyball-style stories about how some clever entrepreneur has made a zillion dollars by exploiting some inefficiency in the market.

The trick is knowing whether you're gonna get 1 or 2 above. They're complete opposites!

Our story begins . . .

Here's a quote from Steven Levitt:

One of the easiest ways to differentiate an economist from almost anyone else in society is to test them with repugnant ideas. Because economists, either by birth or by training, have their mind open, or skewed in just such a way that instead of thinking about whether something is right or wrong, they think about it in terms of whether it's efficient, whether it makes sense. And many of the things that are most repugnant are the things which are indeed quite efficient, but for other reasons -- subtle reasons, sometimes, reasons that are hard for people to understand -- are completely and utterly unacceptable.

As statistician Mark Palko points out, Levitt is making an all-too-convenient assumption that people who disagree with him are disagreeing because of closed-mindedness. Here's Palko:

There are few thoughts more comforting than the idea that the people who disagree with you are overly emotional and are not thinking things through. We've all told ourselves something along these lines from time to time.

I could add a few more irrational reasons to disagree with Levitt: political disagreement (on issues ranging from abortion to pollution) and simple envy at Levitt's success. (It must make the haters even more irritated that Levitt is, by all accounts, amiable, humble, and a genuinely nice guy.) In any case, I'm a big fan of Freakonomics.

But my reaction to reading the above Levitt quote was to think of the puzzle described at the top of this entry. Isn't it interesting, I thought, that Levitt is identifying economists as rational and ordinary people as irrational. That's argument 2 above. In other settings, I think we'd hear him saying how everyone responds to incentives and that what seems like "efficiency" to do-gooding outsiders is actually not efficient at all. The two different arguments get pulled out as necessary.

The set of all sets that don't contain themselves

Which in turn reminds me of this self-negating quote from Levitt protégée Emily Oster:

anthropologists, sociologists, and public-health officials . . . believe that cultural differences--differences in how entire groups of people think and act--account for broader social and regional trends. AIDS became a disaster in Africa, the thinking goes, because Africans didn't know how to deal with it.

Economists like me [Oster] don't trust that argument. We assume everyone is fundamentally alike; we believe circumstances, not culture, drive people's decisions, including decisions about sex and disease.

I love this quote for its twisted logic. It's Russell's paradox all over again. Economists are different from everybody else, because . . . economists "assume everyone is fundamentally alike"! But if everyone is fundamentally alike, how is it that economists are different "from almost anyone else in society"? All we can say for sure is that it's "circumstances, not culture." It's certainly not "differences in how entire groups of people think and act"--er, unless these groups are economists, anthropologists, etc.

OK, fine. I wouldn't take these quotations too seriously; they're just based on interviews, not careful reflection. My impression is that these quotes come from a simple division of the world into good and bad things:

- Good: economists, rationality, efficiency, thinking the unthinkable, believing in "circumstances"

- Bad: anthropologists, sociologists, public-health officials, irrationality, being deterred by repugnant ideas, believing in "culture"

Good is entrepreneurs, bad is bureaucrats. At some point this breaks down. For example, if Levitt is hired by a city government to help reform its school system, is he a rational, taboo-busting entrepreneur (a good thing) or a culture-loving bureaucrat who thinks he knows better than everybody else (a bad thing)? As a logical structure, the division into Good and Bad has holes. But as emotionally-laden categories ("fuzzy sets," if you will), I think it works pretty well.

The solution to the puzzle

OK, now to return to the puzzle that got us started. How is it that economics-writers such as Levitt are so comfortable flipping back and forth between argument 1 (people are rational) and argument 2 (economists are rational, most people are not)?

The key, I believe, is that "rationality" is a good thing. We all like to associate with good things, right? Argument 1 has a populist feel (people are rational!) and argument 2 has an elitist feel (economists are special!). But both are ways of associating oneself with rationality. It's almost like the important thing is to be in the same room with rationality; it hardly matters whether you yourself are the exemplar of rationality, or whether you're celebrating the rationality of others.

Conclusion

I'm not saying that arguments based on rationality are necessarily wrong in particular cases. (I can't very well say that, given that I wrote an article on why it can be rational to vote.) I'm just trying to understand how pop-economics can so rapidly swing back and forth between opposing positions. And I think it's coming from the comforting presence of rationality and efficiency in both formulations. It's ok to distinguish economists from ordinary people (economists are rational and think the unthinkable, ordinary people don't) and it's also ok to distinguish economists from other social scientists (economists think ordinary people are rational, other social scientists believe in "culture"). You just have to be careful not to make both arguments in the same paragraph.

P.S. Statisticians are special because, deep in our bones, we know about uncertainty. Economists know about incentives, physicists know about reality, movers can fit big things in the elevator on the first try, evolutionary psychologists know how to get their names in the newspaper, lawyers know you should never never never talk to the cops, and statisticians know about uncertainty. Of that, I'm sure.

Faux-antique


Ole Rogeberg writes:

Here's a blogpost regarding a new paper (embellished with video and an essay) where a colleague and I try to come up with an explanation for why the discipline of economics ends up generating weird claims such as those you've blogged on previously regarding rational addiction.

From Ole's blog:

The puzzle that we try to explain is this frequent disconnect between high-quality, sophisticated work in some dimensions, and almost incompetently argued claims about the real world on the other. . . .

Our explanation can be put in terms of the research process as an "evolutionary" process: Hunches and ideas are turned into models and arguments and papers, and these are "attacked" by colleagues who read drafts, attend seminars, perform anonymous peer-reviews or respond to published articles. Those claims that survive this process are seen as "solid" and "backed by research." If the "challenges" facing some types of claims are systematically weaker than those facing other types of claims, the consequence would be exactly what we see: Some types of "accepted" claims would be of high standard (e.g., formal, theoretical models and certain types of statistical fitting) while other types of "accepted claims" would be of systematically lower quality (e.g., claims about how the real world actually works or what policies people would actually be better off under).

In our paper, we pursue this line of thought by identifying four types of claims that are commonly made - but that require very different types of evidence (just as the Pythagorean theorem and a claim about the permeability of shale rock would be supported in very different ways). We then apply this to the literature on rational addiction and argue that this literature has extended theory and that, to some extent, it is "as if" the market data was generated by these models. However, we also argue that there is (as good as) no evidence that these models capture the actual mechanism underlying an addiction or that they are credible, valid tools for predicting consumer welfare under addictions. All the same - these claims have been made too - and we argue that such claims are allowed to piggy-back on the former claims provided these have been validly supported. We then discuss a survey mailed to all published rational addiction researchers, which provides indicative support for - or at least is consistent with - the claim that the "culture" of economics knows the relevant criteria for evaluating claims of pure theory and statistical fit better than it knows the relevant criteria for evaluating claims of causal or welfare "insight". . . .

If this explanation holds up after further challenges and research and refinement, it would also provide a way of changing things - simply by demanding that researchers state claims more explicitly and with greater precision, and that we start discussing different claims separately and using the evidence relevant to each specific one. Unsupported claims about the real world should not be something you're allowed to tag on at the end of a work as a treat for competently having done something quite unrelated.

Or, as Kaiser Fung puts it, "story time." (For a recent example, see the background behind the claim that "a raise won't make you work harder.")

This (Ole's idea) is just great: moving from criticism to a model and pointing the way forward to possible improvement.

According to a New York Times article, cognitive scientists Hugo Mercier and Dan Sperber have a new theory about rational argument: humans didn't develop it in order to learn about the world, we developed it in order to win arguments with other people. "It was a purely social phenomenon. It evolved to help us convince others and to be careful when others try to convince us."

Based on the NYT article, it seems that Mercier and Sperber are basically flipping around the traditional argument, which is that humans learned to reason about the world, albeit imperfectly, and learned to use language to convey that reasoning to others. These guys would suggest that it's the other way around: we learned to argue with others, and this has gradually led to the ability to actually make (and recognize) sound arguments, but only indirectly. The article says: "At least in some cultural contexts, this results in a kind of arms race towards greater sophistication in the production and evaluation of arguments," they write. "When people are motivated to reason, they do a better job at accepting only sound arguments, which is quite generally to their advantage."

Of course I have no idea if any of this is true, or even how to test it. But it's definitely true that people are often convinced by wrong or even crazy arguments, and they (we) are subject to confirmation bias and availability bias and all sorts of other systematic biases. One thing that bothers me especially is that a lot of people are simply indifferent to facts and rationality when making decisions. Mercier and Sperber have at least made a decent attempt to explain why people are like this.

Sanjay Srivastava reports:

Recently Ben Goldacre wrote about a group of researchers (Stuart Ritchie, Chris French, and Richard Wiseman) whose null replication of 3 experiments from the infamous Bem ESP paper was rejected by JPSP - the same journal that published Bem's paper.

Srivastava recognizes that JPSP does not usually publish replications but this is a different story because it's an anti-replication.

Here's the paradox:

- From a scientific point of view, the Ritchie et al. results are boring. To find out that there's no evidence for ESP . . . that adds essentially zero to our scientific understanding. What next, a paper demonstrating that pigeons can fly higher than chickens? Maybe an article in the Journal of the Materials Research Society demonstrating that diamonds can scratch marble but not the reverse??

- But from a science-communication perspective, the null replication is a big deal because it adds credence to my hypothesis that the earlier ESP claims arose from some sort of measurement error (which might be of interest for people doing straight psychology experiments using similar methods).

The rules of journal publication are all about scientific progress, but scientific journals are plugged into the news media, where the rules are different. My guess is that the JPSP editors thought the original Bem article was not real science even when they accepted it for publication, but they wanted to be open-minded and bend over backward to be fair. Sort of like what happened when Statistical Science published that notorious Bible Code paper back in 1994.

This news article by Ryan Tate is an amazing bit of sociology:

If you want Facebook to spend millions of dollars hiring you, it helps to be a talented engineer, as the New York Times today [18 May 2011] suggests. But it also helps to carouse with Facebook honchos, invite them to your dad's Mediterranean party palace, and get them introduced to your father's venture capital pals, like Sam Lessin did.

Lessin is the poster boy for today's Times story on Facebook "talent acquisitions." Facebook spent several million dollars to buy Lessin's drop.io, only to shut it down and put Lessin to work on internal projects. To the Times, Lessin is an example of how "the best talent" fetches tons of money these days. "Engineers are worth half a million to one million," a Facebook executive told the paper.

We'll let you in on a few things the Times left out: Lessin is not an engineer, but a Harvard social studies major and a former Bain consultant. His file-sharing startup drop.io was an also-ran competitor to the much more popular Dropbox, and was funded by a chum from Lessin's very rich childhood. Lessin's wealthy investment banker dad provided Facebook founder Mark Zuckerberg crucial access to venture capitalists in Facebook's early days. And Lessin had made a habit of wining and dining with Facebook executives for years before he finally scored a deal, including at a famous party he threw at his father's vacation home in Cyprus with girlfriend and Wall Street Journal tech reporter Jessica Vascellaro. (Lessin is well connected in media, too.) . . .

And the connections continue from there. (Click on the article above to get all the links.)

There are two interesting parts of the story. First, the network of connections and free money. It makes sense: if you're a rich businessman and your kid wants to follow in your footsteps, you'd help him out, right? Second, the way that the New York Times (and the news media in general) missed the story.

This is a perfect illustration of James Flynn's point that meritocracy is self-contradictory.

Also a good argument for a multiplicity of news sources.

Christakis-Fowler update


After I posted on Russ Lyons's criticisms of Nicholas Christakis and James Fowler's work on social networks, several people emailed in with links to related articles. (Nobody wants to comment on the blog anymore; all I get is emails.)

Here they are:

After posting on David Rubinstein's remarks on his "cushy life" as a sociology professor at a public university, I read these remarks by some of Rubinstein's colleagues at the University of Illinois, along with a response from Rubinstein.

Before getting to the policy issues, let me first say that I think it must have been so satisfying, first for Rubinstein and then for his colleagues (Barbara Risman, William Bridges, and Anthony Orum) to publish these notes. We all have people we know and hate, but we rarely have a good excuse for blaring our feelings in public. (I remember when I was up for tenure, I was able to read the outside letters on my case (it's a public university and they have rules), and one of the letter writers really hated my guts. I was surprised--I didn't know the guy well (the letters were anonymized but it was clear from context who the letter writer was) but the few times we'd met, he'd been cordial enough--but there you have it. He must have been thrilled to have the opportunity to write, for an audience, what he really thought about me.)

Anyway, reading Rubinstein's original article, it's clear that his feelings of alienation had been building up inside of him for oh I don't know how long, and it must have felt really great to tell the world how fake he really felt in his job. And his colleagues seem to have detested him for decades but only now have the chance to splash this all out in public. Usually you just don't have a chance.

Looking for a purpose in life

To me, the underlying issue in Rubinstein's article was his failure to find a purpose to his life at work. To go into the office, year after year, doing the very minimum to stay afloat in your classes, to be teaching Wittgenstein to a bunch of 18-year-olds who just don't care, to write that "my main task as a university professor was self-cultivation"--that's got to feel pretty empty.

In my remarks on Arrow's theorem (the weak form of Arrow's theorem is that any result can be published no more than five times; the strong form is that every result will be published five times), I meant no criticism of Bruno Frey, the author of the articles in question: I agree that it can be a contribution to publish in multiple places. Regarding the evaluation of contributions, it should be possible to give credit both for research and for communication. One problem is that communication is both under- and over-counted. It's undercounted in that we mostly get credit for original ideas, not for exposition; it's overcounted in that we need communication skills to publish in the top journals. But I don't think these two biases cancel out.

The real reason I'm bringing this up, though, is because Arrow's theorem happened to me recently, and in an interesting way. Here's the story.

Two years ago I was contacted by Harold Kincaid to write a chapter on Bayesian statistics for the Oxford Handbook of the Philosophy of the Social Sciences. I typically decline such requests because I don't know that people often read handbooks anymore, but in this case I said yes, because for about 15 years I'd been wanting to write something on the philosophy of Bayesian inference but had never gotten around to collecting my thoughts on the topic. While writing the article for Kincaid, I realized I'd like to reach a statistical audience also, so I enlisted the collaboration of Cosma Shalizi. After quite a bit of effort, we wrote an article that was promptly rejected by a statistics journal. We're now revising and I'm sure it will appear somewhere. (I liked the original a lot but the revision will be much better.)

In the meantime, though, we completed the chapter for the handbook. It overlaps with our journal article but we're aiming for different audiences.

Then came opportunity #3: I was asked if I wanted to contribute something to an online symposium on the philosophy of statistics. I took this as an opportunity to express my views as clearly and succinctly as possible. Again, there's overlap with the two previous papers but I felt that for some reason I was able to make my point more directly on this third try.

The symposium article is still under revision and I'll post it when it's done, but here's how the first draft begins:

Abstract

The frequentist approach to statistics is associated with a deductivist philosophy of science that follows Popper's doctrine of falsification. In contrast, Bayesian inference is associated with inductive reasoning and the idea that a model can be dethroned by a competing model but can never be falsified on its own.

The purpose of this article is to break these associations, which I think are incorrect and have been detrimental to statistical practice, in that they have steered falsificationists away from the very useful tools of Bayesian inference and have discouraged Bayesians from checking the fit of their models. From my experience using and developing Bayesian methods in social and environmental science, I have found model checking and falsification to be central in the modeling process.

1. The standard view of the philosophy of statistics, and its malign influence on statistical practice

Statisticians can be roughly divided into two camps, each with a clear alignment of practice and philosophy. I will divide some of the relevant adjectives into two columns:

Frequentist        Bayesian
Objective          Subjective
Procedures         Models
P-values           Bayes factors
Deduction          Induction
Falsification      Pr (model is true)

I shall call this the standard view of the philosophy of statistics and abbreviate it as S. The point of this article is that S is a bad idea and that one can be a better statistician--and a better philosopher--by picking and choosing among the two columns rather than simply choosing one.

In case anybody is wondering what we really spend our time talking about . . .

Siobhan Mattison pointed me to this. I'm just disappointed they didn't use my Fenimore Cooper line. Although I guess that reference wouldn't resonate much outside the U.S.

P.S. My guess was correct. See comments below. Actually, the reference probably wouldn't resonate so well among under-50-year-olds in the U.S. either. Sort of like the Jaycees story.

[Image: fdr.jpg -- toddler FDR in a dress, 1884]

That cute picture is of toddler FDR in a dress, from 1884. Jeanne Maglaty writes:

A Ladies' Home Journal article [or maybe from a different source, according to a commenter] in June 1918 said, "The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl." Other sources said blue was flattering for blonds, pink for brunettes; or blue was for blue-eyed babies, pink for brown-eyed babies, according to Paoletti.

In 1927, Time magazine printed a chart showing sex-appropriate colors for girls and boys according to leading U.S. stores. In Boston, Filene's told parents to dress boys in pink. So did Best & Co. in New York City, Halle's in Cleveland and Marshall Field in Chicago.

Today's color dictate wasn't established until the 1940s . . .

When the women's liberation movement arrived in the mid-1960s, with its anti-feminine, anti-fashion message, the unisex look became the rage . . . in the 1970s, the Sears, Roebuck catalog pictured no pink toddler clothing for two years. . . . Gender-neutral clothing remained popular until about 1985. . . .

Which reminds me of this delightfully ridiculous story.

John Cook discusses the John Tukey quote, "The test of a good procedure is how well it works, not how well it is understood." Cook writes:

At some level, it's hard to argue against this. Statistical procedures operate on empirical data, so it makes sense that the procedures themselves be evaluated empirically.

But I [Cook] question whether we really know that a statistical procedure works well if it isn't well understood. Specifically, I'm skeptical of complex statistical methods whose only credentials are a handful of simulations. "We don't have any theoretical results, but hey, it works well in practice. Just look at the simulations."

Every method works well on the scenarios its author publishes, almost by definition. If the method didn't handle a scenario well, the author would publish a different scenario.

I agree with Cook but would give a slightly different emphasis. I'd say that a lot of methods can work when they are done well. See the second meta-principle listed in my discussion of Efron from last year. The short story is: lots of methods can work well if you're Tukey. That doesn't necessarily mean they're good methods. What it means is that you're Tukey. I also think statisticians are overly impressed by the appreciation of their scientific collaborators. Just cos a Nobel-winning biologist or physicist or whatever thinks your method is great, it doesn't mean your method is in itself great. If Brad Efron or Don Rubin had come through the door bringing their methods, Mister Nobel Prize would probably have loved them too.

Second, and back to the original quote above, Tukey was notorious for developing methods that were based on theoretical models and then rubbing out the traces of the theory and presenting the methods alone. For example, the hanging rootogram makes some sense--if you think of counts as following Poisson distributions. This predilection of Tukey's makes a certain philosophical sense (see my argument a few months ago) but I still find it a bit irritating to hide one's traces even for the best of reasons.

Under the headline, "A Raise Won't Make You Work Harder," Ray Fisman writes:

To understand why it might be a bad idea to cut wages in recessions, it's useful to know how workers respond to changes in pay--both positive and negative changes. Discussion on the topic goes back at least as far as Henry Ford's "5 dollars a day," which he paid to assembly line workers in 1914. The policy was revolutionary at the time, as the wages were more than double what his competitors were paying. This wasn't charity. Higher-paid workers were efficient workers--Ford attracted the best mechanics to his plant, and the high pay ensured that employees worked hard throughout their eight-hour shifts, knowing that if their pace slackened, they'd be out of a job. Raising salaries to boost productivity became known as "efficiency wages."

So far, so good. Fisman then moves from history and theory to recent research:

How much gift exchange really matters to American bosses and workers remained largely a matter of speculation. But in recent years, researchers have taken these theories into workplaces to measure their effect on employee behavior.

In one of the first gift-exchange experiments involving "real" workers, students were employed in a six-hour library data-entry job, entering title, author, and other information from new books into a database. The pay was advertised as $12 an hour for six hours. Half the students were actually paid this amount. The other half, having shown up expecting $12 an hour, were informed that they'd be paid $20 instead. All participants were told that this was a one-time job--otherwise, the higher-paid group might work harder in hopes of securing another overpaying library gig.

The experimenters checked in every 90 minutes to tabulate how many books had been logged. At the first check-in, the $20-per-hour employees had completed more than 50 books apiece, while the $12-an-hour employees barely managed 40 each. In the second 90-minute stretch, the no-gift group maintained their 40-book pace, while the gift group fell from more than 50 to 45. For the last half of the experiment, the "gifted" employees performed no better--40 books per 90-minute period--than the "ungifted" ones.

The punchline, according to Fisman:

The goodwill of high wages took less than three hours to evaporate completely--hardly a prescription for boosting long-term productivity.

What I'm wondering is: How seriously should we use an experiment on one-shot student library jobs (or another study, in which short-term employees were rewarded "with a surprise gift of thermoses") to draw general conclusions such as "Raises don't make employees work harder"?

What I'm worried about here isn't causal identification--I'm assuming these are clean experiments--but the generalizability to the outside world of serious employment.

Fisman writes:

All participants were told that this was a one-time job--otherwise, the higher-paid group might work harder in hopes of securing another overpaying library gig.

This seems like a direct conflict between the goals of internal and external validity, especially given that one of the key reasons to pay someone more is to motivate them to work harder to secure continuation of the job, and to give them less incentive to spend their time looking for something new.

I'm not saying that the study Fisman cited is useless, just that I'm surprised that he's so careful to consider internal validity issues yet seems to have no problem extending the result to the whole labor force.

These are just my worries. Ray Fisman is an excellent researcher here at the business school at Columbia--actually, I know him and we've talked about statistics a couple times--and I'm sure he's thought about these issues more than I have. So I'm not trying to debunk what he's saying, just to add a different perspective.

Perhaps Fisman's b-school background explains why his studies all seem to be coming from the perspective of the employer: it's the employer who decides what to do with wages (perhaps "presenting the cut as a temporary measure and by creating at least the illusion of a lower workload") and the employees who are the experimental subjects.

Fisman's conclusion:

If we can find other ways of overcoming the simmering resentment that naturally accompanies wage cuts, workers themselves will be better for it in the long run.

The "we" at the beginning of the sentence does not seem to be the same as the "workers" at the end of the sentence. I wonder if there is a problem with designing policies in this unidirectional fashion.

A common reason for plagiarism is laziness: you want credit for doing something but you don't really feel like doing it--maybe you'd rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you.

Interestingly enough, we see that in many responses to plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn't credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work.

As I wrote last year, I like to think that directness and openness is a virtue in scientific writing. For example, clearly citing the works we draw from, even when such citing of secondary sources might make us appear less erudite. But I can see how some scholars might feel a pressure to cover their traces.

Wegman

Which brings us to Ed Wegman, whose defense against the plagiarism charges in that Computational Statistics and Data Analysis paper is as follows (from this report by John Mashey):

(a) In 2005, he and his colleagues needed "some boilerplate background on social networks" for a high-profile report for the U.S. Congress. But instead of getting an expert on social networks for this background, or even simply copying some paragraphs (suitably cited) from a textbook on the topic, he tasked a Ph.D. student, Denise Reeves, to prepare the boilerplate. Reeves was no expert: her knowledge of social networks came from having taken a short course on the topic. Reeves wrote the boilerplate "within a few days," and Wegman writes, "of course, I took that to be her original work."

(b) Wegman gave this boilerplate to a second student, Walid Sharabati, who included it in his Ph.D. dissertation "with only minor amendments." (I think he's saying Sharabati copied it.)

(c) Sharabati was a coauthor of the Computational Statistics and Data Analysis article. He took the material he'd copied from Reeves's report and stuck it into the CSDA article.

Now let's apply our theme of the day, laziness:

Why no Wegmania?


A colleague asks:

When I search the web, I find the story [of the article by Said, Wegman, et al. on social networks in climate research, which was recently bumped from the journal Computational Statistics and Data Analysis because of plagiarism] only on blogs, USA Today, and UPI. Why is that? Any idea why it isn't reported by any of the major newspapers?

Here's my answer:

1. USA Today broke the story. Apparently this USA Today reporter put a lot of effort into it. The NYT doesn't like to run a story that begins, "Yesterday, USA Today reported..."

2. To us it's big news because we're statisticians. [The main guy in the study, Edward Wegman, won the Founders Award from the American Statistical Association a few years ago.] To the rest of the world, the story is: "Obscure prof at an obscure college plagiarized an article in a journal that nobody's ever heard of." When a Harvard scientist paints black dots on white mice and says he's curing cancer, that's news. When Prof. Nobody retracts an article on social networks, that's not so exciting. True, there's the global warming connection. I think it's possible the story will develop further. If these statisticians get accused of lying to Congress, that could hit the papers.

Basically, plagiarism is exciting to academics but not so thrilling to the general public if no celebrities are involved. I expect someone at the Chronicle of Higher Education will pick up the story.

3. One more thing: newspapers like to report things that are clearly news: earthquakes, fires, elections, arrests, . . . If criminal charges come up or if someone starts suing, then I could see the court events as a hook on which to hang a news story.

Any other thoughts?

Baby name wizards


The other day I noticed a car with the improbable name of Nissan Rogue, from Darien, Connecticut (at least that's what the license plate frame said). And, after all, what could be more "rogue"-like than a suburban SUV?

I can't blame the driver of the car for this one; I'm just amused that the marketers at Nissan thought this was an appropriate name for the car.

Duncan Watts gave his new book the title Everything Is Obvious (Once You Know the Answer), reflecting his irritation with those annoying people who, upon hearing of the latest social science research, reply with: Duh-I-knew-that. (I don't know how to say Duh in Australian; maybe someone can translate that for me?) I, like Duncan, am easily irritated, and I looked forward to reading the book. I enjoyed it a lot, even though it has only one graph, and that graph has a problem with its y-axis. (OK, the book also has two diagrams and a graph of fake data, but that doesn't count.)

Before going on, let me say that I agree wholeheartedly with Duncan's central point: social science research findings are often surprising, but the best results cause us to rethink our world in such a way that they seem completely obvious, in retrospect. (Don Rubin used to tell us that there's no such thing as a "paradox": once you fully understand a phenomenon, it should not seem paradoxical any more. When learning science, we sometimes speak of training our intuitions.) I've jumped to enough wrong conclusions in my applied research to realize that lots of things can seem obvious but be completely wrong. In his book, Duncan does a great job at describing several areas of research with which he's been involved, explaining why this research is important for the world (not just a set of intellectual amusements) and why it's not as obvious as one might think at first.

I encountered this news article, "Chicago school bans some lunches brought from home":

At Little Village, most students must take the meals served in the cafeteria or go hungry or both. . . . students are not allowed to pack lunches from home. Unless they have a medical excuse, they must eat the food served in the cafeteria. . . . Such discussions over school lunches and healthy eating echo a larger national debate about the role government should play in individual food choices. "This is such a fundamental infringement on parental responsibility," said J. Justin Wilson, a senior researcher at the Washington-based Center for Consumer Freedom, which is partially funded by the food industry. . . . For many CPS parents, the idea of forbidding home-packed lunches would be unthinkable. . . .

If I had read this two years ago, I'd be at one with J. Justin Wilson and the outraged kids and parents. But last year we spent a sabbatical in Paris, where . . . kids aren't allowed to bring lunches to school. The kids who don't go home for lunch have to eat what's supplied by the lunch ladies in the cafeteria. And it's just fine. Actually, it was more than fine because we didn't have to prepare the kids' lunches every day. When school let out, the kids would run to the nearest boulangerie and get something sweet. So they didn't miss out on the junk food either.

I'm not saying the U.S. system or the French system is better, nor am I expressing an opinion on how they do things in Chicago. I just think it's funny how a rule which seems incredibly restrictive from one perspective is simply, for others, the way things are done. I'll try to remember this story next time I'm outraged at some intolerable violation of my rights.

P.S. If they'd had the no-lunches-from-home rule when I was a kid, I definitely would've snuck food into school. In high school the wait for lunchtime was interminable.

See more at the Statistics Forum (of course).

Jake Porway writes:

We launched Openpaths the other week. It's a site where people can privately upload and view their iPhone location data (at least until an Apple update wipes it out) and also download their data for their own use. More than just giving people a neat tool to view their data with, however, we're also creating an option for them to donate their data to research projects at varying levels of anonymity. We're still working out the terms for that, but we'd love any input and to get in touch with anyone who might want to use the data.

I don't have any use for this personally but maybe it will interest some of you.

From the webpage:

Mark Chaves sent me this great article on religion and religious practice:

After reading a book or article in the scientific study of religion, I [Chaves] wonder if you ever find yourself thinking, "I just don't believe it." I have this experience uncomfortably often, and I think it's because of a pervasive problem in the scientific study of religion. I want to describe that problem and how to overcome it.

The problem is illustrated in a story told by Meyer Fortes. He once asked a rainmaker in a native culture he was studying to perform the rainmaking ceremony for him. The rainmaker refused, replying: "Don't be a fool, whoever makes a rain-making ceremony in the dry season?"

The problem is illustrated in a different way in a story told by Jay Demerath. He was in Israel, visiting friends for a Sabbath dinner. The man of the house, a conservative rabbi, stopped in the middle of chanting the prayers to say cheerfully: "You know, we don't believe in any of this. But then in Judaism, it doesn't matter what you believe. What's important is what you do."

And the problem is illustrated in yet another way by the Divinity School student who told me not long ago that she was having second thoughts about becoming an ordained minister in the United Church of Christ because she didn't believe in God. She also mentioned that, when she confided this to several UCC ministers, they told her not to worry about it since not believing in God wouldn't make her unusual among UCC clergy.

This last story reminds me of the saying, "It doesn't matter if you believe in God. What matters is if God believes in you."

Also, on a more serious note, I had a friend of a friend who joined a Roman Catholic religious order--she became a nun--because, according to my friend, this person was "looking for Sister Right." (In addition to everything else, she was a lesbian.) A couple of years later she quit, I believe. I have the impression that the generally positive press received by nuns etc. in our culture gives certain naive and idealistic people false expectations of what they can achieve in such a position. (This is true of academia too, I'm sure!)

Here's Chaves's summary:

Religious congruence refers to consistency among an individual's religious beliefs and attitudes, consistency between religious ideas and behavior, and religious ideas, identities, or schemas that are chronically salient and accessible to individuals across contexts and situations. Decades of anthropological, sociological, and psychological research establish that religious congruence is rare, but much thinking about religion presumes that it is common. The religious congruence fallacy [emphasis added] occurs when interpretations or explanations unjustifiably presume religious congruence.

This reminds me of a corresponding political congruence fallacy. My impression is that many people have a personal feeling of political congruence--they feel that all their political views form a coherent structure--even though the perceived-congruent views of person X will only partially overlap the perceived-congruent views of person Y. For example, X can be a Democrat and support legalized gambling, while Y is a Republican who supports legalized gambling, while persons A and B are a Democrat and a Republican who oppose gambling. Four positions, but each has a story of why they are coherent. (For example, X supports gambling as a way of raising tax money, Y supports gambling because he opposes the nanny state, A opposes gambling as a tax on the poor, and B opposes gambling as immoral.)

I've felt for a while that this phenomenon, in which each of us can frame our particular beliefs as being coherent, creates problems for politics. People are just too damn sure of themselves.

On another point, Chaves's discussion of placebo effects reminded me of my irritation with the research on the medical effects of so-called intercessory prayer (person A prays for person B, with B being unaware of the prayer). Every once in a while someone does a study on intercessory prayer which manages to reach the statistical significance threshold and gets published (I can only imagine that secular journal editors bend over backward to accept such papers and are terrified of appearing anti-religion) and gets mentioned in the more credulous or sensationalist quarters of the popular press.

What irritates me about these intercessory prayer studies is not that I care so much about prayer but because such studies seem to me to be a pseudo-scientific effort to remove the part of prayer that can actually work. It's plausible enough from a scientific (i.e., non-supernatural) perspective that if A prays for B with B's knowledge, this could make B feel better. I doubt it could fix a broken heart valve but perhaps it could be calming enough that a certain heart attack might never happen. This makes sense and is, to my mind, perfectly consistent with a religious interpretation--why couldn't God work through the mechanism of friendship and caring? To me, the studies on intercessory prayer, by trying to isolate the supernatural aspect, end up removing the most interesting part of the story. In the language of Chaves's article, I'd call this an example of the congruence fallacy, the idea that the way to prove the effectiveness of prayer is to treat it as some sort of button-pushing.

I mentioned this above point to Chaves and he wrote:

I agree! Though this is maybe insulting only to those who self-consciously think of themselves as religious while also explicitly rejecting any sort of supernaturalism, and this type of person has become rarer in American society, which is part of the story behind the collapse of liberal Protestantism and the rise of religious "nones." I think it's easier for Jews than Christians to achieve and feel comfortable with this sort of self-conscious liberal religiosity, perhaps because of the ethnic identity aspects of being Jewish.

Interesting. I hadn't thought of that.

Chaves also pointed me to this article by Wendy Cadge.

Finally, regarding the WWJD bracelet etc. (no, that's not the same WWJD as our motto here in the Applied Statistics Center!), there's something Chaves implies but doesn't say, which is that presumably the wearing of the bracelet is, in economists' jargon, "endogenous": the bracelet is intended to be part of a commitment device, so the "treatment" is not really the bracelet-wearing but rather the entire constellation of thoughts and behaviors associated with the decision to live a better life.

Bechdel wasn't kidding


Regular readers of this blog know about the Bechdel test for movies:

1. It has to have at least two women in it
2. Who talk to each other
3. About something besides a man

Amusing, huh? But I only really got the point the other day, when I was on a plane and passively watched parts of the in-flight movie. It was something I'd never heard of (of course) and it happened to be a chick flick--even without the soundtrack, it was clear that the main character was a woman and much of it was about her love life. But even this movie failed the Bechdel test miserably! I don't even think it passed item #1 above, but if it did, it certainly failed #2.

If even the chick flicks are failing the Bechdel test, then, yeah, we're really in trouble. And don't get me started on those old Warner Brothers cartoons. They're great but they feature about as many female characters as the average WWII submarine. Sure, everybody knows this, but it's still striking to think about just how unbalanced these things are.

Howard Wainer writes in the Statistics Forum:

The Chinese scientific literature is rarely read or cited outside of China. But the authors of this work are usually knowledgeable of the non-Chinese literature -- at least the A-list journals. And so they too try to replicate the alpha finding. But do they? One would think that they would find the same diminished effect size, but they don't! Instead they replicate the original result, even larger. Here's one of the graphs:

How did this happen?

Full story here.

I've heard from various sources that when you give a talk in an econ dept, they eat you alive: typically the audience showers you with questions and you are lucky to get past the second slide in your presentation. So far, though, I've given seminar talks in three economics departments--George Mason University a few years ago, Sciences Po last year, and Hunter College yesterday--and all three times the audiences have been completely normal. They did not interrupt unduly and they asked a bunch of good questions at the end. n=3, sure. But still.

Of beauty, sex, and power: Statistical challenges in estimating small effects.

Thurs 5 May at 11am at Roosevelt House, at 47-49 East 65th Street (north side of East 65th street, between Park and Madison Avenues).

Arrow's other theorem


I received the following email from someone who'd like to remain anonymous:

Lately I [the anonymous correspondent] witnessed that Bruno Frey has published two articles in two well-known refereed journals on the Titanic disaster that try to explain survival rates of passengers on board.

The articles were published in the Journal of Economic Perspectives and Rationality & Society. While looking up the name of the second journal, where I stumbled across the article, I even saw that they had put the message in a third journal, the Proceedings of the National Academy of Sciences of the United States of America.

To say it in Sopranos-like style: with all due respect, I know Bruno Frey from conferences, I really appreciate his take on economics as a social science, and he has really published more interesting stuff than most economists ever will. But putting the same message into three journals gives me headaches for at least two reasons:

1) When building a track record and scientific reputation, it's publish or perish. What about young scholars who may have interesting stuff to say but get rejected for (sometimes) obscure reasons, especially if they have innovative ideas that run against the mainstream? Meanwhile, acceptance is granted to papers with identical messages in three journals, which both causes congestion in the review procedures and biases acceptance: assuming journals preserve exclusivity by sticking to low or constant acceptance rates, the two articles that are not entirely unique mean that two other manuscripts will be rejected. Do you see this as a problem? Or is the main point against this argument that if the other papers were good enough, they would have been published anyway?

2) As an author one usually gets the questions "are the results published in another journal?" (and therefore not original) or "is this paper under review in another journal?" In their case the answer should be no to both questions, as they report different results and use different methods in every paper. But if you check the descriptive statistics in the papers, they are awkwardly similar. At what point does the overlap in content become large enough that these questions really cause problems for authors? Have you ever heard of any stories about double publications that were not authorized reprints or translations into other languages (which usually should not be problematic, as shown, by the way, in Frey's publication list) and had to be withdrawn? Barely happens, I guess.

Best regards and thank you for providing an open forum to discuss stuff like that.

I followed the links and read the abstracts. The three papers do indeed seem to describe similar work. But the abstracts are in remarkably different styles. The Rationality and Society abstract is short and doesn't say much. The Journal of Economic Perspectives abstract is long with lots of detail but, oddly, no conclusions! This abstract has the form of a movie trailer: lots of explosions, lots of drama, but no revealing of the plot. Finally, here's the PNAS abstract, which tells us what they found:

To understand human behavior, it is important to know under what conditions people deviate from selfish rationality. This study explores the interaction of natural survival instincts and internalized social norms using data on the sinking of the Titanic and the Lusitania. We show that time pressure appears to be crucial when explaining behavior under extreme conditions of life and death. Even though the two vessels and the composition of their passengers were quite similar, the behavior of the individuals on board was dramatically different. On the Lusitania, selfish behavior dominated (which corresponds to the classical homo economicus); on the Titanic, social norms and social status (class) dominated, which contradicts standard economics. This difference could be attributed to the fact that the Lusitania sank in 18 min, creating a situation in which the short-run flight impulse dominated behavior. On the slowly sinking Titanic (2 h, 40 min), there was time for socially determined behavioral patterns to reemerge. Maritime disasters are traditionally not analyzed in a comparative manner with advanced statistical (econometric) techniques using individual data of the passengers and crew. Knowing human behavior under extreme conditions provides insight into how widely human behavior can vary, depending on differing external conditions.

Interesting. My only quibble here is with the phrase "selfish rationality," which comes up in the very first sentence. As Aaron Edlin, Noah Kaplan, and I have stressed, rationality doesn't have to imply selfishness, and selfishness doesn't have to imply rationality. One can achieve unselfish goals rationally. For example, if I decide not to go on a lifeboat, I can still work to keep the peace and to efficiently pack people onto existing lifeboat slots. I don't think this comment of mine affects the substance of the Frey et al. papers; it's just a slight change of emphasis.

Regarding the other question, of how could the same paper be published three times, my guess is that a paper on the Titanic can partly get published for its novelty value: even serious journals like to sometimes run articles on offbeat topics. I wouldn't be surprised if the editors of each journal thought: Hey, this is fun. We don't usually publish this sort of thing, but, hey, why not? And then it appeared, three times.

How did this happen? Arrow's theorem. Let me explain.

Matthew Yglesias shares this graph from the Economist:

[Image: uglyassgraph.gif -- the Economist's time-use chart]

I hate this graph. OK, sure, I don't hate hate hate hate it: it's not a 3-d exploding pie chart or anything. It's not misleading, it's just extremely difficult to read. Basically, you have to go back and forth between the colors and the labels and the countries and read it like a table. OK, so here's the table:

Average Hours Per Day Spent in Each Activity

           Work,   Unpaid  Eating, Personal
Country    study    work  sleeping   care   Leisure  Other

France       4        3       11       1       2       2
Germany      4        3       10       1       3       3
Japan        6        2       10       1       2       2
Britain      4        3       10       1       3       3
USA          5        3       10       1       3       2
Turkey       4        3       11       1       3       2

Hmm, that didn't work too well. Let's try subtracting the average from each column (for these six countries, the average (unweighted by population) times spent are 4.6 hours on paid work and study, 3.1 hours on unpaid work, 10.2 hours eating and sleeping, etc.); a quick R sketch of this step appears after the table:

% Excess Hours Per Day Spent in Each Activity
(compared to avg over all countries)

           Work,   Unpaid  Eating, Personal
Country    study    work  sleeping   care   Leisure  Other

France     -10%       0%    +10%    +50%     -20%    -20%
Germany    -10%       0%      0%    -10%     +10%    +20%
Japan      +40%     -20%      0%      0%     -20%      0%
Britain      0%       0%      0%    -10%     +10%    +10%
USA          0%       0%      0%    -20%     +10%      0%
Turkey     -10%      10%      0%    -20%     +10%    -10%
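Here's the promised minimal R sketch of that subtract-the-column-average step. To be clear, this isn't the script behind the table above (and the variable names hours and excess are just made up here); it re-enters the rounded hours from the first table, so the percentages it prints won't exactly match the table, which was computed from the unrounded data.

# Rounded hours from the first table; not the Economist's raw data.
hours <- rbind(
  France  = c(4, 3, 11, 1, 2, 2),
  Germany = c(4, 3, 10, 1, 3, 3),
  Japan   = c(6, 2, 10, 1, 2, 2),
  Britain = c(4, 3, 10, 1, 3, 3),
  USA     = c(5, 3, 10, 1, 3, 2),
  Turkey  = c(4, 3, 11, 1, 3, 2))
colnames(hours) <- c("Work, study", "Unpaid work", "Eating, sleeping",
                     "Personal care", "Leisure", "Other")
avg <- colMeans(hours)                    # unweighted average across countries
excess <- sweep(hours, 2, avg, "/") - 1   # proportional deviation from the average
round(100 * excess)                       # rough percent excess per country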

OK, the Japanese spend more time at work and the French spend more time grooming. Beyond that, I don't see these numbers as particularly "stereotype confirming" (in Yglesias's words). But I'm not fully up on my pop culture. What is the stereotype about Turkish people? I have the impression that in Dashiell Hammett's day they were called "Turks" and the detective was likely to be waylaid by one of them in a dark alley (this counts as "other activities," I believe), but I'm sure there are some new stereotypes I'm not aware of. Blogging counts as "unpaid work," right?

Anyway, my first thought was that the above ugly graph should be redone as a line plot. Here's what I came up with after an hour of work (yeah, yeah, I must have a lot of real work to do if I'm willing to put in this level of procrastination. On the upside, I'm pretty high on the procrastination ladder if I spend an hour on an R script as a way of taking a break!):

[Image: times.png -- click to see the full-sized version.]

I could've done this a little better--in particular, the text is hard to read--but it's basically what I was envisioning. [See P.S. below for something better.] Also, I don't really know what to make of the ordering of the countries or the ordering of the categories on the x-axis--I just copied what the Economist graph did.

Why do I like my display better? I like it because you can directly compare within a country--to see which activities are done more and which are done less, compared to the average. And you can also compare between countries to see where people spend more time on any particular activity. This between-country comparison would be clearer if we put all the lines on the same graph, but that looks a bit busy to me and I'm happier with the separate line plots. If you had data on a lot of countries I could see batching them (for example, the lines for northern European countries on one plot, the lines for Southern European countries on another, and other plots for English-speaking countries, east Asian countries, south Asian countries, Middle Eastern/North Africa, sub-Saharan Africa, and Latin America).

I can see where the Economist's graphics designers were coming from with their plots. In any country, the categories add to 24 hours, and the circle plot enforces that constraint. (They could've made pie charts but everyone knows how bad that is.) But there are a lot of categories so they needed colors and a legend. And the circle arcs are hard to compare so they needed to put in the exact numbers. The result, though, doesn't work for me. I mean, sure, maybe it was fine--Matthew Yglesias is more in the target audience of the Economist than I am, and he liked the graph--but I think it could've been much better. And I'm sure that if a graphics designer worked with me on it, the graph could be better still.

At some point this would represent a bit too much effort spent on one particular graph in a weekly newspaper. But if we have enough good examples of these, they could represent a template that could be used all over.

P.S. I was dissatisfied with my graph above because the labels were hard to read. So I spent another 45 minutes to make this:

times3.png

Wow! All the information, it's clear and readable, and I got it in under 600 x 250 resolution on a png. I like it.

P.P.S. Here's the R code I used to make the graphs.

P.P.P.S. See here for yet another version.

Why Edit Wikipedia?

Zoe Corbyn's article for The Guardian (UK), titled Wikipedia wants more contributions from academics, and the followup discussion on Slashdot got me thinking about my own Wikipedia edits.

The article quotes Dario Taraborelli, a research analyst for the Wikimedia Foundation, as saying "Academics are trapped in this paradox of using Wikipedia but not contributing." Huh? I'm really wondering what man-in-the-street wrote all the great stats stuff out there. And what's the paradox? I use lots of things without contributing to them.

Taraborelli is further quoted as saying "The Wikimedia Foundation is looking at how it might capture expert conversation about Wikipedia content happening on other websites and feed it back to the community as a way of providing pointers for improvement."

This struck home. I recently went through the entry for latent Dirichlet allocation and found a bug in their derivation. I wrote up a revised derivation and posted it on my own blog.

But why didn't I go back and fix the Wikipedia? First, editing in their format is a pain. Second, as Corbyn's article points out, I was afraid I'd put in lots of work and my changes would be backed out. I wasn't worried that Wikipedia would erase whole pages, but apparently it's an issue for some these days. A real issue is that most of the articles are pretty good, and while they're not necessarily written the way I'd write them, they're good enough that I don't think it's worth rewriting the whole thing (also, see the second point).

If you're status conscious in a traditional way, you don't blog either. It's not what "counts" when it comes time for tenure and promotion. And if you think blogs, which are at least attributed, don't count, what about Wikipedia? Well, encyclopedia articles and such never counted for much on your CV. I did a few handbook-type things and then started turning them down, mainly because I'm not a big fan of the handbook format.

In that sense, it's just like teaching. I was told many times on the tenure track that I shouldn't be "wasting" so much time teaching. I was even told by a dean at a major midwestern university that they barely even counted teaching. So is it any surprise we don't want to focus on teaching or writing encyclopedia articles?

A common aphorism among artificial intelligence practitioners is that A.I. is whatever machines can't currently do.

Adam Gopnik, writing for the New Yorker, has a review called Get Smart in the most recent issue (4 April 2011). Ostensibly, the piece is a review of new books, one by Joshua Foer, Moonwalking with Einstein: The Art and Science of Remembering Everything, and one by Stephen Baker, Final Jeopardy: Man vs. Machine and the Quest to Know Everything (which would explain Baker's spate of Jeopardy!-related blog posts). But like many such pieces in highbrow magazines, the book reviews are just a cover for staking out a philosophical position. Gopnik does a typically New Yorker job in explaining the title of this blog post.

In my discussion of the dentists-named-Dennis study, I referred to my back-of-the-envelope calculation that the effect (if it indeed exists) corresponds to an approximate 1% aggregate chance that you'll pick a profession based on your first name. Even if there are nearly twice as many dentist Dennises as would be expected from chance alone, the base rate is so low that a shift of 1% of all Dennises would be enough to do this. My point was that (a) even a small effect could show up when looking at low-frequency events such as the choice to pick a particular career or live in a particular city, and (b) any small effects will inherently be difficult to detect in any direct way.
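
To spell out the arithmetic with made-up numbers (the real base rate and ratio are different, but the logic is the same):

p_base <- 0.005          # assumed baseline chance of becoming a dentist
ratio  <- 2              # suppose Dennises are nearly twice as likely
(ratio - 1) * p_base     # 0.005: only a small share of Dennises needs to be swayed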

Uri Simonsohn (the author of the recent rebuttal of the original name-choice article by Brett Pelham et al.) wrote:

Uh-oh

I don't know for sure, but I've long assumed that we get most of our hits from the link on the Marginal Revolution page. The bad news is that in their new design, they seem to have removed the blogroll!

Coauthorship norms

I followed this link from Chris Blattman to an article by economist Roland Fryer, who writes:

I [Fryer] find no evidence that teacher incentives increase student performance, attendance, or graduation, nor do I find any evidence that the incentives change student or teacher behavior.

What struck me were not the findings (which, as Fryer notes in his article, are plausible enough) but the use of the word "I" rather than "we." A field experiment is a big deal, and I was surprised to read that Fryer did it all by himself!

Here's the note of acknowledgments (on the first page of the article):

This project would not have been possible without the leadership and support of Joel Klein. I am also grateful to Jennifer Bell-Ellwanger, Joanna Cannon, and Dominique West for their cooperation in collecting the data necessary for this project, and to my colleagues Edward Glaeser, Richard Holden, and Lawrence Katz for helpful comments and discussions. Vilsa E. Curto, Meghan L. Howard, Won Hee Park, Jörg Spenkuch, David Toniatti, Rucha Vankudre, and Martha Woerner provided excellent research assistance.

Joel Klein was the schools chancellor so I assume he wasn't deeply involved in the study; his role was presumably to give it his OK. I'm surprised that none of the other people ended up as coauthors on the paper. But I guess it makes sense: My colleagues and I will write a paper based on survey data without involving the data collectors as coauthors, so why not do this with experimental data too? I guess I just find field experiments so intimidating that I can't imagine writing an applied paper on the topic without a lot of serious collaboration. (And, yes, I feel bad that it was only my name on the cover of Red State, Blue State, given that the book had five authors.) Perhaps the implicit rules about coauthorship are different in economics than in political science.

P.S. I was confused by one other thing in Fryer's article. On page 1, it says:

Despite these reforms to increase achievement, Figure 1 demonstrates that test scores have been largely constant over the past thirty years.

Here's Figure 1:

testtrends.png

Once you get around the confusingly-labeled lines and the mass of white space on the top and bottom of each graph, you see that math scores have improved a lot! Since 1978, fourth-grade math scores have gone up so much that they're halfway to where eighth grade scores were in 1978. Eighth grade scores also have increased substantially, and twelfth-grade scores have gone up too (although not by as much). Nothing much has happened with reading scores, though. Perhaps Fryer just forgot to add the word "reading" in the sentence above. Or maybe something else is going on in Figure 1 that I missed. I only wish that he'd presented the rest of his results graphically. Even a sloppy graph is a lot easier for me to follow than a table full of numbers presented to three decimal places. I know Fryer can do better; his previous papers had excellent graphs (see here and here).

Will Wilkinson adds to the discussion of Jonathan Haidt's remarks regarding the overwhelming prevalence of liberal or left-wing attitudes among psychology professors. I pretty much agree with Wilkinson's overview:

Folks who constantly agree with one another grow insular, self-congratulatory, and not a little lazy. The very possibility of disagreement starts to seem weird or crazy. When you're trying to do science about human beings, this attitude's not so great.

Wilkinson also reviewed the work of John Jost in this area. Jost is a psychology researcher with the expected liberal/left political leanings, but his relevance here is that he has actually done research on political attitudes and personality types. In Wilkinson's words:

Jost has done plenty of great work that helps explain not only why the best minds in science are liberal, but why most scientists-most academics, even-are liberal. Individuals with the personality trait that most strongly predicts an inclination toward liberal politics also predict an attraction to academic careers. That's why, as Haidt well notes, it's silly to expect the distribution of political opinion in academia to mirror the distribution of opinion in society at large. . . . one of the most interesting parts of Jost's work shows how personality, which is largely hereditary, predicts political affinity. Of the "Big Five" personality traits, "openness to experience" and "conscientiousness" stand out for their effects on political inclination. . . . the content of conservatism and liberalism changes over time. We live in a liberal and liberalizing culture, so today's conservatives, for example, are very liberal compared to conservatives of their grandparents' generation. But there is a good chance they inherited some of their tendency toward conservatism from grandparents.

University professors and military officers

The cleanest analogy, I think, is between college professors (who are disproportionately liberal Democrats) and military officers (mostly conservative Republicans; see this research by Jason Dempsey). In both cases there seems to be a strong connection between the environment and the ideology. Universities have (with some notable exceptions) been centers of political radicalism for centuries, just as the military has long been a conservative institution in most places (again, with some exceptions).

And this is true even though many university professors are well-paid, live well, and send their kids to private schools, and even though the U.S. military has been described as one of the few remaining bastions of socialism in the 21st century.

Responding to a proposal to move the journal Political Analysis from double-blind to single-blind reviewing (that is, authors would not know who is reviewing their papers but reviewers would know the authors' names), Tom Palfrey writes:

I agree with the editors' recommendation. I have served on quite a few editorial boards of journals with different blinding policies, and have seen no evidence that double blind procedures are a useful way to improve the quality of articles published in a journal. Aside from the obvious administrative nuisance and the fact that authorship anonymity is a thing of the past in our discipline, the theoretical and empirical arguments in both directions lead to an ambiguous conclusion. Also keep in mind that the editors know the identity of the authors (they need to know for practical reasons), their identity is not hidden from authors, and ultimately it is they who make the accept/reject decision, and also lobby their friends and colleagues to submit "their best work" to the journal. Bias at the editorial level is far more likely to affect publication decisions than bias at the referee level, and double blind procedures don't affect this. One could argue then that perhaps the main thing double blinding does is shift the power over journal content even further from referees and associate editors to editors. It certainly increases the informational asymmetry.

Another point of fact is that the use of double blind procedures in economics and political science shares essentially none of the justifications for it with the other science disciplines from which the idea was borrowed. In these other disciplines, like biology, such procedures exist for different (and good) reasons. Rather than a concern about biasing in favor of well-known versus lesser-known authors, in these other fields it is driven by a concern of bias because of the rat-race competition over a rapidly moving frontier of discovery. Because of the speed at which the frontier is moving, authors of new papers are intensely secretive (almost paranoid) about their work. Results are kept under wrap until the result has been accepted for publication - or in some cases until it is actually published. [Extra, Extra, Read All About It: PNAS article reports that Caltech astronomer Joe Shmoe discovered a new planet three months ago...] Double blind is indeed not a fiction in these disciplines. It is real, and it serves a real purpose. Consider the contrast with our discipline, in which many researchers drool over invitations from top places to present their newest results, even if the paper does not yet exist or is in very rough draft form. Furthermore, financial incentives for bias in these other disciplines are very strong, given the enormous stakes of funding. [Think how much a new telescope costs.] Basically none of the rationales for double blinding in those disciplines applies to political science. One final note. In those disciplines, editors are often "professional" editors. That is, they do not have independent research careers. This may have to do with the potential bias that results from intense competition in disciplines where financial stakes are enormous and the frontier of discovery moves at 'blinding' speed.

Tom's comparison of the different fields was a new point to me and it seems sensible.

I'd also add that I'm baffled by many people's attitudes toward reviewing articles for journals. As I've noted before, I don't think people make enough of the fact that editing and reviewing journal articles is volunteer work. Everyone's always getting angry at referees and saying what they should or should not do, but, hey--we're doing it for free. In this situation, I think it's important to get the most you can out of all participants.

John Cook links to a blog by Ben Deaton arguing that people often waste time trying to set up ideal working conditions, even though (a) your working conditions will never be ideal, and (b) the sorts of constraints and distractions you try to avoid can often stimulate new ideas.

Deaton seems like my kind of guy--for one thing, he works on nonlinear finite element analysis, which is one of my longstanding interests--and in many ways his points are reasonable and commonsensical (I have little doubt, for example, that Feynman made a good choice in staying clear of the Institute for Advanced Study!), but I have a couple of points of disagreement.

1. In my experience, working conditions can make a difference. And once you accept this, it could very well make sense to put some effort into improving your work environment. I like to say that I spent twenty years reconstructing what it felt like to be in grad school. My ideal working environment has lots of people coming in and out, lots of opportunities for discussion, planned and otherwise. It's nothing like I imagine the Institute for Advanced Study (not that I've ever been there), but it makes me happy. So I think Deaton is wrong to generalize from "don't spend time trying to keep a very clean work environment" to "don't spend time trying to get a setup that works for you."

2. Also consider effects on others. I like to feel that the efforts I put into my work environment have positive spillovers on others--the people I work with, the other people they work with, etc.--as well as setting an example for others in the department. In contrast, people who want super-clean work conditions (the sort of thing that Deaton, rightly, is suspicious of) can impose negative externalities on others. For example, one of the faculty in my department once removed my course listings from the department webpage. I never got a straight answer on why this happened, but I assumed it was because he didn't like what I taught, and it offended his sensibilities to see these courses listed. Removing the listing had the advantage, from his perspective, of cleanliness (I assume) but negatively impacted potential students and others who might have been interested in our course offerings. That is an extreme case, but I think many of us have experienced work environments in which intellectual interactions are discouraged in some way. This is clear from Deaton's stories as well.

3. Deaton concludes by asking his readers, "How ideal is ideal enough for you to do something great?" I agree with his point that there are diminishing returns to optimization and that you shouldn't let difficulties with your workplace stop you from doing good work (unless, of course, you're working somewhere where your employer gets possession of everything you do). But I am wary of his implicit statement that "you" (whoever you are) can "do something great." I think we should all try to do our best, and I'm sure that almost all of us are capable of doing good work. But is everyone out there really situated in a place where he or she can "do something great"? I doubt it. Doing something "great" is a fine aspiration, but I wonder if some of this go-for-it advice can backfire for the people out there who really aren't in a position to achieve greatness.

Statisticians vs. everybody else

Statisticians are literalists.

When someone says that the U.K. boundary commission's delay in redistricting gave the Tories an advantage equivalent to 10 percent of the vote, we're the kind of person who looks it up and claims that the effect is less than 0.7 percent.

When someone says, "Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm," we're like, Hey, really? And we go look that one up too.

And when someone says that engineers have more sons and nurses have more daughters . . . well, let's not go there.

So, when I was pointed to this blog by Michael O'Hare making the following claim, in the context of K-12 education in the United States:

My [O'Hare's] favorite examples of this junk [educational content with no workplace value] are spelling and pencil-and-paper algorithm arithmetic. These are absolutely critical for a clerk in an office of fifty years ago, but being good at them is unrelated to any real mental ability (what, for example, would a spelling bee in Chinese be?) and worthless in the world we live in now. I say this, by the way, aware that I am the best speller that I ever met (and a pretty good typist). But these are idiot-savant abilities, genetic oddities like being able to roll your tongue. Let's just lose them.

My first reaction was: Are you sure? I also have no systematic data on this, but I strongly doubt that being able to spell and add are "unrelated to any real mental ability" and are "genetic oddities like being able to roll your tongue." For one thing, people can learn to spell and add but I think it's pretty rare for anyone to learn how to roll their tongue! Beyond this, I expect that one way to learn spelling is to do a lot of reading and writing, and one way to learn how to add is to do a lot of adding (by playing Monopoly or whatever). I'd guess that these are indeed related to "real mental ability," however that is defined.

My guess is that, to O'Hare, my reactions would miss the point. He's arguing that schools should spend less time teaching kids spelling and arithmetic, and his statements about genetics, rolling your tongue, and the rest are just rhetorical claims. I'm guessing that O'Hare's view on the relation between skills and mental ability, say, is similar to Tukey's attitude about statistical models: they're fine as an inspiration for statistical methods (for Tukey) or as an inspiration for policy proposals (for O'Hare), but should not be taken literally. The things I write are full of qualifications, which might be a real hindrance if you're trying to propose policy changes.

Steve Hsu has posted a series of reflections here, here, and here on the dominance of graduates of HYPS (Harvard, Yale, Princeton, and Stanford (in that order, I believe)) in various Master-of-the-Universe-type jobs at "elite law firms, consultancies, and I-banks, hedge/venture funds, startups, and technology companies." Hsu writes:

In the real world, people believe in folk notions of brainpower or IQ. ("Quick on the uptake", "Picks things up really fast", "A sponge" ...) They count on elite educational institutions to do their g-filtering for them. . . .

Most top firms only recruit at a few schools. A kid from a non-elite UG school has very little chance of finding a job at one of these places unless they first go to grad school at, e.g., HBS, HLS, or get a PhD from a top place. (By top place I don't mean "gee US News says Ohio State's Aero E program is top 5!" -- I mean, e.g., a math PhD from Berkeley or a PhD in computer science from MIT -- the traditional top dogs in academia.) . . .

I teach at U Oregon and out of curiosity I once surveyed the students at our Honors College, which has SAT-HSGPA characteristics similar to Cornell or Berkeley. Very few of the kids knew what a venture capitalist or derivatives trader was. Very few had the kinds of life and career aspirations that are *typical* of HYPS or techer kids. . . .

I have just a few comments.

1. Getting into a top college is not the same as graduating from said college--and I assume you have to have somewhat reasonable grades (or some countervailing advantage). So, yes, the people doing the corporate hiring are using the educational institutions to do their "g-filtering," but it's not all happening at the admissions stage. Hsu quotes researcher Lauren Rivera as writing, "it was not the content of an elite education that employers valued but rather the perceived rigor of these institutions' admissions processes"--but I don't know if I believe that!

2. As Hsu points out (but maybe doesn't emphasize enough), the selection processes at these top firms don't seem to make a lot of sense even on their own terms. Here's another quote from Rivera: "[T]his halo effect of school prestige, combined with the prevalent belief that the daily work performed within professional service firms was 'not rocket science,' gave evaluators confidence that the possession of an elite credential was a sufficient signal of a candidate's ability to perform the analytical capacities of the job." The reasoning seems to be: the job isn't so hard, so the recruiters can hire whoever they want as long as they pass a moderately stringent IQ threshold, and thus they can pick the HYPS graduates they like. It seems like a case of the lexicographic fallacy: the idea that you first screen on IQ, using the school as a proxy, and only then consider clubbability, etc., among the subset of applicants who remain.

3. I should emphasize that academic hiring is far from optimal. We never know who's going to apply for our postdoc positions. And, when it comes to faculty hiring, I think Don Rubin put it best when he said that academic hiring committees all too often act as if they're giving out an award rather than trying to hire someone to do a job. And don't get me started on tenure review committees.

4. Regarding Hsu's last point above, I've long been glad that I went to MIT rather than Harvard, maybe not overall--I was miserable in most of college--but for my future career. Either place I would've taken hard classes and learned a lot, but one advantage of MIT was that we had no sense--no sense at all--that we could make big bucks. We had no sense of making moderately big bucks as lawyers, no sense of making big bucks working on Wall Street, and no sense of making really big bucks by starting a business. I mean, sure, we knew about lawyers (but we didn't know that a lawyer with technical skills would be a killer combination), we knew about Wall Street (but we had no idea what they did, other than shout pork belly prices across a big room), and we knew about tech startups (but we had no idea that they were anything to us beyond a source of jobs for engineers). What we were all looking for was a good solid job with cool benefits (like those companies in California that had gyms at the office). I majored in physics, which my friends who were studying engineering thought was a real head-in-the-clouds kind of thing to do, not really practical at all. We really had no sense that a physics degree from MIT with good grades was a hot ticket.

And it wasn't just us, the students, who felt this way. It was the employers too. My senior year I applied to some grad schools (in physics and in statistics) and to some jobs. I got into all the grad schools and got zero job interviews. Not just zero jobs. Zero interviews. And these were not at McKinsey, Goldman Sachs, etc. (none of which I'd heard of). They were places like TRW, etc. The kind of places that were interviewing MIT physics grads (which is how I thought of applying for these jobs in the first place). And after all, what could a company like that do with a kid with perfect physics grades from MIT? Probably not enough of a conformist, eh?

This was fine for me--grad school suited me just fine. I'm just glad that big-buck$ jobs weren't on my radar screen. I think I would've been tempted by the glamour of it all. If I'd gone to college 10 or 20 years later, I might have felt that as a top MIT grad, I had the opportunity--even the obligation, in a way--to become some sort of big-money big shot. As it was, I merely thought I had the opportunity and obligation to make important contributions in science, which is a goal that I suspect works better for me (and many others like me).

P.S. Hsu says that "much of (good) social science seems like little more than documenting what is obvious to any moderately perceptive person with the relevant life experience." I think he might be making a basic error here. If you come up with a new theory, you'll want to do two things: (a) demonstrate that it predicts things you already know, and (b) use it to make new predictions. To develop, understand, and validate a theory, you have to do a lot of (a)--hence Hsu's impression--in order to be ready to do (b).

A simpler response to Hsu is that it's common for "moderately perceptive persons with the relevant life experience" to disagree with each other. In my own field of voting and elections, even someone as renowned as Michael Barone (who is more than moderately perceptive and has much more life experience than I do) can still get things embarrassingly wrong. (My reflections on "thinking like a scientist" may be relevant here.)

P.P.S. Various typos fixed.

Dennis the dentist, debunked?

Devah Pager points me to this article by Uri Simonsohn, which begins:

Three articles published [by Brett Pelham et al.] have shown that a disproportionate share of people choose spouses, places to live, and occupations with names similar to their own. These findings, interpreted as evidence of implicit egotism, are included in most modern social psychology textbooks and many university courses. The current article successfully replicates the original findings but shows that they are most likely caused by a combination of cohort, geographic, and ethnic confounds as well as reverse causality.

From Simonsohn's article, here's a handy summary of the claims and the evidence:

simonsohn1.png

The Pelham et al. articles have come up several times on the blog, starting with this discussion and this estimate and then more recently here. I'm curious what Pelham and his collaborators think of Simonsohn's claims.

Andrew has pointed to Jonathan Livengood's analysis of the correlation between poverty and PISA results, whereby schools with poorer students get poorer test results. I'd have written a comment, but then I couldn't have inserted a chart.

Andrew points out that a causal analysis is needed. This reminds me of an intervention that has been done before: take a child out of poverty and bring him up in a better-off family. What's going to happen? There have been several studies examining the correlation of adopted children's IQ with the IQ of their adoptive and of their biological parents (treating IQ as analogous to the math and verbal test scores, and parents' IQ as analogous to the quality of instruction--but the point is in the analysis, not in the metric). This is the result (from Adoption Strategies by Robin P Corley in Encyclopedia of Life Sciences):

adoptive-birth.png

So, while the adoptive parents' IQ did make a difference at an early age, with increasing age of the adopted child it might not be making any difference whatsoever in the long run. At the same time, the high-IQ parents could have been raising their own child, and it would probably have taken the same amount of resources.

There are conscientious people who might choose not to have a child because they wouldn't be able to afford to provide to their own standard (their apartment is too small, for example, or they don't have enough security and stability while being a graduate student). On the other hand, people with less foresight might neglect this and impose their child on society without the means to provide for him. Is it good for society to ask the first group to pay taxes, and reallocate the funds to the second group? I don't know, but it's a very important question.

I am no expert, especially not in psychology, education, sociology or biology. Moreover, there is a lot more than just IQ: ethics and constructive pro-social behavior are probably more important, and might be explained a lot better by nurture than nature.

I do know that I get anxious whenever a correlation analysis tries to look like a causal analysis. A frequent scenario introduces an outcome (test performance) with a highly correlated predictor (say, poverty), and suggests that reducing poverty will improve the outcome. The problem is that poverty is correlated with a number of other predictors. One solution I have found is to recognize that the predictors' information about the outcome overlaps--a tool I use is interaction analysis, whereby we make explicit that two predictors' information overlaps (in contrast to regression coefficients, which misleadingly separate the contributions of each predictor). But the real solution is a study of interventions, and the twin and adoption studies with a longer time horizon are pretty rigorous. I'd be curious about similarly rigorous studies of educational interventions, or about the flaws in the twin and adoption studies.

[Feb 7, 8:30am] An email points out a potential flaw in the correlation analysis:


The thing which these people systematically missed, was that we don't really care at all about the correlation between the adopted child's IQ and that of the adopted parent. The right measure of effect is to look at the difference in IQ level.

Example to drive home the point: Suppose the IQ of every adoptive parent is 120, while the IQ of the biological parents is Normal(100,15), as is that of the biological control siblings, but that of the adopted children is Normal(110,15). The correlation between adopted children and adoptive parents would be exactly zero (because the adoptive parents are all so similar), but clearly adoption would have had a massive effect. And, yes, adoptive parents, especially in these studies, are very different from the norm, and similar to each other: I don't know about the Colorado study, but in the famous Minnesota twins study, the mean IQ of the adoptive fathers was indeed 120, as compared to a state average of 105.

The review paper you link to is, so far as I can tell, completely silent about these obvious-seeming points.

I would add that correlations are going to be especially misleading for causal inference in any situation where a variable is being regulated towards some goal level, because, if the regulation is successful, the correlation gets driven toward zero. It's like arguing that the temperature in my kitchen is causally irrelevant to the temperature in my freezer--it's uncorrelated, but only because a lot of complicated machinery does a lot of work to keep it that way! With that thought in mind, read this.


Indeed, the model based on correlation doesn't capture the improvement in average IQ that the adopted child gets from being brought up in a well-functioning family (as probably all adoptive families are) rather than in an orphanage or by unwilling or incapable biological parents (as arguably all children put up for adoption would otherwise be). And comments like these are precisely why we should discuss these topics systematically, so that better models can be developed and studied! As a European I am regularly surprised at how politicized this topic seems to be in the US. It's an important question that needs more rigor.
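
To make the emailer's stylized example concrete, here's a quick simulation in R (the numbers are the hypothetical ones above, with the adoptive parents given a tiny bit of spread so the correlation is defined):

set.seed(123)
n <- 1e5
adoptive_parent   <- rnorm(n, 120, 2)    # adoptive parents nearly identical to one another
biological_parent <- rnorm(n, 100, 15)
adopted_child     <- rnorm(n, 110, 15)   # drawn independently, as in the stylized example
mean(adopted_child) - mean(biological_parent)  # a big effect: about +10 IQ points
cor(adoptive_parent, adopted_child)            # yet the correlation is essentially zero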

Thanks for the emails and comments, they're the main reason why I still write these blog posts.

Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. (See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended:

The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study -- say, a correlation between a personality trait and the risk of depression -- is considered "significant" if its probability of occurring by chance is less than 5 percent.

This arbitrary cutoff makes sense when the effect being studied is a large one -- for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match ("red" in red letters) than when they do not ("red" in blue letters), and is very strong in almost everyone.

"But if the true effect of what you are measuring is small," said Andrew Gelman, a professor of statistics and political science at Columbia University, "then by necessity anything you discover is going to be an overestimate" of that effect.

The above description of classical hypothesis testing isn't bad. Strictly speaking, one would follow "is less than 5 percent" above with "if the null hypothesis of zero effect were actually true," but they have serious space limitations, and I doubt many readers would get much out of that elaboration, so I'm happy with what Carey put there.

One subtlety that he didn't quite catch was the way that researchers mix the Neyman-Pearson and Fisher approaches to inference. The 5% cutoff (associated with Neyman and Pearson) is indeed standard, and it is indeed subject to all the problems we know about, most simply that statistical significance occurs at least 5% of the time, so if you do a lot of experiments you're gonna have a lot of chances to find statistical significance. But p-values are also used as a measure of evidence: that's Fisher's approach and it leads to its own problems (as discussed in the news article as well).

The other problem, which is not so well known, comes up in my quote: when you're studying small effects and you use statistical significance as a filter and don't do any partial pooling, whatever is left standing after the filtering process will overestimate the true effect. And classical corrections for "multiple comparisons" do not solve the problem: they merely create a more rigorous statistical significance filter, but anything that survives that filter will be even more of an overestimate.
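
Here's a little simulation that shows the filtering problem (made-up numbers: a small true effect measured with a not-so-small standard error):

set.seed(42)
true_effect <- 0.1
se <- 0.2
estimates <- rnorm(10000, true_effect, se)   # many hypothetical noisy studies
significant <- abs(estimates / se) > 1.96    # the usual two-sided 5% filter
mean(estimates)                    # close to 0.1: the unfiltered estimates are unbiased
mean(abs(estimates[significant]))  # several times larger: the survivors are overestimates
                                   # (and some of them even have the wrong sign)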

If classical hypothesis testing is so horrible, how is it that it could be so popular? In particular, what was going on when a well-respected researcher like this ESP guy used inappropriate statistical methods?

My answer to Carey was to give a sort of sociological story, which went as follows.

Psychologists have experience studying large effects, the sort of study in which data from 24 participants is enough to estimate a main effect and 50 will be enough to estimate interactions of interest. I gave the example of the Stroop effect (they have a nice one of those on display right now at the Natural History Museum) as an example of a large effect where classical statistics will do just fine.

My point was, if you've gone your whole career studying large effects with methods that work, then it's natural to think you have great methods. You might not realize that your methods, which appear quite general, actually fall apart when applied to small effects. Such as ESP or human sex ratios.

The ESP dude was a victim of his own success: His past accomplishments studying large effects gave him an unwarranted feeling of confidence that his methods would work on small effects.

This sort of thing comes up a lot, and in my recent discussion of Efron's article, I list it as my second meta-principle of statistics, the "methodological attribution problem," which is that people think that methods that work in one sort of problem will work in others.

The other thing that Carey didn't have the space to include was that Bayes is not just about estimating the weight of evidence in favor of a hypothesis. The other key part of Bayesian inference--the more important part, I'd argue--is "shrinkage" or "partial pooling," in which estimates get pooled toward zero (or, more generally, toward their estimates based on external information).

Shrinkage is key, because if all you use is a statistical significance filter--or even a Bayes factor filter--when all is said and done, you'll still be left with overestimates. Whatever filter you use--whatever rule you use to decide whether something is worth publishing--I still want to see some modeling and shrinkage (or, at least, some retrospective power analysis) to handle the overestimation problem. This is something Martin and I brought up in our discussion of the "voodoo correlations" paper of Vul et al.
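
As a minimal sketch of what I mean by shrinkage (a simple normal-normal model; the prior scale of 0.1 is just an assumption for illustration):

shrink <- function(estimate, se, prior_mean = 0, prior_sd = 0.1) {
  w <- (1 / se^2) / (1 / se^2 + 1 / prior_sd^2)  # weight given to the data
  prior_mean + w * (estimate - prior_mean)       # posterior mean, pulled toward the prior
}
shrink(estimate = 0.45, se = 0.2)  # a noisy "significant" 0.45 gets pulled down to 0.09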

Should the paper have been published in a top psychology journal?

Real-life psychology researcher Tal Yarkoni adds some good thoughts but then he makes the ridiculous (to me) juxtaposition of the following two claims: (1) The ESP study didn't find anything real, there's no such thing as ESP, and the study suffered many methodological flaws, and (2) The journal was right to publish the paper.

If you start with (1), I don't see how you get to (2). I mean, sure, Yarkoni gives his reasons (basically, the claim that the ESP paper, while somewhat crappy, is no crappier than most papers that are published in top psychology journals), but I don't buy it. If the effect is there, why not have them demonstrate it for real? I mean, how hard would it be for the experimenters to gather more data, do some sifting, find out which subjects are good at ESP, etc.? There's no rush, right? No need to publish preliminary, barely-statistically-significant findings. I don't see what's wrong with the journal asking for better evidence. It's not like a study of the democratic or capitalistic peace, where you have a fixed amount of data and you have to learn what you can. In experimental psychology, once you have the experiment set up, it's practically free to gather more data.

P.S. One thing that saddens me is that, instead of using the sex-ratio example (which I think would've been perfect for this article), Carey uses the following completely fake example:

Consider the following experiment. Suppose there was reason to believe that a coin was slightly weighted toward heads. In a test, the coin comes up heads 527 times out of 1,000.

And then he goes on to write about coin flipping. But, as I showed in my article with Deb, there is no such thing as a coin weighted to have a probability p (different from 1/2) of heads.

OK, I know about fake examples. I'm writing an intro textbook, and I know that fake examples can be great. But not this one!

P.P.S. I'm also disappointed he didn't use the famous dead-fish example, where Bennett, Baird, Miller, and Wolford found statistically significant correlations in an fMRI scan of a dead salmon. The correlations were not only statistically significant, they were large and newsworthy!

P.P.P.S. The Times does this weird thing with its articles where it puts auto-links on Duke University, Columbia University, and the University of Missouri. I find this a bit distracting and unprofessional.

A colleague recently sent me a copy of some articles on the estimation of treatment interactions (a topic that's interested me for awhile). One of the articles, which appeared in the Lancet in 2000, was called "Subgroup analysis and other (mis)uses of baseline data in clinical trials," by Susan F. Assmann, Stuart J. Pocock, Laura E. Enos, and Linda E. Kasten. . . .

Hey, wait a minute--I know Susan Assmann! Well, I sort of know her. When I was a freshman in college, I asked my adviser, who was an applied math prof, if I could do some research. He connected me to Susan, who was one of his Ph.D. students, and she gave me a tiny part of her thesis to work on.

The problem went as follows. You have a function f(x), for x going from 0 to infinity, that is defined as follows. Between 0 and 1, f(x)=x. Then, for x higher than 1, f'(x) = f(x) - f(x-1). The goal is to figure out what f(x) does. I think I'm getting this right here, but I might be getting confused on some of the details. The original form of the problem had some sort of probability interpretation, I think--something to do with a one-dimensional packing problem, maybe f(x) was the expected number of objects that would fit in an interval of size x, if the objects were drawn from a uniform distribution. Probably not that, but maybe something of that sort.

One of the fun things about attacking this sort of problem as a freshman is that I knew nothing about the literature on this sort of problem or even what it was called (a differential-difference equation, or it can also be formulated using an integral). Nor was I set up to do any simulations on the computer. I just solved the problem from scratch. First I figured out the function in the range [1,2], [2,3], and so forth, then I made a graph (pencil on graph paper) and conjectured the asymptotic behavior of f. The next step was to prove my conjecture. It ate at me. I worked on the problem on and off for about eleven months, then one day I finally did it: I had carefully proved the behavior of my function! This accomplishment gave me a warm feeling for years after.
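
If you want to see what the function looks like without doing the piecewise algebra, here's a quick numerical sketch in R (using the recursion as I remembered it above, which might be slightly off):

h <- 0.001
x <- seq(0, 10, by = h)
n1 <- round(1 / h)                  # number of steps per unit interval
f <- numeric(length(x))
f[1:(n1 + 1)] <- x[1:(n1 + 1)]      # f(x) = x on [0, 1]
for (i in (n1 + 2):length(x)) {
  # Euler step for f'(x) = f(x) - f(x - 1), using values already computed
  f[i] <- f[i - 1] + h * (f[i - 1] - f[i - 1 - n1])
}
plot(x, f, type = "l", xlab = "x", ylab = "f(x)")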

I never actually told Susan Assmann about this--I think that by then she had graduated, and I never found out whether she figured out the problem herself as part of her Ph.D. thesis or whether it was never really needed in the first place. And I can't remember if I told my adviser. (He was a funny guy: extremely friendly to everyone, including his freshman advisees, but one time we were in his office when he took a phone call. He was super-friendly during the call, then after the call was over he said, "What an asshole." After this I never knew whether to trust the guy. If he was that nice to some asshole on the phone, what did it mean that he was nice to us?) I switched advisers. The new adviser was much nicer--I knew him because I'd taken a class with him--but it didn't really matter since he was just another mathematician. I was lucky enough to stumble into statistics, but that's another story.

Anyway, it was funny to see that name--Susan Assmann! I did a quick web search and I'm pretty sure it is the same person. And her paper was cited 430 times--that's pretty impressive!

P.S. The actual paper by Assmann et al. is reasonable. It's a review of some statistical practice in medical research. They discuss the futility of subgroup analysis given that, compared to main effects, interactions are typically (a) smaller in magnitude and (b) estimated with larger standard errors. That's pretty much a recipe for disaster! (I made a similar argument in a 2001 article in Biostatistics, except that my article went in depth for one particular model and Assmann et al. were offering more general advice. And, unlike me, they had some data.) Ultimately I do think treatment interactions and subgroup analysis are important, but they should be estimated using multilevel models. If you try to estimate complex interactions using significance tests or classical interval estimation, you'll probably just be wasting your time, for reasons explained by Assmann et al.
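
A quick simulated illustration of why interactions are so hard to estimate: in a balanced two-arm comparison with two equal-sized subgroups, the interaction (the difference in treatment effects between subgroups) has about twice the standard error of the overall treatment effect, so it needs roughly four times the sample size for the same precision. (The effect sizes below are made up.)

set.seed(1)
n <- 400
treat <- rep(0:1, each = n / 2)
group <- rep(0:1, times = n / 2)
y <- 0.2 * group + 0.5 * treat + 0.1 * treat * group + rnorm(n)
fit_main <- lm(y ~ treat)          # overall treatment effect
fit_int  <- lm(y ~ treat * group)  # treatment effect allowed to differ by subgroup
summary(fit_main)$coef["treat", "Std. Error"]       # about 0.10
summary(fit_int)$coef["treat:group", "Std. Error"]  # about 0.20, twice as large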

A link from Tyler Cowen led me to this long blog article by Razib Khan, discussing some recent genetic findings on human origins in the context of the past twenty-five years of research and popularization of science.

Costless false beliefs

horsejpg.jpg

From the Gallup Poll:

Four in 10 Americans, slightly fewer today than in years past, believe God created humans in their present form about 10,000 years ago.

They've been asking the question since 1982 and it's been pretty steady at 45%, so in some sense this is good news! (I'm saying this under the completely unsupported belief that it's better for people to believe truths than falsehoods.)

The title of this blog post quotes the second line of the abstract of Goldstein et al.'s much ballyhooed 2008 tech report, Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings.

The first sentence of the abstract is

Individuals who are unaware of the price do not derive more enjoyment from more expensive wine.

Perhaps not surprisingly, given the easy target wine snobs make, the popular press has picked up on the first sentence of the tech report. For example, the Freakonomics blog/radio entry of the same name quotes the first line, ignores the qualification, then concludes

Wishing you the happiest of holiday seasons, and urging you to spend $15 instead of $50 on your next bottle of wine. Go ahead, take the money you save and blow it on the lottery.

In case you're wondering about whether to buy me a cheap or expensive bottle of wine, keep in mind I've had classical "wine training." After ten minutes of training with some side-by-side examples, you too will be able to distinguish traditional old-world wine from 3-buck Chuck in a double-blind tasting. Whether you'll be able to tell a quality village Volnay from a premier cru is another matter.

There's another problem with the experimental design. Wines that stand out in a side-by-side tasting are not necessarily the ones you want to pair with food or even drink all night on their own.

The other problem is that some people genuinely prefer the 3-buck Chuck. Most Americans I've observed, including myself, start out enjoying sweeter new-world-style wines and then over time gravitate to more structured (tannic), complex (different flavors), and acidic wines.

Followup questions

Upon returning from sabbatical I came across a few magazines from a year ago that I hadn't gotten around to reading. I'm thinking that I should read everything on a one-year delay. The too-topical stuff (for example, promos tied to upcoming movies) I can ignore, and other items are enhanced by knowing what happened a year or two later.

For example, the 11 May 2009 issue of the New Yorker featured an article by Douglas McGray about an organization in Los Angeles called Green Dot that runs charter schools. According to the article, Green Dot, unlike typical charter school operators, educates just about everyone in its schools' areas and so doesn't benefit so much from selection. I don't know enough about the details to evaluate these claims, but I was curious about this bit:

[L.A. schools superintendent] Cortines has also agreed in principle to a partnership in Los Angeles. . . . Green Dot could take over as many as five Los Angeles schools in 2010, and maybe more. This month, Barr expects to meet again with [teacher's union leader] Weingarten and her staff and outline plans for a Green Dot America . . . Their first city would most likely be Washington, D.C. "If we're successful there, we'll get the attention of a lot of lawmakers," Barr said. . . There are risks for Barr [the operator of Green Dot] in this kind of expansion. It will be months, and maybe years, before there's hard evidence about what Green Dot has accomplished at Locke [High School]. And that one takeover put a real strain on the organization. . . .

A year and a half have passed. What's happened, I wonder?

For awhile I've been curious (see also here) about the U-shaped relation between happiness and age (with people least happy, on average, in their forties, and happier before and after).

But when I tried to demonstrate it to my intro statistics course, using the General Social Survey, I couldn't find the famed U, or anything like it. Using pooled GSS data mixes age, period, and cohort, so I tried throwing in some cohort effects (indicators for decades) and a couple other variables, but still couldn't find that U.
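
For what it's worth, here's the sort of model I was fitting, with simulated data standing in for the GSS extract (the point is just the specification; the real GSS happiness item is an ordered three-point scale, so treat this as a rough check only):

set.seed(7)
n <- 5000
year <- sample(1972:2008, n, replace = TRUE)   # pooled survey years
age <- sample(18:89, n, replace = TRUE)
cohort <- year - age
happy <- sample(1:3, n, replace = TRUE)        # stand-in for the happiness item
fit <- lm(happy ~ poly(age, 2) + factor(floor(cohort / 10) * 10))
summary(fit)$coef["poly(age, 2)2", ]           # the quadratic age term: any sign of a U?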

So I was intrigued when I came across this paper by Paul Frijters and Tony Beatton, who write:

Whilst the majority of psychologists have concluded there is not much of a relationship at all, the economic literature has unearthed a possible U-shape relationship. In this paper we [Frijters and Beatton] replicate the U-shape for the German SocioEconomic Panel (GSOEP), and we investigate several possible explanations for it.

They write:

What is the relationship between happiness and age? Do we get more miserable as we get older, or are we perhaps more or less equally happy throughout our lives with only the occasional special event (marriage, birth, promotion, health shock) that temporarily raises or reduces our happiness, or do we actually get happier as life gets on and we learn to be content with what we have?

The answer to this question in the recent economic literature on the subject is that the age-happiness relationship is U-shaped. This finding holds for the US, Germany, Britain, Australia, Europe, and apparently even South Africa. The stylised finding is that individuals gradually get unhappier after their 18th birthday, with a dip around 50 followed by a gradual upturn in old age. The predicted effect of age can be quite large, i.e. the difference in average happiness between an 18 year old and a 50 year old can be as much as 1.5 points on a 10 point scale.

Their conclusion:

The inclusion of the usual socio-economic variables in a cross-section leads to a U-shape in age that results from indirectly-age-related reverse causality. Putting it simply: good things, like getting a job and getting married, appear to happen to middle aged individuals who were already happy. . . . The found effect of age in fixed-effect regressions is simply too large and too out of line with everything else we know to be believable. The difference between first-time respondents and stayers and between the number of years someone stays in the panel doesn't allow for explanations based on fixed traits or observables. There has to be either a problem on the left-hand side (i.e. the measurement of happiness over the life of a panel) or on the right-hand side (selection on time-varying unobservables).

They think it's a sample-selection bias and not a true U-shaped pattern. Another stylized fact bites the dust (perhaps).

. . . they're not in awe of economists.

In contrast, economists sometimes treat each other with the soft bigotry of low expectations. For example, here's Brad DeLong in defense of Larry Summers:

[During a 2005 meeting, Summers] said that in a modern economy with sophisticated financial markets we were likely to have more and bigger financial crises than we had before, just as the worst modern transportation accidents are worse than the worst transportation accidents back in horse-and-buggy days. . . . Indeed, for twenty years one of Larry's conversation openers has been: "You really should write something else good on positive-feedback trading and its dangers for financial markets."

That's fine, but, hey, I've been going around saying this for many years too, and I'm not even an economist (although I did get an A in the last econ class I took, which was in eleventh grade). Lots and lots of people have been talking for years about the dangers of positive feedback, the risks of insurers covering the small risks and thus increasing correlation in the system and setting up big risks, etc.

I don't think Summers, as one of the world's foremost economists, deserves much credit for noticing this theoretical problem too and going around telling people that they "really should write something" on the topic. You get credit by doing, not by telling other people to do.

I think Steve Hsu (see above link) gets the point. No one's going to go around saying that some physicist is a genius because he's been going around for twenty years with a conversation opener like, "Hey--general relativity and quantum mechanics are incoherent. You should really write something about how to put them together in a single mathematical model."

P.S. Just to be clear, I'm not trying to argue with DeLong on the economics here. He may be completely right that Rajan was wrong and Summers was right in their 2005 exchange. But I do think he's a bit too overawed by Summers's putative brilliance. In a dark room with many of the lights covered up by opaque dollar bills, even a weak and intermittent beam can appear brilliant, if you look right at it.

Prison terms for financial fraud?

My econ dept colleague Joseph Stiglitz suggests that financial fraudsters be sent to prison. He points out that the usual penalty--million-dollar fines--just isn't enough for crimes whose rewards can be in the hundreds of millions of dollars.

That all makes sense, but why do the options have to be:

1. No punishment

2. A fine with little punishment or deterrent value

3. Prison.

What's the point of putting nonviolent criminals in prison? As I've said before, I'd prefer if the government just took all these convicted thieves' assets along with 95% of their salary for several years, made them do community service (sorting bottles and cans at the local dump, perhaps; a financier should be good at this sort of thing, no?), etc. If restriction of personal freedom is judged to be part of the sentence, they could be given some sort of electronic tag that would send a message to the police if they are ever more than 3 miles from home. And a curfew so they have to stay home between the hours of 7pm and 7am. Also take away internet access and require that they live in a 200-square-foot apartment in a grungy neighborhood. And so forth. But no need to bill the taxpayers for the cost of prison.

Stiglitz writes:

When you say the Pledge of Allegiance you say, with "justice for all." People aren't sure that we have justice for all. Somebody is caught for a minor drug offense, they are sent to prison for a very long time. And yet, these so-called white-collar crimes, which are not victimless, almost none of these guys, almost none of them, go to prison.

To me, though, this misses the point. Why send minor drug offenders to prison for a very long time? Instead, why not just equip them with some sort of recorder/transmitter that has to be always on? If they can do all their drug deals in silence, then, really, how much trouble are they going to be causing?

Readers with more background in criminology than I will be able to poke holes in my proposals, I'm sure.

P.S. to the impatient readers out there: Yeah, yeah, I have some statistics items on deck. They'll appear at the approximate rate of one a day.

In response to my most recent post expressing bafflement over the Erving Goffman mystique, several commenters helped out by suggesting classic Goffman articles for me to read. Naturally, I followed the reference that had a link attached--it was for an article called Cooling the Mark Out, which analogized the frustrations of laid-off and set-aside white-collar workers to the reactions of suckers after being bilked by con artists.

Goffman's article was fascinating, but I was bothered by a tone of smugness. Here's a quote from Cooling the Mark Out that starts on the cute side but is basically ok:

In organizations patterned after a bureaucratic model, it is customary for personnel to expect rewards of a specified kind upon fulfilling requirements of a specified nature. Personnel come to define their career line in terms of a sequence of legitimate expectations and to base their self-conceptions on the assumption that in due course they will be what the institution allows persons to become.

It's always amusing to see white-collar types treated anthropologically, so that's fine. But then Goffman continues:

Sometimes, however, a member of an organization may fulfill some of the requirements for a particular status, especially the requirements concerning technical proficiency and seniority, but not other requirements, especially the less codified ones having to do with the proper handling of social relationships at work.

This seemed naive at best and obnoxious at worst. As if, whenever someone is not promoted, it's either because he can't do the job or he can't play the game. Unless you want to define this completely circularly (with "playing the game" retrospectively equaling whatever it takes to do to keep the job), this just seems wrong. In corporate and academic settings alike, lots of people get shoved aside either for reasons entirely beyond their control (e.g., a new division head comes in and brings in his own people) or out of simple economics.

Goffman was a successful organization man and couldn't resist taking a swipe at the losers in the promotion game. It wasn't enough for him to say that some people don't ascend the ladder; he had to attribute that to not fulfilling the "less codified [requirements] having to do with the proper handling of social relationships at work."

Well, no. In the current economic climate this is obvious, but even back in the 1960s there were organizations with too few slots at the top for all the aspirants at the bottom, and it seems a bit naive to suppose that not reaching the top rungs is necessarily a sign of improper handling of social relationships.

In this instance, Goffman seems like the classic case of a successful person who thinks that, hey, everybody could be a success were they blessed with his talent and social skills.

This was the only thing by Goffman I'd read, though, so to get a broader perspective I sent a note to Brayden King, the sociologist whose earlier post on Goffman had got me started on this.

King wrote:

People in sociology are mixed on their feelings about Goffman's scholarship. He's a love-him-or-hate-him figure. I lean more toward the love him side, if only because I think he really built up the symbolic interactionist theory subfield in sociology.

I think that one of the problems is that you're thinking of this as a proportion of variance problem, in which case I think you're right that "how you play the game" explains a lot less variance in job attainment than structural factors. Goffman wasn't really interested in explaining variance though. His style was to focus on a kind of social interaction and then try to explain the strategies or roles that people use in those interactions to engage in impression management. So, for him, a corporate workplace was interesting for the same reason an asylum is - they're both places where role expectations shape the way people interact and try to influence the perceptions that others have of them.

It's a very different style of scholarship, but nevertheless it's had a huge influence in sociology's version of social psych. The kind of work that is done in this area is highly qualitative, often ethnographic. From a variance-explanation perspective, though, I see your point. How much does "playing the game" really matter when the economy is collapsing and companies are laying off thousands of employees?

Greg Kaplan writes:

I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached.

Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures -- not any actual change in migration behavior -- explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010.

I haven't had a chance to give it a serious look, so I could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to better follow the logic in your reasoning.

But some of you might be interested in the substance of the paper. In any case, it's pretty scary how a statistical adjustment can have such a large effect. (Not that, in general, there's any way to use "unadjusted" data. As Little and Rubin have pointed out, lack of any apparent adjustment itself corresponds to some strong and probably horrible assumptions.)
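To see how an imputation change alone can move a headline number, here's a minimal simulation sketch; the numbers and the imputation rules below are invented for illustration, not the Census Bureau's actual procedures:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_move_rate = 0.03   # true interstate-move rate, held constant
    missing_rate = 0.10     # share of respondents with missing migration status

    moved = rng.random(n) < true_move_rate
    missing = rng.random(n) < missing_rate

    def measured_rate(imputed_move_prob):
        """Observed rate when missing cases are filled in at a given move probability."""
        filled = moved.copy()
        filled[missing] = rng.random(missing.sum()) < imputed_move_prob
        return filled.mean()

    # "Old" rule: impute missing cases at a too-high move rate.
    # "New" rule: impute at a rate closer to the truth.
    print(measured_rate(0.10))   # about 0.037 -- inflated by the old rule
    print(measured_rate(0.03))   # about 0.030 -- the drop is pure procedure, not behavior

The measured migration rate falls even though no one's behavior changed, which is the flavor of artifact Kaplan and Schulhofer-Wohl describe.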

P.S. See here for another recently-discovered problem with Census data.

A recent story about academic plagiarism spurred me to some more general thoughts about the intellectual benefits of not giving a damn.

I'll briefly summarize the plagiarism story and then get to my larger point.

Copying big blocks of text from others' writings without attribution

Last month I linked to the story of Frank Fischer, an elderly professor of political science who was caught copying big blocks of text (with minor modifications) from others' writings without attribution.

Our apartment is from earlier in the century, so I can't give Tyler Cowen's first answer, but, after that, I follow him in thinking of the several books I have from that decade. Beyond that, lemme think . . . We occasionally play Risk, and our set dates from the 50s. Some kitchen implements (a mixmaster, a couple of cookbooks, who knows which old bowls, forks, etc). Probably some of the furniture, although I don't know which. Probably some of the items in our building (the boiler?) What else, I wonder? There are probably a few things I'm forgetting.

50-60 years is a long time, I guess.

P.S. to the commenters: I'm taking the question to refer to things manufactured in the 1950s and not before!

Maria Wolters writes:

The parenting club Bounty, which distributes their packs through midwives, hospitals, and large UK supermarket and pharmacy chains, commissioned a fun little survey for Halloween from the company OnePoll. Theme: Mothers as tricksters - tricking men into fathering their babies. You can find a full smackdown courtesy of UK-based sex educator and University College London psychologist Petra Boynton here.

I was recently speaking with a member of the U.S. House of Representatives, a Californian in a tight race this year. I mentioned the fivethirtyeight.com prediction for him, and he said "fivethirtyeight.com? What's that?"

Musical chairs in econ journals

| 9 Comments

Tyler Cowen links to a paper by Bruno Frey on the lack of space for articles in economics journals. Frey writes:

To further their careers, [academic economists] are required to publish in A-journals, but for the vast majority this is impossible because there are few slots open in such journals. Such academic competition maybe useful to generate hard work, however, there may be serious negative consequences: the wrong output may be produced in an inefficient way, the wrong people may be selected, and losers may react in a harmful way.

According to Frey, the consensus is that there are only five top economics journals--and one of those five is Econometrica, which is so specialized that I'd say that, for most academic economists, there are only four top places they can publish. The difficulty is that demand for these slots outpaces supply: for example, in 2007 there were only 275 articles in all these journals combined (or 224 if you exclude Econometrica), while "a rough estimate is that there are around 10,000 academics actively aspiring to publish in A-journals."
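Spelling out the arithmetic, with Frey's figures as quoted above:

    slots = 275          # articles per year in the five A-journals (2007)
    slots_no_ecma = 224  # excluding Econometrica
    aspirants = 10_000   # Frey's rough estimate of active aspirants

    print(slots / aspirants)          # about 0.028 slots per aspirant per year
    print(slots_no_ecma / aspirants)  # about 0.022
    print(aspirants / slots_no_ecma)  # roughly 45 aspirants per available slot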

I agree completely with Frey's assessment of the problem, and I've long said that statistics has a better system: there are a lot fewer academic statisticians than academic economists, and we have many more top journals we can publish in (all the probability and statistics journals, plus the econ journals, plus the poli sci journals, plus the psych journals, etc), so there's a lot less pressure.

I wonder if part of the problem with the econ journals is that economists enjoy competition. If there were not such a restricted space in top journals, they wouldn't have a good way to keep score.

Just by comparison, I've published in most of the top statistics journals, but my most cited articles have appeared in Statistical Science, Statistica Sinica, Journal of Computational and Graphical Statistics, and Bayesian Analysis. Not a single "top 5 journal" in the bunch.

But now let's take the perspective of a consumer of economics journals, rather than thinking about the producers of the articles. From my consumer's perspective, it's ok that the top five journals are largely an insider's club (with the occasional exceptional article from an outsider). These insiders have a lot to say, and it seems perfectly reasonable for them to have their own journal. The problem is not the exclusivity of the journals but rather the presumption that outsiders and new entrants should be judged based on their ability to conform to the standards of these journals. The tenured faculty at the top 5 econ depts are great, I'm sure--but does the world really need 10,000 other people trying to become just like them??? Again, based on my own experience, some of our most important work is the stuff that does not conform to conventional expectations.

P.S. I met Frey once. He said, "Gelman . . . you wrote the zombies paper!" So, you see, you don't need to publish in the AER for your papers to get noticed. Arxiv is enough. I don't know whether this would work with more serious research, though.

P.P.S. On an unrelated note, if you have to describe someone as "famous," he's not. (Unless you're using "famous" to distinguish two different people with the same name (for example, "Michael Jordan--not the famous one"), but it doesn't look like that's what's going on here.)

I was flipping through the paper yesterday and noticed something which I think is a bit of innumeracy--although I don't have all the facts at my disposal so I can't be sure. It came in an item by Robert Woletz, society editor of the New York Times, in response to the following letter from Max Sarinsky (click here and scroll down):

Mankiw tax update

| 13 Comments

I was going through the blog and noticed this note on an article by Mankiw and Weinzierl who implied that the state only has a right to tax things that are "unjustly wrestled from someone else." This didn't make much sense to me--whether it's the sales tax, the income tax, or whatever, I see taxes as a way to raise money, not as a form of punishment. At the time, I conjectured this was a general difference in attitude between political scientists and economists, but in retrospect I realize I'm dealing with n=1 in each case.

See here for further discussion of taxing "justly acquired endowments."

The only reason I'm bringing this all up now is that I think it is relevant to our recent discussion here and here of Mankiw's work incentives. Mankiw objected to paying a higher marginal tax rate, and I think part of this is that he sees taxes as a form of punishment, and since he came by his income honestly he doesn't think it's fair to have to pay taxes on it. My perspective is slightly different, partly because I never thought of taxation as being restricted to funds that have been "unjustly wrestled."

Underlying this is a lot of economics, and I'm not presenting this as any sort of argument for higher (or lower) marginal tax rates. I'm just trying to give some insight into where Mankiw might be coming from. A lot of people thought his column on this 80% (or 90% or 93%) marginal tax rate was a little weird, but if you start from the position that only unjust income should be taxed, it all makes a lot more sense.

Erving Goffman archives

| 7 Comments

Brayden King points to this page of materials on sociologist Erving Goffman. Whenever I've read about Goffman, it always seems to be in conjunction with some story about his bad behavior--in that respect, King's link above does not disappoint. In the absence of any context, it all seems mysterious to me. Once or twice I've tried to read passages in books by Goffman but have never managed to get through any of it. (This is not meant as any kind of criticism; it's just a statement of my lack of knowledge.) I was amused enough by the stories reported by King that I clicked through to the Biographical Materials section of the Goffman page and read a few. I still couldn't really quite get the point, though, perhaps in part because I only know one of the many people on that list.

Hadley Wickham sent me this, by Keith Baggerly and Kevin Coombes:

In this report we [Baggerly and Coombes] examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common.

This is horrible! But, in a way, it's not surprising. I make big mistakes in my applied work all the time. I mean, all the time. Sometimes I scramble the order of the 50 states, or I'm plotting a pure noise variable, or whatever. But usually I don't drift too far from reality because I have a lot of cross-checks and I (or my close collaborators) are extremely familiar with the data and the problems we are studying.

Genetics, though, seems like more of a black box. And, as Baggerly and Coombes demonstrate in their fascinating paper, once you have a hypothesis, it doesn't seem so difficult to keep coming up with what seems like confirming evidence of one sort or another.

To continue the analogy, operating some of these methods seems like knitting a sweater inside a black box: it's a lot harder to notice your mistakes if you can't see what you're doing, and it can be difficult to tell by feel if you even have a functioning sweater when you're done with it all.
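To make "simple errors" concrete, here's a toy example (mine, not from the paper) of how a one-row offset silently attaches the wrong sensitivity labels to a signature score:

    # Toy illustration of a row-offset error of the kind Baggerly and Coombes describe.
    samples    = ["patient_1", "patient_2", "patient_3", "patient_4"]
    sensitive  = [True, True, False, False]   # true drug-sensitivity labels
    expression = [5.2, 4.9, 1.1, 0.8]         # signature scores, aligned with samples

    # A header row mistakenly kept (or dropped) shifts the labels by one:
    shifted_labels = sensitive[1:] + [None]

    for s, score, lab in zip(samples, expression, shifted_labels):
        print(s, score, lab)   # patient_1 now carries patient_2's label, and so on

Nothing crashes, the table still looks tidy, and the misalignment only shows up if you have an outside check on what the values should be.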

Ranking on crime rankings

| No Comments

Following up on our discussion of crime rates--surprisingly (to me), Detroit's violent crime rate was only 75% more than Minneapolis's--Chris Uggen pointed me to this warning from Richard Rosenfeld and Janet Lauritsen about comparative crime stats.

Christopher Uggen reports.

I'm surprised the difference is so small. I would've thought the crime rate was something like 5 times higher in Detroit than in Minneapolis. I guess Minneapolis must have some rough neighborhoods. Or maybe it's just that I don't have a good framework for thinking about crime statistics.

Meow!

Wow--economists are under a lot of pressure. Not only do they have to keep publishing after they get tenure; they have to be funny, too! It's a lot easier in statistics and political science. Nobody expects us to be funny, so any little witticism always gets a big laugh.

P.S. I think no one will deny that Levitt has a sense of humor. For example, he ran this item with a straight face, relaying to NYT readers in October 2008 that "the current unemployment rate of 6.1 percent is not alarming."

P.P.S. I think this will keep me safe for awhile.

Cameron McKenzie writes:

I ran into the attached paper [by Dave Marcotte and Sara Markowitz] on the social benefits of prescription of psychotropic drugs, relating a drop in crime rate to an increase in psychiatric drug prescriptions. It's not my area (which is psychophysics) but I do find this kind of thing interesting. Either people know much more than I think they do, or they are pretending to, and either is interesting. My feeling is that it doesn't pass the sniff test, but I wondered if you might (i) find the paper interesting and/or (ii) perhaps be interested in commenting on it on the blog. It seems to me that if we cumulated all econometric studies of crime rate we would be able to explain well over 100% of the variation therein, but perhaps my skepticism is unwarranted.

My reply:

I know what you mean. The story seems plausible but the statistical analysis seems like a stretch. I appreciate that the authors included scatterplots of their data, but the patterns they find are weak enough that it's hard to feel much confidence in their claim that "about 12 percent of the recent crime drop was due to expanded mental health treatment." The article reports that the percentage of people with mental illness getting treatment increased by 13 percentage points (from 20% to 33%) during the period under study. For this to have caused a 12 percent reduction in crime, you'd have to assume that nearly all the medicated people stopped committing crimes. (Or you'd have to assume that the potential criminals were more likely to be getting treated.)

But maybe the exact numbers don't matter. The 1960s/1970s are over, and nowadays there is little controversy about the idea of using drugs and mental illness treatments as a method of social control. And putting criminals on Thorazine or whatever seems a lot more civilized than throwing them in prison. For example, if you put Tony Hayward or your local strangler on mind-numbing drugs and have them do community service with some sort of electronic tag to keep them out of trouble, they'd be making a much more useful contribution to society than if they're making license plates and spending their days working out in the prison yard.
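Here's that back-of-envelope argument in code. The 12 percent figure and the 20-to-33 percent treatment increase are from the article as quoted above; the share of all crime attributed to the newly treated group is a made-up assumption, varied to show how the arithmetic plays out:

    # How much of the newly treated group's crime must disappear to explain
    # a 12% drop in total crime?
    claimed_crime_drop = 0.12

    for crime_share_of_newly_treated in (0.05, 0.10, 0.15, 0.20):
        required = claimed_crime_drop / crime_share_of_newly_treated
        print(crime_share_of_newly_treated, round(required, 2))

    # Unless the newly treated group accounted for well over 12% of all crime,
    # the implied effect exceeds 100% -- more crime eliminated than they committed.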

P.S. It looks like I was confused on this myself. See Kevin Denny's comment below.

Racism!

| 15 Comments

Last night I spoke at the Columbia Club of New York, along with some of my political science colleagues, in a panel about politics, the economy, and the forthcoming election. The discussion was fine . . . until one guy in the audience accused us of bias based on what he imputed as our ethnicity. One of the panelists replied by asking the questioner what of all the things we had said was biased, and the questioner couldn't actually supply any examples.

It makes sense that the questioner couldn't come up with a single example of bias on our part, considering that we were actually presenting facts.

At some level, the questioner's imputation of our ethnicity and accusation of bias isn't so horrible. When talking with my friends, I engage in casual ethnic stereotyping all the time--hey, it's a free country!--and one can certainly make the statistical argument that you can guess people's ethnicities from their names, appearance, and speech patterns, and in turn you can infer a lot about people's political attitudes from their occupations, ethnicities, and so on. Still, I think it was a pretty rude comment and pretty pointless. How was he expecting us to respond? Maybe he thought we'd break down under the pressure and admit that we were all being programmed by our KGB handlers??

Then, later on, someone asked a truly racist question--a rant, really--that clearly had a close relation to his personal experiences even while having essentially zero connection to the real world as we understand it statistically.

I've seen the polls and I know that there are a lot of racists out there, of all stripes. Still, I don't encounter this sort of thing much in my everyday life, and it was a bit upsetting to see it in the flesh. Blog commenters come to life, as it were. (Not this blog, though!)

P.S. Yes, I realize that women and minorities have to deal with this all the time. This was the first time in my professional life that I've been accused of bias based on my (imputed) ethnicity, but I'm sure that if you're a member of a traditionally-disparaged group, it happens all over. So I'm not complaining, exactly, but it still upsets me a bit.

Dan Goldstein sends along this bit of research, distinguishing terms used in two different subfields of psychology. Dan writes:

Intuitive calls included not listing words that don't occur 3 or more times in both programs. I [Dan] did this because when I looked at the results, those cases tended to be proper names or arbitrary things like header or footer text. It also narrowed down the space of words to inspect, which means I could actually get the thing done in my copious free time.

I think the bar graphs are kinda ugly, maybe there's a better way to do it based on classifying the words according to content? Also the whole exercise would gain a new dimension by comparing several areas instead of just two. Maybe that's coming next.

Somebody I know sent me a link to this news article by Martin Robbins describing a potential scientific breakthrough. I express some skepticism but in a vague enough way that, in the unlikely event that the research claim turns out to be correct, there's no paper trail showing that I was wrong. I have some comments on the graphs--the tables are horrible, no need to even discuss them!--and I'd prefer if the authors of the paper could display their data and model on a single graph. I realize that their results reached a standard level of statistical significance, but it's hard for me to interpret their claims until I see their estimates on some sort of direct real-world scale. In any case, though, I'm sure these researchers are working hard, and I wish them the best of luck in their future efforts to replicate their findings.

I'm sure they'll have no problem replicating, whether or not their claims are actually true. That's the way science works: Once you know what you're looking for, you'll find it!

Aleks points me to this article showing some pretty maps by Eric Fisher showing where people of different ethnicity live within several metro areas within the U.S. The idea is simple but effective; in the words of Cliff Kuang:

Fisher used a straight forward method borrowed from Rankin: Using U.S. Census data from 2000, he created a map where one dot equals 25 people. The dots are then color-coded based on race: White is pink; Black is blue; Hispanic is orange, and Asian is green.
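Just to make the method concrete, here's a bare-bones sketch of the one-dot-per-25-people idea with invented tract counts (Fisher's actual maps are built from real Census tract data and proper geography):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Invented "tracts": centroid plus population counts by group.
    tracts = [
        {"xy": (0.2, 0.7), "White": 4000, "Black": 500,  "Hispanic": 800,  "Asian": 700},
        {"xy": (0.7, 0.3), "White": 600,  "Black": 3500, "Hispanic": 1500, "Asian": 400},
        {"xy": (0.5, 0.5), "White": 1500, "Black": 1500, "Hispanic": 1500, "Asian": 1500},
    ]
    colors = {"White": "pink", "Black": "blue", "Hispanic": "orange", "Asian": "green"}
    people_per_dot = 25

    for t in tracts:
        cx, cy = t["xy"]
        for group, color in colors.items():
            n_dots = t[group] // people_per_dot
            # scatter the dots around the tract centroid
            x = cx + rng.normal(0, 0.05, n_dots)
            y = cy + rng.normal(0, 0.05, n_dots)
            plt.scatter(x, y, s=1, c=color)

    plt.title("One dot = 25 people (made-up data)")
    plt.show()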

The results for various cities are fascinating: Just like every city is different, every city is integrated (or segregated) in different ways.

New York is shown below.

No, San Francisco is not "very, very white"

But I worry that these maps are difficult for non-experts to read. For example, Kuang writes the following:

San Francisco proper is very, very white.

This is an understandable mistake coming from someone who, I assume, has never lived in the Bay Area. But what's amazing is that Kuang made the above howler after looking at the color-coded map of the city!

For those who haven't lived in S.F., here are the statistics:

The city of San Francisco is 45% non-Hispanic white, 14% Hispanic, 7% black, and 31% Asian (with the remaining 3% being Native American, Pacific Islander, or reporting multiple races).

"Very, very white," it ain't.

I'm not trying to pick on Kuang here--I'm sure it's not easy to write on deadline. My point is that even a clean graph like Fisher's--a graph that I love--can still easily be misread. I remember this when I was learning how to present graphs in a talk. It always helps to point to one of the points or lines and explain exactly what it is.

And now, here's the (amazing) graph of the New York area:

[Image: NewYorkB.jpg, Fisher's ethnicity dot map of the New York area]

I can't escape it

| 11 Comments

I received the following email:

Ms. No.: ***

Title: ***

Corresponding Author: ***

All Authors: ***

Dear Dr. Gelman,

Because of your expertise, I would like to ask your assistance in determining whether the above-mentioned manuscript is appropriate for publication in ***. The abstract is pasted below. . . .

My reply:

I would rather not review this article. I suggest ***, ***, and *** as reviewers.

I think it would be difficult for me to review the manuscript fairly.

1. I remarked that Sharad had a good research article with some ugly graphs.

2. Dan posted Sharad's graph and some unpleasant alternatives, inadvertently associating me with one of the unpleasant alternatives. Dan was comparing barplots with dotplots.

3. I commented on Dan's site that, in this case, I'd much prefer a well-designed lineplot. I wrote:

There's a principle in decision analysis that the most important step is not the evaluation of the decision tree but the decision of what options to include in the tree in the first place.

I think that's what's happening here. You're seriously limiting yourself by considering the above options, which really are all the same graph with just slight differences in format. What you need to do is break outside the box.

(Graph 2--which I think you think is the kind of thing that Gelman would like--indeed is the kind of thing that I think the R gurus like, but I don't like it at all. It looks clean without actually being clean. Sort of like those modern architecture buildings from the 1930s-1960s that look all sleek and functional but really aren't so functional at all.)

The big problem with your graphs above is that they place two logical dimensions (the model and the scenario) on the same physical dimension (the y-axis). I find this sort of ABCABCABCABC pattern hard to follow. Instead, you want to be able to compare AAAA, BBBB, CCCC, while still being able to make the four separate ABC comparisons.

How to do this? I suggest a lineplot.

Here's how my first try would go:

On the x-axis, put Music, Games, Movies, and Flu, in that order. (Ordering is important in allowing you to see patterns that otherwise might be obscured; see the cover of my book with Jennifer for an example.)

On the y-axis, put the scale. I'll assume you know what you're doing here, so keep with the .4 to 1 scale. But you only need labels at .4, .6, .8, 1.0. The intermediate labels are overkill and just make the graph hard to follow.

Now draw three lines, one for Search, one for Baseline, and one for Combined. Color the lines differently and label each one directly on the plot (not using a legend).

The resulting graph will be compact, and the next step is for you to replicate your study under different conditions, with a new graph for each. You can put these side by side and make some good comparisons.

4. Sharad took my advice and made such a lineplot (see the Addendum at the end of Dan's blog).

5. Kaiser agrees with me and presents an excellent visualization showing why the lineplot is better. (Kaiser's picture is so great that I'll save it for its own entry here, for those of you who don't click through on all the links.)

6. David Smith posts that I prefer the dotplot. Nooooooooooooooooooooooo!!!!!!!!!!!
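In case it's useful, here is roughly what the lineplot recipe from point 3 above looks like in code. The numbers are made up to stand in for Sharad's results; this is my sketch, not his figure:

    import matplotlib.pyplot as plt

    categories = ["Music", "Games", "Movies", "Flu"]
    x = list(range(len(categories)))
    results = {
        "Search":   [0.62, 0.55, 0.71, 0.48],
        "Baseline": [0.58, 0.52, 0.66, 0.45],
        "Combined": [0.68, 0.60, 0.77, 0.52],
    }

    fig, ax = plt.subplots()
    for label, values in results.items():
        ax.plot(x, values, marker="o")
        # label each line directly on the plot rather than using a legend
        ax.text(x[-1] + 0.1, values[-1], label, va="center")

    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.set_ylim(0.4, 1.0)
    ax.set_yticks([0.4, 0.6, 0.8, 1.0])   # only the tick labels that are needed
    plt.show()

Replicating the study under different conditions would then just mean repeating this panel side by side, one per condition.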

Republicans are much more likely than Democrats to think that Barack Obama is a Muslim and was born in Kenya. But why? People choose to be Republicans or Democrats because they prefer the policy or ideology of one party or another, and it's not obvious that there should be any connection whatsoever between those factors and their judgment of a factual matter such as Obama's religion or country of birth.

In fact, people on opposite sides of many issues, such as gay marriage, immigration policy, global warming, and continued U.S. presence in Iraq, tend to disagree, often by a huge amount, on factual matters such as whether the children of gay couples have more psychological problems than the children of straight couples, what are the economic impacts of illegal immigration, what is the effect of doubling carbon dioxide in the atmosphere, and so on.

Of course, it makes sense that people with different judgment of the facts would have different views on policies: if you think carbon dioxide doesn't cause substantial global warming, you'll be on the opposite side of the global warming debate from someone who thinks it does. But often the causality runs the other way: instead of choosing a policy that matches the facts, people choose to believe the facts that back up their values-driven policies. The issue about Obama's birth country is an extreme example: it's clear that people did not first decide whether Obama was born in the U.S., and then decide whether to vote Republican or Democratic. They are choosing their fact based on their values, not the other way around. Perhaps it is helpful to think of people as having an inappropriate prior distribution that makes them more likely to believe things that are aligned with their desires.
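To make the "inappropriate prior" idea concrete, here's a toy Bayes-rule calculation (the numbers are invented purely for illustration): with a strong enough values-driven prior, even evidence that favors a claim ten-to-one leaves the posterior low.

    # Toy Bayes-rule calculation: how a values-driven prior swamps the evidence.
    def posterior(prior, likelihood_ratio):
        """P(claim | evidence) when the evidence favors the claim by the given ratio."""
        odds = (prior / (1 - prior)) * likelihood_ratio
        return odds / (1 + odds)

    evidence_strength = 10   # evidence 10x more likely if the claim is true

    print(posterior(0.50, evidence_strength))   # neutral prior: posterior ~0.91
    print(posterior(0.01, evidence_strength))   # desire-driven prior: posterior ~0.09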

Good stuff, as always, from Laura Wattenberg.

A few months ago I questioned Dan Ariely's belief that Google is the voice of the people by reporting the following bizarre options that Google gave to complete the simplest search I could think of:

When is expertise relevant?

| 15 Comments

Responding to journalist Elizabeth Kolbert's negative review of Freakonomics 2 in the New Yorker, Stephen Dubner writes that, although they do not have any training in climate science, it's also the case that:

Neither of us [Levitt and Dubner] were Ku Klux Klan members either, or sumo wrestlers or Realtors or abortion providers or schoolteachers or even pimps. And yet somehow we managed to write about all that without any horse dung (well, not much at least) flying our way.

But Levitt is a schoolteacher (at the University of Chicago)! And, of course, you don't have to be a sumo wrestler to be (some kind of an) expert on sumo wrestling, nor do you have to teach in the K-12 system to be an expert in education, nor do you have to provide abortions to be an expert on abortion, etc. And Levitt has had quite a bit of horse dung thrown at him for the abortion research. The connection is that abortion and climate change matter to a lot of people, while sumo wrestling and pimps and teachers who cheat are more like feature-story material.

Here's a pretty funny example of silly statistics, caught by Lisa Wade:

A study published in 2001 . . . asked undergraduate college students their favorite color and presented the results by sex. Men's favorites are on the left, women's on the right:

[Image: color-preferences1.jpg, favorite-color percentages for men and women]

The authors of the study, Lee Ellis and Christopher Ficek, wrote:

We are inclined to suspect the involvement of neurohormonal factors. Studies of rats have found average sex differences in the number of neurons comprising various parts of the visual cortex. Also, gender differences have been found in rat preferences for the amount of sweetness in drinking water. One experiment demonstrated that the sex differences in rat preferences for sweetness was eliminated by depriving males of male-typical testosterone levels in utero. Perhaps, prenatal exposure to testosterone and other sex hormones operates in a similar way to "bias" preferences for certain colors in humans.

As Wade points out, that all seems a bit ridiculous given some much more direct stories based on the big-time association of pink and blue with girls and boys.

No big deal, it's just sort of funny to see this sort of pseudoscientific explanation in such pure form.

And what kind of person lists green as their favorite color? 20% and 29%? I can't believe it! Sure, green is the color of money, but still . . .

P.S. That blog entry has 68 comments! I don't think there's really so much to say about this study. I guess it's like 538: the commenters just start arguing with each other.

P.P.S. This one is pretty funny too. (See here for more detail.)

David Blackwell

| 6 Comments

David Blackwell was already retired by the time I came to Berkeley, and probably our closest connection was that I taught the class in decision theory that he used to teach. I enjoyed that class a lot, partly because it took me out of my usual comfort zone of statistical inference and data analysis toward something more theoretical and mathematical. Blackwell was one of the legendary figures in the department at that time and was also one of the most tolerant of alternative approaches to statistics, perhaps because of a combination of a mathematical background, applied research during the war and after (which I learned about in this recent obituary), and personal experiences.

Blackwell may be best known in statistics for the Rao-Blackwell theorem. Rao, of course, is also famous for the Cramer-Rao lower bound. Both theorems relate to minimum-variance statistical estimators.
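For readers who haven't seen the theorem in action, here's a small simulation sketch (my toy example, nothing from the obituary): conditioning a crude unbiased estimator on a sufficient statistic keeps the mean and shrinks the variance.

    import numpy as np

    rng = np.random.default_rng(0)
    p, n, n_sims = 0.3, 20, 50_000

    x = rng.binomial(1, p, size=(n_sims, n))   # n Bernoulli(p) draws per simulation

    crude = x[:, 0]            # unbiased but crude estimator of p: just the first draw
    rao_blackwell = x.sum(axis=1) / n   # E[X1 | sum] = sum/n: same mean, smaller variance

    print(crude.mean(), rao_blackwell.mean())  # both close to p = 0.3
    print(crude.var(), rao_blackwell.var())    # about 0.21 vs about 0.01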

Here's a quote from Thomas (Jesus's dad) Ferguson in Blackwell's obituary:

He went from one area to another, and he'd write a fundamental paper in each. He would come into a field that had been well studied and find something really new that was remarkable. That was his forte.

And here's a quote from Peter Bickel, who in 1967 published an important paper on Bayesian inference:

He had this great talent for making things appear simple. He liked elegance and simplicity. That is the ultimate best thing in mathematics, if you have an insight that something seemingly complicated is really simple, but simple after the fact.

And here's Blackwell himself, from 1983:

Basically, I'm not interested in doing research and I never have been. I'm interested in understanding, which is quite a different thing. And often to understand something you have to work it out yourself because no one else has done it.

I'm surprised to hear Blackwell consider "research" and "understanding" to be different, as to me they seem to be closely related. One of the most interesting areas of statistical research today is on methods for understanding models as maps from data to predictions. As Blackwell and his collaborators demonstrated, even the understanding of simple statistical inferences is not a simple task.

P.S. According to the obituary, Blackwell was denied jobs at Princeton and the University of California because of racial discrimination, and so, a year after receiving his Ph.D., he "sent out applications to 104 black colleges on the assumption that no other schools would hire him." The bit about the 104 job applications surprised me. Nowadays I know that people send out hundreds of job applications, but I didn't know that this was done back in 1943. I somehow thought the academic world was more self-contained back then.

P.P.S. My Barnard College colleague Rajiv Sethi discusses Blackwell's research as seen by economists.

Inequality and health

| 3 Comments

Several people asked me for my thoughts on Richard Wilkinson and Kate Pickett's book, "The Spirit Level: Why Greater Equality Makes Societies Stronger." I've outsourced my thinking on the topic to Lane Kenworthy.

Tyler Cowen hypothesizes a "dogmatism portfolio" or a "quota of dogmatism": in his words,

If you're very dogmatic in one area, you may be less dogmatic in others.

OK, well "may be" is pretty vague. There's not really anything to disagree with, yet. But then Cowen continues:

A comment by Mark Palko reminded me that, while I'm a huge Marquand fan, I think The Late George Apley is way overrated. My theory is that Marquand's best books don't fit into the modernist way of looking at literature, and that the gatekeepers of the 1930s and 1940s, when judging Marquand by these standards, conveniently labeled Apley as his best book because it had a form--Edith-Wharton-style satire--that they could handle. In contrast, Point of No Return and all the other classics are a mixture of seriousness and satire that left critics uneasy.

Perhaps there's a way to study this sort of thing more systematically?

