Recently in Sociology Category

1. Understanding the 'Russian Mortality Paradox' in Central Asia: Evidence from Kyrgyzstan

Short answer: alcohol and suicide.

2. Lumberjacks as a counterexample to the idea of a "risk premium"

They take lots of risks and don't get paid well for it.

3. Cell size and scale

This is a visualization you won't want to miss.

4. Three guys named Matt

5. The political philosophy of the private eye

A genre that was rendered obsolete in 1961 (but nobody realizes it).

The two blogs

| 1 Comment

Tyler Cowen writes:

Andrew Gelman will have a second blog. I don't yet understand the forthcoming principle of individuation across the two blogs.

Aleks sends along this amusing news article by Jennifer Levitz:

A new study found that rates of marriage outside the faith were sharply curbed among young Jews who have taken "birthright" trips to Israel . . . Over the past decade, Taglit-Birthright Israel, a U.S. nonprofit founded by Jewish businessmen, has sponsored nearly 225,000 young Jewish adults for free 10-day educational tours of Israel as a way to foster Jewish identity. . . .

A study [by Brandeis University researcher Leonard Saxe and partly funded by Taglit-Birthright] showed that 72% of those who went on the trip married within the faith, compared with 46% of people who applied for the trip but weren't selected in a lottery. . . . The Brandeis study looked at 1,500 non-Orthodox Jewish adults who took Taglit trips or applied for one between 2001 and 2004.

The article also said that 10,000 people participated in these trips last summer, which suggests that the 1,500 people in the research study represent a very small fraction of the participants from 2001-2004. I have no idea if this is a random sample, or what. Also I wonder about the people who participated in the lottery, were selected, but didn't go on the trip. Excluding these people (if there are many of them) could bias the results. The news article unfortunately doesn't link to any research report.
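To make the worry about excluded lottery winners concrete, here is a minimal sketch with entirely made-up numbers. The two "types" of applicants, their shares, their baseline rates, and their probabilities of actually taking the trip are all assumptions for illustration, not figures from the study. The sketch sets the true effect of the trip to zero and compares two ways of forming the treatment group.

```python
# Hypothetical illustration: every number below is invented, not from the Brandeis study.
# Two latent types of applicants with different baseline in-marriage rates.
types = {
    # name: (population share, baseline in-marriage rate, P(takes trip | wins lottery))
    "more_attached": (0.5, 0.70, 0.90),
    "less_attached": (0.5, 0.40, 0.50),
}
TRUE_TRIP_EFFECT = 0.0  # assume the trip itself changes nothing


def lottery_loser_rate():
    """In-marriage rate among applicants not selected (a random slice of all applicants)."""
    return sum(share * base for share, base, _ in types.values())


def winner_rate_all():
    """Rate among ALL lottery winners, whether or not they went (intention-to-treat)."""
    total = 0.0
    for share, base, p_go in types.values():
        went = p_go * (base + TRUE_TRIP_EFFECT)
        stayed = (1 - p_go) * base
        total += share * (went + stayed)
    return total


def trip_taker_rate():
    """Rate among winners who actually went; winners who stayed home are excluded."""
    goers = sum(share * p_go for share, _, p_go in types.values())
    married_in = sum(share * p_go * (base + TRUE_TRIP_EFFECT)
                     for share, base, p_go in types.values())
    return married_in / goers


itt_difference = winner_rate_all() - lottery_loser_rate()
as_treated_difference = trip_taker_rate() - lottery_loser_rate()

print(f"all winners vs. losers:  {itt_difference:+.3f}")
print(f"trip-takers vs. losers:  {as_treated_difference:+.3f}")
```

Under these assumptions the comparison of all winners with all losers shows no difference, while the comparison of trip-takers with losers shows a positive gap that comes purely from who chose to go. Whether anything like this happened in the actual study depends on how many selected applicants stayed home, which the news article doesn't say.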

I just today learned about an organization called SourceWatch--they have an article on the tobacco connections of the well-known sociologist Peter Berger. Beyond the inherent interest of the topic, I was fascinated by the way that the SourceWatch webpage mimicked Wikipedia:

[Image: berger.png]

This is a smart move, I think: for better or worse, Wikipedia is generally considered to be authoritative.

But then I was thinking . . . is this the beginning of the end for Wikipedia? I don't know anything about SourceWatch, if they're good guys or bad guys or whatever--but if they can mimic Wikipedia, I'm sure lots of other organizations could do so too. And, when they do it, all of a sudden there will be a lot of authoritative-looking Wikipedia-like pages floating around, a sort of counterfeit money devaluing the "real" 'pedia, which will then have to respond by branding itself--"100% real Wikipedia, accept no imitations"--and so on. Not a bad thing, perhaps, but not what we have now.

From Aaron Swartz, a link stating that famous sociologist Peter L. Berger was a big-time consultant for the Tobacco Institute:

Peter L. Berger is an academic social philosopher and sociologist who served as a consultant to the tobacco industry starting with the industry's original 1979 Social Costs/Social Values Project (SC/SV). According to a 1980 International Committee on Smoking Issues/Social Acceptability Working Party (International Committee on Smoking Issues/SAWP) progress report, Berger's primary assignment was "to demonstrate clearly that anti-smoking activists have a special agenda which serves their own purposes, but not necessarily the majority of nonsmokers."

This news article has made a bit of a splash: Seth Borenstein sent around a temperature time series to four statisticians--just sending the numbers without saying where they came from--and the statisticians uniformly concluded that there were no consistent temperature declines over time:

"If you look at the data and sort of cherry-pick a micro-trend within a bigger trend, that technique is particularly suspect," said John Grego, a professor of statistics at the University of South Carolina.

I don't have anything to add on the temperature series--there's only so much you can learn from a context-free data analysis, and I don't think anyone would want to take this particular set of blind statistical analyses as being at all informative about the science. But there's more going on here.

Bing sucks

| 11 Comments

When I search Gelman in Google, I'm right up there at #2. With Bing, I'm not even on the front page. Heck, I'm not even on the second page! Or the third, or the fourth, or the fifth, or the sixth, . . . OK, enough already! I know, I know, I shouldn't be searching myself anyway, but I had a legitimate reason . . . I had to find my talk on the web today from someone else's computer.

P.S. OK, I take it all back about Bing. I searched my name on Yahoo, and again my homepage did not appear in the first seven pages of search listings. So, really, I shouldn't be blaming Bing, I should be thanking Google for being so nice to me.

Thanks, Google!

The sequel is already assured of box-office success, so now's the time to start thinking about what's gonna be in volume 3. Here are a few models that Levitt and Dubner could consider, in no particular order:

Freakonomics update

| 25 Comments

Dubner defends himself here. No word on the drunk driving advice, but he has some backstory on the interviews that he and Levitt did regarding global warming. It seems pretty clear that their approach to writing Freakonomics 2 was much different than the original book: the first Freakonomics was all about Levitt's work, whereas the most prominent part of the sequel is a discussion of the ideas of others. As I noted yesterday, this creates a huge selection issue--how did they decide whom to interview?--which is much less present in the first book. I'm also still confused that Dubner describes global warming as "a very difficult problem to solve," given that on his blog the other day he seemed to be endorsing the view that future trends are "virtually assuring us of about 30 years of global cooling."

My guess is that Levitt/Dubner's views on the topic are not completely coherent (by which I mean, not that Levitt and Dubner disagree with each other, but that between them they have a bunch of partly conflicting attitudes on the topic). As a political scientist, I'm the last person to criticize attitudes for being incoherent, and given that neither Levitt nor Dubner is an expert on climate change, it's probably a good thing that their attitudes are fluid and not so easy to pin down. The difficulty comes when they feel the need to defend everything that they've written so far. Again, this is tougher to do here than in the Freakonomics 1 examples, partly because Levitt was much more of an expert on his own research than on others' research, and partly, I suppose, because you'll get a lot more flak in the major news media if you question global warming than if you write about the beneficial consequences of abortion.

P.S. But see the second blurb here!

No data, Part 3

| 1 Comment

Just following up . . . this time Dr. McWilliams includes many qualifiers: "I suggested . . . I also suggested . . . Of course, this is only a possibility. I have no numbers to draw on. . . . In any case, it's just a thought."

This helps. As I said before, I have no problem with this sort of op-ed-style reasoning; it just seems out of place on Freakonomics. Anyway, this was part 3 of 3, so I'll have no more to say on the topic.

Survey of blog readers

| 1 Comment

Stephen Kershaw writes:

Fernando Hoces De La Guardia writes:

Last night we did the traditional first year econ phd student's skit nite @ Penn.

One particular thing that I noticed was that we had less public that what the upper years told us to be prepared for.

Somebody suggested that it was due to Passover and Good Friday. My immediate reaction was "science & religion don't go usually together". By this I meant a prior of mine that the fraction of religious people is a lot less within a scientific discipline than among the rest of the population.

Two things pop out of my head this morning:

- in which data base can I check that prior?

- if true, are economists more religious than other scientists?

My reply: Usually people look these things up at the General Social Survey, which has a convenient web interface. Good luck!

No data, Part Two

| 7 Comments

A few days ago I posted a note about a Freakonomics blog by James McWilliams, who asked, "Do Farmers' Markets Really Strengthen Local Communities?" I was disappointed to see that he offered a historical discussion but no quantitative data or analysis, merely a barrage of subjective impressions and rhetorical questions of the "Who is to say?" sort.

I was hoping for something more in the next installment, but Part Two is unfortunately more of the same. Lots of qualitative quotes but still no data. We get this sort of thing: "Building on this suspicion, she acknowledges that many small farms are indeed more sustainable than larger ones, but then reminds us that 'Small scale, "local" farmers are not inherently better environmental stewards.'"

"Not inherently better"? That's the best he can do??

Again, if this were an ordinary magazine article or posted in an ordinary blog, it would be fine. Personal impressions make the world go round. But I expect something more when I turn to Freakonomics. Hard-edged data analysis is what makes Freakonomics special. Otherwise it's just the sort of opinionating that anyone can do in their sleep.

Part Three is forthcoming. Maybe we'll see some economic analysis there.

I enjoy reading the Freakonomics blog, but as I've noted previously, I remain puzzled by the presence of two appealing but, to my mind, incompatible forms of reasoning that seem to be used more generally in the world of "freakonomics" (which I'm using in lower-case to indicate not just the famous book and blog, but the larger world of empirical microeconomic analyses intended for a popular audience).

Ole Rogeberg writes:

Saw your comments on rational addiction - thought you might like to know that some economists think the "theory" is pretty silly as well. It's worse than you think: They assume people smoke cigarettes, shoot up heroin etc. at increasing rates because they've planned out their future consumption paths and found that to be the optimal way to adjust their "addiction stocks" in the way maximizing discounted, lifetime utility. To quote Becker and Murphy's original article: "[I]n our model, both present and future behavior are part of a consistent, maximizing plan."

Yeah, right

Here's Ole's article, "Taking Absurd Theories Seriously: Economics and the Case of Rational Addiction Theories," which begins:

Rational addiction theories illustrate how absurd choice theories in economics get taken seriously as possibly true explanations and tools for welfare analysis despite being poorly interpreted, empirically unfalsifiable, and based on wildly inaccurate assumptions selectively justified by ad-hoc stories. The lack of transparency introduced by poorly anchored mathematical models, the psychological persuasiveness of stories, and the way the profession neglects relevant issues are suggested as explanations for how what we perhaps should see as displays of technical skill and ingenuity are allowed to blur the lines between science and games.

I agree, and I'd also add that this problem isn't unique to economics. Political science and statistics also have lots of silly models that seem to have a life of their own.

Bill Ricker points me to this blog from Mark Liberman on whether (and how much) managers are more likely to use management jargon. Or, to be more precise, whether knowing that someone uses management jargon in their speech gives you information on how likely they are to be a manager. The motivation was this quote from Peter Taylor:

I [Peter Taylor] argue that the first question to ask is whether hearing someone use the phrase "At the end of the day" conveys information on whether they are likely to be a manager...

Much Bayesian inference follows. My only comment here is not on the Bayesian inference but rather on the idea that "managers" are dweeby Dilbert characters who talk using management jargon. I was thinking about it, and I realized that I'm a manager. I manage projects, hire people, etc. But of course I don't usually think of myself as a "manager" since that's considered a bad thing to be.

For another example, Liberman considers a "spokesperson for a manufacturer of sex toys" as a manager. I don't know what this person does, but I wouldn't usually think of a spokesperson as a manager at all.

To me, the most interesting linguistic phenomenon here is the floating definition of "manager."

P.S. Lots and lots and lots of discussion here. Somehow I think that Mark Liberman gets a lot more readers on his blog than I do on mine!

Who has babies when?

| 22 Comments

Sheril Kirshenbaum links to this graph from economists Kasey Buckles and Daniel Hungerman showing differences in who conceives babies in the fall (older, better-educated people) and the spring (younger, less well-educated people):

[Graph: NA-BA643_BIRTH_NS_20090921192123.gif]

Pretty stunning. And a nice graph. The repeating pattern over the years is super-clear. I'd also like to see a version that just shows the averages for the 12 months, so I could see the pattern in more detail. Also I'd like to subtract 40 weeks so it shows the data by (approximate) month/date of conception.

P.S. This news article by Justin Lahart is excellent. But I did notice one funny thing (to a statistician):

The two economists examined birth-certificate data from the Centers for Disease Control and Prevention for 52 million children born between 1989 and 2001 . . . 13.2% of January births were to teen mothers, compared with 12% in May--a small but statistically significant difference, they say.

Well, yeah, with n=52,000,000, I'd think that a 1 percentage point difference would be statistically significant! More seriously, with that many cases, it sounds like the next step (if the researchers haven't already done this) is to break things down by subgroups of the population. I wonder what data are available from the birth certificate records. To start with, there's geographic information.
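For a rough sense of scale, here is a back-of-the-envelope two-proportion z-test using the percentages quoted in the article. The article gives no monthly counts, so the sketch assumes (purely for illustration) that the 52 million births are spread evenly across the twelve months; the helper function name is also just a placeholder.

```python
import math

# Figures quoted in the news article.
total_births = 52_000_000
p_january = 0.132   # share of January births to teen mothers
p_may = 0.120       # share of May births to teen mothers

# ASSUMPTION: births are spread evenly over the 12 months. The article does not
# give monthly counts, so this is only a rough stand-in.
n_january = n_may = total_births / 12


def two_proportion_z(p1, n1, p2, n2):
    """z-statistic for the difference of two independent proportions (pooled standard error)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = two_proportion_z(p_january, n_january, p_may, n_may)
print(f"difference = {p_january - p_may:.3f}, z = {z:.1f}")
```

Under that assumption the z-statistic lands far beyond any conventional cutoff, which is the point: with samples this large, "statistically significant" tells you almost nothing, and the interesting questions are about the size of the difference and how it varies across subgroups.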

Chris Wiggins sent me a link to this article by Caroline Savage and Andrew Vickers, which, as he puts it, "takes an empirical approach to revealing the community's publishing practices." Here's the abstract:

Many journals now require authors share their data with other investigators, either by depositing the data in a public repository or making it freely available upon request. These policies are explicit, but remain largely untested. We sought to determine how well authors comply with such policies by requesting data from authors who had published in one of two journals with clear data sharing policies. . . .

We received only one of ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.

Not good. Personally, I hate it when people don't share their data. I've found researchers in biomedical sciences to be particularly bad about this, possibly because (a) these are big-money fields where the investigators are just too damn busy to reply to requests, and (b) pain-in-the-butt Institutional Review Boards make it difficult to share data. Bad stuff all around, and maybe Savage and Vickers's paper will be a valuable wake-up call.

A correspondent who wishes to remain anonymous sends this in:

From Pat Robertson's Regent University course catalog:

GOV 601 Quantitative Analysis (3) Skills for quantitative data gathering, measurement, policy analysis and program evaluation. Research and sampling design, surveys, data collection and data reduction and display. Review of basic statistics through multivariate analysis, z-scores, regression through the use of statistical computer package (SPSS), and a Judeo-Christian perspective on the use of statistics.

I wonder if they teach the principle that God is in every leaf of every tree.

(I looked for the course description online but couldn't find it. But the description seems consistent with others in the catalog.)

Mary Towner sends along this article by herself and Barney Luttbeg that discusses the Trivers-Willard hypothesis and its applications to humans.

I think that Towner and Luttbeg agree with David Weakliem and myself on the substance, but I disagree with them on the question of what models to fit. It's not so much a Bayesian or non-Bayesian question--we use both approaches in our article--but rather a question of whether to treat parameters as continuous or discrete. In their example on page 100, they consider models in which the probability of boy births is 0.50 and 0.53. I think it would make more sense to consider theta to be a continuous parameter with distribution centered on the historical value of 0.515. Neither of those hypothesized values seems very plausible to me. On the substance, though, I think we're all on the same page.
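The discrete-versus-continuous distinction can be sketched in a few lines. Everything specific here is an assumption made for illustration: the birth counts are invented, the two discrete hypotheses get equal prior weight, and the continuous prior is taken to be a Beta distribution centered on 0.515 with an arbitrarily chosen strength. This is not the analysis from either article.

```python
import math

# HYPOTHETICAL data, invented for illustration only.
boys, girls = 520, 480


def log_binomial_kernel(theta):
    """Log-likelihood of the data at theta, dropping the binomial coefficient."""
    return boys * math.log(theta) + girls * math.log(1 - theta)


# Discrete approach: only two candidate values, equal prior weight on each.
candidates = [0.50, 0.53]
log_like = [log_binomial_kernel(t) for t in candidates]
shift = max(log_like)                      # subtract the max for numerical stability
weights = [math.exp(ll - shift) for ll in log_like]
discrete_posterior = [w / sum(weights) for w in weights]

# Continuous approach: Beta prior centered on the historical value 0.515.
# ASSUMPTION: the prior carries the weight of 10,000 earlier births.
prior_strength = 10_000
a = 0.515 * prior_strength
b = (1 - 0.515) * prior_strength
post_a, post_b = a + boys, b + girls       # conjugate Beta-binomial update
posterior_mean = post_a / (post_a + post_b)
posterior_sd = math.sqrt(post_a * post_b /
                         ((post_a + post_b) ** 2 * (post_a + post_b + 1)))

for theta, prob in zip(candidates, discrete_posterior):
    print(f"P(theta = {theta:.2f} | data) = {prob:.3f}")
print(f"continuous posterior: mean = {posterior_mean:.4f}, sd = {posterior_sd:.4f}")
```

The discrete version is forced to split its belief between two values, neither of which is near the historical rate, while the continuous version simply reports an estimate close to 0.515 along with its uncertainty.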

P.S. I was curious.

More on the problem with email

| 10 Comments

Jenny has declared email bankruptcy but is watching her debts pile up again. I have (with effort) followed the Inbox Zero route. John Cook thinks email isn't the problem; on the other hand, he's reacting to a chorus of people telling him that email is ruining their lives, and maybe they have some good reason for saying this. Cook's commenter Heather appears to be staying barely above water with 200 messages in her inbox, while commenter Mr. Gunn recommends a technological solution.

From the comfort of my empty inbox, I thought of another big issue with email. Actually, a huge issue.

Email is a way to feel like you're working without actually thinking very hard. Sort of like blogging, actually--but blogging at least has the side benefit of sharing information with the world, focusing one's thoughts, etc. Actually, one of the unanticipated advantages of blogging, for me, was to organize the ideas that otherwise were going out into a million little emails.

I could--and have, often enough--put all of my work effort on some days into working with the inbox. What's the problem with that? First, I'm letting others drive my priorities. Some of this is fine--I certainly don't delude myself that I'm like that guy who sat in a room by himself and proved Fermat's Last Theorem--but at some point I think a little more direction to my work is useful. Second, inbox-handling just isn't usually the highest-quality thinking. It's just hard enough to occupy my mind without actually pushing me. I might as well just be playing Tetris for two hours.

My plan with Inbox Zero is to spend less total time on my email. I will spend some of the released time on more interesting, useful work, and it will also free up time for leisure.

The next step is to cut back on blogging. (Things will improve once we get the Scheduled Posting feature working again, so I can just write 10 blog entries, schedule them, and not have to think about them anymore.)

Inbox zero

| 7 Comments

It took me a week of hard work, culminating now at 5 in the morning. (I haven't gone to sleep so, no, it's not "before 4pm," as the saying goes.)

This time it's for real. I will never again read an email without immediately handling it. It works with referee reports, why can't it work for everything?

P.S. Now I have a backlog of 35 blog entries which I'll have to spread over the next month.

Big juicy datasets

| No Comments

From Ben Olding, the MIT cell phone data and a corresponding article. I can't remember what this was about, but when Ben described it to me a few months ago, I recall that it sounded cool.

I'm still waiting for someone to work with me to reanalyze Iyengar and Fishman's speed-dating data using hierarchical models.

Lee Sigelman points to this article by physicist Rick Trebino describing his struggles to publish a correction in a peer-reviewed journal. It's pretty frustrating, and by the end of it--hell, by the first third of it--I share Trebino's frustration. It would be better, though, if he'd link to his comment and the original article that inspired it. Otherwise, how can we judge his story? Somehow, by the way that it's written, I'm inclined to side with Trebino, but maybe that's not fair--after all, I'm only hearing half of the story.

Anyway, reading Trebino's entertaining rant (and I mean "rant" in a good way, of course) reminded me of my own three stories on this topic. Rest assured, none of them are as horrible as Trebino's.

I don't really think this one is of general interest so I'll put it all below the jump . . .

One of those funny things

| 4 Comments

I published an article in the Stata Journal even though I don't know how to use Stata.

John Sides links to an (unintentionally, I assume) hilarious peer-reviewed article by C. K. Rowley, which begins:

Robin Hanson is skeptical of my response in the following exchange:

Hanson: What do the customers who are paying your salary get from you?

Gelman: They learn how to fit multilevel models.

After finding the Howard Wainer interview, I looked up the entire series of Profiles in Research published by the Journal of Educational and Behavioral Statistics. I don't have much to say about most of these interviews: some of these people I'd never heard of, and I don't really have much research overlap with the others. Probably I have the most overlap with R. D. Bock, who's done a lot of work on multilevel modeling, but, for whatever reason, his stories didn't grab my interest.

But I was curious about the interview with Arthur Jensen. I've never met him--he gave a talk at the Berkeley statistics department once when I was there, but for some reason I wasn't able to attend the talk. But I've heard of him. As the interviewers (Daniel Robinson and Howard Wainer) state:

One major impediment, scientists agree, is the grant system itself. It has become a sort of jobs program, a way to keep research laboratories going year after year . . .

I was on an NIH panel a couple of years ago with about 25 other scientists, reviewing something like 90 grants. It was pointless. 25 people is just too many to make a decision. What happened was that there were 3 or 4 people who were experienced in the process, who ended up guiding the entire discussion.

The highlight--or, I should say, lowlight--was when we were reviewing a proposal involving the study of the carcinogenic effects of hookah (water pipe) smoking. I asked if this was really such a big deal, and one of the panel members told me that smoking tobacco through a hookah is something like 10 times worse than smoking a cigarette. If so, the public health consequences could be pretty serious, even if not so many people did it. I said this sounded like a reasonable point to me. Then this guy across the table from me spoke up and said that he knew somebody who was 80 years old, had been smoking with a hookah all his life and was none the worse from it. At this point, I blew up. I couldn't believe that the "my elderly aunt smokes and she didn't get cancer" argument could be brought up at an NIH panel!

Robin Hanson writes,

In academia, one often finds folks who are much more (or less) smart and insightful than their colleagues, where most who know them agree with this assessment. Since academia is primarily an institution for credentialling folks as intellectually impressive, so that others can affiliate with them, one might wonder how such mis-rankings can persist.

I added the bold font myself for emphasis. Granted, Robin is far from a typical economist. Nonetheless, that he would write such an extreme statement without even feeling the need to justify it (and, no, I don't think it's true, at least not in the "academia" that I know about) . . . that I see as a product of being in an economics department.

P.S. Robin definitely is correct about the "more (or less) smart and insightful" bit. But here I think there are two things going on. First, in any group of people you'll see some variation, especially given that there are other factors going on than "smart and insightful" when it comes to selecting people in an academic environment. Second, there's more to life--even to academic life--than being smart and insightful. Even setting aside teaching, advising, administration, etc., some other crucial qualities for academic research include working hard, having the "taste" to work on important problems, intellectual honesty, and caring enough about getting the right answer. I know some very smart and insightful people who have not made the contributions that they are capable of, because (I think) of gaps in some of these other important traits.

"A fondness for collecting a salary and getting away with as little intellectual intercourse as possible is endemic to the academic world." Not just the academic world, I think. Working is hard work. That's why they call it work. On the other hand, I'm doing this for free.

This issue reminds me of a discussion that's sometimes come up about a well-known listserv participant who is (a) very helpful, and (b) very rude. Or maybe I'm exaggerating a bit: this person is (a) often helpful, and (b) often rude. Anyway, I've always maintained that, rudeness aside, this person is altruistic, providing free statistical help to strangers. But it's true that answering listserv questions isn't intellectually taxing. Sort of like writing this blog, it's work-like without usually quite being work.

P.S. I think the point is best made by keeping the listserv and its well-known participant anonymous.

Fred Bookstein was at my talk in Seattle on voting power (the relevant articles are here and here) but didn't get a chance to ask a question, so he's asking it now:

Why is voting power considered a "good" in all those models? What is good about it? With what generally shared human desiderata, if any, is it associated?

Kieran points me to this.

Triple-blinding

| No Comments

Fred Bookstein writes:

Your blog comment about triple-blinding was a joke, but there IS a triple-blinding procedure in which the identity of the two groups is not revealed to the statistician on the project until the very end. At all times the data analyses proceed solely in reference to a comparison of some unspecified "group A" with a similarly unspecified "group B," and the identification of who were the intervened-upon and who were not is concealed from him or her until the computations are finished. (There are some other assumptions, e.g. absence of baseline differences, required for this to make sense; it applies mainly in contexts like randomized clinical trials.) You can't really purge the Discussion section of an article of the possibility of spin, but at least you can get the right scatters and tables into the dossier that they're spinning. The possibility was called to my attention a while ago by Michael Myslobodsky, a wise old man from my schizophrenia research world, who did not remotely intend it as a joke.

Interesting. My only experience along these lines is when I was working with a student doing matching for a public health study: There were something like 100 treated units and 1000 potential controls, and we wanted to select 300 of these as matched controls. The researchers were careful to give us only the background information and no outcomes.
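The post doesn't say how the matching was done. One simple possibility, sketched below as an assumption rather than a description of the actual study, is greedy nearest-neighbor matching on a background covariate, three controls per treated unit, without replacement. The covariate values are simulated and the function name is a placeholder; the point is only that the whole selection can run without the outcome variable ever being loaded.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# HYPOTHETICAL background covariate (say, age) for each unit; no outcomes are used.
treated = [random.gauss(50, 10) for _ in range(100)]
controls = [random.gauss(45, 12) for _ in range(1000)]
CONTROLS_PER_TREATED = 3  # 100 treated x 3 = 300 matched controls


def greedy_match(treated, controls, k):
    """Give each treated unit its k nearest unused controls on the covariate.

    Returns a dict mapping treated index -> list of control indices.
    """
    available = set(range(len(controls)))
    matches = {}
    for i, x in enumerate(treated):
        nearest = sorted(available, key=lambda j: abs(controls[j] - x))[:k]
        matches[i] = nearest
        available.difference_update(nearest)  # match without replacement
    return matches


matches = greedy_match(treated, controls, CONTROLS_PER_TREATED)
selected = [j for chosen in matches.values() for j in chosen]
print(len(selected), "controls selected,", len(set(selected)), "distinct")
```

Greedy matching depends on the order in which treated units are processed, so a real analysis would more likely use several covariates or a propensity score and an optimal-matching routine. The blinding logic is the same either way.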


A correspondent who prefers to remain anonymous asks:

Since you publish a lot of papers, I wonder if you've ever come across this issue. Journal reviews are supposed to be double-blind, but authors always have great familiarity with their own work, and cite it frequently. So what is the sense of sending an "anonymized" review copy to a journal editor when a line like "In a previous paper (Smith and Jones, 1999) we showed that ..." lets you know right away that Smith and Jones are the authors of the paper being reviewed?

I have thought about altering the review copy to make it look like we are citing a paper by someone else ["In a previous paper, Smith and Jones (1999) showed that..."]. Should I even worry about this? How do you handle it?

My reply: I don't think it matters much. If the rules say to anonymize the references, then I do so, but I don't really worry whether a reviewer can figure out whether it is me writing the article. From the other direction, I review lots of articles (more than I write, actually), and I am very rarely curious enough to bother trying to figure out (for example, using Google) who is writing them.

What bothers me more, actually, is the idea that somebody out there is submitting a crappy article but citing me in such a way that the reviewers think I wrote it. The other thing I worry about is when I review an article negatively, that the authors might be able to figure out that I'm the reviewer. Or, that someone else is reviewing an article negatively and in the review points to my work, leading the author to think that I'm being the bad guy.

P.S. Somebody once told me about triple-blind submission, where even the author doesn't know who wrote the article. Apparently this is standard in medical research.

P.P.S. More thoughts here.

New Twitter research

| 5 Comments

Drew Conway writes:

Stephen Senn quips: "A theoretical statistician knows all about measure theory but has never seen a measurement whereas the actual use of measure theory by the applied statistician is a set of measure zero."

Which reminds me of Lucien Le Cam's reply when I asked him once whether he could think of any examples where the distinction between the strong law of large numbers (convergence with probability 1) and the weak law (convergence in probability) made any difference. Le Cam replied, No, he did not know of any examples. Le Cam was the theoretical statistician's theoretical statistician, so there's your answer.

The other comment of Le Cam's that I remember was his comment when I showed him my draft of Bayesian Data Analysis. I told him I thought that chapter 5 (on hierarchical models) might especially interest him. A few days later I asked him if he'd taken a look, and he said, yes, this stuff wasn't new, he'd done hierarchical models back when he'd been an applied Bayesian back in the 1940s.

A related incident occurred when I gave a talk at Berkeley in the early 90s in which I described our hierarchical modeling of votes. One of my senior colleagues--a very nice guy--remarked that what I was doing was not particularly new; he and his colleagues had done similar things for one of the TV networks at the time of the 1960 election.

At the time, these comments irritated me. But, from the perspective of time, I now think that they were probably right. Our work in chapter 5 of Bayesian Data Analysis is--to put it in its best light--a formalization or normalization of methods that people had done in various particular examples and mathematical frameworks. (Here I'm using "normalization" not in the mathematical sense of multiplying a function by a constant so that it sums to 1, but in the sociological sense of making something more normal.) Or, to put it another way, we "chunked" hierarchical models, so that future researchers (including ourselves) could apply them at will, allowing us to focus on the applied aspects of our problems rather than on the mathematics.

To put it another way: why did Le Cam's hierarchical Bayesian work in the 1940s and my other colleague's work in 1960 not lead to more widespread use of these methods? Because these methods were not yet normalized--there was not a clear separation between the math, the philosophy, and the applications.

To focus on a more specific example, consider the method of multilevel regression and poststratification ("Mister P"), which Tom Little and I wrote about in 1997, which David Park, Joe Bafumi, and I picked back up in 2004, and which finally took off with the series of articles by Jeff Lax and Justin Phillips (see here and here). This is a lag of over 10 years, but really it's more than that: when Tom and I sent our article to the journal Survey Methodology back in 1996, the reviews said basically that our article was a good exposition of a well-known method. Well-known, but it took many many steps before it became normalized.

Daniel Carlat posts a link to this news article by John Fauber about a medical researcher, James Stein, who took big bucks in lecturing and consulting fees from drug companies over a 12-year period, before stopping a few months ago. Stein said:

I was sure I could avoid bias because I controlled the content and I had these strong personal convictions. Well, unfortunately, over the past several months, I've learned that I was wrong. I've learned that I could not stay unbiased, that I could not control all the content of my talks, and that my personal convictions were not good enough.

Regarding disclosure as a potential solution, Stein said:

I really felt that if I stood up in front of a crowd and said that these are my disclosures, look how honest I am, that I was really managing conflict of interest. But actually the medical literature and the social science literature tells me that it is actually the opposite effect. Although it is laudable to disclose your relationships, actually thinking that disclosure manages relationships is harmful. It has the perverse effect that when you disclose your relationship, the recipient of your information becomes more trusting, and the social scientists also have shown us that professionals who disclose actually become more biased.... I would argue ... that the solution is not disclosure, because if you are doing something that is wrong or unethical, don't disclose, just don't do it!

There was also this amazing bit:

Huge fines or convictions for gross ethical conduct were being issued against every drug company that he worked with. Doctors were being investigated on allegations of taking kickbacks.

Eric Gilbert and Karrie Karahalios have a paper on tie strength, distinguishing between strong and weak ties in social networks, published at the Computer and Human Interaction conference. Eric is one of the recipients of 2009 Google fellowships. There are some neat ideas there:

Presenting the distributions of predictors
predictors.png

Pretty, informative and compact.

Distribution of outcomes
outcomes.png

Not sure the median is particularly interesting.

Graphical model summary
model summary.png

They describe it as:

The predictive power of the seven tie strength dimensions. [...] A dimension's weight is computed by summing the absolute values of the coefficients belonging to it. The diagram also lists the top three predictive variables for each dimension. [...]

While the aggregation of coefficients in the same category is nice, there are some problems summing betas together. Rarely occurring values with huge betas are often an artifact of overfitting and not of informativity, and betas for continuous predictors are strongly affected by scale. Consider these betas:

Days since last communication -0.76
Days since first communication 0.755
Intimacy × Structural 0.4
Wall words exchanged 0.299

So, the top two predictors are probably correlated, and opposite to one another - resulting in runaway absolute betas.
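A minimal sketch of the problem, using simulated data rather than anything from the paper: suppose the outcome depends only on the length of a relationship (days since first contact minus days since last contact). The variable names, the coefficient 0.76, and the uniform draws are all assumptions chosen for illustration. The same fit can be written with two correlated predictors or with one, and the summed absolute beta doubles under the first parameterization.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ols_one(x, y):
    """Slope of y on a single predictor (intercept absorbed by centering)."""
    x, y = center(x), center(y)
    return dot(x, y) / dot(x, x)

def ols_two(x1, x2, y):
    """Slopes of y on two predictors, solving the 2x2 normal equations."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

rng = random.Random(0)
# Hypothetical data: the two "days since" variables are highly correlated
last = [rng.uniform(0, 30) for _ in range(200)]
first = [l + rng.uniform(0, 10) for l in last]
span = [f - l for f, l in zip(first, last)]
y = [0.76 * s for s in span]  # outcome depends only on the span

b_last, b_first = ols_two(last, first, y)
b_span = ols_one(span, y)

weight_two = abs(b_last) + abs(b_first)  # "dimension weight" with two predictors
weight_one = abs(b_span)                 # same model, one predictor
print(round(b_last, 3), round(b_first, 3), round(weight_two, 3), round(weight_one, 3))
```

Both versions make identical predictions, so the difference in summed weight reflects only how the predictors were coded. With noisy data and near-collinear predictors the two coefficients would also become unstable, which makes the sum even harder to interpret.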

I suggested the concept of net leverage a few years ago in a natural-language, binary-outcome setting as an attempt to improve the presentation of feature importance in regression models, but this topic is worth revisiting.

Yesterday I showed Laura Wattenberg's graphs showing the most popular last letters of boys' names in 1906, 1956, and 2006. The quick story is that 100 years ago, there were about 10 last letters that dominated; 50 years ago, the number of popular last letters declined slightly, to about 6; but now, a single letter stands out: an amazing 36% of baby boys in America have names ending in N.

This is super-cool. As a commenter wrote, there should be some sort of award for finding the largest effect "in plain sight" that nobody has noticed before.

But, beyond pure data-coolness, what does this mean? My story, based on reading Wattenberg's blog, goes as follows:

100 years ago, parents felt very constrained in their choice of names (especially for boys). A small set of very common names (John, William, etc.) dominated. And, beyond that, people would often choose names of male relatives. Little flexibility, a few names being extremely common, resulting in a random (in some sense) distribution of last letters.

Nowadays, parents have a lot of freedom in choosing their names. As a result, there are lots and lots of names that seem acceptable, but the most common names are not so common as they were fifty or a hundred years ago. With so much choice, what do people do? Wattenberg suggests they go with popular soundalikes (for example, Aidan/Jaden/Hayden), which leads to clustering in the last letter. Even so, the pattern with N is so striking, there's gotta be more to say about it.

But I like the paradox: 100 years ago, the distribution of names was more concentrated but the distribution of sounds (as indicated by last letters) was broader. Nowadays, the distribution of names is more diffuse but the distribution of sounds is more concentrated.

Less constraint -> more diffuse distribution of names -> more concentrated distribution of last letters.
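One way to make "more diffuse" and "more concentrated" concrete is a concentration index. The sketch below uses the Herfindahl index (sum of squared shares) on two made-up cohorts of 100 boys each; the names and counts are invented for illustration and are not Wattenberg's data.

```python
from collections import Counter

def herfindahl(counts):
    """Sum of squared shares: 1/k for k equally common items, 1 for a single item."""
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

def last_letter_counts(name_counts):
    """Aggregate name counts by the final letter of each name."""
    letters = Counter()
    for name, c in name_counts.items():
        letters[name[-1].lower()] += c
    return letters

# Made-up cohorts (illustrative only): a few dominant names vs. many soundalikes
then = {"John": 30, "William": 25, "James": 20, "George": 15, "Charles": 10}
now = {name: 10 for name in ["Aidan", "Jaden", "Hayden", "Mason", "Ethan",
                             "Logan", "Owen", "Liam", "Noah", "Jack"]}

for label, cohort in [("then", then), ("now", now)]:
    print(label,
          "names:", round(herfindahl(cohort), 3),
          "last letters:", round(herfindahl(last_letter_counts(cohort)), 3))
```

In this toy example the index for names falls from the earlier cohort to the later one while the index for last letters rises, which is the pattern described above. Whether the real data show the same thing on this particular measure would have to be checked against the actual name counts.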

This must occur in other aspects of life. For example, consider food. We eat lots more different types of food than we used to, but a single ingredient--corn syrup--makes up more and more of our diet (or so I'm told). Again, lack of constraint (this time for economic reasons) leads to more diversity in some ways and more homogeneity (by choice) in others.

Is this for real?

1906:
last-letters-boys-1906.gif

1956:
last-letters-boys-1956.gif

2006:
last-letters-boys-2006.gif

Wow.

Buh-bye.

Interesting comments too.

Conformity in academia?

| 12 Comments

Justin Wolfers writes:

Dick [Easterlin] was the first economist to start taking subjective well-being data seriously. While this sort of research is now pretty mainstream, I have to imagine that it took a fair bit of courage back in the early 1970's.

This was interesting to me: the idea that it would take courage to study a particular research topic. Especially something such as subjective well-being, which doesn't have any direct political connections. I mean, it's not like we're talking about the economic benefits of torture, or whatever. "Subjective well-being" seems pretty innocuous to me: whatever objections made it courageous to study this topic must have been intellectual and stylistic rather than political.

P.S. Back when I taught at Berkeley, I did get some flak for doing research on Bayesian statistics--some students told me that other faculty had told them not to take my course--but I wouldn't describe my decision to do work on that topic as "courageous." I think the atmosphere in economics in the 1970s must have been much different than anything I've ever experienced.

This is horrible

| No Comments

Connected

| No Comments

Medical researcher Nicholas Christakis and political scientist James Fowler sent me an advance copy of their book, "Connected: The Surprising Power of Social Networks and How They Shape Our Lives." Christakis and Fowler are best known for their work connecting social networks and epidemiology, in particular the fact that obese people are more likely to have obese friends. As one wag put it, they find that obesity is environmental and voting is genetic. I guess that sort of interpretation is the inevitable outcome of man-bites-dog reporting, with the real story being that obesity is more determined by social behavior than we might have thought, while voting behavior is more tied to genes than we might have thought.

Anyway, I like their new book a lot. They bring in many different research findings. It's a popular science book but with a much higher meat-to-filler ratio than is usual for the genre. Considering myself as the ideal reader, I wish they had more discussion of methodology, of the strengths and potential weaknesses of each of the findings they cite. I understand why they didn't want to clutter a popular book with such discussion, but in retrospect I wish we'd put in more methods talk within our own Red State, Blue State book. My impression from talking with people is that (a) they respect open discussions of strengths and weaknesses of an argument, and (b) many people find methods fun and enjoy chewing on issues such as representativeness of sampling, etc. In our own book, I think we tried too hard to be reader friendly and wish we'd laid more of the struggle out there for the reader to see. On the other hand, their book has many advantages over mine, being from a major publisher and also covering so many different topics that there's something for everyone. It's actually much more general in scope than its title indicates. It's really about all of social science. I'm not sure how a reader who isn't familiar with all this work would think of the book, but I enjoyed seeing all this stuff in one place.

Huh?

| 13 Comments

Henry thinks that, if you review an article for an academic journal, they should email you a copy of their letter deciding what to do with the manuscript. I review lots and lots of articles and occasionally get these letters, which I always immediately delete. I guess it's not really a bad thing when they send these emails--they're easy enough to remove--but I certainly don't see why it would upset someone not to be bothered by them.

I have noticed a big problem recently, though: with electronic publishing, I see fewer and fewer hardcopy journals lying around, and this removes a key way for me to keep up with what's going on in statistics.

I recently read Nicholas Christakis and James Fowler's Connected, and now everything I see makes me think of social networks.

For example, Richard Florida links to a research article by Bart Bronnenberg, Sanjay Dhar, and Jean‐Pierre Dubé, who write:

We [Bronnenberg et al.] document evidence of a persistent "early entry" advantage for brands in 34 consumer packaged goods industries across the 50 largest U.S. cities. Current market shares are higher in markets closest to a brand's historic city of origin than in those farthest. For six industries, we know the order of entry among the top brands in each of the markets. We find an early entry effect on a brand's current market share and perceived quality across U.S. cities. The magnitude of this effect typically drives the rank order of market shares and perceived quality levels across cities.

I haven't read the article, but assuming its findings are correct, could some of this be the effect of employees and investors in the company, as well as local pride? I doubt Heinz Ketchup currently employs a lot of people in the Pittsburgh area, but over the years it must add up to a lot of people. Then add in their friends and relatives, along with people who get business from Heinz (suppliers and the like), and that's a whole bunch of Pittsburghers with some connection to Heinz.

The social network bit is the idea that the employees and the like are multiplied by their friends. Beyond this, of course, people are creatures of habit, tastes can get established young, and so forth.

Also, Heinz ketchup is something that anyone can buy. The very fact that it's (a) substitutable with other items and (b) just different enough to be distinguishable (it doesn't taste _exactly_ like other ketchups, it's not a pure commodity), might make it particularly susceptible to this sort of effect. It may be no coincidence that Bronnenberg et al. found this effect in the area of low-cost packaged foods.

Popularity (of a sort)

| 2 Comments

Different Sorts of Political Polarization in the United States

Monday, April 6, 2009, 12:00-1:45 p.m.
(Buffet lunch available at 12:00 -- Presentation begins at 12:15)

Location: Harvard Kennedy School | Allison Dining Room (Taubman 5th floor)

I'm not sure exactly what I'll talk about. I gave them the link to this article with Delia but I think there's other stuff I want to discuss too. I gave the Red State, Blue State talk at Harvard a few months ago so I don't need to talk so much about that stuff. Probably I'll throw a lot of data at 'em without any single coherent storyline. If you show up, please think of some tough questions: the format is that I speak for 45 minutes and then there's 45 minutes of discussion.

P.S. Here's what happened.

Recession Naming Guide

| No Comments

Some useful ideas:

With trendy letters like Z and Q commanding 10 points a piece, parents across the country are rethinking their naming expenditures. If you're looking to maximize style but minimize points, try these tips for cool, cost-effective baby-names. . . .

Perhaps Nan and Sue will make a comeback?

JAMA Editors Go Nuts

| 11 Comments

This is pretty funny.

Whaddya think of that, Matt?

Economist-centrism

| 4 Comments

Steven Levitt writes of Time Magazine's list of the 100 people who "shape our world," that one year they included him but that, in his opinion, "Economists have not figured very prominently on the previous lists; there has been roughly one economist in the top 100 per year."

One per hundred seems pretty good to me, considering that economists represent only 0.1% of the employed population in the United States!

I guess the real moral of the story is that, whatever people have, they will consider it as a baseline and then want more.

P.S. Of course I'm happy that Nate is ranked in the top 100, but, no, he's not an economist. He's a sabermetrician, or, if you want to use a more general term, "statistician." If you call someone an economist just because he majored in economics in college, then I'm a physicist.

I thought that economists might be interested in my thoughts on the new book by Angrist and Pischke and, more generally, on the different perspectives that statisticians and economists have on causal inference. So I wrote them up as a short document and asked an econometrician friend where to send it. He said that the Journal of Economic Literature does book reviews so I sent it there. They returned it to me with kind words on my review but the note: "The JEL has avoided reviewing textbooks, focusing instead on research monographs. The review makes fine points about the coverage in this textbook, but neither the book nor the review are attempting to advance the state of the art."

Fair enough. So where to send the review? I asked some colleagues and they all agreed that JEL is the only economics journal that reviews books. So I guess econ textbooks just don't get reviewed!

This surprised me, given that book reviews appear in several top statistical journals, including the Journal of the American Statistical Association, the American Statistician, Biometrics, the Journal of the Royal Statistical Society, Statistics in Medicine, and Technometrics. There are also lots of places that review books in political science.

I'm surprised that there's only one place for book reviews for economists.

See here for my thoughts on the surprising stability of the economics curriculum.

And is there anything out there that can serve as a reasonable substitute?

Seth points to this wonderful suggestion by Tim Harford on "how to enjoy the thrill of the lottery without the fool's bet":

Choose your numbers, but don't buy a ticket. You'll win almost every week - the fear that your number might actually come up is an adrenaline rush to beat them all.

I love this. But now I want to return to Seth, who draws a connection to what scientists do. I don't quite agree with what Seth writes--I think he gets his argument tangled up--but it's interesting, so let me repeat it and then follow up with my own comments. Seth writes:

It is the average [lottery] consumer who is gullible and makes the whole thing work . . . Scientists are no less gullible. Self-experimentation, like Harford's advice, takes advantage of that gullibility. Because scientists essentially play the lottery in their research -- devote considerable resources (their careers) to looking for discoveries in one specific way (scientists are hemmed in by many rules, which also slow them down) -- this leaves a great deal to be discovered by research that doesn't cost a lot and can be done quickly. All of my interesting self-experimental discoveries have involved treatments that conventional scientists couldn't study because their research has to be expensive. Could a conventional scientist study the effect of seeing faces in the morning? No, because you couldn't get funding. And all research must require funding. (Research without funding is low status.) In practice, this means you can't take risks and you can't do very much. Like the lottery, this is a poor bet.

Let me untangle this. Seth is saying that the typical scientist is like a lottery player whereas, by doing self-experimentation, Seth is more like Tim Harford's reverse lottery player, going for the near-sure thing rather than investing time in the hope of a hypothetical breakthrough.

It's funny that Seth says this, because I've always told him the opposite: conventional scientists such as myself are the plodders, squeezing out little research results each year, publishing in journals and getting grants, whereas Seth has always seemed to me to be the gambler, stepping away from the near-sure thing of the scientific treadmill and risking something like 10 years of his life on self-experimentation--it was about 10 years after he began that he started to get useful results. I've always admired Seth for his gamble.

Right now I can see that Seth views self-experimentation as a grind-it-out way to make discovery after discovery, but 20 years ago, not so much. Conversely, I don't think of conventional scientists as staking their careers on the chance of making a single big discovery. Rather, we take no risks at all! To paraphrase Paul Erdos, a scientist is a machine for turning hard work into little bits of publishable research.

P.S. I don't buy Seth's claim that "research without funding is low status." My impression is that people seek funding because they feel their research is important and they want help getting it done faster. I don't see that status has anything to do with it.

Andy Sutter writes:

It's been a while (~2 years?) since I was last reading your blog semi-regularly and submitted a comment or two, but I was reading something today that made me recall those days.

At the time, I was curious about why social scientists present data as charts of regression coefficients, since I'd never seen such a presentation in the physical sciences.

Helen DeWitt, commenting on friends/colleagues/acquaintances who ask her for reference letters, writes of "a mythical entity: a reference that can just be dashed off in half an hour and popped in the post / fired off in an e-mail. There is no such thing."

Jenny Davidson follows up with:

I [Jenny] do not know why someone thinks that it is possible to write a good letter of recommendation without a HUGE amount of supplementary paperwork . . .

What's my experience? I get asked for a fair number of letters of recommendation or evaluation, and I take about 15 minutes to write such a letter and email it off (to someone who prints it on letterhead paper and mails it). From the remarks above, I suspect that it's considered a norm to spend more time than that, but I think it's a bit of an arms race: your letter has to be long so it can compete with other people's letters. So by writing short letters, I'm doing my part to make the process more sane. There's a well-known statistician who always writes letters for his students saying essentially that they're the second coming of Cauchy; he's recognized for doing this. As long as people have the expectation that my letters will be short, everything should work out fine.

I had a discussion with Christian Robert about the mystical feelings that seem to be sometimes inspired by Bayesian statistics. Christian began by describing this article that was on the web about constructing Bayes' theorem for simple binomial outcomes with two possible causes as "indeed funny and entertaining (at least at the beginning) but, as a mathematician, I [Christian] do not see how these many pages build more intuition than looking at the mere definition of a conditional probability and at the inversion that is the essence of Bayes' theorem. The author agrees to some level about this . . . there is however a whole crowd on the blogs that seems to see more in Bayes's theorem than a mere probability inversion . . . a focus that actually confuses--to some extent--the theorem [two-line proof, no problem, Bayes' theorem being indeed tautological] with the construction of prior probabilities or densities [a forever-debatable issue]."
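For reference, the "mere probability inversion" Christian describes fits in a few lines. This is only a sketch for a binary outcome with two possible causes; the causes "A" and "B", the prior, and the success probabilities are hypothetical numbers chosen so the arithmetic is exact, not anything taken from the article under discussion.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem: P(cause | data) is proportional to P(cause) * P(data | cause)."""
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(data), summed over the possible causes
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers: two possible causes and one observed "success"
prior = {"A": Fraction(3, 10), "B": Fraction(7, 10)}
p_success = {"A": Fraction(4, 5), "B": Fraction(2, 5)}

post = posterior(prior, p_success)
print(post["A"], post["B"])  # 6/13 7/13
```

The function is the tautological part. Everything debatable sits in the `prior` dictionary, which the code simply takes as given.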

I replied that there are several different points of fascination about Bayes:

Eric Loken writes:

Last week the New York Times published an article on a possible Obama effect on test scores of black test takers. . . . The authors claim that they gave a short academic aptitude type test to black and white test-takers. When they administered the test last summer, they noted a difference between average scores for blacks and whites. However, after (now) President Obama had received his party's nomination and given his acceptance speech, the difference in scores disappeared. The theory is that Obama's rise has had a positive motivating influence on test taking performance.

Eric then gives some background:

Hal Pashler writes:

I [Pashler] thought you guys would enjoy this charming little 1950 paper by Edward Cureton entitled "Validity, Reliability, and Baloney" (Dirk Vorberg, a German math psych guy, sent it). Long before machine learning, it seems that psychometrics people were confronting this issue--and the concrete form it took was "What should we make of validation measures computed with the same data that were used to select out particular items for inclusion in the test?". Just swap voxels for items, and it's the same problem [as in the Vul, Harris, Winkelman, and Pashler paper on suspiciously high correlations in brain imaging studies].

This reminds me of a longstanding principle in statistics, which is that, whatever you do, somebody in psychometrics already did it long before. I've noticed this a few times. Once, about ten years ago, I was at a conference where computer scientists were talking about some pretty elaborate statistical models, and I realized these were the same as some things I'd seen Iven Van Mechelen and his colleagues working on in the Psychology Department at Leuven. Then, more recently, I wrote this article with David Park on splitting a predictor into three parts, and it turned out that similar work had been done in 1928! by psychometric researcher T. L. Kelley (and, oddly enough, E. Cureton in 1957).
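The issue Pashler describes is easy to reproduce by simulation. The sketch below is not Cureton's procedure or the Vul et al. analysis, just a toy version under stated assumptions: 200 test items and a criterion that are all independent noise, 50 people, and a "test" built from the 10 items most correlated with the criterion. The sample sizes, the seed, and the function names are arbitrary choices.

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def noise_data(rng, n_people, n_items):
    """Items and criterion are independent noise, so true validity is zero."""
    items = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_items)]
    criterion = [rng.gauss(0, 1) for _ in range(n_people)]
    return items, criterion

def composite_score(items, chosen):
    """Sum the chosen items per person, each flipped by its selection-time sign."""
    n_people = len(items[0])
    return [sum(sign * items[j][i] for j, sign in chosen) for i in range(n_people)]

rng = random.Random(1950)
items, criterion = noise_data(rng, n_people=50, n_items=200)

# Keep the 10 items most correlated (in absolute value) with the criterion
corrs = [(pearson(item, criterion), j) for j, item in enumerate(items)]
top = sorted(corrs, key=lambda rj: abs(rj[0]), reverse=True)[:10]
chosen = [(j, 1 if r > 0 else -1) for r, j in top]

same_data_r = pearson(composite_score(items, chosen), criterion)

# Fresh people, same item numbers and signs: the honest check
new_items, new_criterion = noise_data(rng, n_people=1000, n_items=200)
fresh_data_r = pearson(composite_score(new_items, chosen), new_criterion)

print(f"validity on selection data: {same_data_r:.2f}")
print(f"validity on fresh data:     {fresh_data_r:.2f}")
```

The validity computed on the selection data comes out large even though nothing real is being measured, while the same items scored on fresh people give a correlation near zero. Swap voxels for items and the structure is unchanged.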

Is it something I said?

| 12 Comments

I had a grant application turned down and wrote the following polite email to the program director:

Dear Dr. ***,

I am sorry to hear this. In particular, I can't understand how the panel could've thought that the methods are "not in themselves new." Clearly we have more work to do in explaining our proposal.

But I will look on the upside, which is that ** must have received some excellent proposals to fund that were even better than ours! So congratulations on that.

Yours
Andrew Gelman

I was surprised that he did not respond, but when I related the story to my colleagues, they explained to me that the director might have thought I was being sarcastic in my email. I was actually sincere. But intonation is notoriously difficult to convey via email.

A couple of months ago, my article on the probability of a single vote being decisive in the presidential election (at most 1 in 10 million, according to our calculations) was picked up by the Associated Press, and shortly after I received the following email from Deron Reynolds, a pilot in the U.S. Air Force:

Peter Woit relates a story about how four physicists did work that led to a Nobel Prize, but the rules only allowed it to be given to three of them, creating a motive for murder. The story is consistent with Andrew Oswald's finding that not getting the Nobel Prize reduces your expected lifespan by two years. The cited article frames this as winning the prize increasing your lifespan, but so many more eligible people don't get it than do (and the No comes year after year). I'd guess that it's a net reducer of scientists' lifespans. Even setting murder aside.

Lane Kenworthy writes (link from here and here):

The notion that political parties are a key determinant of income inequality has been around for a long time. I suspect many non-academics take its truth for granted. Among American scholars, the notion is perhaps most closely associated with Douglas Hibbs . . .

[In his recent book, Unequal Democracy], Larry Bartels suggests that a key part of the story is different policies pursued by Democratic and Republican presidents. . . . Bartels' argument, while by no means novel, is very much a fresh one. It is based on extensive empirical analysis of the post-World War II period. Is he correct? I think Bartels probably has it right for part of this period, but I'm not convinced that his hypothesis holds up for the other part. . . .

This relates to some ideas I had after seeing Bartels speak on his work at Columbia a couple of years ago; see here and here. In particular, in that last link, I wrote the following:

After seeing Larry Bartels present his findings on how the economy has done better, for the poor and middle class, under Democratic presidents than Republican presidents, I was puzzled. Not that it couldn't be true, but it seemed a little mysterious, given the general sense that presidents don't have much control over the economy--business cycles just seem to happen sometimes.

But the general perceptions about Presidents and the economy have changed over time.

I might be wrong here, not having lived through the entire postwar period, but my perception is that, during most of this time, "competence" was not an issue; rather, there was a general belief that the president could do some things, most notably help labor (for the Democrats) or business (for the Republicans).

The exception here was the 1976-1996 period, during which there was a real sense of economic incompetence or powerlessness of some presidents (Ford with his Whip Inflation Now, Carter with stagflation, the residual view of Democrats being incompetent for the economy, George H.W. Bush with the deficit and the recession, perhaps extending to Dole in 1996). Then, since 2000, we've returned to the general attitude that both parties have essential competence but have different goals. (Not that everyone agrees on the "competence" issue, but it seems to me that the battle is more being fought on priorities than competence--in contrast to 1992, for example.)

So, the conventional wisdom based on the 1976-1996 period is that presidents can't do much, they're at the mercy of the business cycle, etc., which makes Bartels's results seem like some sort of fluke, or a perhaps meaningless juxtaposition of one-off results. But taking the 1948-1972 and 2000-2004 perspectives, Bartels's graph makes a lot of sense. From this perspective, the Democrats did their thing, and the Republicans did theirs, and you'd expect to see a big difference at the low end of the income scale. (Again, this is inherently short-term reasoning, not long-term, but as Larry pointed out in his talk, the evidence is that voters are susceptible to short-term inferences.)

In summary: we're used to thinking of presidents as fairly powerless surfers on the global economy, able to tinker with tax rates but not much more--but thinking about the entire postwar period, there's certainly been at least the perception that presidents can deliver the economic goods to their constituencies. So from that perspective, Larry's curves should not be much of a surprise--at least in that the slope for Democrats goes down (i.e., poor people do better under Democratic presidents) and the slope for Republicans goes up (i.e., rich people do better under Republican presidents). The relative positions of the lines are another story, which perhaps correspond to random alignments of the business cycle.

Perhaps Kenworthy can connect this thinking more directly to his arguments. My time frames don't quite align with his, but it's a similar idea of breaking the period into smaller segments.

And, to comment on my comments . . . when posting the above in 2006, I wrote, "since 2000, we've returned to the general attitude that both parties have essential competence but have different goals. . . . we're used to thinking of presidents as fairly powerless surfers on the global economy, able to tinker with tax rates but not much more. . ." Things sure have changed in 2 1/2 years!

Speaking of Steve Hsu

| 3 Comments

This is cute too. I suggest, in addition:

1. Plotting the y-axis on the log scale.

2. Normalizing by total number of words (or, if that's hard to find, something easier such as total number of articles).
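A rough sketch of both suggestions together, with invented numbers: the yearly counts and article totals below are hypothetical placeholders, not the data behind the graph being discussed. The idea is to divide each count by a denominator for that year and then work with the logarithm of the resulting rate.

```python
import math

# Hypothetical yearly counts: mentions of a term, and total articles published
mentions = {1990: 12, 2000: 90, 2008: 640}
total_articles = {1990: 40_000, 2000: 75_000, 2008: 160_000}

def normalized_log_series(counts, totals, per=1000):
    """Return {year: (rate per `per` articles, log10 of that rate)}."""
    out = {}
    for year in sorted(counts):
        rate = per * counts[year] / totals[year]
        out[year] = (rate, math.log10(rate))
    return out

series = normalized_log_series(mentions, total_articles)
for year, (rate, log_rate) in series.items():
    print(f"{year}: {rate:.2f} per 1000 articles, log10 = {log_rate:.2f}")
```

On the log scale, equal vertical distances correspond to equal multiplicative changes, so steady proportional growth shows up as a straight line, and the normalization keeps growth in the total volume of text from being mistaken for growth in the term.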

Darryl Caldwell writes:

I enjoyed your response to Satoshi Kanazawa's statistical data on sex ratios. I have a quick question for you. Did he respond? How was his response?

My reply: I sent him an email but he did not respond. I assume he must be aware of my comment on his article, in any case. I was disappointed in his lack of response and even more disappointed that he then wrote an entire book on this stuff without addressing these concerns. (Search this blog for Kanazawa for more than you want to know about this topic.) The upside is that I got a publication in the Journal of Theoretical Biology, something which probably otherwise never would've happened!

Also relevant is this paper I wrote with David Weakliem, a paper to which I will soon devote an entire blog entry, if for no other reason than to highlight some really quotable bits that David threw into the revision.

Could we publish your paper?

| 4 Comments

I majored in math and physics in college. I knew all along that math was a dead end for me (in high school, I was in the U.S. Math Olympiad program and learned that there were kids who were much better at math than me. My impression of math was that in every century there are the Cauchys and Fouriers who do the real stuff and then a bunch of other guys who pretty much spin their wheels--and I didn't want to be one of these other guys) but physics was cooler. (I ended up deciding that I didn't understand physics well enough to continue with it, but that's another story.)

One of the requirements for the physics degree was to do an undergraduate thesis. There was a booklet listing the faculty who would take research assistants. My junior year I went to a few of these physics professors but they told me that I should wait until my senior year when I was ready to do a thesis. Then as a senior I went back to these places and was told that they only took students who'd been working with them earlier. (To be fair, though, maybe I wasn't trying so hard. I worked in physics labs in summer jobs all through high school and college, and while it was interesting and I learned a lot--among other things, I became an expert at programming the finite element method for thermal analysis--it never really seemed to be me.)

So I broadened my search and found a professor of political science who accepted undergraduates and did research in game theory. (Although the undergraduate physics degree required a thesis, it did not have to be in physics. And I'd taken a couple of political science classes already.) Game theory sounded interesting so I went to Prof. Alker's office and he told me about a recent book called The Evolution of Cooperation by a political scientist named Robert Axelrod. Alker told me to buy the book, read it, and come back to him with some research ideas. I did so, and we had our next discussion a week or two later.

My ideas were a bunch of pretty technical game theoretic questions involving different prisoner's dilemma strategies, and Alker, to his eternal credit, pointed me in a better direction. Axelrod had a chapter on First World War trench warfare: did his model make sense there? Alker pointed me to a book by Tony Ashworth--Axelrod's main source--and also the book, Men Against Fire, by S. L. A. Marshall (see here for a recent overview), and The Face of Battle, John Keegan's recent historical overview of combat.

These were great leads. Over the next few weeks I read these books and realized that (at least to me) Axelrod's application of game theory to First World War trenches didn't hold up. Alker felt that criticism wasn't enough and pointed me toward recent political science literature on cooperative games, which allowed me to place the trench warfare example in this more general framework.

I liked my undergraduate thesis but never thought to submit it for publication for another fifteen years or so--too bad, I think it could've been influential back in 1986! After a couple of submissions to different journals, I didn't have the energy to try to revise further but luckily had a convenient opportunity to put the article in a book I was writing and editing (it's coming out next year, under the title A Quantitative Tour of the Social Sciences, edited by Jeronimo Cortina and myself). To keep it clean, I took out the alternative models and just focused the chapter on an exposition and criticism of Axelrod's model. The book chapter is fun because I also quoted from, and responded to, some of the referee reports I got from the journals. I also posted the article on my website.

(Just to be clear: I'm a big fan of Axelrod's book, which has been rightfully influential. Even if he overstretched the applicability of his model in one case, this isn't meant as any kind of devastating criticism of his book as a whole.)

Anyway, a year or so ago I got an email from the editor of an Italian sociology journal saying that he liked my article and could he publish it in his journal, QA-Rivista dell'Associazione Rossi-Doria (whatever that means)? I immediately responded yes, as I had no plans to try to go through the submission-and-revision process. And so the article duly appeared. It has a nice title: Methodology as Ideology.

The funny thing is that I thought it was so cool that the journal wanted to publish my paper. No effort needed on my part! On the other hand, why was I so happy to give them my work for free? I mean, suppose I ran into somebody on the street and said, "I really like your bike--could I have it?" Would I say Yeah, sure? But with intellectual property, I'm so eager to give it away! Sure, the article was already posted on the web, but allowing someone else to publish it is slightly different.

P.S. When undergraduates want to work with me, I just about always say yes. Not that it always works out--often I'll give a project to a student and then never hear back from him or her--but I'll give them the chance.

The score

| 3 Comments

Occasionally I post comments here on other people's books or articles, and sometimes I email the authors to get their feedback. Here's the score:

Responded:

John Clute
Richard Florida
Malcolm Gladwell
Sander Greenland
Daniel Gross
Mickey Kaus
Paul Krugman
Andrew Leonard
John Lott
Jay Nordlinger
Andrew Oswald
Ed Park
Steve Sailer
John Seabrook
Nassim Taleb
Josh Tenenbaum

Did not respond:

Robert Frank
Satoshi Kanazawa
George Packer
Russ Alan Price
David Runciman

I think I've missed a few here (in both categories). Also, some people I'm still waiting to hear from, and some respond but not in a useful way.

P.S. I just noticed: all these people are male (and most are white)! I'll have to diversify a bit!

Malcolm Gladwell recounts the story of Sidney Weinberg, a kid who grew up in the slums of Brooklyn around 1900 and rose to become the head of Goldman Sachs and well-connected rich guy extraordinaire. Gladwell conjectures that Weinberg's success came not in spite of but because of his impoverished background:

Why did [his] strategy work . . . it's hard to escape the conclusion that . . . there are times when being an outsider is precisely what makes you a good insider.

Later, he continues:

It’s one thing to argue that being an outsider can be strategically useful. But Andrew Carnegie went farther. He believed that poverty provided a better preparation for success than wealth did; that, at root, compensating for disadvantage was more useful, developmentally, than capitalizing on advantage.

At some level, there's got to be some truth to this: you learn things from the school of hard knocks that you'll never learn in the Ivy League, and so forth. But . . . there are so many more poor people than rich people out there. Isn't this just a story about a denominator? Here's my hypothesis:


Pr (success | privileged background) >> Pr (success | humble background)

# people with privileged background << # of people with humble background


Multiply these together, and you might find that many extremely successful people have humble backgrounds, but it does not mean that being an outsider is actually an advantage.
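To make the denominator point concrete, here is a minimal sketch in Python. All four numbers are hypothetical, made up purely for illustration (they are not estimates from Gladwell or anyone else), and the function name `share_from_humble` is my own:

```python
def share_from_humble(p_success_privileged, p_success_humble,
                      n_privileged, n_humble):
    """Fraction of all successful people who come from humble backgrounds."""
    # Expected number of successes in each group: rate times group size.
    successes_privileged = p_success_privileged * n_privileged
    successes_humble = p_success_humble * n_humble
    return successes_humble / (successes_privileged + successes_humble)


# Hypothetical inputs: the privileged are 10 times as likely to succeed,
# but the humble group is 50 times as large.
share = share_from_humble(
    p_success_privileged=0.01,
    p_success_humble=0.001,
    n_privileged=1_000_000,
    n_humble=50_000_000,
)
print(f"Share of successes from humble backgrounds: {share:.0%}")
```

Under these made-up numbers, most of the successful people come from humble backgrounds even though each privileged person is ten times as likely to succeed. The share tells you about the denominators, not about which background is the advantage.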

Here's more from Gladwell's article:

John Seabrook writes:

There is also little consensus among researchers about what causes psychopathy. Considerable evidence, including several large-scale studies of twins, points toward a genetic component. Yet psychopaths are more likely to come from neglectful families than from loving, nurturing ones.

I'm confused here. If there's a big genetic component, wouldn't it stand to reason that parents of psychopaths are more likely to be neglectful and less likely to be loving and nurturing? So why the "Yet" in the quote above? Or is there something I'm missing?

P.S. in response to commenters: Yes, I agree that it's possible for psychopathy to be largely genetic without parents of psychopaths being much more likely to be neglectful.

What I didn't understand was Seabrook's implication that this would be surprising: the idea that if (a) a trait is genetically linked, and (b) the same trait can be (somewhat) predicted by parental behavior, then the combination of (a) and (b) should be considered puzzling. By default, I'd think (a) and (b) would go together.

Location, location, location

| 3 Comments

Yes, I'm a nerd. Yes, I'm sitting in a hotel room at my computer typing in data (too early to have anything in downloadable form) and doing scatterplots and regressions. But the hotel room is in Chicago.

Mathematics.

Statistics.

Some differences:

- Tao uses more words. This makes sense: he's busy explaining this stuff to himself as well as to his readers. To a statistician, these ideas are so basic that it's hard for us to really elaborate. (Also, I had a word limit.)

- Tao emphasizes that a confidence interval is not a probability interval. In my experience, confidence intervals are always treated as probability intervals anyway, so I don't spend time with the distinction.

- I emphasize that a poll is a snapshot, not a forecast.

- Tao says that the number of polled voters is fixed in advance. I don't think this is exactly true, what with nonresponse.

- Tao fills his blog entry with Wikipedia links. Wikipedia is ok but I'm not so thrilled with it; I'm happy with people looking things up in it if they want but I won't encourage it.

But we're basically saying the same thing. I like how I put it, but I'm sure a lot of people prefer Tao's style. Luckily there's room on the web for both!

Tyler McCormick, Matt Salganik, and Tian Zheng just wrote this article on using the scale-up method to estimate the size of people's social networks using responses to questions such as "How many people do you know named Kevin?" They build upon earlier work by Bernard, Killworth, McCarty et al. and Zheng, Salganik, and Gelman. This new paper is great; it takes these methods from the "cool" stage to the "useful" stage.

This is funny. It reminds me of when I was asked to help design a study, and I told the researcher I was upset to be involved in the design. Why? Because the #1 thing that statisticians like to say is, "Sorry, the analysis is really difficult because you screwed up the design." So, if you ask me to help with the design, I lose my best alibi!

Blogs as places?

| 11 Comments

Henry Farrell referred here to his blog as a "place." Which seemed funny to me because I think of a blog as a "thing." Henry replied:

That's the way that I [Henry] think about blogs (or at least group blogs and blogs with comments) - places where people meet up, chat, form communities, drift away from each other etc.

My analogy was blog-as-newspaper, the self-publishing idea, and I'm not used to thinking of a newspaper, or even a listserv, as a place. I think there is an aspect of the analogy that I'm still missing.

P.S. See Mark Liberman's thoughts in his blog here.

Laura Wattenberg writes, "in baby naming as in so many parts of life, style, not values, is the guiding light."

Tyler Cowen discusses the possibility of economics prodigies. I refer him and his commenters to Dick De Veaux's saying, "Math is like music, statistics is like literature." You can decide yourself where economics is or should stand in this spectrum. I will say, though, that it can take decades to develop a good idea, just because you can be busy doing other things.

They pay you for not working

| 5 Comments

A few months ago I noticed on my friend Seth's website that he was an "emeritus professor." I called him up, first thinking it was a mistake--he's well under sixty years old--but, no, he really is retired. He taught at Berkeley for 30 years. We had the following exchange on the phone:

Me: Why retire? As a professor, they pay you even if you don't work.

Seth: They pay you for not working if you're retired, too.

He's got a point.

See here for Jeremy's comments to my comments. I agree with what he writes. The whole discussion reminds me of a comment made to me once by a statistician who generally works with engineers. He said that when he talks with people about statistical procedures, engineers focus on the algorithm being applied to the data, whereas statisticians are always thinking about the psychology of the person doing the analysis.

I encountered this. It writes, of our blog and Econlog:

Comments require pre-approval. Here's where the bullshit really starts. I only understand this if there's a clear policy that it's to reduce liability and to prevent posting of illegal, defamatory, or commercially exploitive materials. I don't see such an explicit policy with these blogs. So the feeling is that any comment posted has to also meet the threshhold in being something the blog owner is comfortable with. Yuck.

Actually, we require pre-approval because we sometimes get tons of spam. Going in and approving comments once a day (or more frequently if I feel like having more distractions) seemed better than going in and deleting spam once a day. But, I don't know, maybe this other approach is better. There was a time when we were getting dozens of spam comments per day, but now we only get a couple per day that slip through the filter.

Am I a boy or a girl?

| 5 Comments

Mike Kruger pointed me to this site which estimates the probability you're female and the probability you're male based on your browsing history. It estimates the probability that I'm male as 66%. I don't really know what these "probabilities" are, though. They're between 0 and 1, but I doubt they're calibrated to give a direct probability interpretation. (For example, if you took everybody whose claimed probabilities were 66%, would 66% of these people actually be male?)
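The calibration check described in that parenthetical can be sketched in a few lines of Python. This assumes you somehow had the site's claimed probabilities along with people's actual sex, which I don't; the function name `calibration_table` and the simulated data are hypothetical, for illustration only:

```python
import random


def calibration_table(probs, is_male, n_bins=10):
    """Group people by claimed Pr(male); compare to the observed fraction male.

    Returns one (mean claimed probability, observed fraction male, count)
    tuple per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, is_male):
        # A claimed probability of exactly 1.0 goes in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_male = sum(y for _, y in members) / len(members)
        rows.append((mean_p, frac_male, len(members)))
    return rows


if __name__ == "__main__":
    # Simulated data that is calibrated by construction:
    # each person is male with exactly the claimed probability.
    rng = random.Random(0)
    claimed = [rng.random() for _ in range(10_000)]
    actual = [1 if rng.random() < p else 0 for p in claimed]
    for mean_p, frac_male, n in calibration_table(claimed, actual):
        print(f"claimed {mean_p:.2f}  observed {frac_male:.2f}  n={n}")
```

If the claimed probabilities were calibrated, the two columns would roughly agree in every bin. Scores that merely fall between 0 and 1 carry no such guarantee.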

And here are Mike's comments. He calls the method "Bayesian" but I'm not so sure.

In an article on U.S. foreign policy and domestic politics, Samantha Power writes:

Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm.

So here's the difference between qualitative and quantitative researchers. Samantha Power knows more about foreign policy and politics than I'll ever know. But she could whip off the above sentence without pause. Whereas, when I see it, I think:

- Why start in 1968? Is this just a convenient choice of endpoint? Eisenhower ran as a national security expert, no?
- What evidence can you expect to get about public opinion from the essentially tied elections of 1968, 1976, and 2000?
- Anyway, if you're talking public opinion, it was Gore who won more votes in 2000--so it's funny to be taking that as an exception at all!
- How are "perceived danger" and "relative calm" defined? Was 1988, when George H. W. Bush floored Michael Dukakis, really such a time of "perceived danger"?

I have no expertise to comment on the rest of Power's article; I just think it's funny that she'd throw in a sentence like that. It's just a throwaway comment she made; I wouldn't put it in the class of David Runciman's "but viewed in retrospect, it is clear that it has been quite predictable" or John Yoo writing an entire op-ed on something he appears to know nothing about. It's just one of these things that rings alarm bells to a "quant" such as myself but just passes right by the qualitative analyst.

P.S. On an unrelated note, that same issue of the New York Review of Books had this great line by Michael Dirda: "Real readers always read for excitement; only the nature of that excitement changes through life."

See here.

Alone in the car at night

| 3 Comments

I drove a car for 30 miles yesterday. I hadn't driven a car so far without passengers in over 15 years, and boy did it feel weird. All these cars on the road with people sitting perfectly still holding their steering wheels and having to remember not to go off the road. Driving, I feel a bizarre mixture of complete control and no control at all.

driving.jpg

Integrate this, pal

| 3 Comments

socparticles.png

I copied this image over here, certain that I'd be able to add a witty remark of my own, but I give up.

In the Playroom today, I came across a book called "A Design for Scholarship," a collection of speeches from 1935-1936 by Isaiah Bowman, president of Johns Hopkins University. Flipping through, I came across this quote:

If you wish to live in bovine contentment, the University is no place for you.

Things sure have changed, huh?

The Graduate Junction

| 2 Comments

Esther Dingley sent an email about this site which is intended to help graduate students share research ideas. I'm not sure where it falls in the spectrum from Facebook to Wikipedia, but perhaps it will be useful. Looking up some of my own research interests, I found nothing for "statistics" or "political science," but there was a group for "social networks."

Our publisher is putting together our new book (no, not Red State, Blue State, I'm talking about our next book, A Quantitative Tour of the Social Sciences), and we need a cover design. Now. Any ideas? Free book to the person with the best idea. And anybody with a particularly good idea, I'll take to lunch. (Or maybe Jeronimo, my coeditor, will take you to lunch if you're in Houston...)

Some background: The book has sections on history, economics, sociology, political science, and psychology, and each section has a different author (or set of authors). It's not a statistics book; rather, it's a set of discussions and case studies, giving the reader (most likely a student of one of the social sciences) a sense of how to think like a historian, economist, sociologist, etc. It's based on a course I created for our Quantitative Methods in Social Science program at Columbia. Anyway, there will be plenty of time for book promotion later; now, I'm just trying to give you enough information to come up with a good cover design for us.

Here's the table of contents:

Citation statistics

| 7 Comments

Juli sent me this article by Robert Adler, John Ewing, and Peter Taylor arguing that journal "impact factors" are overrated. I'm sympathetic to this argument because my articles are typically published in low-impact-factor journals (at least in comparison with psychology). I also like the article because it has lots of graphs, no tables. Here's a graph for ya:

impact.png

Adler et al. also criticize the so-called h-index and its kin; as they write, "These are often breathtakingly naive attempts to capture a complex citation record with a single number. Indeed, the primary advantage of these new indices over simple histograms of citation counts is that the indices discard almost all the detail of citation records, and this makes it possible to rank any two scientists. . . . Unfortunately, having a single number to rank each scientist is a seductive notion – one that may spread more broadly to a public that often misunderstands the proper use of statistical reasoning in far simpler settings."

Here's an amusing story. I can understand why the guy computed his test from scratch, but I agree with Dan that the two-page appendix is kind of over the top.

Really, we were just making our own independent decision. Just another example of unexplained group-level variance.

After I remarked here on the notorious rudeness of one of the frequent posters to the R listserv, several commenters agreed with me (for example, Richard Morey wrote, "I've found that the R-help forums are legendary for the rude poster(s)"), but Nick Cox writes,

problems of clashing styles and expectations are generic to all technical lists that I've ever heard of that are not selective about membership. . . . The problem is a political question, not a technical one. The question is what to do when people, for whatever reason, do not follow the standards laid down for proper behaviour in a group, a discussion list, and one that they willingly join. . . . The solution being complained about is that some people -- usually "senior" people on a list with recognisable expertise -- are very firm in reminding posters of poor questions about the need to be much more precise about what their difficulty is, to read the documentation, etc. As this advice is very much part of the guidelines that people are asked to follow, it seems disingenuous, if not hypocritical, to complain when those people are trying their level best to maintain the standards of the list, exactly as advertised.

Cox's comments are interesting--and they suggest that when I and others think the R posters are particularly rude, we just don't have much experience with large listservs--but I actually want to take this in a different direction.

In my previous entry, I wrote, "I think it's ok for you to just post your questions and ignore that one [rude] person if he replies to you." Personally, I find rudeness from strangers unpleasant, even on a listserv, but I recognize that's just the way things are, and that's why I advise people to post to the list anyway.

Why is rudeness so upsetting, and how does it relate to altruism?

The more interesting question, perhaps, is why does this sort of rudeness hurt so much? Even though, as a logical matter, we should be able to ignore this rudeness, it actually hurts enough to dissuade people from posting.

The other aspect of listserv rudeness that intrigues me is that posting answers to a listserv is basically an act of altruism. People don't get paid to do it, they don't get academic credit, and in fact if they're rude they don't even get a lot of respect for it (at least, not as much respect as they might deserve, given their contributions). So, it's not enough of an answer to describe rude posters as assholes--if they were real assholes, they wouldn't be posting at all, right? They're sort of like those legendary caring-but-firm teachers who put a huge amount of effort into helping each individual student, but show it in this crusty, sarcastic, tough-guy fashion. I guess I can see, following Cox's argument above, that rude posters are serving the greater good by hurting people's feelings. I just think it's impressive that people would be so altruistic as to do this.

P.S. Posting on a blog is similar but less altruistic. For example, yes, I answer people's questions here but I also get a chance to promote my own work, which I don't see being done so much on the R listserv.

P.P.S. Given all the comments below, I fear I wasn't being clear enough in my own views here. I am being serious above, not sarcastic. Although certain listserv posters can be abrasive and even rude, I really do feel they are altruistic in providing all this free help on the list. Stylistic issues aside, I think that people who give help in this way are performing a valuable service, and I appreciate this, both for myself (on the occasions that I use the list) and on behalf of students and others whom I refer to the list.

Sam Roberts writes,

In 1984, according to the Social Security Administration, nearly 3.4 million Smiths lived in the United States. In 1990, the census counted 2.5 million. By 2000, the Smith population had declined to fewer than 2.4 million.

Where did all the Smiths go from 1984 to 1990? I can believe it flatlined after 1990, but it's hard to believe that the count could have changed so much in 6 years.

Perhaps it's the difference between the SSA and Census methods of counting?
