Recently in Miscellaneous Science Category

The other day I commented on an article by Peter Bancel and Roger Nelson that reported evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I was pretty dismissive of the article; in fact elsewhere I gave my post the title, "Some ESP-bashing red meat for you ScienceBlogs readers out there."

Dr. Bancel was pointed to my blog and felt I wasn't giving the full story. I'll give his comments and then at the end add some thoughts of my own. Bancel wrote:

Finding signal from noise

| 15 Comments

A reporter contacted me to ask my impression of this article by Peter Bancel and Roger Nelson, which reports evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I spent a few minutes looking at the article, and, well, it's about what you might expect. Very professionally done, close to zero connection between their data and whatever they actually think they're studying.

I received the following email:

Hello, my name is Lauren Schmidt, and I recently graduated from the Brain & Cognitive Sciences graduate program at MIT, where I spent a lot of time doing online research using human subjects. I also spent a lot of time being frustrated with the limitations of various existing online research tools. So now I am co-founding a start-up, HeadLamp Research, with the goal of making online experimental design and data collection as fast, easy, powerful, and painless as can be. But we need your help to come up with an online research tool that is as useful as possible!

We have a short survey (5-10 min) on your research practices and needs, and we would really appreciate your input if you are interested in online data collection.

I imagine they're planning to make money off this start-up and so I think it would be only fair if they pay their survey participants. Perhaps they can give them a share of the profits, if any exist?

Bill Browne sends in this interesting job possibility. Closing date for applications is 30 Oct 2009, so if you're interested, let him know right away!

Rorschach's on the loose

| 9 Comments

According to Josh Millet, the notorious Rorschach inkblots have been posted on the web, leading to much teeth-gnashing among psychologists, who worry that they can't use the test anymore now that civilians can get their hands on the images ahead of time.

For example, here's a hint for Card IV (see below): "The human or animal content seen in the card is almost invariably classified as male rather than female, and the qualities expressed by the subject may indicate attitudes toward men and authority."

160px-Rorschach_blot_04.jpg

So, if they show you this one on a pre-employment test, better play it safe and say that the big figure looks trustworthy and that you'd never, ever steal paperclips from it.

Oh, and when Card II comes up, maybe you should just play it safe and not mention blood at all.

More general concerns

I'm not particularly worried about the Rorschach test since it's pretty much a joke--you can read into it whatever you want--but, as Millet points out, similar issues would arise, for example, if someone stole a bunch of SAT questions and posted them. It would compromise the test's integrity. Millet points out that this problem could be solved if you were to release thousands and thousands of potential SAT questions: nobody could memorize all of these, so it would be easier to just learn the material.

I've had the plan for many years to do this for introductory statistics classes: to have, say, 200 questions for the final exam, give out the questions to all the students, and explain ahead of time that the actual exam will be a stratified sample from the list. This would encourage students to study the material but not in a way that they could usefully "game the system." I haven't done this yet--it's a lot of work!--but I'm still planning to do so.
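Here is a minimal sketch of how the exam draw might work. The topic names, the even split of 200 questions across topics, and the choice of four questions per topic are all made up for illustration; the only idea taken from the plan above is "publish the full list, then draw a stratified sample."

```python
import random
from collections import defaultdict

def draw_exam(question_bank, per_topic, seed=None):
    """Draw a stratified sample: a fixed number of questions from each topic."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for question_id, topic in question_bank:
        by_topic[topic].append(question_id)
    exam = []
    for topic in sorted(by_topic):
        # Sample without replacement within each stratum (topic).
        exam.extend(rng.sample(by_topic[topic], per_topic[topic]))
    return exam

# Hypothetical bank: 200 published questions spread evenly over 5 invented topics.
topics = ["probability", "sampling", "regression", "inference", "graphics"]
bank = [(i, topics[i % len(topics)]) for i in range(200)]
per_topic = {t: 4 for t in topics}

exam = draw_exam(bank, per_topic, seed=2009)
print(len(exam), "questions drawn")
```

Because every topic is guaranteed its share of questions, a student can't skip a whole section of the course and hope the random draw misses it.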

This is all standard physics. Consider the two-slit experiment--a light beam, two slits, and a screen--with y being the place on the screen that lights up. For simplicity, think of the screen as one-dimensional. So y is a continuous random variable.

Consider four experiments:

1. Slit 1 is open, slit 2 is closed. Shine light through the slit and observe where the screen lights up. Or shoot photons through one at a time, it doesn't matter. Either way you get a distribution, which we can call p1(y).

2. Slit 1 is closed, slit 2 is open. Same thing. Now we get p2(y).

3. Both slits are open. Now we get p3(y).

4. Now run experiment 3 with detectors at the slits. You'll find out which slit each photon goes through. Call the slit x. So x is a discrete random variable taking on two possible values, 1 or 2. Assuming the experiment has been set up symmetrically, you'll find that Pr(x=1) = Pr(x=2) = 1/2.

You can also record y, thus you can get p4(y), and you can also observe the conditional distributions, p4(y|x=1) and p4(y|x=2). You'll find that p4(y|x=1) = p1(y) and p4(y|x=2) = p2(y). You'll also find that p4(y) = (1/2) p1(y) + (1/2) p2(y). So far, so good.

The problem is that p4 is not the same as p3. Heisenberg's uncertainty principle: putting detectors at the slits changes the distribution of the hits on the screen.

This violates the laws of conditional probability, in which you have random variables x and y, and in which p(x|y) is the distribution of x if you observe y, p(y|x) is the distribution of y if you observe x, and so forth.
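To see the gap between p3 and p4 numerically, here is a toy calculation. It is only a sketch: the Gaussian envelopes, the phase factor, and the values of offset, width, and kappa are invented for illustration and are not a model of any real apparatus. The one physical ingredient is that with both slits open you add complex amplitudes before squaring, whereas with detectors at the slits you mix the two single-slit distributions.

```python
import cmath
import math

def normalize(weights):
    """Scale nonnegative weights so they sum to 1 over the grid."""
    total = sum(weights)
    return [w / total for w in weights]

# Grid of screen positions y (one-dimensional screen, arbitrary units).
ys = [-10 + 20 * i / 400 for i in range(401)]

# Hypothetical parameters: slit offset, beam width, and phase gradient.
offset, width, kappa = 1.0, 3.0, 2.0

def amplitude(y, sign):
    """Toy complex amplitude at y for the slit at +offset (sign=+1) or -offset (sign=-1)."""
    envelope = math.exp(-((y - sign * offset) ** 2) / (4 * width ** 2))
    return envelope * cmath.exp(1j * sign * kappa * y)

psi1 = [amplitude(y, +1) for y in ys]
psi2 = [amplitude(y, -1) for y in ys]

p1 = normalize([abs(a) ** 2 for a in psi1])                     # experiment 1
p2 = normalize([abs(a) ** 2 for a in psi2])                     # experiment 2
p3 = normalize([abs(a + b) ** 2 for a, b in zip(psi1, psi2)])   # both slits open
p4 = [0.5 * a + 0.5 * b for a, b in zip(p1, p2)]                # detectors at slits

gap = max(abs(a - b) for a, b in zip(p3, p4))
print(f"largest pointwise gap between p3 and p4: {gap:.4f}")
```

In this toy setup p4 is a smooth bump while p3 has interference fringes, so the two distributions of y cannot both be marginals of one joint distribution of (x, y) in the way the mixture formula requires.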

A dissenting argument (that doesn't convince me)

To complicate matters, Bill Jefferys writes:

As to the two slit experiment, it all depends on how you look at it. Leslie Ballentine wrote an article a number of years ago in The American Journal of Physics, in which he showed that conditional probability can indeed be used to analyze the two slit experiment. You just have to do it the right way.

I looked at the Ballentine article and I'm not convinced. Basically he's saying that the reasoning above isn't a correct application of probability theory because you should really be conditioning on all information, which in this case includes the fact that you measured or did not measure a slit. I don't buy this argument. If the probability distribution changes when you condition on a measurement, this doesn't really seem to be classical "Boltzmannian" probability to me.

In standard probability theory, the whole idea of conditioning is that you have a single joint distribution sitting out there--possibly there are parts that are unobserved or even unobservable (as in much of psychometrics)--but you can treat it as a fixed object that you can observe through conditioning (the six blind men and the elephant). Once you abandon the idea of a single joint distribution, I think you've moved beyond conditional probability as we usually know it.

And so I think I'm justified in pointing out that the laws of conditional probability are false. This is not a new point with me--I learned it in college, and obviously the ideas go back to the founders of quantum mechanics. But not everyone in statistics knows about this example, so I thought it would be useful to lay it out.

What I don't know is whether there are any practical uses to this idea in statistics, outside of quantum physics. For example, would it make sense to use "two-slit-type" models in psychometrics, to capture the idea that asking one question affects the response to others? I just don't know.

Lee Sigelman points to this article by physicist Rick Trebino describing his struggles to publish a correction in a peer-reviewed journal. It's pretty frustrating, and by the end of it--hell, by the first third of it--I share Trebino's frustration. It would be better, though, if he'd link to his comment and the original article that inspired it. Otherwise, how can we judge his story? Somehow, by the way that it's written, I'm inclined to side with Trebino, but maybe that's not fair--after all, I'm only hearing half of the story.

Anyway, reading Trebino's entertaining rant (and I mean "rant" in a good way, of course) reminded me of my own three stories on this topic. Rest assured, none of them are as horrible as Trebino's.

I received this question in the mail:

Your Biometrics article, Multiple imputation for model checking: completed-data plots with missing and latent data, suggests diagnostics when the missing values of a dataset are filled in by multiple imputation. But suppose we have two equivalent files--File A with variable y left-censored at known threshold and File B with y fully observed. We draw multiple imputations of censored y in File A. (1) Can we validate our imputation model by setting y in File B as left-censored according to the inclusion indicator from A, performing multiple imputation of these "censored" data, and comparing imputed to observed values? (2) In particular, what diagnostic measure(s) would tell us whether the imputed and observed values fit closely enough to validate our imputation model?

My reply: I'm a little confused: if you already have File B, what do you need File A for? Do the two files have different data, or are you just using this to validate your imputation model? If the latter, then, yes, you can see whether the observations in File B are consistent with the predictive distributions obtained from your multiple imputations on File A. You wouldn't expect the imputations to be perfect, but you'd like the imputed 50% intervals to have approximate 50% coverage, you'd like the average values of the true data to equal the predictions from the imputations, on average, and conditional on any information in the observed data in File A. (But the imputations don't have to--and, in general, shouldn't--be correct on average, conditional on the hidden true values.)
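For what it's worth, the coverage check could be sketched as follows. Everything here is simulated: the threshold, the normal model, and the number of imputations are assumptions I made up so the example runs, and in practice true_values would come from File B and imputations from your imputation model applied to File A.

```python
import random
import statistics

def coverage_of_50pct_intervals(true_values, imputations):
    """Fraction of true values falling inside the central 50% interval of their imputations."""
    hits = 0
    for truth, draws in zip(true_values, imputations):
        q1, _, q3 = statistics.quantiles(draws, n=4)  # 25th, 50th, 75th percentiles
        hits += q1 <= truth <= q3
    return hits / len(true_values)

rng = random.Random(0)
threshold = -0.5  # hypothetical known left-censoring point

def draw_below_threshold():
    """Rejection-sample a standard normal value below the censoring threshold."""
    while True:
        y = rng.gauss(0, 1)
        if y < threshold:
            return y

# Simulated stand-ins: hidden true values and 100 imputations per censored case,
# both drawn from the same distribution (i.e., a correctly specified imputation model).
true_values = [draw_below_threshold() for _ in range(500)]
imputations = [[draw_below_threshold() for _ in range(100)] for _ in true_values]

coverage = coverage_of_50pct_intervals(true_values, imputations)
print(f"coverage of 50% intervals: {coverage:.2f}")
```

With a well-calibrated imputation model the printed coverage should land near one half; a value far from that would be a warning sign.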

You may also be interested in my 2004 article, Exploratory data analysis for complex models, which actually has an example on death-penalty sentencing, with censored data.

Truth in Data

| 1 Comment

David Blei is teaching this cool new course at Princeton in the fall. I'll give the description and then my thoughts.

This is just sad

| 7 Comments

Daniel Lakeland writes:

You may be astounded that people are still reporting 26% more probability to have daughters than sons, and then extrapolating this to decide that evolution is strongly favoring beautiful women... Or considering the degree of innumeracy in the population perhaps you wouldn't be astounded.... in any case... they are still reporting such things.

If anyone out there happens to know Jonathan Leake, the reporter who wrote this story for the (London) Sunday Times, perhaps you could send him a copy of our recent article in the American Scientist. Or, if he'd like more technical details, this article from the Journal of Theoretical Biology?

Thank you. I have nothing more to say at this time.

I just read Charles Seife's excellent book, "Sun in a bottle: The strange history of fusion and the science of wishful thinking." One thing I found charming about the book was that it lumped crackpot cold fusion, nutty plans to use H-bombs to carve out artificial harbors in Alaska, and mainstream tokamaks into the same category: wildly-hyped but unsuccessful promises to change the world. The "wishful thinking" framing seems to fit all these stories pretty well, much better than the usual distinction between the good science of big-budget lasers and tokamaks and the bad science of cold fusion and the like. The physics explanations were good also.

The only part I really disagreed with: on page 220, Seife writes, "Science is little more than a method of tearing away notions that are not supported by cold, hard data." I disagree. Just for a few examples from physics, how about Einstein's papers on Brownian motion and the photoelectric effect? And what about lots of biology, chemistry, and solid state physics, figuring out the structures of crystals and semiconductors and protein folding and all that? Sure, all of this work involves some "tearing away" of earlier models, but much of it--often the most important part--is constructive, building a model--a story--that makes sense and backing it up with data.

After finding the Howard Wainer interview, I looked up the entire series of Profiles in Research published by the Journal of Educational and Behavioral Statistics. I don't have much to say about most of these interviews: some of these people I'd never heard of, and I don't really have much research overlap with the others. Probably I have the most overlap with R. D. Bock, who's done a lot of work on multilevel modeling, but, for whatever reason, his stories didn't grab my interest.

But I was curious about the interview with Arthur Jensen. I've never met him--he gave a talk at the Berkeley statistics department once when I was there, but for some reason I wasn't able to attend the talk. But I've heard of him. As the interviewers (Daniel Robinson and Howard Wainer) state:

The roach-bombing puzzle

| 8 Comments

I've been assured, and I believe, that the effective way to get rid of the roaches in your apartment is to clean the place, put poison in the cracks, and then seal them. Some people do that. But a lot of people go for the "bombing" approach: the exterminator comes to the building once a month, drops the bomb, leaves, and comes back the next month.

My question is: what are these people thinking?? Why do these people willingly get bombed once a month instead of following the simpler and effective approach? Part of this is ignorance, surely, but I think there's more to it than that, some underlying psychological appeal. I don't think it's just ignorance because, when I talk with people who get bombed and discuss the "clean, poison, and seal" approach, I've found them to be very resistant and (I would say) "defensive." They seem to want to believe that bombing is effective and really don't want to hear about alternative strategies.

What's going on? I have some theories. Maybe bombing seems like less effort than cleaning the food out of your closet and sealing the cracks. Also it seems sort of decisive. On the other hand, shouldn't people pause a little when they think about needing the exterminator every month? Yet, that doesn't seem to bother people. Conceptually, getting the exterminator to bomb your apartment feels to me a bit like "taking a pill." Maybe there's some technological appeal. Sort of like the way that photovoltaics are sexy in a way that passive solar isn't.

I don't know. I'll have to ask some psychologists of my acquaintance who work on environmental decision making.

I want to explore the distinction between self-experimentation and formal experimentation in the context of a recent discussion on Seth's blog.

The story begins with two people who found, via self-experimentation, how to make their acne go away:

A student . . . had gone on a camping trip and found that her acne went away. At first she thought it was the sunshine; but then, by self-experimentation, she discovered that the crucial change was that she had stopped using soap to wash her face.
A friend of Seth writes: "I started "washing" my face with water about a month ago, and [now] my face is acne free and soft as a pair of brand new UGG boots. [He had had acne for years.]"

In the comments section, someone writes:

While it would be nice to think that all we have to do to get rid of acne is stop using those expensive cleanser and just use water - this is just anecdotal evidence you present. It would require a large clinical trial to be conclusive.

Seth replies that informal experimentation is cheaper and faster than more formal clinical trials. Also, different things might work for different people, so whether or not a treatment has been evaluated in a large study, it might make sense to test it yourself--especially for something such as acne or weight loss that is not an urgent concern.

This got me thinking . . . what are the benefits (if any) of a formal controlled trial? In statistics, we usually frame these benefits by comparing to observational studies. The big risk in an observational study is that the treatment and control groups will differ in important ways (as in the famous hormone replacement therapy story). Is this worth the cost? Maybe. Sometimes.

A related issue is bias, a word which I am using in the conversational rather than the statistical sense. For example, how would you want to evaluate the risks and effectiveness of a new drug that was developed by a pharmaceutical company at the cost of millions of dollars? I'd be suspicious of an observational study: even if conducted by professionals, there just seem to be too many ways for things to be biased.

In Seth's acne example, there is no financial source of bias. And, as Seth points out, the test is free to apply on yourself. If I had a kid with acne, I'd give it a try and do an experiment--which means trying the soap and no-soap conditions on different days (or different weeks, or months) and measuring and recording acne levels. One thing I've gathered from Seth's work is that there are big benefits to be gained by doing self-experimentation with careful measurement and record keeping, rather than simply trying different things and trying to remember what works.

On the other hand, yeah, I'm skeptical about Seth's acne claims, and I think a larger study would be more likely to convince me. But I don't think it would have to be expensive. All Seth (or somebody) needs is to set up a protocol for deciding when to wash with soap or water and a protocol for measuring acne, then he could get a bunch of volunteers to flip coins and try it. This blog has a few thousand readers, and Seth's diet forum has thousands of participants, so it shouldn't be so hard to find people to do this. I'm not so interested in acne myself, but according to Seth (and others, I assume), "acne really matters," so maybe it's worth giving this a try.
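A coin-flip protocol of this sort is simple enough to sketch. The volunteer labels, the acne scores, and the sample size below are placeholders I invented; the scores are random noise with no built-in effect, just so the code runs end to end.

```python
import random
import statistics

def assign_by_coin_flip(volunteers, rng):
    """Each volunteer flips a fair coin to decide which condition to follow."""
    return {v: rng.choice(["soap", "water only"]) for v in volunteers}

def difference_in_means(scores, assignment):
    """Mean acne score under water-only minus mean score under soap."""
    soap = [scores[v] for v in scores if assignment[v] == "soap"]
    water = [scores[v] for v in scores if assignment[v] == "water only"]
    return statistics.mean(water) - statistics.mean(soap)

rng = random.Random(1)
volunteers = [f"volunteer_{i}" for i in range(200)]  # hypothetical participants
assignment = assign_by_coin_flip(volunteers, rng)

# Placeholder acne scores drawn with no built-in treatment effect.
scores = {v: rng.gauss(10, 3) for v in volunteers}

diff = difference_in_means(scores, assignment)
print(f"estimated effect of dropping soap: {diff:.2f}")
```

The point of the coin is that the volunteers don't get to choose their condition, so the two groups should differ only by chance and by the treatment itself.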

Mandelbrot on taxonomy

| 3 Comments

Taxonomies are fractal with, at any node, some number of branches (typically one or two major branches and several minor ones). Here's a fascinating article by Benoit Mandelbrot from 1955 on models of taxonomic structures. Great stuff. The article was published in Information Theory--3rd London Symposium, ed. Colin Cherry, and is hard to find online. At least it was until now.

mandelbrot2.png

The only thing that puzzles me about this article (sent to me by Chris Wiggins) is that at first it's presented as new: "The trend is buried deep in United States census data . . ." A couple paragraphs down, the article explains that these patterns were published last year by Lena Edlund and Doug Almond (who presented the results in our quantitative political science seminar). In any case, it's an excellent news article and discusses the issues well. The only thing I'd like to see is some sample sizes, so that students who are given this article to read can compute the standard errors on their own.

Also, I have a couple problems with their graph. First, I'm not a fan of expressing sex ratios as #boys per 100 girls. To me, it's clearer just to give %girls (or %boys) as a straight number: 48.8% or whatever. Second, it's a mistake to make these as bar graphs starting at zero. Here, zero is not a reasonable baseline: it's not like you're really expecting to see zero girl births. I appreciate that they were trying to make a pretty graph, but in this case I'd go with a simple dot plot with +/- 1 standard error bars on the points. Or, better still, a line plot with time on the x-axis (one point for each decade), lines connecting the dots for each ethnic group, and vertical lines indicating standard errors.

Line plots are the best, and it's great when you can put time on the x-axis.
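The conversion from "boys per 100 girls" to %girls, and the standard error a student would compute once given a sample size, can be sketched like this. The ratio of 105 and the sample size of 10,000 births are illustrative numbers of mine, not figures from the article.

```python
import math

def percent_girls_from_ratio(boys_per_100_girls):
    """Convert a sex ratio (boys per 100 girls) to the percentage of births that are girls."""
    return 100 * 100 / (100 + boys_per_100_girls)

def standard_error_pct(pct, n_births):
    """Binomial standard error of a percentage, in percentage points."""
    p = pct / 100
    return 100 * math.sqrt(p * (1 - p) / n_births)

pct = percent_girls_from_ratio(105)    # illustrative ratio: 105 boys per 100 girls
se = standard_error_pct(pct, 10_000)   # hypothetical sample size
print(f"{pct:.1f}% girls, +/- {se:.1f} percentage points")
```

A ratio of 105 boys per 100 girls comes out to about 48.8% girls, and with 10,000 births the standard error is about half a percentage point, which gives a sense of how big a sample you need before differences between groups mean much.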

Something I learned today

| 1 Comment

You can recognize celery in an x-ray scanner.

I was biking down the street and had to discard a banana peel. I approached a trash can and had to consciously decide to throw it at a point that seemed "too soon"--I was still several feet behind the receptacle. It went in.

What mystifies me is that the action was so unnatural. It really really felt like I should be throwing the banana peel just when I was going past the trash can. It was only my memories of physics class (and, I suppose, years of experience tossing things from a moving bike) that let me know when to release it--and, even then, it felt wrong.

I know that psychologists have done research on "folk physics"--the wrong intuitions we all have about heavy objects falling farther, ignorance of the action-and-reaction principle, and all the rest. But this thing with the banana peel is simple kinematics. You can't get much more basic than that. Yet my intuition remains out of whack. What's the deal?

Jimmy found this amusing.

After I posted again on the dentists named Dennis, commenter Donovan wrote:

The base rate given for the names Dennis, Jerry & Walter doesn't pan out when you review the NPI Registry file maintained by the Centers for Medicare & Medicaid Services (CMS). The NPI Registry lists every health care provider in the US who bills for services. The frequency distribution of these three names in the NPI file is:
Dennis 4,442 47.42%
Jerry 2,423 25.87%
Walter 2,502 26.71%

If you run the same frequency distribution where the primary taxonomy is either 122300000X (generic code for dentist) or 1223G0001X (general practice dentist), here is what you get:
Dennis 556 48.06%
Jerry 291 25.15%
Walter 310 26.79%

So there is a tiny difference, but not impressively so. I [Donovan] declare the Dennis dentist myth busted!
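As a rough check on "tiny," here is a sketch that recomputes the percentages from the counts above and compares the share of Dennises among dentists with the share among the remaining providers. It assumes the dentist counts are a subset of the all-provider counts, which is how I read the comment; the two-proportion z-score is my own choice of summary, not something from the original comment.

```python
import math

# Counts as reported above from the NPI Registry.
all_providers = {"Dennis": 4442, "Jerry": 2423, "Walter": 2502}
dentists = {"Dennis": 556, "Jerry": 291, "Walter": 310}

def shares(counts):
    """Percentage of each name among the three names combined."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

# Assumption: dentists are included in the all-provider counts, so subtract them out.
others = {name: all_providers[name] - dentists[name] for name in all_providers}

n_dent, n_other = sum(dentists.values()), sum(others.values())
p_dent = dentists["Dennis"] / n_dent
p_other = others["Dennis"] / n_other
p_pool = all_providers["Dennis"] / sum(all_providers.values())

# Two-proportion z-score for "share named Dennis": dentists vs. other providers.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_dent + 1 / n_other))
z = (p_dent - p_other) / se

print({name: round(pct, 2) for name, pct in shares(dentists).items()})
print(f"z = {z:.2f}")
```

The z-score comes out well under 1, so within this particular file the Dennis share among dentists is statistically indistinguishable from the share among other providers.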

I sent this to Brett Pelham, the author of the original study on names and life choices (Why Susie Sells Seashells by the Seashore: Implicit Egotism and Major Life Decisions). His response:

After a quick read of that comment, I [Pelham] am not sure I understand the critique. Is this person saying that the percentages for the three names for dentists are very similar to the percentages for all health care providers? If so, I'd suggest that it's at least possible that this is because doctor also starts with D. At any rate, I do agree that the evidence we have for careers is methodologically the weakest of all the evidence we have gotten over the years, and it's easy to generate alternate explanations for some of the results. I think we've gotten much stronger results for marriages. Also, since we published that first paper in 2002, we've done quite a few lab experiments that document the effect quite clearly devoid of any conceivable confounds. For example, people like a woman more than usual if she is wearing a jersey whose number was paired subliminally (below conscious threshold) with their own name in a 70 second conditioning procedure. Jerry Burger et al. (I think) have also done quite a few experiments that show that you're more likely to help people whose (fake) first names are the same as your own.

I also reviewed a paper last month that used a much bigger data base than we found to look at doctors and lawyers. The paper showed (and I checked some of the data myself) that lawyers are more likely than doctors to have the surname "Lawyer" whereas doctors are more likely than lawyers to have the exact surname "Doctor." I didn't believe the names could be frequent enough to yield the effect until I repeated the part of the search myself that I could do for free.

Our new book!

| 6 Comments

A Quantitative Tour of the Social Sciences has just come out. The book is edited by Jeronimo Cortina and myself, and it is intended to give the reader a sense of how research is done in different areas of social science. It is not a book of statistical methods, nor is it that sort of academic book that has a zillion little chapters of things that people submitted because they couldn't get them accepted into journals. Rather, it is a set of in-depth examples and discussions of social science research from a variety of perspectives.

I think the book should be especially useful for courses for graduate students or advanced undergraduates in social science, who typically aren't familiar with the way people think in neighboring fields. For example, a political science student might know a little bit about economics but nothing about psychology. Or a sociology student might not know much about historical data collection. And so forth.

Here's the table of contents:

I. Models and Methods in the Social Sciences (Andrew Gelman)
1. Introduction and overview
2. What's in a number? Definitions of fairness and political representation
3. The allure and limitations of mathematical modeling: Game theory and trench warfare

II. History (Herbert Klein and Charles Stockley)
1. Historical background of quantitative social science
2. Sources of historical data
3. Historical perspectives on international exchange rates
4. Historical data and demography in Europe and the Americas

III. Economics (Richard Clarida and Marta Noguer)
1. Learning from economic data
2. Econometric forecasting and the flow of information
3. Two studies of interest rates and monetary policy

IV. Sociology (Seymour Spilerman and Emanuele Gerratana)
1. Models and theories in sociology
2. Demographic explanations of social disturbances in the 1960s
3. Studying the time series of lynchings in the South
4. Attainment processes in a large organization

V. Political Science (Charles Cameron)
1. What is political science?
2. The politics of Supreme Court nominations: the critical role of the media environment
3. Modeling strategy in congressional hearings

VI. Psychology (E. Tory Higgins, Elke Weber, and Heidi Grant)
1. Formulating and testing theories in psychology
2. Some theories in cognitive and social psychology
3. Signal detection theory and models for tradeoffs in decision making

VII. To Treat or Not to Treat: Causal Inference in the Social Sciences (Jeronimo Cortina)
1. The potential-outcomes model of causation; propensity scores
2. Some statistical tools for causal inference with observational data
3. Migration and Solidarity

The cover is an adaptation of this image that was sent to us from Chris Albon last year after we asked for cover ideas on the blog. Thanks, Chris. You're getting a free copy!

Ian Ayres refers to the research by Brett Pelham, Matthew Mirenberg, and John Jones that people are likely to have names that are related to their occupations, places of birth, etc. Pelham et al. write:

Taken together, the names Jerry and Walter have an average frequency of 0.416%, compared with a frequency of 0.415% for the name Dennis. Thus, if people named Dennis are more likely than people named Jerry or Walter to work as dentists, this would suggest that people named Dennis do, in fact, gravitate toward dentistry. A nationwide search focusing on each of these specific first names revealed 482 dentists named Dennis, 257 dentists named Walter, and 270 dentists named Jerry.

In his blog, Ayres referred to this finding but wrote:

To be honest, I [Ayres] am not fully persuaded that either of these results is true.

I think that Ayres is saying this because the effect sounds so large: Even if there really were something going on, could it really explain the difference between 482 and 257, nearly a factor of 2?

Let me repost a simple conditional probability calculation that might put Ayres's mind at ease:

There were 482 dentists in the United States named Dennis, as compared to only about 260 that would be expected simply from the frequencies of Dennises and dentists in the population. On the other hand, the 222 "extra" Dennis dentists are only a very small fraction of the 620,000 Dennises in the country; this name pattern thus is striking but represents a small total effect. If we assume that 222 of these Dennises are "extra" dentists--choosing the profession just based on their name--that gives 222/620000 = .036% of Dennises choosing their career using this rule. I can certainly believe that the naming effect could be as high as .036%.
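The arithmetic is short enough to spell out. This sketch uses only the three numbers quoted above and treats "about 260" as exactly 260, which is an approximation on my part.

```python
dennis_dentists = 482            # dentists named Dennis found in the search
expected_dennis_dentists = 260   # "about 260" expected from base rates (treated as exact here)
dennises = 620_000               # Dennises in the country

extra = dennis_dentists - expected_dennis_dentists
share_pct = 100 * extra / dennises                  # percent of all Dennises who are "extra" dentists
ratio = dennis_dentists / expected_dennis_dentists  # observed vs. expected count

print(f"extra Dennis dentists: {extra}")
print(f"share of all Dennises: {share_pct:.3f}%")
print(f"observed/expected ratio: {ratio:.2f}")
```

So the same data give a ratio of nearly 2 when you condition on being a dentist, and a few hundredths of one percent when you condition on being named Dennis. Both are right; they just answer different questions.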

What percentage of people pick their job based on their name?

And here is my quick calculation that approximately 1% of Americans choose their career based on their first name:

That is all.

How to fix the grant system?

| 1 Comment

Via Andraz's Twitter feed, I came across the following:
Using Natural Science and Engineering Research Council Canada (NSERC) statistics, we show that the $40,000 (Canadian) cost of preparation for a grant application and rejection by peer review in 2007 exceeded that of giving every qualified investigator a direct baseline discovery grant of $30,000 (average grant).

This would lead to an explosion in the number of "qualified investigators," and bring many lazy and mediocre ones in and drive most of the good and driven ones out. Also, the EU does compensate the preparation of grants. The preparation of a proposal is not all wasted effort: a proposal requires a researcher to organize his ideas.

Now, I don't want to come out of this post as a defender of the grant system - after all, the grant system has been one of the main centrifugal forces pulling me away from a career in research. Three things have been most problematic:

  1. Lack of transparency: Arbitrary decisions by anonymous reviewers without the opportunity to address their criticism give them the power to make essentially political decisions. Grant picking should be a transparent public process - and research should strive towards something good for humanity.
  2. Lack of accountability: Once the project has been approved, there is little pressure on the PI to actually achieve goals. Consequently, research often ends with the grant proposal, and research for a new grant proposal then begins. It would be better to spend most money for awards recognizing past research than to spend practically all of it for vapor and smoke.
  3. Lack of a productive environment: To execute successful projects one needs the freedom to pick the best people with the right skills, competing with the industry. Good work requires focus, and there are not many people who can both do quality research and quality teaching. I cannot do both at the same time myself. Moreover, many research institutions have become internally rigid, slow and top-heavy, and the overhead is suffocating.

This is great. I'm not commenting one way or another on the science--it's not something I know anything about. Rather, it's just funny to see the phrase "researchers in Brooklyn" in a newspaper article. Brooklyn's usually a punchline but this time they're serious.

A great exam question

| No Comments

Iain writes:

I [Iain] remember Dennis Cook used to have a multiple choice question in an exam for a regression class that asked simply "if in doubt do what?" with correct answer "take the log."

I just want to know what the other options were in the multiple choice.

Alex Frankel sent in this:

A professor at Oxford University and his team have perfected a model whereby they can calculate whether the relationship will succeed. In a study of 700 couples, Professor James Murray, a maths expert, predicted the divorce rate with 94 per cent accuracy. His calculations were based on 15-minute conversations between couples who were asked to sit opposite each other in a room on their own and talk . . . Professor Murray and his colleagues recorded the conversations and awarded each husband and wife positive or negative points depending on what was said. Partners who showed affection, humour or happiness as they talked were given the maximum points, while those who displayed contempt or belligerence received the minimum. . . .

I looked up James Murray and couldn't find any article describing these results; 94% accuracy sounds pretty good to me, but it's difficult to make any comment based only on news reports. It appears, though, that Murray's main home is the University of Washington, not Oxford--at least, there seems to be a lot more info on Murray at UW than at Oxford--and he's cowritten a book on The Mathematics of Marriage, so this isn't a new area for him.

There must be a bit of a discussion of this sort of thing in the clinical psychology literature? Perhaps this would be a good topic for teaching logistic regression forecasting, better than our usual boring examples.

One thing about the news report puzzled me, though; at the end, it says:

The forecast of who would get divorced in his study of 700 couples over 12 years was 100 per cent correct, he said. But "what reduced the accuracy of our predictions was those couples who we thought would stay married and unhappy actually ended up getting divorced".

Huh?? If the accuracy was 100%, then what does he mean by "what reduced the accuracy of our predictions"? Were they hoping for 110%?

Main effects and interactions

| 7 Comments

We all know to look at main effects first and then look for interactions. But a former student pointed me to some disturbing advice from some statistics textbooks. I'll give his quotes and then my reactions:

Self-experimentation

| 8 Comments

Jimmy sent this along:

Still, Mr. Perry wondered whether caffeine would help him. When he retired from rowing last July, he decided to do a randomized, blinded, placebo-controlled experiment on himself.

JAMA Editors Go Nuts

| 11 Comments

This is pretty funny.

Sharad's blog

| No Comments

Sharad Goel is a brilliant guy who works at Yahoo with Duncan Watts and just started a blog on statistical topics. It's great so far and I'm sure will continue to be so.

Life Expectancy at birth (years)

Image via Wikipedia

Johannes pointed me to FindMyWorth, a website that provides another formula for monetary value of a human life, this one conditioned on income, spending, financial growth rate, rate of return, life expectancy and quality of life. If you live in Qatar, you're worth the most, almost $6M:

quatar.png

While one could argue a lot about the formula, the author Zeeshan-ul-hassan Usmani has made a good example of how to properly publish a working paper in this age: not just that he has the paper, he has an interactive demonstration, graphs, data, and a 30-second "executive" summary of the methodology for all of us with attention deficit disorders. He could have a comment section, but that's the way to go!

Timothy Teräväinen pointed to an interesting journal, the Journal of Articles in Support of the Null Hypothesis:

In the past other journals and reviewers have exhibited a bias against articles that did not reject the null hypothesis. We seek to change that by offering an outlet for experiments that do not reach the traditional significance levels (p < .05). Thus, reducing the file drawer problem, and reducing the bias in psychological literature. Without such a resource researchers could be wasting their time examining empirical questions that have already been examined. We collect these articles and provide them to the scientific community free of cost.

I've three comments.

Branding: Perhaps more people would understand what this is about if the journal was titled, say, "Status Quo" or "Nothing new under the Sun".

Topic or theme: Only statisticians would be instinctively attracted to a standalone topic like this. JASNH would work better as a subtopic (or a folksonomic "tag") of every academic discipline, or a section of any journal. At the same time, it's good to keep all such articles in one place.

Format: I am not sure it's worth writing a whole article about a negative result. Instead of articles, some sort of a shorter write-up would be more efficient - people might not want to spend too much time elaborating on the support of status quo, but other researchers would benefit from knowing what is unlikely to work.

Chris Wiggins points us to this announcement for a conference next year:

Simulation has greatly advanced climate science, but not sufficiently to the profit of theory and understanding. How can simulation better advance climate science and what mathematical issues does this raise? Our hypothesis is that the development of climate science (i.e., theory and understanding) will be best served by focusing computational and intellectual resources on model and data hierarchies. By bringing together physicists, mathematicians, statisticians, engineers and climate-scientists, and focusing on several themes that reach across scales and scientific methodologies, our program will provide a framework for advancing our use of hierarchical methods in our attempt to understand the climate system.

There will be an active program of research activities, seminars and workshops throughout the March 8 - June 11, 2010 period and core participants will be in residence at IPAM for fourteen weeks. The program will open with tutorials, and will be punctuated by four major workshops and a culminating workshop.

This all makes sense to me, although, given the topic, I'm surprised that no statisticians seem to be involved. Lots of potential for interesting models and graphs.

This one's for Zacky

| No Comments

I'm working on a project involving the evaluation of social service innovations, and the other day one of my colleagues remarked that in many cases, we really know what works, the issue is getting it done. This reminded me of a fascinating article by Atul Gawande on the use of checklists for medical treatments, which in turn made me think about two different paradigms for improving a system, whether it be health, education, services, or whatever.

The first paradigm--the one we're taught in statistics classes--is of progress via "interventions" or "treatments." The story is that people come up with ideas (perhaps from fundamental science, as we non-biologists imagine is happening in medical research, or maybe from exploratory analysis of existing data, or maybe just from somebody's brilliant insight), and then these get studied (possibly through randomized clinical trials, but that's not really my point here; my real focus is on the concept of the discrete "intervention"), and then some ideas are revealed to be successful and some are not (with allowances taken for multiple testing or hierarchical structure in the studies), and the successful ideas get dispersed and used widely. There's then a secondary phase in which interventions can get tested and modified in the wild.

The second paradigm, alluded to by my colleague above, is that of the checklist. Here the story is that everyone knows what works, but for logistical or other reasons, not all these things always get done. Improvement occurs when people are required (or encouraged or bribed or whatever) to do the 10 or 12 things that, together, are known to improve effectiveness. This "checklist" paradigm seems much different than the "intervention" approach that is standard in statistics and econometrics.

The two paradigms are not mutually exclusive. For example, the items on a checklist might have had their effectiveness individually demonstrated via earlier clinical trials--in fact, maybe that's what got them on the checklist in the first place. Conversely, the procedure of "following a checklist" can itself be seen as an intervention and be evaluated as such.

And there are other paradigms out there, such as the self-experimentation paradigm (in which the generation and testing of new ideas go together) and the "marketplace of ideas" paradigm (in which more efficient systems are believed to evolve and survive through competitive pressures).

I just think it's interesting that the intervention paradigm, which is so central to our thinking in statistics and econometrics (not to mention NIH funding), is not the only way to think about process improvement. A point that is obvious to nonstatisticians, perhaps.

"In Pain and Joy of Envy, the Brain May Play a Role"

May play a role??? I guess the jury is still out on whether the seat of envy is actually in the liver. . . .

This level of scientific illiteracy disturbs me. I'm not knocking the news article or the scientific study being described there, just the headline, which is in a class by itself.

Hey, this looks cool!

| No Comments

Visualization and Control in Insect Flight

Atilla Bergou, Physics Department, Cornell University

Insects have a 100 million year head-start on us in learning how to fly. Thus, we have a lot to learn from them. Currently, one of the greatest challenges in this study is the accurate measurement, characterization and visualization of the motions of these animals. Recent advances in high-speed videography have allowed us to begin exploiting techniques from computer vision which hold immense promise to resolve these problems. In this talk, I will show our efforts in incorporating ideas from computer vision and physics to study the complex motion of an insect's wing. This motion is due not only to muscular activation but also to fluid, inertial, and elastic forces. Thus, it may be that not all aspects of the wing motion are actively controlled by the insect. We ask whether changes in the wing orientation of flying fruit flies are actuated by insect muscles, or if their wings turn over passively like a falling leaf. By applying a three-dimensional reconstruction technique to high-speed films of freely flying fruit flies, we are able to capture their intricate motion at a level of detail that has previously been impossible. We extract the detailed wing kinematics of flies using a novel motion tracking algorithm, compute the forces acting on the wings and infer whether flapping flight is possible without pitching control.

The talk is 3pm Wed 4 Feb CESPR 414 Sindeband East.

Ed Vul, Christine Harris, Piotr Winkielman, and Harold Pashler wrote an article where:

1. They point out that correlations reported in fMRI medical imaging studies are commonly overstated because researchers tend to report only the highest correlations, or only those correlations that exceed some threshold.

2. They suggest that these statistical problems are leading researchers, and the general public, to overstate the connections between social behaviors and specific brain patterns.

After posting on this article, I received a bunch of comments and questions as well as some responses:

This article by Jabbi, Keysers, Singer, and Stephan argues that, because brain imaging researchers adjust their p-values and significance thresholds for multiple comparisons (the thousands of voxels in a brain image), their statistical methods don't have the problems that Vul et al. claimed.

This reply by Vul to the Jabbi et al. article. Here Vul argues that adjustment of significance levels does not stop the selected correlations themselves from being too high. I found Vul's argument here to be convincing. Multiple comparisons methods control the rate of false alarms in a setting where true effects are zero--but I don't see that to be relevant to the imaging setting, where differences are not in fact zero. Lots of things affect blood flow in the brain, and we would never expect the average scans of two different groups of people to be the same.

This article by Lieberman, Berkman, and Wager, who defend social neuroscience and argue the following:

1. They accept Vul et al.'s point 1 above (correlations are overstated) but present some evidence that the correlations aren't as overstated as Vul et al. might fear.

2. They disagree with the implied claim that the overstated correlations have distorted scientists' understanding of social neuroscience research.

3. They object to Vul et al.'s focusing on social neuroscience, given that the same statistical issues arise in all sorts of brain imaging studies.

4. They point out some specific areas where Vul et al. mischaracterized the data-analytic methods used in this field.

I think Lieberman et al. make some good points, but, as Vul et al. point out, researchers often do use correlations to summarize their results. And, even if said correlations survived a multiple-comparisons analysis, readers might interpret these at face value without understanding the selection issue. So all this shake-out is probably a good thing, especially where correlation estimates are being compared to each other.

My thoughts

First off, I haven't worked seriously in medical imaging for nearly 20 years and have only one published paper in the area, so my comments are mostly informed by my perspective on general statistical issues, as well as my own experience thinking about estimation of effect sizes in studies with low statistical power.

Regarding the singling-out of social neuroscience, I see the point of Lieberman et al. I was thinking that maybe one reason for this is that in social neuroscience it's perhaps more difficult to get external validation in the way that might be more possible in other areas of neuroscience where there is some measurement in the blood or whatever that can be taken. I'm not sure about this, just a conjecture.

It's hard for me to believe that the approach based on separate analyses of voxels and p-values is really the best way to go. The null hypothesis of zero correlations isn't so interesting. What's really of interest is the pattern of where the differences are in the brain.

Related to this point is that, when trying to understand differences in brain processing between different sorts of people (or between people doing different tasks), the maximum correlation among voxels is ultimately not what you're looking for. That is why researchers summarize using regions of interest (as in p.7 of the Lieberman et al. article). Vul et al. were correct to warn about overinterpretation of correlations that have been selected as the maximum: the naive reader can see such correlations (and accompanying scatterplots) and think that certain personality traits are more predictable from brain scans than they actually are.
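To make the selection issue concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration (20 subjects, 5000 voxels, a common true correlation of 0.3, a reporting threshold of 0.6); it is not a reconstruction of any analysis in the articles discussed above. Every voxel has the same modest true correlation with a behavioral score, and we compare the average correlation over all voxels with the average over only those voxels that clear the threshold.

```python
import numpy as np

def selected_voxel_correlation(n_subjects=20, n_voxels=5000, true_r=0.3,
                               threshold=0.6, seed=0):
    """Average sample correlation over all voxels vs. over selected voxels.

    All parameter values are illustrative assumptions, not taken from any study.
    """
    rng = np.random.default_rng(seed)
    behavior = rng.standard_normal(n_subjects)
    noise = rng.standard_normal((n_voxels, n_subjects))
    # Every voxel is built to have the same population correlation with behavior.
    voxels = true_r * behavior + np.sqrt(1 - true_r**2) * noise

    # Sample correlation of each voxel with the behavioral score.
    b = (behavior - behavior.mean()) / behavior.std()
    v = (voxels - voxels.mean(axis=1, keepdims=True)) / voxels.std(axis=1, keepdims=True)
    r = (v * b).mean(axis=1)

    # Non-independent step: keep only the voxels whose correlation clears the bar.
    selected = r[r > threshold]
    return r.mean(), selected.mean(), selected.size

mean_all, mean_selected, n_selected = selected_voxel_correlation()
print(f"mean correlation, all voxels:      {mean_all:.2f}")
print(f"mean correlation, selected voxels: {mean_selected:.2f} (n = {n_selected})")
```

The selected-voxel average has to exceed the threshold by construction, whatever the true correlation is, which is the sense in which a reported number of this kind tells you more about the selection rule than about the brain.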

I think the way forward will be to go beyond correlations and the horrible multiple-comparisons framework, which causes so much confusion. Vul et al. and Lieberman et al. both point out that classical multiple comparisons adjustments do not eliminate the systematic overstatement of correlations. A hierarchical Bayes approach (using some sort of mixture for the population of pixel differences, ideally modeled hierarchically with pixels grouped within regions of interest) would help here.

And now for some amateur psychologizing (unsupported by any statistical analysis, correlational or other)

I suspect that one of the motivations of Vul et al. in writing their article was frustration at too-good-to-be-true numbers which they felt led to exaggerated claims of neuro super-science.

Conversely, I suspect one of the frustrations of Lieberman et al. is that they are doing a lot more than correlations and fishing expeditions--they're running experiments to test theories in psychology, they're trying to synthesize results from many different labs. And from that perspective it must be frustrating for them to see a criticism (featured in the popular press) that is so focused on correlation, which is really the least of their concerns.

It also seems that both sides were irritated by what they saw as giddy press coverage: on one side, claims of dramatic breakthroughs in understanding the biological basis of behavior and personality; on the other, claims of a dramatic Emperor-has-no-clothes debunking. As scientists, most of us welcome press coverage--after all, we think this work is important and we'd like others to know about it--but . . . fawning press coverage of something that we think is wrong--that's just annoying.

P.S. Wager is a friend--he teaches in the psychology department here--but I don't think my personal knowledge has hindered my evaluation here.

P.P.S. I ran the above by various people involved and they gave some helpful clarifications. But I've probably left in a couple of sloppy statements here and there.

Good stuff from 2008

| No Comments

Peter Woit relates a story about how four physicists did work that led to a Nobel Prize, but the rules only allowed it to be given to three of them, creating a motive for murder. The story is consistent with Andrew Oswald's finding that not getting the Nobel Prize reduces your expected lifespan by two years. The cited article frames it as that winning the prize increases your lifespan, but so many more eligible people don't get it than do (and the No comes year after year). I'd guess that it's a net reducer of scientists' lifespans. Even setting murder aside.

Seth points to this article by Edward Vul, Christine Harris, Piotr Winkielman, and Harold Pashler, which begins:

The newly emerging field of Social Neuroscience has drawn much attention in recent years, with high-profile studies frequently reporting extremely high (e.g., >.8) correlations between behavioral and self-report measures of personality or emotion and measures of brain activation obtained using fMRI. We show that these correlations often exceed what is statistically possible assuming the (evidently rather limited) reliability of both fMRI and personality/emotion measures. The implausibly high correlations are all the more puzzling because social-neuroscience method sections rarely contain sufficient detail to ascertain how these correlations were obtained. We surveyed authors of 54 articles that reported findings of this kind to determine the details of their analyses. More than half acknowledged using a strategy that computes separate correlations for individual voxels, and reports means of just the subset of voxels exceeding chosen thresholds. We show how this non-independent analysis grossly inflates correlations, while yielding reassuring-looking scattergrams. This analysis technique was used to obtain the vast majority of the implausibly high correlations in our survey sample. In addition, we argue that other analysis problems likely created entirely spurious correlations in some cases.

This is cool statistical detective work. I love this sort of thing. I also appreciate that the article has graphs but no tables. I have only two very minor comments:

1. As Seth points out, the authors write that many of the mistakes appear in "such prominent journals as Science, Nature, and Nature Neuroscience." My impression is that these hypercompetitive journals have a pretty random reviewing process, at least for articles outside of their core competence of laboratory biology. Publication in such journals is taken as much more of a seal of approval than it should be, I think. The authors of this article are doing a useful service by pointing this out.

2. I think it's a little tacky to use "voodoo" in the title of the article.
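The abstract's claim that some correlations "exceed what is statistically possible" rests, as far as I can tell, on the classical attenuation argument: measurement error in either variable shrinks the observable correlation. Here is a sketch of that bound, where $\rho_{xx'}$ and $\rho_{yy'}$ are my notation for the reliabilities of the two measures, and the numerical values 0.8 and 0.7 are hypothetical, picked only to show the arithmetic:

```latex
r_{\text{observed}} \;=\; r_{\text{true}}\,\sqrt{\rho_{xx'}\,\rho_{yy'}}
\;\le\; \sqrt{\rho_{xx'}\,\rho_{yy'}},
\qquad
\text{e.g. } \sqrt{0.8 \times 0.7} \;=\; \sqrt{0.56} \;\approx\; 0.75 .
```

Under those assumed reliabilities, even a perfect underlying relationship would be expected to show up as a correlation of about 0.75, so reported values above 0.8 would call for an explanation.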

In his aforementioned chapter, Stephen Senn writes:

"In order to interpret a trial it is necessary to know its power": This is a rather silly point of view that nevertheless continues to attract adherents. A power calculation is used for planning trials and is effectively superseded once the data are in. . . . An analogy may be made. In determining to cross the Atlantic it is important to consider what size of boat it is prudent to employ. If one sets sail from Plymouth and several days later sees the Statue of Liberty and the Empire State Building, the fact that the boat employed was rather small is scarcely relevant to deciding whether the Atlantic was crossed.

I used to think this too, but after writing my paper with David Weakliem, I've changed my stance on the relevance of retrospective power calculations. In that article, Weakliem and I discussed the problem of Type M (magnitude) errors, where the true effect is small but it is estimated to be large. One problem with underpowered studies is that, when they do turn up statistically significant results, they tend to be huge compared to the true effect sizes.
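A quick simulation can illustrate the Type M problem. This is only a sketch under assumptions of my own choosing: the estimate is taken to be normally distributed around the true effect with a known standard error, "statistically significant" means more than 1.96 standard errors from zero, and the effect sizes 0.5 and 3.0 are invented for the comparison.

```python
import numpy as np

def type_m_exaggeration(true_effect, std_error=1.0, n_sims=100_000, seed=0):
    """Return (power, exaggeration ratio) for a simple normal-estimate model.

    The exaggeration ratio is the average absolute size of the statistically
    significant estimates divided by the true effect.
    """
    rng = np.random.default_rng(seed)
    estimates = rng.normal(true_effect, std_error, n_sims)
    significant = np.abs(estimates) > 1.96 * std_error
    power = significant.mean()
    exaggeration = np.abs(estimates[significant]).mean() / true_effect
    return power, exaggeration

# Underpowered study: the true effect is half a standard error.
power_small, exaggeration_small = type_m_exaggeration(true_effect=0.5)
# Well-powered study: the true effect is three standard errors.
power_large, exaggeration_large = type_m_exaggeration(true_effect=3.0)

print(f"underpowered: power = {power_small:.2f}, exaggeration = {exaggeration_small:.1f}x")
print(f"well-powered: power = {power_large:.2f}, exaggeration = {exaggeration_large:.1f}x")
```

In the underpowered case, any estimate that reaches significance must be at least 1.96 standard errors in size, so it overstates a true effect of 0.5 by roughly a factor of four or more. That is why knowing the power still matters after the data are in.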

On the other hand, large studies can be a huge waste of effort, so I don't really know what I would recommend for medical research.

Jesse Bering writes:

Researchers have found that, at least when it comes to what goes on in our own heads, there's not much of a conflict between religion and science. Sure, that bad case of strep throat your kid got right before your scheduled vacation to Barbados was caused by her chewing on a virus-laden pencil she'd borrowed in math class. . . . But that doesn't mean God's not trying to tell you something by--what's the best word here--'authoring' these events. . . . this way of thinking as "co-existence reasoning," where natural, scientific forces are viewed as directly causing a certain event, but supernatural forces are perceived simultaneously as somehow blowing life into this science. Another way to say this is that science and God often co-exist harmoniously in the same mindset, with science acting 'proximally' and God acting 'distally.' . . .

This looks interesting but I can't quite figure out what the experimental findings are. I'll have to try to track down the researcher who did the study.

I went to the webpage of physicist / computer scientist David MacKay and found that he had written a book on energy policy for general audiences. It's basically a physics book where he computes the energy costs of different aspects of our lifestyles and then estimates the potential for getting power from various non-carbon-emitting sources. It's a fun read and I recommend taking a look. I don't know enough to offer any serious endorsement or criticism of his claims, but he presents his reasoning very clearly, which I like. He has lots of graphs, and I view his book as being somewhat in the spirit of Red State, Blue State, as organizing a bunch of information so that the reader is in a better position to make his or her own judgments. (Again, I'm in no position to endorse or criticize MacKay's specific recommendations.)

My main suggestion is that MacKay follow up on one of his suggestions and connect his work to that of advocates on different sides of the issue. He begins his book as follows:

I [MacKay] recently read two books, one by a physicist, and one by an economist. In Out of Gas, Caltech physicist David Goodstein describes an impending energy crisis brought on by The End of the Age of Oil. . . .

In The Skeptical Environmentalist, Bjørn Lomborg paints a completely different picture. "Everything is fine." Indeed, "everything is getting better." Furthermore, "we are not headed for a major energy crisis," and "there is plenty of energy." How could two smart people come to such different conclusions? I had to get to the bottom of this.

This sounded good, and I was looking forward to the resolution. But in all the rest of the book, MacKay never mentioned Goodstein or Lomborg again (except once in a brief aside to say that their books are "full of interesting numbers and back-of-envelope calculations," and once to cite Lomborg's estimate of bird deaths caused by wind turbines)!

This was a letdown. I think MacKay's argument would be stronger if he could loop back and address the arguments of Goodstein, Lomborg, and others.

Kevin Denny writes:

Depressive symptoms are significantly higher amongst left-handed men. While 19% of right handed men report experiencing depressive symptoms for at least a two week period, the figure for left handed men is almost 25%. For women the corresponding percentages are 33% and 36% respectively but the difference is not statistically significant.

The analysis is of "a new large population survey from twelve European countries," a random sample of 27000 non-institutionalized people aged 50 and older. Handedness was classified based on self-reporting, and depression is measured using standard questions. Of the sample, about 7% of men and 6% of women were classified as left-handed.

My only suggestion (beyond reporting fewer significant digits in the tables) is to rescale the depression scale by dividing by two standard deviations; this would allow the coefficients to be interpretable on the same scale as those for the binary outcome (see Table 2).
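Here is a small sketch of what that rescaling looks like. The scores are made up and the function name is mine; the only point is that, after dividing by two standard deviations, the scale has a standard deviation of 0.5, the same as a binary variable that is split evenly between 0 and 1.

```python
import numpy as np

def divide_by_two_sd(values):
    """Rescale a variable by dividing by twice its sample standard deviation."""
    values = np.asarray(values, dtype=float)
    return values / (2.0 * values.std(ddof=1))

# Made-up depression scale scores, for illustration only.
depression_scores = np.array([0, 1, 1, 2, 3, 3, 4, 6, 8, 10], dtype=float)
scaled = divide_by_two_sd(depression_scores)

# A binary variable split 50/50 has standard deviation 0.5.
binary = np.array([0, 1] * 50)

print(f"sd of rescaled scores: {scaled.std(ddof=1):.2f}")
print(f"sd of 50/50 binary:    {binary.std():.2f}")
```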

A bootstrap by another name

| 3 Comments

Yes, there are topics other than the U.S. election . . . Richard Sperling writes:

I'm having a little problem discerning the difference(s) between the parametric bootstrap and Monte Carlo simulation. I'd appreciate it if you would clarify the distinction.

This reminds me of grad school, when Raghu said that in the future, instead of saying "I took a sample of size n from a normal distribution" or whatever, he'd say "I took a bootstrap of size n . . ." and it would sound so much cooler.
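For what it's worth, one way to see the distinction is that the simulation step is identical and only the source of the parameters differs. The sketch below assumes a normal model and uses the sample median as the statistic; the "observed" data are themselves simulated as a stand-in for real data, and all names and settings are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_medians(mu, sigma, n, n_sims, rng):
    """Draw n_sims normal samples of size n and return each sample's median."""
    draws = rng.normal(mu, sigma, size=(n_sims, n))
    return np.median(draws, axis=1)

# Monte Carlo simulation: the parameters are fixed in advance by the analyst.
mc_medians = simulate_medians(mu=0.0, sigma=1.0, n=50, n_sims=2000, rng=rng)

# Parametric bootstrap: the parameters are estimated from observed data,
# and then the same simulation is run from the fitted model.
observed = rng.normal(0.0, 1.0, size=50)  # stand-in for a real data set
mu_hat, sigma_hat = observed.mean(), observed.std(ddof=1)
pb_medians = simulate_medians(mu_hat, sigma_hat, n=observed.size, n_sims=2000, rng=rng)

# Bootstrap standard error of the median under the fitted normal model.
se_median = pb_medians.std(ddof=1)
print(f"parametric-bootstrap SE of the median: {se_median:.3f}")
```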

Deception blog

| 1 Comment

I've linked to this before, but it's worth a reminder. Maybe one reason this stuff interests me is that I'm so bad at deception myself.

I just received by email a request to review a manuscript called "Acute Inflammatory Proteins Constitute the Organic Matrix of Prostatic Corpora Amylacea and Calculi in Men with Prostate Cancer." The abstract is below:

Learning structural forms

| 1 Comment

Josh Tenenbaum sent me a link to a paper, The discovery of structural form, by C. Kemp and himself. Also commentary by Keith Holyoak and some supporting information. Code and datasets are here.

For my own thoughts on this work, see here. Josh's talk at Columbia made me realize that all these years I'd been thinking of life as part of a "great chain of being" without realizing it.

Thinking like a scientist

| 1 Comment

I spoke at the University Commons retirement community about the Red State, Blue State book--a great audience, it was lots of fun. Anyway, in the talk, around the time I told them Pauline Kael's non-quote and Michael Barone's actual quote, I had the occasion to mention that I tell my students that, in any research project, you need to answer the following four questions:

1. What's your evidence?
2. How does this fit in with what else you know?
3. What have you found beyond what people thought before?
4. How did all those smart people who came before get things wrong?

(Item 4 is the topic of chapter 3 of our book.)

Fund the answer you want...

| 3 Comments

Aleks sends in this article by David Michaels:

More than 90 percent of the 100-plus government-funded studies performed by independent scientists found health effects from low doses of BPA, while none of the fewer than two dozen chemical-industry-funded studies did. . .

Patent absurdity

| 6 Comments

Jouni writes,

Here is a link (see also here) to a patent on Bayesian linear regression. Yes, they call their algorithm an "invention."
A simple yet powerful Bayesian model of linear regression is disclosed for methods and systems of machine learning. Unlike previous treatments that have either considered finding hyperparameters through maximum likelihood or have used a simple prior that makes the computation tractable but can lead to overfitting in high dimensions, the disclosed methods use a combination of linear algebra and numerical integration to work a full posterior over hyperparameters in a model with a prior that naturally avoids overfitting. The resulting algorithm is efficient enough to be practically useful. The approach can be viewed as a fully Bayesian version of the discriminative regularized least squares algorithm.

Now, hurry up and patent Bayesian nonlinear regression before they do it.

Jouni continues:

Maybe we all should be submitting our papers to the patent office instead of journals? Perhaps they would probably be more easily accepted?

It's all fun and games until they sue your a$$. . . .

In their article, "High-Stakes Testing in Higher Education and Employment: Appraising the Evidence for Validity and Fairness," Paul Sackett, Matthew Borneman, and Brian Connelly write:

As young adults complete high school in the United States, they typically pursue one of three options: continue their education, enter the civilian work force, or join the military. In all three settings, there is a long history of using standardized tests of developed cognitive abilities for selection decisions. In these domains, the tests themselves often are very similar. For example, Frey and Detterman (2004) reported a correlation of .82 between scores on the SAT, widely used for college admissions, and a composite score on the Armed Services Vocational Aptitude Battery.

The question is: are these tests any good? The authors say yes:

The authors review criticisms commonly leveled against cognitively loaded tests used for employment and higher education admissions decisions, with a focus on large-scale databases and meta-analytic evidence. They conclude that (a) tests of developed abilities are generally valid for their intended uses in predicting a wide variety of aspects of short-term and long-term academic and job performance, (b) validity is not an artifact of socioeconomic status, (c) coaching is not a major determinant of test performance, (d) tests do not generally exhibit bias by underpredicting the performance of minority group members, and (e) test-taking motivational mechanisms are not major determinants of test performance in these high-stakes settings.

Their key methodological point:

Mark Levy pointed me to this. I don't know anything about this area of research, but if true, it's just an amazing, amazing example of the importance of measurement error:

The 20th century warming trend is not a linear affair. The iconic climate curve, a combination of observed land and ocean temperatures, has quite a few ups and downs, most of which climate scientists can easily associate with natural phenomena such as large volcanic eruptions or El Nino events.

But one such peak has confused them a hell of a lot. The sharp drop in 1945 by around 0.3 °C - no less than 40% of the century-long upward trend in global mean temperature - seemed inexplicable. There was no major eruption at the time, nor is anything known of a massive El Nino that could have caused the abrupt drop in sea surface temperatures. The nuclear explosions over Hiroshima and Nagasaki are estimated to have had little effect on global mean temperature. Besides, the drop is only apparent in ocean data, but not in land measurements.

nature06982-f3.2.jpg

Now scientists have found – not without relief - that they have been fooled by a mirage.

When I took science in 9th grade, I remember being disturbed by a gap in the story. From one direction, we were told about atoms and subatomic particles and how they clustered into molecules. From the other, we were told about cells--single-celled animals and single human cells, then multicelled animals, then larger things such as jellyfish, etc., building up to people. We even talked about the parts of a cell--nucleus, axons, cilia, etc.

But we never were given the link between molecules and cells. And what really bothered me was that there was never even any recognition of the gap. This was really too bad, because long molecules are cool--there are proteins shaped like hooks that grab onto other molecules, etc. But it was either atoms or cells, nothing in between.

I was thinking about this recently after reading two blog entries by Steven Levitt. Here he writes that rich people aren't really so much richer than poor people because rich people pay more for "fancy cars, expensive wine, etc." This confuses me because I thought that, under the usual principles of economics, we should assume that fancy cars, etc., are worth their price--otherwise competitors would come into the market and sell them for less. Levitt's related point is that the narrowing of the gap between rich and poor can be credited to Wal-Mart. I can see how this could be true, but once again I'm confused, because I thought standard economic theory said that if Wal-Mart didn't exist, someone would invent it. I have an uncomfortable feeling here that economics is sometimes telling us that things are inevitable (the law of supply and demand) and other times is celebrating unique organizations such as Wal-Mart.

I'm not saying that economists are wrong on this--clearly, supply and demand are powerful forces, and it's also clear that organizations such as Toyota or Bell Labs or, for that matter, City Harvest, can make a difference. Marketing is an art, and just as, if Picasso had never been born, there would still be abstract art but there would be no Picassos, I can well imagine that in a different world, there would be no Wal-Mart, and maybe Americans would all be paying fifteen cents more each for peanut butter, or whatever.

But . . . I'm still disturbed by the lack of connection that is made between the fundamental principles of economics (under which $5,000 worth of expensive wine has the same value as $5,000 worth of Cheetos) and the sort of technocratic reasoning (the kind of thing that makes me, as a statistician, happy) where you try to assign a cost to each thing.

Really this applies to economics, or "freakanomics," in general: For example, you can do some data analysis to see if sumo wrestlers are cheating, or you can just say that sumo wrestling supplies an entertainment niche and leave it to the wrestlers to figure out how to optimally collude. Either sort of analysis is ok, but I rarely see them juxtaposed--it's typically one or the other, and the conclusions seem to depend a lot on which mode of analysis is chosen.

I don't think there are any easy answers here--to borrow a physics analogy, a stable economy is necessarily at a phase transition, entrepreneurs can't repeal the law of supply and demand, and conversely "supply and demand" don't mean squat if nobody's there to take advantage of opportunities, etc. But I think there can be trouble if you can pull out a macro or a micro argument and not always see the connection between them.

P.S. This problem is not at all unique to economics. For example, some political scientists (such as myself) study public opinion and others study strategic bargaining among political actors. And we tend to work in parallel, even though of course these concepts interact. I study voters' attitudes on issues and where they stand compared to the Democrats and Republicans, whereas Thomas Ferguson studies campaign contributions by major industries. It's all part of the same big picture but it's hard to put it all together in one place.

And I'm not saying this to criticize Levitt: he has interesting things to say both in the "big picture" sense and in detailed technical analyses. I just think there's a big gap there that's not often acknowledged.

Division of labor

| 2 Comments

In a comment here, Sean writes, "I would more interested in an in-depth discussion of the statistical challenges of climate modeling than in the political angles of the question." I respect that opinion, but I think it makes more sense for me to write about what I know more about, which in this case is patterns in public opinion.

John Shonder points me to this article on the work of Brett Pelham, who's been featured here before. The news article states,

In studies involving Internet telephone directories, Social Security death index records and clinical experiments, Brett Pelham, a social psychologist, and colleagues have found in the past six years that Johnsons are more likely to wed Johnsons, women named Virginia are more likely to live in (and move to) Virginia, and people whose surname is Lane tend to have addresses that include the word “lane,” not “street.”

They didn't mention my favorite, which is that there are almost twice as many dentists named Dennis in the United States as you would expect based on the number of dentists and Dennises alone.

Nooooooooooo...............

I want to correct one misconception that was aired in the Times article. As with many such things, it turns on conditional probability. The article states,

In studies that make believers in free will squirm, Dr. Pelham’s team asserts that names and the letters in them are surprisingly influential in people’s lives. . . . Skeptics of the name-letter effect question how strong the affinity really is between a person’s name and his or her destiny. 'I’m willing to believe that such patterns exist,' said Stanton Wortham, a professor of education and anthropology at the University of Pennsylvania. 'But I’m not willing to grant that those sorts of patterns are going to explain or drive a substantial amount of behavior.'

OK, first off, free will has nothing to do with it. Everybody agrees that your party identification and, for that matter, your religious affiliation, are highly correlated with your parents'; does this mean you don't have a chance to alter these things? Free will requires the ability to alter things; it doesn't require complete statistical independence of preconditions and outcomes. Just a moment's thought, blah blah blah.

On to the second point. The pattern of names and occupations (for example) can be clear and still represent a small effect. Just for example, there were 482 dentists in the United States named Dennis, as compared to only 260 that would be expected simply from the frequencies of Dennises and dentists in the population. On the other hand, the 222 "extra" Dennis dentists are only a very small fraction of the 620,000 Dennises in the country; this name pattern thus is striking but represents a small total effect. Some quick calculations suggest that approximately 1% of Americans' career choices are influenced by the sound of their first name.
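To make the "striking but small" point concrete, here is a minimal sketch of the arithmetic using only the three numbers quoted above. The variable names are mine, and the calculation does not attempt to reproduce the rough 1% figure, which depends on inputs not given here.

```python
# Figures quoted in the paragraph above.
observed_dennis_dentists = 482
expected_dennis_dentists = 260
total_dennises = 620_000

# How many more Dennis dentists there are than chance alone would predict.
extra = observed_dennis_dentists - expected_dennis_dentists

# Relative size of the pattern: observed count as a multiple of the expected count.
ratio = observed_dennis_dentists / expected_dennis_dentists

# Absolute size of the pattern: extra dentists as a share of all Dennises.
share_of_dennises = extra / total_dennises

print(f"extra Dennis dentists: {extra}")
print(f"observed / expected:   {ratio:.2f}")
print(f"share of all Dennises: {share_of_dennises:.4%}")
```

The ratio is large (almost two to one) while the share of all Dennises is a small fraction of a percent, which is the sense in which the pattern can be clear and the total effect small.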

P.S. I agree that stories about names are amusing.

Seth is skeptical of skepticism in evaluating scientific research. He starts by pointing out that it can be foolish to ignore data, just because they don't come from a randomized experiment. The "gold standard" of double-blind experimentation has become an official currency, and Seth is arguing for some bimetallism. To continue with this ridiculous analogy, a little bit of inflation is a good thing: some liquidity in scientific research is needed in order to keep the entire enterprise moving smoothly.

As Gresham has taught us, if observational studies are outlawed, then only outlaws will do observational studies.

I think Seth goes too far, though, and that brings up an interesting question.

Writing about four-leaf clovers, Steven Levitt says, "I’ve been looking my whole life and never found one." This reminds me that when I was a kid, my sister Susie and I used to find them in the backyard all the time. We'd also occasionally find siamese dandelions (one stalk, two heads) that we'd put on our older sister's bed to freak her out. Much later, Susie told me that our land was on some sort of former waste dump and so we (along with the clovers and dandelions) were probably being poisoned.

The New York version of this story: several years ago I was standing on the subway platform, and I offhandedly said to my companion, Hey, let's look for rats. We looked, and, indeed, there was a rat. I mean, I knew that there were rats in the subway--I've occasionally even seen them on the platform--but I didn't know they could be summoned at will in this way.

The great chain of being

| 2 Comments

In his talk on mental models of the structure of the world, Josh Tenenbaum talked about how people think of animals as being classified in a tree structure, and how this structure might differ from those implied by different scientific models. This kind of thing:

tree.png

Anyway, as an aside, Tenenbaum pointed out that, although the tree structure seems so natural to us, it doesn't have to be this way. He noted that, traditionally, creatures have been organized into a linear "great chain of being" rather than as a tree structure. Then I realized . . . that's how I think of the animal kingdom. It's how we learned things in 9th grade biology. At the bottom are single-celled animals (amoebas and so forth), then gradually through the invertebrates, then the vertebrates, starting with the fish (with the sharks at the bottom because of their primitive structure), then amphibians, then reptiles (amphs are lower than reps because of being more fish-like and primitive, I think), then birds (higher because they're warm-blooded), then mammals, with primates at the top and, well, you know what's the #1 primate . . .

Anyway, only when sitting in Tenenbaum's talk did I realize that I'd swallowed this whole great-chain-of-being formulation without even thinking about it. The assumption is that every invertebrate is lower than every vertebrate, that the most complicated bird is lower than any mammal, that all plants are lower than all animals, etc. It's still hard for me to shake this mode of thinking.

I guess it's a good thing I'm not a biologist. (I did publish in the Journal of Theoretical Biology once, but we all know that knowledge of biology is not necessary to publish in that journal.)

Chris sent in this quote from Bill James:

"All research," he says, "begins with ignorance. The ability to focus on what it is that you do not know is critical to doing research. I'm absolutely convinced that none of us understands the world.

"I'm not a person that the world irritates, to quote Bill Buckley, but you turn on the radio and in any debate, you've got people who are convinced they know. Liberals, conservatives, Christians, Muslims, people who think Terry Francona is a genius, those that think he's an idiot. They're all convinced they've got this figured out.

"None of them has it figured out. We do not understand the world; the world is billions of times more complicated than our minds.

"You can make a useful contribution to a discussion if you can figure out specifically what it is you don't understand and try to work on it. If you try to start from the other end - 'I've got the world figured out and I'm going to explain it to everybody' - maybe there are a lot of people who succeed in doing that, but it doesn't work for me."

I agree. As Earl Weaver said, it's what you learn after you know it all that counts.

Two sides to the IRB story

| No Comments

1. This article by Carl Elliott reminded me why institutional review boards (IRBs) are needed.

2. This site (via Seth) reminds me of why IRBs can be a bad thing.

For me, IRBs are typically a waste of time, nothing more, but for others they are a (potential) protection against health hazards and exploitation, and for others they are a barrier to research progress.

I was lucky to see most of the talk that Josh Tenenbaum gave in the psychology department a couple weeks ago. He was talking about some experiments that he, Charles Kemp, and others have been doing to model people's reasoning about connectedness of concepts. For example, they give people a bunch of questions about animals (is a robin more like a sparrow than a lion is like a tiger, etc.), and then they use this to construct an implicit tree structure of how people view animals. (The actual experiments were interesting and much more sophisticated than simply asking about analogies; I'm just trying to give the basic idea.) Here's a link to some of this work.

My quick thought was that Tenenbaum, Kemp, et al. were using real statistics to model people's "folk statistics" (by which I mean the mental structures that people use to model the world). I have a general sense that folk statistical models are more typically treelike or even lexicographical, whereas reality (for social phenomena) is more typically approximately linear and additive. (I'm thinking here of Robyn Dawes's classic paper on the robust beauty of additive models, and similar work on clinical vs. statistical prediction.) Anyway, the method is interesting. I wondered whether, in the talk, Tenenbaum might have been slightly blurring the distinction between normative and descriptive, in that people might actually think in terms of discrete models, but actual social phenomena might be better modeled by continuous models. So, in that sense, even if people are doing approximate Bayesian inference in their brains, it's not quite the Bayesian inference I would do, because people are working with a particular set of discrete, even lexicographic, models, which are not what I suspect are good descriptions of most of the phenomena I study (although they might work for problems such as classifying ostriches, robins, platypuses, etc.).

Near the end of his talk, Tenenbaum did give an example where the true underlying structure was Euclidean rather than tree-like (it was a series of questions about the similarity of U.S. cities), and, indeed, there he could better model people's responses using an underlying two-dimensional model (roughly but not exactly corresponding to the latitude-longitude positions of the cities) than a tree model, which didn't fit so well.

I sent Tenenbaum my above comment about real and folk statistics, and he replied:

I'd expect that for either the real world or the mind's representations of the world, some domains would be better modeled in a more discrete way and others in a more continuous way. In some cases those will match up - I talked about these correspondences towards the end of the talk, not sure if you were still there - while in other cases they might not. It would be interesting to think about both kinds of errors: domains which our best scientific understanding suggests are fundamentally continuous while the naive mind treats them as more discrete, and domains which our best scientific understanding suggests are discrete while the naive mind treats them as more continuous. I expect both situations exist.

Also, the "naive mind" is quite an idealization here. The kind of mental representation that someone adopts, and in particular whether it's more continuous or discrete, is likely to vary with expertise, culture, and other experiential factors.

My reply:

I think the discrete/continuous distinction is a big one in statistics and not always recognized. Sometimes when people argue about Bayes/frequentist or parametric/nonparametric or whatever, I think the real issue is discrete/continuous. And I wouldn't be surprised if this is true in psychology (for example, in my sister's work on how children think about essentialism).

Tenenbaum replied to this with:

While the focus for most of my talk emphasized tree-structured representations, towards the end I talked about a broader perspective, looking at how people might use different forms of representations to make inferences about different domains. Even the trees have a continuous flavor to them, like phylogenetic trees in biology: edge length in the graph matters for how we define the prior over distributions of properties on objects.

I'll buy that.

On a less serious note . . .

This reminds me of all sorts of things from children's books, such as pictures of animals that include "chicken" and "bird" as separate and parallel categories, or stories in which talking cats and dogs go fishing and catch and eat real fish! (The most bizarre of all these, to me, are the Richard Scarry stories in which the sentient characters include a cat, a dog, and a worm, and they go fishing. My naive view of the "great chain of being" would put fish above worms, but I guess Scarry had a different view.)

This looks like it could be interesting.

Chewy food

| No Comments

This is interesting. As a bread-lover, though, I don't particularly enjoy hearing people tell me not to eat white flour. Also, I don't see the relevance of the tree-climbing crabs, but they do look cool:

coconut_crab.jpg

Aleks pointed me to this website:

Dream Recorder is the ideal companion of your nights, allowing you to understand better this third of our life spent in bed. Dream recollection, sleep hygiene, curiosity, you will find your own reasons for using this software of a new kind. Nights after nights, Dream Recorder keeps records of your sleep profiles. It provides statistics and give you the possibility to annotate your dream records with notes or keywords. . . .

slepProfileExtract.png

Dream Recorder uses the difference between successive reconstructed images for computing the quantity of motion (see image on the right). Quantity of motions are reflected by the colored bar graph. High peaks mean motions. Very low peaks are just in the detection noise base level. Dream periods are lit up by spotlights. Normal sleeps are represented by the dark blue shades. Deep sleeps have no lights nor shading. Night events are displayed under the timeline, here a dream feedback followed by a voice recording.

Seth would love this (I assume).

Columbia's International Affairs Building has fifteen floors and four elevators which have what seems to me to be really crappy software. While you're waiting for the elevators on the 4th floor (which happens to be street level; the campus is on a hill), there are readouts showing where each elevator is currently located and whether it is going up or down. Sometimes there will be several elevators coming down at once, and then the one that's closest will turn around at 5, leaving us waiting. When an elevator finally comes, everyone has to cram in. Other times, the elevators seem to be chasing each other around and are never where you want them to be. (I'm part of the problem myself, taking the elevator just three floors from 4 to the political science department on 7.)

But maybe the elevators are programmed the best they can be, given the pattern of demand. I don't know.

What I do think would be cool would be to use these 'vators as an engineering class project: the students could first get some information on the technical specifications of the elevators and their current software, then they could gather some data on the customers (here, I'm thinking of at least two surveys: first, simply going by the different floors at randomly-sampled times and counting how many people are waiting, for how long, and where they're going to; second, a survey asking people if they're satisfied with the elevator service and, if not, what bothers them), then they could create a computer simulation and play with various algorithms, and ultimately they could reprogram the elevators and perform an evaluation (comparing customer satisfaction, waiting time, etc., before and after).

Is this the sort of thing they do in the industrial engineering and operations research department? It seems like a group of students could learn a lot from this.
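As a starting point for the "computer simulation" step, here is a toy sketch of how students might compare two dispatch rules. Everything beyond the fifteen floors and four elevators is an assumption made up for illustration: calls arrive one at a time at uniformly random floors, each car parks wherever its last rider got off, waiting is measured in floors traveled before pickup, and the two policies (`nearest_car`, `random_car`) are hypothetical and say nothing about how the actual elevators are programmed.

```python
import random

FLOORS = 15  # floors in the building, per the post
CARS = 4     # elevators, per the post


def nearest_car(positions, call_floor, rng):
    """Hypothetical policy: send the car currently closest to the call."""
    return min(range(len(positions)), key=lambda i: abs(positions[i] - call_floor))


def random_car(positions, call_floor, rng):
    """Hypothetical baseline: send a car chosen at random."""
    return rng.randrange(len(positions))


def mean_wait(policy, n_calls=20000, seed=1):
    """Average number of floors a car travels to reach a caller under a policy."""
    traffic = random.Random(seed)         # the same stream of calls for every policy
    policy_rng = random.Random(seed + 1)  # randomness used only inside the policy
    positions = [4] * CARS                # assume all cars start at street level
    total = 0
    for _ in range(n_calls):
        call_floor = traffic.randint(1, FLOORS)
        destination = traffic.randint(1, FLOORS)
        car = policy(positions, call_floor, policy_rng)
        total += abs(positions[car] - call_floor)  # floors traveled before pickup
        positions[car] = destination               # car parks where the rider exits
    return total / n_calls


for policy in (nearest_car, random_car):
    print(policy.__name__, round(mean_wait(policy), 2))
```

A real class project would replace the uniform traffic with the survey data on who is waiting where, and would need to handle simultaneous calls and cars already in motion, which is where the interesting algorithmic choices come in.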

More on junk research

| 3 Comments

Hans van Maanem writes about this manuscript here by Michael Foster. Hans writes:

I [Hans] wrote about the paper when it came out -- not as thorough, I am afraid, but enough to warn readers against this kind of science. Maybe your colleagues are heartened by the fact that not everybody took it at face value! Could you please forward them my column from May 2004 in De Volkskrant (and maybe translate it...)?

In response to this, Carrie and Michael had an interesting exchange on possible junk science.

Lee Sigelman writes,

In a brief article (abstract here) in the current issue of Current Directions in Psychological Science, Dan Ariely and Michael Norton analyze the wide gap currently separating psychologically- and economically-based experimental research — a gap clearly perceptible in experimental work within political science, a heavy borrower from both psychology and economics.

“Psychologists have not traditionally been interested in the efficiencies and design of markets,” Ariely and Norton note, “while experimental economists have not customarily focused on emotion, memory, or implicit cognition.”

In addition to studying different topics, the two fields differ in their research methods as well:

It’s not just that psychologists enjoy lying to people while economists enjoy paying them. To find out what they want to find out, psychologists have to give their experimental subjects a “cover story” and transport them into a particular situation, for which purposes deception is often necessary. By contrast, economists want to know about experimental subjects’ ability to make informed decisions, and for that purpose deception would be counterproductive. At the same time, economists want to motivate their subjects to behave “normally,” so they explicitly define incentives to enable subjects to evaluate the costs and benefits of a particular course of action.

Finally, Sigelman quotes Ariely and Norton, who write,

Experimental economists might shift from asking whether deception is good or bad — a moral question — to exploring whether deception helps or harms social scientists’ ability to understand human behavior. Psychologists’ aversion to incentives, on the other hand, might be addressed by taking a broader view of what experimental economists are trying to accomplish with them: making people care about their behavior as much in the lab as they do in the real world.

Aversion to incentives?

I just have two things to add.

1. I applaud the call for researchers to become more aware of what is being done in other fields, but at some point, different people have different areas of expertise. (See my remarks here on why we shouldn't be disturbed that economists don't spend more time studying romance and here on different views of rationality.)

2. I question whether psychologists really have an "aversion to incentives." Giving incentives to research participants is one strategy out of many. It's natural for economists to privilege financial incentives--that's what they study--but maybe not so natural for others and maybe not always so relevant to the "real world." Many important real-world phenomena--such as political participation--have little or no financial incentives at all!

Stash it so I don't forget

| No Comments

Chris Paulse writes, "By accident I [Chris] discovered a book that has part of its focus on educational psychology. It's called Handbook of Competence and Motivation, Elliott and Dweck eds. A few recent articles have appeared in the NYT that seem to be sourced from material like this (one on self-regulation profiling the work of Roy Baumeister, and another on learning from mistakes that quotes Carol Dweck). The Dweck chapter on self-concept is a fun read. I'd love to see a mixture model developed from survey data for the evaluation anxiety idea. Great for teachers."

Martin James writes,

From the Nelson and Simmons paper:

Across more than 90 years of professional baseball, batters whose names began with K struck out at a higher rate (in 18.8% of their plate appearances) than the remaining batters (17.2%), . . . players with the initial K struck out more often than other players even when we controlled for the average year in which each athlete played (p < .015). In fact, when we controlled for average year of play (and excluded initials associated with fewer than 5 Major League players—e.g., U as a first initial), K was both the first initial and the last initial associated with the highest strikeout rate. Furthermore, ethnic confounds are unlikely to account for the effect, as an analysis controlling for whether players were American or foreign born also showed that batters with the initial K were reliably more likely to strike out than other players were.

Their explanation is psychological:

Despite a universal desire to avoid striking out, players whose first or last names began with the letter K struck out more often than other players. For players with this initial, the explicitly negative performance outcome may feel implicitly less aversive. Even Karl ‘‘Koley’’ Kolseth would find a strikeout aversive, but he might find it a little less aversive than players who do not share his initials, and therefore he might be less motivated to avoid striking out.

This probably explains Dave Kingman pretty well. Not to mention Vince Koleman. I don't know if I believe this, or, maybe more to the point, what it would take for me to believe this. Somehow it's easier for me to accept the positive aspects of liking one's own name (dentists named Dennis, lawyers named Laura, etc.) than these sorts of negative aspects. Logically, they do go together, I guess. There's lots more of this in the Nelson and Simmons paper.

An interesting look at topology

| No Comments

Though more mathematics than statistics, I thought this would be relevant on the heels of the entry about the Krampf science experiment videos.

The blog 3 quarks daily recently posted a video showing how to turn a sphere inside out. See the post here.

Like the poster on that blog, I have only an amateur's interest in topology. I found this a good explanation of a topic that I imagine loses many people in its discussion. It was one of those (rare?) Internet videos after which I felt that I had truly gained some knowledge.

Gait analysis and evo psych

| 4 Comments

Dave Garbutt writes,

Perhaps you will find this interesting. It is a deconstruction rather than the original, but how to analyse the data might be an interesting challenge worthy of your readers....

I don't have anything to add about this, except that there's a long tradition of overinterpreting data on menstrual cycles. I remember an example from an old and highly-recommended statistics textbook (Say it with Figures, by Hans Zeisel) that I used in a class once: he had an interesting example along with a graph and story, but then when I took a look at the original article being cited, I couldn't find anything like what was being claimed in the textbook. (This case is a little different, because here it's the scientific article itself that's being called into question, but still it reminded me.)

The theoretical statistician uses x, the applied statistician uses y (because we reserve x for predictors).

What makes a face attractive?

| 8 Comments

Susan sent me this link and asked for my thoughts about some related question which, unfortunately, I've forgotten. That's what happens when you wait over a month to answer an email. Anyway, the website is cute, much cuter than ours. We clearly have a lot of work to do.

The work looks interesting. I wonder about time trends. It's my impression that characters in old TV shows were often pretty ugly (for example, consider the guy in Mr. Ed), but now they all seem pretty attractive. But maybe some of that is technology--cameras are better so they don't have to slap on all the greasepaint or whatever.

Half-lives of verbs

| 2 Comments

Richard Morey sends along this link. It looks pretty cool; the only thing that bugs me is that they keep using the word "mathematical" when they really mean "statistical."

John Hull sends along this article from Chance News. From The Economist:

In this week's Physical Review Letters, Yoshiharu Yamamoto of the University of Tokyo and his colleagues explain how the movements of people suffering from clinical depression can be described by a power law—and how this law is so different from that of healthy people that it looks truly diagnostic.

Further discussion is in the Chance News article. But my question is: why is this in Physical Review Letters? Shouldn't it be in a journal of medicine, psychiatry, or psychology?

From the British Psychological Society Research Digest:

Children with Tourette's syndrome, the motor disorder characterised by involuntary tics, are more skilled than healthy control children at processing certain forms of grammar. That's according to Matthew Walenski at the Brain and Behaviour Lab at Georgetown University and colleagues, writing in Neuropsychologia. . . .

The children with Tourette's responded more quickly than the controls on those aspects of the tasks that were considered to depend on procedural memory – such as when producing past tenses of regular verbs and naming objects that can be manipulated, but they responded with similar speed to the controls when performance depended on declarative memory – such as when giving the past tense of irregular verbs or naming non-manipulable objects.

Procedural memory is rooted in the frontal/basal ganglia circuits of the brain and these areas are known to be structurally abnormal in people with Tourette's. The researchers said it was likely this association explained the superior performance of the children with Tourette's.

Past studies involving children and adults with Tourette's have tended to focus on their involuntary verbal tics, rather than investigating their actual language abilities. . . . The new findings follow a study published last year that showed people with Tourette's have enhanced cognitive control relative to healthy participants, as shown by their ability to switch task sets without the usual reaction time cost.

This makes sense to me.

Pink

| 1 Comment

This (from Ben Goldacre) is pretty funny.

uk.JPG

As Goldacre and his commenters discuss, the actual research might be (ultimately) useful and increase our scientific understanding, but the interpretation is way over the top.

P.S. More here.

Cool science experiment videos

| 4 Comments

From Jason Kottke:

An illustration of how insanely effective water is at absorbing heat: you can hold a water balloon over a candle without popping it. The rest of Robert Krampf's videos are worth a look as well.

We need some cool statistics videos too.

Ralph Blair sent this in. It's so horrible that I have to put it in the continuation part of the blog entry. I recommend you all stop reading right here.

Stop . . . It's not too late!!!!!!!!!!!

In this discussion of Allegra Goodman's novel Intuition, Barry wrote, "brilliant people are at least as capable of being dishonest as ordinary people." The novel is loosely based on some scientific fraud scandals from the 1980s, and one of its central characters, a lab director, is portrayed as brilliant and a master of details who nonetheless makes a mistake by brushing aside evidence of fraud by a postdoc in her lab. One might describe the lab director's behavior as "soft cheating" since, given the context of the novel, she had to have been deluding herself by ignoring the clear evidence of a problem.

Anyway, the question here is: are brilliant scientists at least as likely to cheat? I have no systematic data on this and am not sure how to get this information. One approach would be to randomly sample scientists, index them by some objective measure of "brilliance" (even something like asking their colleagues to rate their brilliance on a 1-10 scale and then taking averages would probably work), then do a thorough audit of their work to look for fraud, and then regress Pr(fraud) on brilliance. This would work if the prevalence of cheating were high enough. Another approach would be to do a case-control study of cheaters and non-cheaters, but the selection issues would seem to be huge here, since you'd be only counting the cheaters who got caught. Data might also be available within colleges on the GPAs and SAT scores of college students who were punished for cheating; we could compare these to the scores of the general population of students. And there might be useful survey data of students, asking questions like "do you cheat" and "what's your SAT" or whatever. I guess there might even be a survey of scientists, but it seems harder to imagine they'd admit to cheating.
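Here is a rough sketch of why the random-sample approach only works "if the prevalence of cheating were high enough." The sample size of 500 and the three prevalence values are made-up numbers for illustration, not estimates of anything, and the calculation assumes the audit catches every case of fraud in the sample.

```python
def expected_cases(n, prevalence):
    """Expected number of fraud cases in a random sample of n scientists."""
    return n * prevalence


def prob_no_cases(n, prevalence):
    """Chance that a random sample of n scientists contains no fraud at all."""
    return (1 - prevalence) ** n


n = 500  # hypothetical number of scientists audited
for prevalence in (0.001, 0.01, 0.1):  # hypothetical fraud rates
    print(
        f"prevalence {prevalence:>5}: "
        f"expected cases = {expected_cases(n, prevalence):5.1f}, "
        f"Pr(no cases) = {prob_no_cases(n, prevalence):.3f}"
    )
```

With only a handful of fraud cases in the sample there is almost nothing for the regression on brilliance to work with, which is what pushes toward the case-control design despite its selection problems.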

Intuition, by Allegra Goodman

| 8 Comments

I read this novel, which is loosely based on various scientific fraud scandals from the 1980s. It was readable, sort of like John Updike in the general themes and similar to Scott Turow in writing style and characterization. (Everything fits into place a bit too cleanly, with each character given some small quirk, a sort of hyper-realism that is just a bit too reasonable to be quite convincing. But, as with Turow, this style actually helps in keeping the reader focused on the ideas of the story rather than on individual characters.) Spoilers below . . .

Confidence building

| 1 Comment

Confidence-building is an under-researched area in statistics. Some pieces of confidence-building:

This by Freeman Dyson was pretty cool. Not the stuff about how open-source biotechnology is going to change the world--maybe he's right, maybe he's wrong, but it comes across to me as generic science writing. The cool stuff was his discussion of the ideas of Carl Woese (whom I'd never previously heard of):

[Woese asks] When did Darwinian evolution begin? By Darwinian evolution he means evolution as Darwin understood it, based on the competition for survival of noninterbreeding species. He presents evidence that Darwinian evolution does not go back to the beginning of life. When we compare genomes of ancient lineages of living creatures, we find evidence of numerous transfers of genetic information from one lineage to another. In early times, horizontal gene transfer, the sharing of genes between unrelated species, was prevalent. It becomes more prevalent the further back you go in time. . . .

In his "New Biology" article, he is postulating a golden age of pre-Darwinian life, when horizontal gene transfer was universal and separate species did not yet exist. Life was then a community of cells of various kinds, sharing their genetic information so that clever chemical tricks and catalytic processes invented by one creature could be inherited by all of them. Evolution was a communal affair, the whole community advancing in metabolic and reproductive efficiency as the genes of the most efficient cells were shared. Evolution could be rapid, as new chemical devices could be evolved simultaneously by cells of different kinds working in parallel and then reassembled in a single cell by horizontal gene transfer.

But then, one evil day, a cell resembling a primitive bacterium happened to find itself one jump ahead of its neighbors in efficiency. That cell, anticipating Bill Gates by three billion years, separated itself from the community and refused to share. Its offspring became the first species of bacteria—and the first species of any kind—reserving their intellectual property for their own private use. With their superior efficiency, the bacteria continued to prosper and to evolve separately, while the rest of the community continued its communal life. Some millions of years later . . . nothing was left of the community and all life was divided into species. The Darwinian interlude had begun.

Now this is cool--the idea that speciation itself is a sort of prisoner's dilemma, or killer app, so that once a species is formed, it can preserve its genetic identity and eventually outlast the faster-evolving but less walled-off organisms around it. Speciation has always been a mystifying aspect of evolution to me, so it's interesting to see this (possibly false, but interesting) theory.

Unfair mockery

| 2 Comments

I heard from David Feitler, who had sent this. He writes:

John sent in this interesting discussion of conditional probability calculations in court. Here's the article:

Seth tested his balance every day, sometimes when eating flaxseed oil and sometimes when eating olive oil, and found the following:

flaxseed.jpg

This is a pretty graph, and shows that Seth's balance improved when he ate flaxseed oil and got worse with the olive oil. He conjectures:

A possible explanation is that when the concentration of omega-3 in the blood is low, the omega-3 in cell membranes slowly “evaporates” into the blood. When a cell’s membranes lose omega-3, it doesn’t work as well.

But . . .

As a statistician, my first thought was some sort of measurement bias: Seth knows when he was taking olive oil and when he was taking flaxseed oil, and staying balanced is a tricky enough task that I could well imagine that the results could be affected by his expectations.

Flying blind

I'd be more convinced by a blinded experiment. This is tricky with a self-experiment but it could be done. For example:

1. Get 50 identical vials and pour olive oil into 25 of them and flaxseed oil into the other 25. Label them (e.g., "o" and "f"), then cover up the labels with removable stickers.

2. Mix up the vials in a bag (this is sometimes called "physical randomization" in the sampling literature), then use one vial per day. After use, place them on a shelf in order. Each day, measure your balance and whatever else you want to record.

3. When the experiment is over, peel off the stickers and identify which oil was eaten on which day.

4. If the two oils can be told apart by smell, clip your nose (this might sound weird, but actually Seth was already doing this). If they taste different, mix with some strong bitter flavor (this might mess up Seth's weight-loss experiment but should be OK for the balance study). If they look different, add food coloring or just use opaque bottles and don't look inside before drinking.

This simple experiment, with complete randomization, might not capture the time trends Seth is looking for. It would be simple enough to alter the experiment, for example by replacing the vials with larger containers and setting the unit of randomization to be the ten-day period rather than the day. You could even do something trickier, maybe with the assistance of a friend, to set up a pattern with long strings of o's and f's without knowing exactly when the switches will occur.
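The two randomization schemes described above can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of six ten-day periods, and the use of a seed are my assumptions, not part of Seth's protocol. In practice a friend would run it and keep the schedule hidden until the experiment is over.

```python
import random


def daily_assignment(n_per_oil=25, seed=None):
    """Complete randomization: shuffle equal numbers of 'o' and 'f' vials, one per day."""
    rng = random.Random(seed)
    vials = ["o"] * n_per_oil + ["f"] * n_per_oil
    rng.shuffle(vials)
    return vials


def period_assignment(n_periods=6, days_per_period=10, seed=None):
    """Randomize by period instead of by day: every day in a period gets the same oil.

    n_periods=6 is a hypothetical choice; it must be even so the oils are balanced.
    """
    if n_periods % 2:
        raise ValueError("n_periods must be even for a balanced design")
    rng = random.Random(seed)
    periods = ["o", "f"] * (n_periods // 2)
    rng.shuffle(periods)
    return [oil for oil in periods for _ in range(days_per_period)]


if __name__ == "__main__":
    # The person taking the oil should not look at this output until the end.
    print("".join(daily_assignment(seed=2007)))
    print("".join(period_assignment(seed=2007)))
```

Randomizing by period keeps the long runs of a single oil that would be needed to see time trends, at the cost of far fewer independent units (six periods rather than fifty days in this sketch).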

Why Seth's existing experiment is a good thing: I'm not slamming unblinded studies

I hope Seth (or one of his correspondents) does this randomized experiment. In the meantime, Seth's results provide a potentially important contribution by motivating new hypotheses. The unblinded experiment was so easy to do (within the context of Seth's earlier experiments), and placing a requirement such as blinding might have increased the required effort to the extent that Seth might not have gotten around to doing it.

Maybe Seth could make blinding (where possible) a routine part of his future experiments, though. Just as he's trained himself to perform disciplined self-experiments with precise and regular measurements (something that I never get around to doing when trying out new teaching methods, for example), maybe he could take the next step with blinding.

Here's an interesting problem involving the time interval between cougar "kills"...meaning cougars killing prey, not cougars being killed. (By the way, "cougar" is synonymous with "mountain lion", "catamount", and "puma". Same animal.) The data I'll discuss below were collected by Polly Buotte and other researchers guided by Toni Ruth of the Selway Institute, funded by the Hornocker Wildlife Institute and Wildlife Conservation Society.

Cougars in and around Yellowstone National Park are monitored in two ways. Researchers try to put a radio collar on every adult cougar; there are typically about a dozen adult cougars in the park.

Most of the collars used, now and historically, are old-style radiotelemetry collars. These emit a periodic signal that can be used, through triangulation, to determine the approximate location of the animal (spatial error less than 100m). More recently, some of the collars are GPS collars that report the exact location of the animal every three hours. The GPS collars, a new technology, are expensive, relatively short-lived, and somewhat failure-prone.

One of the issues of interest to researchers is the statistical distribution of intervals between kills, called the "inter-kill interval" or IKI. A specific question of interest is the extent to which the IKI distribution has changed due to the reintroduction of wolves to Yellowstone. Some change might be expected because (1) wolves sometimes steal a cougar's kill before the cougar is done with it, so the cougar might have to kill more frequently to make up for the lost meat, and (2) prey availability might change, as prey change their behavior to try to avoid areas favored by wolves, thus possibly changing the types of prey available to cougars or their density in cougar habitat.

In addition to the statistical distribution of IKI overall and its change since the reintroduction of wolves, a related question of interest is how the IKI differs for different "social classes" of cougars, where "social class" distinguishes adult female, adult male, or maternal female (i.e. female with cubs).

Based on the radio collar data, 121 IKIs were determined for 11 cougars over 8 years. The following figure shows the IKI data for the three social classes, as determined by the two different methods (GPS and "ground").

IKIhists.png

With the help of the radio collars, researchers have tried to characterize every cougar kill made by certain cougars during certain time periods. "Characterizing" the kill means determining the date, time, and location of the kill and the type of animal killed: a large bighorn sheep, a young elk, and so on. For the standard telemetry collars, this involves using the collar to track the cougar's movements; a researcher essentially tracks the cougar every day (without disturbing its behavior), searches locations the day after the cat leaves, and locates the carcass from each kill. This method, which we refer to below as the "ground" method, is very labor-intensive. By contrast, with the GPS collars, the researcher compiles a list of the locations where the cougar spent a substantial amount of time, and visits each of those locations to characterize the kill. (Cougars usually stay on or near a kill for at least 3 days, unless driven off, and are rarely stationary for that long unless they have made a kill.) This method (the "GPS" method) is much less time-intensive because the researcher can proceed from kill location to kill location rather than following the cougar.

In Snow White it was the magical mirror that answered the question "who's the fairest of them all?" Now Australian researchers have created software to answer this question. They extracted 13 features and used C4.5 as the classification method (more features below). (Details can be found in: Assessing facial beauty through proportion analysis by image processing and supervised learning.)
FacialRatio.png

With that in hand, it may be natural to wonder who's the most beautiful of them all. A shocking answer may be found in research done at the universities of Regensburg and Rostock in Germany, which ran a large research project on 'facial attractiveness'.

A remarkable result of our research project is that faces which have been rated as highly attractive do not exist in reality. This became particularly obvious when test subjects (independently of their sex!) favored women with facial shapes of about 14 year old girls. There is no such woman existing in reality! They are artificial products - results of modern computer technology.
Thus, sad as it may be, your ideal beauty may not be in this world. So, going back to good old Snow White, if the magical mirror were asked the question today, it might answer: "You're the fairest where you are, but in the virtual world, well, let's not go into that..."

I read this entry on a study of the correlation between music and personality.

A series of 6 studies investigated lay beliefs about music, the structure underlying music preferences, and the links between music preferences and personality. The data indicated that people consider music an important aspect of their lives and listening to music an activity they engaged in frequently. Using multiple samples, methods, and geographic regions, analyses of the music preferences of over 3,500 individuals converged to reveal 4 music-preference dimensions: Reflective and Complex, Intense and Rebellious, Upbeat and Conventional, and Energetic and Rhythmic. Preferences for these music dimensions were related to a wide array of personality dimensions (e.g., Openness), self-views (e.g., political orientation), and cognitive abilities (e.g., verbal IQ).
music.png
This study is much like the music genome project, although it adds a twist by relating music preferences to personality, which makes it more interesting. I'm not sure if the music genome project releases its data to the public. But if it does, this would be great data for strengthening the generalization part of the study, since they have the age, gender, and postal code of the users. As always, though, they will have to deal with reliability issues. I wonder if anyone in the music genome project has thought about doing the personality study; the result may be interesting.

Jesus tomb and Bayes rule

| 3 Comments

Two weeks ago there was a press conference with James Cameron (the creator of Titanic, the highest-grossing, most-nominated and most Oscar-winning movie to date). Credentials like those tend to get a documentary taken seriously when it claims to have discovered the tomb of Jesus. The documentary was a serious success - James Tabor (one of the advisors) reports 4.1 million viewers.

One of the key pieces of evidence was a statistical calculation that those names must have belonged to Jesus' family - you can read more about it from the source, Andrey Feuerverger. A quantitative investigation into those claims has appeared in Scientific American (with some of my involvement) and later in the Wall Street Journal column and blog. These articles stress the importance of the assumptions behind the calculations. But there is more interesting detail to what is going on.

First, there is a lot of confusion about "those names are frequent". Of course, Feuerverger did take that into consideration. The real problem is in the interaction between the names on the coffins and the contents of the coffins, which is a simple exercise in probability calculus. Let me thus walk through the calculations [PDF].

It's not a problem to think that Jesus had a tomb: indeed, he was said to be buried. The authors assumed the probability to be 1/1,000 - meaning that one of the 1,000 tombs was surely Jesus Christ's. This is fine; we denote it as P(Tomb=Jesus) = 1/1,000. It follows that P(Tomb=Random) = 999/1,000. It is also not a problem to assume the probability of that particular choice of names out of all possible ones to be 1/600,000. We denote it as P(Names|Tomb=Random) = 1/600,000.

P(Names|Tomb=Jesus)=1.0 is a major can of worms - we need this assumption to compute the probability P(Tomb=Random|Names). Why? The probability is computed through the Bayes rule:

P(Tomb=Random|Names) = P(Names|Tomb=Random) P(Tomb=Random) / (P(Names|Tomb=Random) P(Tomb=Random) + P(Names|Tomb=Jesus) P(Tomb=Jesus))

Here P(Names|Tomb=Jesus) = 1, P(Tomb=Jesus) = 1/1,000, P(Names|Tomb=Random) = 1/600,000, P(Tomb=Random) = 999/1,000. P(Tomb=Random|Names) corresponds to odds of about 1 in 600, but only if you agree with the other numbers.
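As a rough check on the arithmetic, here is a minimal Python sketch of the Bayes rule calculation above, using exact fractions. The function name is hypothetical, and the alternative values tried for P(Names|Tomb=Jesus) are arbitrary illustrations of how much the answer depends on that one input; they are not estimates from any source.

```python
from fractions import Fraction


def posterior_random(p_names_given_random, p_names_given_jesus, p_jesus):
    """P(Tomb=Random | Names) from Bayes rule with two hypotheses: Random or Jesus."""
    p_random = 1 - p_jesus
    numerator = p_names_given_random * p_random
    return numerator / (numerator + p_names_given_jesus * p_jesus)


p_names_random = Fraction(1, 600_000)  # P(Names|Tomb=Random), from the text
p_jesus = Fraction(1, 1000)            # P(Tomb=Jesus), from the text

# The assumption used in the text: P(Names|Tomb=Jesus) = 1.
base = posterior_random(p_names_random, Fraction(1), p_jesus)
print(f"P(Tomb=Random|Names) = {float(base):.5f}, about 1 in {float(1 / base):.0f}")

# Arbitrary alternative values, only to show the sensitivity to this assumption.
for p in (Fraction(1, 10), Fraction(1, 1000), Fraction(0)):
    post = posterior_random(p_names_random, p, p_jesus)
    print(f"P(Names|Tomb=Jesus) = {float(p):g} gives P(Tomb=Random|Names) = {float(post):.5f}")
```

With P(Names|Tomb=Jesus) set to zero the posterior probability of a random tomb is exactly 1, regardless of the other inputs.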

First, to assume that the bones inside the "Jesus" ossuary are Jesus Christ's would be inconsistent with the hard-line interpretation of the Ascension - that Jesus Christ physically ascended to heaven, so his bones aren't there. Those who hold this view would want the probability to be zero based on the properties of the tomb. Note that not all Christians believe in a physical Ascension, but those who do would assign probability 0 to Jesus having a full ossuary (I don't know if the ossuary was full or not; I'm assuming it was). Secondly, hard-line Christians are quite incensed by the implication that Jesus would have a wife and a child, as is the case with this particular tomb. They would claim that the probability is zero, because the idea of a married Jesus with a wife and a child would be contrary to their understanding of the Bible.

Feuerverger went further than I would have gone, so my guess in Scientific American was wrong. The probability of 1/600,000 is reasonable and safe: it's the probability that a random tomb would carry those names. The number of 1,000 is also safe: it's the number of tombs. But using the Bayes rule to come up with the probability that the tomb is a random find is no longer safe. Why? Because it is based on P(Names|Tomb=Jesus) - and with that you're opening a Pandora's box of trouble.

As for other lessons learned, when you talk with a journalist, don't explain things so that *he* would understand them; explain them so that *everyone* would understand, because you're going to be quoted and you don't want to sound as eggheaded as I have. Also, try to keep the answer within a sentence or two. In the past I've had the chance to edit the final version, but you can't count on it.

In a recent discussion at the Machine Learning (Theory) blog, the websites Faculty of 1000 (Biology) and Faculty of 1000 (Medicine) came up. They work as follows: users submit papers they like, and there is space for supporting and dissenting comments centered around the paper. This is peer review as it should be done, not the opaque and time-consuming system currently in place with the journals.

As an example of a discussion, consider one of the highest-rated papers, Why most published research findings are false. Do examine the negative reviews, or dissents, along with the response of the author.

What's still missing from the Internet are instruments of identity, trust and remuneration, but they should be up and running in a few years. As for trust, needed for guaranteeing high-quality information, Faculty of 1000 does institute "section heads" and "faculty members" for different topic areas. As for remuneration, needed to keep the whole thing running, there seems to be some sort of subscription model with the Faculty of 1000 that might lock people out of the system unless they are affiliated with a major institution. I wish this were integrated with the Public Library of Science open-access model.

Chris Paulse writes,

I came across this video on making a taser from a disposable camera (following the link from Digg, from Buzzfeed, from Stay Free). I haven't tried it out yet, but it reminded me of a story that I'll tell sometime about my friend and diet author Seth.

Diversity in learning

| 3 Comments

Once I figure out how to do it, I'll be reorganizing the list of links and adding Seth's blog, but, in the meantime, here's a fascinating article on diversity in learning, where Seth describes a class assignment where he let students do whatever they wanted:

I [Seth] taught a class called Psychology and the Real World where the off-campus work essentially was the course. Students could do any off-campus work related to psychology – at least 60 hours of it during the 15-week semester. In addition, we met weekly for discussions and the students wrote three short papers. Eight students signed up. Their off-campus work was learning how to be a mediator, developing a television show about happiness, working at a shelter for battered women, working at a nursing home, talking with patients in a mental hospital for the criminally insane, taking care of two-year-old twins, tutoring high-school students, and making bereavement support calls. It was time well-spent.

I had a few thoughts:

1. This sounded a lot better than the class on left-handedness that Seth and I taught 12 years ago. The students liked the class OK but they certainly didn't do anything substantial on their own. But, even then, I recall Seth telling me that he thought a big problem with college courses, as they were usually configured, is that they have the goal of making the student as much like the instructor (or the textbook) as possible. It's a rare class where students' differing experiences and talents are appreciated. (One rare positive example among my own classes is my seminar with Shigeo, where it really works well that different students have different knowledge bases about political science. But in other classes it's been hard to make use of students' diversity.)

2. It's funny that only 8 students signed up, out of the 20,000 undergraduates at UC Berkeley. Setting aside selection issues, it sounds like at least a few more students would've benefited. But I have to say that it's hard to get good attendance in a non-required course. I recall that Mike Jordan said that he gets an enrollment of 125 in his Bayesian statistics course at Berkeley, which seems pretty impressive--I certainly don't get 125 in my classes here--but maybe it's required.

3. I somehow expect that this course wouldn't work so well if I--or almost anyone else--were teaching it. Part of this is that Seth knows a lot about psychology, but it's also something about working with students. When I've tried to have students do open-ended projects, they've almost always done something pretty uninteresting (see Section 11.4.3 in Teaching Statistics: A Bag of Tricks for more on this). I remember discussing this with Seth several years ago. The conversation went something like this:

Me: Students generally pick uninteresting topics, skimp on the real work of data collection, and avoid any kind of random sampling or even systematic design, so I'm thinking I have to give them more structure, a better list of project topics, maybe assign them to projects.

Seth: Try giving them less structure and see what they come up with.

It seems that Seth's suggestion has worked--for him. I'll give it a try. But I still think I'll have to check their ideas and rule out the worst, such as comparisons of GPAs of athletes and nonathletes, surveys of students about hours studying and drinking, etc etc. Actually, I really don't know what I should do about this.

4. Seth's article also has a bunch of hypotheses about evolution of various social behaviors. I neither believe nor disbelieve these things--I just don't know how to evaluate such things--but I think of them in a utilitarian sense as useful in helping Seth formulate hypotheses for his self-experimentation. Also, I like the Jane Jacobs references because I am also a big fan of her work (although maybe not all of it).
