November 17, 2008

"Why can't I just install an exhaust fan in the basement?'

Someone went to our radon site and asked:

I'm thinking of mitigating my basement radon of 7.75 pCi/L. It's a partial slab with a crawl space. Why can't I just install an exhaust fan in the basement, instead of PVC piping, drilling into the slab and sucking out the air underneath the membrane in the crawl space, etc.? I have a high-efficiency furnace with a fresh air inlet that wouldn't create negative pressure.

Phil's reply:

You didn't say where or how you made your radon measurement. If your measurement was 7.75 pCi/L in the basement, and you don't use the basement as living space, then you might not want to do anything, especially if your basement isn't very well connected to the rest of the house. Before making any costly decisions, we recommend a long-term measurement in the living area of your house.

http://www.cdphe.state.co.us/hm/rad/radon/radonlists.htm has information about Colorado radon mitigators. If you do decide to mitigate, I suggest you speak with one or two of them before doing anything on your own.


Posted by Andrew at 7:46 PM | Comments (1) | TrackBack

August 29, 2008

Battle of the biostatisticians

Sanjay Kaul points to this interesting news article by Matthew Herper about statistical controversies involving the drug Vytorin:

A 1,873-patient study called SEAS found there were 50% more cancers among patients who took Vytorin than those who received placebo. Researchers involved in the study put together a hastily organized, company-funded press conference on July 21 to release the data.

There, Richard Peto, an Oxford University statistician, quieted the cancer scare before it really began. He pooled data from two much larger ongoing studies of Vytorin and said they showed that the cancer risk was a statistical fluke. He called the contention that Vytorin could cause cancer "bizarre." . . .

Forbes surveyed 16 experts about the SEAS results. None were entirely convinced of a link to cancer, but eight thought Peto had gone too far in completely dismissing any cancer risk. Ten thought there was at least some possibility that Vytorin increases the risk of death for patients who have cancer. . . .

Sander Greenland, a statistician at the University of California, Los Angeles, says the data are "ambiguous" . . . The data "are not definitive at all," says James Stein, a cardiologist at the University of Wisconsin, Madison. He says Vytorin should remain "a third-line drug" until more data can be collected, even though the drug is less likely to make patients complain of symptoms like aching. Doctors, he says, are admonished to "first do no harm," not "first do what is easy." Peto responds: "These three trials do not provide any credible evidence of any side effect."

Peto serves as statistician for one Vytorin trial, but does not receive direct money from Merck and Schering-Plough. Here's his basis for dismissing the cancer risk: In SEAS, there were 102 cases of cancer for patients on Vytorin, compared with 67 on placebo. Peto got permission to unblind data from two ongoing studies involving 20,000 patients: SHARP, testing Vytorin vs. placebo and Zocor in kidney patients, and IMPROVE-IT, which compares Vytorin and Zocor in patients at risk for heart attacks. There were 313 cases of cancer for patients taking Vytorin, compared with 326 cancer cases for people taking Zocor or placebo. . . .

The numbers work out differently when one looks at cancer deaths, as opposed to just cancer. In SEAS, 39 Vytorin patients died from cancer, compared with 23 on placebo. In the two larger studies, 97 patients getting Vytorin died, compared with 72 getting Zocor or placebo. The result, Peto says, is close to statistically significant; the odds are about six in 100 that it occurred by chance.

But unlikely things do occur by chance sometimes--that's how people get struck by lightning or win the lottery. Peto points out that hundreds of drugs are being studied in thousands of clinical trials, and sometimes a study will show a drug causes cancer or extends survival just by chance. The hypothesis being tested was that Zetia increases the risk of cancer, not that it increases the risk of cancer deaths. "These continually changing hypotheses are a misuse of statistics," Peto says.

UCLA's Greenland disagrees. He thinks that it is entirely proper to look at the issue of cancer deaths, and he also thinks it's okay to pool all the data from all three trials, an approach that yields a highly statistically significant result. . . .

Donald Berry, head of the division of quantitative sciences at M.D. Anderson Cancer Center, says he finds much of Peto's analysis convincing. But he says it's not impossible that a drug could make a whole swath of cancers worse, and that possibility may not have been ruled out yet.

"My very clear bent is in Peto's direction," says Berry, "but I do think one wants to keep looking at this."

These are some interesting issues. I have just a few comments:

1. It's funny that 6 experts surveyed did not think there was "at least some possibility that Vytorin increases the risk of death for patients who have cancer." There's gotta be some possibility, right?

2. It's funny that a p-value of .06 is likened to being struck by lightning or winning the lottery. 6% just isn't so rare as all that. (A rough numerical check of these figures appears below.)

3. At some point the discussion should move beyond whether a certain result is statistically significant, and move to an assessment of the costs and benefits of different courses of action.
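Here's the rough check promised in item 2, just to see where Peto's "six in 100" and Greenland's pooled result come from. It assumes roughly equal-sized treatment and comparison arms, so that under the null hypothesis each cancer death is equally likely to land in either arm; the published analyses surely did something more careful with person-years and follow-up, so treat this only as a sanity check (a sketch in Python, using scipy).

# Back-of-the-envelope check of the cancer-death counts quoted above, assuming
# roughly equal-sized arms, so that under the null each cancer death is equally
# likely to fall in either arm.
from scipy.stats import binomtest

def split_pvalue(treated, control):
    """Two-sided test of a 50/50 split of events between the two arms."""
    return binomtest(treated, treated + control, 0.5).pvalue

print(split_pvalue(97, 72))            # SHARP + IMPROVE-IT cancer deaths: close to Peto's "six in 100"
print(split_pvalue(39 + 97, 23 + 72))  # pooling SEAS as well (roughly Greenland's approach): much smaller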

Posted by Andrew at 12:23 AM | Comments (10) | TrackBack

August 22, 2008

Friday the 13th, Part 2

Juli linked to this study about Friday the 13th not being more unlucky:

A study published on Thursday by the Dutch Centre for Insurance Statistics (CVS) showed that fewer accidents and reports of fire and theft occur when the 13th of the month falls on a Friday than on other Fridays. . . . In the last two years, Dutch insurers received reports of an average 7,800 traffic accidents each Friday, the CVS study said. But the average figure when the 13th fell on a Friday was just 7,500.

Datacharmer recently made a good comment on this:

Apart from avoiding risky behaviour on Friday the 13th because it is deemed unlucky (which might well be happening), you should also consider that Friday the 13th - unlike other Fridays - CAN'T be Christmas or New Year's (where people get drunk and drive), and it will also be associated with a lower (or higher) probability of falling before a bank holiday weekend (or I guess in the States Independence day, etc).

I guess all I'm saying is that it could well be other factors driving this result, rather than a change in people's behaviour because Friday the 13th is 'unlucky'.

How about accidents on Friday the 12th or Friday the 14th? The article only compares Friday the 13th with an average Friday - in fact, it doesn't even reveal whether the 13th is the least accident-prone Friday in the book...

Posted by Andrew at 1:17 AM | Comments (2) | TrackBack

July 24, 2008

"Rise in TB Is Linked to Loans From I.M.F."

Steve Kass writes:

Under the headline “Rise in TB Is Linked to Loans From I.M.F.”, Nicholas Bakalar writes for the New York Times today that “The rapid rise in tuberculosis cases in Eastern Europe and the former Soviet Union is strongly associated with the receipt of loans from the International Monetary Fund, a new study has found.”

The study, led by Cambridge University researcher David Stuckler, was published in PLoS Medicine . . . After reading the paper and looking at much of the source data, I [Kass] agree with William Murray, an IMF spokesman also quoted in the article: “This is just phony science.”

Some fun shootin'-fish-in-a-barrel follows. But, hey, it was published in PLoS Medicine, it must be correct, right?

Posted by Andrew at 12:00 AM | Comments (6) | TrackBack

July 22, 2008

Fat stuff, and a discussion of the difficulty of clear writing

I just baked three loaves of bread and then saw these nutrition notes posted by Seth. It's oddly entertaining to read, even though I don't understand anything about it. Sort of like the feeling you get from reading a John Le Carre novel--it all seems so real!

But I completely disagree with Seth's comment that "among academics to write clearly is low status, to write mumbo-jumbo is high status." What Seth is missing here is that it's difficult to write clearly. My impression is that people write mumbo-jumbo because that's what they know how to do; writing clearly takes a lot of practice. It's often surprisingly difficult to get people to state in writing exactly what they did (for example, in fitting a model to data). It takes continual effort to express oneself clearly and directly. Language is inherently nonalgorithmic. It might be that high-status people write mumbo-jumbo, but I suspect that's just because they're not putting in the immense effort required to write clearly. Lots of low-status academics write mumbo-jumbo also (as I know from reviewing many hundreds of submissions to academic journals).

Posted by Andrew at 10:15 AM | Comments (8) | TrackBack

July 21, 2008

Animated adiposity

Rebecca sends in this animated graph and writes, "all the white states initially are a bit deceptive, but even so, it's pretty striking, and the animation is very effective." I think I'd prefer a time series of the national average, along with a color-coded animated map showing each state relative to the national average in each year.

Posted by Andrew at 12:08 AM | Comments (2) | TrackBack

May 22, 2008

Errata

Steven Levitt's blog is great, but . . . shouldn't it be Monica Das Gupta who deserves the hero treatment? Here are Das Gupta's graphs:

[Figures 2 and 3 from Das Gupta: "missing women" graphs]

This isn't news at all--Das Gupta's graphs came out at least a year ago. Shouldn't the scientist who was correct all along--and published the data to show it--get more of the credit?

P.S. I published a false theorem once myself (and an erratum note a few years later, when the error was pointed out to me), but I'd hate to think this is "incredibly rare" behavior.

P.P.S. And many other errors get caught before publication.

Posted by Andrew at 3:04 PM | Comments (9) | TrackBack

May 7, 2008

Eight Americas?

Ben Goldacre links to this article by Christopher Murray et al.:

The gap between the highest and lowest life expectancies for race-county combinations in the United States is over 35 y[ears]. We [Murray et al.] divided the race-county combinations of the US population into eight distinct groups, referred to as the “eight Americas,” to explore the causes of the disparities that can inform specific public health intervention policies and programs. . . . Asians, northland low-income rural whites, Middle America, low-income whites in Appalachia and the Mississippi Valley, western Native Americans, black Middle America, low-income southern rural blacks, and high-risk urban blacks.

The graphs are hard to read, but I do like that they ordered the 8 categories in decreasing order of life expectancy. I've never actually understood how "life expectancy at birth" is defined, but I assume these people know what they're doing. There's something funny about having 8 categories, where one category includes over 2/3 of the people. It's unusual to see my family counted in "Middle America," so I shouldn't complain.

Posted by Andrew at 6:50 AM | Comments (1) | TrackBack

May 6, 2008

Can you trust a dataset where more than half the values are missing?

Rick Romell of the Milwaukee Journal Sentinel pointed me to the National Highway Traffic Safety Administration’s data on fatal crashes. Rick writes,

In 2006, for example, NHTSA classified 17,602 fatal crashes as being alcohol-related and 25,040 as not alcohol-related. In most of the crashes classified as alcohol-related, no actual blood-alcohol-concentration test of the driver was conducted. Instead, the crashes were determined to be alcohol-related based on multiple imputation. If I read NHTSA’s reports correctly, multiple imputation is used to determine BAC in about 60% of drivers in fatal crashes.

He goes on to ask, "Can actual numbers be accurately estimated when data are missing in 60% of the cases?" and provides this link to the imputation technique the agency now uses and this link to an NHTSA technical report on the transition to the currently-used technique.

My quick thought is that the imputation model isn't specifically tailored to this problem, and I'm sure it's making some systematic mistakes, but I figure that the NHTSA people know what they're doing, and if the imputed values made no sense, they would've done something about it. That said, it would be interesting to see some confidence-building exercises showing that the imputations are reasonable. (Or maybe they did this already; I didn't look at the report in detail.)
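For readers who haven't seen multiple imputation in action, here's a minimal sketch of the combining step (Rubin's rules) that such an analysis ends with. All the numbers below are hypothetical, and NHTSA's actual imputation model for blood alcohol concentration is far more elaborate; the point is just to show what gets reported after the missing values have been filled in several times.

# Minimal sketch of Rubin's combining rules for m multiply-imputed datasets.
# The "estimates" would be, say, the proportion of alcohol-involved drivers
# computed within each completed dataset (hypothetical numbers here).
import numpy as np

est = np.array([0.41, 0.43, 0.40, 0.44, 0.42])            # point estimates from m = 5 imputations
var = np.array([0.0004, 0.0005, 0.0004, 0.0005, 0.0004])  # their squared standard errors

m = len(est)
q_bar = est.mean()                   # combined point estimate
w_bar = var.mean()                   # average within-imputation variance
b = est.var(ddof=1)                  # between-imputation variance
total_var = w_bar + (1 + 1/m) * b    # Rubin's total variance
print(q_bar, np.sqrt(total_var))     # estimate and its standard error

The between-imputation piece is what carries the extra uncertainty from having 60% of the BAC values missing; if that term is small relative to the rest, the imputation isn't driving the headline numbers.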

Posted by Andrew at 12:02 AM | Comments (6) | TrackBack

April 2, 2008

The limits of open-mindedness in evaluating scientific research

Seth is skeptical of skepticism in evaluating scientific research. He starts by pointing out that it can be foolish to ignore data, just because they don't come from a randomized experiment. The "gold standard" of double-blind experimentation has become an official currency, and Seth is arguing for some bimetallism. To continue with this ridiculous analogy, a little bit of inflation is a good thing: some liquidity in scientific research is needed in order to keep the entire enterprise moving smoothly.

As Gresham has taught us, if observational studies are outlawed, then only outlaws will do observational studies.

I think Seth goes too far, though, and that brings up an interesting question.

In the discussion on his blog, Seth appears to hold the position that all published research has value. (At least, I brought up a notorious example of error-ridden research, and Seth responded that "I don’t agree that this means its info is useless.") But if all published research, even that with crippling errors, is useful, then presumably this is true of some large fraction of unpublished research, right?

At this point, even setting aside monkeys-at-a-typewriter arguments, there's the question of what we're supposed to do with the mountain of research: millions of published articles each year, plus who knows how many undergraduate term papers, high school science fair projects, etc. I think there are some process-type solutions out there, things like Wikipedia and Slashdot or whatever (which have their own biases, but let a zillion flowers bloom, etc.). But that seems like a cop-out to me, since ultimately someone has to read the papers and judge whether it's worth trying to replicate studies, and so forth. Somewhere it's gotta be relevant that a paper has mistakes, right?

Posted by Andrew at 9:37 AM | Comments (11) | TrackBack

March 29, 2008

Don't blame the literature professors

Seth rants about institutional review boards (IRBs). I have no problem with that; I rant about IRBs all the time. (And when IRBs should be doing something, they don't seem to be around.) But I don't know that Seth is being fair to blame "literature professors" for the problem. I've been on some NIH panels where some pretty ridiculous human subjects concerns were raised. And these were scientists on the panels, not literature professors.

Posted by Andrew at 4:50 PM | Comments (2) | TrackBack

March 20, 2008

The "all else equal fallacy," one more time (this time using econ jargon), and also a discussion of the perils of "crossover" arguments

I was sorry to see Steven Levitt repeating the claim about driving a car being good for the environment. I wrote about this last week when it appeared in the other New York Times column, John Tierney's, but perhaps it's worth repeating:

These guys are making a classic statistical error, I think, which is to assume that all else is held constant. This is the error that also leads people to misinterpret regression coefficients causally. (See chapters 9 and 10 of our book for discussion of this point.) In this case, the error is to assume that the walker and the driver will be making the same trip. In general, the driver will take longer trips--that's one of the reasons for having a car, that you can easily take longer trips. Anyway, my point is not to get into a long discussion of transportation pricing, just to point out that this seemingly natural calculation is inappropriate because of its mistaken assumption that you can realistically change one predictor, leaving all the others constant.

Unintended consequences of an economist forgetting about unintended consequences

I'm surprised that Levitt didn't notice this, given that the distinction between "exogenous" and "endogenous" variables is such a big deal in economics. In fact, an important contribution that economists often make to public policy debates is to emphasize that you can't simply assume "all else held equal" in an analysis. Indeed, Levitt himself made this point in his column a couple of months ago, in discussing unintended consequences. One of the consequences of switching from driving to walking is that you take shorter trips. Maybe this is a good thing, maybe it's a bad thing, but I don't think it makes a lot of sense to say, "Be Green: Drive" without realizing that distance traveled is affected by the choice.
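To make this concrete, here's a toy simulation in which every number is invented: per-mile emissions that slightly favor driving (in the spirit of Goodall's calculation), but trip lengths that respond to the mode chosen.

# Toy simulation of the "all else equal" problem; every number here is made up.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
per_mile = {"walk": 0.25, "drive": 0.20}   # hypothetical kg CO2-equivalent per mile

# Comparison 1: hold the trip fixed at one mile (the "all else equal" calculation).
print(1.0 * per_mile["walk"], 1.0 * per_mile["drive"])

# Comparison 2: let trip length depend on the mode actually chosen.
walk_miles = rng.gamma(shape=2.0, scale=0.5, size=n)    # about 1 mile per walking trip
drive_miles = rng.gamma(shape=2.0, scale=2.5, size=n)   # about 5 miles per driving trip
print((walk_miles * per_mile["walk"]).mean(),
      (drive_miles * per_mile["drive"]).mean())

With these made-up numbers the fixed-distance comparison favors the car and the per-actual-trip comparison favors walking, which is exactly the sort of reversal that the "change one predictor, hold the rest constant" logic hides.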

P.S. Levitt buttresses his argument with the statement, "Chris Goodall [the person who made the walking/driving comparison] is no right-wing nut; he is an environmentalist and author of the book How to Live a Low-Carbon Life." How relevant is this? Even a "right-wing nut" could make a good point, right? More to the point, I think we have to be careful about automatically trusting "crossover" arguments. Do we have to believe something just because it comes from somebody we wouldn't expect to say it? I worry that this sort of crossover argument is so appealing that otherwise-skeptical commentators (such as Levitt) forget their usual skepticism.

P.P.S. Yes, I realize that Levitt might just be trying to be amusing and thought-provoking rather than making a claim about public policy. From the standpoint of economics and statistics, though, I think this is really a great opportunity to explain why the "all else equal" assumption can cause problems. A great example for a course in linear regression or econometrics.

Posted by Andrew at 2:32 PM | Comments (8) | TrackBack

Did GSK trial data mask Paxil suicide risk?

Dave Garbutt writes,

I don't know if you saw this recent report in New Scientist about Paxil. It is bad because it alleges that suicide attempts from the washout period were included for placebo & excluded from treatment. If you read the pdf they link to - on page seven there is a dire graphic of duration until suicide attempts where age is the Y axis & time is the X. With no indication of numbers at risk. A better analysis?

I have no thoughts on this but it looked interesting enough to post. Well, I don't like that the cited graph uses nearly identical symbols for the two different categories. More to the point, if this is true, it's pretty scary. There seems to be a real conflict of interest here (and in similar trials). Maybe it would be better if they approved more drugs but then had an outside agency monitor them.

Posted by Andrew at 12:34 AM | Comments (0) | TrackBack

March 19, 2008

Ethical and data-integrity problems in Iraq mortality study?

Michael Spagat has written this paper criticizing the study of Iraq mortality by Burnham, Lafta, Doocy, and Roberts:

I [Spagat] consider the second Lancet survey of mortality in Iraq published in 2006. I give evidence of ethical violations against the survey’s respondents including endangerment, privacy breaches and shortcomings in obtaining informed consent. Violations to minimal disclosure standards include non-disclosure of the survey’s questionnaire, data-entry form, data matching anonymized interviewer IDs with households and sample design. I present evidence suggesting data fabrication and falsification that falls into nine broad categories: 1) non-disclosure of key information; 2) implausible data on non-response rates and security-related failures to visit selected clusters; 3) evidence suggesting that the survey’s figure for violent deaths was extrapolated from two earlier surveys; 4) presence of a number of known risk factors for interviewer fabrication listed in a joint document of American Association for Public Opinion Research and the American Statistical Association; 5) a claimed field-work regime that seems impossible without field workers crossing ethical boundaries; 6) large discrepancies with other data sources on the total number of violent deaths and their distribution in time and space; 7) two particular clusters that appears to contain fabricated data; 8) irregular patterns suggestive of fabrication in claimed confirmations of violent deaths through death certificates and 9) persistent mishandling of other evidence on mortality in Iraq presented so as to suggest greater support for the survey’s findings from other evidence than is actually the case.

I haven't read Spagat's paper and so am offering no evaluation of my own (see here for some comments from a year or so ago), but the discussions of ethics and survey practice are fascinating. Social data always seem much cleaner when you don't think too hard about how they were collected! May I say it again: a great example for your classes...

P.S. As a minor point, I still am irritated at the habit of referring to a scientific publication by the name of the journal where it was published ("the Lancet study").

P.P.S. A reporter called me about this stuff a couple months ago, but I'm embarrassed to say that I offered nothing conclusive, beyond the statement that these studies are hard to do, and for some reason it's often hard to get information from survey organizations about what goes on within primary sampling units. (We had to work hard even to get this information from these simple telephone polls in the U.S.)

Posted by Andrew at 3:23 PM | Comments (2) | TrackBack

March 14, 2008

Valuing "Lives Saved" vs. "Life-Years Saved," leading to a discussion of the flawed concept of "willingness to pay"

Jim Hammitt sends along this interesting report comparing different measures of risk when evaluating public health options:

There is long-standing debate whether to count "lives saved" or "life-years saved" when evaluating policies to reduce mortality risk. Historically, the two approaches have been applied in different domains. Environmental and transportation policies have often been evaluated using lives saved, while life-years saved has been the preferred metric in other areas of public health including medicine, vaccination, and disease screening. . . Describing environmental, health, and safety interventions as "saving lives" or "saving life-years" can be misleading. . . . Reducing the risk of dying now increases the risk of dying later, so these lives are not saved forever but life-years are gained. . . .
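To make the lives-versus-life-years distinction concrete, here's a toy cohort calculation with made-up mortality hazards: an intervention that cuts the risk of death at age 60 "saves lives" in that year, but those people still die of something later, and what accrues durably is the extra years they live.

# Toy cohort calculation (made-up hazards) contrasting "lives saved" with life-years gained.
import numpy as np

ages = np.arange(60, 111)
hazard = np.minimum(0.005 * 1.09 ** (ages - 60), 1.0)   # hypothetical annual death probabilities

def person_years(h):
    """Expected years lived from age 60 (simple discrete approximation)."""
    return np.cumprod(1 - h).sum()

treated = hazard.copy()
treated[0] *= 0.8                               # the intervention: 20% lower risk at age 60 only

deaths_averted_at_60 = hazard[0] - treated[0]   # "lives saved" this year, per person at risk
life_years_gained = person_years(treated) - person_years(hazard)
print(deaths_averted_at_60, life_years_gained)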

We discuss some of these issues in our article on home radon risks. Beyond this, I have two comments on Jim Hammitt's paper:

1. I wish he'd talked about Qalys. I just like the sound of that word. Qaly, qaly, qaly. (It's pronounced "Qualy")

2. He talks briefly about "willingness to pay." I've always thought this can be a misleading concept. Sometimes it's really "ability to pay." Give someone a lot more money and he or she becomes more able to pay for things, including risk reduction. True, this induces more willingness to pay, but to me the ability is the driving factor. I think the key is what comparison is being made. If you're considering one person and comparing several risks, then the question is, what are you willing to pay for. But if you are considering several people with different financial situations, then the more relevant question might be, who is able to pay.

Posted by Andrew at 10:01 PM | Comments (13) | TrackBack

How many lives has statistics saved?

Andrew C. Thomas suggests that the method of propensity scores has saved thousands of lives due to its use in medical and public health research. This raises the question of how we could measure/estimate the number of lives (or qalys, or whatever) saved by propensity scores. And then, if that could be done, it would make sense to do it in a context where you could estimate the lives saved by other methods (least squares, logistic regression, Kaplan-Meier curves, etc.). This all seems pretty impossible to me--how would you deal with the double-counting problems, and how do you deal with bad methods that are nonetheless popular? (For example, I hate the Wilcoxon rank test--as discussed in one of our books, I'd rather just rank-transform the data (if that is indeed what you want to do) and then run a regression or whatever. But Wilcoxon's probably saved lots of lives.)
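Here's what I mean by the parenthetical, as a small sketch on simulated data; with a simple two-group comparison the rank-then-regress p-value typically comes out close to the Wilcoxon (Mann-Whitney) one.

# Sketch of the rank-transform-then-regress alternative to the Wilcoxon test, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y0 = rng.lognormal(mean=0.0, sigma=1.0, size=50)   # control outcomes (skewed on purpose)
y1 = rng.lognormal(mean=0.5, sigma=1.0, size=50)   # treated outcomes

# The usual Wilcoxon / Mann-Whitney test
print(stats.mannwhitneyu(y1, y0, alternative="two-sided").pvalue)

# Rank-transform the pooled data, then run an ordinary regression on the treatment indicator
y = np.concatenate([y0, y1])
z = np.concatenate([np.zeros(50), np.ones(50)])
ranks = stats.rankdata(y)
print(stats.linregress(z, ranks).pvalue)           # typically close to the Wilcoxon p-value

The advantage of the regression version is that you can then add other predictors, interactions, and so forth, which the rank test by itself doesn't let you do.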

If one more generally wanted to ask how many lives have been saved by statistical methods in total, I'd want to restrict to medical and public health. Otherwise you have difficulties, for example, in counting how many lives were saved or lost due to military research in World War II and so forth.

Posted by Andrew at 2:41 PM | Comments (5) | TrackBack

January 31, 2008

Random restriction as an alternative to random assignment? A mini-seminar from the experts

Robin Hanson suggested here an experimental design in which patients, instead of being randomly assigned to particular treatments, are randomly given restrictions (so that each patient would have only n-1 of the n options to consider, with one option removed at random). I asked some experts about this design and got the following responses.
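For concreteness, the assignment mechanism Hanson describes is just this (a minimal sketch, with hypothetical treatment labels):

# Minimal sketch of the random-restriction design: each patient keeps n-1 of the
# n treatment options, with the removed option chosen at random.
import numpy as np

rng = np.random.default_rng(5)
treatments = ["A", "B", "C", "D"]

for i in range(8):
    removed = rng.choice(treatments)                   # the randomized restriction
    available = [t for t in treatments if t != removed]
    print(f"patient {i}: may choose among {available} ({removed} withheld at random)")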

Eric Bradlow wrote:

I think "exclusion", more generally, in Marketing has been done in the following ways:

[1] A fractional design -- each person only sees a subset of the choices, items, or attributes of a product (intentionally) on the part of the experimenter. Of course, this is commonly done to reduce complexity of the task while trading off the ability to estimate a full set of interactions. The challenge here, and I wrote a paper about this in JMR in 2006, is that people infer the values of the missing attributes and do not, despite instructions, ignore them. Don Rubin actually wrote an invited discussion on my piece. So, random exclusion on the part of the experimenter is done all of the time.

[2] A second way exclusion is sometimes done is prior to the choice or consumption task, you let the respondent remove "unacceptable" alternatives. There was a paper by Seenu Srinivasan of Stanford on this. In this manner, the respondent eliminates "dominated/would never choose alternatives". This is again done for the purposes of reducing task complexity.

[3] A third set of studies I have seen, and Eric Johnson can comment on the psychology of this much more than I can, is something that Dan Ariely (now of Duke, formerly of MIT) and colleagues have done, which seems closest to this post. In these sets of studies, alternatives are presented and then "start to shrink and/or vanish". What is interesting is that the alternatives he does this to are not the preferred ones, and it has a dramatic effect on people's preferences. I always found these studies fascinating.

[4] A fourth set of related work, of which Eric Johnson has great fame, is a "mouse-lab" like experiment where you allow people to search alternatives until they want to stop. This then becomes a sequential search problem; however, people exclude alternatives when they want to stop.

So, Andy, I agree with your posting that:

(a) Marketing researchers have done some of this.

(b) Depending on who is doing the excluding, one will have to model this as a two-step process, where the first step is a self-selection (an observational-study-like likelihood piece, if one is going to be model-based).

The aforementioned Eric Johnson then wrote:

I think there are at least two important thoughts here:

(1) random inclusion for learning... Decision-making has changed the way we think about preferences: They are discovered (or constructed) not 'read' from a table (thus Eric B.'s point 3).

A related point is that a random option can discover a preference (gee, I never thought I liked ceviche....) so there may be value in adding random options to the respondent... The late Hillel Einhorn wrote about 'making mistakes to learn.'

(2) "New Wave' choice modeling often consists of generating the experimental design on the fly: Adaptive conjoint. By definition, these models use the results from one choice to eliminate a bunch of possible options and focus on those that have the most information. Olivier Toubia at Columbia Marketing is a master of this.

To elaborate on Eric B.'s points:

Consumer Behavior research shows that elimination is a major part of choice for consumers, probably determining much of the variance in what is chosen. Make choice easier, learning harder.

There is an interesting tradeoff for both the individual and larger publics here: You try an option you are likely not to like (a treatment which may well not work). If you are surprised, then you (or subsequent patients) benefit for a long time. Since this is an intertemporal choice, people may not experiment enough.

Finally, Dan "Decision Science News" Goldstein added:

I've never seen a firm implement such a design in practice, neither when I worked in industry, nor when I judged "marketing effectiveness" competitions.

My own thoughts are, first, that there are a lot of interesting ideas in experimental design beyond the theory in the textbooks. It would be worth thinking systematically about this (someday). Second, I want to echo Eric Johnson's comment about preferences being constructed, not "read off a table" from some idealized utility function. Utility theory is beautiful but it distresses me that people think it fits reality in an even approximate way.

Posted by Andrew at 12:12 AM | Comments (1) | TrackBack

January 10, 2008

Chewy food

This is interesting. As a bread-lover, though, I don't particularly enjoy hearing people tell me not to eat white flour. Also, I don't see the relevance of the tree-climbing crabs, but they do look cool:

[Photo: coconut crab]

Posted by Andrew at 9:51 AM | Comments (0) | TrackBack

December 26, 2007

Zoloft stories: Blackballed researchers and Correlation can imply causation

Frederick Crews is writing about selective serotonin reuptake inhibitors (SSRIs):

Hence the importance of David Healy's stirring firsthand account of the SSRI wars, Let Them Eat Prozac. Healy is a distinguished research and practicing psychiatrist, university professor, frequent expert witness, former secretary of the British Association for Psychopharmacology, and author of three books in the field. Instead of shrinking from commercial involvement, he has consulted for, run clinical trials for, and at times even testified for most of the major drug firms. But when he pressed for answers to awkward questions about side effects, he personally felt Big Pharma's power to bring about a closing of ranks against troublemakers. That experience among others has left him well prepared to puncture any illusions about the companies' benevolence or scruples.

. . .

The most gripping portions of Let Them Eat Prozac narrate courtroom battles in which Big Pharma's lawyers, parrying negligence suits by the bereaved, took this line of doubletalk to its limit by explaining SSRI-induced stabbings, shootings, and self-hangings by formerly peaceable individuals as manifestations of not-yet-subdued depression. As an expert witness for plaintiffs against SSRI makers in cases involving violent behavior, Healy emphasized that depressives don't commit mayhem. But he also saw that his position would be strengthened if he could cite the results of a drug experiment on undepressed, certifiably normal volunteers. If some of them, too, showed grave disturbance after taking Pfizer's Zoloft—and they did in Healy's test, with long-term consequences that have left him remorseful as well as indignant—then depression was definitively ruled out as the culprit.

Healy suspected that SSRI makers had squirreled away their own awkward findings about drug-provoked derangement in healthy subjects, and he found such evidence after gaining access to Pfizer's clinical trial data on Zoloft. In 2001, however, just when he had begun alerting academic audiences to his forthcoming inquiry, he was abruptly denied a professorship he had already accepted in a distinguished University of Toronto research institute supported by grants from Pfizer. The company hadn't directly intervened; the academics themselves had decided that there was no place on the team for a Zoloft skeptic.

That doesn't make the research institute look so good, although maybe there's another side to the story.

Hey, did he just say what I think he said???

Crews continues:

Undeterred, Healy kept exposing the drug attorneys' leading sophistry, which was that a causal link to destructive behavior could be established only through extensive double-blind randomized trials—which, cynically, the firms had no intention of conducting. In any case, such experiments could have found at best a correlation, in a large anonymous group of subjects, between SSRI use and irrational acts; and the meaning of a correlation can be endlessly debated. In contrast, Healy's own study had already isolated Zoloft as the direct source of his undepressed subjects' ominous obsessions.

Thanks partly to Healy's efforts, juries in negligence suits gradually learned to be suspicious of the "randomized trial" shell game. . . .

I agree that randomized trials aren't the whole story, and I'll further agree that maybe we statisticians overemphasize randomized trials. But, but, . . . if you do do a randomized trial, and there are no problems with compliance, etc., then, yes, the correlation does imply causation! That's the point of the randomized design, to rule out all the reasons why observational results can be "endlessly debated."
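In case the logic isn't clear, here's a little simulation with invented parameters. In the observational world, sicker patients are more likely to get the drug, so the simple comparison of means is biased and genuinely debatable; randomize the assignment and the same simple comparison recovers the true effect.

# Simulation (made-up parameters) of why randomization settles what an
# observational comparison leaves debatable.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
true_effect = 0.5
severity = rng.normal(size=n)                 # a confounder: sicker people do worse anyway

# Observational world: sicker people are more likely to get the drug
p_treat = 1 / (1 + np.exp(-2 * severity))
t_obs = rng.binomial(1, p_treat)
y_obs = true_effect * t_obs + 1.5 * severity + rng.normal(size=n)
print(y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean())   # badly biased

# Randomized trial: a coin flip, independent of severity
t_rct = rng.binomial(1, 0.5, size=n)
y_rct = true_effect * t_rct + 1.5 * severity + rng.normal(size=n)
print(y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean())   # close to the true effect of 0.5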

The New York Review of Books needs a statistical copy editor! I don't know anyone there (and I don't know Crews), but maybe someone can pass the message along. . . .

P.S. Maybe I'm being too hard on Crews, who after all is a literary critic, not a statistician. I assume he wrote this thing about correlation and causation because he misinterpreted what some helpful statistician or medical researcher had to say. Sort of like how I might sound foolish if I tried to make some pronouncement about Henry James or whatever.

P.P.S. Typo fixed (thanks, Sebastian).

Posted by Andrew at 7:21 PM | Comments (7) | TrackBack

November 9, 2007

Being Overweight Isn't All Bad, Study Says

Dan Goldstein sent me this link, with the note, "possibly interesting to you / Seth."

What I'm wondering is, will Seth be happy because it shows how conventional medical research has failed, or will he be unhappy because it finds that losing weight is not such a great thing?

Posted by Andrew at 12:20 AM | Comments (5) | TrackBack

October 28, 2007

Bayes pays again

In addition to this, Frederic Bois has another position available:

INERIS Research position in statistics/biostatistics applied to toxicology

Systems biology is an emergent field that aims at understanding biological systems, accounting not only for the components of the system but also for their dynamics. With the progress of molecular biology projects, like genome sequencing projects, that accumulate in-depth knowledge of the molecular nature of biological systems, we are now at the stage where we can seriously look into the possibility of system-level understanding solidly grounded in molecular-level understanding.

In the research position we offer, you will develop inference tools for pharmacokinetic and pharmacodynamic models. The systems of interest will be cells, organs, and hormonal signalling pathways. A particular emphasis will be on integrating information from “omics” technologies and new toxicological tools in a systems biology context. The models you will help develop will integrate biological knowledge, in vitro data and physico-chemical information on compounds to derive potential risks to human health. As such, the research will contribute to protecting health and lowering the number of toxicity tests on animals. It fits perfectly with the objectives of the European REACH regulation, which will shape the future of toxicology in Europe.

The position is offered by INERIS, Verneuil-en-Halatte (Picardie, 30 minutes north of Paris by train), in the fast-growing modelling team of the Experimental and Predictive Toxicology Unit (TOXI, with a staff of 15 members, involved in pharmacokinetic/dynamic modelling, reproductive toxicology, neurotoxicology, and inhalation toxicology). Part of the work will involve supervising students and preparing research proposals to be financed by national and international funding bodies, such as the French National Research Agency, the European Union, etc. Current support comes mostly from large European and national research grants. A PhD in statistics or biostatistics is required. Prior experience in Bayesian statistics applied to “omics” data analysis will be a plus.

Interested candidates are invited to submit by email to A. Péry and F. Bois (see addresses below) the following elements.
- A CV with a list of publications
- Two or three relevant publications
- An application letter
- The names and e-mail addresses of referees.

Alexandre Péry, INERIS, DRC/TOXI, Parc Alata BP2 60550 Verneuil-en-Halatte
Tel : +33 3 44 55 61 26, Email: Alexandre.pery@ineris.fr

Frédéric Bois, INERIS, DRC, Parc Alata BP2 60550 Verneuil-en-Halatte
Tel : +33 3 44 55 65 96, Email: Frederic.bois@ineris.fr

Posted by Andrew at 6:11 PM | Comments (0) | TrackBack

October 3, 2007

Counting the lost qalys

Daniel Lakeland writes,

What I [Lakeland] would like to see is a graph that shows the importance of diseases relative to the number of expected person-years they eliminate each year. A disease that kills 1000 people age 10 eliminates about 680000 person years, whereas a disease that kills 100000 people age 85 eliminates about the same 660000 person years. . . . this means that a really important killer in the US is automobile accidents, suicide, and childhood cancers, even though many many more people die of cancer and heart disease.

I imagine this has been done somewhere but I've never seen it tallied. You also have to decide where to draw the line--for example, do you count diseases that kill fetuses?
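A minimal version of the tally Lakeland describes would just weight deaths by (rough) remaining life expectancy at the age of death. The death counts below are hypothetical and the life-expectancy figures are round approximations; the point is the bookkeeping, not the particular numbers.

# Deaths weighted by approximate remaining life expectancy at the age of death.
# Counts are hypothetical; expectancies are rough round numbers.
remaining_years = {10: 68, 40: 40, 85: 6}

hypothetical_deaths = [
    ("childhood cancers, around age 10", 10, 2_000),
    ("car crashes, around age 40",       40, 40_000),
    ("heart disease, around age 85",     85, 500_000),
]

for cause, age, deaths in hypothetical_deaths:
    print(f"{cause}: {deaths * remaining_years[age]:,} person-years lost")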

P.S. His art is ok but these other paintings are more my style.

Posted by Andrew at 10:13 PM | Comments (6) | TrackBack

September 30, 2007

Computer Science Applications to Improve Health Care Delivery in Low-Income Countries

Neal Lesh is speaking on this tomorrow (Monday) in the CS department:

It is increasingly possible to apply computer innovation to improve aspects of health care delivery in low-income countries. The urgency of this effort is underscored by the unprecedented health inequities that exist between today's poor and wealthy populations. For example, almost 10% of infants die during their first year in poor countries, compared to 0.5% in wealthy countries. In this talk, I [Lesh] will discuss opportunities for computer science in global health, reporting on the last few years I have spent working in Rwanda, Tanzania, and South Africa on a variety of health delivery projects. These include electronic patient record systems for public AIDS treatment programs, PDAs to guide health workers step-by-step through medical treatment algorithms, and simple solutions to improve the management of blood tests and other laboratory data. Additionally, I will try to give some background on global health inequities, as well the ups and downs of being an ex-pat worker in donor-funded non-profit organizations in low-income countries.
Neal Lesh is Chief Technology Officer at D-Tree International (www.d-tree.org) and Director of Special Projects at Dimagi (www.dimagi.com). He received a PhD in Computer Science from the University of Washington in 1998. As a Senior Scientist at the Mitsubishi Electric Research Laboratory (MERL) in Cambridge, MA, he worked in a variety of areas, including planning, intent inference, information visualization, interactive optimization, and human-computer collaboration. In 2004, Neal got a Master of Public Health from the Harvard School of Public Health. Since then, he has been working and living mostly abroad. In Tanzania, he has worked on electronic medical record systems for a large Harvard-supported AIDS treatment program with tens of thousands of patients in care or treatment. He worked with Partners in Health during the early stages of their operations in rural Rwanda, helping to build reporting systems and laboratory systems. In South Africa and Tanzania, he is investigating the use of handhelds to deliver standardized care to improve the treatment of common causes of child mortality and triaging of HIV+ patients. He will soon start work in Bangladesh to deliver essential information over mobile phones.

The talk is Monday, October 1, 2007, 11am, in Davis Auditorium, Schapiro Center, Columbia University. I wonder if that last bit on mobile phones in Bangladesh is related to our project?

Posted by Andrew at 5:37 PM | Comments (0) | TrackBack

September 20, 2007

An inspiring story of a college class

Seth writes about this epidemiology class taught by Leonard Syme:

Every week there was a new topic. For every topic Syme would assign a paper laying out the conventional wisdom — that high cholesterol causes heart disease, for example — plus three or four papers that cast doubt on that conclusion. I think he even had American Heart Association internal emails. Several students would present the material and then there would be debate — what’s to be believed? The debates were intense. If ever the students seemed to be reaching agreement, he would say something to derail it. “You know, there was a study that found . . . ”

Practically all classes make you think you know more at the end of them than you knew when they began. Practically all professors believe this is proper and good and cannot imagine anything else. With Syme’s class, the opposite happened: Your beliefs were undermined. You walked out knowing less than when you walked in. You had been sure that X causes Y; now you were unsure. At first, Syme said, many students found it hard to take. A three-hour debate with no resolution. They did not like the uncertainty that it produced. But eventually they got used to it.

The overall effect of Syme’s class was to make students think that epidemiology was important and difficult — even exciting. It was important because we really didn’t know the answers to big questions, like how to reduce heart disease; and it was difficult and exciting because the answers were not nearly as obvious as we had been told. . . .

This sounds great and leads me to a few thoughts:

1. Seth and I tried to do something similar over 10 years ago when we taught our seminar in left-handedness: before every week's class, the students had to read a book chapter and a couple of articles on some topic of handedness. Two of the students were given the assignment to present the week's reading to the class, then we had discussion. It didn't go quite as well as Syme's class is described to have gone. Some differences:

a. Handedness is less important than public health.

b. We didn't focus so strongly on controversies. We tried, but sometimes it's hard to get articles on two sides of an issue.

c. When we did get articles on two sides of an issue, it was difficult for the students to evaluate the articles, beyond a high-school-essay sort of reasoning where you can give three reasons to support or oppose any argument. There was no sense of how to weigh the evidence. Of course, that's the kind of skill you want to teach in an epidemiology class. I assume that Syme covered some methods in his class also, to move the discussion beyond nihilistic platitudes.

d. Syme's class was 3 hrs/week; ours was 2 hrs, I think. We also didn't have homework (beyond the readings and some data collection) and we barely taught any methods.

e. Syme's class had grad students in public health, who I assume were more motivated to work hard, compared to our class of undergrads.

f. Syme is an expert on epidemiology, Seth and I had no particular expertise in handedness.

Looking at a-e above, the key difference, I think, is that I bet Syme's students worked a lot harder in the class. Syme deserves credit for this: motivating students to work hard and teach themselves is a fundamental challenge of teaching.

2. Regarding the discussion of whether universities should teach classes on office politics, and relating to point f above, I want to emphasize that we are not experts in this area. I'm an expert in statistical graphics. I've made thousands of graphs and done both applied and theoretical research in the area. Even if I were good at office politics (which I'm not), I wouldn't be an expert, I wouldn't have done research in the area, I wouldn't be familiar with the literature, etc. At a place like Berkeley or Columbia, the profs are world experts in what they teach. Giving the students training in office politics might be a good idea, but I would clearly distinguish it from the main academic material, which is based on research and scholarship, not just anecdotes, opinions, and personal experience.

3. Thinking about point f above: I think it would be fun to follow Syme's format in teaching a course on Controversies in Statistics. That's a topic I'm an expert on!

Posted by Andrew at 9:01 AM | Comments (5) | TrackBack

September 17, 2007

Do We Really Know What Makes Us Healthy?

Seth has an interesting discussion of this article by Gary Taubes. Seth calls it the best article on epidemiology he's read. I have nothing to add to Taubes's article and Seth's discussion except to say that it's good to see these issues raised for a general audience.

Posted by Andrew at 1:24 PM | Comments (0) | TrackBack

September 14, 2007

Most science studies appear to be tainted by sloppy analysis

Boris pointed me to this article by Robert Lee Hotz:

We all make mistakes and, if you believe medical scholar John Ioannidis, scientists make more than their fair share. By his calculations, most published research findings are wrong. . . . "There is an increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims," Dr. Ioannidis said. "A new claim about a research finding is more likely to be false than true."

The hotter the field of research the more likely its published findings should be viewed skeptically, he determined.

Take the discovery that the risk of disease may vary between men and women, depending on their genes. Studies have prominently reported such sex differences for hypertension, schizophrenia and multiple sclerosis, as well as lung cancer and heart attacks. In research published last month in the Journal of the American Medical Association, Dr. Ioannidis and his colleagues analyzed 432 published research claims concerning gender and genes.

Upon closer scrutiny, almost none of them held up. Only one was replicated. . . .

Ioannidis attributes this to "messing around with the data to find anything that seems significant," and that's probably part of it. The other part is that, even if all statistics are done according to plan, the estimates that survive significance testing will tend to be large--this is what we call "Type M error." See here for more discussion.
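Here's a quick simulation of the Type M point, with invented numbers: when the true effect is small relative to the noise, the estimates that happen to clear the significance bar exaggerate it, and some even get the sign wrong.

# Simulation of Type M (magnitude) error: condition on statistical significance
# and the surviving estimates exaggerate a small true effect.
import numpy as np

rng = np.random.default_rng(3)
true_effect = 0.1
se = 0.2                                      # studies are noisy relative to the effect
estimates = rng.normal(true_effect, se, size=100_000)

significant = np.abs(estimates) > 1.96 * se   # the ones that get reported as "findings"
print(significant.mean())                     # power is low
print(estimates[significant].mean())          # the average significant estimate is several times 0.1
print((estimates[significant] < 0).mean())    # and a few have the wrong sign (Type S error)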

Posted by Andrew at 3:57 PM | Comments (2) | TrackBack

August 8, 2007

Oscar winners do not live longer

From Laurie Snell:

If you put "Oscar winners live longer" in Google you will get over 7,000 hits. Here is one from the January 23, 2007 issue of Health and Aging:

Oscar winners live longer: Reported by Susan Aldridge, PhD, medical journalist.

It is Oscar season again and, if you're a film fan, you'll be following proceedings with interest. But did you know there is a health benefit to winning an Oscar? Doctors at Harvard Medical School say that a study of actors and actresses shows that winners live, on average, for four years more than losers. And winning directors live longer than non-winners. Source: "Harvard Health Letter" March 2006.

The assertion that Oscar winners live longer was based on an article by Donald Redelmeier and Sheldon Singh, "Survival in Academy Award-winning actors and actresses," Annals of Internal Medicine, 15 May 2001, Vol. 134, No. 10, 955-962.

This is the kind of study the media loves to report and medical journals enjoy the publicity they get. . . .

A recent article by James Hanley, Marie-Pierre Sylvestre and Ella Huszti, "Do Oscar winners live longer than less successful peers? A reanalysis of the evidence," Annals of Internal Medicine, 5 September 2006, Vol. 145, No. 5, 361-363, claims that the Redelmeier and Singh paper was flawed. They provided a reanalysis of the data showing that it does not support the claim that Oscar winners live longer. . . .

Snell continues with a thorough and entertaining discussion of the issues. I have nothing to add.

Posted by Andrew at 7:05 PM | Comments (1) | TrackBack

July 31, 2007

Electrosensitivity

A blogger writes,

The reason I am writing to you has to do with a recent paper (which has received a lot of press coverage) on 'electrosensitivity' - in simple English, whether mobile phone signals can be detected by humans, and whether exposure causes harm. (the full title of the paper is 'Does Short-Term Exposure to Mobile Phone Base Station Signals Increase Symptoms in Individuals who Report Sensitivity to Electromagnetic Fields? A Double-Blind Randomised Provocation Study', and you can find it here - free access.)

I find the authors' conclusions against electrosensitivity being detectable/harmful as unwarranted given the data, and their analysis in many cases is questionable. Read this for example:

58 self-reported sensitive and 121 control individuals came in for testing. Of these, 56 sensitive and 120 controls completed the open provocation test, while 44 sensitive and 115 controls (114 excluding one observation that was left out due to a 'technical error') also completed the double-blind tests.

Participants made on/off judgements during both the 5 minute and 50 minute double-blind exposures. Only 2 sensitive and 5 control participants were able to correctly identify all 6 on/off judgements.

If I did the maths correctly, the probability of 2 or more individuals out of 44 getting 6 binary choices right by chance is 26%, while for 5 or more out of 114 to be right this probability is a mere 3.4%.

And there's also this:

For the three quick double-blind tests (in session 1) and the three 50-minute double-blind tests (sessions 2 to 4) participants judged whether the base station was on or off and indicated how confident they were of this judgement using a scale from 0 'not at all sure' to 100 'completely sure'. The ROC curve method was chosen to analyze the responses as this takes into account not only accurate (hits) and inaccurate responses (false alarms), but also how confident participants are of their judgments.

Sensitive participants had an accuracy rate of 55.2% during the 5 minute tests (d′=-0.08, sensitivity=66.4%, specificity=32.7%) and 59.8% during the 50 minute tests (d′=0.20, sensitivity=69.3%, specificity=40.9%). The control group had an accuracy rate of 51.4% during the 5 minute tests (d′=0.10, sensitivity=51.7%, specificity=50.8%) and 50.1% during the 50 minute tests (d′=0.06, sensitivity=48.0%, specificity=54.3%). For each group the 95% confidence intervals on the ROC curves include the diagonal axis, implying that participant performance for each group did not differ from chance.

Personally, I never thought there was anything to 'electrosensitivity' - but contrary to the authors' interpretations of the data I believe their findings actually constitute cause for worry (or at least more research). At the risk of coming across as too harsh, the whole paper strikes me as being unbalanced to say the least; and while I am not familiar with some of the statistical techniques employed, the analysis seems to be somewhat sloppy.

I don't really have anything to say here--my only experience studying electromagnetic fields is in Section 21.8 of our new book. It was pretty unpleasant to try to read the linked paper--I'm glad I didn't have to review it!
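For anyone who wants to redo the correspondent's binomial arithmetic, here's the calculation under the assumption that each of the six on/off judgments is an independent 50/50 guess; whether that assumption matches the study's actual exposure schedule is a separate question.

# Chance of k or more of n people getting all six on/off judgments right,
# assuming each judgment is an independent 50/50 guess.
from scipy.stats import binom

p_all_six = 0.5 ** 6               # one person's chance of going 6 for 6

def tail(k, n, p=p_all_six):
    return binom.sf(k - 1, n, p)   # P(at least k of n succeed)

print(tail(2, 44))                 # sensitive group: 2 or more of 44
print(tail(5, 114))                # control group: 5 or more of 114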

Posted by Andrew at 2:04 PM | Comments (4) | TrackBack

July 14, 2007

Question and R package on study of Iraqi deaths

David Kane writes,

You posted before on the Burnham et al (2006) study on Iraqi mortality. I [Kane] have an R package with some preliminary analysis and comments.

My question concerns the confidence intervals reported in the prior study, Roberts et al (2004), by many of the same authors. Their central result is:
"The risk of death was estimated to be 2.5-fold (95% CI 1.6 - 4.2) higher after the invasion when compared with the pre-invasion periods."

Since they estimate a pre-war mortality rate of 5.0 per thousand, we can translate the confidence intervals given in relative risk above into a post-war mortality rate of 8.0 - 22.5. (Just multiply pre-war mortality by the new risk. I realize that this ignores the uncertainty in the pre-war estimate, but let's ignore that complication for now.) The problem is that this contradicts the direct estimate of post-war mortality which the authors provide.

"The crude mortality rate during the period of invasion and occupation was 12.3 per 1,000 people per year for the post-invasion period. (95% CI 1.4 - 23.2)"

In other words, their direct measure of the confidence interval for post-war mortality is so wide that there is no way that their confidence interval for the relative risk can be correct. The more imprecise their measure of pre-war mortality, the worse this conflict becomes.

It seems to me that either a) I don't understand what is going on or b) there is something fundamentally wrong with the results. Which is it?

Thanks for any help that your readers (or you!) can provide.

My quick thought is that new data became available between 2004 and 2006. The 2006 paper cited in my earlier blog entry estimated a pre-invasion mortality rate of 5.5 and a post-invasion rate of 13.3 (with confidence interval of [10.9, 16.1]) which is different from the [1.4, 23.2] interval you have above.
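For reference, the translation Kane describes is just multiplication, ignoring (as he notes) the uncertainty in the pre-war rate; the directly reported interval can then be set alongside it.

# The translation Kane describes, using the rounded figures quoted above.
pre_war = 5.0                              # deaths per 1,000 per year, pre-invasion
rr_low, rr_mid, rr_high = 1.6, 2.5, 4.2    # relative risk and its 95% interval

print([pre_war * r for r in (rr_low, rr_mid, rr_high)])   # implied post-war rate, roughly 8 to 21
print([1.4, 23.2])                                        # the directly reported post-war interval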

Posted by Andrew at 9:52 PM | Comments (2) | TrackBack

June 29, 2007

Happiness over the life course

Grazia passed on this link to a report by Joel Waldfogel:

People with higher incomes today report higher levels of happiness than their poorer contemporaries. At the same time, people today are far richer than earlier generations, but they're not happier than those who came before them. In light of such wrinkles, a growing cadre of economists has cut out the money middleman and moved to studying happiness directly. The latest installment in this genre is a new study by economists David Blanchflower of Dartmouth and Andrew Oswald of Warwick. They document how happiness evolves as people age. While income and wealth tend to rise steadily over the life cycle, peaking around retirement, happiness follows a U-shaped age pattern.

It's a good news article, with data details:

The authors' data come from large-scale surveys. The General Social Survey asks Americans to rate their happiness level on a three-point scale, with "very happy" a three, "pretty happy" a two, and "not too happy" a one. The average happiness score in the United States is 2.2. The data, covering people older than 16, come from the years 1974 through 2004 and include about 20,000 men and 25,000 women. Across the Atlantic, the Eurobarometer offers a similar four-point scale (very satisfied, fairly satisfied, not very satisfied, not at all satisfied). The average happiness score in Europe is three. The data include about 400,000 men and women in 11 European countries, from 1975 to 1998.

Analyzing data from these surveys, Blanchflower and Oswald found that for both men and women in the United States and throughout Europe, happiness starts off relatively high in early adulthood, then falls, bottoming out on average around age 45, and then rises after that year and on into old age.

In this study (as in others), people are happier than their poorer counterparts if they have more income. How does the effect of income on happiness compare with the age effect? In the United States, the steady decline in happiness from age 16 to age 45 has an effect that's larger than a 50 percent reduction in income—that is, happiness varies more as people get older than it does if you compare significantly richer people to poorer ones. And, equivalently, the 15-year upswing in happiness that follows age 45 is stronger than the upswing that tracks doubling of income. For Europeans, the age-based happiness rise that's equivalent to the effect of doubling income occurs between ages 35 and 70.

There's an age-period-cohort issue:

The U-shaped happiness pattern is not a completely new finding. But past researchers couldn't tell whether 55-year-olds were happier than 45-year-olds in a given year because they'd aged or because they were born to a sunnier generation. This study gets around this problem by combining data on people of different ages at different points in time over a quarter-century. The authors can compare not only 55- and 45-year-olds today, but also 55-year-olds today to people who were 45 a decade ago. And when they account for when people were born, the U-shaped happiness pattern remains.

The authors also find that over the last century, Americans, both men and women, have gotten steadily—and hugely—less happy. The difference in happiness between men of my generation, born in the 1960s, and men of my father's generation, born in the 1920s, is the same as the effect of a tenfold difference in income. In other words, if my father had little money compared to his contemporaries and I have lots of money compared to mine, I can still expect to be less happy. Here, curiously, the European pattern diverges. Happiness falls for the birth years from 1900 to about 1950, and generations born on the continent since World War II have gotten successively happier.

These age-period-cohort things always confuse me. I can't quite believe that this is fully identified.

Here's the paper by Blanchflower and Oswald. It's an interesting mix of theorizing, literature review, and number crunching.

Just a few statistical comments . . .

First off, the results should be graphed, either with time on the x-axis or age on the x-axis. I wanna see that "U-shape"! Beyond that, if the key issue is the pattern with age, it would be good to see more modeling, not just a quadratic curve (age and age-squared). If splines are too much work, then I'd like to see a few age categories. A lot of interpretation seems to be riding on this quadratic assumption. Can you really conclude, for example, that "the minimum point of well-being is estimated at age 49.1"? At the very least, the zillions of data points would allow you to do a binned residual plot and look for departures from the curve by age.

At a technical level, if you are going to use age and age squared, you should at least do some standardization, if not a full standardization then something simple like using (age - 40)/10, so that (a) you can interpret the linear term in the presence of the quadratic, and (b) you don't get coefficients like .00026, which are impossible to interpret directly.
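
To illustrate the standardization and binned-comparison points, here is a minimal R sketch on simulated data. The "GSS-like" data and the U-shaped happiness pattern are made up for illustration; this is not the Blanchflower-Oswald analysis.

# Fake survey data with a U-shaped age pattern, for illustration only
set.seed(1)
n     <- 5000
age   <- sample(16:90, n, replace = TRUE)
happy <- 2.0 + 0.0004 * (age - 45)^2 + rnorm(n, 0, 0.6)

z_age    <- (age - 40) / 10                    # center and rescale, as suggested above
fit_quad <- lm(happy ~ z_age + I(z_age^2))     # the quadratic-in-age specification
summary(fit_quad)

# Check the quadratic against simple five-year age-bin means
bins      <- cut(age, breaks = seq(15, 90, by = 5))
bin_means <- tapply(happy, bins, mean)
plot(seq(17.5, 87.5, by = 5), bin_means, xlab = "age", ylab = "mean happiness")
lines(16:90, predict(fit_quad, newdata = data.frame(z_age = (16:90 - 40) / 10)))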

Anyway, interesting stuff, and I'm sure that this study will motivate lots more explorations of these questions.

Posted by Andrew at 12:22 AM | Comments (10) | TrackBack

June 28, 2007

Bayes: radical, liberal, or conservative?

I wrote the following (with Aleks Jakulin) to introduce a special issue on Bayesian statistics of the journal Statistica Sinica (volume 17, 422-426). I think the article might be of interest even to dabblers in Bayes, as I try to make explicit some of the political or quasi-political attitudes floating around the world of statistical methodology.

As a lifetime member of the International Chinese Statistical Association, I am pleased to introduce a volume of Bayesian articles. I remember that in graduate school, Xiao-Li Meng, now editor of this journal, told me they didn't teach Bayesian statistics in China because the idea of a prior distribution was contrary to Mao's quotation, "truth comes out of empirical/practical evidence." I have no idea how Thomas Bayes would feel about this, but Pierre-Simon Laplace, who is often regarded as the first applied Bayesian, was active in politics during and after the French Revolution.

In the twentieth-century Anglo-American statistical tradition, Bayesianism has certainly been seen as radical. As statisticians, we are generally trained to respect conservatism, which can sometimes be defined mathematically (for example, nominal 95% intervals that contain the true value more than 95% of the time) and sometimes with reference to tradition (for example, deferring to least-squares or maximum-likelihood estimates). Statisticians are typically worried about messing with data, which perhaps is one reason that the Current Index to Statistics lists 131 articles with "conservative" in the title or keywords and only 46 with the words "liberal" or "radical."

Like many political terms, the meaning of conservatism depends on its comparison point. Does the Democratic Party in the U.S. represent liberal promotion of free expression or a conservative perpetuation of government bureaucracy? Do the Republicans promote a conservative defense of liberty and property or a radical revision of constitutional balance? And where do we place seemingly unclassifiable parties such as the Institutional Revolutionary Party in Mexico or the pro-Putin party in Russia?

Such questions are beyond the scope of this essay, but similar issues arise in statistics. Consider the choice of estimators or prior distributions for logistic regression. Table 1 gives an example of the results of giving specified doses of a toxin to 20 animals. Racine et al. (1986) fit a logistic regression to these data assuming independent binomial data with the logit probability of death being a linear function of dose. The maximum likelihood estimate for the slope is 7.8 with standard error of 4.9, and the corresponding Bayesian inference with flat prior distribution is similar (but with a slightly skewed posterior distribution; see Gelman et al. 2003, Section 3.7).

This noninformative analysis would usually be considered conservative--perhaps there would be some qualms about the uniform prior distribution (why defined on this particular scale), but with the maximum likelihood estimate standing as a convenient reference point and fallback. But now consider another option.

Instead of a uniform prior distribution on the logistic regression coefficients, let us try a Cauchy distribution centered at 0 with a scale of 2.5, assigned to the coefficient of the standardized predictor. This is a generic prior distribution that encodes the information that it is rare to see changes of more than 5 points on the logit scale (which is what it would take to shift a probability from 0.01 to 0.5, or from 0.5 to 0.99). Similar models have been found useful in the information retrieval literature (Genkin, Lewis, and Madigan, 2006). Combining the data in Table 1 with this prior distribution yields an estimated slope of 4.4 with standard error 1.9. This is much different from the classical estimate; the prior distribution has made a big difference.

Is this new prior distribution conservative? When coming up with it (and using it as the default in our bayesglm package in R), we thought so: the argument was that true logistic regression coefficients are almost always quite a bit less than 5 (if predictors have been standardized), and so this Cauchy distribution actually contains less prior information than we really have. From this perspective, the uniform prior distribution is the most conservative, but sometimes too much so (in particular, for datasets that feature separation, coefficients have maximum likelihood estimates of infinity), and this new prior distribution is still somewhat conservative, thus defensible to statisticians.
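
Here is a minimal R sketch of the two fits on the Table 1 data, using glm() and the bayesglm() function from the arm package. Argument names are as in arm; the exact numbers may differ slightly across versions, but the slope should come out near 7.8 under the flat prior and near 4.4 under the Cauchy prior.

library(arm)
bioassay <- data.frame(dose   = c(-0.86, -0.30, -0.05, 0.73),   # log(g/ml), from Table 1
                       n      = c(5, 5, 5, 5),
                       deaths = c(0, 1, 3, 5))

# Maximum likelihood (flat prior): slope roughly 7.8, standard error about 4.9
fit_mle <- glm(cbind(deaths, n - deaths) ~ dose,
               family = binomial(link = "logit"), data = bioassay)
display(fit_mle)

# Cauchy(0, 2.5) prior on the internally rescaled coefficient (prior.df = 1 gives Cauchy)
fit_cauchy <- bayesglm(cbind(deaths, n - deaths) ~ dose,
                       family = binomial(link = "logit"), data = bioassay,
                       prior.mean = 0, prior.scale = 2.5, prior.df = 1)
display(fit_cauchy)   # slope pulled toward zero, in the neighborhood of 4.4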

But from another perspective--that of prediction--our prior distribution is not particularly conservative, and the flat prior is even less so! Let us explain. We took the software of Genkin, Lewis, and Madigan (2005), which fits logistic regressions with a variety of prior distributions, and found that a Gaussian prior distribution with center 0 and scale 2.5 performed quite well as measured by predictive error from five-fold cross-validation, generally beating the corresponding Cauchy model (as well as the maximum likelihood estimate) when evaluated on a large corpus of datasets. The conclusion may be that the Gaussian distribution is better than the Cauchy at modeling the truth, or at least that this particular Gaussian prior distribution is closer in spirit to what cross-validation is doing: hiding 20% of the data and trying to make predictions using the model built on the other 80%.
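
As a rough sketch of that kind of comparison (on simulated data, not the text-categorization corpus used by Genkin, Lewis, and Madigan, and using bayesglm rather than their BBR software), one can compare the two priors by five-fold cross-validated log loss; prior.df = 1 gives the Cauchy and prior.df = Inf the Gaussian:

library(arm)
set.seed(2)
n <- 200; k <- 10
X    <- matrix(rnorm(n * k), n, k)
beta <- rnorm(k)                              # "true" coefficients, for simulation only
y    <- rbinom(n, 1, plogis(X %*% beta))
dat  <- data.frame(y = y, X)

fold    <- sample(rep(1:5, length.out = n))
logloss <- function(p, y) -mean(y * log(p) + (1 - y) * log(1 - p))

cv_loss <- function(prior_df) {
  mean(sapply(1:5, function(f) {
    fit <- bayesglm(y ~ ., data = dat[fold != f, ], family = binomial,
                    prior.scale = 2.5, prior.df = prior_df)
    p <- predict(fit, newdata = dat[fold == f, ], type = "response")
    logloss(p, dat$y[fold == f])
  }))
}

c(cauchy = cv_loss(1), gaussian = cv_loss(Inf))   # smaller is better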

This result is consistent with the hypothesis that our Cauchy prior distribution has more dispersion than the actual population of coefficients that might be encountered. But is it conservative? From the computer scientist's standpoint of prediction, it is the Gaussian prior distribution that is conservative, in yielding the lowest expected predictive error for a new dataset (to the best of our knowledge).

Thinking about binary data more generally, the most conservative prediction of all is 0.5 (that is, guessing that both outcomes are equally likely). From this perspective, one starts with the prior distribution and then uses data to gain efficiency, which is the opposite of the statistician's approach of modeling the data first. Which of these approaches makes more sense depends on the structure of the data, and more generally one can use hierarchical approaches that fit prior distributions from data. Our point here is that, when thinking predictively, weak prior distributions are not necessarily conservative at all, and as statisticians we should think carefully about the motivations underlying our principles.

Statistical arguments, like political arguments, sometimes rely on catchy slogans. When I was first learning statistics, it seemed to me that proponents of different statistical methods were talking past each other, with Bayesians promoting "efficiency" and "coherence" and non-Bayesians bringing up principles such as "exact inference" and "unbiasedness." We cannot, unfortunately, be both efficient and unbiased at the same time (unless we perform unbiased _prediction_ instead of _estimation_, in which case we are abandoning the classical definition of unbiasedness that conditions on the parameter value).

Statistics, unlike (say) physics, is a new field, and its depths are close to the surface. Hard work on just about any problem in applied statistics takes us to foundational challenges, and this is particularly so of Bayesian statistics. Bayesians have sometimes been mocked for their fondness for philosophy, but as Bayes (or was it Laplace?) once said, "with great power comes great responsibility," and, indeed, the power of Bayesian inference--probabilistic predictions about everything--gives us a special duty to check the fit of our model to data and to our substantive knowledge. In the great tradition of textbook writers everywhere, I know nothing at all about the example of Racine et al. (1986) given in Table 1, yet I feel reasonably confident that the doses in the experiment do not take the true probability of death from 0.003 to 0.999 (as would result from the odds ratio implied by the maximum likelihood estimate of 7.8). It seems much more conservative to me to suppose this extreme estimate to have come from sampling variation, as is in fact consistent with the model and data. Even better, ultimately, would be more realistic models that appropriately combine information from multiple experiments--a goal that is facilitated by technical advances such as those presented in the papers in this volume.
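
As a quick arithmetic check on that last claim, here is a self-contained sketch using the Table 1 data (the exact decimals may differ slightly from the 0.003-to-0.999 figures quoted above):

bioassay <- data.frame(dose = c(-0.86, -0.30, -0.05, 0.73), n = 5, deaths = c(0, 1, 3, 5))
fit_mle  <- glm(cbind(deaths, n - deaths) ~ dose, family = binomial, data = bioassay)
predict(fit_mle, newdata = data.frame(dose = c(-0.86, 0.73)), type = "response")
# fitted probabilities at the lowest and highest doses: roughly 0.003 and 0.998-0.999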


References:

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis, second edition. London: CRC Press.

Genkin, A., Lewis, D. D., and Madigan, D. (2005). BBR: Bayesian logistic regression software. Center for Discrete Mathematics and Theoretical Computer Science, Rutgers University. www.stat.rutgers.edu/~madigan/bbr/

Genkin, A., Lewis, D. D., and Madigan, D. (2006). Large-scale Bayesian logistic regression for text categorization. Technometrics.

Racine, A., Grieve, A. P., Fluhler, H., and Smith, A. F. M. (1986). Bayesian methods in practice: experiences in the pharmaceutical industry (with discussion). Applied Statistics 35, 93-150.


Table:

Dose (log g/ml)   Number of animals   Number of deaths
         -0.86                   5                  0
         -0.30                   5                  1
         -0.05                   5                  3
          0.73                   5                  5

Table 1. Bioassay data from Racine et al. (1986), used as an example for fitting logistic regression.

Posted by Andrew at 8:04 AM | Comments (0) | TrackBack

June 12, 2007

Total vs. marginal effects, or, Are the overall benefits of health care "probably minor"?

I was having an interesting discussion with Seth about his claim that "the overall benefits of health care are probably minor." The basis of his claim is evidence cited by Aaron Swartz:

In the 1970s, the RAND Corporation picked out 7700 people in six cities and gave half of them free health care. Those lucky ones took advantage of it (spending 30-40% more on average) and they spent it on reasonable things (as judged by medical observers), but they didn’t seem to get any healthier. . . . The RAND study was by far the biggest study of this kind, but other studies find similar results. One analysis found that regions whose Medicare programs give out more money (when the underlying healthiness of the residents is held constant) see no increase in survival rates. A replication found the same results in VA hospitals. Cross-national comparisons find “the impact of public spending on health is … both numerically small and statistically insignificant”. Correlational studies find “Environmental variables are far more important than medical care.” And there are more where that came from.

Several discussants (including myself) at Seth's blog were skeptical about his skepticism, citing various successful medical treatments (in my case, fixing a compound fracture of the wrist; others mentioned cancer treatment, etc.). Seth responded:

The RAND study, of course, is limited — but is there a better attempt to figure out the overall value of medicine? I don’t know of one. if you can point me to a study that shows the more-than-minor value of modern medicine I’d love to look at it. . . . when the overall effectiveness of medicine has come under scrutiny, it has not fared well — and the RAND study is a good example.

Total vs. marginal effects

I have not looked at the RAND study so can't comment on the details, but my first thought is that the marginal benefits from additional health care will be less than the benefits from good existing care. So, even if more is not much better, that doesn’t mean that the overall benefits of existing care are “minor.”

From a policy standpoint, it is the marginal effects that are the most interesting, since nobody (even Seth?) is proposing to zero out medical treatment. Presumably there are diminishing returns, and the benefit/cost ratio for additional treatment is less than that for existing treatment. (And, indeed, some medical care can make things worse, even in expected value; for example, you can catch the flu in the doctor's waiting room.) But, unless I'm missing something, Seth and Aaron are confusing marginal with total effects.

P.S. Also see Robin Hanson's discussion (with lots of links), which explicitly distinguishes between marginal and total effects. Here I'm not expressing any position on the marginal effects of health care (given my ignorance on the topic), just pointing out that Robin's position seems to have become overstated by others.

P.P.S. See Jake Bowers's comments below. Also more discussion here.

Posted by Andrew at 7:46 AM | Comments (14) | TrackBack

May 18, 2007

Sex ratio at birth in the US

I have been looking at the Trend Analysis of the Sex Ratio at Birth in the United States. It provides a chart analogous to the one posted previously for China.

sex_ratio_US.png

There are some parallels: in times of war (WW2 from 1940-1945, the civil war in China 1945-1953, Vietnam for the US around 1970) there is a higher ratio of boys to girls. But this does not fully explain the shift towards boys in China from 1985 onwards...

There are two immediate hypotheses. One is that economic development resembles a war, either by removing men from the population or by increasing stress. The other is that birth order or the mother's age at birth influences the sex ratio. The following two charts show that the dependence of the sex ratio at birth on birth order is cleaner than the dependence on mother's age (the weird colored bars are actually confidence intervals):

ratio_by_age.png

ratio_by_order.png

There are several theories for this. A divorce is less likely if the wife bears a son than if she bears a daughter; in that sense, bearing sons earlier and daughters later would be adaptive for a woman. In 1997, Manning suggested that war leads to a larger age difference between men and women. Dominant men have more boys and dominant women more girls. For example, there is an overwhelming predominance of boys born to US presidents, but more girls born to attractive parents. Finally, a very comprehensive discussion by Martin et al. from 1994 suggests that it all has to do with coital frequency.

Who knows? But with the right data, these hypotheses could be evaluated statistically. To me, the most persuasive explanation is indeed dominance.

Posted by Aleks Jakulin at 9:00 AM | Comments (3) | TrackBack

May 17, 2007

Campaign contributions

A colleague writes,

We've been looking at donations from 2004 to candidates and parties... data from the FEC (it's part of a larger project we're working on... the election is not ultimately interesting to us). Anyway, we noticed that the contributions from LA County represented 2.5% in terms of raw count and 3% in terms of value. This seemed small to me, but matches the population % for LA County (we're about 10 million people).

Do these 2.5% and 3% numbers sound right? I would have thought the metropolitan areas would have been higher.

Posted by Andrew at 12:21 AM | Comments (2) | TrackBack

May 14, 2007

China's missing girls

China has more boy babies, compared to girls, than would be expected from the usual biological sex ratio. Monica Das Gupta has written a quick summary of her research explaining why she attributes this to a preference for sons (resulting in differential rates of abortion and, possibly, infanticide, by sex).

fig1_missing_women.gif

fig3_missing_women.gif

Link from Marshall Jevons, also some earlier discussion by me here in the context of Emily Oster, an economist who's looked at this problem from a different perspective. Oster attributes the sex ratio to hepatitis infections but that doesn't seem so plausible given this graph from Das Gupta:

fig2_missing_women.gif

Posted by Andrew at 11:02 AM | Comments (1) | TrackBack

May 4, 2007

Effectiveness of geriatric specialists, leading to a brief discussion of the "separate accounts" fallacy in decision making and a comparison of the climates of Baltimore and St. Paul

I'd like to move from basketball to something more important: geriatric care, a topic I was reminded of after reading this interesting article by Atul Gawande.

The article starts with some general discussion of the science of human aging, then moves to consider options for clinical treatment. Gawande learns a lot from observing a gerontologist's half-hour meeting with a patient. He tells a great story (too long to make sense to repeat here), although I suspect he was choosing the best out of the many patients he observed. He notes:

In the story of Jean Gavrilles and her geriatrician, there’s a lesson about frailty. Decline remains our fate; death will come. But, until that last backup system inside each of us fails, decline can occur in two ways. One is early and precipitately, with an old age of enfeeblement and dependence, sustained primarily by nursing homes and hospitals. The other way is more gradual, preserving, for as long as possible, your ability to control your own life.

Good medical care can influence which direction a person’s old age will take. Most of us in medicine, however, don’t know how to think about decline. We’re good at addressing specific, individual problems: colon cancer, high blood pressure, arthritic knees. Give us a disease, and we can do something about it. But give us an elderly woman with colon cancer, high blood pressure, arthritic knees, and various other ailments besides—an elderly woman at risk of losing the life she enjoys—and we are not sure what to do.

Gawande continues with a summary of this study:

Several years ago, researchers in St. Paul, Minnesota, identified five hundred and sixty-eight men and women over the age of seventy who were living independently but were at high risk of becoming disabled because of chronic health problems, recent illness, or cognitive changes. With their permission, the researchers randomly assigned half of them to see a team of geriatric specialists. The others were asked to see their usual physician, who was notified of their high-risk status. Within eighteen months, ten per cent of the patients in both groups had died. But the patients who had seen a geriatrics team were a third less likely to become disabled and half as likely to develop depression. They were forty per cent less likely to require home health services.

Little of what the geriatricians had done was high-tech medicine: they didn’t do lung biopsies or back surgery or PET scans. Instead, they simplified medications. They saw that arthritis was controlled. They made sure toenails were trimmed and meals were square. They looked for worrisome signs of isolation and had a social worker check that the patient’s home was safe.

But now comes the kicker:

How do we reward this kind of work? Chad Boult, who was the lead investigator of the St. Paul study and a geriatrician at the University of Minnesota, can tell you. A few months after he published his study, demonstrating how much better people’s lives were with specialized geriatric care, the university closed the division of geriatrics.

“The university said that it simply could not sustain the financial losses,” Boult said from Baltimore, where he is now a professor at the Johns Hopkins Bloomberg School of Public Health.

One of the problems comes from the "separate accounts" fallacy in decision making:

On average, in Boult’s study, the geriatric services cost the hospital $1,350 more per person than the savings they produced, and Medicare, the insurer for the elderly, does not cover that cost. It’s a strange double standard. No one insists that a twenty-five-thousand-dollar pacemaker or a coronary-artery stent save money for insurers. It just has to maybe do people some good. Meanwhile, the twenty-plus members of the proven geriatrics team at the University of Minnesota had to find new jobs. Scores of medical centers across the country have shrunk or closed their geriatrics units. Several of Boult’s colleagues no longer advertise their geriatric training for fear that they’ll get too many elderly patients. “Economically, it has become too difficult,” Boult said.

But the finances are only a symptom of a deeper reality: people have not insisted on a change in priorities. We all like new medical gizmos and demand that policymakers make sure they are paid for. They feed our hope that the troubles of the body can be fixed for good. But geriatricians? Who clamors for geriatricians? What geriatricians do—bolster our resilience in old age, our capacity to weather what comes—is both difficult and unappealingly limited. It requires attention to the body and its alterations. It requires vigilance over nutrition, medications, and living situations.

On the plus side, Baltimore has much better weather than St. Paul.

From the article by Boult et al. (you might notice a shift in style from the New Yorker to the Journal of the American Geriatrics Society):

PARTICIPANTS: A population-based sample of community-dwelling Medicare beneficiaries age 70 and older who were at high risk for hospital admission in the future (N = 568).

INTERVENTION: Comprehensive assessment followed by interdisciplinary primary care.

MEASUREMENTS: Functional ability, restricted activity days, bed disability days, depressive symptoms, mortality, Medicare payments, and use of health services. Interviewers were blinded to participants' group status.

RESULTS: Intention-to-treat analysis showed that the experimental participants were significantly less likely than the controls to lose functional ability (adjusted odds ratio (aOR) = 0.67, 95% confidence interval (CI) = 0.47–0.99), to experience increased health-related restrictions in their daily activities (aOR = 0.60, 95% CI = 0.37–0.96), to have possible depression (aOR = 0.44, 95% CI = 0.20–0.94), or to use home healthcare services (aOR = 0.60, 95% CI = 0.37–0.92) during the 12 to 18 months after randomization. Mortality, use of most health services, and total Medicare payments did not differ significantly between the two groups. The intervention cost $1,350 per person.

CONCLUSION: Targeted outpatient GEM slows functional decline.

P.S. Dennis Miller alert: Since I'm mentioning the New Yorker, I'll have to link to this again.

Posted by Andrew at 12:15 AM | Comments (4) | TrackBack

May 1, 2007

Mental hospital, prison, and homicide rates

Bruce McCullough points me to this note by Bernard Harcourt on the negative correlation between the rates of institutionalization and homicide. Basically, when more people have been in mental hospitals, there have been fewer homicides, and vice-versa.

It makes sense since, presumably, men who are institutionalized are more likely to commit crimes, so I'm surprised that Harcourt describes his results as "remarkable--actually astounding. These regressions cover an extremely lengthy time period . . . a large number of observations . . . and the results remain robust and statistically significant . . ." With a large data set, you're more likely to find statistical significance. Especially when the main result is so plausible in the first place.

Harcourt concludes with some interesting comments about the applicability of his results. (I'd also like to recommend the paper by Donohue and Wolfers on death penalty deterrence as a model example of this sort of analysis.)

P.S. See here for an update by Harcourt, where he explains why he finds his results surprising. I'm not convinced--I believe the results are important, just not that they're surprising.

Funny stuff

Harcourt's blog entry had some amusing comments:

I don't understand why you're including standard errors and p-values in your results.

What is your stochastic model, exactly? If I understand correctly, the underlying data (e.g. the crime rates) are population statistics, not sample estimates, correct?

So where is the randomness coming from?

(Good question. The randomness in the model comes from state-to-state and year-to-year variation.)

Graphs with two y-axes on different scales make Baby Jesus cry, especially when the axes aren't labeled.

(Actually, I understand from Howard Wainer that scatterplots are a fairly recent invention, probably not around in Jesus's time.)

Can you explain your findings in English for people like me who do not speak graph?

(Google clearly needs to implement that Graph -> English translator.)

Remember when this blog was all sweetness and light and Eugene's insightful comments on a variety of topics and puzzleblogger Kevan Choset's interesting observations and Adler's/Juan's snarky comments?

This blog used to be fun. Now, whoa, Ilya thinks he probably didn't (but maybe did!) change US policy on drug eradication in Afghanistan, and we've got graphs with two Y-axes on different scales, and it's all wonk all the time. Why have we abandoned the idea that posts should be entertaining and interesting to someone other than the author?

Include me out!

That's pretty funny, but even better is the note below the comment area:

Comment Policy: We'd like the posts to be civil, of course (no profanity, personal insults, and the like), but we're also hoping that people try to be as calm, reasoned, and substantive as possible. So please, also avoid rants, invective, substantial and repeated exaggeration, and radical departures from the topic of the thread.

Hey, I'd love to have some good rants here . . .

The note continues:

Here's a tip: Reread your post, and think of what people would think if you said this over dinner. If you think people would view you as a crank, a blowhard, or as someone who vastly overdoes it on the hyperbole, rewrite your post before hitting enter.

And if you think this is the other people's fault -- you're one of the few who sees the world clearly, but fools wrongly view you as a crank, a blowhard, or as someone who overdoes it on the hyperbole -- then you should still rewrite your post before hitting enter. After all, if you're one of the few who sees the world clearly, then surely it's especially important that you frame your arguments in a way that is persuasive and as unalienating as possible, even to fools.

In all seriousness, I doubt that this advice will work. I'm afraid a delusional person will not be able to process this sort of rational, well-intentioned advice. But I guess it doesn't hurt to try.

Posted by Andrew at 11:41 AM | Comments (6) | TrackBack

April 12, 2007

Books on nutrition

Seth recommends:

The Queen of Fats, by Susan Allport

Nutrition and Physical Degeneration, by Weston Price

The first of these books is recent; the other is from 1930 or so.

Posted by Andrew at 8:52 AM | Comments (0) | TrackBack

April 5, 2007

What drives media slant? Evidence from U.S. daily newspapers

Boris pointed me to this paper by Matthew Gentzkow and Jesse Shapiro. Here's the abstract:

We [Gentzkow and Shapiro] construct a new index of media slant that measures whether a news outlet's language is more similar to a congressional Republican or Democrat. We apply the measure to study the market forces that determine political content in the news. We estimate a model of newspaper demand that incorporates slant explicitly, estimate the slant that would be chosen if newspapers independently maximized their own profits, and compare these ideal points with firms' actual choices. Our analysis confirms an economically significant demand for news slanted toward one's own political ideology. Firms respond strongly to consumer preferences, which account for roughly 20 percent of the variation in measured slant in our sample. By contrast, the identity of a newspaper's owner explains far less of the variation in slant, and we find little evidence that media conglomerates homogenize news to minimize fixed costs in the production of content.

It appears that newspapers are more liberal in liberal cities and more conservative in conservative cities.

I like the idea of what they are doing, but I have some difficulties with the implementation. For example, they consider the phrases "death tax," "tax relief," "personal account," and "war on terror" (identified as strongly Republican), and "estate tax," "tax break," "private account," and "war in Iraq" (identified as strongly Democratic). What bothers me here is that these terms have different factual implications. Just going through these one at a time:

- "Estate tax" is, I believe, the standard term. "Death tax" is pretty much explicitly partisan, designed to shift the debate.
- "Tax relief" and "tax break" both sound descriptive to me. (I wouldn't mind either one right about now, actually...) I'll take their word for it that the Dems use one of these and the Reps use the other, but I wouldn't call these ideologically slanted.
- "Personal account" and "private account" sound the same to me also! Again, these may differ in the world of Social Security focus groups, but neither sounds slanted.
- "The war on terror" and "the war in Iraq" are both happening. They overlap but are not identical. Again, I don't see a slant. A Republican could argue that the war in Iraq is a key part of the war on terror, a Democrat could argue that the war in Iraq is a distraction from the war on terror, but both phrases seem legitimate to me.

Posted by Andrew at 12:20 AM | Comments (0) | TrackBack

April 4, 2007

Climate change is predicted to reduce U.S. crop yields by 25%-80%

Wolfram Schlenker of our economics department is presenting this paper by himself and Michael Roberts on the effects of climate change. The talk is this Thursday, 11:30-1, in 717 IAB. Here's the abstract:

There has been an active debate whether global warming will result in a net gain or net loss for United States agriculture. With mounting evidence that climate is warming, we show that such warming will have substantial impacts on agricultural yields by the end of the century: yields of three major crops in the United States are predicted to decrease by 25-44% under the slowest warming scenario and 60-79% under the most rapid warming scenario in our preferred model. We use a 55-year panel of crop yields in the United States and pair it with a unique fine-scale weather data set that incorporates the whole distribution of temperatures between the minimum and maximum within each day and across all days in the growing season. The key contribution of our study is in identifying a highly non-linear and asymmetric relationship between temperature and yields. Yields increase in temperature until about 29C for corn and soybeans and 33C for cotton, but temperatures above these thresholds quickly become very harmful, and the slope of the decline above the optimum is significantly steeper than the incline below it. Previous studies average temperatures over a season, month, or day and thereby dilute this highly non-linear relationship. We use encompassing tests to compare our model with others in the literature and find its out-of-sample forecasts are significantly better. The stability of the estimated relationship across regions, crops, and time suggests it may be transferable to other crops and countries.

50% declines in crop yields--that's pretty scary! Getting to the statistics, Schlenker points out that weather can be considered a natural experiment with effects on crop yields, but that if the effects are nonlinear, you can't just use weather that has been broadly aggregated over space and time.

My main substantive question would be about potential effects of mitigation (such as switching crops). Also here are some specific comments (bearing in mind that I haven't had a chance to look at the paper in detail):

- I can't believe it's a good idea to fit 6th-order polynomials. I mean, if you want a 6-parameter family, why polynomial? I'd think a spline would make more sense. (A quick sketch comparing the two appears after these comments.)

- The tables should be graphs. Really really really. Tables 1 and 2 should be a series of line plots with temperature on the x-axis. This is a gimme. Tables 3-9 should be displayed graphically also. In addition, temperature should be per 10 degrees so that the coefs are more interpretable; also, if you must use a table, use fewer significant figs. Precip should also be on a more interpretable scale (you can see the problem by noting the tiny coef on Precip squared).

- The color scheme in Fig 1 should be fixed. In particular, it's not clear if Florida is Interior or Irrigated. Also, the caption says "counties" but the graph seems to be of states.

- The county maps are pretty. Would be improved by either eliminating the borders between counties or making them very very light gray. As it is, they interfere with the gray scheme. Also, I'd remove the N/A counties entirely, rather than coloring them in white, which looks too much like one of the colors in the map.

Finally--and most importantly--the figures are ok, but what's missing is a check that the models fit the data. The paper makes a strong substantive claim that might very well be disputed, so I recommend trying to do some of these checks right away: I'd like to see some plots of the data, along with plots of replicated data under the model, to reveal what aspects of the data are not being captured.

One thing that might be helpful would be to make these model-checking plots, first for a linear model of the form implicitly fit by others, then using the current model, to see the improvement in fit.
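
To make the spline-versus-polynomial comment above concrete, here is a sketch on simulated temperature-yield data with the kind of asymmetric peak the abstract describes. This is an illustration only, not a reanalysis of Schlenker and Roberts's data.

library(splines)
set.seed(3)
temp  <- runif(500, 10, 40)
yield <- ifelse(temp < 29, 0.05 * (temp - 10), 0.95 - 0.3 * (temp - 29)) +
         rnorm(500, 0, 0.1)                       # fake yields: rise to 29C, then drop steeply

fit_poly   <- lm(yield ~ poly(temp, 6))           # the 6th-order polynomial
fit_spline <- lm(yield ~ ns(temp, df = 6))        # a natural spline with the same df

grid <- data.frame(temp = seq(10, 40, by = 0.5))
plot(temp, yield, col = "grey")
lines(grid$temp, predict(fit_poly, grid), lty = 2)
lines(grid$temp, predict(fit_spline, grid), lty = 1)
# the spline typically tracks the kink near 29C with less wiggle at the ends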

Posted by Andrew at 12:35 PM | Comments (9) | TrackBack

March 8, 2007

Postdoc opportunity in network analysis

Simon Frost sent this in:

Postdoctoral position MODELING SOCIAL, SEXUAL, REFERRAL, AND HIV TRANSMISSION NETWORKS Division of Comparative Pathology Department of Pathology University of California San Diego

DESCRIPTION: A fully funded postdoctoral position (up to two years) will be
available starting July 1st 2007 to work on modeling sexual, social, and HIV
transmission networks. Because of the requirements of the funding mechanism,
this position is open to US citizens only.

RESEARCH GROUP: The position is based in the laboratory of Dr. Simon Frost
(http://www.simonfrost.com). The successful candidate will develop, implement,
and apply cutting-edge statistical methods for the analysis of a variety of
network-level data, including viral phylogenies, affiliation data between
individuals and meeting places, and respondent-driven sampling referral
networks. There is the potential to develop and co-supervise undergraduate
and graduate research projects.

LOCATION: University of California, San Diego. The position is based at the
Antiviral Research Center (http://www.avrctrials.org), situated in the
Hillcrest area (http://www.hillquest.com) near downtown San Diego.

REQUIREMENTS: A Ph.D. in statistics, network analysis, mathematical or
computational biology, or similar. Evidence of research productivity as
indicated by scholarly publications is required. Sound skills in C/C++
programming, algorithms, and the analysis of network data are a prerequisite.
Experience in advanced statistics (exponential random graph models, random
effects models, Markov chain Monte Carlo) is a plus.
Evidence of strong communication and teamwork skills is highly desired.

SALARY: Salaries are set at standard NIH scales, and are commensurate
with experience.

APPLICATION: Please send letter of interest, C.V., and the names and
contact details of three referees by April 9th, 2007 to: Postdoctoral
Position in Viral Evolution and Dynamics, Dr. Simon Frost, UCSD Antiviral
Research Center, 150 W. Washington St., San Diego CA 92103, USA. Electronic
application materials (PDF, Word) are preferred - please email to sdfrost
at ucsd.edu. Review of applications will begin immediately, and continue
until the positions are filled.

Posted by Andrew at 12:49 AM | Comments (0) | TrackBack

February 28, 2007

Watching faces on TV in the morning may cure depression

Seth Roberts did some self-experimentation several years ago and found that watching faces on TV in the morning improved his mood (see here for a link to his article on this research along with some of my thoughts). A while back, I email-interviewed Seth on this. The interview never appeared anywhere and we just dug it up, so I'm posting it here. (Seth will also post it on his blog, which has many of his thoughts on self-experimentation.)

Andrew Gelman: Why don't you start by describing your method of using TV watching to cure depression?

Seth Roberts: To feel better, you watch faces on TV in the morning and avoid faces (televised and real) at night. TV faces are beneficial in the morning and harmful at night only if they resemble what you would see during an ordinary conversation. The TV faces must be looking at the camera (both eyes visible) and close to life-size. (My experiments usually use a 27-inch TV.) Your eyes should be about three feet from the screen. Time of day is critical--if you see the TV faces too early or late they will have no effect. The crucial time of day depends on when you are exposed to sunlight but figuring out the best time of day is mainly trial and error right now. I usually have subjects start watching around 7 a.m. They watch about 50 minutes of faces each morning, and so do I.

Most mornings I watch little snippets of TV shows with plenty of faces looking at the camera, such as The News Hour with Jim Lehrer (PBS), the Talking Points section of The O'Reilly Factor (Fox News), Washington Journal (C-SPAN), and Larry King Live (CNN), that I taped the day before. I usually fast-forward through the non-big-face portions. The best TV show for this research is Booknotes (C-SPAN), on Sunday, which I watch in pieces throughout the week. My subjects watch tapes of Booknotes.

AG: How did you come up with this idea?

SR: By accident. I was trying to improve my sleep--wake up too early less often. I suspected that the problem (early awakening) was due to a difference between my life and Stone-Age life. I knew that human contact has a big effect on sleep--we tend to be awake at the times of day that we have contact with other people. In the Stone Age, I believed, people usually chatted with their neighbors early in the morning, whereas I lived alone and might work alone all morning. Maybe the lack of morning chit-chat caused early awakening. To test this idea, I took advantage of results suggesting that late-night TV can have the same effect as human contact on our sleep/wake rhythm. I taped the Leno and Letterman monologues and watched them early one Monday morning. This had no obvious effect. I fell back asleep. The rest of the day was normal. On Tuesday, however, I woke up and felt great--cheerful, calm, full of energy. I had never before felt so good early in the morning. Yet the preceding night and day had been ordinary in every way--except for the morning TV.

AG: When I tell my friends about this idea--"I know a guy who says you can cure depression by watching TV in the morning"--it sounds really nutty. Does it sound nutty to you?

SR: No, because I know a few things your friends may not: (a) the effect is produced by seeing faces on TV, not just any TV; (b) when we have contact with other people has a big effect on when we are awake; and (c) there are many connections between depression and circadian rhythms. Depression is closely connected with insomnia, for instance.

AG: I generally think of TV as an evil, addictive presence in American life. Do you think there's something dangerous about giving TV this "badge of approval" as a medical treatment?

SR: It's not quite a "badge of approval." Seeing faces on TV at NIGHT--which of course is when most people watch--is harmful, my research suggests, if the faces are close to life-size. And they often are. Maybe TVs will be made with variable picture sizes--one size for morning, another size for night. When I watch TV at night (very rare), I stay as far away as possible.

AG: I mean, if this method really worked, I could imagine the Depression Network running talk shows in the morning that are basically infomercials for Prozac or whatever. Would you worry about that?

SR: No. I watch faces on TV every morning and would appreciate more choice. I suspect the morning shows would not be Prozac infomercials, however, because the people watching would not be depressed.

AG: One thing that bothers people about your plan is the idea of TV as a substitute for human contact. I think that most of us--even people who spend a lot of time watching TV--find this idea upsetting. It's like "Brave New World" and virtual reality. Are you at all bothered by recommending to depressed people that they sit inside watching TV?

SR: "Substitute for human contact"? True, but why is that so bad? Reading--which TV critics, many of them writers, seem invariably to like--is also a substitute for human contact, of course. Agriculture is a substitute for hunting and foraging. Vitamin pills substitute for food. Civilization is all about substitutes--about being able to fulfill needs in many ways.

Still, I think watching faces on TV in the morning is only a partial solution to the problem of depression, just as nutritional supplements (e.g., iodized salt, folate added to flour) are only partial solutions to the problems caused by a poor diet. A fuller solution would include changing when most people work. The usual pattern is work (morning and afternoon) then socialize (evening). A better pattern would be socialize (early morning) then work (late morning to early evening)--and go to bed early. I do my little bit for the revolution by inviting friends to brunch rather than dinner. The revolution would also include picture phones with life-size faces.

AG: I heard you say once that depression is ten times as common now as it was 100 years ago. Where do you get that information from?

SR: Many articles have made that point. One of them is: Klerman, G. L., & Weissman, M. M. (1989). Increasing rates of depression. Journal of the American Medical Association, 261, 2229-2235.

AG: If depression is a consequence of modern life, do you think there's something strange about seeking a technological solution for it? It's sort of like saying, people are too atomized, so let's solve the problem with even more solitude?

SR: It is one of many technological solutions to problems caused by "atomization"--people being farther apart. Telephones, air travel, and email are other examples. So it isn't strange. If my subjects are any guide, watching TV for an hour every morning would not increase the solitude of most depressed persons. They are already alone during that time.

AG: Would listening to the radio be OK?

SR: No. You have to see faces.

AG: Have you ever tried to get your research sponsored by TV stations or networks or, for that matter, a publication like TV Guide?

SR: No, but I once put a "TV is good" ad (ABC) on my bulletin board.

[interview ends]

My thoughts on reading this several years later

Wow, that was really fun to read. I should do more interviews. The back-and-forth of the friendly interview can really get to some points that don't come up in the usual article format.

I wonder if Seth is still watching faces every morning, also how things are going with the people in his study.

Posted by Andrew at 12:07 AM | Comments (3) | TrackBack

February 20, 2007

Bioassays with R

Aleks sent along info on this project by Christian Ritz and Jens C. Streibig for an R package for bioassays. We should talk with them since we're currently developing our Bayesian method for serial dilution assays (see our 2004 Biometrics paper here) and implementing it partly in R.

Posted by Andrew at 12:17 AM | Comments (0) | TrackBack

February 4, 2007

Norwegian fraud update

Kjetil Halvorsen reports that news from the fake cancer study in Norway (best quote: "of the 908 people in Sudbo's study, 250 shared the same birthday") has been summarized here. No jail time, but his license to practice medicine and dentistry was revoked. No big fat fine and no community service--maybe the authorities decided that prosecution was too much effort. I still think that at the very least he should have to clean the toilets at a dental clinic every week for a few years. (But no, I don't think I should be punished for the false theorem I published--that's different since it was a mistake, not fraud.)
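
Just to spell out why that quote is so damning, here is a quick simulation (an illustration, not part of the actual case): with 908 genuinely random birthdays, the most common single birthday typically shows up fewer than a dozen times, nowhere near 250.

# Most common birthday among 908 people with random birthdays, over many simulations
set.seed(4)
max_shared <- replicate(10000, max(table(sample(1:365, 908, replace = TRUE))))
summary(max_shared)   # typically around 7 to 10; 250 is essentially impossible by chance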

Posted by Andrew at 1:28 PM | Comments (0) | TrackBack

January 31, 2007

Families of prisoners

Ernest Drucker sent along this paper along with this story:

I [Drucker] have been speaking about the problem of mass imprisonment for years to anyone who would listen -- mostly professional groups and students. Once I spoke to the Urban League national convention in Pittsburgh, to little response. But one such talk was to the medical students at Einstein, where I teach; they had organized a social medicine course outside the formal curriculum, and I was happy to see their interest went beyond clinical medicine. My topic was the epidemiology of incarceration. I showed all my usual PPT slides -- tables of data showing the sharp rise in imprisonment in the USA over the last 30 years, and how far imprisonment had spread in the black community. I talked about the epidemic and our country's drug war policies -- something I've done dozens of times.

But in the audience was Dean S, who came up to me after the presentation and asked if I'd be willing to give the same talk to her group of students -- who, it turned out, were all in high school in the Bronx. They were in the Einstein program to bring Bronx HS kids into the medical school labs and hospitals, to let them see about medical science and maybe interest them in careers in health in some way. As select HS students many would go to college, so maybe it was a bit of early recruiting of local talent for Einstein admission in 4 or 5 years. They came from most of the public and parochial high schools of the Bronx, but these kids were the pick of the crop. To be in the program they had to sign up in advance for a limited number of slots, they had to get up there to Einstein every week for a term, and their parents were supposed to come in too, for a conference with Dean S about their progress. These were serious kids from families that supported their academic goals and valued education enough to go to some extra trouble to cultivate it.

As I often do with audiences, I asked who had ever had a family member or close friend go to prison. To my amazement, they all raised their hands -- 100% of them had a family member who had been in prison. That's a very simple and striking measure of the prevalence of incarceration at this time in the Bronx: every family in this select group was affected directly by incarceration.

We'll be talking with Ernie this Thursday in the social networks working group. His work seems related to the ideas of this paper (or at least to its title).

Posted by Andrew at 12:37 AM | Comments (4) | TrackBack

January 18, 2007

Going to college may be bad for your brain?

Jeff Lax pointed me to this online article by Jeanna Bryner:

Higher education tied to memory problems later, surprising study finds

Going to college is a no-brainer for those who can afford it, but higher education actually tends to speed up mental decline when it comes to fumbling for words later in life.

Participants in a new study, all more than 70 years old, were tested up to four times between 1993 and 2000 on their ability to recall 10 common words read aloud to them. Those with more education were found to have a steeper decline over the years in their ability to remember the list, according to a new study detailed in the current issue of the journal Research on Aging. . . .

As Jeff pointed out, they only consider the slope and not the intercept. Perhaps the college graduates knew more words at the start of the study?

Here's a link to the study by Dawn Alley, Kristen Southers, and Eileen Crimmins. Looking at the article, we see "an older adult with 16 years of schooling or a college education scored about 0.4 to 0.8 points higher at baseline than a respondent with only 12 years of education." Based on Figures 1 and 2 of the paper, it looks like higher-educated people know more words at all ages, hence the title of the news article seems misleading.

The figures represent summaries of the fitted models. I'd like to see graphs of the raw data (for individual subjects in the study and for averages). It's actually pretty shocking to me that in a longitudinal analysis, such graphs are not shown.
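
For readers who want the slope-versus-intercept point spelled out, here is a minimal sketch of a model that separates the baseline gap (intercept) from the rate of decline (slope), using lmer from the lme4 package on simulated long-format data. The variable names and numbers are invented for illustration; this is not the model fit by Alley et al.

# Simulated recall data: college graduates start higher but decline slightly faster
library(lme4)
set.seed(5)
n_id    <- 300
id      <- rep(1:n_id, each = 4)
wave    <- rep(0:3, times = n_id)
college <- rep(rbinom(n_id, 1, 0.3), each = 4)
b0      <- rep(rnorm(n_id, 0, 1.0), each = 4)    # person-level intercepts
b1      <- rep(rnorm(n_id, 0, 0.2), each = 4)    # person-level slopes
recall  <- 5 + 0.6 * college - 0.4 * wave - 0.15 * wave * college +
           b0 + b1 * wave + rnorm(4 * n_id, 0, 0.7)
dat <- data.frame(recall, wave, college, id)

fit <- lmer(recall ~ wave * college + (wave | id), data = dat)
fixef(fit)   # 'college' is the baseline gap; 'wave:college' is the difference in slopes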

Posted by Andrew at 4:08 PM | Comments (3) | TrackBack

January 17, 2007

Not getting the Nobel Prize reduces your expected lifespan by two years

Andrew Oswald (see here and here) sends in this paper. Here's the abstract:

It has been known for centuries that the rich and famous have longer lives than the poor and ordinary. Causality, however, remains trenchantly debated. The ideal experiment would be one in which status and money could somehow be dropped upon a sub-sample of individuals while those in a control group received neither. This paper attempts to formulate a test in that spirit. It collects 19th-century birth data on science Nobel Prize winners and nominees. Using a variety of corrections for potential biases, the paper concludes that winning the Nobel Prize, rather than merely being nominated, is associated with between 1 and 2 years of extra longevity. Greater wealth, as measured by the real value of the Prize, does not seem to affect lifespan.

The natural worry here is a selection bias, in which people who die at age X are less likely to receive the prize (for example, if you die at age 60, but you would have received the prize had you lived past age 62). The authors address this using a survival-analysis approach to condition on the age at which the relevant scientists are nominated for or receive the prize.
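
A minimal sketch of that survival-analysis idea (on simulated data in counting-process form, not the authors' actual analysis): treat winning as a covariate that switches on at the age of the award, so the years before winning count as nominee person-years.

# Cox model with winning as a time-varying covariate; all numbers are fake
library(survival)
set.seed(6)
n <- 500
age_nom   <- round(runif(n, 40, 70))               # age at first nomination
winner    <- rbinom(n, 1, 0.2)
age_win   <- age_nom + 1 + rpois(n, 4)             # age at award, for eventual winners
age_death <- age_nom + 1 + round(rexp(n, 1 / 25))  # age at death

# counting-process form: split each winner's follow-up at the award age
rows <- lapply(1:n, function(i) {
  if (winner[i] == 1 && age_win[i] < age_death[i]) {
    data.frame(start = c(age_nom[i], age_win[i]),
               stop  = c(age_win[i], age_death[i]),
               died  = c(0, 1),
               won   = c(0, 1))
  } else {
    data.frame(start = age_nom[i], stop = age_death[i], died = 1, won = 0)
  }
})
nobel <- do.call(rbind, rows)

fit <- coxph(Surv(start, stop, died) ~ won, data = nobel)
summary(fit)   # exp(coef) for 'won' below 1 would mean lower mortality after winning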

Two years is a large effect, but at the same time I could imagine this difference occurring from some sort of statistical artifact, so I wouldn't call such a study conclusive. Still, it adds to the literature on status, health, and longevity.

Explanation of the title of this blog entry

Thinking more about the particular case of Nobel Prizes, I've long thought that the pain of not receiving the prize is far greater, on average, than the joy of receiving it. Feeling like you deserve the prize, then not getting it year after year . . . that can be frustrating, I'd think. Sort of like waiting for that promotion that never comes. Getting it, on the other hand, I'm sure is nice, but so many more eligible people don't get it than do (and the No comes year after year). I'd guess that it's a net reducer of scientists' lifespans.

Posted by Andrew at 12:39 AM | Comments (4) | TrackBack

January 5, 2007

Food regulations and natural experiments?

Janet Rosenbaum writes:

New York City has recently required restaurants with uniform menus to post calorie content on their menus with a font size equal to the prices. This initiative may not decrease obesity, but if we're able to gather good data, posting calories on menus could help us better understand how people choose food.

Currently, we don't have a good understanding of how people choose what they eat. Observations of people's food choices through nutritional surveys and food diaries tell us only what people will admit to eating. Laboratory experiments tell us how people who volunteer for psychology experiments choose foods in a new environment, but may not generalize to larger populations in real life situations. Non-laboratory experiments with vending machines have found that people will buy more healthy foods when healthy food is "subsidized" and when less healthy food is "taxed", but nutritional information is not immediately available to subjects even in these experiments: the foods which were manipulated were pretty obvious candidates for healthy and unhealthy foods such as carrot sticks and potato chips.

We also don't know how much knowledge about food people have: when someone chooses a high calorie food, we don't know whether they have chosen that food in ignorance of its calorie content or despite its calorie content. Putting calories on the menu in a visible way gives consumers information which is more readily available than on food packages, and reduces the second problem: some people will read the calorie content of their food when making their choices, and the calorie content may influence their choices.

If calorie information becomes widespread, we could even begin to discuss an elasticity of demand according to both the price and calorie content of the food, as well as a willingness to pay for fewer calories. Just thinking about the McDonald's menu, people can minimize the number of calories they eat by choosing either the least expensive (basic hamburger + fries) or the most expensive items on the menu (salads, grilled chicken).

Some have speculated that posting calorie information on the menus won't affect behavior at all because people choosing to eat at places with unhealthy food can't expect lower calories, but that seems naive. After all, even people shopping at expensive stores are somewhat price sensitive, and all retailers go to lengths to make people feel as though they are getting a bargain.

The inclusion of calorie information on menus gives a tremendous opportunity for social scientists, if only we can get sales data suitable for a quasi-experiment (pre-post with control). Any ideas?

My first thought here is that I imagine that people who eat salad and grilled chicken at McDonald's are only at Mickey's in the first place because someone else in their party wants a burger and fries. There's got to be some information on this sort of thing from marketing surveys (although these might not be easily accessible to researchers outside the industry).

My other thought is that it would be great if the food industry and public health establishment could work together on this (see note 4 here).
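
If sales data did become available, the pre-post-with-control comparison Janet describes could start with a simple difference-in-differences regression. Here is a minimal sketch on simulated outlet-level data; all names and numbers are invented.

# Difference-in-differences sketch: 'treated' = outlets subject to the calorie-posting
# rule, 'post' = after the rule took effect; the interaction is the quantity of interest
set.seed(7)
n_outlets <- 200
treated <- rbinom(n_outlets, 1, 0.5)
make_period <- function(post) {
  data.frame(outlet = 1:n_outlets, treated = treated, post = post,
             mean_calories = 750 - 10 * post - 25 * treated * post +
                             rnorm(n_outlets, 0, 40))
}
sales <- rbind(make_period(0), make_period(1))

fit <- lm(mean_calories ~ treated * post, data = sales)
coef(summary(fit))["treated:post", ]   # the difference-in-differences estimate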

Posted by Andrew at 12:29 AM | Comments (9) | TrackBack

December 26, 2006

What is the evidence on birth order and brain cancer?

Bruce McCullough writes,

The probability of getting brain cancer is determined by the number of younger siblings. So claim some scientists, according to an article published in the current issue of The Economist.

I have ordered your book so that I can read more about controlling for intermediate outcomes, but I am not yet confident enough to tackle it myself. Perhaps you might blog this?

I'll give my thoughts, but first here's the scientific paper (by Altieri et al. in the journal Neurology), and here are the key parts of the news article that Bruce forwarded:

Younger siblings increase the chance of brain cancer

IT IS well known that many sorts of cancer run in families; in other words you get them (or, at least, a genetic predisposition towards them) from your parents. . . . Dr Altieri was looking for evidence to support the idea that at least some brain cancers are triggered by viruses and that children in large families are therefore at greater risk, because they are more likely to be exposed to childhood viral infections. . . .

Dr Altieri describes what he discovered when he analysed the records of the Swedish Family Cancer Database. This includes everyone born in Sweden since 1931, together with their parents even if born before that date.

More than 13,600 Swedes have developed brain tumours in the intervening decades. In small families there was no relationship between an individual's risk of brain cancer and the number of siblings he had. However, children in families with five or more offspring had twice the average chance of developing brain cancer over the course of their lives compared with those who had no brothers and sisters at all.

Digging deeper, Dr Altieri found a more startling result. When he looked at those people who had had their cancer as children or young teenagers he found the rate was even higher--and that it was particularly high for those with many younger siblings. Under-15s with three or more younger siblings were 3.7 times more likely than only children to develop a common type of brain cancer called a meningioma, and at significantly higher risk of every other form of the disease that the researchers considered. . . . the mechanisms by which younger siblings have more influence than elder ones are speculative. . . . An alternative theory is that a first child may experience a period when his immune system is particularly sensitive to certain infections at about the age when third and fourth children are typically born. . . .

OK, now my thoughts. There are two issues to address here: first, what exactly did Altieri et al. find in their data analysis, and, second, how can we think about causal inference for birth order and the number of siblings?

What did Altieri et al. find?

The main results in the paper appear to be in Table 2, where the brain cancer risk is slightly higher among people with more siblings. The overall risk ratios, normalized at 1 for only children, are 1.03, 1.06, 1.10, and 1.06 for people with 1, 2, 3, or 4+ siblings, respectively. The table gives a p value for the trend as 0.005, but I think they made a mistake, because, in R:

> x <- 0:4
> y <- c(1,1.03,1.06,1.10,1.06)
> summary (lm (y~x))

Call:
lm(formula = y ~ x)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.012000   0.019950  50.727 1.69e-05 ***
x           0.019000   0.008145   2.333    0.102

Residual standard error: 0.02576 on 3 degrees of freedom
Multiple R-Squared: 0.6446, Adjusted R-squared: 0.5262
F-statistic: 5.442 on 1 and 3 DF, p-value: 0.1019


The p-value seems to be 0.10, not 0.005.

Stronger results appear in Tables T-1, T-2, and T-3 (referred to in the paper and included in the supplementary material at the article's webpage). Risk ratios for brain cancer are quite a bit higher for kids with 3 or more younger siblings, and lower for kids with 3 or more older siblings.

In all these tables, results are broken down by type of cancer, but sample sizes are small enough that I don't really put much trust into these subset analyses. A multilevel model would help, I suppose.

Causal inference for birth order

How to think about this causally? We can think about the number of younger siblings as a causal treatment: if little Billy's parents have more kids, how does this affect the probability that Billy gets brain cancer? But how do we think about older siblings? I'm a little stuck here: if I try to compare Billy as an only child to Billy as the youngest of three children, it's hard to think of a corresponding causal "treatment."

Thinking forward from treatments, suppose a couple has a kid and is considering having another. One could imagine the effect of this on the first child's probability of brain cancer. One can also consider the probability that the second child has brain cancer, but the comparison of the two kids would not be "causal" (in the Rubin sense). This is not to dismiss the comparison--I'm just laying out my struggle with thinking about these things. Similar issues arise in other noncausal comparisons (for example, comparing boys to girls).

The reluctant debunker

Finally, I'm sensitive to Andrew Oswald's comment that I'm too critical of innovative research--even if there are methodological flaws in a paper, its conclusions could still be correct. My only defense is that I'm responding to Bruce's request: I wasn't going out looking for papers to debunk.

In any case, I'm not "debunking" this paper at all. I don't see major statistical problems with their comparisons; I'm just struggling to understand it all.

Posted by Andrew at 10:24 AM | Comments (4) | TrackBack

December 22, 2006

Just in time for the holidays

Aleks forwarded this along:

[Image: atomic-energy-lab-01.jpg]

The entries themselves were pretty funny, but I also liked the comment on the atomic energy kit entry by the guy with "a comfortable six-figure salary." Maybe if he'd had a little less radiation exposure as a child, he'd have a comfortable seven-figure salary by now . . .

Posted by Andrew at 7:38 PM | Comments (2) | TrackBack

November 21, 2006

Working in the tradition of R. A. Fisher

The following appeared in an email list:

Philip Morris International (PMI), based in Switzerland, is a leading tobacco company outside the United States. . . . The primary mission of the PMI Research & Development Centre is to research and develop a new generation of products which may have the potential to reduce the risks of smoking.

To strengthen the department of Product Risk Management in Neuchâtel, Switzerland, we are looking for a:

Statistician (Modeler)

THE POSITION You will form part of a team in charge of a broad range of statistical and biostatistical activities, such as data analysis, and development of quantitative methods in various areas of smoking and health research. Furthermore, the building of probabilistic disease-models is an important responsibility of the team.

Our new modeler will bring his/her strong statistical background into the activities related to disease- and risk modeling. The work includes the development of Bayesian networks and comprises the identification, implementation and application of available methods. . . . You have profound knowledge of Bayesian statistics, robust statistics, and re-sampling techniques.

P.S. Fisher links are here. (In fairness to Philip Morris, they do say that they're trying to reduce the risks, which is a bit different than Fisher's claim that inhaling reduces cancer risk.)

Posted by Andrew at 6:56 AM | Comments (1) | TrackBack

October 31, 2006

Cell phones fighting arsenic in Bangladesh

Here's a nice description of our project (headed by Lex van Geen) to use cell phones to help people in Bangladesh to lower their arsenic exposure.

[Image: phone.jpg]

Further background is here.

Posted by Andrew at 5:37 PM | Comments (0) | TrackBack

October 23, 2006

Galton was a hero to most

In Graphics of Large Datasets: Visualizing a Million (about which more in a future entry), I saw the following graph reproduced from an 1869 book by Francis Galton, one of the fathers of applied statistics:

[Image: Genius.png]

According to this graph [sorry it's hard to read: the words on the left say "100 per million above this line", "Line of average height", and "100 per million above this line"; and on the right it says "Scale of feet"], one man in a million should be 9 feet tall! This didn't make sense to me: if there were about 10 million men in England in Galton's time, this would lead us to expect 10 nine-footers. As far as I know, this didn't happen, and I assume Galton would've realized this when he was making the graph.

I asked Antony Unwin (author of the chapter that included the above graph), Howard Wainer (expert on statistical graphics) and Steve Stigler (expert on the history of statistics). Howard said that the tallest man ever was almost 9 feet. Certainly a rate of less than 1 in a million.

Antony wrote:

Galton was postulating a hypothetical population with a normal distribution. . . . He did investigate some physical distributions in the book (including the chest dimensions of Scottish soldiers) and claimed (p32 included in the attached) that: "It will now be my aim to show there is sufficient uniformity in the inhabitants of the British Isles to bring them fairly within the grasp of this law." So he should have thought about the extreme values. On the other hand, information in those days was not so readily available as it is now. Might Galton have believed there were such people, he just hadn't heard anything of them yet? The lower classes were not acknowledged and who knows what oddities might have been found amongst them? Forrest's excellent biography of Galton ends with an anecdote that supports the idea of his not regarding all as equals.

Steve wrote:

Galton was describing a hypothetical population, and he specifies for the illustration that the mean is 66 inches and that 100/million exceed 78 inches. By my rough calculation that gives a SD of about 3.2 inches. This was his earliest statistical book and Galton had more faith in the normal than later, but without good tables available (even though Laplace had given a continued fraction that would have given acceptable results) Galton did not appreciate how fast the tail comes down at the extremes. His view might have been colored by the fact that in the 1850s he had spent a couple of years in Africa, where there were and still are peoples of a quite wide variety of heights.

Howard summarized:

We must calculate the z-score associated with a probability of 100 out of 1,000,000 to be above 78 inches. The z-score of 1/10,000 is 3.72 and so we calculate the standard deviation to be (78-66)/3.72=3.2. We then ask how many sd's away from the mean is it for a nine-footer. The obvious calculation [(108-66)/3.2] yields a z-score of 13. If height were truly distributed normally, the likelihood of a nine-footer would be far, far less than one in a trillion.
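Howard's arithmetic is easy to reproduce in R (just a restatement of the calculation above--a convenience that Galton, of course, did not have):

z <- qnorm (100/1e6, lower.tail=FALSE)   # z-score with 100 per million above it: about 3.72
sd <- (78 - 66)/z                        # implied standard deviation: about 3.2 inches
(108 - 66)/sd                            # z-score of a nine-footer: about 13
pnorm ((108 - 66)/sd, lower.tail=FALSE)  # upper tail: far, far less than one in a trillion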

This shows that (a) Galton in 1869 didn't know the tails of the normal distribution (he couldn't just use "pnorm" in R) and (b) the actual distribution of men's heights is longer-tailed than the normal.

The interesting thing to me is point (a).

Howard will have a column on this in the next issue of Chance.

Posted by Andrew at 12:41 AM | Comments (3) | TrackBack

October 17, 2006

Estimating Iraq deaths using survey sampling

Tex and Jimmy sent me links to this study by Gilbert Burnham, Riyadh Lafta, Shannon Doocy, and Les Roberts estimating the death rate in Iraq in recent years. (See also here and here for other versions of the report). Here's the quick summary:

Between May and July, 2006, we did a national cross-sectional cluster sample survey of mortality in Iraq. 50 clusters were randomly selected from 16 Governorates, with every cluster consisting of 40 households. Information on deaths from these households was gathered. Three misattributed clusters were excluded from the final analysis; data from 1849 households that contained 12 801 individuals in 47 clusters was gathered. 1474 births and 629 deaths were reported during the observation period. Pre-invasion mortality rates were 5·5 per 1000 people per year (95% CI 4·3–7·1), compared with 13·3 per 1000 people per year (10·9–16·1) in the 40 months post-invasion. We estimate that as of July, 2006, there have been 654 965 (392 979–942 636) excess Iraqi deaths as a consequence of the war, which corresponds to 2·5% of the population in the study area. Of post-invasion deaths, 601 027 (426 369–793 663) were due to violence, the most common cause being gunfire.

And here's the key graph:

[Image: iraq.png]

Well, they should really round these numbers to the nearest 50,000 or so. But that's not my point here. I wanted to bring up some issues related to survey sampling (a topic that is on my mind since I'm teaching it this semester):

Cluster sampling

The sampling is done by clusters. Given this, the basic method of analysis is to summarize each cluster by the number of people and the number of deaths (for each time period) and then treat the clusters as the units of analysis. The article says they use "robust variance estimation that took into account the correlation," but it's really simpler than that. Basically, the clusters are the units. With that in mind, I would've liked to see the data for the 50 clusters. Strictly speaking, this isn't necessary, but it would've fit in easily enough in the paper (or, certainly, in the technical report) and that would make it easy to replicate that part of the analysis.

Ratio estimation

I couldn't find in the paper the method that was used to extrapolate to the general population, but I assume it was ratio estimation (reported deaths were 629/12801 = 4.9% of the sample; if you then subtract the deaths before the invasion and multiply by 12/42, since they're counting 42 months after the invasion, I guess you get the 13.3 per 1000 per year--that is, 1.3%--reported in the abstract). For pedagogical purposes alone, I would've liked to see this mentioned as a ratio estimate, especially since this information goes into the standard error.
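To make the clusters-as-units idea concrete, here's a sketch of the ratio estimate and its standard error in R, using made-up cluster-level data (the actual per-cluster counts were not published with the paper):

set.seed (1)
n.clusters <- 47
people <- rpois (n.clusters, 270)    # made-up cluster sizes (roughly 12800/47 people each)
deaths <- rpois (n.clusters, 13)     # made-up death counts (roughly 629/47 deaths each)
rate <- sum(deaths)/sum(people)      # ratio estimate of the crude death rate
resid <- deaths - rate*people        # residuals from the ratio fit
se <- sqrt (sum(resid^2)/(n.clusters*(n.clusters - 1))) / mean(people)
c (estimate=rate, se=se)             # clusters, not individuals, drive the standard error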

Incidentally, the sampling procedure gives an estimate of the probability that each household in the sample is selected, and from this we should be able to get an estimate of the total population and total number of births, and compare these to other sources.

I also saw a concern that they would oversample large households, but I don't see why that would happen from the study design; also, the ratio estimation should fix any such problem, at least to first order. The low nonresponse numbers are encouraging if they are to be believed.

It's all over but the attributin'

On an unrelated note, I think it's funny for people to refer to this as the "Lancet study" (see, for example, here for some discussion and links). Yes, the study is in a top journal, and that means it passed a referee process, but it's the authors of the paper (Burnham et al.) who are responsible for it. Let's just say that I wouldn't want my own research referred to as the "JASA study on toxicology" or the "Bayesian Analysis report on prior distributions" or the "AJPS study on incumbency advantage" or whatever.

Posted by Andrew at 12:35 AM | Comments (32) | TrackBack

October 12, 2006

Sweden is not Finland

I came across this:

While some Scandinavian countries are known to have high levels of suicide, many of them – including Sweden, Finland and Iceland – ranked in the top 10 for happiness. White believes that the suicide rates have more to do with the very dark winters in the region, rather than the quality of life.

Jouni's response:

Technically it's correct - "While *some* Scandinavian countries ... have high levels of suicide ... Sweden, Finland and Iceland ranked in the top 10 for happiness..."

That "some Scandinavian country" is Finland; Sweden (or Iceland - surprisingly) has roughly 1/2 the suicide rate of Finland.

Posted by Andrew at 12:49 AM | Comments (5) | TrackBack

October 3, 2006

Family demography and public policy seminar schedule, fall 2006

Here's the listing for the Family Demography and Public Policy Seminar this semester:

Fall 2006, Tuesdays 1:00 - 2:15
School of Social Work
1255 Amsterdam Ave
Room 1109

Sept 12 Julien Teitler, Assistant Professor of Social Work, CUSSW. “Mental Illness as a Barrier to Marriage among Mothers with Out-of-Wedlock Births.”
Sept 19 Irv Garfinkel, Mitchell I. Ginsburg Professor of Contemporary Urban Problems, CUSSW. “The American Welfare State: Laggard or Leader.”
Sept 26 Rodolfo De La Garza, Eaton Professor of Administrative Law and Municipal Science, Political Science, CU. “Expanding or changing the mainstream: Latino political incorporation.”
Oct 3 Elliott Sclar, Professor of Urban Planning and International Affairs, CU. Title: TBA.
Oct 10 Bentley Macleod, Professor of Economics, International and Public Affairs and Law, CU. “First Do No Harm?:Tort Reform and Birth Outcomes.” (w/co-author, Janet Currie).
Oct 17 Luisa Borrell, Assistant Professor of Epidemiology, Mailman School of Public Health. Title: TBA.
Oct 24 Lenna Nepomnyaschy, Associate Research Scientist, CUSSW. “Socioeconomic Gradients in Early Child Health.”
Oct 31 Gordon Berlin, President, MDRC. “Rewarding the Work of Single Adults: A Counterintuitive Approach to Reducing Poverty and Strengthening Families.”
Nov 7 Election Day, NO SEMINAR
Nov 14 David Weiman, Alena Wels Hirschorn '58 Professor of Economics, Barnard. “The Social Effects of Mass Incarceration: A Labor Market Perspective.”
Nov 21 Robert Lieberman, Associate Professor of Political Science, CU. Title: TBA.
Nov 28 Doug Almond, Assistant Professor of Economics, CU. Title: TBA.
Dec 5 Ronald Mincy, Maurice V. Russell Professor of Social Policy and Social Work Practice, CUSSW. Title: TBA.
Dec 12 Fred Ssewamala, Assistant Professor of Social Work, CUSSW. Title: TBA.

Posted by Andrew at 7:11 AM | Comments (0) | TrackBack

September 13, 2006

Should you wear a bicycle helmet?

Rebecca pointed me to this interesting article by Ben Hoyle in the London Times, "Helmeted cyclists in more peril on the road." Hoyle writes:

Cyclists who wear helmets are more likely to be knocked off their bicycles than those who do not, according to research.

Motorists give helmeted cyclists less leeway than bare-headed riders because they assume that they are more proficient. They give a wider berth to those they think do not look like “proper” cyclists, including women, than to kitted-out “lycra-clad warriors”.

Ian Walker, a traffic psychologist, was hit by a bus and a truck while recording 2,500 overtaking manoeuvres. On both occasions he was wearing a helmet.

During his research he measured the exact distance of passing traffic using a computer and sensor fitted to his bicycle. Half the time Dr Walker, of the University of Bath, was bare-headed. For the other half he wore a helmet and has the bruises to prove it.

He even wore a wig on some of his trips to see if drivers gave him more room if they thought he was a woman. They did.

He was unsure whether the protection of a helmet justified the higher risk of having a collision. “We know helmets are useful in low-speed falls, and so are definitely good for children.”

On average, drivers steered an extra 3.3 in away from those without helmets to those wearing the safety hats. Motorists were twice as likely to pass “very close” to the cyclist if he was wearing a helmet.

Not just risk compensation

This is interesting: I was aware of the "risk compensation" idea, that helmeted riders will ride less safely, thus increasing the risk of accident (although the accident itself may be less likely to cause serious injury), as has been claimed with seat belts, antilock brakes, and airbags for cars. (If it were up to me, I would make car bumpers illegal, since they certainly seem to introduce a "moral hazard" or incentive to drive less carefully.)

But I hadn't thought of the idea that the helmet could be providing a signal to the driver. From the article, it appears that the optimal solution might be a helmet, covered by a wig . . .

The distinction between risk compensation altering one's own behavior, and perceptions altering others' behavior, is important in making my own decision. On the other hand, my small n experience is that I have a friend who was seriously injured after crashing at low speed with no helmet. So it's tricky for me to put all the information together in making a decision.

Attitudes

The news article concludes with,

He [Walker] said: “When drivers overtake a cyclist, the margin for error they leave is affected by the cyclist’s appearance. Many see cyclists as a separate subculture.

“They hold stereotyped ideas about cyclists. There is no real reason to believe someone with a helmet is any more experienced than someone without.”

I don't know the statistics on that, but I do think there's something to this "subculture" business. People on the road definitely seem to have strong "attitudes" to each other based on minimal information.

Self-experimentation

Finally, Rebecca pointed out that this is another example of self-experimentation. As with Seth's research, the self-experimenter here appears to have a lot of expert knowledge to guide his theories and data collection. Also amusing, of course, is that his name is Walker.

Posted by Andrew at 12:39 AM | Comments (10) | TrackBack

September 11, 2006

Verb

Carrie writes,

File this one under News of the Weird:

Health Journal: Hip government exercise campaign looks for its next move

The story is about the apparent success of the Center for Disease Control's "verb" ad campaign -- designed to fight obesity among children and teens. A recent study in the journal Pediatrics found that kids who had seen the Verb campaign "reported one-third more physical activity during their free time than kids who hadn't."

Carrie expresses skepticism since it's hard to see that cryptic ads could really make such a difference in behavior. In addition, it's an observational study: the ads were shown everywhere, then they compared kids who recalled seeing the ads to kids who didn't. They did a baseline study, so they could control for pre-treatment level of exercise, but they didn't do much on this. I would have liked to see scatterplots and regressions.

Here's the article in the journal Pediatrics reporting the comparison of exercise levels for kids who recalled or didn't recall the ad campaign. Perhaps an interesting example for a statistics or policy analysis class. As usual, I'm not trying to shoot down the study, just to point out an interesting example of scientific ambiguity. I'd think there's lots of potential for discussion about how a future such study could be conducted.

Posted by Andrew at 8:00 AM | Comments (0) | TrackBack

August 30, 2006

Problems in a study of girl and boy births, leading to a point about the virtues of collaboration

I was asked by a reporter to comment on a paper by Satoshi Kanazawa, "Beautiful parents have more daughters," which is scheduled to appear in the Journal of Theoretical Biology.

As I have already discussed, Kanazawa's earlier papers ("Engineers have more sons, nurses have more daughters," "Violent men have more sons," and so on) had a serious methodological problem in that they controlled for an intermediate outcome (total number of children). But the new paper fixes this problem by looking only at first children (see the footnote on page 7).

Unfortunately, the new paper still has some problems. Physical attractiveness (as judged by the survey interviewers) is measured on a five-point scale, from "very unattractive" to "very attractive." The main result (from the bottom of page 8) is that 44% of the children of surveyed parents in category 5 ("very attractive") are boys, as compared to 52% of children born to parents from the other four attractiveness categories. With a sample size of about 3000, this difference is statistically significant (2.44 standard errors away from zero). I can't confirm this calculation because the paper doesn't give the actual counts, but I'll assume it was done correctly.

Choice of comparisons

Not to be picky on this, though, but it seems somewhat arbitrary to pick out category 5 and compare it to 1-4. Why not compare 4 and 5 ("attractive" or "very attractive") to 1-3? Even more natural (from my perspective) would be to run a regression of proportion boys on attractiveness. Using the data in Figure 1 of the paper:

> library ("arm")   # for the display() function
> attractiveness <- c (1, 2, 3, 4, 5)
> percent.boys <- c (50, 56, 50, 53, 44)
> display (lm (percent.boys ~ attractiveness))
lm(formula = percent.boys ~ attractiveness)
coef.est coef.se
(Intercept) 55.10 4.56
attractiveness -1.50 1.37
n = 5, k = 2
residual sd = 4.35, R-Squared = 0.28

So, having a boy child is negatively correlated with attractiveness, but this is not statistically significant. (Weighting by the approximate number of parents in each category, from Figure 2, does not change this result.) It would not be surprising to see a correlation of this magnitude, even if the sex of the child were purely random.
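For the record, here is roughly how that weighted check goes, continuing the R session above; the counts below are invented for illustration (the real ones would come from Figure 2 of the paper):

n.parents <- c (100, 400, 1200, 1000, 300)   # hypothetical counts per attractiveness category
display (lm (percent.boys ~ attractiveness, weights=n.parents))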

But what about the comparison of category 5 with categories 1-4? Well, again, this is one of many comparisons that could have been made. I see no reason from the theory of sex ratios (admittedly, an area on which I am no expert) to pick out this particular comparison. Given the many comparisons that could be done, it is not such a surprise that one of them is statistically significant at the 5% level.

Measuring attractiveness?

I have little to say about the difficulties of measuring attractiveness except that, according to the paper, interviewers in the survey seem to have assessed the attractiveness of each participant three times over a period of several years. I would recommend using the average of these three judgments as a combined attractiveness measure. General advice is that if there is an effect, it should show up more clearly if the x-variable is measured more precisely. I don't see a good reason to use just one of the three measures.

Reporting of results

The difference reported in this study was 44% compared to 52%--you could say that the most attractive parents in the study were 8 percentage points more likely than the others to have girls. Or you could say that they were .08/.52=15% more likely to have girls. But on page 9 of the paper, it says, "very attractive respondents are about 26% less likely to have a son as the first child." This crept up to 36% in this news article, which was cited by Stephen Dubner on the Freakonomics blog.

Where did the 26% come from? Kanazawa appears to have run a logistic regression of sex of child on an indicator for whether the parent was judged to be very attractive. The logistic regression coefficient was -0.31. Since the probabilities are near 0.5, the right way to interpret the coefficient is to divide it by 4: -0.31/4=-0.08, thus an effect of 8 percentage points (which is what we saw above). For some reason, Kanazawa exponentiated the coefficient: exp(-0.31)=0.74, then took 0.74-1=-0.26 to get a result of 26%. That calculation is inappropriate (unless there is something I'm misunderstanding here). But, of course, once it slipped past the author and the journal's reviewers, it would be hard for a reporter to pick up on it.
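To spell out the two calculations side by side (a quick sketch; only the -0.31 comes from the paper):

b <- -0.31                      # reported logistic regression coefficient
b/4                             # divide-by-4 rule: about -0.08, i.e., 8 percentage points
exp(b) - 1                      # what the paper appears to have done: about -0.27, the "26%"
invlogit <- function (x) 1/(1 + exp(-x))
invlogit(b) - invlogit(0)       # exact change starting from 50%: about -0.077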

Coauthors have an incentive to catch mistakes

I'm disappointed that Kanazawa couldn't find a statistician in the Interdisciplinary Institute of Management where he works who could have checked his numbers (and also advised him against the bar graph display in his Figure 1, as well as advised him about multiple hypothesis testing). Just to be clear on this: we all make mistakes, I'm not trying to pick on Kanazawa. I think we can all do better by checking our results with others. Maybe the peer reviewers for the Journal of Theoretical Biology should've caught these mistakes, but in my experience there's no substitute for adding someone on as a coauthor, who then has a real incentive to catch mistakes.

Summary

Kanazawa is looking at some interesting things, and it's certainly possible that the effects he's finding are real (in the sense of generalizing to the larger population). But the results could also be reasonably explained by chance. I think a proper reporting of Kanazawa's findings would be that they are interesting, and compatible with his biological theories, but not statistically confirmed.

My point in discussing this article is not to be a party pooper or to set myself up as some sort of statistical policeman or to discourage innovative work. Having had this example brought to my attention, I was curious enough to follow it up, and then I wanted to share my newfound understanding with others. Also, this is a great example of multiple hypothesis testing for a statistics class.

Posted by Andrew at 12:17 AM | Comments (11) | TrackBack

August 9, 2006

Randomized experimentation and foreign aid

There's an article by Abhijit Vinayak Banerjee in the Boston Review recommending randomized experiments (or the next best thing, "natural experiments") to evaluate strategies for foreign aid. Also, here's a link to the Boston Review page which includes several discussions by others and a response by Banerjee.

On the specific topic of evaluating social interventions, I have little to add beyond my comments last year on Esther Duflo's talk: randomized experimentation is great, but once you have the randomized (or "naturally randomized") data, it still can be a good idea to improve your efficiency by gathering background information and using sophisticated statistical methods to adjust for imbalance. To quote myself on Duflo's talk:

There are a couple ways in which I think the analysis could be improved. First, I'd like to control for pre-treatment measurements at the village level. Various village-level information is available from the 1991 Indian Census, including for example some measures of water quality. I suspect that controlling for this information would reduce the standard errors of regression coefficients (which is an issue given that most of the estimates are less than 2 standard errors away from 0). Second, I'd consider a multilevel analysis to make use of information available at the village, GP, and state levels. Duflo et al. corrected the standard errors for clustering but I'd hope that a full multilevel analysis could make use of more information and thus, again, reduce uncertainties in the regression coefficients.
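For concreteness, here's roughly what that multilevel specification would look like using lmer, with entirely made-up data and variable names (nothing here comes from the actual study):

library ("lme4")
set.seed (2)
n.state <- 10; n.gp <- 50; n.village <- 200; n <- 1000
state   <- sample (1:n.state, n, replace=TRUE)
gp      <- sample (1:n.gp, n, replace=TRUE)
village <- sample (1:n.village, n, replace=TRUE)
treatment <- rbinom (n, 1, 0.5)
water.quality.1991 <- rnorm (n.village)[village]   # pretend village-level covariate from the 1991 census
y <- 0.2*treatment + 0.3*water.quality.1991 +
     rnorm (n.state, 0, 0.3)[state] + rnorm (n.gp, 0, 0.3)[gp] +
     rnorm (n.village, 0, 0.3)[village] + rnorm (n)
summary (lmer (y ~ treatment + water.quality.1991 +
               (1 | village) + (1 | gp) + (1 | state)))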

Why don't we practice what we preach?

Nonetheless, I am not sure myself that large-N studies are always a good idea. And, in practice, I rarely do any sort of formal experimentation when evaluating interventions in my own activities. Here I'm particularly thinking of teaching methods, where we try all sorts of different things but have difficulty evaluating what works. I certainly do make use of the findings of educational researchers (many of whom, I'm sure, use randomized experiments), but when I try things out myself, I don't ever seem to have the discipline to take good measurements, let alone set up randomized trials. So in my own professional life, I'm just as bad as the aid workers whom Banerjee criticizes for not filling out forms.

This is not meant as a criticism of Banerjee's paper, just a note that it seems easier to give statistical advice to others than to follow it ourselves.

Posted by Andrew at 12:01 AM | Comments (0) | TrackBack

August 8, 2006

Fascinating talk by Hans Rosling

Albyn Jones sent me this link by Hans Rosling, the founder of Gapminder. It's a great demonstration of statistical visualization. I'd like to use it to start off my introductory statistics classes--except then the students would probably be disappointed that my lectures aren't as good...

Posted by Andrew at 7:10 AM | Comments (1) | TrackBack

August 7, 2006

Pooling of data

Some good news:

The Bill and Melinda Gates Foundation, run by the chairman of the Microsoft Corporation, will deliver $287 million in five-year grants to researchers working to produce an AIDS vaccine. The caveat: Grantees must agree to pool their results. Fragmented and overlapping work in the area of AIDS research has hindered progress toward a vaccination for the virus that affects 40 million people around the world.... A web site will share data in real time.

More at The Wall Street Journal and at YaleGlobalOnline.

Hopefully this will push the work towards my vision of the interactive analysis of data through the internet instead of the current model of only publishing the not-always-reproducible results of the analysis. See my previous postings on statistical data.

Posted by Aleks Jakulin at 12:22 PM | Comments (4) | TrackBack

June 5, 2006

Graphs instead of tables

I came across this paper. Could someone please convert all the tables into graphs? Thank you.

Posted by Andrew at 1:12 AM | Comments (0) | TrackBack

May 19, 2006

Why didn't he sue the lawyer right back?

In an interesting article on medical malpractice in the New Yorker (14 Nov 2005, p.63), Atul Gawande writes,

I have a crazy-lawsuit story of my own. In 1990, while I was in medical school, I was at a crowded Cambridge bus stop and an elderly woman tripped on my foot and broke her shoulder. I gave her my phone number, hoping that she would call me and let me know how she was doing. She gave the number to a lawyer, and when he found out that it was a medical-school exchange he tried to sue me for malpractice, alleging that I had failed to diagnose the woman's broken shoulder when I was trying to help her. (A marshal served me with a subpoena in physiology class.) When it became apparent that I was just a first-week medical student and hadn't been treating the woman, the court disallowed the case. The lawyer then sued me for half a million dollars, alleging that I'd run his client over with a bike. I didn't even have a bike, but it took a year and a half-and fifteen thousand dollars in legal fees-to prove it.

This made me wonder: why didn't Gawande sue that lawyer right back? If he didn't have a bike, that seems like pretty good evidence that the lawyer was acting fraudulently, maliciously, etc. I could see why he might want to just let things slide, but if it took a year and a half and $15,000, wouldn't it make sense to sue him back? I'm sure I'm just revealing my massive ignorance about the legal system by asking this.

P.S. Gawande's article is of statistical and policy interest, as he discusses the cost-benefit issues of medical malpractice law.

Posted by Andrew at 4:17 PM | Comments (3) | TrackBack

May 16, 2006

Cost-benefit

The Federal Reserve Bank of Minneapolis has an interesting article by Douglas Clement from 2001 about cost-benefit analysis in pollution regulation.

I'm generally a fan of cost-benefit analysis and related fields, for use in setting government policies. Andrew, Chia-Yu Lin, Dave Krantz, and I once wrote a decision-analysis paper with a large cost-benefit component, that someone said is "the best paper I have seen on decision-making under uncertainty in a public health context" (thanks, mom!). But this article mentions the fact that the Clean Air Act explicitly forbids costs from being considered in setting pollutant standards, and then goes on to discuss this seemingly ridiculous fact...in a way that basically convinced me that eh, maybe cost-benefit analysis in this context isn't necessary after all.

The usual problems are cited: it's hard to figure out how to evaluate some costs and some benefits in terms of dollar costs (or any other common scale), there is no agreed "value of a life", yada yada. Standard stuff. In the aforementioned decision analysis paper, we avoided many of these issues by comparing several policies (for radon monitoring and mitigation), including the current one; thus, we could find policies that save more lives for less money, and be confident that they're better than the current one, without having to claim that we have found the optimum. But if you're trying to do something new, like set a maximum value for a pollutant that has never previously been regulated, then our approach to avoiding the "value of a life" issue won't work.

But in addition to the usual suspects for criticizing cost-benefit analysis, the article mentions a few others. For instance:

(1) "Potential benefits of a policy are often calculated on the basis of surveys of people's willingness to pay for, say, a better view of the Grand Canyon, an extra year of life or avoidance of cancer. But critics argue that willingness to pay hinges, in part, on ability to pay, so that cost-benefit analysis is fundamentally dependent on the equity issues it professes to avoid."

(2) "Indeed, Breyer, in his American Trucking concurrence, said that the technology-forcing intent of laws like the Clean Air Act makes determining costs of implementation “both less important and more difficult. It means that the relevant economic costs are speculative, for they include the cost of unknown future technologies. It also means that efforts to take costs into account can breed time-consuming and potentially unresolvable arguments about the accuracy and significance of cost estimates.” In short, it makes cost-benefit analysis itself less likely to pass a cost-benefit test."

(3) "In an article titled “The Rights of Statistical People,” Heinzerling argues that analysts have created the entity of a “statistical life” in order to facilitate cost-benefit analysis, but the concept essentially strips those lives of their human rights. We don't allow one person to kill another simply because it's worth $10 million to the killer to see that person dead and because society may measure that person's worth at less than $10 million, she argues; then why should regulatory policy be based on a similar principle?"

Item (1) could be handled within a cost-benefit analysis (by defining an 'equity score' and assigning a trade-off between dollars and equity, for example); item (2) just suggests that cost-benefit analysis can be so uncertain and so expensive that it's not worth the effort, but that doesn't seem true for regulations with implications in the billions of dollars --- heck, I'll do a cost-benefit analysis for 1/1000 of that, and that's a real offer. Item (3) is a real ethical question that challenges the heart of cost-benefit analysis, and I'll need to think about it more.

I'm tempted to go on and list items (4), (5), and (6), but read the article yourself. Among the people quoted is a Columbia economics prof, by the way. (In case you, the reader, don't know: Andrew teaches at Columbia).

Overall, my view seems to be fairly close to that of someone quoted in the article:

====

“My own justification for using cost-benefit analysis in common-law decision-making,” wrote Richard Posner, “is based primarily ... on what I claim to be the inability of judges to get better results using any alternative approach.” And he recommends that the acknowledged moral inadequacy of the Kaldor-Hicks criteria—its neglect of distributional equity—should be addressed by simply employing cost-benefit as a tool for informing policymakers, not as a decision-making imperative.

“This may seem a cop-out,” Posner admitted, “as it leaves the government without a decision rule and fails to indicate how cost-benefit analysis is to be weighted when it is merely a component of the decision rule.” But “I am content to allow the usual political considerations to reinforce or override the results of the cost-benefit analysis. If the government and the taxpayer and the voter all know—thanks to cost-benefit analysis—that a project under consideration will save 16 sea otters at a cost of $1 million apiece, and the government goes ahead, I would have no basis for criticism.”

=====

Of course, more typically we would think that the project will save between 3 and 40 sea otters at a cost of $200K to $6 million each, or whatever. But I agree with the general idea.

Posted by Phil at 3:26 PM | Comments (2) | TrackBack

Matter, meet antimatter

P.S. I was pleased to see Seth's book at #101 on Amazon today. Only 91 slots below the Twinkies Cookbook (and 5 below the Sonoma Diet and 3 below the South Beach Diet, but I'm sure that will change...).

Posted by Andrew at 12:51 AM | Comments (1) | TrackBack

April 18, 2006

Americans see weight problems everywhere but in the mirror

A Pew Research Center study (linked to from sycophant) finds a "Lake Wobegon effect" in how Americans perceive obesity:

Americans believe their fellow Americans have gotten fat. They consider this a serious national problem. But when they think about weight, they appear to use different scales for different people.

Nine-in-ten American adults say most of their fellow Americans are overweight. But just seven-in-ten say this about "the people they know." And just under four-in-ten (39%) say they themselves are overweight.

These sliding assessments are drawn from a Pew Research Center telephone survey conducted from February 8 through March 7 among a randomly selected, representative national sample of 2,250 adults.

The survey finds that most Americans, including those who say they are overweight, agree that personal behavior - rather than genetic disposition or marketing by food companies - is the main reason people are overweight. In particular, the public says that a failure to get enough exercise is the most important reason, followed by a lack of willpower about what to eat. About half the public also says that the kinds of foods marketed at restaurants and grocery stores are a very important cause, and roughly a third say the same about the effect of genetics and heredity.

The report is interesting (although they could do much much better in their data displays). It reminds me of the surveys showing misperception of the populations of ethnic minorities and immigrants.

Posted by Andrew at 8:10 AM | Comments (0) | TrackBack

April 13, 2006

A case of cost-benefit analysis

Through my involvement in arsenic research, I heard about the following news item (see below). It's an interesting cost-benefit case, especially since lots of people drink bottled water even though their local water supply is completely safe. I think I would prefer that my town put in the effort to have lower arsenic, but perhaps I'm just too attuned to the arsenic problem because I work in that area. But my favorite part of the article is that Dickensian name, Benjamin H. Grumbles.

EPA May Weaken Rule on Water Quality
Plan Would Affect Towns That Find Complying Costly
By Juliet Eilperin, Washington Post Staff Writer
Saturday, April 1, 2006; A04

The Environmental Protection Agency is proposing to allow higher levels of contaminants such as arsenic in the drinking water used by small rural communities, in response to complaints that they cannot afford to comply with recently imposed limits.
The proposal would roll back a rule that went into effect earlier this year and make it permissible for water systems serving 10,000 or fewer residents to have three times the level of contaminants allowed under that regulation.
About 50 million people live in communities that would be affected by the proposed change. In the case of arsenic, the most recent EPA data suggest as many as 10 million Americans are drinking water that does not meet the new federal standards.
Benjamin H. Grumbles, assistant administrator for EPA's Office of Water, said the agency was trying to satisfy Congress, which instructed EPA in 1996 to take into account that it costs small rural towns proportionately more to meet federal drinking water standards.
"We're taking the position both public health protection and affordability can be achieved together," Grumbles said in an interview this week. "When you're looking at small communities, oftentimes they cannot comply with the [current] standard."
But Erik Olson, a senior lawyer for the advocacy group Natural Resources Defense Council, called the move a broad attack on public health.
"It could have serious impacts on people's health, not just in small-town America," Olson said. "It is like overturning the whole apple cart on this program."
The question of how to regulate drinking water quality has roiled Washington for years. Just before leaving office, President Bill Clinton imposed a more stringent standard for arsenic, dictating that drinking water should contain no more than 10 parts per billion of the poison, which in small amounts is a known carcinogen. President Bush suspended the standard after taking office, but Congress voted to reinstate it, and in 2001, the National Academy of Sciences issued a study saying arsenic was more dangerous than the EPA had previously believed. The deadline for water systems to comply with the arsenic rule was January of this year.
The proposed revision was unveiled in early March in the Federal Register and is subject to public comment until May 1. Administration officials said the number of comments they receive will determine when it would take effect.
EPA's new proposal would permit drinking water to have arsenic levels of as much as 30 parts per billion in some communities. This would have a major effect on states such as Maryland and Virginia, which have struggled in recent months to meet the new arsenic rule.
Last summer, the Virginia Department of Health estimated that 11 well-based water systems serving 9,500 people in Northern Virginia might not meet the new standard for arsenic.
Maryland has a high level of naturally occurring arsenic in its water, and its Department of the Environment has estimated that 37 water systems serving more than 26,000 people now exceed the 10-parts-per-billion arsenic limit. These include systems serving several towns as well as individual developments, mobile home parks, schools and businesses in Dorchester, Caroline, Queen Anne's, Worcester, Garrett, St. Mary's and Talbot counties.
General Manager George Hanson's Chesapeake Water Association in Lusby, Md., serves 4,000 town residents with four wells. Three of them meet the new arsenic standard, but one well has 14 parts per billion in its water. He estimated that cleaning it up would cost between $1 million and $4 million.
"It's some of the most beautiful water I've ever seen. The arsenic is the only thing that fouls the entire system," Hanson said, adding that he and other community water suppliers are hoping the new EPA proposal will offer them a way out. "They're waiting for someone to help them."
Under the Safe Drinking Water Act Amendments of 1996, complying with federal drinking water standards is not supposed to cost water systems more than 2.5 percent of the median U.S. household income, which in 2004 was $44,684, per household served. That means meeting these standards should not cost more than $1,117 per household.
Under EPA's proposal, drinking water compliance could not cost more than $335 per household.
Several public officials and environmental experts said they were just starting to review the administration's plan, but some said they worry that it could lead to broad exemptions from the current federal contaminant standards cities and larger towns must also meet. Besides arsenic, other water contaminants including radon and lead pose a health threat in some communities.
James Taft, executive director of the Association of State Drinking Water Administrators, said he and others are concerned that the less stringent standard will "become the rule, rather than the exception" if larger communities press for similar relief.
Avner Vengosh, a geochemistry and hydrology professor at Duke University's Nicholas School of the Environment and Earth Sciences, said he was surprised by the administration's proposal because North Carolina officials are trying to keep arsenic levels as low as 2 parts per billion.
"It's a bit ironic you have this loosening in the EPA standard when local authorities are making it more stringent," Vengosh said, adding that many rural residents "have no clue what they have in the water."
National Rural Water Association analyst Mike Keegan, who backs the administration's proposal, said the current rule is based on what contaminant levels are economically and technically feasible, rather than what is essential to preserve public health.
The administration may face a fight on Capitol Hill over the proposal. Rep. Henry A. Waxman (D-Calif.), who helped write the 1996 law, said EPA's proposal, "if finalized, would allow weakened drinking water standards, not just in rural areas, but in the majority of drinking water systems in the United States."

Posted by Andrew at 12:39 AM | Comments (0) | TrackBack

April 12, 2006

The big bad IRB

See Seth's comment here.

P.S. And also Phil's comment here.

Posted by Andrew at 3:49 AM | Comments (0) | TrackBack

April 11, 2006

Rat experiments to test the Shangri-La diet?

I wonder if Seth has considered performing rat experiments on his diet? I remember Seth telling me that rat experiments are pretty cheap (i.e., you can do them without getting external grant funding), and he discusses in his book how much he's learned from earlier rat experiments.

Posted by Andrew at 8:57 AM | Comments (2) | TrackBack

April 6, 2006

Lose weight effortlessly through the Shangri-La diet?

Seth Roberts's book, The Shangri-La Diet: The No Hunger Eat Anything Weight-Loss Plan, is out. Maybe I can be the first person to review it. (I've known Seth for over 10 years; we co-taught a class at Berkeley on left-handedness.)

Seth figured out his basic idea--that drinking unflavored sugar water lowers his "setpoint," thus making it easy for him to lose weight--about 10 years ago, following several years of self-experimentation (see here for a link to his article). Since then, he's tried it on other people, apparently with much success, and generalized it to include drinking unflavored oil as a different option for keeping the setpoint down every day.

The book itself describes the method, and the theory and experimental evidence behind it. It seems pretty convincing to me, although I confess I haven't tried the diet myself. I suppose that thousands will, now that the book has come out. If it really is widely successful, I'll just have to say that I'm impressed with Seth for following this fairly lonely research path for so many years. I had encouraged him to try to study the diet using a controlled experiment, but who knows, maybe this is a better approach. The unflavored-oil option seems to be a good addition, making the diet less risky for diabetics.

Some other random notes:

1. I like the idea of a moving setpoint. Although then maybe the word "setpoint" is inappropriate?

2. The book is surprisingly readable, given that I already knew the punchline (the diet itself). A bit like the book "Dr. Jekyll and Mr. Hyde," which is actually suspenseful, even though you know from the beginning that they're the same guy.

3. In the appendix, Seth describes some published research that influenced his work. The researchers were from fairly obscure places--Laval University, Brooklyn College, and the Monell Chemical Senses Center. Perhaps this is because animal nutrition research is an obscure field that flourishes in out-of-the-way places? Or perhaps because there are literally millions of scientific researchers around the world, and it's somewhat random who ends up at the "top" places?

4. Near the end of the book, Seth discusses ways in which the food industry could profit from his dieting insights and make money offering foods that lower the setpoint. That's a cool idea--to try to harness these powerful forces in society to move in that direction.

5. With thousands of people trying this diet, will there be a way to monitor its success? Or maybe now, some enterprising researchers will do a controlled experiment. It really shouldn't be difficult at all to do such a study; perhaps it could be a good project for some class in statistics or psychology or nutrition.

More

P.S. See Alex's blurb here, which I guess a few thousand more people will notice. I'm curious what Alex (and others) think about my point 5 above. In a way, you could say it's a non-issue, since each individual person can see if the diet works for him or her. But for scientific understanding, if nothing else, I think it would be interesting to learn the overall effectiveness (or ineffectiveness) of the diet.

P.P.S. Regarding point 1 above, Denis Cote writes,

Indeed, there is some talk about a settling point, which is a more appropriate label (see Pinel et al. 2000, "Hunger, Eating, and Ill Health," American Psychologist 55(10), 1105-1116).

I'll have to take a look. The American Psychologist is my favorite scientific journal in the sense of being enjoyable and interesting to read.

P.S. See here, here, and here for more on the book.

Posted by Andrew at 12:01 PM | Comments (10) | TrackBack

Why Do Europeans Smoke More than Americans?

From Freakonomics, a link to this paper by David Cutler and Edward Glaeser. Here's the abstract:

While Americans are less healthy than Europeans along some dimensions (like obesity), Americans are significantly less likely to smoke than their European counterparts. This difference emerged in the 1970s and it is biggest among the most educated. The puzzle becomes larger once we account for cigarette prices and anti-smoking regulations, which are both higher in Europe. There is a nonmonotonic relationship between smoking and income; among richer countries and people, higher incomes are associated with less smoking. This can account for about one-fifth of the U.S./Europe difference. Almost one-half of the smoking difference appears to be the result of differences in beliefs about the health effects of smoking; Europeans are generally less likely to think that cigarette smoking is harmful.

This is an interesting problem, partly because evidence suggests that anti-smoking programs are ineffective, but there is a lot of geographic variation in smoking rates across countries, as well as among U.S. states (and among states in India too, as S. V. Subramanian has discussed). It's encouraging to think that better education could reduce smoking rates.

Amazingly, "in Germany only 73 percent of respondents said that they believed that smoking causes cancer." Even among nonsmokers, only 84% believed smoking caused cancer! I'd be interested in knowing more about these other 16%. (I guess I could go to the Eurobarometer survey and find out who they are.)

?

There are a couple things in the paper I don't quite follow. On page 13, they write, "there is a negative 38 percent correlation coefficient between this regulation index and the share of smokers in a state. A statistically significant negative relationship result persists even when we control for a wide range of other controls including tobacco prices and income. As such, it is at least possible that greater regulation of smoking in the U.S. might be a cause of the lower smoking rate in America." But shouldn't they control for past smoking rates? I imagine the regulations are relatively recent, so I'd think it would be appropriate to compare states that are comparable in past smoking rates but differ in regulations.
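Here's a sketch of the comparison I have in mind, with entirely made-up state-level data (none of these variables come from the Cutler and Glaeser paper):

set.seed (3)
n.states <- 50
smoke.past <- rnorm (n.states, 0.25, 0.04)    # smoking rate before the regulations came in
reg.index  <- rnorm (n.states, 0, 1)          # state anti-smoking regulation index
smoke.now  <- smoke.past - 0.01*reg.index + rnorm (n.states, 0, 0.02)
summary (lm (smoke.now ~ reg.index + smoke.past))   # regulation coefficient, adjusting for the past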

Finally, there's some speculation at the end of the paper about how it is that Americans have become better informed than Europeans about the health effects of smoking. They write, "while greater U.S. entrepreneurship and economic openness led to more smoking during an earlier era (and still leads to more obesity today), it also led to faster changes in beliefs about smoking and ultimately less cigarette consumption." I don't really see where "entrepreneurship" comes in. I had always been led to believe that a major cause of the rise in smoking in the U.S. in the middle part of the century was that the GIs got free smokes in their rations. For anti-smoking, I had the impression that federalism had something to do with it--the states got the cigarette taxes, and the federal government was free to do anti-smoking programs. But I'm not really familiar with this history so maybe there's something I'm missing here.

P.S. Obligatory graphics comments:

Figures 1 and 2 should be a scatterplot. Figure 3 should lose the horizontal lines, display every 20 years (not every 10) on the x-axis, and rescale the y-axis to be in cigarettes per day (rather than per year, as it appears to be). Figure 4 should be square, with both axes on a common scale, and should just label all the countries (not a problem if the axes are limited to the range of the data--basically, 20%-50%, not 10%-60%); it should also drop those distracting horizontal lines. Figure 5 should likewise remove the horizontal lines, use the same y-scale as the new Figure 4, and give the country names. Figure 6 should label all 50 states (with 2-letter abbreviations), jitter the points slightly in the x-direction, restrict the y-range to that of the data, and, again, remove those lines; ditto for Figure 7. There's also a problem with Figures 6 and 7: according to Figure 7, the U.S. smoking rate is 19%, but according to Figure 6, the smoking rate is above 20% for all but two of the states. What am I missing here? Figure 8, again, would be improved by removing those horizontal lines and restricting the x and y-axes to the range of the data (thus giving room for more data). Same for Figures 9 and 10 (and, again, use 2-letter abbreviations, and put income on the log scale to be consistent with Figure 8). And for Figure 11, spell out the country names and ditch the dots--there will be plenty of room once the graph has been rescaled. Table 1 should be in some natural order (e.g., increasing income or increasing smoking rate), not alphabetical, as Howard Wainer and others have emphasized. Actually, I'd prefer it as a graph, but I won't press the point. Similarly for Tables 2, 3, and 4 (actually, I'd put them all in one table, if not a graph) so that the information can be better compared.

Anyway, it's a fascinating paper and I'm sure will inspire lots of analysis of individual-level survey data.

Posted by Andrew at 12:46 AM | Comments (1) | TrackBack

February 8, 2006

Do low-fat diets have "significant" benefits?

There have been several news stories about a recently completed study of the effects of a low-fat diet on heart disease, colon cancer, and breast cancer. One of those stories is here. Some excerpts:

The large primary prevention trial conducted at 40 U.S. clinical centers from 1993 to 2005 enlisted 48,835 postmenopausal women ages 50 to 79 years.


Women in all three trials were randomly assigned to an intervention group (40%, n=19,541) with intensive behavior modification sessions and the goal of reducing fat intake to 20% of energy and increasing vegetable and fruit intake to at least five servings a day and grains to six servings.


Participants in a comparison group (60%, n= 29,294) were not asked to make changes...

So far, so good: we know the sample sizes, we know something about the interventions. What about the results? The article goes on to say:

Over an 8.1-year average follow-up, a low-fat diet did not lead to a significant reduction in invasive breast cancer risk, the researchers reported.


The number of women (annualized incidence rate) who developed breast cancer was 655 (0.42%) in the intervention group and 1,072 (0.45%) in the comparison group (hazard ratio 0.91; 95% confidence interval 0.83-1.01).

For starters, 655 cases out of 19,541 women in the intervention group is 3.3%, not 0.42%, so I'm not quite sure what is going on with the numbers. Presumably something got lost in translation from the scientific publication to the news article.
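One possible reconciliation--just a guess on my part, based on the article's "(annualized incidence rate)" label--is that the 0.42% and 0.45% are yearly rates, that is, cumulative incidence divided by the 8.1 years of average follow-up:

```python
# Quick arithmetic check of the reported rates (a guess, not from the article).
cases_int, n_int = 655, 19541
cases_comp, n_comp = 1072, 29294
followup_years = 8.1

print(cases_int / n_int)                     # about 0.034: the 3.3% computed above
print(cases_int / n_int / followup_years)    # about 0.0041, close to the reported 0.42%
print(cases_comp / n_comp / followup_years)  # about 0.0045, close to the reported 0.45%
```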

But the thing that really strikes me is that the 95% confidence interval for breast cancer prevention just barely includes 1.00, or no effect. So the news stories all say some variation of "there is no significant benefit" from a low-fat diet. I suspect, though of course we will never know for sure, that if there had been just a couple fewer cases in the intervention group, or a couple more in the comparison group, so that the 95% confidence interval topped out at 0.99 rather than 1.01, the news articles would have trumpeted "Significant reduction in breast cancer from a low-fat diet." This is an issue Andrew touched on a few weeks ago on this blog.
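To see how knife-edge that cutoff is, here's a crude calculation with the reported counts. (The study reports a Cox-model hazard ratio, so this simple risk ratio is only an approximation, but it comes out very close to the published 0.83-1.01 interval.) Shifting a handful of cases from one arm to the other pushes the upper limit just below 1:

```python
# Approximate risk-ratio CI from the reported counts (not the study's Cox model).
import numpy as np

def rr_ci(a, n1, b, n2, z=1.96):
    """Risk ratio for a/n1 vs. b/n2 with a normal-approximation 95% CI on the log scale."""
    rr = (a / n1) / (b / n2)
    se = np.sqrt(1/a - 1/n1 + 1/b - 1/n2)
    return rr, np.exp(np.log(rr) - z * se), np.exp(np.log(rr) + z * se)

print(rr_ci(655, 19541, 1072, 29294))   # roughly (0.92, 0.83, 1.01): matches the reported interval
print(rr_ci(650, 19541, 1077, 29294))   # move 5 cases across and the upper limit slips below 1.00
```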

Like a lot of people who perform statistical analyses, I've always been unhappy with "statistical significance" (the term, and the concept) for two reasons: (1) in common speech, "significant" means "important", but that is not true in stat-speak, where a parameter value can be "statistically significant" even though it is of no practical importance, and can be of great practical importance even if it is not "statistically significant" (the latter being the case if sample sizes or other sources of uncertainty are so large that even important effects can be missed); and (2) when did God say that 95% is the threshold for "statistical significance"? What's so magic about 1/20, as opposed to 1/18 or 1/21 or anything else for that matter?

In the current study, there is mild evidence for a small effect on breast cancer; the best guess (from this study alone) would suggest that a low-fat diet, of the type they tested, reduces breast cancer incidence by something between 5% and 15%, although larger or smaller benefits (or even a small penalty) cannot be ruled out. I wish reporters would put it that way, rather than declaring that the result did or did not meet some arbitrary (though admittedly customary) "significance" standard.

Posted by Phil at 3:38 PM | Comments (4) | TrackBack

January 28, 2006

Composite sampling and the safety of canned tuna

Seth forwarded me an article from the Chicago Tribune on mercury poisoning in fish. Near the end of the article, there's a mini-debate about composite sampling, which brought back some memories, since I did a project in 1994 on composite sampling of fish.

I'll post the relevant bits from my article and then give my comments.

FDA tests show risk in tuna U.S. agency finds high mercury levels in some cans and in samples of Chilean sea bass By Sam Roe and Michael Hawthorne Tribune staff reporters

January 27, 2006

Newly released government data provide the best evidence to date that some cans of light tuna--one of America's favorite seafoods--contain high levels of mercury.

Testing by the Food and Drug Administration found that 6 percent of canned light tuna samples contained large amounts of mercury, a toxic metal that can cause learning disabilities in children and neurological problems in adults.
. . .

In the 216 samples of canned light tuna tested by the FDA, the mercury levels averaged 0.12 parts per million, in line with previous limited testing and well below the legal limit of 1.0 parts per million. But 12 samples exceeded 0.35 parts per million, an amount the government considers high. When the Tribune recently tested 36 cans of the same type of canned tuna, none of the samples exceeded that level. The discrepancy might be due to the difference in sample size or because mercury levels can vary widely in all fish.

When asked about the FDA's latest testing results on light tuna, an agency official said consumers should not be concerned that 6 percent of canned light tuna tested high in mercury. What's important, the official said, is that on average, such tuna tested relatively low.
. . .


The U.S. Tuna Foundation, the industry's leading lobbying group, said the FDA's new data actually confirm the safety of canned light tuna.
. . .

"FDA's latest findings about mercury levels in canned tuna should end the debate over whether canned tuna is a safe and healthy food for all Americans," David Burney, the foundation's executive director, said in a statement. "No one is at risk from the minute amounts of mercury in any form of canned tuna."
. . .

In 2004, the FDA and the U.S. Environmental Protection Agency jointly warned high-risk consumers to eat no more than 6 ounces of albacore canned tuna per week because of high mercury levels.

Even if women of childbearing age and young children followed that suggestion, the EPA's own calculations show they would absorb too much mercury.
. . .

Among the fish testing relatively low in mercury in the FDA's latest round of tests was tilefish, a species the agency warns pregnant women and young children not to eat.

Previous testing in the Gulf of Mexico found high mercury levels in tilefish. The latest samples came from waters off the Atlantic Coast, raising questions about the reliability of the FDA's consumer advice.
. . .

Just how much mercury might be in a single can of tuna is unclear. That is because the FDA does not test individual cans. Instead, it removes small pieces of tissue from 12 cans and mixes the tissue together. The agency then tests the mixture, masking any extreme amounts of mercury in a single can. This is done with other fish species as well.
. . .

In the FDA's recent testing, one sample of light tuna showed mercury levels of 0.72 parts per million--a high amount but still within the 1.0 legal limit. But because this result was a composite of 12 cans, it is likely that some of the individual cans had higher levels.

It is impossible to know whether one of those cans tested over the legal limit.

The FDA said it tests a mixture of cans rather than individual cans partly to save money.

"It would cost 12 times as much to test 12 separate cans and then average the data, which is what we would have to do," said the FDA official who requested anonymity.

That methodology troubles some doctors.

"I find that incredibly disturbing," said Jane Hightower, a San Francisco internist who treats patients with mercury-related ailments. "That is falsifying data as far as I am concerned."

The FDA's data are here. It's not just the tuna that has mercury. Don't eat the sharks! See here for the summary of the FDA data and here for their advice.

My thoughts (in no particular order):

1. My understanding of heavy-metal poisoning is that it's the cumulative dose that matters, not the size of any particular dose. I might be wrong on this, but if I'm right, the FDA is right to just look at averages. The article correctly states that composite sampling is "masking any extreme amounts of mercury in a single can," but that's not a problem if the concern is total exposure.

2. Similarly, I don't think it's relevant to say that 6% of cans exceeded some threshold. What's relevant is the average amount of mercury per can. If I'm eating a lot of tuna, it doesn't matter whether I'm getting it all from one can, or if it's evenly distributed in many cans.

3. I'm confused about the recommendations. Ignoring the irrelevant (but headline-producing) 6%, I see that, for all the cans tested, "the mercury levels averaged 0.12 parts per million, in line with previous limited testing and well below the legal limit of 1.0 parts per million." So it sounds like the mercury is not a problem. But then later on it says, "the EPA's own calculations show they would absorb too much mercury." So what's the scientific consensus?

4. There are some elaborate methods in the statistical literature for getting standard errors for estimates from composite sampling. The simplest and most reliable approach, however, is to just replicate the composite sampling, then use simple inference for the average of the composite samples. That's what we did for my consulting project in 1994 analyzing fish sampled from a lake. (A toy version of this idea is sketched after this list.)

5. Maybe I'm missing something, but I can't see how that doctor can say that composite sampling is "falsifying data." And since when are doctors considered experts in statistics? Well, I guess that's OK if a spokesman from the Tuna Association is considered an expert in medicine. . . .
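Here's the toy version of point 4 promised above: form several independent composites, then do ordinary inference on the composite means. The lognormal parameters are invented (chosen so the mean is near the reported 0.12 ppm); this is just to show the mechanics, not the FDA's protocol.

```python
# Replicated composite sampling: 18 composites of 12 cans each (18 x 12 = 216 cans,
# as in the FDA testing), with simple inference on the composite means.
import numpy as np

rng = np.random.default_rng(2)
n_composites, cans_per_composite = 18, 12
cans = rng.lognormal(mean=np.log(0.10), sigma=0.6, size=(n_composites, cans_per_composite))
composites = cans.mean(axis=1)               # each composite masks its individual cans

est = composites.mean()
se = composites.std(ddof=1) / np.sqrt(n_composites)
print(f"estimated mean mercury: {est:.3f} ppm, standard error: {se:.3f}")
```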

The article is actually a little frustrating, because (as noted in point 3 above), there seems to be some ambiguity in the recommendations, but this just about gets lost amid the quotes giving extreme views of the issue.

But it was fun to see composite sampling in the news. Maybe next time they'll quote a statistician . . .

P.S.

A commenter pointed out that there's a question at Scatterbox about the tuna study. I don't know enough about this to evaluate it, but as noted above, my first thought is that it makes more sense to look at averages than to look at the percent of cans that are above some threshold. And composite sampling is fine. What's not clear to me are the estimated risks of a given dose of mercury (or, I suppose, the health benefits of eating fish compared to whatever the alternative is for any particular person).

Posted by Andrew at 9:36 AM | Comments (4) | TrackBack

January 17, 2006

Fake cancer study in Norway

Kjetil Halvorsen linked to this story, "Research cheats may be jailed," from a Norwegian newspaper:

State officials have been considering imposing jail terms on researchers who fake their material, but the proposal hasn't seen any action for more than a year. More is expected now, after a Norwegian doctor at the country's most prestigious hospital was caught cheating in a major publication. . . . The survey allegedly involved falsification of the death- and birthdates and illnesses of 454 "patients." The researcher wrote that his survey indicated that use of pain medication such as Ibuprofen had a positive effect on cancers of the mouth. He's now admitted the survey was fabricated and reportedly is cooperating with an investigation into all his research.

And here's more from BBC News:

Norwegian daily newspaper Dagbladet reported that of the 908 people in Sudbo's study, 250 shared the same birthday.

Slap a p-value on that one, pal.
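In fact, here's a back-of-the-envelope version (my calculation, not anything in the news reports): if the 908 birthdates had really been scattered uniformly over 365 days, the expected count on any single day would be about 2.5, and the probability of a pile-up of 250 is so small it has to be computed in log space.

```python
# Log-scale binomial calculation; the probability underflows ordinary floating point.
import numpy as np
from scipy.stats import binom

n, k, days = 908, 250, 365
print(n / days)                                      # expected people per birthday: about 2.5
print(binom.logpmf(k, n, 1 / days) / np.log(10))     # roughly -411, i.e., a probability of order 10**-411
```

(Allowing for the upper tail and for the 365 possible days changes this by only a couple of orders of magnitude.)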

Speaking as a citizen, now, not as a statistician, I don't see why they have to put these people in jail for a year. Couldn't they just give them a big fat fine and make them clean bedpans every Saturday for the next 5 years, or something like that? I would think the punishment could be appropriately calibrated to be both a deterrent and a service, rather than a cost, to society.

P.S. Update here.

Posted by Andrew at 12:41 AM | Comments (4) | TrackBack

January 3, 2006

Statistics and ethics: human volunteers edition

Charles Star considers one of the fine points in the definition of "volunteer."

Posted by Andrew at 9:20 AM | Comments (1) | TrackBack

November 3, 2005

Expert statistical modeler needed

Statistics is fundamental to pharmacology and drug development. Billy Amzal at Novartis forwarded me this job announcement for a statistician or mathematician who wants to do statistical modeling in pharmacokinetics/pharmacodynamics. "Knowledge of Bayesian statistics and its application is a strong plus." It's a long way from Berkeley, where one of my colleagues told me that "we don't believe in models" and another characterized a nonlinear differential equation model (in pharmacokinetics) as a "hierarchical linear model." Anyway, it looks like an interesting job opportunity.

Posted by Andrew at 9:56 AM | Comments (1) | TrackBack

October 21, 2005

Scientists behaving badly? Maybe, but maybe not as often as reported

In a comment here, Martin Termouth cited this report from Nature, "One in three scientists confesses to having sinned."

But what are these sins? Here's the relevant table:

[Image: sins.jpg -- the table from the Nature report listing each questionable behavior and the percentage of scientists admitting to it]

This looks pretty bad, until you realize that the rarest behaviors, which are also the most severe, are at the top of the table. The #1 "sin," admitted-to by 15.5% of the respondents, is "Changing the design, methodology or results of a study in response to pressure from a funding source." But is that a sin at all? For example, I've had NIH submissions where the reviewers made good suggestions about the design or data analysis, and I've changed the plan in my resubmission. This is definitely "pressure"--it's not a good idea to ignore your NIH reviewers--but not improper at all.

From the other direction, as an NSF panelist I've made suggestions for research proposals, with the implication that they better think very hard about alternative designs or analyses if they want to get funding. This all seems proper to me. Of course, I agree that it's improper to change the results of a study in response to pressure. But, changing the design or methodology, that seems OK to me.

Now let's look at the #2 sin, "Overlooking others' use of flawed data or questionable interpretation of data." This is not such an easy ethical call. Blowing the whistle on frauds by others is a noble thing to do, but it's not without cost. My friend Seth Roberts has, a couple times, pointed out cases of scientific fraud (here's one example), and people don't always appreciate it. Payoffs for whistleblowing are low and the costs/risks are high, so I'd be cautious about characterizing "Overlooking others' use of flawed data..." as a scientific "sin."

Now, the #3 sin, "Circumventing certain minor aspects of human-subjects requirements." I agree that this could be "questionable" behavior. Although I'm not quite sure if "circumventing" is always bad. It's sort of like the difference between "tax evasion" (bad) and "tax avoidance" (OK, at least according to Judge Learned Hand).

Taking out these three behaviors leaves 11.4%, not quite as bad as the "more than a third" reported. (On the other hand, these are just reported behaviors. I bet there's a lot more fraud out there by people who wouldn't admit to it in a survey.)

If you've read this far, here's a free rant for you!

P.S. When you click on a Nature article, a pop-up window appears, from "c1.zedo.com", saying "CONGRATULATIONS! YOU HAVE BEEN CHOSEN TO RECEIVE A FREE GATEWAY LAPTOP . . . CLICK HERE NOW!." Is this tacky, or what? I thought the British were supposed to be tasteful!

Posted by Andrew at 12:00 AM | Comments (2) | TrackBack

October 20, 2005

"Fat Politics: The Real Story Behind America's Obesity Epidemic"

Eric Oliver is speaking today in the American Society and Politics Workshop on "fat politics." Here's the paper and here are some paragraphs from it:

In truth, the only way we are going to “solve” the problem of obesity is to stop making fatness a scapegoat for all our ills. This means that public health officials and doctors need to stop making weight a barometer of health and issuing so many alarmist claims about the obesity epidemic. This also means that the rest of us need to stop judging others and ourselves by our size.

Such a change in perspective, however, may be our greatest challenge. Our body weight and fatness is a uniquely powerful symbol for us – something we feel we should be able to control but that often we can’t. As a result, obesity has become akin to a sacrificial animal, a receptacle for many of our problems. Whether it is our moral indignation, status anxiety, or just feelings of general powerlessness, we assume we can get a handle on our lives and social problems by losing weight. If we can only rid ourselves of this beast (that is, obesity), we believe we will not only be thin, but happy, healthy, and righteous. Yet, as with any blind rite, such thinking is a delusion and blaming obesity for our health and social problems is only going to cause us more injury over the long haul.

So how might we change our attitudes about obesity and fat? As with any change in perspective, the first place we must begin is in understanding why we think the way we do. In the case of obesity, we need to understand both why we are gaining weight and, more importantly, why we are calling this weight gain a disease. In other words, if we are to change our thinking about fat, we need to recognize the real sources of America’s obesity epidemic.

Oliver continues:

This book seeks to help in this effort. It is divided roughly into two parts. The first part, examines how and why our growing weight has come to be characterized as an “obesity epidemic.” Chapters 1 and 2 examine the role of the health professions, drug companies, government, and diet industry in promulgating the idea that our growing weight is a dangerous disease. After reviewing both the scientific evidence and the history of “obesity” as a health concept, it becomes clear that America’s “health industrial complex” is far more responsible for the obesity epidemic than any other source. But the health warnings about obesity have not fallen on deaf ears and if Americans are truly worried about obesity it is because of their receptivity to the various health pronouncements. Chapters 3 and 4 examine why we in the West hate fatness so much, particularly in white women, while the rest of the world tends to celebrate it. As we’ll see, our attitudes about fatness have much more to do with our concerns about social status, race, and sex than they do with health.

The second half of the book examines why we are gaining weight and what this weight gain signifies. Chapter 5 looks at the science of fat and what the genetic sources of weight tell us about our expanding waistlines and our health. Chapters 6 and 7 review the charges and evidence concerning food, exercise, and our growing weight. As we’ll see, the most commonly accused culprits (fast food, high fructose corn syrup, television, and automobiles) are merely accessories to the “crime”; meanwhile, the real source of our growing weight (snacking) goes largely unnoticed. Chapter 8 reviews the politics behind the various obesity initiatives coming from our state and federal governments. Not only are most of these policies unlikely to help us lose weight, they also reveal the fundamental problems with making weight-loss a target of government action. In Chapter 9, the conclusion, I discuss what I think our growing weight really means and what we can do to address the real problems of obesity in the United States.

This sounds interesting; I'd like to see the quantitative analysis. Also, of course, I wonder what Seth would think of this.

The talk is in 270 International Affairs Bldg at 4:15, unfortunately a time that conflicts with one of my classes.

Posted by Andrew at 12:06 AM | Comments (0) | TrackBack

September 16, 2005

Seth's diet, etc.

Seth Roberts is guest-blogging at Freakonomics with lots of interesting hypotheses about low-budget science, or what might be called "distributed scientific investigation" (by analogy to "distributed computing").

One of the paradoxes of Seth's self-experimentation research is that it seems so easy, but it clearly isn't, as one can readily see by realizing how few scientific findings have been obtained this way. Reading Seth's article in Behavioral and Brain Sciences gave me a sense of how difficult these self-experiments were. They took a lot of time (lots of things were tried), required discipline (e.g., standing 8 hours a day and setting up elaborate lighting systems in the sleep experiments) and many many measurements, and were much helped by a foundation of a deep understanding of the literature in psychology, nutrition, etc.

Also, for those interested in further details of Seth's diet, I'll cut-and-paste something from his blog entry.

Seth writes:

Because I hope to write a diet book I [Seth] am not going to be giving advice, at least until my future publisher approves. But I am happy to talk about how I lost weight and how I maintained (and maintain) the loss. With that in mind,

1. The Times article wasn't terribly precise about what I do now. For good reason: Neither am I. I used to drink carefully measured amounts of fructose water or extra-light olive oil -- amounts containing about 100-300 calories per day. Now I measure nothing. I am sure however that my total caloric intake from what I will call unusual foods has not changed. The unusual foods currently consist of canola oil, sucrose water (much more convenient than fructose water), and most days a raw egg, swallowed quickly, as the Italians do. Ah, food taboos. I repeat: I am not recommending this (or anything else). I got the idea from a friend of mine; a raw egg swallowed quickly is a relatively diverse source of calories without taste. Perhaps she got the idea from the Italian custom. I have only been swallowing raw eggs for a few months and overall am beginning to think they are more trouble than they are worth. The child in me wishes there were more opportunities to bring it up in conversation. Just as when I was a graduate student I enjoyed saying (truthfully) that I subscribed to the National Enquirer. "That's worse than Playboy!" someone said.

2. I am leaning toward consuming the sugar water hot in the evenings. Somehow it tastes better then. An Italian friend of mine says that when he was young, that's what his mother gave him -- hot sugar water before bed time. If the critics of sugar wish to malign an entire country of devoted mothers tending to their children . . .

3. Before he studied food intake, my friend the physiologist Michel Cabanac, at Laval University, Quebec, studied temperature regulation; and his work in that area may have helped him understand food intake. There is a body-temperature set point: a temperature the body tries to maintain by increasing or decreasing not only how much we sweat but also how pleasant we find heat and cold. Michel found that there was a circadian rhythm in the set point: it went up and down with a 24-hour rhythm. The circadian rhythm of the set point causes the more obvious circadian rhythm in body temperature. How pleasant we find heat varies with the time of day; a warm shower will be more pleasant in the morning (when our set point is rising) than at other times (when it is no longer rising). This doesn't predict, I admit, that hot sugar water should taste better in the evening. Michel also found (or perhaps someone else found this) that the body temperature set point depended on the external (air) temperature: When it was cold, the set point went up. Because he knew this, it was easy for him to believe that the body fat set point also depended on external conditions. That is the general idea behind my weight-control theory.

4. I don't know if canola oil works. I haven't been doing it very long. For a few years, I used extra-light olive oil to maintain my weight loss. I'm sure it works -- for me. I swallow it easily. Lots of people don't. If anyone understands what causes the difference -- why it is easy for some, hard for others -- please let us know. Perhaps someone has had an experience that changed easy to hard or hard to easy.

Posted by Andrew at 12:32 AM | Comments (0) | TrackBack

August 1, 2005

Faces in the morning

From Marginal Revolution, I see this pointer by Courtney Knapp to a place that sells wallpaper "designed for people who don’t want a roommate, but still want company. It’s a photographic wall coverings with images of life-size people. The wallpaper shows attractive, original-sized individuals, in different situations at home." Here's one of the pictures:

[Image: TVWatcher.JPG -- one of the wallpaper pictures]

My first thought when seeing this was that it reminded me of Seth Roberts's idea, obtained from self-experimentation, of seeing life-sized faces in the morning as a cure for depression. See Section 2.3 of this paper for lots of detail on this hypothesis and the data Seth has to support it. Seth used TV to get the large faces but maybe wallpaper would work too. So maybe that wallpaper isn't as silly as it sounds.

Posted by Andrew at 7:46 AM | Comments (1) | TrackBack

July 19, 2005

Using propensity scores to estimate the effects of seeing gun violence

Jeff Fagan forwarded this article on gun violence by Jeffrey Bingenheimer, Robert Brennan, and Felton Earls. The research looks at children in Chicago who were exposed to gun violence, and uses propensity score matching to find a similar group who were unexposed. Their key finding: "Results indicate that exposure to firearm violence approximately doubles the probability that an adolescent will perpetrate serious violence over the subsequent 2 years."

I'll first give a news report summarizing the article, then my preliminary thoughts.

Here's the summary:



Controversial Study Suggests Seeing Gun Violence Promotes It

Constance Holden

A longitudinal study of Chicago adolescents has concluded that even a single exposure to firearm violence doubles the chance that a young person will later engage in violent behavior. The study may once again stoke up the debate over juvenile violence; it has already triggered criticism over the unusual statistical method it employs.

The work is part of the decade-old Project on Human Development in Chicago Neighborhoods, run by Harvard University psychiatrist Felton J. Earls. On page 1323, Earls and two health statisticians describe how they used a relatively new technique called "propensity score stratification" to create, through statistical means, a randomized experiment on propensity toward violence from observational data.

Over a 5-year period, the researchers conducted three interviews with more than 1000 adolescents initially aged 12 to 15. In the first, they gathered extensive data on variables such as family structure, temperament, IQ, and previous exposure to violence. Halfway through the study, the subjects were asked if, in the prior 12 months, they had been exposed to firearm violence--defined as being shot or shot at or seeing someone else shot or shot at. Then at the end of the period, the 984 subjects remaining were asked if they had engaged in any violence--defined as participation in a fight in which anyone got hurt as well as firearm-related incidents, including carrying a gun.

[Figure 1: Violence debate. A study of Chicago adolescents indicates that seeing a murder may lead to later gun violence by the observer.]

"If you just compare exposed and unexposed, the exposed were three or four times as likely to be [violence] perpetrators," says lead author Jeffrey B. Bingenheimer, a Ph.D. candidate at the University of Michigan School of Public Health in Ann Arbor.

The authors then went to great lengths to weed out confounding factors. Subjects were ranked according to "propensity" scores: a cumulative tally of 153 risk factors that estimated the probability of exposure to gun violence. They were then divided up according to whether or not they had reported such exposure and whether or not they had subsequently engaged in violent behavior. Those with the same propensity scores but different exposures were compared with each other. In this way, the authors claim, they controlled for a host of individual, family, peer, and neighborhood variables.

Even with this analysis, exposure to gun violence predicted a doubling of the risk for violent behavior--from 9% for unexposed to 18% among the subjects who reported exposure, says Bingenheimer. And it didn't take repeated exposures--"the vast majority" of subjects reported only one, he says. Can a single experience of seeing someone shoot at someone else make an individual more violence-prone? "That doesn't seem improbable to me," says Bingenheimer. "It could be for only a minority, but a very large effect for that minority."

Developmental psychologist Jeanne Brooks-Gunn of Columbia University, one of the scientific directors of the Chicago neighborhoods project, agrees that a single exposure might have a profound effect, even on a hitherto nonviolent individual. "Nobody's done this kind of analysis before," she says, and nobody has focused just on gun violence, which "clearly is a very extreme type of violence."

But a number of other scholars have deep misgivings about both the study findings and the methodology. Psychiatrist Richard Tremblay of the University of Montreal in Canada says the study does not demonstrate that "those who are nonviolent to begin with will become violent." Indeed, the authors didn't address this point directly because a lack of subjects in the lowest-risk category led them to eliminate it from their analysis.

Because the remaining subjects already had some violence risk factors, the results don't surprise Tremblay. He compares the work to looking at whether alcoholics are more likely to drink if they are exposed to alcohol. It is already well known, he says, that "if individuals at a high risk of violence are in an environment with violence, they're more likely to be violent."

Economist Steven Durlauf of the University of Wisconsin, Madison, calls the study an "implausible modeling of violence exposure." The authors assume that two individuals with the same propensity rankings are equally likely to encounter violence, he says. But such exposure may not be random; rather, it probably stems from "something that has not been measured"--such as recklessness, says Durlauf. Nobel Prize-winning economist James Heckman of the University of Chicago agrees, calling the study "potentially very misleading." Adds Heckman: "This is why this kind of statistics is not science. This is why you find out orange juice causes lung cancer one week and cures it the next."

But Brooks-Gunn defends the innovative study. The propensity scoring technique "comes the closest we have to any experiment, which is why I think the results are so strong," she says.

My thoughts:

I see merit in the arguments of both sides. I don't know the context of Heckman's particular comment about orange juice, but perhaps the issue is that it contributes to cancer for some people, under some conditions, while reducing the risk of cancer for other people, under other conditions. Matching methods work by restricting analysis to a subset of the population that is well-matched for the treated and control people. So any summary of such an analysis should consider treatment interactions--that is, ways in which treatment effects can vary given pre-treatment predictors.

However, I don't think it's right to simply dismiss this sort of observational study. For one thing, people will be making these comparisons anyway, and comparing matched groups should be better than simple unmatched comparisons.
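To make that concrete, here's a minimal sketch of propensity-score stratification on simulated data--nothing like the actual analysis, which used 153 risk factors--just to show why the within-stratum comparison is closer to the mark than the raw comparison:

```python
# Propensity-score stratification on simulated data (illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
risk = rng.normal(size=(n, 3))                        # stand-ins for family, peer, neighborhood factors
exposed = rng.binomial(1, 1 / (1 + np.exp(-(risk.sum(axis=1) - 1))))
violent = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.8 * exposed + risk.sum(axis=1)))))

score = LogisticRegression().fit(risk, exposed).predict_proba(risk)[:, 1]
strata = np.digitize(score, np.quantile(score, [0.2, 0.4, 0.6, 0.8]))   # quintiles of the score

diffs, weights = [], []
for s in range(5):
    idx = strata == s
    if exposed[idx].sum() and (1 - exposed[idx]).sum():                 # need both groups in the stratum
        diffs.append(violent[idx & (exposed == 1)].mean() - violent[idx & (exposed == 0)].mean())
        weights.append(idx.sum())

print("raw difference:       ", violent[exposed == 1].mean() - violent[exposed == 0].mean())
print("stratified difference:", np.average(diffs, weights=weights))
```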

Posted by Andrew at 12:14 AM | Comments (3) | TrackBack

July 15, 2005

Following up on medical studies

From Mahalanobis, a link to a story following up on medical research findings. From the CNN.com article:

New research highlights a frustrating fact about science: What was good for you yesterday frequently will turn out to be not so great tomorrow.

The sobering conclusion came in a review of major studies published in three influential medical journals between 1990 and 2003, including 45 highly publicized studies that initially claimed a drug or other treatment worked.

Subsequent research contradicted results of seven studies -- 16 percent -- and reported weaker results for seven others, an additional 16 percent.

That means nearly one-third of the original results did not hold up, according to the report in Wednesday's Journal of the American Medical Association.

This is interesting, but I'd like to hear more. If we think of effects as being continuous, then I'd expect that "subsequent research" would find stronger results half the time, and weaker results the other half the time. I imagine their dividing line relates to statistical significance, but that criterion can be misleading when making comparisons.
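Here's a quick simulation of that intuition (my own toy setup, nothing to do with the JAMA review's actual data): with a fixed true effect and independent noise, the follow-up estimate is weaker than the original about half the time, and more than half the time if the original got published partly because it looked strong.

```python
# Toy simulation: original and follow-up studies of the same fixed effect.
import numpy as np

rng = np.random.default_rng(4)
true_effect, se = 0.3, 0.15
original = rng.normal(true_effect, se, 100_000)
followup = rng.normal(true_effect, se, 100_000)

print((followup < original).mean())                              # about 0.5
significant = original > 1.96 * se                               # crude filter: original was "significant"
print((followup[significant] < original[significant]).mean())    # noticeably more than 0.5
```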

I'm not saying there's anything wrong with this JAMA article, just that I'd like to see more to understand what exactly they found. They do mention as an example the notorious post-menopausal hormone study.

P.S. For the name-fans out there, the study is by "Dr. John Ioannidis, a researcher at the University of Ioannina." I wonder if having the name helped him get the job.

Posted by Andrew at 7:21 PM | Comments (2) | TrackBack

June 23, 2005

Diet soda and weight gain

I wonder what Seth Roberts thinks about this:

Study links diet soda to weight gain

BY DON FINLEY

San Antonio Express-News

A review of 26 years of patient data found that people who drink diet soft drinks were more likely to become overweight.

Not only that, but the more diet sodas they drank, the higher their risk of later becoming overweight or obese -- 65 percent more likely for each diet drink per day.

The findings, the latest from the long-term San Antonio Heart Study, took even the researchers by surprise.

"I was baffled," said Sharon Fowler, a faculty associate at the University of Texas Health Science Center, who presented the data earlier this month at the American Diabetes Association's 65th annual Scientific Sessions in San Diego.

Researchers looked at questionnaires and medical records for 1,177 patients who began enrolling in the study in 1979. All had weights considered either normal or overweight, but not obese.

The volunteers were asked how many soft drinks per day they usually drank and whether they were regular or diet -- or a combination of each. The researchers followed up with them over the years.

Drinking any soda -- regular or diet -- was linked to a higher risk of becoming overweight. But when the researchers adjusted the data to account for differences in age, sex and ethnicity, they found that regular soft drinks had very little connection with serious weight gain.

Diet drinks, however, did.

The researchers are quick to point out that their findings are not proof that drinking diet soft drinks causes people to become heavy. It could be that as they began gaining weight, they switched from regular to diet drinks.

"People who were normal weight, one out of four of them at the time of our study were drinking diet drinks," Fowler said. "People who were overweight but not obese, one out of three of them were drinking the diet drinks. Definitely they were voting with their feet. They were obviously trying to avoid gaining further weight or repeating a family history."

However, the idea that diet sodas can lead to weight gain isn't new. Last year, a group from Purdue University found that when rats were fed the equivalent of diet soda, they ate more high-calorie food afterwards than did rats fed the same amount of a drink sweetened with high-calorie sweetener.

Here's another story with more details, including:

That may be just what happens when we offer our bodies the sweet taste of diet drinks, but give them no calories. Fowler points to a recent study in which feeding artificial sweeteners to rat pups made them crave more calories than animals fed real sugar.

"If you offer your body something that tastes like a lot of calories, but it isn't there, your body is alerted to the possibility that there is something there and it will search for the calories promised but not delivered," Fowler says.

This is very similar to the reasoning applied in reverse by Seth, who recommends a weight-loss strategy based on taking sugar water (with calories but no taste) between meals. Seth developed his ideas using self-experimentation but based his conjectures on rat experiments as well.

Posted by Andrew at 1:02 AM | Comments (7) | TrackBack

June 1, 2005

Chad on ethics

Chad Heilig is a statistics Ph.D. graduate of Berkeley who has moved from theoretical statistics to work at the CDC. He recently wrote a paper on ethics in statistics that will appear in Clinical Trials. The paper is interesting to read--it presents a historical overview of some ideas about ethics and statistics in medical studies.

Two key ethical dilemmas in clinical trials are:

(1) The conflict between the goal of saving future lives (by learning as much as possible, right away, about effectiveness of treatments), and the goal of treating current patients as effectively as possible (which, in some settings, means using the best available treatment, and in others means using something new--but will not, in general, correspond to random assignment).

(2) The conflict between the goals in (1)--to help current and future patients--and the goals of the researcher, which can include pure scientific knowledge as well as $, glory, etc.

As Chad points out, it's a challenge to quantify either of these tradeoffs. For example, how many lives will be saved by performing a large randomized trial on some drug, as compared to using it when deemed appropriate and then learning its effectiveness from observational studies? (It's well known that observational studies can give wrong answers in such settings.)

I completely disagree with the following statement on page 5 of the paper, which Chad attributes to Palmer (1993): "Where individual ethics is favored, one ought to employ Bayesian statistical methods; where collective ethics is favored, frequentist methods apply." This doesn't make sense to me. (For one thing, "frequentist methods" is an extremely general class which includes Bayesian methods as a special case.)

For a copy of the paper, email Chad at cqh9@cdc.gov

Posted by Andrew at 9:57 AM | Comments (0) | TrackBack

May 18, 2005

We've been eating candy with lead

Carrie points to an article in the Orange County Register reporting that lots of Mexican candy has lead in it. One of the candies in Carrie's graphic looked familiar, so I clicked on the link and saw that I've eaten a lot of these candies! I actually have some in a jar in my office and have been giving them out to students!

The Register article focuses on Mexico, but I've seen these candies on sale in Guatemala also. According to the article, the chili powder in the candies and the ink in the wrappers is contaminated with lead. Pretty scary, especially considering that my in-laws used to eat these as kids, and we still get them occasionally as treats.

Posted by Andrew at 6:53 AM | Comments (1) | TrackBack

May 13, 2005

Death by survey

Emmanuela Gakidou and Gary King (Institute for Quantitative Social Science, Harvard) wrote a cool paper, "Death by survey: estimating adult mortality without selection bias," in which they consider estimates of mortality based on "survey responses about the survival of siblings, parents, spouses, and others." By explicitly modeling the missing-data process, they correct for selection biases such as, dead persons with more siblings are more likely to be counted in a survey asking about the deaths of siblings. (And persons with no siblings won't be counted at all.)

Comments on the Gakidou and King paper

This is a fun, interesting, and potentially important paper. I just had a few questions/comments. Mostly picky, but hey, anything to be helpful . . .

- What is the "DHS program"? They refer to it several times but I don't know what it is.

- Figure 1 would be better, I think, as a 3x3 grid of small plots. Instead of trying to use symbols and colors to convey so many details on a single plot, use a grid so that the individual plots are less overloaded. Also, connect the points in each plot with lines (and get rid of the points). Then you can label the lines directly on the plot and avoid the need for a legend.

- Same for Figure 2. Also, for Figure 2, make the bottom boundary at 0 a "hard" boundary (no gap between 0 and the axis) since zero is a meaningful comparison. Also, I applaud the authors for using RMSE instead of MSE.

- Table 1 should be a graph. No doubt about it.

- Figure 3 is fascinating. I have a minor comment which is that I can't figure out how the subplots are ordered. I'd like to see something like an ordering by average death rate, or GDP, or some interpretable quantity.

More importantly, I'm interested in the consistent pattern of these curves (of death rate vs. #siblings), which go up as the number of siblings increases from 1 to 4 or 5, then generally decrease as the number increases further. What's going on here? Is it a "real" phenomenon, or is it some statistical artifact having to do with the sampling? I just didn't quite know how to think about it.

- In the discussion of sampling weights in the conclusion, the authors should be aware that, in many settings, the more appropriate survey weights come from poststratification. To the extent that family size information is directly available in some of these countries, I suspect that poststratification could improve the Gakidou and King method even more.

A related classroom demo

The Gakidou and King paper is really cool and reminds me of a (much simpler) classroom demo for teaching sampling methods. We ask each student to tell how many siblings are in his or her family ("How many brothers and sisters are in your family, including yourself?"). We write the results on the blackboard as a frequency table and a histogram, and then compute the mean, which is typically around 3.

But families, on average, typically have fewer than 3 kids (and this was also true twenty or so years ago when the current college students were being born). Why is the number for the class so high? Students give various suggestions such as, perhaps larger families are more likely to send children to college. But the real reason is that the probability a family is included in the sample is proportional to the number of kids in the family. Families with 0 kids won't be included at all, and families with many kids are more likely to be sampled than families with just 1 kid.
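Here's a quick simulation of the demo (the family-size distribution is made up): sample families directly, then sample students and ask each how many kids are in his or her family. The student-based average comes out noticeably higher, because a family's chance of contributing a student is proportional to its number of kids.

```python
# Size-biased sampling: family average vs. average reported by students.
import numpy as np

rng = np.random.default_rng(5)
kids_per_family = rng.choice([0, 1, 2, 3, 4, 5], size=100_000,
                             p=[0.15, 0.20, 0.30, 0.20, 0.10, 0.05])
print("average over families:", kids_per_family.mean())               # about 2

# each family shows up once per kid, so repeat its size that many times
reported_by_students = np.repeat(kids_per_family, kids_per_family)
print("average reported by students:", reported_by_students.mean())   # close to 3
```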

This example is discussed in Section 5.1.6 of Teaching Statistics: A Bag of Tricks. Related examples in surveys include sampling by household or by individual.

Posted by Andrew at 6:27 AM | Comments (0) | TrackBack

April 25, 2005

How to lie with statistics: clinical trials edition

Carrie noticed an article in the Carlat Report describing some methods used in sponsored research to induce bias in drug trials:

1. Make sure your drug has a dosage advantage. This way, you can present your findings as a “head-to-head” trial without worrying that your drug will be outperformed. Thus, a recent article on Cymbalta concluded that “in three comparisons, the mean improvement for duloxetine was significantly greater than paroxetine or fluoxetine.” (Depression and Anxiety 2003, 18; 53-61). Not a surprising outcome, considering that Cymbalta was ramped up to a robust 120 mg QD, while both Prozac and Paxil were kept at a meek 20 mg QD.

2. Dose their drug to cause side effects. . . . The original Lexapro marketing relied heavily on a study comparing Lexapro 10 mg and 20 mg QD with Celexa 40 mg QD—yes, patients in the Celexa arm were started on 40 mg QD (J Clin Psychiatry 2002; 63:331-336). The inevitably higher rate of discontinuation with high-dose Celexa armed Forest reps with the spin that Lexapro is the best tolerated of the SSRIs. . . .

3. Pick and choose your outcomes. If the results of the study don’t quite match your high hopes for the drug, start digging around in the data, and chances are you’ll find something to make you smile! Neurontin (gabapentin) is a case in point. . . .

4. Practice "creative writing" in the abstract.

Carlat also cites a study from the British Medical Journal finding that "Studies sponsored by pharmaceutical firms were four times more likely to show results favoring the drug being tested than studies funded by other sources."

I don't know enough about medical trials to have a sense of how big a problem this is (or, for that matter, how to compare the negatives of biased research to the positives associated with research sponsorship), but at the very least it would seem to be a great example for that "how to lie with statistics" lecture in an intro statistics class.

One thing that interests me about the methods Carlat describes is that only item 3 ("Pick and choose your outcomes") and possibly item 4 ("Practice creative writing") fit into the usual "how to lie with statistics" framework. Items 1 and 2, which involve rigging the design, are new to me. So maybe this would be a good article for an experimental design class.

For more examples and discussion, see the article by Daniel Safer in Journal of Nervous and Mental Disease 190, 583-592 (2002), cited by Carlat.

Posted by Andrew at 12:52 AM | Comments (5) | TrackBack

April 8, 2005

Seth on small-n and large-n studies

After reading Seth Roberts's article on self-experimentation, I had a dialogue with him about when to move from individual experimentation to a full-scale controlled experiment with a large-enough n to obtain statistically significant results. My last comment said:

But back to the details of your studies. What about the weight-loss treatment? That seems pretty straightforward--drink X amount of sugar water once a day, separated by at least an hour from any meals. To do a formal study, you'd have to think a bit about what would be a good control treatment (and then there are some statistical-power issues, for example in deciding whether it's worth trying to estimate a dose-response relation for X), but the treatment itself seems well defined.

Seth replied as follows:

Here are some relevant "facts":

Long ago, John Tukey said that he would rather have a sample of n = 3 (randomly selected) than Kinsey's really large non-random samples. He did not explain how one would get a randomly selected person to answer intimate questions. Once one considers that point Kinsey's work looks a little better -- because ANY actual sample will involve some compromise (probably large) with perfectly random sampling. Likewise, the closer one looks at the details of doing a study with n = 100, the more clearly one sees the advantages of smaller n studies.

How do the results of self-experimentation make their way in the world? An example is provided by blood-sugar testing for diabetics. Now it is everywhere -- "the greatest advance since the discovery of insulin," one diabetic told me. It began with self-experimentation by Richard Bernstein, an engineer at the time. With great difficulty, Bernstein managed to present his work at a scientific conference. It was then followed up by a British academic researcher, who began with relatively small n studies. I don't think he ever did a big study (e.g., n = 100). The benefits were perfectly clear with small n. From there it spread to become the norm. Likewise, I don't think that a really large study of my weight-loss ideas will ever be necessary. The benefits should be perfectly clear with small n. Fisher once said that what is really convincing is not a single study with a really low p value but repeated studies with p < .05. Likewise, I don't think that one study with n = 100 is half as convincing as several diverse studies with much smaller n.

It is so easy to believe that bigger is better (when in fact that is far from clear) that I wonder if it is something neurological: Our brains are wired to make us think that way. I cannot remember ever hearing a study proposed that I thought was too small; and I have heard dozens of proposed studies that I thought were too large. When I discussed this with Saul Sternberg, surely one of the greatest experimental psychologists of all time, he told me that he himself had made this very mistake: Gone too quickly to a large study. He wanted to measure something relatively precisely so he did an experiment with a large n (20 is large in cognitive psychology). The experiment failed to repeat the basic effect.

P.S. Seth's paper was also noted here.
See also here for Susan's comments.

Posted by Andrew at 7:37 AM | Comments (0) | TrackBack

March 31, 2005

More thoughts on self-experimentation

Susan writes:

I've started reading the piece you sent me on Seth. Very interesting stuff. I generally tend to think that one can get useful evidence from a wide variety of sources -- as long as one keeps in mind the nature of the limitations (and every data source has some kind of limitation!). Even anecdotes can generate important hypotheses. (Piaget's observations of his own babies are great examples of real insights obtained from close attention paid to a small number of children over time. Not that I agree with everything he says.) I understand the concerns about single-subject, non-blind, and/or uncontrolled studies, and wouldn't want to initiate a large-scale intervention on the basis of these data. But from the little bit I've read so far, it does sound like Seth's method might elicit really useful demonstrations, as well as generating hypotheses that are testable with more standard methods. But I also think it matters what type of evidence one is talking about -- e.g., one can fairly directly assess one's own mood or weight or sleep patterns, but one cannot introspect about speed of processing or effects of one's childhood on present behavior, or other such things.

My thoughts: that's an interesting distinction between aspects of oneself that can be measured directly, as compared to data that are more difficult to measure.

I remember that Dave Krantz once told me that many of the best ideas in the psychology of decision making had come from researchers' introspection. That sounds plausible to me. Certainly, speculative axioms such as "minimax risk" and similar ideas discussed in the Luce and Raiffa book always seemed to me to be justified by introspection or by demonstrations of the Socratic-dialogue type (such as in Section 5 of this paper, where we demonstrate why you can't use a curving utility function to explain so-called "risk averse" attitudes).

One of the discussants of Seth's paper in Behavioral and Brain Sciences compared introspection to self-experimentation. Just as self-experimentation is a cheaper, more flexible, but limited version of controlled experiments on others, introspection is a cheaper etc. version of self-experimentation.

Back to Susan's comments: she appears to agree with Seth that it's not a good idea to jump from the self-experiments to the big study. So there should be some intermediate stage . . . pilot-testing with volunteers? How much of this needs to be done before he's ready for the big study? More generally, this seems to be an important experimental design question not addressed by the usual statistical theory of design of experiments.

Posted by Andrew at 12:12 AM | Comments (3)

March 16, 2005

Learning from self-experimentation

Seth Roberts is a professor of psychology at Berkeley who has used self-experimentation to generate and study hypotheses about sleep, mood, and nutrition. He wrote an article in Behavioral and Brain Sciences describing ten of his self-experiments. Some of his findings:

Seeing faces in the morning on television decreased mood in the evening and improved mood the next day . . . Standing 8 hours per day reduced early awakening and made sleep more restorative . . . Drinking unflavored fructose water caused a large weight loss that has lasted more than 1 year . . .

As Seth describes it, self-experimentation generates new hypotheses and is also an inexpensive way to test and modify them. One of the commenters, Sigrid Glenn, points out that this is particularly true with long-term series of measurements that might be difficult to carry out on experimental volunteers.

Heated discussion

Behavioral and Brain Sciences is a journal of discussion papers, and this one had 13 commenters and a response by Roberts. About half the commenters love the paper and half hate it. My favorite "hate it" comment is by David Booth, who writes, "Roberts can swap anecdotes with his readers for a very long time, but scientific understanding is not advanced until a literature-informed hypothesis is tested between or within groups in a fully controlled design shown to be double-blind." Tough talk, and controlled experiments are great (recall the example of the effects of estrogen therapy), but Booth is being far too restrictive. Useful hypotheses are not always "literature-informed," and lots has been learned scientifically by experiments without controls and blindness. This "NIH" model of science is fine but certainly is not all-encompassing (a point made in Cabanac's discussion of the Roberts paper).

The negative commenters were mostly upset by the lack of controls and blinding in self-experiments, whereas the positive commenters focused on individual variation, and the possibility of self-monitoring to establish effective treatments (for example, for smoking cessation) for individuals.

In his response, Roberts discusses the various ways in which self-experimentation fits into the landscape of scientific methods.

My comments

I liked the paper. I followed the usual strategy with discussion papers and read the commentary and the response first. This was all interesting, but then when I went back to read the paper I was really impressed, first by all the data (over 50 (that's right, 50) scatterplots of different data he had gathered), and second by the discussion and interpretation of his findings in the context of the literature in psychology, biology, and medicine.

The article has as much information as is in many books, and it could easily be expanded into a book ("Self-experimentation as a Way of Life"?). Anyway, reading the article and discussions led me to a few thoughts which maybe Seth or someone else could answer.

First, Seth's 10 experiments were pretty cool. But they took ten years to do. It seems that little happened for the first five years or so, but then there were some big successes. It would be helpful to know if he started doing something in last five years that made his methods more effective. If someone else wants to start self-experimenting, is there a way to skip over those five slow years?

Second, his results on depression and weight control, if they turn out to generalize to many others, are huge. What's the next step? Might there be a justification for relatively large controlled studies (for example, on 100 or 200 volunteers, randomly assigned to different treatments)? Even if the treatments are not yet perfected, I'd think that a successful controlled trial would be a big convincer which could lead to greater happiness for many people.

Third, as some of the commenters pointed out, good self-experimentation includes manipulations (that is, experimentation) but also careful and dense measurements--"self-surveillance". If I were to start self-experimentation, I might start with self-surveillance, partly because the results of passive measurements might themselves suggest ideas. All of us do some self-experimentation now and then (trying different diets, exercise regimens, work strategies, and so on). Where I suspect that we fall short is in the discipline of regular measurements for a long enough period of time.

Finally, what does this all say about how we should do science? How can self-experimentation and related semi-formal methods of scientific inquiry be integrated into the larger scientific enterprise? What is the point where researchers should jump to a larger controlled trial? Seth talks about the benefits of proceeding slowly and learning in detail, but if you have an idea that something might really work, there are benefits in learning more about it sooner.

P.S. Some of Seth's follow-up studies on volunteers are described here (for some reason, this document is not linked to from Seth's webpage, but it's referred to in his Behavioral and Brain Sciences article).

Posted by Andrew at 12:35 AM | Comments (11) | TrackBack

February 17, 2005

Bayesian modeling for kidney filtering

Chris Schmid (statistics, New England Medical Center) writes:

We're trying to make a prediction equation for GFR, which is the rate at which the kidney filters stuff out. It depends on a bunch of factors like age, sex, and race, and lab values like the serum creatinine level. We have a bunch of databases in which these things are measured and know that the equation depends on factors such as presence of diabetes, renal transplantation, and the like. Physiologically, the level of creatinine depends on the GFR, but we can measure creatinine more easily than GFR, so we want the inverse prediction. Two complicating factors are measurement error in creatinine and GFR, as well as the possibility that the doctor may have some insight into the patient's condition that may not be available in the database. We have been proceeding along the lines of linear regression, but I suggested that a Bayesian approach might be able to handle the measurement error and the prior information. I'm attaching some notes I wrote up on the problem.

So, we have a development dataset to determine a model, a validation set to test it on, and then new patients for whom the GFR would need to be predicted, as well as some missing data on potentially important variables. What I am not clear about is how to use a prior for the prediction model if this uses information not available in the dataset. So we'd develop a Bayesian scheme for estimating the posteriors of the regression coefficients and the true unknown lab values, but would then need to apply it to single individuals with a measure of creatinine and some covariates. The prior on the regression parameters would come from the posterior of the data analysis, but wouldn't the doctor's intuitive sense of the GFR level need to be incorporated also? And since it's not in the development dataset, how would that be done? It seems to me that you'd need a different model for the prediction than for the data analysis. Or is it that you want to use the data analysis to develop good priors to use in a new model?

A Bayesian approach would definitely be the natural way to handle the measurement error. I would think that substantive prior information (such as doctor's predictions) could be handled in some way as regression predictors, rather than directly as prior distributions. Then the data would be able to assess, and automatically calibrate, the relevance of these predictors for the observed data (the "training set" for the predictive model).
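
To make that concrete, here is a minimal sketch of a measurement-error regression in PyMC. This is not the model Chris's group is fitting: the variable names, the noise levels, and the synthetic data are all hypothetical, and a real model would include the other predictors (age, sex, race, diabetes, transplant status) and the missing-data structure. The point is only the mechanics: both the lab creatinine and the gold-standard GFR measurement get explicit error terms, and the regression connects the latent "true" values.

import numpy as np
import pymc as pm

# simulate fake data: latent true values plus measurement error
rng = np.random.default_rng(0)
n = 200
age = rng.normal(60, 15, n)
true_log_creat = rng.normal(0.0, 0.3, n)
true_log_gfr = 4.5 - 0.9 * true_log_creat - 0.005 * age + rng.normal(0, 0.1, n)
obs_log_creat = true_log_creat + rng.normal(0, 0.1, n)   # lab error in creatinine
obs_log_gfr = true_log_gfr + rng.normal(0, 0.15, n)      # error in the GFR measurement

with pm.Model() as model:
    # regression coefficients
    a = pm.Normal("a", 0, 5)
    b_creat = pm.Normal("b_creat", 0, 5)
    b_age = pm.Normal("b_age", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)

    # latent true creatinine, observed with error
    # (measurement-error standard deviations treated as known, for simplicity)
    latent_creat = pm.Normal("latent_creat", 0, 1, shape=n)
    pm.Normal("creat_obs", mu=latent_creat, sigma=0.1, observed=obs_log_creat)

    # latent true GFR depends on the latent creatinine and the covariates
    mu_gfr = a + b_creat * latent_creat + b_age * age
    latent_gfr = pm.Normal("latent_gfr", mu=mu_gfr, sigma=sigma, shape=n)
    pm.Normal("gfr_obs", mu=latent_gfr, sigma=0.15, observed=obs_log_gfr)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)

A doctor's informal prediction for a patient could then enter as one more column in mu_gfr, and its fitted coefficient would tell you how much weight the data say that prediction deserves, which seems cleaner than trying to encode it directly as a prior distribution.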

Any other thoughts?

Posted by Andrew at 10:08 AM | Comments (0)

February 15, 2005

Reducing arsenic exposure in Bangladesh

Well diggers in Araihazar, Bangladesh, will soon be able to take advantage of a cell phone-based data system, developed at the Earth Institute, to target safe groundwater aquifers for installing new wells that are not tainted with arsenic. Using a new needle sampler (also developed at the Earth Institute), they will also be able to test whether the water is safe during drilling and before a well is actually installed . . .

(see also here)

Posted by Andrew at 7:39 AM | Comments (0)

January 24, 2005

Estimating the probability of events that have never occurred

Jim Hammitt (director of the Harvard Center for Risk Analysis) had a question/comment about my paper, Estimating the probability of events that have never occurred: when is your vote decisive? (written with Gary King and John Boscardin, published in the Journal of the American Statistical Association).

The paper focused on the problem of estimating the probability that your single vote could be decisive in a Presidential election. There have been only 50-some elections, and so this probability can't simply be estimated empirically. But, on the other side, political scientists and economists had a history of estimating this sort of probability purely theoretically, using models such as the binomial distribution. These theoretical models didn't give sensible answers either.

In our paper we recommended a hybrid approach, using a theoretical model to structure the problem but using empirical data to estimate some of the key components. We suggest that this is potentially a general approach: estimate the probability of very rare events by empirically estimating the probability of more common "precursor" events, then using a model to go from the probability of the precursor to the probability of the event in question.

But Jim is skeptical. He writes:

This paper had a number of provocative comments which I'm not sure I agree with. I was especially interested in your (that's a plural you) comments about the merits of statistical models vs. logical (data-free?) models. I guess the discussion was really in the context of the problem you address there, where it appears that the logical models that had been used weren't really that logical or well thought out. In a broader context, I'm interested in use of data vs. theory and logic, especially when data are limited (e.g., we can't observe very low probabilities of harm in a modest-sized sample). If you are just saying that some data beat no data, that doesn't seem very earth-shaking. Another comment I wondered about was the implication that it is hard to estimate the probability of an event that hasn't happened (I think that was in the title, even). If we have a well-understood process (like multiple Bernoulli trials with unknown probability of success), then zero successes is conceptually no different than some positive number of successes - i.e., we can estimate confidence intervals for p, construct a posterior given a prior, etc.

My answer is that I'd definitely prefer a method that allows data to enter somewhere. If the number of counts is zero, then one can't really get a good confidence interval without some prior information. For example, zero ties out of 50 Presidential elections--so the simple Bayes estimate, with a uniform prior, is 1/52? In this case, I'd rather put in other information by modeling precursors (e.g., the probability that a state is within 10,000 votes of a tie) than treat it as a binomial problem with some sort of default prior distribution.
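
Just to spell out that arithmetic, here is a tiny check in scipy; the uniform prior is exactly the kind of default input I'd rather replace with precursor information.

from scipy import stats

# 0 ties observed in 50 elections, uniform Beta(1,1) prior
n, k = 50, 0
posterior = stats.beta(k + 1, n - k + 1)      # Beta(1, 51)
print(posterior.mean())                       # 1/52, about 0.019
print(posterior.ppf([0.025, 0.975]))          # roughly (0.0005, 0.07)

The resulting interval spans more than two orders of magnitude and is driven almost entirely by the prior, which is the sense in which zero counts alone don't give a useful answer.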

Similarly with environmental hazards--I'd assume that it could be possible to get empirical estimates of the probabilities of various "near-miss" events, and then you'd have to resort to mathematical modeling to extrapolate to the probabilities of the really rare events.

Does this make sense?

Posted by Andrew at 6:27 PM | Comments (0)

January 7, 2005

Could propensity score analysis fix the Harvard Nurses study?

A well-publicized example of problems with observational studies is hormone replacement therapy and heart attack risks for postmenopausal women. In brief, the observational study gave misleading answers because the "treatment" and "control" groups differed systematically. Could the method of propensity scores have found (and solved) the problem?

Hormone replacement therapy and heart attacks

The evidence from the Women's Health Initiative, a randomized clinical trial from the 1990s, is that hormone replacement therapy increases the risk of heart attack in older women. (Here's a summary from the American College of Obstetricians and Gynecologists, which I found at the entry for Hormone Replacement Therapy at the National Library of Medicine).

Confusion from the observational study

The above findings surprised people because observational evidence from the Harvard Nurses Health Study found that women who used hormone replacement therapy had a lower risk of heart attacks. For example, in an article in the Harvard Health Letter from October, 1997:

The latest report from the Nurses' Health Study speaks to some of those issues. In this ongoing investigation, begun in 1976, the researchers examined the impact of long-term HRT use in more than 60,000 nurses. They looked at length and continuity of hormone use and how this affected the women's death rates. They also studied women who used estrogen alone or in an estrogen/progesterone combination and adjusted their data to account for smoking, weight, exercise, and other lifestyle habits.

Overall, the researchers found that the death rate among HRT users was 37% lower than that of women who had never taken hormones, primarily because the hormones appeared to protect women against heart disease. Indeed, the risk of dying of cardiovascular disease was 53% lower in the HRT group.

But since then, attitudes have changed. For example, from the NIH:

Do not use estrogen plus progestin therapy to prevent heart disease. The new findings show that it doesn't work. In fact, the therapy increases the chance of a heart attack or stroke. And it increases the risk of breast cancer and blood clots.

The Nurses Health Study seems to have struck out on that one! The women who took HRT were apparently quite a bit different, on average, from those who didn't--even after "controlling" for background variables.

But is it possible that, if the data from the Nurses study had been analyzed using propensity scores (see here for a description of the method), more reasonable claims would have been made from the beginning?
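
Here is a minimal sketch of what that would look like, on synthetic data rather than the Nurses data (which I don't have at hand). The covariates, the degree of confounding, and the zero true effect are all made up; a real analysis would also check covariate balance, use better matching, and so on.

import numpy as np
from sklearn.linear_model import LogisticRegression

# fake world: healthier, higher-SES women are more likely to take HRT,
# and the true effect of HRT on the outcome is zero
rng = np.random.default_rng(1)
n = 5000
ses = rng.normal(0, 1, n)
health = rng.normal(0, 1, n)
p_treat = 1 / (1 + np.exp(-(0.8 * ses + 0.8 * health)))
hrt = rng.binomial(1, p_treat)
outcome = 0.5 * ses + 0.5 * health + rng.normal(0, 1, n)

# naive comparison is badly biased by the confounding
print("naive difference:", outcome[hrt == 1].mean() - outcome[hrt == 0].mean())

# estimate propensity scores, then match each treated woman to the
# control with the closest estimated propensity
X = np.column_stack([ses, health])
pscore = LogisticRegression().fit(X, hrt).predict_proba(X)[:, 1]
treated = np.where(hrt == 1)[0]
controls = np.where(hrt == 0)[0]
nearest = np.abs(pscore[treated][:, None] - pscore[controls][None, :]).argmin(axis=1)
print("matched difference:", (outcome[treated] - outcome[controls[nearest]]).mean())

The naive comparison shows a substantial difference even though the true effect is zero; matching on the estimated propensity score removes most of it, but only because the relevant confounders are actually in the propensity model, which is exactly what is in doubt with the Nurses data.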

Meanwhile

The Nurses Study continues to operate and make headlines, so this is still a live issue.

P.S.
I was writing about matching and propensity scores because these are the adjustment methods most familiar to me. The question could equally be asked about other methods, such as g-estimation (see the comments of Jamie Robins at this 2003 meeting).

Posted by Andrew at 9:08 PM | Comments (2) | TrackBack

January 6, 2005

CrashStat

Accident statistics are a standard example for teaching count data. Some fascinating collections of pedestrian and bicycle accident data in New York City are available at www.crashstat.org. These include detailed maps (covering all intersections in New York City) as well as breakdowns by zip code. Lots of count data!

Transportation Alternatives has a press release about the maps. From a statistical standpoint, it would be interesting to study the potential effectiveness of various traffic calming ideas.
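
As a toy version of that kind of analysis, here is a sketch of a Poisson regression for intersection-level crash counts. Everything below is simulated, including the traffic-calming indicator and its effect, since the crashstat.org data would first need to be downloaded and cleaned.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
traffic = rng.normal(0, 1, n)          # standardized log traffic volume (made up)
calming = rng.binomial(1, 0.3, n)      # hypothetical traffic-calming indicator
crashes = rng.poisson(np.exp(1.0 + 0.6 * traffic - 0.4 * calming))

X = sm.add_constant(np.column_stack([traffic, calming]))
fit = sm.GLM(crashes, X, family=sm.families.Poisson()).fit()
print(fit.summary())                   # the calming coefficient should come out near -0.4

With the real data one would also want an exposure offset (pedestrian volume, say) and probably overdispersion or spatial structure, but the count-data skeleton is this simple.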

Posted by Andrew at 7:47 PM | Comments (0)

January 5, 2005

Twins

Curious about the latest statistics on twins (I had heard that they have become more frequent with modern fertility treatments), I did a quick Google search.

#1 was for the Minnesota Twins, but #2 was for Twins magazine. I clicked through to the magazine's "Facts and Stats" section, which indeed confirms that the twin birth rate in the U.S. (as of 2002) was 1 in 32 babies, that is, about 1 in 64 births, quite a bit higher than the historical rate of 1 in 80 births.

But what really got me were its fun facts. They list 10 famous twins, which include Elvis Presley (of course), Ann Landers, John Elway, and 7 other people who really aren't so famous. My guess of the least-famous of these is "Deirdre Hall, actress, Days of our Lives." I mean, if twins really represent 1/40-th of the population (and they do), can't they get 10 more famous people than this? Even Ann Landers really isn't so famous as all that. And John Elway is a pretty impressive guy, but I certainly can't believe he's one of the 40 most famous athletes of all time. (He's not in the Sports 100, for example.)

They also list some "Famous parents of twins," a list that includes many truly famous people: George W. Bush (of course), Bing Crosby, Pele, William Shakespeare, James Stewart, Robert De Niro, and Margaret Thatcher. A much more famous list throughout.

It's interesting that their sample of famous twins (who represent roughly 1/40 of the population) is so much lamer than their sample of parents of twins (who presumably represent a very similar fraction of the population). Quick calculation: suppose the average person has 2 kids and each birth has a 1/80 chance of being twins. Then each person has roughly a 1/40 chance of being a parent of twins. Or, put another way: each pair of twins has 2 parents, so there will be roughly the same number of twins as parents of twins.
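
Here is that quick calculation as a two-line check, using the assumptions above (2 kids per person and a 1/80 chance that any given birth is twins):

p_twin_birth = 1 / 80
print(1 - (1 - p_twin_birth) ** 2)     # about 0.025, i.e., roughly 1 in 40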

One might first attribute this to fertility treatments, causing a disproportionate number of modern celebrities to have babies, and twins, at advanced ages. But this wouldn't explain George Bush, Bing Crosby, etc. My guess is that children of celebrities are more publicized than siblings of celebrities. Also, Twins magazine is clearly aimed at parents of twins, not twins themselves. But still I'd like to think that they could do better than mid-list actors...

Posted by Andrew at 8:50 PM | Comments (2)

December 23, 2004

Family demography and public policy seminar

Here's the spring 2005 schedule for the Family Demography and Public Policy Seminar at Columbia's School of Social Work. Lots of interesting stuff, it looks like:

Jan. 25 Irv Garfinkel, Mitchell I. Ginsberg Professor of Contemporary Urban Problems, and Lenna Nepomnyaschy, Post-Doctoral Researcher. “The Relationship between Formal and Informal Child Support.”

**Feb. 3 (THURSDAY) Asher Ben-Arieh, Lecturer, Paul Baerwald School of Social Work at the Hebrew University of Jerusalem, and Visiting Professor, Institute of Family and Neighborhood Life, Clemson University. “Where are the Children? Children’s Role in Measuring and Monitoring their Well-Being.”

Feb. 8 Clive Belfield, Assistant Professor of Economics, Queen's College, CUNY. “The High/Scope Perry Preschool Program: Cost-Benefit Analysis Using Data from the Age-40 Follow-Up.”

Feb. 15 Julien Teitler, Assistant Professor, CUSSW. Topic: TBA

Feb. 22 Ronald Mincy, Maurice V. Russell Professor of Social Policy and Social Work Practice; Jennifer Hill, Assistant Professor, School of International and Public Affairs; and Marilyn Sinkewicz, Doctoral Student, CUSSW. “Productivity and Marital Status: Extending the Civilizing Hypothesis with Multiple Imputation.”

March 1 Jeanne Brooks-Gunn, Virginia and Leonard Marx Professor of Child Development and Education, Teachers College, with co-authors Holly Foster, Yange Xue, and Tama Leventhal. “Links between Neighborhood Processes and Adolescent Behaviors.”

March 8 Tom DiPrete, Professor of Sociology, CU. “Do Cross-National Differences in the Costs of Children Generate Cross-National Differences in Fertility Behavior?”

March 15 SPRING BREAK – NO SEMINAR

March 22 Robert Crosnoe, Assistant Professor of Sociology, University of Texas at Austin. “Standing Out, Fitting In: Schools as Contexts of Human Development.”

March 29 Derrick Hamilton, Assistant Professor, Milano Graduate School of Management and Urban Policy, New School University. “From Dark to Light: Skin Color and Wages among African-Americans.”

April 5 Jeffrey Fagan, Professor of Law and Public Health, CU. “Incarceration Effects on Voter Participation in New York City, 1985-96,” with co-author Valerie West, Research Scientist, School of Law, CU.

April 12 Cordelia Reimers, Professor of Economics, Hunter College, CUNY. “The Impact of 9/11 on Low-Skilled, Minority, and Immigrant Workers in New York City.”

April 19 Brendan O’Flaherty, Associate Professor of Economics, CU. “New York City Homeless Shelter Dynamics.”

April 26 Matthew Neidell, Assistant Professor of Health Policy and Management, Mailman School of Public Health. Topic: TBA

May 3 Henry Levin, William Heard Kilpatrick Professor of Economics and Education, Teachers College. “The Cost to Society of Inadequate Education: A Benefit-Cost Approach.”

May 10 Hiro Yoshikawa, Associate Professor of Psychology and Public Policy, New York University. “Five-Year Effects of an Anti-Poverty Program on Entry into Marriage among Never-Married Mothers: Exploring Economic and Psychological Mediators.”

May 17 William Rodgers, Professor of Economics, Edward J. Bloustein School of Planning and Public Policy, Rutgers University. Topic: TBA

Posted by Andrew at 1:25 PM | Comments (0)

December 17, 2004

Radon webpage is back up and running

Our radon risk page (created jointly with Phil Price of the Indoor Environment Division, Lawrence Berkeley National Laboratory) is fully functional again.

You can now go over to the map, click on your state and then your county, give information about your house, give your risk tolerance (or use the default value), and get a picture of the distribution of radon levels in houses like yours. You also get an estimate of the dollar costs and lives saved from four different decision options along with a decision recommendation. (Here's an example of the output.)

We estimate that if all homeowners in the U.S. followed the instructions on this page, there would be a net savings of about $10 billion (with no additional loss of life) compared to what would happen if everybody followed the EPA's recommendation.

Posted by Andrew at 7:23 AM | Comments (0)

November 23, 2004

Arsenic in Bangladesh; sharing wells

Many of the wells used for drinking water in Bangladesh and other South Asian countries are contaminated with natural arsenic, affecting an estimated 100 million people. Arsenic is a cumulative poison, and exposure increases the risk of cancer and other diseases.

Is my well safe?

One of the challenges of reducing arsenic exposure is that there's no easy way to tell if your well is safe. Kits for measuring arsenic levels exist (and the evidence is that arsenic levels are stable over time in any given well), but we and other groups are just beginning to make these kits widely available locally.

Suppose your neighbor's well is low in arsenic. Does this mean that you can relax? Not necessarily. Below is a map of arsenic levels in all the wells in a small area (see the scale of the axes) in Araihazar upazila in Bangladesh:

figure1A.png

Blue and green dots are the safest wells, yellow and orange exceed the Bangladesh standard of 50 micrograms per liter, and red and black indicate the highest levels of arsenic.

Bad news: dangerous wells are near safe wells

As you can see, even if your neighbor has a blue or green well, you're not necessarily safe. (The wells are located where people live. The empty areas between the wells are mostly cropland.) Safe and dangerous wells are intermingled.

Good news: safe wells are near dangerous wells

There is an upside, though: if you currently use a dangerous well, you are probably close to a safe well. The following histogram shows the distribution of distances to the nearest safe well, for the people in the map above who currently (actually, as of 2 years ago) have wells that are yellow, orange, red, or black:

figure5bA.png

Switching and sharing

So if you are told where that safe well is, maybe you can ask your neighbor who owns that well to share. In fact, a study by Alex Pfaff, Lex van Geen, and others has found that people really do switch wells when they are told that their well is unsafe. We're currently working on a cell-phone-based communication system to allow people in Bangladesh to get some of this information locally.

General implications for decision analysis

This is an interesting example for decision analysis because decisions must be made locally, and the effectiveness of various decision strategies can be estimated using direct manipulation of data, bypassing formal statistical analysis.
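
Here is a sketch of that kind of direct manipulation, computing each unsafe well's distance to the nearest safe well, on entirely simulated coordinates and arsenic levels; only the 50 micrograms-per-liter cutoff comes from the Bangladesh standard mentioned above.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
n = 3000
xy = rng.uniform(0, 5000, size=(n, 2))                 # well locations in meters (made up)
arsenic = rng.lognormal(mean=3.5, sigma=1.2, size=n)   # micrograms per liter (made up)

safe = arsenic <= 50
tree = cKDTree(xy[safe])                  # spatial index of the safe wells
dist, _ = tree.query(xy[~safe])           # nearest safe well for each unsafe well
print(np.median(dist), np.percentile(dist, 90))

With the real survey data, the same few lines produce the histogram above, and a policy such as "everyone switches to the nearest safe well" can be evaluated by simply recomputing exposures under that rule, with no model fitting required.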

Other details

Things are really more complicated than this, because the depth of the well is an important predictor (different depth ranges are "safe zones" in different areas), and people are busy drilling new wells as well as using and measuring existing ones. Some more details are in our papers in Risk Analysis and Environmental Science & Technology.

Posted by Andrew at 12:47 PM | Comments (0)

November 16, 2004

Institutional decision analysis

The term "decision analysis" has multiple meanings in Bayesian statistics. When we use the term here, we are not talking about problems of parameter estimation, squared error loss, etc. Rather, we use "decision analysis" to refer to the solution of particular decision problems (such as in medicine, public health, or business) by averaging over uncertainties as estimated from a probability model. (See here for an example.)

That said, decision analysis has fundamental difficulties, most notably that it requires one to set up a utility function, which on one hand can be said to represent subjective feelings but on the other hand is presumably solid enough that it is worth using as the basis for potentially elaborate calculations.

From a foundational perspective, this problem can be resolved using the concept of institutional decision analysis.

Personal vs. institutional decision analysis

Statistical inference has an ambiguous role in decision making. Under a "subjective" view of probability (which I do not generally find useful; see Chapter 1 of Bayesian Data Analysis), posterior inferences represent the personal beliefs of the analyst, given his or her prior information and data. These can then be combined with a subjective utility function and input into a decision tree to determine the optimal decision, or sequence of decisions, so as to maximize subjective expected utility. This approach has serious drawbacks as a procedure for personal decision making, however. It can be more difficult to define a utility function and subjective probabilities than to simply choose the most appealing decision. The formal decision-making procedure has an element of circular reasoning, in that one can typically come to any desired decision by appropriately setting the subjective inputs to the analysis.

In practice, then, personal decision analysis is most useful when the inputs (utilities and probabilities) are well defined. For example, in a decision problem of the costs and benefits of screening for cancer, the utility function is noncontroversial--years of life, with a slight adjustment for quality of life--and the relevant probabilities are estimated from the medical literature. Bayesian decision analysis then serves as a mathematical tool for calculating the expected value of the information that would come from the screening.
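
As an illustration, here is a minimal sketch of the expected value of the screening information, with entirely hypothetical numbers; the prevalence, test accuracy, and utilities below are made up, and only the structure (average the best achievable utility over the possible test results) follows the description above.

prevalence = 0.01                            # hypothetical disease prevalence
sensitivity, specificity = 0.9, 0.95         # hypothetical test accuracy
u_treat_sick, u_treat_healthy = 20.0, -1.0   # utilities in quality-adjusted life-years
u_skip_sick, u_skip_healthy = 0.0, 0.0

def expected_utility(p_sick, treat):
    if treat:
        return p_sick * u_treat_sick + (1 - p_sick) * u_treat_healthy
    return p_sick * u_skip_sick + (1 - p_sick) * u_skip_healthy

# best decision without the test
eu_no_test = max(expected_utility(prevalence, True), expected_utility(prevalence, False))

# with the test: average the best achievable utility over the two possible results
p_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
p_sick_pos = prevalence * sensitivity / p_pos
p_sick_neg = prevalence * (1 - sensitivity) / (1 - p_pos)
eu_test = (p_pos * max(expected_utility(p_sick_pos, True), expected_utility(p_sick_pos, False))
           + (1 - p_pos) * max(expected_utility(p_sick_neg, True), expected_utility(p_sick_neg, False)))

print("expected value of the screening information:", eu_test - eu_no_test)

Any cost of the test itself would then be subtracted from this expected gain; under these assumptions, screening is worthwhile only if the information is worth more than the test costs.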

In institutional settings--for example, businesses, governments, or research organizations--decisions need to be justified, and formal decision analysis has a role to play in clarifying the relation between the assumptions required to build and apply a relevant probability model and the resulting estimates of costs and benefits. We introduce the term "institutional decision analysis" to refer to the process of transparently setting up a probability model, utility function, and an inferential framework leading to cost estimates and decision recommendations. Depending on the institutional setting, the decision analysis can be formalized to different extents.

In general, there are many ways in which statistical inferences can be used to inform decision-making. The essence of the "objective" or "institutional" Bayesian approach is to clearly identify the model assumptions and data used to form the inferences, evaluate the reasonableness and the fit of the model's predictions (which include decision recommendations as a special case), and then expand the model as appropriate to be more realistic. The most useful model expansions are typically those that allow more information to be incorporated into the inferences.

Further discussion and several examples appear in Chapter 22 of "Bayesian Data Analysis."

Posted by Andrew at 12:51 PM | Comments (2)

November 12, 2004

Expedient Methods in Environmental Indexing

We call an Environmental Index an agglomeration of data compiled to provide a relative measure of environmental conditions. Environmental data are often sparse or missing non-randomly; many concepts, such as environmental risk or sustainability, are still being defined; and indexers must balance modeling sophistication against modeling facility and model interpretability. We review our approaches to these constraints in the construction of the 2002 ESI and the UN Development Programme risk report.

This presentation, delivered at INFORMS 2004, is a sketch of some work completed at CIESIN from 2001-2004, where I spent two years as a gra. A paper has been submitted on diagnostics for the multiple imputation used in the ESI, and I hope to write up a paper on the Bayesian network aggregation used in the risk index. I'm talking Dec. 9th.

Posted by at 1:39 PM | Comments (0)