Results matching “R”

3-judge panels

John writes,

I'm starting to work on a paper empirically modeling dissents on appellate panels, and was hoping you could help me with the proper way to model them.

Here's what we can observe: 3 judges decide a case, 1 writes an opinion, and the other 2 either go along with that opinion or write a dissent (about 10% of the time or less). In reality, the likelihood is that 2 of the 3 judges form a majority coalition, and then the 3rd judge decides whether to go along with them or not. However, we can't observe the formation of this coalition, except in the (rare) cases where there is a dissent.

Here's the problem: I want to model the individual judge's decision whether or not to dissent. The leading work in this area treats each of the 2 judges who don't write the opinion as separate observations, where 0 = no dissent and 1 = dissent, with standard errors clustered on cases. That is, they are essentially treating each judge's votes as independent (except in the error terms). But this strikes me as wrong because one judge's decision completely determines the other's, since you can't have 2 dissents in a case. I'm sure this issue must arise in other contexts (it's sort of like a conditional logit problem, but not really), and there must be ways to model this properly, but I can't think of anything concrete.

My quick thought is that you want some sort of latent-data model. But I'm not quite sure what latent data you want. As Rubin says, what would you do if you had all the data? (Or, in your case, what are "all the data" that you want?) You already know how all the judges voted, right? Perhaps the latent data are the order of voting--the formation of the majority coalition. If so, you can set up a "structural model" with probabilities of each possible coalition (including unanimous 3-0 coalitions, I assume), along with a "measurement model" of the actual votes given the latent data. Such a model can then be fit using Bayes (e.g., Bugs).
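To make the structural/measurement distinction concrete, here's a minimal simulation sketch in R of one version of such a model (my own illustration, not John's data or model); all the probabilities are made up.

set.seed(1)
n_cases <- 1000

# structural model: latent majority coalition for each case
# ("unanimous", or the case where judge 1, 2, or 3 is left out of the majority)
p_coalition <- c(unanimous = 0.70, out1 = 0.10, out2 = 0.10, out3 = 0.10)
coalition <- sample(names(p_coalition), n_cases, replace = TRUE, prob = p_coalition)

# measurement model: the left-out judge dissents with some probability;
# otherwise all we observe is a unanimous 3-0 vote
p_dissent_given_out <- 0.5
dissent <- ifelse(coalition == "unanimous", 0, rbinom(n_cases, 1, p_dissent_given_out))
mean(dissent)  # observed dissent rate, about 0.15 with these made-up numbers

Fitting the model means going in the other direction: treating the coalition as missing data and estimating the coalition and dissent probabilities (possibly as functions of judge and case characteristics) from the observed votes, for example in Bugs.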

The Prosecutor’s Fallacy

John sent in this interesting discussion of conditional probability calculations in court. Here's the article:

The estimate, not the prior

In a comment to this entry on using (y+1)/(n+2) instead of y/n, Aleks writes, "Instead of debating estimates, why not debate priors?" My quick answer to this is, yeah, sure, but there can be information in "n" also. Which formally means that the prior distribution for theta will depend on n, but sometimes this can be more conveniently considered in terms of the properties of estimates.

To look at it another way, my entry was linked to here by "RPM" and elicited the following comment from Jan Moren:

So the chance of me having a three-way with Madeleine Albright and a 17-tentacled alien from Arcturus humming show tunes on the roof of Sogo department store in downtown Osaka is 0.5? Sweet!

This isn't right, though, since there's no "y" or "n" in this Madeleine Albright scenario. Jan is commenting on the implied prior distribution, not on the estimate.

Janez Demsar has shown me a chart of machine learning conferences and their similarity. To compute the similarity between the conferences, he used the Jaccard distance, based on the proportion of authors that publish in both venues (the intersection of the two venues' author sets) versus those that publish in either (the union of the author sets). He then used multidimensional scaling to embed the points in 2-D space. Line thickness indicates proximity; red nodes are journals, blue are conferences. He acquired the data from DBLP.

[Figure: mds-graph.png — MDS map of machine learning venues]

We can see data mining (KDD/PKDD) towards the bottom, machine learning (ICML/ECML/ML/JMLR) in the middle largely separating the two, and AI on top. To the left there are specialized areas, such as neural networks (ICANN/NN) or medical applications (ARTMED/AIME). Do not, however, interpret these areas as marginal: it's just that the lens was centered on the highly connected conferences to the right of the diagram.
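For readers who want to try this at home, here's a minimal sketch in R of the construction described above, with tiny made-up author lists standing in for the DBLP data; I use classical MDS (cmdscale), which may differ from the exact variant Janez used.

venues <- list(
  ICML = c("a", "b", "c", "d"),
  KDD  = c("c", "d", "e"),
  AAAI = c("a", "f", "g")
)
jaccard_dist <- function(s1, s2)
  1 - length(intersect(s1, s2)) / length(union(s1, s2))

# pairwise Jaccard distances between venues, based on author sets
d <- outer(seq_along(venues), seq_along(venues),
           Vectorize(function(i, j) jaccard_dist(venues[[i]], venues[[j]])))
dimnames(d) <- list(names(venues), names(venues))

# embed in 2-D with classical multidimensional scaling and plot
coords <- cmdscale(as.dist(d), k = 2)
plot(coords, type = "n")
text(coords, labels = names(venues))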

There are a few challenges with analyzing such proximity data statistically. First: the authorship data should be controlled by year: long-running conferences will appear detached from the base. Second: when there is not much data, there is uncertainty in the similarity. For this we first need a probabilistic stress function (an uncertain distance can be stretched or shrunk more than a certain one). Finally, the nonconvexity of MDS can be remedied with good priors. One might also debate the pros and cons of using similarity functions on the original features, or whether to generate the original features directly from the latent variables.

Also see Map of Science and Scientometrics.

Campaign contributions

A colleague writes,

We've been looking at donations from 2004 to candidates and parties... data from the FEC (it's part of a larger project we're working on... the election is not ultimately interesting to us). Anyway, we noticed that the contributions from LA County represented 2.5% in terms of raw count and 3% in terms of value. This seemed small to me, but matches the population % for LA County (we're about 10 million people).

Do these 2.5% and 3% numbers sound right? I would have thought the metropolitan areas would have been higher.

Socially optimal districting?

Here's a paper by Stephen Coate and Brian Knight. The abstract:

This paper investigates the problem of optimal districting in the context of a simple model of legislative elections. In the model, districting matters because it determines the seat-vote curve, which describes the relationship between seats and votes. The paper first characterizes the optimal seat-vote curve, and shows that, under a weak condition, there exist districtings that generate this ideal relationship. The paper then develops an empirical methodology for computing seat-vote curves and measuring the welfare gains from implementing optimal districting. This is applied to analyze the districting plans used to elect U.S. state legislators during the 1990s.

This is clever, no doubt about it. I never would have thought there would be a way to come up with an optimal seats-votes curve based on maximizing welfare gain. On the other hand, I can't say I really believe it. It seems contrary to the basic idea of representative democracy to say that the optimal partisan bias is nonzero, and it seems too sensitive to the assumptions of the model. But here it is--your opinions may differ from mine. As I said, the paper is impressive as an intellectual endeavor!

Just a couple other comments:

Justin Gross writes:

Is there a way in lmer to specify a particular nonzero correlation between certain random effects? I'm working with a social relations type model (with logit link)--similar to models by Snijders and Kenny, and Hoff, et al. Among my random effects are sender and receiver effects, s_i and r_j, indicating that an observed dyadic relation is from actor i to actor j. But I would like to allow for the possibility that s_i and r_i are correlated, that is, actor i's "gregariousness" is not unrelated to his/her "popularity". lmer estimates the variance for sender and receiver effects, but if I wish to incorporate covariance between sending and receiving, do I need to just estimate this separately and adjust my estimates manually, or can this be accommodated within lmer?

My quick answer: yes, you can have correlations, no problem. It just gets structured in the same way as correlations in a varying-intercept, varying-slope model. In either case, you have two parameters for each group. Whether this can be done easily in lmer, I'm not sure. But you can do it in Bugs (as long as it runs in reasonable time and doesn't crash).
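To illustrate the analogy, here's a minimal sketch (simulated data, current lme4 syntax; not Justin's model) of how lmer/glmer estimates a correlation between two random effects that share a grouping factor, exactly as in a varying-intercept, varying-slope model:

library(lme4)
set.seed(1)
dat <- data.frame(group = gl(30, 20), x = rnorm(600))
a <- rnorm(30, 0, 1)      # group-level intercept shifts
b <- rnorm(30, 0, 0.5)    # group-level slope shifts
dat$y <- rbinom(600, 1, plogis(a[dat$group] + (0.5 + b[dat$group]) * dat$x))

# (1 + x | group) gives each group its own intercept and slope,
# with an estimated correlation between the two
fit <- glmer(y ~ x + (1 + x | group), family = binomial, data = dat)
VarCorr(fit)  # variances of the two effects and their correlation

Getting correlated sender and receiver effects for the same actor is harder, since each dyadic observation involves two actors at once; that's the part that may require Bugs (or hand-rolled code) rather than a one-line lmer formula.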

More hoops

Carl Bialik has followed up on the NBA bias study. (See here for my earlier comments.) There was some interesting back-and-forth, and the problem had the notable feature that different parties were performing different analyses on different datasets. My only added comment is that I mentioned this stuff to Carl Morris yesterday and he pointed out that it would be interesting to see which players were fouled against. For (almost) every foul there is a "foulee", and you'd think that racism would manifest itself most in fouls called on a white guy playing against a black player or on a black guy facing a white player.

(y+1)/(n+2) instead of y/n

Alan Agresti has written some papers motivating the (y+1)/(n+2) estimate, instead of the raw y/n estimate, for probabilities. (Here we're assuming n independent tries with y successes.)

The obvious problem with y/n is that it gives deterministic estimates (p=0 or 1) when y=0 or y=n. It's also tricky to compute standard errors at these extremes, since sqrt(p(1-p)/n) gives zero, which can't in general be right. The (y+1)/(n+2) formula is much cleaner. Agresti and his collaborators did lots of computations and simulations to show that, for a wide range of true probabilities, (y+1)/(n+2) is a better estimate, and the confidence intervals using this estimate have good coverage properties (generally better than the so-called exact test; see Section 3.3 of this paper for my fulminations against those misnamed "exact tests").

The only worry is . . .

The only place where (y+1)/(n+2) will go wrong is if n is small and the true probability is very close to 0 or 1. For example, if n=10 and p is 1 in a million, then y will almost certainly be zero, and an estimate of 1/12 is much worse than the simple 0/10.

However, I doubt that would happen much: if p might be 1 in a million, you're not going to estimate it with an n=10 experiment. For example, I'm not going to try ten 100-foot golf putts, miss all of them, and then estimate my probability of success as 1/12.
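Here's a quick simulation sketch in R (my own rough check, not a reproduction of Agresti's calculations) comparing the two point estimates and the coverage of a simple normal-approximation interval built around (y+1)/(n+2):

set.seed(1)
n <- 10
p_true <- 0.3
y <- rbinom(10000, n, p_true)

est_raw  <- y / n
est_plus <- (y + 1) / (n + 2)

mean((est_raw  - p_true)^2)  # mean squared error of y/n
mean((est_plus - p_true)^2)  # mean squared error of (y+1)/(n+2), smaller here

se_plus <- sqrt(est_plus * (1 - est_plus) / (n + 2))
covered <- (est_plus - 1.96 * se_plus <= p_true) & (p_true <= est_plus + 1.96 * se_plus)
mean(covered)  # coverage of the nominal 95% interval

Rerunning with p_true set to something like 1e-6 shows the failure mode described above: (y+1)/(n+2) sits at 1/12 almost every time.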

Conclusion

Yes, (y+1)/(n+2) is a better default estimate than y/n.

China's missing girls

China has more boy babies, compared to girls, than would be expected from the usual biological sex ratio. Monica Das Gupta has written a quick summary of her research explaining why she attributes this to preference for sons (resulting in differential rates of abortion and, possibly, infanticide, by sex).

[Figures: fig1_missing_women.gif and fig3_missing_women.gif — charts from Das Gupta's research]

Masanao discovered this interesting paper by Ralf Munnich and Susanne Rassler. From the abstract:

In this paper we [Munnich and Susanne Rassler] discuss imputation issues in large-scale data sets with different scaled variables laying special emphasis on binary variables. Since fitting a multivariate imputation model can be cumbersome, univariate specifications are proposed which are much easier to perform. The regression-switching or chained equations Gibbs sampler is proposed and possible theoretical shortcomings of this approach are addressed as well as data problems.

A simulation study is done based on the data of the German Microcensus, which is often used to analyse unemployment. Multiple imputation, raking, and calibration techniques are compared for estimating the number of unemployed in different settings. We find that the logistic multiple imputation routine for binary variables, in some settings, may lead to poor point as well as variance estimates. To overcome possible shortcomings of the logistic regression imputation, we derive a multiple imputation matching algorithm which turns out to work well.

This is important stuff. They refer to the packages Mice and Iveware, which inspired our new and improved (I hope) "mi" package, which is more flexible than these predecessors. Unfortunately with this flexibility comes the possibility of more problems, so it's good to see this sort of research. I like the paper a lot (except for the ugly color figures on pages 12-16!).
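For readers who haven't seen the chained-equations workflow, here's a minimal sketch using the mice package (one of the predecessors mentioned above) and its built-in nhanes example data; this just shows the mechanics, not anything from the Munnich and Rassler study.

library(mice)
imp  <- mice(nhanes, m = 5, printFlag = FALSE)  # 5 completed datasets via chained equations
fits <- with(imp, lm(bmi ~ age + hyp))          # fit the analysis model in each one
summary(pool(fits))                             # combine the estimates with Rubin's rules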

Weakly informative priors

Bayesians traditionally consider prior distributions that (a) represent the actual state of subject-matter knowledge, or (b) are completely or essentially noninformative. We consider an alternative strategy: choosing priors that convey some generally useful information but clearly less than we actually have for the particular problem under study. We give some examples, including the Cauchy (0, 2.5) prior distribution for logistic regression coefficients, and then briefly discuss the major unsolved problem in Bayesian inference: the construction of models that are structured enough to learn from data but weak enough to learn from data.

I'm speaking Monday on this at Jun Liu's workshop on Monte Carlo methods at Harvard (my talk is 9:45-10:30am Monday at 104 Harvard Hall).

Here's the presentation. I think this is potentially a huge advance in how we think about Bayesian models.
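As a concrete (and deliberately bare-bones) illustration of the Cauchy (0, 2.5) idea, here's a sketch in R that finds the posterior mode of a logistic regression with independent Cauchy priors on the coefficients, using made-up data; the full recommendation in the talk also involves rescaling the inputs, which I skip here.

# log posterior: logistic log-likelihood plus independent Cauchy(0, scale) log priors
log_post <- function(beta, X, y, scale = 2.5) {
  eta <- X %*% beta
  sum(y * eta - log(1 + exp(eta))) +
    sum(dcauchy(beta, location = 0, scale = scale, log = TRUE))
}

set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- rbinom(100, 1, plogis(-1 + 2 * dat$x))
X <- model.matrix(~ x, data = dat)

fit <- optim(rep(0, ncol(X)), log_post, X = X, y = dat$y,
             control = list(fnscale = -1), hessian = TRUE)
fit$par  # posterior-mode estimates of intercept and slope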

A message from Nebraska

Eric Aspengren writes,

I just discovered your blog via MyDD.com . . . I work as a political operative in Nebraska and I'm trying to give myself a crash course in statistical analysis vis-a-vis political science and campaigns. I do not have a math background and have had to pick most of this stuff up piecemeal as I go. This summer, though, after our local elections, I plan to dive into this field as much as possible. Your blog seems to be a very good resource for finding the latest research in the field.

I've been searching for something like this for some time.

Are there any specific online resources for good, basic, tutorials in this sort of analysis? Also, I have some training in GIS and wonder if you know of any good online resources concerning geostatistical analysis concerning vote prediction and GOTV. I would specifically like to find ways to maximize limited resources given certain demographic and other data of voters.

I have to say that I'm not, given my limited understanding, sold on Bayesian analysis. The promises of Bayesian analysis, though, are quite tantalizing. I think I can be sold on this if I can find some evidence of its application to elections and GOTV.

Cool--a political operative in Nebraska! It's encouraging to know that we are reaching people outside of academia! In answer to the question, online resources for statistics are not so good. I recommend my own books, with the newest (Data Analysis Using Regression and Multilevel/Hierarchical Models) being the more accessible. We don't actually do geostatistical analyses, though. Regarding get-out-the-vote efforts, you could read the papers of Alan Gerber and Don Green. There is some controversy about their findings; see here.

With Antti Pajala, we have built a database of roll-call votes in Finland covering roughly the last decade and a half. One of the things we focused on was the cohesion of the Eduskunta (the Finnish parliament). Perhaps the most interesting thing we found was that if the cohesion of the cabinet (and sometimes also of the opposition) dropped in the year preceding an election, the cabinet would be replaced with a different group of parties.

[Figure: finnish-cohesion2.png — cohesion of the Finnish cabinet and opposition over time]

Notice the drops when the cabinet composition changed in 1995, 2002 and 2006. There was no change in 1999, and no change in cohesion either: so our approach predicts correctly every time. My causal theory is that as MPs start disliking each other, they are no longer as motivated to seek consensus, and the dislike also shows up in the next election; but Piero says that the dislike is instead caused by the cabinet losing popularity with the public, followed by the emergence of competing factions.

Our working paper (written only in Finnish, however) is here. It would be interesting to see whether a similar phenomenon shows up in other parliaments. It's always cool to be able to forecast elections a year in advance.

Names really do make a difference

Aleks pointed me to this news report by Anushka Asthana:

Parents are being warned to think long and hard when choosing names for their babies as research has discovered that girls who are given very feminine names, such as Anna, Emma or Elizabeth, are less likely to study maths or physics after the age of 16, a remarkable study has found.

Both subjects, which are traditionally seen as predominantly male, are far more popular among girls with names such as Abigail, Lauren and Ashley, which have been judged as less feminine in a linguistic test. The effect is so strong that parents can set twin daughters off on completely different career paths simply by calling them Isabella and Alex, names at either end of the spectrum. A study of 1,000 pairs of sisters in the US found that Alex was twice as likely as her twin to take maths or science at a higher level.

Will Wilkinson points to an interesting article by Nicholas Eberstadt (and adds some comments of his own) on the topic of the high birth rates in the United States compared to Europe. Wilkinson attributes the difference to Americans' higher average rates of reported happiness and, regarding government policy, cites Shelly Lundberg and Robert Pollak to suggest that birth rates could be raised via policies leading to lower unemployment for young adults. I know there have been some studies of the relation between local economic conditions and birthrates, but I can't remember the findings. I seem to recall some interactions, with different patterns among different ethnic groups.

The business of unemployment and children is interesting, since from an abstract perspective I suppose that lower unemployment is a good thing, but so are lower birthrates (at least in the U.S., where the population is growing via immigration anyway). And of course if people are unemployed, presumably they have more time to take care of the children. Maybe "unemployment" isn't quite the right measure here.

To continue with the economic argument . . . Wilkinson writes, "I like the optimism explanation. It's easy to see why folks would refrain from reproduction if they thought their kids had only a broiling, denuded planet full of wretched consumer-zombies living pointless lives in cookie-cutter McMansions and soulless big box strip malls to look forward to." This isn't quite right, I think: McMansions are good things to have--I think that pessimism is thinking you'll live in a bad neighborhood, not that you'll live in a McMansion.

What I really meant to say was . . .

Anyway, the real reason I brought this up was not to talk about happiness and birth rates (on which I'm no expert) but to discuss the challenges of the "why" sort of causal inference. It's a basic mode of science (and of social science): we see stylized fact X (in this case, higher birth rates in the U.S. than in Europe) and then try to make various comparisons to figure out the causes of X.

But Rubin has taught us to look for the effects of causes, not the causes of effects. A similar problem arose in our Department of Health study where we were trying to understand the different rates of rodent infestation comparing whites, blacks, and Hispanics in NYC. Even after controlling for some available information such as the neighborhood, the quality of the building, the floor of the apartment, etc., there were more rodents in the apartments of ethnic minorities. We'd like to "explain"--understand--this pattern, but this sort of reasoning doesn't fit directly into the statistical framework of causal inference. One approach is to reframe things in terms of potential interventions (as I've done above with the birthrate example by imagining policies that lower unemployment). But that doesn't seem to completely get at Wilkinson's question about happiness.

Stories about the qualifying exam

Yves told me he's doing his qualifying exam, which brought back memories from 20 years ago. At one point we had a fire alarm. Before exiting the building, I went into the office and took out all the booklets of exams I'd been working on--our exam was a 2-week-long take-home. Just on the off chance there actually was a fire, I didn't want my papers to burn up!

Among other things, I learned logistic regression from a problem of Bernie Rosner, and I got stuck on a very hard, but simple-looking, problem from Fred Mosteller on dog learning, an example that I ended up returning to again and again, most recently in our new multilevel modeling book. There was also a problem by Peter Huber--he was notorious for using the same problem year after year--it featured a 10th-generation photocopy of a very long article on crystallography, I think, along with a suggestion that the data be analyzed using robust methods. Like everyone else, I skipped that one. But I did spend three days on a problem from Art Dempster on belief functions--it was so much work but I really felt I needed to do it.

As students, we all took this very seriously. The perspective from the faculty side is much different, since a key part of the exam is to evaluate the students, not just to give them an exciting challenge. Also we've had problems over the years with cheating, so it's been difficult to have take-home exams. Finally, I heard a rumor that our students were told not to worry, there won't be anything Bayesian on Columbia's qualifying exam this year. Say it ain't so!!

Does anyone know who made this map that Aleks pointed me to? I want a version without the city names, state boundaries, and state colors. Thanks!

[Image: election2500.png — the election map in question]

Recursion

This story amused me.

Measuring media bias

Tim Groseclose and Jeffrey Milyo wrote a paper on "A measure of media bias." Here's the paper, and here's the abstract to the paper:

We [Groseclose and Milyo] measure media bias by estimating ideological scores for several major media outlets. To compute this, we count the times that a particular media outlet cites various think tanks and policy groups, then compare this with the times that members of Congress cite the same groups. Our results show a strong liberal bias: all of the news outlets we examine, except Fox News’ Special Report and the Washington Times, received scores to the left of the average member of Congress. Consistent with claims made by conservative critics, CBS Evening News and the New York Times received scores far to the left of center. The most centrist media outlets were PBS NewsHour, CNN’s Newsnight, and ABC’s Good Morning America; among print outlets, USAToday was closest to the center. All of our findings refer strictly to news content; that is, we exclude editorials, letters, and the like.

They fit a version of an ideal-point model to mentions of "think tanks and policy groups" by Congressmembers and media outlets and they find, basically, that most of the newspapers they look at quote a mixture of groups that is similar to moderate-to-conservative Democrats, most of the TV shows are comparable in quotations to conservative Democrats in Congress, and two of the more partisan Republican news organizations (Fox News and the Washington Times) have quotation patterns that are comparable to liberal Republicans in Congress.

This makes sense--as the authors note, surveys have found that many more journalists are Democrats than Republicans, but partisans of both sides have to moderate their views in order to maintain journalistic credibility.

I wonder to what extent these bias measures depend on the issues under consideration. There's also the question of the relevance of quotation patterns to the larger questions of bias, and of what role the press should be expected to take in a representative democracy--for example, will a mass-readership press be expected to hold more left-wing views so as to be popular with readers, or more right-wing views so as to be popular with advertisers? There's also the difference between local and national news. Lots to think about.

Brendan's comments

P.S. Here are Brendan Nyhan's criticisms of the paper. Brendan's criticisms seem valid to me; nonetheless I'm a bit more positive than Brendan is about the paper, I think because the problem of studying media bias is tough, and I'm impressed by what Groseclose and Milyo did manage to do. Perhaps it's just my own bias showing in an affinity with quantitative researchers . . . I do agree, though, that "bias" isn't quite the right word for what Groseclose and Milyo measure, since "bias" implies a deviation from some unbiased position or truth, which I don't see them measuring.

P.S.

I also have a few comments on the presentation of the results:

Corruption and parking tickets

Ray Fisman spoke in our quantitative political science seminar, reporting on his paper with Ted Miguel on the number of unpaid parking tickets of each U.N. delegation in Manhattan. (Diplomats don't have to pay parking tickets, although in recent years the mayor of NY has reduced the problem by about 90% by more aggressively towing cars that are illegally parked.) There's a strong correlation between the number of unpaid tickets and a measure of corruption of each country--that is, diplomats from countries with more of a "culture of corruption" had more unpaid tickets.

References and endnotes in books

I pretty much agree with Aaron Haspel's rant about footnotes and endnotes. I think Deb Nolan and I did a good job of these in the Notes section of Teaching Statistics: A Bag of Tricks. But we should have referred to page numbers as well as section numbers.

In all my books I've been careful to put the references at the ends of chapters or at the end of the book, rather than threaded through the text. I don't like in-text references (for example, "We consider the xyz model (Jones, 1994)") because they seem to me to be a way of passing the buck. I want everything in my book to be something that I believe and stand behind. But then at the end of the chapter or book, I do want to credit where I got the ideas from.

Question about 1/f noise

For decades I've been reading about 1/f noise and have been curious what it sounds like. I've always been meaning to write a little program to generate some and then play it on a speaker, but I've never put the effort into figuring out exactly how to do it. But now with Google . . . there must be some 1/f noise out there already, right?

A search on "1/f noise" yielded this little snippet of 1/f noise simulated by Paul Bourke from a deterministic algorithm. It sounded pretty cool, like what I'd imagine computer music to sound like.

I did a little more searching on the web; it was easy to find algorithms and code for generating 1/f ("pink") noise but surprisingly difficult to find actual sounds. I finally found this 15-second snippet, which sounded like ocean waves, not like computer music at all! (You can compare to the sample of white noise, which indeed sounds like irritating static.)

I also found this online 1/f noise generator from a physics class at Berkeley. It works, and also shows the amplitude series and spectrum. It, too, sounds like ocean waves. I'm disappointed--I liked the computer music. What's the deal?
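For anyone who wants to generate their own, here's a minimal sketch in R (mine, not Bourke's algorithm): shape the spectrum of white noise so that power falls off as 1/f, then check the slope. To actually listen, you could write the series to a .wav file with a package such as tuneR.

set.seed(1)
n <- 2^16
white <- rnorm(n)
spec  <- fft(white)

k <- 0:(n - 1)
f <- pmin(k, n - k)  # physical frequency index for each FFT bin
f[1] <- 1            # avoid dividing by zero at the DC bin
pink <- Re(fft(spec / sqrt(f), inverse = TRUE)) / n  # power ~ 1/f means amplitude ~ 1/sqrt(f)

# check: log power should fall off roughly linearly in log frequency
sp <- spectrum(pink, plot = FALSE)
plot(log10(sp$freq), log10(sp$spec), pch = ".")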

I'd like to move from basketball to something more important: geriatric care, a topic I was reminded of after reading this interesting article by Atul Gawande.

The article starts with some general discussion of the science of human aging, then moves to consider options for clinical treatment. Gawande learns a lot from observing a gerontologist's half-hour meeting with a patient. He tells a great story (too long to repeat here), although I suspect he was choosing the best of the many patients he observed. He notes:

In the story of Jean Gavrilles and her geriatrician, there’s a lesson about frailty. Decline remains our fate; death will come. But, until that last backup system inside each of us fails, decline can occur in two ways. One is early and precipitately, with an old age of enfeeblement and dependence, sustained primarily by nursing homes and hospitals. The other way is more gradual, preserving, for as long as possible, your ability to control your own life.

Good medical care can influence which direction a person’s old age will take. Most of us in medicine, however, don’t know how to think about decline. We’re good at addressing specific, individual problems: colon cancer, high blood pressure, arthritic knees. Give us a disease, and we can do something about it. But give us an elderly woman with colon cancer, high blood pressure, arthritic knees, and various other ailments besides—an elderly woman at risk of losing the life she enjoys—and we are not sure what to do.

Gawande continues with a summary of this study:

Several years ago, researchers in St. Paul, Minnesota, identified five hundred and sixty-eight men and women over the age of seventy who were living independently but were at high risk of becoming disabled because of chronic health problems, recent illness, or cognitive changes. With their permission, the researchers randomly assigned half of them to see a team of geriatric specialists. The others were asked to see their usual physician, who was notified of their high-risk status. Within eighteen months, ten per cent of the patients in both groups had died. But the patients who had seen a geriatrics team were a third less likely to become disabled and half as likely to develop depression. They were forty per cent less likely to require home health services.

Little of what the geriatricians had done was high-tech medicine: they didn’t do lung biopsies or back surgery or PET scans. Instead, they simplified medications. They saw that arthritis was controlled. They made sure toenails were trimmed and meals were square. They looked for worrisome signs of isolation and had a social worker check that the patient’s home was safe.

But now comes the kicker:

How do we reward this kind of work? Chad Boult, who was the lead investigator of the St. Paul study and a geriatrician at the University of Minnesota, can tell you. A few months after he published his study, demonstrating how much better people’s lives were with specialized geriatric care, the university closed the division of geriatrics.

“The university said that it simply could not sustain the financial losses,” Boult said from Baltimore, where he is now a professor at the Johns Hopkins Bloomberg School of Public Health.

One of the problems comes from the "separate accounts" fallacy in decision making:

On average, in Boult’s study, the geriatric services cost the hospital $1,350 more per person than the savings they produced, and Medicare, the insurer for the elderly, does not cover that cost. It’s a strange double standard. No one insists that a twenty-five-thousand-dollar pacemaker or a coronary-artery stent save money for insurers. It just has to maybe do people some good. Meanwhile, the twenty-plus members of the proven geriatrics team at the University of Minnesota had to find new jobs. Scores of medical centers across the country have shrunk or closed their geriatrics units. Several of Boult’s colleagues no longer advertise their geriatric training for fear that they’ll get too many elderly patients. “Economically, it has become too difficult,” Boult said.

But the finances are only a symptom of a deeper reality: people have not insisted on a change in priorities. We all like new medical gizmos and demand that policymakers make sure they are paid for. They feed our hope that the troubles of the body can be fixed for good. But geriatricians? Who clamors for geriatricians? What geriatricians do—bolster our resilience in old age, our capacity to weather what comes—is both difficult and unappealingly limited. It requires attention to the body and its alterations. It requires vigilance over nutrition, medications, and living situations.

On the plus side, Baltimore has much better weather than St. Paul.

From the article by Boult et al. (you might notice a shift in style from the New Yorker to the Journal of the American Geriatrics Society):

PARTICIPANTS: A population-based sample of community-dwelling Medicare beneficiaries age 70 and older who were at high risk for hospital admission in the future (N = 568).

INTERVENTION: Comprehensive assessment followed by interdisciplinary primary care.

MEASUREMENTS: Functional ability, restricted activity days, bed disability days, depressive symptoms, mortality, Medicare payments, and use of health services. Interviewers were blinded to participants' group status.

RESULTS: Intention-to-treat analysis showed that the experimental participants were significantly less likely than the controls to lose functional ability (adjusted odds ratio (aOR) = 0.67, 95% confidence interval (CI) = 0.47–0.99), to experience increased health-related restrictions in their daily activities (aOR = 0.60, 95% CI = 0.37–0.96), to have possible depression (aOR = 0.44, 95% CI = 0.20–0.94), or to use home healthcare services (aOR = 0.60, 95% CI = 0.37–0.92) during the 12 to 18 months after randomization. Mortality, use of most health services, and total Medicare payments did not differ significantly between the two groups. The intervention cost $1,350 per person.

CONCLUSION: Targeted outpatient GEM slows functional decline.

Racial bias in basketball fouls

Yu-Sung and Jeff pointed me to a study by Joseph Price and Justin Wolfers on racial discrimination among NBA referees. Basically, black refs call more fouls on white players and vice-versa.

[Figure: basketball.png — chart from the Price and Wolfers study]

Here's a news article (by Alan Schwarz), here's the technical paper, and here's the abstract (with my thoughts following):

At last!

The second printing of the book is available.

[Image: cover.gif — the book cover]

Here's what I wrote when the first printing came out.

Panels of judges

John Kastellec writes,

attach.all() for R

The attach() function in R can be frustrating because it does not overwrite. So we wrote attach.all():
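The code itself didn't make it into this excerpt, so here's a minimal sketch of what such a function might look like (not necessarily the original): assign every named element of a list or data frame into the global environment, overwriting anything already there.

attach.all <- function(x, env = .GlobalEnv) {
  # assign each named element of x into env, overwriting existing objects
  for (nm in names(x)) assign(nm, x[[nm]], envir = env)
  invisible(names(x))
}

# usage:
dat <- data.frame(y = rnorm(10), x = 1:10)
attach.all(dat)  # y and x are now available directly, even if they already existed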

Doing surveys through the web

Aleks pointed me to this site.

Bruce McCullough points me to this note by Bernard Harcourt on the negative correlation between the rates of institutionalization and homicide. Basically, when more people have been in mental hospitals, there have been fewer homicides, and vice-versa.

It makes sense since, presumably, men who are institutionalized are more likely to commit crimes, so I'm surprised that Harcourt describes his results as "remarkable--actually astounding. These regressions cover an extremely lengthy time period . . . a large number of observations . . . and the results remain robust and statistically significant . . ." With a large data set, you're more likely to find statistical significance. Especially when the main result is so plausible in the first place.

Harcourt concludes with some interesting comments about the applicability of his results. (I'd also like to recommend the paper by Donohue and Wolfers on death penalty deterrence as a model example of this sort of analysis.)

P.S. See here for an update by Harcourt, where he explains why he finds his results surprising. I'm not convinced--I believe the results are important, just not that they're surprising.

Funny stuff

Harcourt's blog entry had some amusing comments:

There's some cool and (possibly) important stuff in Yue Cui's dissertation summary (under the supervision of Jim Hodges and Brad Carlin at University of Minnesota biostat). The short story is that, for reasons of substantive modeling as well as prediction, we're pushing to fit more and more complicated models to data. (See here and here for my thoughts on "Occam's Razor." I'm with Radford on this one.)

Anyway, having fit these models, we need to figure out how to understand them. Cui's dissertation is pretty technical so I won't try to summarize it here, but it has some interesting stuff for those of you out there who like counting degrees of freedom. Jim writes,

Highlights: Chapter 2 summarizes our Technometrics paper on smoothed ANOVA, due to appear very soon. Chapter 3 redefines degrees of freedom in a way that's consistent with the Hodges & Sargent definition but which allows a tidy decomposition of df by effects for arbitrary linear hierarchical models with normal errors (we think -- no counterexamples yet). This gives new options for putting priors on smoothing parameters, among other things. Chapter 4 extends the version of SANOVA in our Technometrics paper to arbitrary designs (again, we can't see how there *could* be a counterexample, but my imagination has failed me before). In particular, it allows more than one error term and doesn't require balance.

I'll also put in a plug for my paper with Pardoe on R-squared and pooling factors for multilevel models. In all these papers, the game is to come up with something that looks reasonable and gives the right answer in key special cases.

This paper by David Blanchflower and Andrew Oswald (from the Australian Economic Review in 2005) looks interesting. I'm interested in happiness (who isn't?) but this paper particularly interests me because it addresses a special case of the general statistical problem of summarizing multivariate data by indexes. Here's the abstract:

According to the well-being measure known as the U.N. Human Development Index, Australia now ranks 3rd in the world and higher than all other English-speaking nations. This paper questions that assessment. It reviews work on the economics of happiness, considers implications for policymakers, and explores where Australia lies in international subjective well-being rankings. Using new data on approximately 50,000 randomly sampled individuals from 35 nations, the paper shows that Australians have some of the lowest levels of job satisfaction in the world. Moreover, among the sub-sample of English-speaking nations, where a common language should help subjective measures to be reliable, Australia performs poorly on a range of happiness indicators. The paper discusses this paradox. Our purpose is not to reject HDI methods, but rather to argue that much remains to be understood in this area.

I recommend--for the next paper these folks write--presenting the results in graphical, not tabular, form, and ordering the countries in some reasonable way (for example, by per-capita GDP) rather than alphabetically. For example, do we really need to know that Australia has a value of 5.39 for one index and 5.62 for another? These comments apply to the raw data and also to the displays of regression coefficients.
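To make the suggestion concrete, here's a sketch in R of that kind of display, with made-up numbers (not the paper's data): a dot plot of a happiness index, countries ordered by per-capita GDP rather than alphabetically.

d <- data.frame(
  country   = c("Australia", "UK", "USA", "Canada", "Ireland", "New Zealand"),
  gdp_pc    = c(34, 33, 43, 36, 39, 26),        # per-capita GDP in $1000s (illustrative)
  happiness = c(5.4, 5.6, 5.7, 5.8, 5.5, 5.9)   # made-up index values
)
d <- d[order(d$gdp_pc), ]
dotchart(d$happiness, labels = d$country,
         xlab = "happiness index (higher = happier)")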

Going on to the substance of the paper, I have no particular comments. It is admirably crisp and speaks for itself and modestly focuses on the statistical issues.

Too much information?

Aleks sent me the link to this site. Seth might like it--except that it seems to be set up only to monitor data, not to record experiments.

Mediation

Rahul writes:

Baby-faced politicians lose

Greg Laun pointed me to this paper by Alexander Todorov, Anesu Mandisodza, Amir Goren, and Crystal Hall, whose abstract states:

Inferences of competence based solely on facial appearance predicted the outcomes of U.S. congressional elections better than chance (e.g., 68.8% of the Senate races in 2004) and also were linearly related to the margin of victory. These inferences were specific to competence and occurred within a 1-second exposure to the faces of the candidates. The findings suggest that rapid, unreflective trait inferences can contribute to voting choices, which are widely assumed to be based primarily on rational and deliberative considerations.

MCSim lives!

MCSim is some software that Frederic Bois wrote for our toxicology research over 10 years ago. I didn't know it was still around, until Bill Harris wrote,

A yucky phrase from Proc Bayes

Jouni writes that SAS now has a Bayesian module. I agree with Jouni that "The Bayesian probability reflects a person's subjective beliefs" is not really the kind of phrase you expect to hear from a modern practicing Bayesian methodologist. As Jouni puts it, "This definition would immediately invalidate the use of Bayesian methods in any field of science, I'd think." Well, maybe not in sociology . . .

The Myth of the Rational Voter

Greg Mankiw and Tyler Cowen point to the release of this book by Bryan Caplan, so it might be worth pointing to my discussion of an earlier version of the book that he showed me when I visited his university in 2005. I don't like the title (unsurprisingly, since I wrote a paper called Voting as a rational choice), but Caplan's book is interesting.

My full comments are here, and here's the short version:

This paper by Catherine Crouch, Jessica Watkins, Adam Fagen, and Eric Mazur looks pretty exciting to me:

Peer Instruction is an instructional strategy for engaging students during class through a structured questioning process that involves every student. Here we describe Peer Instruction (hereafter PI) and report data from more than ten years of teaching with PI in the calculus- and algebra-based introductory physics courses for non-majors at Harvard University, where this method was developed. Our results indicate increased student mastery of both conceptual reasoning and quantitative problem solving upon implementing PI. Gains in student understanding are greatest when the PI questioning strategy is accompanied by other strategies that increase student engagement, so that every element of the course serves to involve students actively. We also provide data on gains in student understanding and information about implementation obtained from a survey of almost four hundred instructors using PI at other institutions. We find that most of these instructors have had success using PI, and that their students understand basic mechanics concepts at the level characteristic of courses taught with interactive engagement methods. Finally, we provide a sample set of materials for teaching a class with PI, and provide information on the extensive resources available for teaching with PI.

Their stuff is all about physics. I'd like to do it with statistics. I think it could revolutionize the (currently crappy) state of statistics instruction.

More Black Swan

Jonathan Nagler posts this mini-conference:

The psychology of power

In a comment on this entry, Chris points to this interview with Deborah Gruenfeld. Some excerpts:

I [Gruenfeld] have been studying the psychological consequences of having power for the past seven years . . . There are just so many good examples of people with power who behave in ways that demand some kind of psychological explanation.

For example, I had a brief career in journalism, and I occasionally met with Jann Wenner, the founder and publisher of Rolling Stone. . . . He had in his office a small refrigerator within arm’s reach of his desk. As far as I could tell, there were only two things in there: a bottle of vodka and a bag of raw onions. While we were meeting, he would reach over, open the door, drink vodka straight out of the bottle, and eat onions. What’s striking about it now is that none of us ever said anything to him about this, and he never even offered to share! He seemed to think it was perfectly appropriate to do this in a meeting. And that is, I think, a classic example of what we think is going on with power, which is what we call “disinhibition.”

Gruenfeld continues:

NYC R users group

I received this in the email. I know nothing about it, just passing it on:

Seth tested his balance every day, sometimes when eating flaxseed oil and sometimes when eating olive oil, and found the following:

[Figure: flaxseed.jpg — Seth's daily balance measurements under flaxseed oil and olive oil]

This is a pretty graph, and it shows that Seth's balance improved when he ate flaxseed oil and got worse with the olive oil. He conjectures:

A possible explanation is that when the concentration of omega-3 in the blood is low, the omega-3 in cell membranes slowly “evaporates” into the blood. When a cell’s membranes lose omega-3, it doesn’t work as well.

But . . .

As a statistician, my first thought was some sort of measurement bias: Seth knows when he was taking olive oil and when he was taking flaxseed oil, and staying balanced is a tricky enough task that I could well imagine that the results could be affected by his expectations.

Flying blind

I'd be more convinced by a blinded experiment. This is tricky with a self-experiment but it could be done. For example:

1. Get 50 identical vials and pour olive oil into 25 of them and flaxseed oil into the other 25. Label them (e.g., "o" and "f"), then cover up the labels with removable stickers.

2. Mix up the vials in a bag (this is sometimes called "physical randomization" in the sampling literature), then use one vial per day. After use, place them on a shelf in order. Each day, measure your balance and whatever else you want to record.

3. When the experiment is over, peel off the stickers and identify which oil was eaten on which day.

4. If the two oils can be told apart by smell, clip your nose (this might sound weird, but Seth was actually already doing this). If they taste different, mix in some strong bitter flavor (this might mess up Seth's weight-loss experiment but should be OK for the balance study). If they look different, add food coloring or just use opaque bottles and don't look inside before drinking.

This simple experiment, with complete randomization, might not capture the time trends Seth is looking for. It would be simple enough to alter the experiment, for example by replacing the vials with larger containers and setting the unit of randomization to be the ten-day period rather than the day. You could even do something trickier, maybe with the assistance of a friend, to set up a pattern with long strings of o's and f's without knowing exactly when the switches will occur.
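Here's a sketch in R of the randomization described above (both the simple version and a ten-day blocked variant); the labels and block length are just the ones from the example, and of course the actual assignment should stay hidden from Seth until the end.

set.seed(1)

# complete randomization: 25 "o" (olive) and 25 "f" (flaxseed) vials, one per day
vials <- sample(rep(c("o", "f"), each = 25))

# blocked variant: randomize ten-day periods instead of single days,
# which preserves the longer time trends of interest
blocks   <- sample(rep(c("o", "f"), each = 5))  # ten blocks
schedule <- rep(blocks, each = 10)              # 100 days total
table(schedule)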

Why Seth's existing experiment is a good thing: I'm not slamming unblinded studies

I hope Seth (or one of his correspondents) does this randomized experiment. In the meantime, Seth's results provide a potentially important contribution by motivating new hypotheses. The unblinded experiment was so easy to do (within the context of Seth's earlier experiments), and placing a requirement such as blinding might have increased the required effort to the extent that Seth might not have gotten around to doing it.

Maybe Seth could make blinding (where possible) a routine part of his future experiments, though. Just as he's trained himself to perform disciplined self-experiments with precise and regular measurements (something that I never get around to doing when trying out new teaching methods, for example), maybe he could take the next step with blinding.

Benjamin Page is speaking on this paper:

Data from the 2006 CCGA national survey once again indicate that the American public is much more multilateralist than U.S. foreign policy officials. Large majorities of Americans favor several specific steps to strengthen the UN, support Security Council intervention for peacekeeping and human rights, and favor working more within the UN even if it constrains U.S. actions. Large majorities also favor the Kyoto agreement on global warming, the International Criminal Court, the Comprehensive Nuclear Test Ban Treaty, and the new inspection agreement on biological weapons. Large majorities favor multilateral uses of U.S. troops for peacekeeping and humanitarian purposes, but majorities oppose most major unilateral engagements.

He continues:

The first Bayesian Statistics clog

Pierre points to the proceedings of the first of the Valencia International Meetings on Bayesian Statistics [31MB PDF!] in 1979.

Browsing through, I am surprised that they look very much like a blog, with good papers and a lot of good commentary and discussion, something that we have discussed before. Back then it took an airplane flight to the beautiful Mediterranean beaches; today, thanks to the internet, we can stay in our offices and chat online. Hmm.

Perhaps that flight was what motivated people to show up and contribute something interesting. But receiving commentary, especially commentary from good researchers, is also worth something. Peer review with anonymous reviews that nobody looks at is such a waste of human effort. Make commentaries, not reviews! Pick good papers! Pick good authors! Pick trusted commentators! Pick good rankers! Pick good editors (who list papers on a topic or who invite authors to write on a topic)! Pick interesting topics! The monolithic nightmare of conferences, tomes and publishers should go away.

For some inspiration, look at websites such as Reddit or Yelp. Reddit shows how good stuff can rise to the top (but it falls short because the people doing the ranking are not trusted). Yelp shows how one can pick and reward good reviewers (but falls short because it's hard to find the good stuff).

Glossary: clog = conference log.

The norm of self-interest

Aleks's comments here, in particular the bit about selfishness, remind me of one of my favorite papers, "The norm of self-interest" by the psychologist Dale Miller. Here's the abstract:

The self-interest motive is singularly powerful according to many of the most influential theories of human behavior and the layperson alike. In the present article the author examines the role the assumption of self-interest plays in its own confirmation. It is proposed that a norm exists in Western cultures that specifies self-interest both is and ought to be a powerful determinant of behavior. This norm influences people's actions and opinions as well as the accounts they give for their actions and opinions. In particular, it leads people to act and speak as though they care more about their material self-interest than they do. Consequences of misinterpreting the "fact" of self-interest are discussed.

(Related work by Noah Kaplan, Aaron Edlin, and myself here, distinguishing rationality from selfishness as motivations for voting.)

Political psychology workshop

This looks interesting (it's this Saturday, from 10:30 to 4:00 in 801 International Affairs Building):

Susan Fiske and Lasana Harris (Princeton), "Which Groups We Consider Least Human: Evidence From Social Cognition and Social Neuroscience."

Mark Peffley (Kentucky), "Racial Polarization in Criminal Justice Attitudes."

Shawn Rosenberg (UC Irvine and Princeton), "Types of Democratic Deliberation: Can the People Govern?"

Following up (sort of) on my comments on The Black Swan . . .

Dan Goldstein and Nassim Taleb write in their paper: "Finance professionals, who are regularly exposed to notions of volatility, seem to confuse mean absolute deviation with standard deviation, causing an underestimation of 25% with theoretical Gaussian variables. In some fat tailed markets the underestimation can be up to 90%. The mental substitution of the two measures is consequential for decision making and the perception of market variability."

This interests me, partly because I've recently been thinking about summarizing variation by the mean absolute difference between two randomly sampled units (in mathematical notation, E(|x_i - x_j|)), because that seems like the clearest thing to visualize. Fred Mosteller liked the interquartile range, but that's a little too complicated for me; also, I like to do some actual averaging, not just medians, which miss some important information. I agree with Goldstein and Taleb that there's not necessarily any good reason for using the sd (except for mathematical convenience in the Gaussian model).
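Here's a quick simulation sketch in R (mine, not from the Goldstein and Taleb paper) showing the three summaries side by side for Gaussian data: the sd, the mean absolute deviation from the mean, and the mean absolute difference between two randomly sampled units.

set.seed(123)
x <- rnorm(1e5)

sd(x)                   # about 1
mean(abs(x - mean(x)))  # about sqrt(2/pi) = 0.80, so the sd is roughly 25% larger

# Monte Carlo estimate of E(|x_i - x_j|) for the sample
i <- sample(length(x), 1e5, replace = TRUE)
j <- sample(length(x), 1e5, replace = TRUE)
mean(abs(x[i] - x[j]))  # about 2/sqrt(pi) = 1.13 for standard normal data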

Duncan Watts (of Columbia's sociology department) wrote an article in the New York Times the other day:

As anyone who follows the business of culture is aware, the profits of cultural industries depend disproportionately on the occasional outsize success — a blockbuster movie, a best-selling book or a superstar artist — to offset the many investments that fail dismally. What may be less clear to casual observers is why professional editors, studio executives and talent managers, many of whom have a lifetime of experience in their businesses, are so bad at predicting which of their many potential projects will make it big. How could it be that industry executives rejected, passed over or even disparaged smash hits like “Star Wars,” “Harry Potter” and the Beatles, even as many of their most confident bets turned out to be flops? It may be true, in other words, that “nobody knows anything,” as the screenwriter William Goldman once said about Hollywood. But why?

Duncan continues:

J. Robert Lennon has a blog!

J. Robert Lennon, one of my favorite authors, has a blog (with his wife). It's interesting to see what a Real Writer thinks about literature. Also a bit disillusioning . . .

Trust and institutions

Lanlan Wang sent along this paper. Here's the abstract:

One thing that bugs me is that there seems to be so little model checking done in statistics. As I wrote in this referee report,

I'd like to see some graphs of the raw data, along with replicated datasets from the model. The paper admirably connects the underlying problem to the statistical model; however, the Bayesian approach requires a lot of modeling assumptions, and I'd be a lot more convinced if I could (a) see some of the data and (b) see that the fitted model would produce simulations that look somewhat like the actual data. Otherwise we're taking it all on faith.

But, why, if this is such a good idea, do people not do it? I don't buy the cynical answer that people don't want to falsify their own models. My preferred explanation might be called sociological and goes as follows: We're often told to check model fit. But suppose we fit a model, write a paper, and check the model fit with a graph. If the fit is ok, then why bother with the graph: the model is OK, right? If the fit shows problems (which, realistically, it should, if you think hard enough about how to make your model-checking graph), then you better not include the graph in the paper, or the reviewers will reject, saying that you should fix your model. And once you've fit the better model, no need for the graph.

The result is: (a) a bloodless view of statistics in which only the good models appear, leaving readers in the dark about all the steps needed to get there; or, worse, (b) statisticians (and, in general, researchers) not checking the fit of their model in the first place, so that neither the original researchers nor the readers of the journal learn about the problems with the model.

One more thing . . .

You might say that there's no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model. (See chapter 6 of Bayesian Data Analysis for some examples.)
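Here's a generic sketch in R of the kind of check I'm asking for (not tied to any particular paper): plot the raw data next to replicated datasets simulated from the fitted model. The model here is deliberately simple and deliberately wrong, so the replications look visibly different from the data.

set.seed(1)
y <- rexp(100)                        # made-up skewed data
fit_mean <- mean(y); fit_sd <- sd(y)  # "fitted model": a normal distribution

par(mfrow = c(3, 4))
hist(y, main = "data", xlab = "")
for (s in 1:11) {
  y_rep <- rnorm(length(y), fit_mean, fit_sd)  # replicated dataset from the model
  hist(y_rep, main = paste("replication", s), xlab = "")
}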

Statisticians often talk about a bias-variance tradeoff, comparing a simple unbiased estimator (for example, a difference in differences) to something more efficient but possibly biased (for example, a regression). There's commonly the attitude that the unbiased estimate is a better or safer choice. My only point here is that, by using a less efficient estimate, we are generally choosing to estimate fewer parameters (for example, estimating an average incumbency effect over a 40-year period rather than estimating a separate effect for each year or each decade). Or estimating an overall effect of a treatment rather than separate estimates for men and women. If we do this--make the seemingly conservative choice to not estimate interactions--we are implicitly estimating these interactions at zero, which is not unbiased at all!

I'm not saying that there are any easy answers to this; for example, see here for one of my struggles with interactions in an applied problem--in this case (estimating the effect of incentives in sample surveys), we were particularly interested in certain interactions even though they could not be estimated precisely from data.
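A tiny simulation in R (made-up data) of the point about interactions: leaving the interaction out of the model is the same as fixing its coefficient at zero, which is itself a biased estimate whenever the true interaction isn't zero.

set.seed(2)
n <- 200
x <- rnorm(n)
z <- rbinom(n, 1, 0.5)
y <- 1 + 2 * x + 0.5 * z + 1.5 * x * z + rnorm(n)  # true interaction = 1.5

coef(lm(y ~ x + z))  # the "conservative" model: interaction implicitly estimated at 0
coef(lm(y ~ x * z))  # the model that actually estimates the interaction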

(Also posted at Overcoming Bias.)

Lotteries: A Waste of Hope

Statisticians are always looking for ways to convince people not to play the lottery. Here's another reason (from Eliezer Yudkowsky).

Boris points us to this paper (with Christopher Berry and Nolan McCarty):

[Figure: states_common_density.png]

Boris writes:

Jesus update

Jonathan Falk points us to this:

Several prominent scholars who were interviewed in a bitterly contested documentary that suggests that Jesus and his family members were buried in a nondescript ancient Jerusalem burial cave have now revised their conclusions, including the statistician who claimed that the odds were 600:1 in favor of the tomb being the family burial cave of Jesus of Nazareth . . .

See here for Aleks's earlier thoughts on this.

Books on nutrition

Seth recommends:

The Queen of Fats, by Susan Allport

Nutrition and Physical Degeneration, by Weston Price

The first of these books is recent; the other is from 1930 or so.

Adjusted R-sq = 0.001

A correspondent writes:

Wanted to add to my comment on the Black Swan review... but didn't want to hang people in public.

You mentioned... (Mosteller and Wallace made a similar point in their Federalist Papers book about how they don't trust p-values less than 0.01 since there can always be unmodeled events. Saying p<0.01 is fine, but please please don't say p<0.00001 or whatever.) which is a terrific point!

I had a related experience just last week at a seminar. Some guys were modeling some marketing information and showed ranges of coefficients from the set of regressions and argued that everything was significant. At the bottom of the table, it read: "Adjusted R-sq = 0.001".

I had to check my glasses. I thought I was hallucinating. That line didn't seem to faze anyone else. The audience was asking modeling questions--why didn't you model it this way or that, etc. I turned around and asked my neighbor: were you bothered by an R-sq of 0.1%? His answer was "I have seen 0.001 or lower for panel data".

Now I'm not an expert in panel data analysis. But I am shocked, shocked, that apparently such models are allowable in academia. Pray tell me not!

I don't know what to say. In theory, R^2 can be as low as you want, but I have to admit I've never seen something like 0.001.

Jen pointed me to Level-Headed: Economics Experiment Finds Taste for Equality. In brief, people are willing to pay their own money to take from the rich and give it to the poor. The underlying Nature article mentions that:


Emotions towards top earners become increasingly negative as inequality increases, and those who express these emotions spend more to reduce above-average earners’ incomes and to increase below-average earners’ incomes. The results suggest that egalitarian motives affect income-altering behaviours, and may therefore be an important factor underlying the evolution of strong reciprocity and, hence, cooperation in humans.

However, I can see other explanations that don't require altruism:


  • Utility arbitrage: Utility is nonlinear: taking $1 when you have $10 of daily income is worse than taking $10 when you have $100 of daily income. This is used as an argument for progressive taxation, which might be nonlinear in money but could be linear in utility (taxes giving everyone the same amount of pain). Those who take from the rich and give to the poor might effectively be doing arbitrage: the amount of gratitude from the poor, minus the anger from the rich, minus the cost, amounts to a positive profit for Robin Hood.

  • Insurance against slavery: There is an incentive for a commoner to prevent a powerful figure from gathering excessive power because letting this go on could lock the commoners into an under-caste.

  • Power asymmetry: The rich can become richer only by increasing the imbalance in the income distribution. But as they become richer, they actually become fewer. At some point, increasing their riches actually reduces their power, and they get "taken under" (in a revolution or a revolt). Since revolutions are costly, it's adaptive to "equalize" without breaking things up.

As an aside, it's interesting to notice James H. Fowler among the authors: he's behind a chain of very interesting papers over the past few years.

Maryland Sidesteps Electoral College

From Brian Witte of the Associated Press:

Maryland officially became the first state on Tuesday to approve a plan to give its electoral votes for president to the winner of the national popular vote instead of the candidate chosen by state voters.

Gov. Martin O'Malley, a Democrat, signed the measure into law, one day after the state's General Assembly adjourned.

The measure would award Maryland's 10 electoral votes to the national popular vote winner. The plan would only take effect if states representing a majority of the nation's 538 electoral votes decided to make the same change.

. . .

Other states are considering the change . . . National Popular Vote, a group that supports the change, said there are legislative sponsors for the idea in 47 states. . . . But not everyone is buying into the idea. North Dakota and Montana rejected it earlier this year. Opponents say the change would hurt small rural states, where the percentage of the national vote would be even smaller than the three electoral votes they each have in the overall Electoral College.

"Even smaller" . . . that's right. North Dakota has 640,000 people--that's 0.21% of the U.S. population. Their share of 538 electoral votes is 0.0021 x 538 = 1.15. Explain again why they should get more electoral votes than, say, the 679,000 people in Cobb County, Georgia, or the 668,000 people in Will County, Illinois?

Boris pointed me to this paper by Edward Glaeser, Giacomo Ponzetto, and Jesse Shapiro. Here's the abstract:

Party platforms differ sharply from one another, especially on issues with religious content, such as abortion or gay marriage. Religious extremism in the U.S. appears to be strategically targeted to win elections, since party platforms diverge significantly, while policy outcomes like abortion rates are not affected by changes in the governing party. Given the high returns from attracting the median voter, why do vote-maximizing politicians veer off into extremism? In this paper, we find that strategic extremism depends on an important intensive margin where politicians want to induce their core constituents to vote (or make donations) and the ability to target political messages towards those core constituents. Our model predicts that the political relevance of religious issues is highest when around one-half of the voting population attends church regularly. Using data from across the world and within the U.S., we indeed find a non-monotonic relationship between religious extremism and religious attendance.

And here are my thoughts:

Brendan Nyhan's political science links

Brendan Nyhan (who arranged my fun visit to Duke's quantitative social science center in Feb) sent a bunch of references. I'm commenting on them here for convenience (easier than storing in my inbox!).

1. Cooperative game theory (looks at combinations of coalitions): a paper of Brandenburger and a syllabus of a course of Gilboa and Scarf.

2. NetLogo, a popular automaton simulation environment. This looks cool. I want to use something like this to do simulations to extend the ideas of this paper: Forming voting blocs and coalitions as a prisoner's dilemma: a possible theoretical explanation for political instability. (This is why I'm interested in item 1 above also.)

3. Computational and Mathematical Modeling in the Social Sciences, by Scott De Marchi: I ordered it, will report back. Brendan said I should read it because Scott's views on statistics are completely different from mine.

4. Fearon and Laitin's paper on civil wars, which is a controversial example of political methodology because they try to interpret zillions of regression coefficients at once. Also these supplementary tables.

5. Arthur Brooks's survey data on civic engagement and inequality, and Brendan's comment on Brooks's writings in this area. (I'd earlier noticed some of Brooks's interesting work on fertility differences between Democrats and Republicans and charitable giving.)

Nassim Taleb's "The Black Swan"

OK, I finished reading it and transcribing my thoughts. They're the equivalent of about 20 blog entries (or one long unpublishable article) but it seemed more convenient to just put them in one place.

As I noted earlier, reading the book with pen in hand jogged loose various thoughts. . . . The book is about unexpected events ("black swans") and the problems with statistical models such as the normal distribution that don't allow for these rarities. From a statistical point of view, let me say that multilevel models (often built from Gaussian components) can model various black swan behavior. In particular, self-similar models can be constructed by combining scaled pieces (such as wavelets or image components) and then assigning a probability distribution over the scalings, sort of like what is done in classical spectrum analysis of 1/f noise in time series. For some interesting discussion in the context of "texture models" for images, see the chapter by Yingnian Wu in my book with Xiao-Li on applied Bayesian modeling and causal inference. (Actually, I recommend this book more generally; it has lots of great chapters in it.)

That said, I admit that my two books on statistical methods are almost entirely devoted to modeling "white swans." My only defense here is that Bayesian methods allow us to fully explore the implications of a model, the better to improve it when we find discrepancies with data. Just as a chicken is an egg's way of making another egg, Bayesian inference is just a theory's way of uncovering problems that can lead to a better theory. I firmly believe that what makes Bayesian inference really work is a willingness (if not eagerness) to check fit with data and to abandon and improve models often.

More on black and white

My own career is white-swan-like in that I've put out lots of little papers, rather than pausing for a few years like that Fermat's last theorem guy. Years ago I remarked to my friend Seth that he's followed the opposite pattern: by abandoning the research-grant, paper-writing treadmill and devoting himself to self-experimentation, he basically was rolling the dice and going for the big score--in Taleb's terminology, going for that black swan.

On the other hand, you could say that in my career I'm following Taleb's investment advice--my faculty job gives me a "floor" so that I can work on whatever I want, which sometimes seems like something little but maybe can have unlimited potential. (On page 297, Taleb talks about standing above the rat race and the pecking order; I've tried to do so in my own work by avoiding a treadmill of needing associates to do the research to get the funding, and needing funding to pay people.)

In any case, I've had a boring sort of white-swan life, growing up in the suburbs, being in school continuously since I was 4 years old (and still in school now!). In contrast, Taleb seems to have been exposed to lots of black swans, both positive and negative, in his personal life.

Chapter 2 of The Black Swan has a (fictional) description of a novelist who labors in obscurity and then has an unexpected success. This somehow reminds me of how lucky I feel that I went to college when and where I did. I started college during an economic recession, and in general all of us at MIT just had the goal of getting a good job. Not striking it rich, just getting a solid job. Nobody I knew had any thought that it might be possible to get rich. It was before stock options, and nobody knew that there was this thing called "Wall Street." Which was fine. I worry that if I had gone to college ten years later, I would've felt a certain pressure to go get rich. Maybe that would've been fine, but I'm happy that it wasn't really an option.

95% confidence intervals can be irrelevant, or, living in the present

On page xviii, Taleb discusses problems with social scientists' summaries of uncertainty. This reminds me of something I sometimes tell political scientists about why I don't trust 95% intervals: A 95% interval is wrong 1 time out of 20. If you're studying U.S. presidential elections, it takes 80 years to have 20 elections. Enough changes in 80 years that I wouldn't expect any particular model to fit for such a long period anyway. (Mosteller and Wallace made a similar point in their Federalist Papers book about how they don't trust p-values less than 0.01 since there can always be unmodeled events. Saying p<0.01 is fine, but please please don't say p<0.00001 or whatever.)

More generally, people (or, at least, political commentators) often live so much in the present that they forget that things can change. An instructive example here is Richard Rovere's book on Goldwater's 1964 campaign. Rovere, a respected political writer, wrote that the U.S. had a one-and-a-half-party system, with the Democrats being the full party and the Republicans the half party. Yes, Goldwater lost big and, yes, the Democrats did have twice the number of Senators and twice the number of Representatives in Congress then--but the Republicans won seven of the ten presidential elections from 1952 through 1988. Hardly the performance of a half-party.

Knowing what you don't know, and omniscience is not omnipotence

The quotes on page xix remind me of one of my favorites: "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so" (Mark Twain?). I actually prefer the version that says, "It's what you don't know you don't know that gets you into trouble." Also Earl Weaver's "It's what you learn after you know it all that counts."

On page xx, Taleb writes, "What you know cannot really hurt you." This doesn't sound right to me. Sometimes you know something bad is coming but you can't dodge it. For example, consider certain diseases.

Creativity is not (yet) algorithmic

On page xxi, Taleb says how almost no great discovery came from design and planning. This reminds me about a biography of Mark Twain that I read several years ago. Apparently, Twain was always trying to create a procedure--essentially, an algorithm--to produce literature. He tried various strategies, collaborators, etc., but nothing really worked. He just had to wait for inspiration and write what came to mind.

Also on page xxi, Taleb writes "we don't learn rules, just facts, and only facts." This statement would surprise linguists. It's been well demonstrated that kids learn language through rules (as can be seen, for example, from overgeneralizations such as "feets" and "teached"). More generally, folk science is strongly based on categories and natural kinds--I think Taleb is aware of this since he cites my sister's work in his references. (A recent example of naive categorization in folk science is in the papers of Satoshi Kanazawa.)

Recognition, prevention, and saltatory growth

On page xxiii, Taleb writes that "recognition can be quite a pump." Yes, but recall all those scientists whose lives were shortened by two years (on average) from frustration at not receiving the Nobel Prize!

On page xxiv, "few reward acts of prevention": I'm reminded of our health plan in grad school, which paid for catastrophic coverage but not routine dental work. A friend of mine actually had to get root canal, and eventually got the plan to pay for it, but not without a struggle.

On page 10, Taleb writes, "history does not crawl, it jumps." This reminds me of the evidence on saltatory growth in infants (basically, babies grow in length in discrete jumps every few days; they don't grow the same amount every day).

Aha

I was also reminded of the fractal nature of scientific revolutions--basically, at all scales (minutes, hours, days, months, years, decades, centuries, . . .), science seems to proceed by being derailed by unexpected "aha" moments. (Or, to pick up on Taleb's themes, I can anticipate that "aha" moments will occur, I just can't predict exactly when they will happen or what they will be.)

Liberals and conservatives

On page 16, Taleb asks "why those who favor allowing the elimination of a fetus in the mother's womb also oppose capital punishment" and "why those who accept abortion are supposed to be favorable to high taxation but against a strong military," etc. First off, let me chide Taleb for deterministic thinking. From the General Social Survey cumulative file, here's the crosstab of the responses to "Abortion if woman wants for any reason" and "Favor or oppose death penalty for murder":

40% supported abortion for any reason. Of these, 76% supported the death penalty.

60% did not support abortion under all conditions. Of these, 74% supported the death penalty.

This was the cumulative file, and I'm sure things have changed in recent years, and maybe I even made some mistake in the tabulation, but, in any case, the relation between views on these two issues is far from deterministic!
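For what it's worth, here's a sketch of that sort of tabulation using made-up data that match the percentages above (abany and cappun are, I believe, the relevant GSS variable names; the data below are simulated, not the actual cumulative file):

  set.seed(2)
  n <- 10000
  abany <- rbinom(n, 1, 0.40)                              # 40% support abortion for any reason
  cappun <- rbinom(n, 1, ifelse(abany == 1, 0.76, 0.74))   # death-penalty support within each group
  round(prop.table(table(abany, cappun), 1), 2)            # row proportions: far from 0 or 1 everywhere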

But getting back to the main question: I don't think it's such a mystery that various leftist views (allowing abortion, opposing capital punishment, supporting a graduated income tax, and reducing the military) are supposed to go together--nor is it a surprise that the opposite positions go together in a rightist worldview. Abortion is related to women's rights, which has been a leftist position for a long time. Similarly, conservatives have favored harsher punishments and liberals (to use the U.S. term) have favored milder punishments for a long time also. The graduated income tax favors the have-nots rather than the have-mores, and the military is generally a conservative institution. Other combinations of views are out there, but I don't agree with Taleb's claim that the left-right distinction is arbitrary.

Picking pennies in front of a steamroller

On page 19, Taleb refers to the usual investment strategy (which I suppose I actually use myself) as "picking pennies in front of a steamroller." That's a cute phrase; did he come up with it? I'm also reminded of the famous Martingale betting system. Several years ago in a university library I came across a charming book by Maxim (of gun fame) where he went through chapter after chapter demolishing the Martingale system. (For those who don't know, the Martingale system is to bet $1, then if you lose, bet $2, then if you lose, bet $4, etc. You're then guaranteed to win exactly $1--or lose your entire fortune. A sort of lottery in reverse, but an eternally popular "system.")
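For the record, here's a quick simulation of the Martingale system with a fair coin and a $100 bankroll (my made-up numbers): you almost always walk away $1 ahead, and the rare wipeout cancels the gains exactly on average.

  set.seed(4)
  martingale <- function(start = 100) {
    cash <- start
    bet <- 1
    repeat {
      if (bet > cash) return(cash - start)              # can't cover the next bet: take the loss
      if (runif(1) < 0.5) return(cash + bet - start)    # win: up exactly $1 overall
      cash <- cash - bet                                # lose: double the bet and try again
      bet <- 2 * bet
    }
  }
  outcomes <- replicate(1e5, martingale())
  table(outcomes)     # mostly +1, occasionally a large loss
  mean(outcomes)      # close to zero: the "system" creates no edge

With an unfair (house-favoring) coin, the average would of course come out negative.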

Throughout, Taleb talks about forecasters who aren't so good at forecasting, picking pennies in front of steamrollers, etc. I imagine much of this can be explained by incentives. For example, those Long-Term Capital guys made tons of money, then when their system failed, I assume they didn't actually go broke. They have an incentive to ignore those black swans, since others will pick up the tab when they fail (sort of like FEMA pays for those beachfront houses in Florida). It reminds me of the saying that I heard once (referring to Donald Trump, I believe) that what matters is not your net worth (assets minus liabilities), but the absolute value of your net worth. Being in debt for $10 million and thus being "too big to fail" is (almost) equivalent to having $10 million in the bank.

The discussion on page 112 of how Ralph Nader saved lives (mostly via seat belts in cars) reminds me of his car-bumper campaign in the 1970s. My dad subscribed to Consumer Reports then (he still does, actually, and I think reads it for pleasure--it must be one of those Depression-mentality things), and at one point they were pushing heavily for the 5-mph bumpers. Apparently there was some federal regulation about how strong car bumpers had to be, to withstand a crash of 2.5 miles per hour, or 5 miles per hour, or whatever--the standard had been 2.5 (I think), then got raised to 5, then lowered back to 2.5, and Consumers Union calculated (reasonably correctly, no doubt) that the 5 mph standard would, on net, save drivers money. I naively assumed that CU was right on this. But, looking at it now, I would strongly oppose the 5 mph standard. In fact, I'd support a law forbidding such sturdy bumpers. Why? Because, as a pedestrian and cyclist, I don't want drivers to have that sense of security. I'd rather they be scared of fender-benders and, as a consequence, stay away from me! Anyway, the point here is not to debate auto safety; it's just an interesting example of how my own views have changed. Another example of incentives.

Three levels of conversation, or, why lunch at the faculty club might (sometimes) be more interesting than hanging out with chair-throwing traders

On page 21, Taleb compares the excitement of chair-throwing stock traders to "lunches in a drab university cafeteria with gentle-minded professors discussing the latest departmental intrigue." This reminds me of a distinction I came up with once when talking with Dave Krantz, the idea of three levels of conversation. Level 1 is personal: spouse, kids, favorite foods, friends, gossip, etc. Level 2 is "departmental intrigue," who's doing what job, getting person X to do thing Y, how to get money for Z--basically, level 2 is all about money. Level 3 is impersonal things: politics, sports, research, deep thoughts, etc. When talking with Dave, I resolved to minimize level 2 conversation and focus on the far more important (and interesting) levels 1 and 3. Level 2 topics have an immediacy which puts them on the top of the conversational stack, which is why I made the special effort to put them aside. Anyway, it struck me in reading page 21 of Taleb's book that chair-throwing stock traders have much more interesting level 2 conversations (compared with professors or even grad students), and quite possibly they have better level 1 conversations also--but I'd hope that the level 3 conversations at the university are more interesting. Being on campus, I'm used to having all sorts of good level 3 conversations, but I find these harder to come by in other settings. Probably it's nothing to do with the depth of these other people, just that I find it easier to get into a good conversational groove with people at the university. In any case, I try (not always successfully) to keep conversations away from "the latest departmental intrigue."

Riding the escalator to the stairmaster

The story on page 54 about the people who ride the escalator to the Stairmasters reminds me that, where I used to work, there was a guy who carried his bike up the stairs to the 4th floor. This always irritated me because it set an unfollowable example. For instance, one day I was on the elevator (taking my bike to the 3rd floor) and some guy asked me, "You ride your bike for the exercise. Why don't you take the stairs?" (I replied that I don't ride my bike for the exercise.)

Confirmation bias, or, shouldn't I be reading an astrology book?

Around pages 58-59, Taleb talks about confirmation bias and recommends that we look for counterexamples to our theories. I certainly agree with this and do it all the time in my research. But what about other aspects of life? For example, I was reading The Black Swan, which I knew ahead of time would contain lots of information that I already agreed with. Should I instead read a book on astrology? In practice, I'm sure this would just confirm my (true) suspicion that astrology is false, so I'm kinda stuck.

Rare events and selection bias

The footnote on page 61 reminded me of a talk I saw a couple of years ago where it was said that NYC is expected to have a devastating earthquake some time in the next 2000 years.

On page 77, Taleb says that lottery players treat odds of one in a thousand and one in a million almost the same way. But when lotteries make the probability of winning lower (for example, changing from "pick 6 out of 42" to "pick 6 out of 48"), people do respond by playing less (unless the payoffs are increased accordingly). I attribute this not to savvy probability reasoning but to a human desire not to be ripped off.
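The change in odds being described, spelled out:

  choose(42, 6)                    # 5,245,786 combinations: about 1 in 5.2 million
  choose(48, 6)                    # 12,271,512 combinations: about 1 in 12.3 million
  choose(48, 6) / choose(42, 6)    # the odds get worse by a factor of about 2.3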

On page 102 and following, Taleb discusses selection bias. I also recommend the article by Howard Wainer et al. (A Selection of Selection Anomalies); Deb Nolan and I also have a few in our Teaching Statistics book.

Then, on page 126, Taleb describes a conference he attended where his "first surprise was to discover that the military people there thought, behaved, and acted like philosophers [in the good sense of the word] . . . They thought out of the box, like traders, except much better and without fear of introspection." He goes on to discuss why military officers are such good skeptical thinkers. But this seems like a clear case of selection bias! The military officers who come to an academic symposium are probably an unusual bunch.

Losers lie

On page 118-119, there's a discussion of how someone with a winning streak in life can think it's skill, even if it's just luck and selection (that the losers don't get observed). I'd like to add another explanation, which is that people lie. Someone who tells you he won ten straight times probably actually won ten times out of fifteen. Someone who tells you he broke even probably is a big loser. Etc.

On page 125, Taleb explains why the Fat Tonys get more Nobel Prizes in medicine than the Dr. Johns. I don't know if this is really true, but if it is, I might attribute it to the Tonys' better social skills (i.e., helping others be happy and getting people to do what they want) more than their better ability to assess uncertainty.

Of fights and coin flips

On page 127-128, Taleb discusses the distinction between uncertainty and randomness (in my terminology, the boxer, the wrestler, and the coin flip). I'd only point out that coins and dice, while maybe not realistic representations of many sources of real-world uncertainty, do provide useful calibration. Similarly, actual objects rarely resemble "the meter" (that famous metal bar that sits, or used to sit, in Paris), but it's helpful to have an agreed-upon length scale. We have some examples in Chapter 1 of Bayesian Data Analysis of assigning probabilities empirically (for football scores and record linkage).

Also, as discussed in our Teaching Statistics book, when teaching probability I prefer to use actual random events (e.g., sex of births) rather than artificial examples such as craps, roulette, etc., which are full of technical details (e.g., what's the probability of spinning a "00") that are dead-ends with no connection to any other areas of inquiry. In contrast, thinking about sex of births leads to lots of interesting probabilistic, biological, combinatorial, and evolutionary directions.

Overconfidence as the side effect of communication goals

On page 14, Taleb discusses overconfidence (as in the pathbreaking Alpert and Raiffa study). As we teach in decision theory, there's actually an easy way to make sure that your 95% intervals are calibrated. Just apply the following rule: Every time someone asks you to make a decision, spin a spinner that has a 95% chance of returning the interval (-infinity, infinity), and a 5% chance of returning the empty set. You will be perfectly calibrated (on average). The intervals are useless, however, which points toward the fact that when people ask you for an interval, you're inclined (for Gricean reasons if no other) to provide some information. According to Dave Krantz, much of overconfidence of probability statements can be explained by this tension between the goals of informativeness and calibration.
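A two-line simulation of the spinner rule, just to make the point concrete:

  set.seed(5)
  covers <- runif(1e5) < 0.95    # report (-Inf, Inf) with prob 0.95, the empty set otherwise
  mean(covers)                   # about 0.95 coverage, no matter what the true values are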

On page 145, Taleb discusses the fallacy of assuming that "more is better." A lot depends here on the statistical model you're using (or implicitly using). With least squares, overfitting is a real concern. Less so in Bayesian inference, but still it comes up with noninformative prior distributions. An important--the important--topic in Bayesian statistics is the construction of structured prior distributions that let the data speak but at the same time don't get overwhelmed by a flood of data.

Of taxonomies and lynx

In the discussion of Mandelbrot's work on page 269, I'd also mention his models for taxonomies, which have a simple self-similar structure without the complexities of the more familiar spatial examples. Also, the story about the problems of Gaussian models reminds me of Cavan Reilly's chapter in this book, where he fits a simple predator-prey model with about 3 parameters to the famous Canadian lynx data and gets much better predictions than the standard 11-parameter Gaussian time series models that are usually fit to those data.

Buzzwords

On page 278, Taleb rants against statistical buzzwords such as standard deviation and correlation, and financial buzzwords such as risk. This reminds me of my rant against the misunderstood concept of "risk aversion." I have to write this up fully sometime, but some of my rant is here.

It's all over but the compartmentalizin'

On page 288, Taleb discusses people who compartmentalize their intellectual lives, for example the philosopher who was a trader but didn't use his trading experiences to inform his philosophy. I noticed a similar thing about some of my colleagues where I used to teach in the statistics department at Berkeley. On the one hand, they were extremely theoretical, using advanced mathematics to prove very subtle things in probability theory, often things (such as the strong law of large numbers) that had little if any practical import. But when they did applied work, they threw all this out the window--they were so afraid of using probability models that they would often resort to very crude statistical methods.

I'm only a statistician from 9 to 5

I try (and mostly succeed, I think) to have some unity in my professional life, developing theory that is relevant to my applied work. I have to admit, however, that after hours I'm like every other citizen. I trust my doctor and dentist completely, and I'll invest my money wherever the conventional wisdom tells me to (just like the people whom Taleb disparages on page 290 of his book).

Miscellaneous sociological thoughts

Taleb's comment on page 155 about economics being the most insular of fields reminds me of this story of the economist who said that economists are different than "anthropologists, sociologists, and public health officials" because economists believe that "everyone is fundamentally alike" [except, of course, for anthropologists, etc.]. Economists often do seem pretty credulous of arguments presented by other economists!

The reference on page 158 to dentists reminded me of the dentists named Dennis.

On page 166, Taleb disparages plans. But plans can be helpful, no? Even if they don't work out. It usually seems to me that even a poor plan (if recognized as tentative) is better than no plan at all.

The discussion on page 171 of predicting predictions reminds me of the paradox, of sorts, that opinion polls shift predictably during presidential nominating conventions (for evidence, see here, for example), even though conventions are very conventional events, and so one's shift in views should be (on average) anticipated.

On page 174-175, Taleb commends Poincare for not wasting time finding typos. For me, though, typo-finding is pleasant. Although I am reminded of the expression, "there's no end to the amount of work you can put into a project after it's done."

The graphs on pages 186-187 have that ugly Excel look, with unnecessary horizontal lines and weirdly labeled y-axes. In any case, they remind me of the game of "scatterplot charades" that I sometimes enjoy playing with a statistics class. The game goes as follows: someone displays a scatterplot--just the points, nothing more--and everyone tries to guess what's being plotted. Then more and more of the graph is revealed--first the axis numbers, then the axis labels--until people figure it out.

I'm a little puzzled by Taleb's claim, at the end of page 193, that "to these people amused by the apes, the idea of a being who would look down on them the way they look down on the apes cannot immediately come to their minds." I'm amused by apes but can imagine such a superior being who would be amused by me. Why not?

On page 196, Taleb writes, "a single butterfly flapping its wings in New Delhi may be the certain cause of a hurricane in North Carolina . . ." No--there is no "the cause" (let alone, "the certain cause"). Presumably another butterfly somewhere else could've moved the hurricane away.

Page 198: the chance of a girl birth is 48.5%, not 50%.

On page 209, Taleb writes, "work hard, not in grunt work . . .". I have mixed feelings here. On one hand, yes, grunt work can distract from the big projects. For example, I'm blogging and writing lots of little papers each year instead of attacking the big questions. On the other hand, these little projects are the way I get insight into the big questions. Getting down and dirty, playing with the data and writing code, is a way that I learn.

The mention on page 210 of Pascal's wager reminds me of the fallacy of the one-sided bet. I'm hoping that now that this fallacy has been named, people will notice it and avoid it on occasion.

The discussion on page 222 of capitalism, socialism, and attribution errors reminds me of the saying that everybody wants socialism for themselves and capitalism for everybody else (and there's nothing more fun than spending other people's money).

The discussion on the following page of the long tail reminds me of the conjecture about the "fat head" of mega-consumers.

The footnote on page 224 about book reviews reminds me of a general phenomenon which is that different reviews of the same book tend to have almost the exact same information. This becomes really clear if you look up a bunch of reviews on Nexis, for example. It can be frustrating, because for a book I like, I'd be interested in seeing lots of different perspectives. In contrast, on the web the implicit rules haven't been defined yet, so there's more diversity (as in this non-review right here, or in these comments on Indecision).

The comments on page 231 on the Gaussian distribution remind me of this story where even Galton got confused about the tails of the distribution as applied to human height.

On page 240, Taleb writes that Gauss, in using the normal distribution, "was a mathematician dealing with a theoretical point, not making claims about the structure of reality like statistical-minded scientists." I don't have my Stigler right here, but I'd always understood that Gauss developed least squares and the normal distribution in the context of fitting curves to astronomical observations. Sure he did lots of pure math, but he (and Laplace) were doing empirical science too.

I like Galileo's quote on page 257, "The great book of Nature lies ever open before our eyes and the true philosophy is written in it. . . . But we cannot read it unless we have first learned the language and the characters in which it is written. . . . It is written in mathematical language and the characters are triangles, circles and other geometric figures." As Taleb writes, "Was Galileo legally blind?" Actual nature is not full of triangles etc., it's full of clouds, mountains, trees, and other fractal shapes. But these shapes not having names or formulas, Galileo couldn't think of them. He chose the natural kind that was closest to hand. In the land of the blind, etc.

On page 261, Taleb writes that in the past 44 years, "nothing has happened in economics and social science statistics except for some cosmetic fiddling." I'd disagree with that. True, I'm sure you could find antecedents of any current method in papers that were written before 1963, but I think that developing methods that work on complex problems is a contribution in itself. There's certainly a lot we can do now that couldn't be done very easily 44 years ago.

Reading with pen in hand

To conclude: it's fun (but work) to read a book manuscript with pen in hand. Also liberating that the book is already coming out, so instead of scanning for typos or whatever, I can just write down whatever ideas pop up.

P.S. Here are my thoughts on Taleb's previous book.

Networks Course Blog

This Cornell Info 204 - Networks class blog is exemplary. It seems that the students scour around for information related to the class and post entries, which are then reviewed by (and commented on by) the class and the rest of the world. This is a much better model for an ideas-history-and-paradigms class than handing in homework and essays. I've browsed around it, and the postings describe the myriad applications of network-oriented thinking in the world around us.

Election & Public Opinion by PIIM

Here is an interactive visualization of Election & Public Opinion by PIIM. It's an interactive display of red and blue states. The election data go all the way back to 1789, the first presidential election.
PIIM_Vote.png

This application will familiarize you with the voting process of the United States. Explore how public opinion and "creative democracy" have such a persuasive effect on the country, and how just a handful of votes may have a significant impact.
Historical background, the current voting process, and informative visualizations of every major election are available. The issue and policy tools permit some creative "what if" experiments in redrawing an election based on subtle alterations to historical outcomes.

mcmcsamp() and mcsamp()

Wildlife biologist Wayne Hallstrom writes,

Boris pointed me to this paper by Matthew Gentzkow and Jesse Shapiro. Here's the abstract:

We [Gentzkow and Shapiro] construct a new index of media slant that measures whether a news outlet's language is more similar to a congressional Republican or Democrat. We apply the measure to study the market forces that determine political content in the news. We estimate a model of newspaper demand that incorporates slant explicitly, estimate the slant that would be chosen if newspapers independently maximized their own profits, and compare these ideal points with firms' actual choices. Our analysis confirms an economically significant demand for news slanted toward one's own political ideology. Firms respond strongly to consumer preferences, which account for roughly 20 percent of the variation in measured slant in our sample. By contrast, the identity of a newspaper's owner explains far less of the variation in slant, and we find little evidence that media conglomerates homogenize news to minimize fixed costs in the production of content.

It appears that newspapers are more liberal in liberal cities and more conservative in conservative cities.

Wolfram Schlenker of our economics department is presenting this paper by himself and Michael Roberts on the effects of climate change. The talk is this Thursday, 11:30-1, in 717 IAB. Here's the abstract:

The Bonus Army and the G.I. Bill

Taylor Branch has a fascinating article in the New York Review of Books on the Bonus Army (the gathering of WW1 veterans in Washington in 1932) and the G.I. Bill, which paid for millions of college educations and mortgages for WW2 veterans. I knew about Herbert Hoover and the Bonus Army but I didn't realize that Roosevelt later said no to them too, or that "'Opposition to the bonus,' Arthur Schlesinger Jr. recalled, 'was one of the virtuous issues of the day.'" Or that the press referred to work camps for veterans as "playgrounds for derelicts" who were "shell-shocked, whisky-shocked and depression-shocked." Or that a major motivation for the G.I. Bill was to avoid similar political controversies, or that Martin Luther King was modeling his last campaign on the Bonus Army. There are also some political issues that Branch touches briefly upon, such as the ambiguous role of the American Legion in the politics of the time, and the current status of soldiers and veterans in U.S. politics.

Seth writes,

One of the first managing editors of The New Yorker had a slogan: “Don’t get it right, get it written”. My philosophy with regard to the Shangri-La Diet was similar: “don’t get it exactly right, get it written, and get feedback.”

n = 35

Ronggui Huang from the Department of Sociology at Fudan University writes,

Recently, my mentor and I collected data in about 35 neighborhoods, surveying 30 residents in each neighborhood. I would like to study the effects of neighborhood-level characteristics, so after data collection, I aggregated the data to the neighborhood level. In other words, I have just 35 sample points. With such a small sample size (35 neighborhoods), what statistical methods can I use to analyse the data? It seems that most statistical methods are based on large-sample theory.

My quick answer is that, from the standpoint of classical statistical theory, 35 is a large sample! You could also do a multilevel model if you want. But I'd be careful about the causal interpretations (you wrote "effects" above)--you're probably limited in what you can learn causally unless you can frame what you're doing as a "natural experiment" (for a start, see chapters 9 and 10 of our new book).
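Here's a minimal sketch of the multilevel option in R, with simulated data standing in for the 35 neighborhoods of 30 respondents each (all variable names and effect sizes below are made up):

  library(lme4)
  set.seed(6)
  J <- 35; n_per <- 30
  d <- data.frame(neighborhood = factor(rep(1:J, each = n_per)),
                  x_nbhd = rep(rnorm(J), each = n_per))    # a neighborhood-level characteristic
  d$y <- 0.5 * d$x_nbhd + rep(rnorm(J, sd = 0.5), each = n_per) + rnorm(J * n_per)
  fit <- lmer(y ~ x_nbhd + (1 | neighborhood), data = d)
  summary(fit)    # the neighborhood-level coefficient uses all 1050 responses,
                  # but its uncertainty reflects having only 35 neighborhoods

The point of keeping the individual-level data is that you don't throw away the within-neighborhood information, while the group-level variance component keeps you honest about the fact that there are only 35 neighborhoods.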

P.S. I imagine things have changed quite a bit at Fudan in the years since Xiao-Li was there.

Multiple-authored papers

A quick search on Google Scholar found that all ten of my most cited papers have multiple authors. Looking up the top ten most cited papers from some of the other tenured faculty in our statistics department: Shaw-Hwa Lo (9/10 have multiple authors), Zhiliang Ying (9/10), Daniel Rabinowitz (9/10), Ioannis Karatzas (8/10), Victor de la Pena (7/10). (I tried to look up Chris Heyde also, but Google Scholar kept coming up with articles referring to him rather than articles by him.) Victor and Ioannis are probabilists--their work is closer to pure math so perhaps it makes sense that their single-authored papers are (relatively) more prominent.

Anyway, I think it's an important point, since it's easy to undervalue multiple-authored work by diluting the credit among all the authors.

Here's an interesting problem involving the time interval between cougar "kills"...meaning cougars killing prey, not cougars being killed. (By the way, "cougar" is synonymous with "mountain lion", "catamount", and "puma". Same animal.) The data I'll discuss below were collected by Polly Buotte and other researchers guided by Toni Ruth of the Selway Institute, funded by the Hornocker Wildlife Institute and Wildlife Conservation Society.

Cougars in and around Yellowstone National Park are monitored in two ways. Researchers try to put a radio collar on every adult cougar; there are typically about a dozen adult cougars in the park.

Most of the collars used, now and historically, are old-style radiotelemetry collars. These emit a periodic signal that can be used, through triangulation, to determine the approximate location of the animal (spatial error less than 100m). More recently, some of the collars are GPS collars that report the exact location of the animal every three hours. The GPS collars, a new technology, are expensive, relatively short-lived, and somewhat failure-prone.

One of the issues of interest to researchers is the statistical distribution of intervals between kills, called the "inter-kill interval" or IKI. A specific question of interest is the extent to which the IKI distribution has changed due to the reintroduction of wolves to Yellowstone. Some change might be expected because (1) wolves sometimes steal a cougar's kill before the cougar is done with it, so the cougar might have to kill more frequently to make up for the lost meat, and (2) prey availability might change, as prey change their behavior to try to avoid areas favored by wolves, thus possibly changing the types of prey available to cougars or their density in cougar habitat.

In addition to the statistical distribution of IKI overall and its change since the reintroduction of wolves, a related question of interest is how the IKI differs for different "social classes" of cougars, where "social class" distinguishes adult female, adult male, or maternal female (i.e. female with cubs).

Based on the radio collar data, 121 IKIs were determined for 11 cougars over 8 years. The following figure shows the IKI data for the three social classes, as determined by the two different methods (GPS and "ground").

IKIhists.png

With the help of the radio collars, researchers have tried to characterize every cougar kill made by certain cougars during certain time periods. "Characterizing" the kill means determining the date, time, and location of the kill and the type of animal killed: a large bighorn sheep, a young elk, and so on. For the standard telemetry collars, this involves using the collar to track the cougar's movements; a researcher essentially tracks the cougar every day (without disturbing its behavior), searches locations the day after the cat leaves, and locates the carcass from each kill. This method, which we refer to below as the "ground" method, is very labor-intensive. By contrast, with the GPS collars, the researcher compiles a list of the locations where the cougar spent a substantial amount of time, and visits each of those locations to characterize the kill. (Cougars usually stay on or near a kill for at least 3 days, unless driven off, and are rarely stationary for that long unless they have made a kill.) This method (the "GPS" method) is much less time-intensive because the researcher can proceed from kill location to kill location rather than following the cougar.
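One simple way to start on the social-class comparison would be a parametric model for the intervals. Here's a sketch with simulated stand-in data (the class labels match the study, but the numbers are invented, and a real analysis would also want to account for repeated intervals from the same cougar and for the pre/post-wolf periods):

  set.seed(7)
  cls <- rep(c("adult female", "adult male", "maternal female"), times = c(40, 40, 41))
  rate <- c("adult female" = 0.25, "adult male" = 0.20, "maternal female" = 0.35)
  iki <- data.frame(social_class = cls,
                    days = rgamma(121, shape = 2, rate = rate[cls]))   # made-up IKIs, in days
  fit <- glm(days ~ social_class, family = Gamma(link = "log"), data = iki)
  exp(coef(fit))    # multiplicative differences in mean inter-kill interval by class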

Range voting

I have come across the Range Voting website. The basic idea is to allow voters to express their preference for each candidate on a scale from 0-100. The winner is then the candidate with the highest average score, which the site argues is the choice with the least Bayesian regret.

rangevoting.png

I guess this system would make it harder for radical candidates to win, and it would give an edge to those who try to appeal to everyone (although polarized voters would still only use 100 or 0). It might even have a lower cognitive cost in voting, as it doesn't require the voter to settle on a single choice, but merely to assign grades to the candidates he or she is familiar with.

The website has a good collection of descriptions of other voting models, I've enjoyed this voting-with-money scheme. For those that want to dig deeper, there is a very good system of pages on Wikipedia.
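A tiny made-up example of the counting rule (NA meaning the voter didn't grade that candidate; variants of range voting differ on how to treat those):

  ballots <- rbind(c(100, 60, NA),
                   c( 80, 70,  0),
                   c(  0, 90, 50),
                   c( 40, 85, NA))
  colnames(ballots) <- c("A", "B", "C")
  colMeans(ballots, na.rm = TRUE)                     # average score per candidate: 55, 76.25, 25
  names(which.max(colMeans(ballots, na.rm = TRUE)))   # B wins on breadth of support, not intensity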

Pseudo-failures to replicate

Prakash Gorroochurn from our biostat dept wrote this paper discussing the fact that, even if a study finds statistical significance, its replication might not be statistically significant--even if the underlying effect is real.

This is an important point, which can also be understood using the usual rule of thumb that to have 80% power for 95% significance, your true effect size needs to be 2.8 se's from zero. Thus, if you have a result that's barely statistically significant (2 se's from zero), it's likely that the true effect is less than 2.8 se's from zero, and so you shouldn't be so sure you'll see a statistically significant replication. As Kahneman and Tversky found, however, our intuitions lead us to (wrongly) expect replication of statistical significance.
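The rule of thumb in numbers: here's the probability that an exact replication (same standard error) comes out significant at the 5% level, as a function of the true effect measured in se's.

  power_rep <- function(true_effect_in_se)
    pnorm(true_effect_in_se - 1.96) + pnorm(-true_effect_in_se - 1.96)
  power_rep(2.8)    # about 0.80: the usual 80%-power benchmark
  power_rep(2.0)    # about 0.52: a just-significant effect replicates only half the time, even if real
  power_rep(1.0)    # about 0.17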

Prakash's paper is also related to our point about the difference between significance and non-significance.

Galin Jones sent me this paper (by James Flegal, Murali Haran, and himself) which he said started with a suggestion I once made to him long ago. That's pretty cool! Here's the abstract:

Current reporting of results based on Markov chain Monte Carlo computations could be improved. In particular, a measure of the accuracy of the resulting estimates is rarely reported in the literature. Thus the reader has little ability to objectively assess the quality of the reported estimates. This paper is an attempt to address this issue in that we discuss why Monte Carlo standard errors are important, how they can be easily calculated in Markov chain Monte Carlo and how they can be used to decide when to stop the simulation. We compare their use to a popular alternative in the context of two examples.

This is a clear paper with some interesting results. My main suggestion is to distinguish two goals: estimating a parameter in a model and estimating an expectation. To use Bayesian notation, if we have simulations theta_1,...,theta_L from a posterior distribution p(theta|y), the two goals are estimating theta or estimating E(theta|y). (Assume for simplicity here that theta is a scalar, or a scalar summary of a vector parameter.)

Inference for theta or inference for E(theta)

When the goal is to estimate theta, then all you really need is to estimate theta to more accuracy than its standard error (in Bayesian terms, its posterior standard deviation). For example, if a parameter is estimated at 3.5 +/- 1.2, that's fine. There's no point in knowing that the posterior mean is 3.538. To put it another way, as we draw more simulations, we can estimate that "3.538" more precisely--our standard error on E(theta|y) will approach zero--but that 1.2 ain't going down much. The standard error on theta (that is, sd(theta|y)) is what it is.
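To see the two uncertainties side by side, here's a toy version with independent draws standing in for MCMC output (with actual MCMC you'd divide by an effective sample size rather than the raw number of draws):

  theta <- rnorm(4000, mean = 3.5, sd = 1.2)    # pretend posterior simulations of theta
  sd(theta)                                     # posterior sd: stays near 1.2 however long we run
  sd(theta) / sqrt(length(theta))               # Monte Carlo se of the posterior mean: shrinks toward 0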

Following up on the remark here, Ben Jann writes,

This just sprang to my mind: Do you remember the 2005 paper on oxytocin and trust by Kosfeld et al. in Nature? It has been in the news. I think they made the same mistake. The study contains a "Trust experiment" and a "Risk experiment". Because the oxytocin effect was significant in the Trust experiment, but not in the Risk experiment, Kosfeld et al. see their hypothesis confirmed that oxytocin increases trust, but not the readiness to bear risks in general. However, this is not a valid conclusion since they did not test the difference in effects. Such a test would, most likely, not turn out to be significant (at least if performed on the aggregate level as the other tests in the paper; the test might be significant if using the individual-level experimental data). (Furthermore, note that there is an error in Figure 2a: there should be an additional hollow 0.10 relative frequency bar at transfer 10.)
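A generic sketch of the test Jann is describing, for two estimated effects with standard errors (the numbers below are made up for illustration, not taken from the Kosfeld et al. paper):

  est1 <- 0.17; se1 <- 0.07    # a "significant" effect (z = 2.4)
  est2 <- 0.08; se2 <- 0.07    # a "non-significant" effect (z = 1.1)
  z_diff <- (est1 - est2) / sqrt(se1^2 + se2^2)
  z_diff                       # about 0.9: the difference in effects is nowhere near significant
  2 * pnorm(-abs(z_diff))      # two-sided p-value, about 0.36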

See here (via Alfred Cuzan).

I'm with Jim Campbell on this one.

Kevin Wright writes,

On page 284 of Bayesian Data Analysis, there is a short discussion of calculating a posterior on an (evenly spaced) grid of points. After this grid is calculated, instructions are given on how to do a random draw from this distribution.

My question is, for a univariate posterior that has finite support, is it necessary to do the sampling? After calculating the target density on a grid of points, why not just use all the grid points to calculate moments/quantiles of the distribution? Are there cases where sampling is preferable?

In my specific example, the prior is either a Beta distribution (or a discrete distribution on [0,1]), and I find that R can quickly calculate distribution statistics on a grid of a million equally-spaced points.

My response:

It is helpful to have random draws since these can be used to compute anything you want. Using all the grid points (and then weighting by the probabilities) is ok but can be more work because then you have to deal with the weights in all your computations.

And in more than 1 or 2 or 3 dimensions, using the entire grid is almost never practical, so sampling works as a more general strategy.
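Here's a small sketch of both approaches for a Beta-like posterior on a grid (made-up numbers, and a shorter grid than the million points mentioned above, just to keep it quick):

  grid <- seq(0, 1, length.out = 1e5)
  w <- dbeta(grid, 3, 9)
  w <- w / sum(w)                               # normalized grid weights
  sum(w * grid)                                 # posterior mean directly from the grid
  sqrt(sum(w * grid^2) - sum(w * grid)^2)       # posterior sd from the grid
  draws <- sample(grid, 10000, replace = TRUE, prob = w)
  mean(draws); sd(draws); quantile(draws, c(0.025, 0.975))   # the same summaries from random draws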
