Null and Vetoed: “Chance Coincidence”?

Philip Stark sent along this set of calculations on the probability that the hidden message in Gov. Schwarzenegger's veto could've occurred by chance. The veto message, if you haven't heard, is:

To the Members of the California State Assembly:

I am returning Assembly Bill 1176 without my signature.

For some time now I have lamented the fact that major issues are overlooked while many
unnecessary bills come to me for consideration. Water reform, prison reform, and health
care are major issues my Administration has brought to the table, but the Legislature just
kicks the can down the alley.

Yet another legislative year has come and gone without the major reforms Californians
overwhelmingly deserve. In light of this, and after careful consideration, I believe it is
unnecessary to sign this measure at this time.

Philip concludes:

The null hypothesis for testing “coincidences” matters. In this example, it is easy to get a wide spectrum of values for the “probability” of a coincidence. In the six calculations here, the probability ranges from about one in a couple of thousand to one in 487 billion: a factor of nearly 200 million, more than 8 orders of magnitude. News consumers should be wary of calculations of the “chance” of a coincidence, regardless of the context.
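Just to spell out that spread, taking the endpoints to be roughly 1 in 2520 and 1 in 487 billion (my arithmetic, not Philip's):

```python
# Spread between the two endpoint probabilities quoted above
# (rough figures taken from the post).
import math

low, high = 2520, 487e9
print(high / low)               # about 1.9e8 -- nearly 200 million
print(math.log10(high / low))   # about 8.3 -- more than 8 orders of magnitude
```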

Amusing, but I don’t think the 1 in 2520 model makes a lot of sense! As Philip writes, “a better ‘null model’ would pull full sentences at random from Governor Schwarzenegger’s other vetoes, string them together, and see where the line breaks fell.” I expect the model of random words from the Gutenberg corpus would come pretty close to that. Then you have to multiply by some factor to correct for multiplicity, but considering you’re starting with a probability of about 1 in a trillion, it seems unlikely that any correction for multiplicity could make this sort of thing very likely.
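Here’s a minimal sketch of what a “random words” null model could look like, just to make the calculation concrete. This isn’t Philip’s computation: the word list (a system dictionary), the assumed line width, and the independence of line-initial letters across lines are all simplifications I’m making up for illustration.

```python
# A rough sketch of a "random words" null model -- not Philip Stark's
# actual calculation. Lines are filled with randomly drawn words, we tally
# how often a line starts with each letter, and then multiply the seven
# relevant frequencies, assuming independence across lines.
import random
import textwrap
from collections import Counter

TARGET = "fuckyou"    # initial letters of the seven body lines of the veto
LINE_WIDTH = 90       # assumed typeset width; the real letter's width may differ
N_LINES = 100_000     # number of simulated lines to tally

def load_words(path="/usr/share/dict/words"):
    # Any plain word list will do; this path is just a common default.
    with open(path) as f:
        return [w.strip().lower() for w in f if w.strip().isalpha()]

def line_initial_frequencies(words, n_lines=N_LINES, width=LINE_WIDTH):
    """Wrap randomly drawn words into lines and tally each line's first letter."""
    counts = Counter()
    while sum(counts.values()) < n_lines:
        text = " ".join(random.choices(words, k=500))
        for line in textwrap.wrap(text, width=width):
            counts[line[0]] += 1
    total = sum(counts.values())
    return {letter: c / total for letter, c in counts.items()}

if __name__ == "__main__":
    freqs = line_initial_frequencies(load_words())
    prob = 1.0
    for letter in TARGET:
        prob *= freqs.get(letter, 0.0)  # independence across lines assumed here
    print(f"chance of this acrostic on seven given lines: {prob:.3g}")
```

Under this kind of model the per-line letter frequencies get multiplied together, which is where the one-in-hundreds-of-billions numbers come from; swapping in sentences from Schwarzenegger’s other vetoes, as Philip suggests, would just change the frequencies being multiplied.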

A good classroom or homework example, in any case.

6 thoughts on “Null and Vetoed: ‘Chance Coincidence’?”

  1. My thought when I heard this (on NPR) was that there are lots of possible messages that might appear (the monkey with the typewriter). It's analogous to a person winning the lottery twice. People multiply the probability by itself and get something very small. But that's the a priori probability that a particular person will win it twice, not the probability that a news-generating item will be produced when someone, somewhere, wins it twice. Similarly, no one looks closely for seven-letter (eight if you count the line break) acrostics in the huge number of messages that get sent around. When this particular message gets sent, spelling out a particular insult, and gets noticed, you get a huge flap. I don't think a real probability can be assigned to this, and it may have been intentional, but I wouldn't think that the probability of it arising by chance was anything near as small as the 1 in 10 billion that a naive calculation would indicate.

  2. Probabilities should be conditional on all possible paragraphs expressing the same idea.

    Words in natural languages are not independent. I heard fifth-order Markov chains are not enough.
