p=1/2 or E(p)=1/2, or, the boxer vs. the wrestler

In Bayesian inference, all uncertainty is represented by probability distributions. I remember in grad school discussing the puzzle of distinguishing the following two probabilities:

(1) p1 = the probability that a particular coin will land “heads” in its next flip;

(2) p2 = the probability that the world’s greatest boxer would defeat the world’s greatest wrestler in a fight to the death.

The first of these probabilities is essentially exactly 1/2. Let us suppose, for the sake of argument, that the second probability is also 1/2. Or, to put it more formally, suppose that we have uncertainty about p2, thus a prior distribution, p(p2), and that the mean of this prior distribution, E(p2), equals 1/2.
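
To make the supposition concrete, here is a minimal simulation sketch. It assumes, purely for illustration, a Beta(2, 2) prior for p2, which has mean 1/2; nothing below depends on that particular choice beyond its mean.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Coin: p1 is known to be 1/2, so each flip is simply a Bernoulli(1/2) draw.
    heads = rng.random(n) < 0.5

    # Boxer vs. wrestler: p2 is uncertain. Purely for illustration, give it a
    # Beta(2, 2) prior, which has mean E(p2) = 1/2. For each simulated fight,
    # draw p2 from the prior and then the outcome given that p2.
    p2 = rng.beta(2.0, 2.0, size=n)
    boxer_wins = rng.random(n) < p2

    print(heads.mean())       # ~0.5
    print(boxer_wins.mean())  # also ~0.5: the uncertainty in p2 averages out

Averaged over the prior, the boxer wins about half the time, just as the coin comes up heads about half the time.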

The paradox

In Bayesian inference, p1 = E(p2) = 1/2, so we assign the same probability, 1/2, to both events. Which doesn’t seem quite right, since we know much more about p1 than we know about p2. More generally, it seems like a problem with the representation of uncertainty by probability. To put it another way, the integral of a probability is a probability, and once we’ve integrated out the uncertainty in p2, it’s just plain 1/2.
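
In symbols, writing p(p2) for the prior (treated here, just to keep the notation simple, as a density on [0, 1]), the probability that the boxer wins is

    Pr(boxer wins) = ∫ p2 p(p2) dp2 = E(p2) = 1/2 = p1,

so once the prior is averaged over, the two events get exactly the same probability.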

Resolution of the paradox

The resolution of the paradox is that probabilities, and decisions, do not take place in a vacuum. If the only goal were to make a statement, or a bet, about the outcome of the coin flip or the boxing/wrestling match, then yes, p=1/2 is what you can say. But the events occur within a context. In particular, the coin flip probability p1 remains at 1/2, pretty much no matter what information you provide (before the actual flipping occurs, of course). In contrast, one could imagine gathering lots of information that would refine one’s beliefs about p2. “Uncertainty in p2” corresponds to potential information we could learn that would tell us something about p2.
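
Here is a small sketch of that asymmetry, again with an illustrative Beta(2, 2) prior for p2 and some purely hypothetical data; the point is only that p2 can be updated while p1 cannot.

    # Coin: p1 is 1/2 no matter what background information we collect
    # (who flips it, where, when), so there is nothing to update.
    p1 = 0.5

    # Boxer vs. wrestler: start from an illustrative Beta(2, 2) prior on p2
    # (mean 1/2), and suppose, purely hypothetically, that we then watch 5
    # exhibition bouts between comparable fighters and the boxer wins 4.
    # Treating the bouts as independent given p2, the conjugate update is:
    a, b = 2.0, 2.0        # prior Beta(a, b), mean a / (a + b) = 0.5
    wins, losses = 4, 1    # hypothetical data, just to show the mechanics
    a_post, b_post = a + wins, b + losses

    print("coin:", p1)                                            # stays at 0.5
    print("boxer, prior mean:", a / (a + b))                      # 0.5
    print("boxer, posterior mean:", a_post / (a_post + b_post))   # 6/9 = 0.667

The coin probability is immune to this kind of information; beliefs about p2 are not.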

P.S.

Some more discussion by Brad DeLong and others is here. (For some reason, Trackback isn’t working on the blog right now.) One of the commenters refers to the Dempster-Shafer theory as a solution to this problem. D-S won’t solve the problem, though—I’ll explain why in a later posting.

Also, here is another example of the different levels of uncertainty, as it applies to political views.

P.P.S. In comments, Aki Vehtari recommended the article, “Dicing with the unknown,” by Tony O’Hagan. This is indeed an excellent article. The sidebar on page 133 covers my boxer/wrestler example precisely.

9 thoughts on “p=1/2 or E(p)=1/2, or, the boxer vs. the wrestler”

  1. Also if I were putting my finicky-philosophy-of-probability hat on, I'd say that this bit:

    Or, to put it more formally, suppose that we have uncertainty about p2, thus a prior distribution

    looks like an unwarranted transition from "uncertainty about p2" to "thus a prior distribution". Surely we want to leave it open that there are problems where there is uncertainty and we *don't* feel able to assign a distribution (in strong cases, not even an improper or flat one).

  2. Tony O'Hagan discussed this in his excellent article "Dicing with the unknown", Significance 1(3):132-133, 2004. In short, there are two types of uncertainty: epistemic, which is due to lack of knowledge, and aleatoric, which is due to stochastic randomness. In your example, p1 contains only aleatoric uncertainty and p2 contains a lot of epistemic uncertainty. Aleatoric uncertainty is "unknowable": we cannot obtain observations that would help reduce it. Epistemic uncertainty is "unknown to me": it is possible to obtain observations that help to reduce it.

  3. This same reasoning lies behind the "Ellsberg Urn Paradox," http://en.wikipedia.org/wiki/Ellsberg_paradox which I must say never seemed very paradoxical to me. I've never understood why people felt any different about epistemic uncertainty than aleatoric uncertainty. Indeed, as you point out, if anything, epistemic uncertainty ought to be preferred because in some cases you can actually reduce it by learning facts about the world.

  4. It's maybe a special case of Andrew's resolution of the paradox, but I think the difference is made quite clear by just considering the joint probability of the outcomes of two coin flips, or of two contests between boxers and wrestlers. The outcomes of the two coin flips will be independent, but the outcomes of the two bouts will not be, since we learn something from the first (the short calculation at the end of this comment makes this quantitative).

    I've never been very persuaded by the alleged distinction between "risk" and "uncertainty", which some economists like. The experiments and supposed paradoxes supporting this seem to me to be based on a naive confidence that the experimenter's abstract intellectualizing is more rational than the subject's gut intuitions. Real behaviour usually takes place in a social context in which knowledge and rewards can both take extremely subtle forms, which the experimenter's reasoning may ignore.

    Following the link in the comment about the Ellsberg Paradox, for instance, takes you to another link explaining this in terms of not trusting the experimenter, but with the comment that this doesn't resolve the conflict with expected utility theory. Huh? Did I miss the axiom of decision theory saying you should always believe whatever an experimental psychologist tells you?
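
    Making that dependence quantitative (treating p2 as fixed across the two bouts, the bouts as independent given p2, and writing p(p2) for the prior from the post):

    Pr(two heads) = (1/2)^2 = 1/4, but Pr(boxer wins both) = ∫ p2^2 p(p2) dp2 = E(p2^2) = var(p2) + (1/2)^2 > 1/4 whenever var(p2) > 0.

    So the two bouts are positively dependent exactly to the extent that p2 is uncertain, while the two coin flips are not.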

  5. Isn't this also a case where you could argue that p1 is fixed (not random), but p2 really is a random variable?

    p1 is 1/2 no matter who is flipping the coin, no matter what time it is, etc.

    p2 could vary from day-to-day, or minute-to-minute. For example, if the boxer was well rested and the wrestler had stayed up all night, p2 is probably >1/2 on that day. But if on a different day, the boxer had a cold, p2 might be less than 1/2. It also might depend on whether the fight was in the morning or at night. Etc.

  6. I take a somewhat different view, one that has some overlap with views already expressed.

    Consider all the coins that have ever existed; some are/were biased. For each coin, there is (at each time instance and perhaps varying with time) a probability that it will come up heads: Pheads(i), where 'i' specifies the coin/time-instance. Now plot the PDF of Pheads(i) for all the coins together for each "instant" that each coin is/was available for tossing.

    The PDF of Pheads will be a very narrowly distributed function, with expectation E(Pheads) of 0.5.

    Now consider all the best boxer/wrestler pairings, as they have ever existed; there is (or has been) some considerable variation. For each pairing, there is (at each possible time for a fight and almost certainly varying with time) a probability that the boxer will win: Pboxer(j), where 'j' specifies the pair/fight-time. Now plot the PDF of Pboxer(j) for all the boxer/wrestler pairs together for each "time" that they are/were available for a fight.

    The PDF of Pboxer will be (under the assumptions of this paradox) a very broadly distributed function, with expectation E(Pboxer) of 0.5.

    [Note: in the above, I have only introduced the concept of variation with time, as that obviously would occur in the case of boxer/wrestler. It is not really a central issue for the paradox.]

    We know, from experience (i.e. inductive argument), that despite the expectations E(Pheads) and E(Pboxer) both being 0.5, the widths of the two PDFs (and perhaps even their shapes) are very different. (A rough numerical sketch appears at the end of this comment.)

    Thus, if required to place a bet on each on a single occasion, we would instinctively be happier to bet on even terms for the coin toss than we would be to bet on the boxer/wrestler fight. On the latter, we would be much more concerned that we might be being taken for a ride by the other betting party, through him/her having greater knowledge than us of the value of Pboxer (just now), rather than it being fairly chosen from the PDF for Pboxer known equally to both parties.

    Best regards
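
    As a rough numerical illustration of the two shapes described above, here is a small sketch; the Beta densities are purely illustrative stand-ins, chosen only to have mean 0.5 with very different widths.

        def beta_mean_sd(a, b):
            """Mean and standard deviation of a Beta(a, b) distribution."""
            mean = a / (a + b)
            var = a * b / ((a + b) ** 2 * (a + b + 1))
            return mean, var ** 0.5

        # Stand-in for the PDF of Pheads over all coins: very tightly
        # concentrated around 0.5 (the parameters are purely illustrative).
        print("Pheads:", beta_mean_sd(5000, 5000))   # mean 0.5, sd about 0.005

        # Stand-in for the PDF of Pboxer over all pairings: same mean, far
        # wider (again, purely illustrative).
        print("Pboxer:", beta_mean_sd(1.2, 1.2))     # mean 0.5, sd about 0.27

    Both means are 0.5; the widths differ by roughly a factor of fifty, which is the asymmetry the betting intuition responds to.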

  7. I agree with d-squared and jonathan that this is an example of Keynes' distinction between the "weight" and "implication" of evidence, and of Ellsberg's idea of ambiguity.

    The interesting thing about this issue is that Savage and most other decision scientists argue that the distinction is irrelevant. But psychologically, people treat the two types of situations very differently.

    I've argued in many papers that it is rational/normative to act differently in situations where the probability is "known" (your p1) and those where the probability is "unknown" (your p2).

    Ellsberg was ADAMANT that it was normatively justifiable to treat the two differently. Here's a recent paper by philosophers agreeing with Ellsberg and me and arguing against Savage and the vast majority of decision scientists.

    http://www.sipta.org/isipta05/proceedings/papers/

  8. I'd just like to add a little to what I wrote here earlier today. This is by way of clarification of the differences (slight though they are) between my view and that given in the original posing of the paradox and the comments thereon.

    First, I think it is better to view the "paradox solution" in terms of expectations, for both the coin and the boxer/wrestler cases, rather than view the coin case as 'p' and the boxer/wrestler case as 'E(p)'. [Note, however, I accept that it is good practice, and good fun, to pose the paradox in original terms of 'p' and 'E(p)'.]

    Second, I think it is better to view the difference between the coin case and the boxer/wrestler case as one of the width (and other materially different natures) of the two PDFs of the case probabilities. This is in preference to viewing the difference as categorical (i.e. binary or a finite set); for example, the binary categorisation of epistemic versus aleatoric.

    [Just for the record, it might be useful to note that I come from a background in statistical pattern matching, rather than anything more purely in statistics. In that field, there is a general tendency to look with favour on moving concepts from the discrete to the analogue.]

    Best regards
