Red Baron debunked?

Jeremy Miles forwards this article from the New Scientist:

The legend of Manfred von Richthofen, aka the Red Baron, has taken a knock. The victories notched up by him and other great flying aces of the first world war could have been down to luck rather than skill.

Von Richthofen chalked up 80 consecutive victories in aerial combat. His success seems to suggest exceptional skill, as such a tally is unlikely to be down to pure luck.

However, Mikhail Simkin and Vwani Roychowdhury of the University of California at Los Angeles think otherwise. They studied the records of all German fighter pilots of the first world war and found a total of 6745 victories, but only about 1000 “defeats”, which included fights in which pilots were killed or wounded.

The imbalance reflects, in part, that pilots often scored easy victories against poorly armed or less manoeuvrable aircraft, making the average German fighter pilot’s rate of success as high as 80 per cent. Statistically speaking, at least one pilot could then have won 80 aerial fights in a row by pure chance.

The analysis also suggests that while von Richthofen and other aces were in the upper 30 per cent of pilots by skill, they were probably no more special than that. “It seems that the top aces achieved their victory scores mostly by luck,” says Roychowdhury.

I’m still confused. (6745/7745)^80 = .000016, or 1 in 60,000. Still seems pretty good to me. I mean, with these odds I wouldn’t put my money on Snoopy, that’s for sure.
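
As a rough check on that arithmetic, here is a minimal Python sketch. It assumes every fight is an independent trial with the same win probability, 6745/7745, which is only a simplification. The pilot counts passed to `prob_at_least_one` are hypothetical values chosen for illustration, since the article does not say how many pilots there were.

```python
def streak_probability(wins: int, losses: int, streak: int) -> float:
    """Chance of winning `streak` fights in a row, assuming each fight is an
    independent trial with win probability wins / (wins + losses)."""
    p_win = wins / (wins + losses)
    return p_win ** streak


def prob_at_least_one(p_single: float, n_pilots: int) -> float:
    """Chance that at least one of n_pilots independent pilots gets the streak."""
    return 1.0 - (1.0 - p_single) ** n_pilots


if __name__ == "__main__":
    p80 = streak_probability(6745, 1000, 80)
    print(f"P(80 straight wins) = {p80:.2e}, about 1 in {1 / p80:,.0f}")
    # Hypothetical pilot counts; the article does not report the real number.
    for n in (1000, 3000, 10000):
        print(f"{n:>6} pilots: P(at least one streak) = {prob_at_least_one(p80, n):.3f}")
```

Under these assumptions, whether "at least one pilot" is plausible depends heavily on how many pilots there were, which is the number the summary leaves out.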

20 thoughts on “Red Baron debunked?”

  1. I tracked down a full copy of the paper (by emailing the authors), and this gives a lot more detail than was presented in the New Scientist summary.

    The full paper is here: http://arxiv.org/abs/physics/0607109.

    They use a probability of credited victory (the paper has some discussion of what determines, and who decides, if a victory has occurred) of 0.89.

    Then 0.89^80 ≈ 10^-4.

    However, their calculations get more sophisticated than this, because this assumes that everyone continues flying until they lose (and you usually only have to lose one air battle to not be able to have another). They deal with this in the paper.

    They also consider how the probability of victory changes as a function of fight number.

  2. Don't you need to know how many pilots there were? All we know is that there were at least 1,000. Wouldn't a few thousand bring the chances of one of them getting 80 victories up to the 1 in 20 level?

  3. [Statistically speaking, at least one pilot could then have won 80 aerial fights in a row by pure chance.]

    "Statistically speaking" in this sense is silly if "pure chance" is going to have its normal meaning. Times in the marathon follow a distribution but that doesn't mean that the guy who won wasn't the fastest runner. Also we have non-data information about von Richthofen; he was regarded as an exceptionally gifted flier and invented a lot of manoeuvres.

  4. I agree with dsquared. It's not as if winning an air battle was like everybody flipping an 89 percent coin. If instead you broke it down into its, say, 600 constituent parts (barrel roll, followed by dive, followed by hiding in that cloud over there) the law of large numbers takes over pretty quickly. This sort of thinking is liable to the sort of ridicule that that noted statistician, Antonin Scalia, used when he ridiculed his colleagues for elevating the luck element of golf when he wrote: "I guess that is why those who follow professional golfing consider Jack Nicklaus the luckiest golfer of all time, only to be challenged of late by the phenomenal luck of Tiger Woods." http://www.law.cornell.edu/supct/html/00-24.ZD.ht

  5. "Statistically speaking" in this sense is silly if "pure chance" is going to have its normal meaning. Times in the marathon follow a distribution but that doesn't mean that the guy who won wasn't the fastest runner.

    Yes, but the times are an independent metric which are then correlated against each other in subsequent runs, so essentially the same experiment is repeated over and over again. Thanks to the independent metric, the results of different races involving entirely different people can be compared. There are also big gaps between the times, too.

    There clearly are cases where it *is* a fallacy to consider that someone is superior merely for winning a bunch of one-on-one bets in a row. Someone has to win a massive single-elimination tournament, even if it is a coin-flipping or rock-paper-scissors tournament. It's not like we'd consider someone who won a 1,048,576 person coin-flipping tournament to be the "best coin-flipper" just because she won 20 matches in a row. WWI was not exactly a massive single-elimination tournament with teams on the Western Front, but I'm sure that at times it seemed like it to some. ;)

    It's a perfectly reasonable question to ask "Given a model (or null hypothesis) of pure chance, how unlikely would it be to have one person with 80 consecutive victories out of a population of size X?" It's an important question to ask– people routinely underestimate how often records or unusual occurrences would occur under pure chance models like Poisson models, and routinely overestimate how likely something which actually occurred was to occur.

    Whether that turns out to be a convincing value or not is a different matter, and as noted, there's plenty of other data to suggest that he was a skilled flier.

    Exactly how skilled is an interesting, but probably ultimately unanswerable question. There are all sorts of intermediate possibilities that mix skill and chance into the record.

    There's no "world record" for having the best dogfight with the most points scored. There's no repeatable experiment of dogfighting the same person or group of people, nor can different dogfights be compared easily– except by the qualitative information that, at the least, suggests that he was very skilled.

  6. the law of large numbers takes over pretty quickly

    No. The 600 constituent parts of a dogfight are not, IMO, uncorrelated events. You have misused the term.

    Does it make sense in a dogfight to say that it would go on for 600 constituent parts every time, and the person who won the majority of the 600 parts would win the dogfight? No, it does not. Lose the first part and you can lose the fight right there, or at least be at a huge disadvantage for the rest of it. There's no "I would have won the next 599 parts, assuming independence from the results of the first event."

    If those 600 parts were independent events, and always all of them happened, and the person who won the majority won the dogfight, then yes, the law might apply. Not here, though.

  7. To anonymous, the correlation of skills is close to irrelevant. Take the first principal component of skills instead, if you like. Or up the number of skills from 600 to 10,000.

    To John Thacker: The chance model can have interesting insight, but not in the case of obviously widely varying skill levels. Your examples are good ones, but dogfights are not rock-paper-scissors. Thus, we have a good example of a statistical analysis guaranteed to have zero informative content. If it had rejected the chance model, we would have said… of course. But when it fails to reject, we still say that skill is the more likely explanation because the likelihood ratio is so much higher for the skill hypothesis, almost no matter what the distribution shows. Granted, like dsquared, you have to bring in evidence from outside the chance model, but agnosticism has to have limits.

  8. Jonathan:

    There is no binary choice between a "chance model" and a "skill model." You can choose between multiple models– both skill and luck play a role. The Red Baron could be the absolute most skilled pilot. Or he could be one of ten pilots head and shoulders above the rest but indistinguishable from one another, and he got lucky against those others. Or he could have been part of an elite 5% of pilots, one of whom was destined to survive, but the winner from that group was selected by chance. The authors of the paper appear to claim that one would have to believe that the Red Baron, at worst, was part of an elite 30%, but that that assumption is sufficient for someone with his winning streak to appear with fair probability.

    Of course, that's what makes the whole exercise slightly pointless, because it's going to depend on your prior, and because there's such an enormous range.

    I still don't understand how you're claiming to invoke the law of large numbers. Are you using the term in a colloquial fashion? Can you explain your usage in more detail? If you mean:

    1) There are 600-10,000 individual skills involved in dogfighting, the Red Baron has a high percentage chance to be superior in each, so he's way superior in aggregate and wins a huge percentage of the time, then it seems to me like you're assuming the result.

    2) That each dogfight can be broken down into 600-10,000 discrete events, the Red Baron wins the vast majority of the events, thus he wins each one, then that doesn't make any sense and the law of large numbers cannot mathematically apply. Dogfights don't work that way. Losing one event means that you lose the entire dogfight. Yes, over the long run the law of large numbers would guarantee that the Red Baron would win every dogfight scored on points, or in some video game. But because dogfights suddenly end because someone loses and dies, the law of large numbers doesn't apply.

  9. John Thacker: At the risk of hijacking Andrew's blog for a nearly private conversation, I think we agree that the whole exercise is slightly pointless, and for more or less the same reason.

    Now to the Law: It's not like the first mistake you make necessarily leads to you losing the dogfight, it's just that you fall behind, unless you make a whopper of a mistake, and possess fewer (normally distributed) "recovery from disaster" skills. Dogfights have discrete outcomes, but imagine that what they really are are constantly varying multinomial logit functions (three outcomes: win, lose, keep playing). The game ends when the wave function collapses, but that doesn't mean that the law of large numbers doesn't mean that innumerable small effects aren't (how many negatives can I put in one sentence?) combining to give a nearly nonrandom result. Consider the following example. A guy begins playing people in chess. Ignore the draws. Chess is like a dogfight — a particularly bad move and you're cooked. Now I don't know what we mean by luck necessarily, but the guy who wins 80 in a row wasn't lucky by my definition — he was better than the (possibly ill-prepared) guys he beat. Since I have completely removed all elements of luck in my example, no statistical analysis can possibly demonstrate otherwise. Now, at some deep brain level someone points out that maybe the brain does somehow randomize something so that there is a luck component to the game. But even if there was luck, there is no difference in observed results between normally distributed skill (which will only be normally distributed because of the Law) and derivatively normally distributed (through the Law) luck.

  10. I agree with dsquared. Why in the world was this considered worthy of publication?

    p(evidence/not hot shit)= 1/60,000

    i guess that's higher than was previously believed. that's really scientific news?

    and how does this alter the general perception of this guy as a great flying ace?

    do these statistics wizards have a bayesian analysis? how could a bayesian analysis be that sensitive to this tiny parameter in the whole model?

    why would statistics wizards want to try to shoot down a flying ace with such flimsy data?

  11. I think the remarkable thing is the apparently near 89 per cent chance of a German pilot winning an air battle and that should be regarded as the content.

    As I understand it the paper says that if all duels were settled randomly according to those odds you would expect at least one pilot to do as well as the Red Baron while by hypothesis being no better than any of the others. In such a situation the mere number of victories is not enough to deduce abnormal skill.

    I imagine learning on the job makes the uniform assumption implausible but it is more conservative than one where your chance of winning increases as you go on.

    I'm surprised that dsquared, who has blogged extensively on the perils of accidental data mining, should be hostile to that simple point.

  12. Jonathan:

    My objection is that technically, The Law of Large Numbers doesn't apply, at least not as you stated it. You're using it in its metaphorical or colloquial sense, which I understand, but it's still a technically incorrect statement. I completely agree with your characterization of the process involved: small losses make one fall further behind, and the result at any time is a win, lose, or continue playing with some new state.

    The ability to lose or win immediately means that this is not a tail event; Kolmogorov's 0-1 law does not apply to the odds of winning or losing the dogfight.

    If the dogfight were modeled according to a simple model of "With probability p Competitor 1 wins, with q Competitor 2 wins, and with 1-p-q we continue with no state information," then Competitor 1 would win p/(p+q) of the time… it would not go to near certainty.

    On the other hand, if we model it with something like a scoring system, where Competitor 1 wins if he wins the majority of 500 or 2000 trials, then it does go to near certainty.

    I agree with you that the real model is something in between. However, in that intermediate case the limiting result is not certain victory for the more skilled pilot. The lower the percentage chance of an immediate decision (the more one has to "fall behind" in order to lose), the closer it will converge to where the better pilot wins with certainty. However, I think that the correct model is not one where the better pilot is assured of winning; the correct model includes some percentage chance of quick victories due to mistakes, and thus the Law does not technically apply, at least not the way that you've stated it.

    You have to be careful in applying the Law of Large Numbers in this situation because of all the non-independence factors. Perhaps I'm just a stickler for the precise term because of my mathematician background. Certainly from a colloquial standpoint it's close to a situation where the Law applies. But you can't hand-wave away the difference between converging to Probability 1 of the more skilled pilot winning and the (more realistic, IMO) model where the less skilled pilot retains a finite chance of winning.

  13. As I understand it the paper says that if all duels were settled randomly according to those odds you would expect at least one pilot to do as well as the Red Baron while by hypothesis being no better than any of the others.

    The paper is actually a little more complex than that. It assumes that some 30% of pilots are in a group of aces, while the other 70% are less skilled. If one assumed that all the aces were equally skilled, then one of them should have the Red Baron's level of success by luck. So it says.

    There's an infinite number of potential models here, between the uniform skill/pure luck model and the model where results exactly match skill and there is no luck.

  14. I've blogged on exactly this question about the US elections. But as I say, we have non-sample information about Richthofen. He was regarded as an excellent aviator before the war, people who watched his dogfights said he was skilful and he invented a number of manoeuvres (the same goes for Immelmann of the eponymous turn).

    Clearly a coin-flipping tournament has to have a winner, and so does an arm-wrestling tournament. If all arm-wrestling contests were determined by pure chance, then you would expect there to be one guy who would win the world championships and that fits the data pretty well, but in fact he won because he's strongest; the fact that the stochastic model fits is misleading, because any knockout tournament can have a model of this sort fit to it, trivially.

    I think that dogfights are sufficiently like arm-wrestling contests. If you consider someone like Douglas Bader who was identified as a skilful pilot, had his legs amputated and then went on to become an ace, I think it becomes harder to argue that there isn't an element of skill here. I also think it's indicative that Max Immelmann is regarded as one of the very greatest of the air aces despite having only 15 victories credited; the best aces aren't just the ones with the most kills.

  15. If all arm-wrestling contests were determined by pure chance, then you would expect there to be one guy who would win the world championships and that fits the data pretty well, but in fact he won because he's strongest; the fact that the stochastic model fits is misleading, because any knockout tournament can have a model of this sort fit to it, trivially.

    Absolutely, any model of this sort can fit any knockout tournament trivially. Very reasonable point. (Though note that this paper does reject the hypothesis that all pilots were equally skilled as too unlikely.)

    However, OTOH, upsets do occur, and frequently in knockout tournaments. In fact, perhaps the champ wins the arm-wrestling tournament when he's the second strongest, beating the strongest in the final with some amount of luck.

    Now, it's one thing to take the position and say that "well, he won, so he must be the best." But some knockout tournaments are run multiple times with largely the same competitors, and the same champion doesn't always win.

    In the real world, there's both skill and luck. The most popular sporting tournaments are precisely those where there's both a skill component, so the results don't seem random, but a luck component, so that upsets do occasionally occur.

    I think it becomes harder to argue that there isn't an element of skill here.

    You're arguing against a straw man, here. From the section of the paper quoted, "The analysis also suggests that while von Richthofen and other aces were in the upper 30 per cent of pilots by skill," we can see that the authors do argue that there was an element of skill. It's an attempt to investigate how large a degree of skill was involved.

    It's a false dichotomy to say that "well, we know that the Red Baron was skilled, so there can't have been luck." This paper isn't even arguing that it was all skill, AIUI; it argues that he was quite skilled, but not necessarily significantly better than the upper 30%.

    I, like you, don't particularly find that persuasive, since we do have outside evidence that suggests that he was better than top 30%. But it's really only picking a point on a sliding scale of luck vs. skill. I find the "it was purely skill, nothing more" explanation just as unpersuasive as the reductionist "purely luck" explanation. I think it's slightly interesting to see what minimum degree of skill the data suggests would be necessary to get the observed results with any real likelihood, but it is also an ultimately unanswerable question… and of course outside data should be used.

  16. I can't really comment on any of this because the only time I flew a plane, I held the stick completely still for 30 seconds and then, completely terrified, gave it back to my friend (who was flying it for real).

  17. dsquared wrote
    "Also we have non-data information about von Richthofen; he was regarded as an exceptionally gifted flier and invented a lot of manoeuvres."

    Richthofen was competent, but not an exceptional pilot. He invented no maneuvers.

    dsquared wrote
    "But as I say, we have non-sample information about Richthofen. He was regarded as an excellent aviator before the war"

    Richthofen learned to fly during the war.

  18. So I find all this extremely interesting, but I am a high school student doing a debate about the Red Baron and am not extremely skilled in math of any kind, especially statistics. I was wondering if you might explain how you came to the conclusion that Richthofen was great mostly because of luck instead of pure skill.

  19. Ryan,
    I think the basic point to take home from this discussion is that the Red Baron was a skilled pilot, but he may not have been that much better than other highly skilled pilots. In any contest, there is a combination of skill and luck. From a non-statistical point of view, I suppose you could think about it this way: he could not have amassed as many victories as he did by luck alone…he must also have been a skilled pilot.

    An important point from the article that you should also think about is that the Red Baron may have had an increased edge because of more advanced planes in terms of weaponry and maneuverability. Thus, he may not have been as skilled of a pilot as he may seem at first glance due to the edge in technology combined with some luck.

    To understand how luck plays a part in this, think about flipping a coin. If you and each one of your friends flip a coin 10 times, sometimes you could get 7, 8 or even 9 heads just by chance. Does this mean that the coin you flipped had a higher than 50% chance of flipping a head? Nope. Probability theory tells us that such an event is likely to happen by chance when many people flip a coin 10 times even when the probability of seeing a head is just 1/2.

    In dogfights during the war, if you assume that each pilot has an equal chance of winning, then you would expect a few pilots to win a majority of their fights by chance. Now, the Red Baron did win an extraordinary number of fights in a row, so it is almost certain that he had a greater than 50% chance of winning each fight. The only question is why: how many of his wins are because of his skill as a pilot, and how many are because he had a better plane and/or went after easier targets?
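
The coin-flipping illustration in the last comment can be checked with a short Python sketch. It assumes fair, independent flips, and the group size of 20 friends is a hypothetical number picked only for illustration.

```python
from math import comb


def prob_at_least_k_heads(n_flips: int, k: int, p_head: float = 0.5) -> float:
    """Binomial tail: chance of k or more heads in n_flips independent flips."""
    return sum(
        comb(n_flips, i) * p_head**i * (1 - p_head) ** (n_flips - i)
        for i in range(k, n_flips + 1)
    )


def prob_someone_in_group(p_single: float, group_size: int) -> float:
    """Chance that at least one person in an independent group hits the event."""
    return 1.0 - (1.0 - p_single) ** group_size


if __name__ == "__main__":
    p_one = prob_at_least_k_heads(10, 7)
    print(f"One person, 7+ heads in 10 flips: {p_one:.3f}")
    # 20 friends is a hypothetical group size, not a figure from the discussion.
    print(f"At least one of 20 friends: {prob_someone_in_group(p_one, 20):.3f}")
```

A run of heads that looks unusual for one person becomes nearly certain to show up somewhere once enough people are flipping, which is the same selection effect the thread argues about for pilots.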
