Recently in Sports Category

I was pleasantly surprised to have my recreational reading about baseball in the New Yorker interrupted by a digression on statistics. Sam Fuld of the Tampa Bay Rays was the subject of a Ben McGrath profile in the 4 July 2011 issue of the New Yorker, in an article titled Super Sam. After quoting a minor-league trainer who described Fuld as "a bit of a geek" (who isn't these days?), McGrath gets into that lovely New Yorker detail:

One could have pointed out the more persuasive and telling examples, such as the fact that in 2005, after his first pro season, with the Class-A Peoria Chiefs, Fuld applied for a fall internship with Stats, Inc., the research firm that supplies broadcasters with much of the data and analysis that you hear in sports telecasts.

After a description of what they had him doing--reviewing footage of games and cataloguing it--he said:

"I thought, They have a stat for everything, but they don't have any stats regarding foul balls."

Obit here. I think I have a cousin with the same last name as this guy, so maybe we're related by marriage in some way. (By that standard we're also related to Marge Simpson and, I seem to recall, the guy who wrote the scripts for Dark Shadows.)

Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball's greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I'm curious about.

Here's how it begins:

In politics, as in baseball, hot prospects from the minors can have trouble handling big-league pitching.

No joke. See here (from Kaiser Fung). At the Statistics Forum.

I was recently rereading and enjoying Bill James's Historical Baseball Abstract (the second edition, from 2001).

But even the Master is not perfect. Here he is, in the context of the all-time 20th-greatest shortstop (in his reckoning):

Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn't an ex-athlete--and that makes athletes seem special. [italics in the original]

Hey, I've met 75-year-olds like that--and none of them are ex-athletes! That's probably because I don't know a lot of ex-athletes. But Bill James . . . he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple bases when he was playing against the Orioles once.

Cognitive psychologists talk about the base-rate fallacy, which is the mistake of estimating probabilities without accounting for underlying frequencies. Bill James knows a lot of ex-athletes, so it's no surprise that the youthful, optimistic 75-year-olds he meets are likely to be ex-athletes. The rest of us don't know many ex-athletes, so it's no surprise that most of the youthful, optimistic 75-year-olds we meet are not ex-athletes.

The mistake James made in the above quote was to write "You" when he really meant "I." I'm not disputing his claim that athletes are disproportionately likely to become lively 75-year-olds; what I'm disagreeing with is his statement that almost all such people are ex-athletes.

Yeah, I know, I'm being picky. But the point is important, I think, because of the window it offers into the larger issue of people being trapped in their own environment (the "availability heuristic," in the jargon of cognitive psychology). Athletes loom large in Bill James's world--and I wouldn't want it any other way--and sometimes he forgets that the rest of us live in a different world.

Baseball's greatest fielders


Someone just stopped by and dropped off a copy of the book Wizardry: Baseball's All-time Greatest Fielders Revealed, by Michael Humphreys. I don't have much to say about the topic--I did see Brooks Robinson play, but I don't remember any fancy plays. I must have seen Mark Belanger but I don't really recall. Ozzie Smith was cool but I only saw him on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn't fielding either.

Anyway, Humphreys was nice enough to give me a copy of his book, and since I can't say much (I didn't have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away.

(Note: Humphreys replies to some of these questions in a comment.)

1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I've always been curious about Bill James's Pythagorean projection, so let me try it out here. If a team scores 700 runs in 162 games, then an extra 10 runs is 710, and Bill James's prediction is Games.Won/Games.Lost = (710/700)^2 = 1.029. Winning 1 extra game gives you an 82-80 record, for a ratio of 82/80=1.025. So that basically lines up.

There must be some more fundamental derivation, though. I don't see where the square comes from in James's model, and I don't see where the 10 comes from in Humphreys. I mean, I can see where it can arise empirically--and the idea that 10 runs = 1 extra win is a good thing to know, partly because it seems like a surprise at first (my intuition would've been that 10 extra runs will win you a few extra games), but I feel like there's some more fundamental relationship from which the 10:1 or Pythagorean relationship can be derived.
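The rule of thumb can be checked directly. Here's a minimal sketch (my own code, not from the book) applying James's Pythagorean formula at a typical scoring level of 700 runs per season, showing that 10 extra runs buys roughly one extra win:

```python
# Checking the "10 runs = 1 win" rule of thumb against Bill James's
# Pythagorean formula: win fraction = RS^2 / (RS^2 + RA^2).

def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2):
    """Expected wins over a season under the Pythagorean projection."""
    frac = runs_scored**exponent / (runs_scored**exponent + runs_allowed**exponent)
    return games * frac

base = pythag_wins(700, 700)    # a .500 team: 81 expected wins
bumped = pythag_wins(710, 700)  # the same team with 10 extra runs scored
print(base, bumped, bumped - base)
```

The difference comes out to about 1.1 wins, consistent with the 10:1 rule at ordinary run-scoring levels; the ratio changes in higher- or lower-scoring environments, which hints at why the constant is empirical rather than fundamental.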

2. As I understand it, Humphreys is proposing two methods to evaluate fielders:
- The full approach, given knowledge of where all the balls are hit when a player is in the field.
- The approximate approach using available data.

What I'm wondering is: Are there some simpler statistics that capture much of the substance of Humphreys's more elaborate analysis? For example, Bill James has his A*B/C formula for evaluating offensive effectiveness. But there's also on-base percentage and slugging average, both of which give a pretty good sense of what's going on and serve as a bridge between the basic statistics (1B, 2B, 3B, BB, etc) and the ultimate goal of runs scored. Similarly, I think Humphreys would make many a baseball fan happy if he could give a sense of the meaning of some basic fielding statistics--not just fielding average but also #assists, #double plays, etc. One of my continuing struggles as an applied statistician is to move smoothly between data, model, and underlying substance. In this case, I think Humphreys would be providing a richer picture if he connected some of these dots. (One might say, perversely, that Bill James had an advantage of learning in public, as it were: instead of presenting a fully-formed method, he tried out different ideas each year, thus giving us a thicker understanding of batting and pitching statistics, on top of our already-developed intuition about doubles, triples, wins and losses, etc.)

3. Humphreys makes the case that fielding is more important, as a contribution to winning, than we've thought. But perhaps his case could be made even stronger. Are there other aspects of strong (or weak) fielding not captured in the data? For example, suppose you have a team such as the '80s Cardinals with a fast infield, a fast outfield, and a pitching staff that throws a lot of low pitches leading to ground balls. I might be getting some of these details wrong, but bear with me. In this case, the fielders are getting more chances because the manager trusts them enough to get ground-ball pitchers. Conversely, a team with bad fielders perhaps will adjust their pitching accordingly, taking more chances with the BB and HR. Is this captured in Humphreys's model? I don't know. If not, this is not meant as a criticism, just a thought of a way forward. Also, I didn't read every word of the book so maybe he actually covers this selection issue at some point.

4. No big deal, but . . . I'd like to see some scatterplots. Perhaps start with something simple like some graphs of (estimated) offensive ability vs. (estimated) defensive ability, for all players and for various subsets. Then some time series of fielding statistics, both the raw data of putouts, chances, assists, etc. (see point 2 above) and then the derived statistics. It would be great to see individual career trajectories and also league averages by position.

5. Speaking of time series . . . Humphreys talks a lot about different eras of baseball and argues persuasively that players are much better now than in the old days. This motivates some adjustment for the years in which a player was active, just as with statistics for offense and pitching.

The one thing I'm worried about in the comparison of players from different eras is that I assume that fielding as a whole has been more important in some periods (e.g., the dead-ball era) than in others. If you're fielding in an era where fielding matters more, you can actually save more runs and win more games through fielding. I don't see how Humphreys's method of adjustment can get around that. Basically, in comparing fielders in different eras, you have a choice between evaluating what they did or what they could do. This is a separate issue from expansion of the talent pool and general improvement in skills.


I enjoyed the book. I assume that is clear to most of you already, as I wouldn't usually bother with a close engagement if I didn't think there was something there worth engaging with. Now I'll send it off to Kenny Shirley who might have something more helpful to say about it.

Online James?


Eric Tassone writes:

Chess vs. checkers


Mark Palko writes:

Chess derives most of its complexity through differentiated pieces; with checkers the complexity comes from the interaction between pieces. The result is a series of elegant graph problems where the viable paths change with each move of your opponent. To draw an analogy with chess, imagine if moving your knight could allow your opponent's bishop to move like a rook. Add to that the potential for traps and manipulation that come with forced capture and you have one of the most remarkable games of all time. . . .

It's not unusual to hear masters of both chess and checkers (draughts) admit that they prefer the latter. So why does chess get all the respect? Why do you never see a criminal mastermind or a Bond villain playing in a checkers tournament?

Part of the problem is that we learn the game as children so we tend to think of it as a children's game. We focus on how simple the rules are and miss how much complexity and subtlety you can get out of those rules.

As a person who prefers chess to checkers, I have a slightly different story. To me, checkers is much more boring to play than chess. All checkers games look the same, but each chess game is its own story. I expect this is true at the top levels too, but the distinction is definitely there for casual players. I can play chess (at my low level) without having to think too hard most of the time and still enjoy participating, making plans, attacking and defending. I feel involved at any level of effort. In contrast, when I play a casual game of checkers, it just seems to me that the pieces are moving by themselves and the whole game seems pretty random.

I'm not saying this is true of everyone--I'm sure Palko is right that checkers can have a lot going for it if you come at it with the right attitude--but I doubt my experiences are unique, either. My argument in favor of chess is not a naive "Chess has more possibilities" (if that were the attraction, we'd all be playing 12x12x12 three-dimensional chess by now) but that the moderate complexity of chess allows for a huge variety of interesting positions that are intricately related to each other.

Overall, I think Palko's argument about elegant simplicity applies much better to Go than to checkers.

But what happens next?

I wonder what will happen when (if?) chess is fully solved, so that we know (for example) that with optimal play the game will end in a draw. Or, if they ever make that rules change so that a stalemate is a loss, maybe they'll prove that White can force a win. In a way this shouldn't change the feel of a casual game of chess, but I wonder.

Hey, here's a book I'm not planning to read any time soon!

As Bill James wrote, the alternative to good statistics is not "no statistics," it's bad statistics.

(I wouldn't have bothered to bring this one up, but I noticed it on one of our sister blogs.)

Heat map


Jarad Niemi sends along this plot:


and writes:

2010-2011 Miami Heat offensive (red), defensive (blue), and combined (black) player contribution means (dots) and 95% credible intervals (lines) where zero indicates an average NBA player. Larger positive numbers for offensive and combined are better while larger negative numbers for defense are better.

In retrospect, I [Niemi] should have plotted -1*defensive_contribution so that larger was always better. The main point with this figure is that this awesome combination of James-Wade-Bosh that was discussed immediately after the LeBron trade to the Heat has a one-of-these-things-is-not-like-the-other aspect. At least according to my analysis, Bosh is hurting his team compared to the average player (although not statistically significant) due to his terrible defensive contribution (which is statistically significant).

All fine so far. But the punchline comes at the end, when he writes:

Anyway, a reviewer said he hated the figure and demanded to see a table with the actual numbers instead.


Bidding for the kickoff


Steven Brams and James Jorasch propose a system for reducing the advantage that comes from winning the coin flip in overtime:

Dispensing with a coin toss, the teams would bid on where the ball is kicked from by the kicking team. In the NFL, it's now the 30-yard line. Under Brams and Jorasch's rule, the kicking team would be the team that bids the lower number, because it is willing to put itself at a disadvantage by kicking from farther back. However, it would not kick from the number it bids, but from the average of the two bids.

To illustrate, assume team A bids to kick from the 38-yard line, while team B bids its 32-yard line. Team B would win the bidding and, therefore, be designated as the kick-off team. But B wouldn't kick from 32, but instead from the average of 38 and 32--its 35-yard line.

This is better for B by 3 yards than the 32-yard line that it proposed, because it's closer to the end zone it is kicking towards. It's also better for A by 3 yards to have B kick from the 35-yard line rather than from the 38-yard line that A proposed if it were the kick-off team.

In other words, the 35-yard line is a win-win solution--both teams gain a 3-yard advantage over what they reported would make them indifferent between kicking and receiving. While bidding to determine the yard line from which a ball is kicked has been proposed before, the win-win feature of using the average of the bids--and recognizing that both teams benefit if the low bidder is the kicking team--has not. Teams seeking to merely get the ball first would be discouraged from bidding too high--for example, the 45-yard line--as this could result in a kick-off pinning them far back in their own territory.
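The mechanism is simple enough to sketch in a few lines of code (the function name and interface here are my own, purely for illustration of the Brams-Jorasch rule):

```python
# The divide-the-difference kickoff rule: each team bids the yard line
# from which it is willing to kick. The lower bidder kicks off, but from
# the average of the two bids, so both sides do better than their bid.

def kickoff(bid_a, bid_b):
    """Return (kicking_team, kickoff_yard_line) given team A's and
    team B's bids (the yard line each is willing to kick from)."""
    kicker = "A" if bid_a < bid_b else "B"
    return kicker, (bid_a + bid_b) / 2

print(kickoff(38, 32))  # team B kicks, from the 35-yard line
```

With the bids from the example above (A bids 38, B bids 32), B kicks from the 35, which is 3 yards better than each team's own stated indifference point.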

"Metaphorically speaking, the bidding system levels the playing field," Brams and Jorasch maintain. "It also enhances the importance of the strategic choices that the teams make, rather than leaving to chance which team gets a boost in the overtime period."

This seems like a good idea. Also fun for the fans--another way to second-guess the coach.

During our discussion of estimates of teacher performance, Steve Sailer wrote:

I suspect we're going to take years to work the kinks out of overall rating systems.

By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr. James held off until 1999 before unveiling his win share model for overall rankings.

I remember looking at Pete Palmer's book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good--it's presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples as equal to 2 and 3 hits, and so forth. The problem--besides the inherent inflexibility of a linear model with no interactions--is that Palmer seemed chained to it. When the model gave silly results, Palmer just kept with it. I don't do that with my statistical models. When I get a surprising result, I look more carefully. And if it really is a mistake of some sort, I go and change the model (see, for example, the discussion here). Now this is a bit unfair: after all, Palmer's a sportswriter and I'm a professional statistician--it's my job to check my models.

Still and all, my impression is that Palmer was locked into his regression models and that it hurt his sportswriting. Bill James had a comment once about some analysis of Palmer that gave players negative values in the declining years of their careers. As James wrote, your first assumption is that when a team keeps a player on their roster, they have a good reason. (I'm excepting Jim Rice from this analysis. Whenever he came up to bat with men on base, it was always a relief to see him strike out, as that meant that he'd avoided hitting into a double play.)

Bill James did not limit himself to linear models. He often used expressions of the form (A+B)/(C+D) or sqrt(A^2+B^2). This gave him more flexibility to fit data and also allowed him more entries into the modeling process: more ways to include prior information than simply to throw in variables.

What about my own work? I use linear regression a lot, to the extent that a couple of my colleagues once characterized my work on toxicology as being linear modeling. True, these were two of my stupider colleagues (and that's saying a lot), but the fact that a couple of Ph.D.'s could confuse a nonlinear differential equation with a linear regression does give some sense of statisticians' insensitivity to functional forms. We tend to focus on what variables go into the model without much concern for how they fit together. True, sometimes we use nonparametric methods--lowess and the like--but it's not so common that we do a Bill James and carefully construct a reasonable model out of its input variables.

But maybe I should be emulating Bill James in this way. Right now, I get around the constraints of linearity and additivity by adding interaction after interaction after interaction. That's fine, but perhaps a bit of thoughtful model construction would be a useful supplement to my usual brute-force approach.

P.S. Actually, I think that James himself could've benefited from the discipline of quantitative models. I don't know about Roy Smalley, Jr., but, near the end of the Baseball Abstract period, my impression was that James started to mix in more and more unsupported opinions, for example in 1988 characterizing Phil Bradley as possibly the best player in baseball. That's fine--I'm no baseball expert, and maybe Phil Bradley really was one of the top players of 1987, or maybe he's a really nice guy and Bill James wanted to help him out, or maybe James was just kidding on that one. My guess (based on a lot of things in the last couple of Baseball Abstracts, not just that Phil Bradley article) is simply that James had been right on so many things where others had been wrong that he started to trust his hunches without backing them up with statistical analysis. Whatever. In any case, Win Shares was probably a good idea for Bill James as it kept him close to the numbers.

Nate writes:

The Yankees have offered Jeter $45 million over three years -- or $15 million per year. . . But that doesn't mean that the process won't be frustrating for Jeter, or that there won't be a few hurt feelings along the way. . . .

$45 million, huh? Even after taxes, that's a lot of money!

It worked on this one.

Good maze designers know this trick and are careful to design multiple branches in each direction. Back when I was in junior high, I used to make huge mazes, and the basic idea was to anticipate what the solver might try to do and to make the maze difficult by postponing the point at which he would realize a path was going nowhere. For example, you might have 6 branches: one dead end, two pairs that form loops going back to the start, and one that is the correct solution. You do this from both directions and add some twists and turns, and there you are.

But the maze designer aiming for the naive solver--the sap who starts from the entrance and goes toward the exit--can simplify matters by just having 6 branches: five dead ends and one winner. This sort of thing is easy to solve in the reverse direction. I'm surprised the Times didn't do better for their special puzzle issue.

Posted at MediaBistro:

The Harvard Sports Analysis Collective are the group that tackles problems such as “Who wrote this column: Bill Simmons, Rick Reilly, or Kevin Whitlock?” and “Should a football team give up free touchdowns?”

It’s all fun and games, until the students land jobs with major teams.

According to the Harvard Crimson, sophomore John Ezekowitz and junior Jason Rosenfeld scored gigs with the Phoenix Suns and the Shanghai Sharks, respectively, in part based on their work for HSAC.

It’s perhaps not a huge surprise that the Sharks would be interested in taking advantage of every available statistic. They are owned by Yao Ming, who plays for the Houston Rockets. The Rockets, in turn, employ general manager Daryl Morey, whom Simmons nicknamed “Dork Elvis” for his ahead-of-the-curve analysis. (See Michael Lewis's The No Stats All-Star for an example.) But still, it’s very cool to see the pair get an opportunity to change the game.

In defense of jargon


Daniel Drezner takes on Bill James.

Bike shelf


Susan points me to this. But I don't really see the point. Simply leaning the bike against the wall seems like a better option to me.

Update on marathon statistics


Frank Hansen updates his story and writes:

Here is a link to the new stuff. The update is a little less than half way down the page.

1. used display() instead of summary()

2. include a proxy for [non] newbies -- whether I can find their name in a previous Chicago Marathon.

3. graph actual pace vs. fitted pace (color code newbie proxy)

4. estimate the model separately for newbies and non-newbies.

some incidental discussion of sd of errors.

There are a few things unfinished but I have to get to bed, I'm running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it's the day of the Bears home opener too.



Dave Berri writes:

Saw you had a post on the research I did with Rob Simmons on the NFL draft. I have attached the article. This article has not officially been published, so please don't post this on-line.

The post you linked to states the following: "On his blog, Berri says he restricts the analysis to QBs who have played more than 500 downs, or for 5 years. He also looks at per-play statistics, like touchdowns per game, to counter what he considers an opportunity bias."

Two points: First of all, we did not look at touchdowns per game (that is not a per play stat). More importantly -- as this post indicates -- we did far more than just look at data after five years.

We did mention the five year result, but directly below that discussion (and I mean, directly below), the following sentences appear.

Our data set runs from 1970 to 2007 (adjustments were made for how performance changed over time). We also looked at career performance after 2, 3, 4, 6, 7, and 8 years. In addition, we also looked at what a player did in each year from 1 to 10. And with each data set our story looks essentially the same. The above stats are not really correlated with draft position.

This analysis was also updated and discussed in this post (posted on-line last May). Hopefully that post will also help you see the point Rob and I are making.

I'm out of my depth on this football stuff so I'll leave it to you, the commenters.

Gladwell vs Pinker


I just happened to notice this from last year. Eric Loken writes:

Steven Pinker reviewed Malcolm Gladwell's latest book and criticized him rather harshly for several shortcomings. Gladwell appears to have made things worse for himself in a letter to the editor of the NYT by defending a manifestly weak claim from one of his essays - the claim that NFL quarterback performance is unrelated to the order they were drafted out of college. The reason we [Loken and his colleagues] are implicated is that Pinker identified an earlier blog post of ours as one of three sources he used to challenge Gladwell (yay us!). But Gladwell either misrepresented or misunderstood our post in his response, and admonishes Pinker by saying "we should agree that our differences owe less to what can be found in the scientific literature than they do to what can be found on Google."

Well, here's what you can find on Google. Follow this link to request the data for NFL quarterbacks drafted between 1980 and 2006. Paste the data into a spreadsheet and make a simple graph of touchdowns thrown (as of 2008) versus order of selection in the draft to create the picture below.


Predicting marathon times


Frank Hansen writes:

I [Hansen] signed up for my first marathon race. Everyone asks me my predicted time. The predictors online seem geared to or are based off of elite runners. And anyway they seem a bit limited.

So I decided to do some analysis of my own.

I think you knew this already


I was playing out a chess game from the newspaper and was reminded how the best players use the entire board in their game. In my own games (I'm not very good; I'm guessing my "rating" would be something like 1500?), the action always gets concentrated on one part of the board. Grandmaster games do get focused on particular squares of the board, of course, but, meanwhile, there are implications in other places and the action can suddenly shift.

Editing and clutch hitting


Regarding editing: The only serious editing I've ever received has been for my New York Times op-eds and my article in the American Scientist. My book editors have all been nice people, and they've helped me with many things (including suggestions of what my priorities should be in communicating with readers)--they've been great--but they've not given (nor have I expected or asked for) serious editing. Maybe I should've asked for it, I don't know. I've had time-wasting experiences with copy editors and a particularly annoying experience with a production editor (who was so difficult that my coauthors and I actually contacted our agent and a lawyer about the possibility of getting out of our contract), but that's another story.

Regarding clutch hitting, Bill James once noted that it's great when a Bucky Dent hits an unexpected home run, but what's really special is being able to get the big hit when it's expected of you. The best players can do their best every time they come to the plate. That's why Bill James says that the lack of evidence for clutch hitting makes sense; it's not a paradox at all: one characteristic of pros is that they can do it over and over.

From a commenter on the web, 21 May 2010:

Tampa Bay: Playing .732 ball in the toughest division in baseball, wiped their feet on NY twice. If they sweep Houston, which seems pretty likely, they will be at .750, which I [the commenter] have never heard of.

At the time of that posting, the Rays were 30-11. Quick calculation: if a team is good enough to be expected to win 100 games, that is, Pr(win) = 100/162 = .617, then there's a 5% chance that they'll have won at least 30 of their first 41 games. That's a calculation based on simple probability theory of independent events, which isn't quite right here but will get you close and is a good way to train one's intuition, I think.
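That back-of-the-envelope calculation can be done exactly with the binomial distribution. Here's a minimal sketch (mine, under the same simplifying assumption of independent games with a constant win probability):

```python
# Chance that a team whose true talent is 100 wins per 162 games
# (win probability 100/162) wins at least 30 of its first 41 games,
# treating games as independent coin flips -- a simplification, as
# noted in the post, but a useful calibration for intuition.

from math import comb

p = 100 / 162
n, k_min = 41, 30
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))
print(round(tail, 3))
```

The exact tail probability lands in the mid-single-digit percent range, so a 30-11 start from a 100-win-talent team is unusual but far from unheard-of.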

Having a .732 record after 41 games is not unheard-of. The Detroit Tigers won 35 of their first 40 games in 1984: that's .875. (I happen to remember that fast start, having been an Orioles fan at the time.)

Now on to the key ideas

The passage quoted above illustrates three statistical fallacies which I believe are common but are not often discussed:

1. Conditioning on the extrapolation. "If they sweep Houston . . ." The relevant data were that the Rays were .732, not .750.

2. Counting data twice: "Playing .732 . . . wiped their feet on NY twice." Beating the Yankees is part of how they got to .732 in the first place.

3. Remembered historical evidence: "at .750, which I have never heard of." There's no particular reason the commenter should've heard of the 1984 Tigers; my point here is that past data aren't always as you remember them.

P.S. I don't mean to pick on the above commenter, who I'm sure was just posting some idle thoughts. In some ways, though, perhaps these low-priority remarks are the best windows into our implicit thinking.

P.P.S. Yes, I realize this is out of date--the perils of lagged blog posting. But the general statistical principles are still valid.

Suppose you and I agree on a probability estimate...perhaps we both agree there is a 2/3 chance Spain will beat Netherlands in tomorrow's World Cup. In this case, we could agree on a wager: if Spain beats Netherlands, I pay you $x. If Netherlands beats Spain, you pay me $2x. It is easy to see that my expected loss (or win) is $0, and that the same is true for you. Either of us should be indifferent to taking this bet, and to which side of the bet we are on. We might make this bet just to increase our interest in watching the game, but neither of us would see a money-making opportunity here.

By the way, the relationship between "odds" and the event probability --- a 1/3 chance of winning turning into a bet at 2:1 odds --- is that if the event probability is p, then a fair bet has odds of (1/p - 1):1.

More interesting, and more relevant to many real-world situations, is the case that we disagree on the probability of an event. If we disagree on the probability, then there should be a bet that we are both happy to make --- happy, because each of us thinks we are coming out ahead (in expectation). Consider an event that I think has a 1/3 chance of occurring, but you put the probability at only 1/10. If you offer, say, 5:1 odds --- I pay you $1 if the event doesn't occur, but you pay me $5 if it does --- each of us will think this is a good deal. But the same is true at 6:1 odds, or 7:1 odds. I should be willing to accept any odds higher than 2:1, and you should be willing to offer any odds up to 9:1. How should we "split the difference"?

I started pondering this question when I read the details of a wager, or rather a non-wager, that I had previously only heard about in outline: scientists James Annan and Richard Lindzen were unable to agree to terms for a bet about climate change. Lindzen thinks, or claims to think, that the "global temperature anomaly" is likely to be less than 0.2 C twenty years from now, but Annan thinks, or claims to think, it is very likely to be higher. You can imagine a disagreement over the details --- since the global temperature anomaly can't be measured exactly, perhaps you'd want to call off the bet (doing so is called a "push" in betting parlance) if the anomaly is estimated to be, say, between 0.18 and 0.22 C --- but surely, given that the probability assessments are so different, there should still be a wager that both sides are eager to make! But in fact, they couldn't agree on terms.

Chris Hibbert has discussed the issue of agreeing on a bet on his blog, where he mentions that Dan Reeves "argues, convincingly, that the arithmetic mean gives each party the same expectation of gain, and that is what fairness requires." But Hibbert goes on to say that "the way that bayesians would update their odds is to use the geometric mean of their odds." I'm not sure of the relevance of this latter statement, when it comes to making a fair bet.

Suppose I think the probability of a given event is a, and you think the probability is b. If the event occurs, you will pay me $x, and if it doesn't occur, I will pay you $y. We don't need to know the actual probability in order to figure out how much each of us thinks the bet is worth: I think I will gain ax - (1-a)y, and you think you will gain -bx + (1-b)y. We might say a wager is "reasonable" --- the word "fair" is already taken --- if I think it's worth as much to me as you think it is worth to you. Look at it this way: I should be willing to pay up to ax - (1-a)y to participate in this wager, and you should be willing to pay up to -bx + (1-b)y. If those amounts are equal, then we'd each be willing to pay the same amount to participate in this game.

Setting the two terms equal and doing the math, we end up with a reasonable bet if x = y(2 - (a+b))/(a+b) or, equivalently, x = y(2/(a+b) - 1). Note that this is the same thing we would get if we agreed that the probability p = (a+b)/2. So, I agree with Dan Reeves and his co-authors: the way to make a reasonable bet is to take the arithmetic mean of the probability estimates.
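Here's a quick sanity check of this formula on the earlier example, where I put the probability at 1/3 and you put it at 1/10 (the code and variable names are mine, just to illustrate the algebra):

```python
from fractions import Fraction

def reasonable_stake(a, b, y=1):
    """Stake x (against y) that equalizes the two subjective expected
    gains, per the formula x = y(2/(a+b) - 1) derived above."""
    return y * (Fraction(2) / (a + b) - 1)

a = Fraction(1, 3)   # my probability for the event
b = Fraction(1, 10)  # your probability for the event
x = reasonable_stake(a, b)

my_gain = a * x - (1 - a) * 1      # what I think the bet is worth to me
your_gain = -b * x + (1 - b) * 1   # what you think it's worth to you

# x = 47/13, about 3.6:1 odds, which indeed falls between the 2:1 and
# 9:1 extremes; and each of us values the bet at 7/13 of a dollar.
print(x, my_gain, your_gain)  # 47/13 7/13 7/13
```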

A Wikipedia whitewash


After hearing a few times about the divorce predictions of researchers John Gottman and James Murray (work that was featured in Blink with a claim that they could predict with 83 percent accuracy whether a couple would be divorced--after meeting with them for 15 minutes) and feeling some skepticism, I decided to do the Lord's work and amend Gottman's wikipedia entry, which had a paragraph saying:

Gottman found his methodology predicts with 90% accuracy which newlywed couples will remain married and which will divorce four to six years later. It is also 81% accurate in predicting which marriages will survive after seven to nine years.

I added the following:

Gottman's claim of 81% or 90% accuracy is misleading, however, because the accuracy is measured only after fitting a model to his data. There is no evidence that he can predict the outcome of a marriage with high accuracy in advance. As Laurie Abraham writes, "For the 1998 study, which focused on videotapes of 57 newlywed couples . . . He knew the marital status of his subjects at six years, and he fed that information into a computer along with the communication patterns turned up on the videos. Then he asked the computer, in effect: Create an equation that maximizes the ability of my chosen variables to distinguish among the divorced, happy, and unhappy. . . . What Gottman did wasn't really a prediction of the future but a formula built after the couples' outcomes were already known. . . . The next step, however--one absolutely required by the scientific method--is to apply your equation to a fresh sample to see whether it actually works. That is especially necessary with small data slices (such as 57 couples), because patterns that appear important are more likely to be mere flukes. But Gottman never did that. Each paper he's published heralding so-called predictions is based on a new equation created after the fact by a computer model."

I was thinking this would just get shot down right away, but I checked on it every now and then and it was still up.

Finally, on 21 May, my paragraph was completely removed by contributor Annsy5, who also wrote:

Full disclosure: I [Annsy5] work for The Gottman Relationship Institute, which was co-founded by John Gottman, and we would like a change made to the Wikipedia entry on him.

The 3rd paragraph is made up largely of Laurie Abraham's claims about Dr. Gottman's research. Ms. Abraham's claims are inaccurate, and thorough citations can be found here: We would like the paragraph removed, or at least moved to a section where the details of Dr. Gottman's research can be expanded upon.

I know that it would be a violation of the Conflict of Interest policy for me to just go in and make the changes, so I would like other editors' input. We're not trying to bury anything "bad" about Dr. Gottman, we just want the information that is out there to be accurate! Please advise...

I don't know enough about Wikipedia to want to add my paragraph back in, but what's going on here? On 23:57, 20 May 2010, Annsy5 writes "I know that it would be a violation of the Conflict of Interest policy for me to just go in and make the changes," and then on 23:13, 21 May 2010, Annsy5 goes and removes the paragraph and all references to criticisms of Gottman's work.

That doesn't seem right to me. A link to a rebuttal by Gottman would be fine. But removing all criticism while leaving the disputed "90% accuracy" claim . . . that's a bit unscholarly, no?

P.S. A commenter asked why I posted this on the blog rather than doing this on wikipedia. The reason is that I'm more interested in the wikipedia aspect of this than the marriage-counseling aspect, and I thought the blog entry might get some interesting discussion. I know nothing about Gottman and Murray beyond what I've written on the blog, and I'm certainly not trying to make any expert criticism of their work. What does seem to be happening is that they get their claims out in the media and don't have much motivation to tone down the sometimes overstated claims made on their behalf. Whatever the detailed merits of Abraham's criticisms, I thought it was uncool for them to be removed from the wikipedia pages: Her reporting is as legitimate as Gladwell's. But I'm not the one to make any technical additions here.

I've occasionally mocked academic economists for their discussions of research papers as "singles" or "home runs" (a world in which, counter to the findings of sabermetrics, one of the latter is worth more than four of the former). The best thing, of course, is a "grand slam," a term that I always found particularly silly as it depends not just on the quality of the hit but also on external considerations ("men on base"). But then I was thinking about this again, and I decided the analogy isn't so bad: For a research paper to be really influential and important, it has to come at the right time, and the field has to be ready for it. That's what turns a home run into a grand slam. So in this case the reasoning works pretty well.

Dan Goldstein did an informal study asking people the following question:

When two baseball teams play each other on two consecutive days, what is the probability that the winner of the first game will be the winner of the second game?

You can make your own guess and then continue reading below.
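If you'd rather compute than guess, here's one illustrative model (my own, not Goldstein's): suppose each matchup has a fixed probability p that a particular team wins any given game, with the two games independent given p. Then the chance the same team wins both days is E[p^2 + (1-p)^2], which works out to 1/2 + 2 Var(p) when the p's are centered at 1/2. A quick Monte Carlo shows how little the answer moves above 1/2 for realistic spreads in team strength:

```python
import random

def same_winner_prob(draw_p, n=100_000, seed=1):
    """Average of p^2 + (1-p)^2 over matchup probabilities drawn from
    draw_p(); this is exact given p, so no coin flips are needed."""
    random.seed(seed)
    total = 0.0
    for _ in range(n):
        p = draw_p()
        total += p * p + (1 - p) * (1 - p)
    return total / n

# If every matchup were a pure coin flip (p = 0.5 always), the answer
# is exactly 0.5.
print(same_winner_prob(lambda: 0.5))  # 0.5

# With a modest, assumed spread in matchup probabilities (a Beta(50,50)
# distribution, my own choice), the answer is only slightly above 0.5.
print(same_winner_prob(lambda: random.betavariate(50, 50)))
```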

Gianluca Baio sends along this article (coauthored with Marta Blangiardo):

A few months ago, Yu-Sung and I summarized some survey results from the 1993-1996 General Social Survey. 56% of respondents said they "attended an amateur or professional sports event" during the past twelve months, and it turned out that they were quite a bit more Republican than other Americans but not much different in their liberal-conservative ideology:


Then, the other day, someone pointed me to this analysis by Reid Wilson of a survey of TV sports watchers.


The graph is very well done. In particular, the red and blue coloring (indicating points to the left or right of the zero line) and the light/dark shading (indicating points above or below the center line on the vertical axis) are good ideas, I think: even though they convey no additional information, they draw attention to key aspects of the data.

The first two or three paragraphs of this post aren't going to sound like they have much to do with weight loss, but bear with me.

In October, I ran in a 3K (1.86-mile) "fun run" at my workplace, and was shocked to have to struggle to attain 8-minute miles. This is about a minute per mile slower than the last time I did the run, a few years ago, and that previous performance was itself much worse than a run a few years earlier. I no longer attempt to play competitive sports or to maintain a very high level of fitness, but this dismal performance convinced me that my modest level of exercise --- a 20- to 40-mile bike ride or a 4-mile jog each weekend, a couple of one-hour medium-intensity exercise sessions during the week, and an occasional unusual effort (such as a 100-mile bike ride) --- was not enough to keep my body at a level of fitness that I consider acceptable.

So after that run in October, I set some running goals: 200 meters in 31 seconds, 400 meters in 64 seconds, and a mile in 6 minutes. (These are not athlete goals, but they are decent middle-aged-guy-with-a-bad-knee goals, and I make no apology for them.) Around the end of October, I started going to the track 5 or 6 days per week, for an hour per workout. I started with the 200m goal. I alternated high-intensity workouts with lower-intensity workouts. All workouts start with 20 minutes of warmup, gradually building in intensity: skips, side-skips, butt-kicks, a couple of active (non-stationary) stretching exercises, leg swings, high-knee running, backward shuffle, backward run, "karaokas" (a sort of sideways footwork drill), straight-leg bounds, and finally six or seven "accelerations", accelerating from stationary to high speed over a distance of about 30 meters. After the 20-minute warmup, I do the heart of the program, which takes about 30 minutes. (In the final ten minutes, I do "core" work such as crunches, and some stretching.) A high-intensity workout might include running up stadium sections (about 12 seconds at very close to maximum effort, followed by a 20- to 30-second break, then repeat, multiple times), or all-out sprints of 60, 100, or 120 meters...or a variety of other exercises at close to maximum effort. Every week or so, I would do an all-out 200m to gauge my progress. My time dropped by about a second per week, and within about 6 weeks I had run my sub-31 and shifted my workouts to focus on the 400m goal (which I am still between 1 and 2 seconds from attaining, almost three months later, but that's a different story).

So where does weight loss come in? I was shaving off pounds at about the same rate that I shaved off seconds in the 200m: I dropped from around 206-208 pounds at the end of October to under 200 in early December, and continued to lose weight more slowly after that, to my current weight of about 193-195. About twelve pounds of weight loss in as many weeks.

The Whiter Olympics


Matthew Yglesias links to Reihan Salam's article on the whiteness of the Winter Olympics. And they're not talkin bout the snow, either.

Things are actually worse than Yglesias and Salam realize. Did you know that Puerto Rico had a Winter Olympics team? One year it featured my cousin Bill, who finished last in the slalom. I'm pretty sure he wasn't born in Puerto Rico (despite what it says on one website), but I guess he's probably been there on vacation on occasion. And I wouldn't be surprised if he speaks Spanish--he does live in L.A., after all. And, of course, it takes some skill to finish last in the slalom. I'd probably fall off the chairlift and never even get to the starting line.

Alan Turing is said to have invented a game that combines chess and middle-distance running. It goes like this: You make your move, then you run around the house, and the other player has to make his or her move before you return to your seat. I've never played the game but it sounds like fun. I've always thought, though, that the chess part has got to be much more important than the running part: the difference in time between a sprint and a slow jog is small enough that I'd think it would always make sense just to do the jog and save one's energy for the chess game.

But when I was speaking last week at the University of London, Turing's chess/running game came up somehow in conversation, and somebody made a point which I'd never thought of before, that I think completely destroys the game. I'd always assumed that it makes sense to run as fast as possible, but what if you want the time to think about a move? Then you can just run halfway around the house and sit for as long as you want.

It goes like this. You're in a tough spot and want some time to think. So you make a move where the opponent's move is pretty much obvious, then you go outside and sit on the stoop for an hour or two to ponder. Your opponent makes the obvious move and then has to sit and wait for you to come back in. Sure, he or she can plan ahead, but with less effectiveness than you because of not knowing what you're going to do when you come back in.

So . . . I don't know if anyone has actually played Turing's running chess game, but I think it would need another rule or two to really work.

From Ubs:

How fast is Rickey? Rickey is so fast that he can steal more bases than Rickey. (And nobody steals more bases than Rickey.)

Regular readers of this blog are familiar with the pinch-hitter syndrome: People whose job it is to do just one thing are not always so good at that one thing. I first encountered this when noting the many silly errors introduced into my books by well-meaning copy-editors with too much time on their hands. As I wrote a few years ago:

This is a funny thing. A copy editor is a professional editor. All they do (or, at least, much of what they do) is edit, so how is it that they do such a bad job compared to a statistician, for whom writing is only a small part of the job description?

The answer certainly isn't that I'm so wonderful. Non-copy-editor colleagues can go through anything I write and find lots of typos, grammatical errors, confusing passages, and flat-out mistakes.

No, the problem comes with the copy editor, and I think it's an example of the pinch-hitter syndrome. The pinch-hitter is the guy who sits on the bench and then comes up to bat, often in a key moment of a close game. When I was a kid, I always thought that pinch hitters must be the best sluggers in baseball, because all they do (well, almost all) is hit. But of course this isn't the case--the best hitters play outfield, or first base, or third base, or whatever. If the pinch hitter were really good, he'd be a starter. So, Kirk Gibson in the 1988 World Series notwithstanding, pinch hitters are generally not the best hitters.

There must be some general social-science principle here, about generalists and specialists, roles in an organization, etc.?

This idea was recently picked up by a real-life baseball statistician--Eric Seidman of Baseball Prospectus--who writes:

I wanted to talk to you about the pinch-hitter theory you presented, as I've noticed it in an abundance of situations as well.

When I read your theory it made perfect sense, although a slight modification is needed, namely in that it makes more sense as a relief-pitcher theory. In sabermetrics, we have found that pitchers perform better as relievers than they do as starters. In fact, if a starter becomes a reliever, you can expect him to lop about 1.4 runs off of his ERA and vice-versa, simply by virtue of facing batters more often. When you get to facing the batting order the 2nd and 3rd time through, relievers are almost always better options because they are fresh. Their talent levels are nowhere near those of the starters--otherwise, they would BE starters--but in that particular situation, their fresh "eyes" as it pertains to this metaphor are much more effective.

For another example, when working on my book Bridging the Statistical Gap, I found that my editor would make great changes but would miss a lot of ancillary things that I would notice upon delving back in after a week away from it. Applying that to the relief pitcher idea, the editor was still more talented when it came to editing, but his being "in too deep", the equivalent of facing the opposing batting order a few times, made my fresh eyes a bit more accurate.

I'm wondering if you have seen this written about in other areas, as it really intrigues me as a line of study, applying psychological concepts as well as those in statistics.

These are interesting thoughts--first, the idea of applying the theory to relief pitchers, and, second, the "fresh eyes" idea, which adds some subtlety to the concept. I'm still not quite sure what he's saying about the pitchers, though: Is he saying that because relief pitchers come in with fresh arms, they can throw harder, or is he saying that, because hitters see starters over and over again, they can improve their swing as the game goes on, whereas when the reliever comes in, the hitters are starting afresh?

Beyond this, I'm interested in Seidman's larger question, about whether this is a more general psychological/sociological phenomenon. Do any social scientists out there have any thoughts?

P.S. I seem to recall Bill James disparaging the ERA statistic--he felt that "unearned" runs count too, and they don't happen by accident. So I'm surprised that the Baseball Prospectus people use ERA rather than RA. Is it just because ERA is what we're all familiar with, so the professional baseball statisticians want to talk our language? Or is ERA actually more useful than I thought?

This comment by Tyler Cowen on Sarah Palin's poor Scrabble strategy reminds me of my blog a few months ago with six suggested Scrabble reforms. Without further ado:



Tyler Cowen links to a blog by Paul Kedrosky that asks why winning times in the Boston marathon have been more variable, in recent years, than winning times in New York. This particular question isn't so interesting--when I saw the title of the post, my first thought was "the weather," and, in fact, that and "the wind" are the most common responses of the blog commenters--but it reminded me of a more general question that we discussed the other day, which is how to think about Why questions.

Many years ago, Don Rubin convinced me that it's a lot easier to think about "effects of causes" than "causes of effects." For example, why did my cat die? Because she ran into the street, because a car was going too fast, because the driver wasn't paying attention, because a bird distracted the cat, because the rain stopped so the cat went outside, etc. When you look at it this way, the question of "why" is pretty meaningless.

Similarly, if you ask a question such as, What caused World War 1, the best sort of answers can take the form of potential-outcomes analyses. I don't think it makes sense to expect any sort of true causal answer here.

But, now let's get back to the "volatility of the Boston marathon" problem. Unlike the question of "why did my cat die" or "why did World War 1 start," the question, "Why have the winning times in the Boston marathon been so variable" does seem answerable.

What happens if we try to apply some statistical principles here?

Principle #1: Compared to what? We can't try to answer "why" without knowing what we are comparing to. This principle seems to work in the marathon-times example. The only way to talk about the Boston times as being unexpectedly variable is to know what "expectedly variable" is. Or, conversely, the New York times are unexpectedly stable compared to what was happening in Boston those same years. Either way, the principle holds that we are comparing to some model or another.

Principle #2: Look at effects of causes, rather than causes of effects. This principle seems to break down in marathon example, where it seems very natural to try to understand why an observed phenomenon is occurring.

What's going on? Perhaps we can understand in the context of another example, something that came up a couple years ago in some of my consulting work. The New York City Department of Health had a survey of rodent infestation, and they found that African Americans and Latinos were more likely than whites to have rodents in their apartments. This difference persisted (albeit at a lesser magnitude) after controlling for some individual and neighborhood-level predictors. Why does this gap remain? What other average differences are there among the dwellings of different ethnic groups?

OK, so now maybe we're getting somewhere. The question on deck now is, how do the "Boston vs. NY marathon" and "too many rodents" problems differ from the "dead cat" problem.

One difference is that we have data on lots of marathons and lots of rodents in apartments, but only one dead cat. But that doesn't quite work as a demarcation criterion (sorry, forgive me for working under the influence of Popper): even if there were only one running of each marathon, we could still quite reasonably answer questions such as, "Why was the winning time so much lower in NY than in Boston?" And, conversely, if we had lots of dead cats, we could start asking questions about attributable risks, but it still wouldn't quite make sense to ask why the cats are dying.

Another difference is that the marathon question and the roach question are comparisons (NY vs. Boston and blacks/hispanics vs. whites), while the dead cat stands alone (or swings alone, I guess I should say). Maybe this is closer to the demarcation we're looking for, the idea being that a "cause" (in this sense) is something that takes you away from some default model. In these examples, it's a model of zero differences between groups, but more generally it could be any model that gives predictions for data.

In this model-checking sense, the search for a cause is motivated by an itch--a disagreement with a default model--which has to be scratched and scratched until the discomfort goes away, by constructing a model that fits the data. Said model can then be interpreted causally in a Rubin-like, "effects of causes," forward-thinking way.

Is this the resolution I'm seeking? I'm not sure. But I need to figure this out, because I'm planning on basing my new intro stat course (and book) on the idea of statistics as comparisons.

P.S. I remain completely uninterested in questions such as, What is the cause? Is it A or is it B? (For example, what caused the differences in marathon-time variations in Boston and New York: is it the temperature, the precipitation, the wind, or something else? Of course, if it can be any of these factors, it can be all of them.) I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation as a way to draw artificial distinctions, fundamentally no different from the notorious comparisons of statistical significance to non-significance.

This last point has nothing to do with causal inference and everything to do with my preference for continuous over discrete models in applications in which I've worked in social science, environmental science, and public health.

Adjusted plus-minus ratings, etc.


David Park sent this along. I haven't really been following basketball statistics lately, but some of you might find it interesting.

I get so irritated when economists and political scientists try to explain every sort of irrational behavior in life as being part of some utility function.

That's one reason I love this paper by Erik Snowberg and Justin Wolfers, "Explaining the Favorite-Longshot Bias: Is it Risk-Love or Misperceptions." They conclude that, yes, it's misperceptions:

A question about poker


I'm too tired to think about this one, but maybe some of you out there have some ideas.

Chaz Littlejohn writes:

Ubs writes:

I wonder if I can get your thoughts on sabermetric baseball stats. Basically I'm trying to think about them more intelligently so that my instinctive skepticism can be better grounded in real science. There's one specific issue I'm focusing on, but also some more general stuff.

Allen Hurlbert writes:

I saw your 538 post [on the partisan allegiances of sports fans] and it reminded me of some playful data analysis I [Hurlbert] did a couple months ago based on's compilation of sports celebrity campaign contributions. Glancing through the list I thought I noticed some interesting patterns in the partisan nature of various sports, so I downloaded the data and created this figure:


I recently played Risk for the first time in decades and was immediately reminded of something that my sister and I noticed when we used to play as kids: the first player has a huge advantage. I think it would be easy to fix by just giving extra armies for the players who don't go first (for example, in the three-player game, giving two extra armies to the player who goes second, and four extras to the player who goes third), but the funny thing to me is that:

1. In the rules there is no suggestion to do this.

2. In all our games of Risk, my sister and I never thought of making the adjustment ourselves.

Sure, a lot of games have a first-mover advantage, but in Risk the advantage is (a) large and (b) easy to correct.
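For what it's worth, the attacker's edge in the dice mechanics, which is where the first mover's advantage ultimately comes from, is easy to see in simulation. The sketch below (my own, and just a single battle round, not a full game) runs the standard 3-dice-versus-2-dice roll many times:

```python
import random

def battle_round(rng):
    """One standard Risk battle round: attacker rolls 3 dice, defender
    rolls 2, the highest dice are compared pairwise, and ties go to the
    defender. Returns (attacker_losses, defender_losses)."""
    attack = sorted((rng.randint(1, 6) for _ in range(3)), reverse=True)
    defend = sorted((rng.randint(1, 6) for _ in range(2)), reverse=True)
    a_loss = d_loss = 0
    for a, d in zip(attack, defend):
        if a > d:
            d_loss += 1
        else:  # ties go to the defender
            a_loss += 1
    return a_loss, d_loss

rng = random.Random(42)
n = 100_000
a_total = d_total = 0
for _ in range(n):
    a, d = battle_round(rng)
    a_total += a
    d_total += d

# Per round, the defender loses noticeably more armies than the
# attacker (roughly 1.08 versus 0.92 in expectation).
print(round(d_total / n, 3), round(a_total / n, 3))
```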

You can't win for losing


Devin Pope writes:

I wanted to send you an updated version of Jonah Berger and my basketball paper that shows that teams that are losing at halftime win more often than expected.

This new version is much improved. It has 15x more data than the earlier version (thanks to blog readers) and analyzes both NBA and NCAA data.

Also, you will notice if you glance through the paper that it has benefited quite a bit from your earlier critiques. Our empirical approach is very similar to the suggestions that you made.

See here and here for my discussion of the earlier version of Berger and Pope's article.

Here's the key graph from the previous version:


And here's the update:


Much better--they got rid of that wacky fifth-degree polynomial that made the lines diverge in the graph from the previous version of the paper.

What do we see from the new graphs?

Ryan Richt writes:

I wondered if you have a quick moment to dig up an old post of your own that I cannot find by searching. I read an entry where you discussed whether there really was a difference between a prior of 1/2 meaning that we have no knowledge of a coin flip, or meaning we are exactly certain that its generative distribution is 1/2.

I'm only 24 and just got my masters last year, but I now have my own summer interns (who of course I encourage to read ET Jaynes and see the bayesian light) and one of them basically asked that question today.

My reply: The two original blog entries are here and here. Here's my published article. And here's a link discussing actual wrestlers and boxers. (Apparently the wrestler would win.)

I was getting my haircut today, and the TV in the barbershop was set to some kids' channel that was featuring a show about some weird form of basketball where the players can bounce on a trampoline on the way to dunking the ball into the basket. Sort of a cool idea, should definitely appeal to the targeted demographic of 10-year-old boys. It was set up as though it was what we might call a "real" professional sports league, with teams, won-lost records, upcoming games, announcers calling plays, and with players including some retired NBA stars. Not quite as over-the-top as professional wrestling, but that sort of thing.

Anyway, what puzzled me about all this was how little action there was on the screen. There were lots of interviews with players, video features, highlights of previous games, replays, and logos, but very little actual basketball.

Is this what 10-year-old boys want? I'm sure they've done lots of marketing surveys, so the answer is probably yes. But it left me extremely confused. Here you have a made-for-TV sport, the rules can be anything they want--I'd think they'd want there to be as much action as possible--passing, dunking, running, jumping and all the rest. While the ball was in play, the players were impressively athletic. But the ball was almost never in play. To me, it was much less exciting than any random basketball game you might see on ESPN. Again, they can make any rules they want--so why do they do it this way? I'd think kids would prefer to see live action rather than a series of disconnected highlights and replays. Perhaps someone could explain to me?

David Friedman suggests that, instead of limiting a football team to 11 men, you allow the team as many men on the field as they'd like, with the constraint that their total weight be below some limit, say 2400 pounds. It's an interesting idea.

Commenters suggest the related idea of limiting the total height of a basketball team to 30 feet. Then we'd find out right away how tall these players really are.

I'm not saying these ideas are perfect, but they're interesting.

A horse-race graph!


Responding to my question about graphing horse race results, Megan Pledger writes:

While waiting up late to snipe at an internet auction, I put together some simple data of a horse race and used ggplot to plot it. It's discrete time race data rather than continuous time and has very simple choice options for the horse. The graph is a starting point!



My reply: Very nice--thanks! I won't look a gift horse in the mouth . . . but if I were to be picky, I'd suggest making the dots smaller, the lines thinner, and the colors of the five horses more distinct. All these tricks should make the lines easier to follow. I'd also suggest gray rather than black for the connecting lines.

I think I'd also supplement it with a blown-up version of the last bit (from 80-100 on the x-axis), since that's where some interesting things are happening.

And here's the code:

Speaking of racetrack charts, did anybody make a graph of the positions of the horses over time during the recent Kentucky Derby?

I'm thinking it could be done on a very long 2-d strip (imagining the racecourse laid out as a long strip from beginning to end, with width corresponding to the width of the track), with a different color for each horse--maybe using solid lines for the top 6 finishers and light lines for the others. Also, maybe the positions of the horses every 5 seconds (say) could be connected with light gray lines--then you'd be able to see who was ahead at any given point in time.

Does this graph already exist somewhere? Or are there better ideas?
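I don't know of an existing version, but here's a rough sketch of how the "unrolled track" strip chart could be built, using invented positions for five hypothetical horses (the data, figure size, and styling below are all my own assumptions):

```python
import random

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

random.seed(0)
n_horses, n_ticks = 5, 25  # one recorded position every 5 seconds

# Simulate (distance along course, lateral position) at each tick.
paths = []
for i in range(n_horses):
    speed = 80 + 4 * random.random()  # meters per 5-second tick
    dist, lane = [0.0], [2.0 + 2.0 * i]
    for _ in range(n_ticks - 1):
        dist.append(dist[-1] + speed + random.gauss(0, 3))
        lane.append(min(12.0, max(0.5, lane[-1] + random.gauss(0, 0.4))))
    paths.append((dist, lane))

fig, ax = plt.subplots(figsize=(12, 2.5))  # long, thin strip
for i, (dist, lane) in enumerate(paths):
    ax.plot(dist, lane, lw=1, label=f"Horse {i + 1}")

# Light gray lines connecting all horses at each 5-second tick, so you
# can see who was ahead at any given moment.
for t in range(n_ticks):
    ax.plot([d[t] for d, _ in paths], [l[t] for _, l in paths],
            color="0.8", lw=0.5, zorder=0)

ax.set_xlabel("distance along course (m)")
ax.set_ylabel("track width (m)")
ax.legend(fontsize=7, ncol=n_horses, loc="upper left")
fig.tight_layout()
fig.savefig("race_strip.png")
```

Solid versus light lines for the top finishers, as suggested above, would just be a per-horse `lw` or `alpha` setting.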

I was mentioned on ESPN (sort of). Take that, Hal Stern!

The other day I mentioned this article by Lionel Page that found a momentum effect in tennis matches; more specifically: "winning the first set has a significant and strong effect on the result of the second set. A player who wins a close first set tie break will, on average, win one game more in the second set."


I'd display these data with a heat map rather than with overplotted points, but you get the idea.

This looked reasonable to me, but Guy Molyneux sent in some skeptical comments, which I'll give, followed by Page's response. Molyneux writes:

Scrabble rants


According to Carl Bialik, "za," "qi," and "zzz" were added recently to the list of official Scrabble words. I'm not so bothered by "zzz"--if somebody has two blanks to blow on this one, go for it!--but "za" and "qi"??? I don't even like "cee," let alone "qat," "xu," and other abominations. (I'm also not a big fan of "aw.")

Without further ado, here are my suggestions for reforming Scrabble.

1. Change one of the I's to an O. We've all had the unpleasant experience of having too many I's in our rack. What's the point?

2. Change one of the L's to an H. And change them both to 2-point letters. The H is ridiculously overvalued.

3. V is horrible. Change one of them to an N and let the remaining V be worth 6 points.

4. Regarding Q: Personally, I'd go the Boggle way and have a Qu tile. But I respect that Scrabble traditionalists enjoy the whole hide-the-Q game, so for them I guess I'd have to keep the Q as is.

5. Get rid of a bunch of non-English words such as qat, xu, jo, etc. Beyond this, for friendly games, adopt the Ubs rule, under which, if others aren't familiar with a word you just played, you (a) have to define it, and (b) can't use it this time--but it becomes legal in the future.

6. This brings me to challenges. When I was a kid we'd have huge fights over challenges because of their negative-sum nature: when player A challenges player B, one of them will lose his or her turn. At some point we switched to the mellower rule that, if you're challenged and the word isn't in the dictionary, you get another try--but you have to put your new word down immediately, you get no additional time to think. And if you challenge and you are wrong, you don't lose your turn. (We could've made this symmetric by saying that the challenger would have to play immediately when his or her turn came up--that seems like a reasonable rule to me--but we didn't actually go so far, as challenges were always pretty rare.)

Regarding points 1, 2, and 3 above: I know that traditionalists will say that all these bugs are actually features, that a good Scrabble player will know how to handle a surplus of I's or deal with a V. I disagree. There's enough challenge in trying to make good words without artificially making some of the rare letters too common. I mean, if you really believed that it's a good thing that there are two V's worth only 4 points each, why not go whole hog and get rid of a bunch of E's, T's, A's, N's, and R's, and replace them with B's and C's and suchlike?

P.S. Also interesting is this chart showing the frequencies of letters from several different corpuses. I'm not surprised that, for example, the frequency of letters from a dictionary is different from that of spoken words, but I was struck by the differences in letter frequencies comparing different modern written sources. For example, E represents 12.4% of all letters from a corpus of newspapers, whereas it is only 11.2% in corpuses of fiction and magazines. I wonder how much of this is explained by "the."

Following my skeptical discussion of their article on the probability of a college basketball team winning after being ahead or behind by one point at halftime, Jonah Berger and Devin Pope sent me a long and polite email (with graph attached!) defending their analysis. I'll put it all here, followed by my response. I'm still skeptical on some details, but I think that some of the confusion can be dispelled with a minor writing change, where they make clear that their 6.6% estimate is a comparison to a model.

Berger and Pope's first point was a general discussion about their methods:

John Shonder pointed me to this discussion by Justin Wolfers of this article by Jonah Berger and Devin Pope, who write:

In general, the further individuals, groups, and teams are ahead of their opponents in competition, the more likely they are to win. However, we show that through increasing motivation, being slightly behind can actually increase success. Analysis of over 6,000 collegiate basketball games illustrates that being slightly behind increases a team's chance of winning. Teams behind by a point at halftime, for example, actually win more often than teams ahead by one. This increase is between 5.5 and 7.7 percentage points . . .

This is an interesting thing to look at, but I think they're wrong. To explain, I'll start with their data, which are 6572 NCAA basketball games where the score differential at halftime is within 10 points. Of the subset of these games with one-point gaps at halftime, the team that's behind won 51.3% of the time. To get a standard error on this, I need to know the number of such games; let me approximate this by 6572/10=657. The s.e. is then .5/sqrt(657)=0.02. So the simple empirical estimate with +/- 1 standard error bounds is [.513 +/- .02], or [.49, .53]. Hardly conclusive evidence!
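The back-of-the-envelope calculation in the paragraph above can be checked in a couple of lines (using the same rough approximation of 6572/10 games in the one-point bin, and the conservative binomial standard error with p = 0.5):

```python
import math

p_hat = 0.513     # observed win rate for the team behind by 1 at halftime
n = 6572 // 10    # rough approximation to the number of one-point-gap games
se = 0.5 / math.sqrt(n)  # conservative binomial standard error
lo, hi = p_hat - se, p_hat + se
print(round(se, 3), round(lo, 2), round(hi, 2))  # → 0.02 0.49 0.53
```

So the +/- 1 standard error interval straddles 50%, which is the point: the raw difference is well under one standard error.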

Given this tiny difference of less than 1 standard error, how could they claim that "being slightly behind increases a team's chance of winning . . . by between 5.5 and 7.7 percentage points"?? The point estimate looks too large (6.6 percentage points rather than 1.3) and the standard error looks too small.

What went wrong? A clue is provided by this picture:


As some of Wolfers's commenters pointed out, this graph is slightly misleading because all the data points on the right side are reflected on the left. The real problem, though, is that what Berger and Pope did is to fit a curve to the points on the right half of the graph, extend this curve to 0, and then count that as the effect of being slightly behind.

This is wrong for a couple of reasons.

First, scores are discrete, so even if their curve were correct, it would be misleading to say that being behind increases your chance of winning by 6.6 points. Being behind takes you from a differential of 0 (50% chance of winning, the way they set up the data) to 51% (+/- 2%). Even taking the numbers at face value, you're talking 1%, not their claimed 5% or more.

Second, their analysis is extremely sensitive to their model. Looking at the picture above--again, focusing on the right half of the graph--I would think it would make more sense to draw the regression line a bit above the point at 1. That would be natural but it doesn't happen here because (a) their model doesn't even try to be consistent with the point at 0, and (b) they do some ridiculous overfitting with a 5th-degree polynomial. Don't even get me started on this sort of thing.

What would I do?

I'd probably start with a plot similar to their graph above, but coding score differential consistently as "home team score minus visiting team score." Then each data point would represent a different game, and they could fit a line and see what they get. And I'd fit linear functions (on the logit scale), not 5th-degree polynomials. And I'd get more data! The big issue, though, is that we're talking about maybe a 1% effect, not a 7% effect, which makes the whole thing a bit less exciting.

P.S. It's cool that Berger and Pope tried to do this analysis. I also appreciate that they attempted to combine sports data with a psychological experiment, in the spirit of the (justly) celebrated hot-hand paper. I like that they cited Hal Stern. And, even discounting their exaggerated inferences, it's perhaps interesting that teams up by 1 point at halftime don't do better. This is just what happens when studies get publicized before peer review. Or, to put it another way, the peer review is happening right now! I've put enough first-draft mistakes on my own blogs that I can't hold it against others when they do the same.

P.P.S. Update here.

Basketball bracket tips


I got this bit of spam in the email but it's actually sort of cool, would be an excellent topic for discussion in an intro stat class or a Bayesian class:

MEDIA ALERT: NCAA COLLEGE BASKETBALL TOURNAMENT - MARCH MADNESS NCAA College Basketball Tournament Bracket-Picking Tips. RJ Bell of, the top Las Vegas based sports betting authority, provides a simple blueprint to improve anyone's bracket results.

Carl Bialik writes:

There hasn't been a single 7-3 finish in the NFL since the league adopted the two-point conversion rule in 1994 . . . "Football scores are funny," Drinen wrote me [Bialik] in an email. "Did you know that teams win more often when they score 13 points than when they score 14? It's a cause-effect thing. In order to get 13, you (usually) need two field goals. And teams don't kick field goals if they're down by 20 points. So teams lose 35-14 more often than they lose 35-13. That's why scoring 13 is better correlated with winning than scoring 14 is."

And, most amazingly,

An NFL game hasn't finished with a score of 7-0 in over a quarter-century.

More boringly, the most common final score is 20-17.

Brad Miner writes:

With the Super Bowl coming up this weekend, I [Miner] want to write about sports, which I consider a key to building a larger conservative coalition in America. . . .

If you did a survey of the political philosophies of 75,000 randomly selected Americans you'd expect the usual--if somewhat mystifying--results: "Only about one-in-five Americans currently call themselves liberal (21%), while 38% say they are conservative and 36% describe themselves as moderate." So said the folks at Pew Research, and this was after the November election.

Do that same poll among the fans at Raymond James Stadium in Tampa on Sunday and the results would likely be more like 15% liberal, 30% moderate, and 50% conservative. And a bunch of those liberals would probably be gun owners.

Obviously those numbers are just speculation on my part, but I guarantee that Steelers fans are more conservative than all Pennsylvanians and ditto Cardinals devotees and the rest of Arizona. Which is not to say that these folks cast their ballots in November more for McCain than Obama. That's the problem.

What do the data say?

Yu-Sung and I looked at the "attended sporting event in the past year" item in the General Social Survey. (Unfortunately, the question was only asked once, in the 1993-1996 survey.) 56% of respondents said they "attended an amateur or professional sports event" during the past twelve months. How do they differ from the 44% who didn't?


So, at least in the mid-1990s, sports attenders were quite a bit more Republican than other Americans (the categories in the graph above are Strong Democrat, Democrat, lean Democrat, Independent, lean Republican, Republican, strong Republican), but not much different in their liberal-conservative ideology.

So these data do not appear to support Miner's claim. Miner expected sports fans to label themselves as more conservative but maybe not to be more likely to vote Republican; actually, sports fans were more likely to call themselves Republican but no more likely to describe themselves as conservative.

Some other issues:

1. The sporting event attended could be the Super Bowl or your kid's soccer game. Maybe more dramatic results would be obtained by considering a more restricted group of sports fans.

2. There are lots of surveys of TV watching, so I'm sure there are tons of data that would let you crosstab ideology, voting, and spectator sports watching.

3. More generally, we never want to rely too strongly on just one survey. Still, it's fun to look.

P.S. Sometimes people ask me how much time blogging takes me. This took about an hour: 15 minutes for me to read Miner's article and think about it, 10 minutes for Yu-Sung to get the crosstabs, 20 minutes for me to make the graphs, and 15 minutes for me to write the blog entry.

And, yes, this means I have a lot of real work that I've been putting off. . . .

Kenny sent me this article by Bill James endorsing Hal "Bayesian Data Analysis" Stern's dis of the BCS. I'd like to add a statistical point, which is a point that Hal and I have discussed once or twice: There is an inherent tension between two goals of any such rating system:

1. Ranking the teams by inherent ability.

2. Scoring the teams based on their season performance.

Here's an example. Consider two teams that played identical opponents in the season, with team A having a 12-0 record and team B going 9-3. But here's the hitch: in my story, team B actually had a much better point differential than team A during the season. That is, team A won a bunch of games by scores of 17-16 or whatever, and team B won a bunch of games 21-3 (with three close losses). Also assume that none of the games were true run-up-the-score blowouts.

In that case, I'd expect that team B is actually better than team A. Not just "better" in some abstract sense but also in a predictive sense. If A and B were playing some third team C, I'd guess (in the absence of other information) that B's probability of winning is greater than A's.
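To make the predictive claim concrete, here's a minimal sketch under a normal model for game margins. All the numbers are made up for illustration: the average margins for teams A and B are invented, and the 13-point standard deviation for football game margins is an assumption, not something from the post:

```python
from math import erf, sqrt

def win_prob(margin_mean, sd=13.0):
    # P(score differential > 0) when the margin is normal(margin_mean, sd).
    # The sd of ~13 points for football margins is an assumed round number.
    return 0.5 * (1 + erf(margin_mean / (sd * sqrt(2))))

mu_A = 1.0   # team A: 12-0, but won by an average of ~1 point (made up)
mu_B = 10.0  # team B: 9-3, but outscored opponents by ~10 per game (made up)
print(round(win_prob(mu_A), 2), round(win_prob(mu_B), 2))  # → 0.53 0.78
```

Under this (toy) model, team B's season performance implies a noticeably higher probability of beating a neutral opponent C, even though team A has the better record.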

But, as a matter of fairness, I think you've gotta give the higher ranking to team A. They won all 12 games--what more can you ask?

OK, you might say you could resolve this particular problem by only using wins/losses, not using score differentials. But this doesn't really solve the general problem, where teams have different schedules, maybe nobody went 12-0, etc.

My real point with this example is not to recommend a particular ranking strategy but to point out the essential tension between inference and reward in this setting. That's why, as Hal notes, it's important to state the goals clearly.

P.S. It's been argued that a more appropriate response is to change the rules of football to make it less damaging to the health of the players (see here for a review of some data). I certainly agree that this is a more important issue than the scoring system. In statistics we often use sports examples to illustrate more general principles, but it is always good to be aware of the reality underlying any example. It also makes sense to me that people who are closer than I am to the reality of the situation would be less amused by the thoughts of Bill James and others about the intellectual issues in the idealized system.

No justice, no foul

| 1 Comment

This article, by Jim Stallard, is just hilarious. It's at the intersection of politics and basketball. There are so many funny lines here, I don't know where to start.

Regarding my article on the boxer, the wrestler, and the coin flip, Steve Hsu writes:

A world class wrestler would easily demolish a top boxer in a no holds barred fight. This has been verified in many experiments (Inoki-Ali doesn't count)!

Steve has more details in this blog entry from 2007:

Ultimate fighting has grown from obscurity to unbelievable levels of popularity. It will soon surpass boxing as the premier combative sport. And it will soon be widely recognized that the baddest man on the planet is not a boxer, but an ultimate fighter. . . .

Unarmed single combat -- mano a mano, as they say -- has a long history, and is a subject which fascinates most men, both young and old. As a boy, I can remember serious discussions with my friends concerning which style was most effective -- karate or kung fu, boxing or wrestling, etc. How would Muhammad Ali fare against an Olympic wrestler or Judo player? What about Bruce Lee versus a Navy Seal? Of course, these discussions were completely theoretical, akin to asking whether Superman could beat Galactus in arm wrestling. There was scarcely any data available on which to base a conclusion.

However, thanks to the recent proliferation of "No Rules" or "No Holds Barred" (NHB) fighting tournaments, both in the U.S. and abroad, we finally have some interesting answers to this ancient question.

Somebody asked me for the golf putting data from Don Berry's book, which Deb and I use as an example for nonlinear modeling in this article and our Teaching Statistics book. Here they are:

Phoenix Suns shooters


Yair sends in this plot of the week:


He writes:

This displays the smoothed distribution of shots taken by wing players for the Phoenix Suns in the '07-'08 regular season (Matt Barnes played for the GS Warriors that year). Raja Bell seems like the perfect wing player for the Suns, because he plays defense and then basically sits at the 3-pt line waiting for Steve Nash to give him the ball for a good shot. Leandro Barbosa is similar, but he drives a bit more (especially when Nash is off the floor). Grant Hill didn't fit this mold because he has no 3-pt shot; he is more of a mid-range guy. From this standpoint, Matt Barnes (their free-agent pickup) looks like he could be a better fit. Of course, this plot says nothing about whether he actually hits the threes, but at least his heart is in the right place. Then again, if their offensive system changes because of the new coach, all bets are off.

Pretty graphs, huh? The color scheme seems good for a team called the Suns.

Dopey anti-doping tests?


Jim points me to this article by Don Berry, which argues that studies of doping in sports often don't correctly perform probability calculations.
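As I understand it, one of the probability points at issue is the familiar base-rate one: the chance an athlete who tests positive actually doped depends on the prevalence of doping, not just on the test's error rates. A hedged illustration with entirely made-up numbers (these are not figures from Berry's article):

```python
def p_doping_given_positive(prevalence, sensitivity, specificity):
    # Bayes' rule: P(doped | positive test). All inputs are illustrative.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Suppose 2% of athletes dope, the test catches 90% of dopers,
# and falsely flags 1% of clean athletes (all assumptions):
print(round(p_doping_given_positive(0.02, 0.90, 0.99), 2))  # → 0.65
```

Even with a seemingly accurate test, about a third of positives would be clean athletes under these assumed numbers, which is the kind of calculation that often gets skipped.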

Andrew Oswald sent me this paper by Amanda Goodall, Lawrence Kahn, and himself, called "Why Do Leaders Matter? The Role of Expert Knowledge." Here's the abstract:

Why do some leaders succeed while others fail? This question is important, but its complexity makes it hard to study systematically. We draw on a setting where there are well-defined objectives, small teams of workers, and exact measures of leaders' characteristics and organizational performance. We show that a strong predictor of a leader's success in year T is that person's own level of attainment, in the underlying activity, in approximately year T-20. Our data come from 15,000 professional basketball games and reveal that former star players make the best coaches. This expert knowledge effect is large.

My first thought upon seeing this paper was: What about Isiah Thomas? But a glance through reveals that their data end at 2004, before Isiah took up his Knicks coaching job.

More seriously, Goodall et al.'s findings seem to contradict the conventional wisdom in baseball that the best managers are the mediocre or ok players such as Earl Weaver and Casey Stengel rather than the superstars such as Ted Williams and Ty Cobb. I'd be interested to hear what the authors think about this.

Scatterplot, please! It's not just about an eye-catching result; it's about building confidence in your findings

I won't bother to give my comments on the tables and graphs (except to note that the figures are hard to read for many reasons, starting with the fact that these are bar graphs with lower bounds at 0.4 (?), 0.6 (??), etc.).

What I will say, though, is that I'd like to see a scatterplot, with a dot for each coach/team (four different colors for the four categories of coaches), plotting total winning percentage (on the y-axis) vs. winning percentage in the year or two before the coach joined the team (on the x-axis). This is the usual before-after graph, which can then be embellished with 4 regression lines in the colors corresponding to the four groups of coaches.
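Here's a rough sketch of the plot described above, with simulated data standing in for the actual coach/team records (the four categories, the win percentages, and the group effects are all invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, ax = plt.subplots()
for grp, color in enumerate(["C0", "C1", "C2", "C3"]):  # four made-up coach categories
    before = rng.uniform(0.2, 0.8, 30)  # team win pct before the coach joined
    # Invented relationship: regression to the mean plus a small group effect
    after = 0.4 * before + 0.3 + 0.02 * grp + rng.normal(0, 0.07, 30)
    ax.scatter(before, after, color=color, s=10)
    slope, intercept = np.polyfit(before, after, 1)  # per-group regression line
    xs = np.array([0.2, 0.8])
    ax.plot(xs, slope * xs + intercept, color=color)
ax.set_xlabel("win pct before coach joined")
ax.set_ylabel("win pct under coach")
fig.savefig("coaches.png")
```

With the real data, the vertical gaps between the four regression lines would show the coach-category effect directly, while the scatter shows how much the data actually constrain it.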

When reading such an analysis, I really, really want to see the main patterns in the data. Otherwise I really have to take the results on trust. This is related to my larger point about confidence building.

Following up on our link to an article about educational measurement, Eric Loken pointed me to this:

On the Criteria Corporation blog we [Loken] just posted a look at golf tournament scores. If you take the four rounds as if they were four repeats of the same test, or four parallel items on a test, the usual psychometric analyses would yield a terrible reliability coefficient. The problem of course is restriction of range of true scores among the world's best golfers. We figured since the US Open (this weekend) is sometimes called the Ultimate Test we'd offer a little psychometric analysis of golf.
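The restriction-of-range point can be shown in a few lines of simulation. This is my own toy version, not Loken's analysis: golfers' scores are true ability plus round-to-round noise, and the four rounds are treated as four test items scored with Cronbach's alpha (all parameter values are assumptions):

```python
import numpy as np

def cronbach_alpha(scores):
    # scores: (players, rounds); standard Cronbach's alpha formula
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
ROUNDS, NOISE_SD = 4, 2.5  # round-to-round sd in strokes (made up)

def simulate(ability_sd, n=200):
    ability = rng.normal(72, ability_sd, size=(n, 1))  # each golfer's true level
    return ability + rng.normal(0, NOISE_SD, size=(n, ROUNDS))

wide = cronbach_alpha(simulate(ability_sd=4.0))    # amateurs: wide talent range
narrow = cronbach_alpha(simulate(ability_sd=0.7))  # tour pros: restricted range
print(round(wide, 2), round(narrow, 2))
```

With the same per-round noise, reliability is high for the wide-range population and collapses for the restricted one, which is exactly the point about the world's best golfers: the "test" looks unreliable only because everyone taking it has nearly the same true score.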

Despite having published an article on golf, I know almost nothing about the sport--I've never actually played macro-golf--so I'll link to Eric's note without comment.


See here. It took me 3 weeks the first time, about 1 week the second time. I remember setting my alarm to 5am so I could work on the cube for two hours in the morning before going to school. Eventually I got my time down to a little over 2 minutes (which is just about the longest I can concentrate on anything). There were two kinds of cube solvers: those who held the cube in a stationary orientation and spun the edges around, and those who kept turning the cube around in their hands to get just the right orientation for each move. I was of this second type, which I think kept my efficiency down. One of my math professors in college told me that he'd solved the cube in theory--he taught abstract algebra--but had never bothered to do it in practice. This impressed me to no end. A guy down the hall from me had a 4x4x4 cube, which at one point we tried to see if we could solve using only 3x3x3 operators. I don't think we succeeded.

It's been years since I've done the cube. Last time I tried and tried and tried and got stuck. If I ever want to do it again, I think I'll have to figure out some operators again from scratch.

