The BCS sucks: Don’t believe me, believe Bill James quoting Hal Stern

Kenny sent me this article by Bill James endorsing Hal “Bayesian Data Analysis” Stern’s dis of the BCS. I’d like to add a statistical point, one that Hal and I have discussed once or twice: there is an inherent tension between two goals of any such rating system:

1. Ranking the teams by inherent ability.

2. Scoring the teams based on their season performance.

Here’s an example. Consider two teams that played identical opponents during the season, with team A going 12-0 and team B going 9-3. But here’s the hitch: in my story, team B actually had a much better point differential than team A during the season. That is, team A won a bunch of games by scores of 17-16 or whatever, and team B won a bunch of games 21-3 (with three close losses). Also assume that none of the games were true run-up-the-score blowouts.

In that case, I’d expect that team B is actually better than team A. Not just “better” in some abstract sense but also in a predictive sense. If A and B were playing some third team C, I’d guess (in the absence of other information) that B’s probability of winning is greater than A’s.
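
To put a rough number on that predictive claim, here's a minimal sketch under an assumed normal model for score margins; all the specific values, including the 14-point standard deviation on the margin, are made up for illustration:

    # Minimal sketch: summarize each team's strength by its average point
    # differential over a common schedule, and assume a game's score margin
    # is normally distributed around the difference in strengths.
    from math import erf, sqrt

    def win_prob(strength_diff, sd=14.0):
        """P(the team with the given strength advantage wins), normal model."""
        return 0.5 * (1 + erf(strength_diff / (sd * sqrt(2))))

    strength_a = 2.0   # team A: won close games, so a small average margin
    strength_b = 10.0  # team B: big wins, a few close losses
    strength_c = 0.0   # some average third team

    print(f"P(A beats C) = {win_prob(strength_a - strength_c):.2f}")  # about 0.56
    print(f"P(B beats C) = {win_prob(strength_b - strength_c):.2f}")  # about 0.76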

But, as a matter of fairness, I think you’ve gotta give the higher ranking to team A. They won all 12 games—what more can you ask?

OK, you might say you could resolve this particular problem by only using wins/losses, not using score differentials. But this doesn’t really solve the general problem, where teams have different schedules, maybe nobody went 12-0, etc.

My real point with this example is not to recommend a particular ranking strategy but to point out the essential tension between inference and reward in this setting. That’s why, as Hal notes, it’s important to state clearly what the goals are.

P.S. It’s been argued that a more appropriate response is to change the rules of football to make the game less damaging to the health of the players (see here for a review of some data). I certainly agree that this is a more important issue than the scoring system. In statistics we often use sports examples to illustrate more general principles, but it is always good to be aware of the reality underlying any example. It also makes sense to me that people who are closer than I am to the reality of the situation would be less amused by the thoughts of Bill James and others about the intellectual issues in the idealized system.

11 thoughts on “The BCS sucks: Don’t believe me, believe Bill James quoting Hal Stern”

  1. What I find problematic with this whole thing is that I don't think the people who program the "computer rankings" have published their methodology. Have they? They have proprietary systems, and I don't think anyone has ever studied their ability to predict wins or losses. I'm not even sure you could.

    Eric

  2. I disagree. I think at least some teams show their superiority precisely in winning a lot of close games. I think this can happen in a couple of ways. First, there are some teams that play down to bad opponents. They don't get up for those games, but when the game's on the line at the end, they get serious and pull it out. Second, some teams are just good at winning close games — they're good at handling the pressure and stuff. This is important for predicting who would win between the 12-0 and the 9-3 team. If those teams play, there's a good chance it'll be close, and that'll favor a team that's better at winning close games.

    Of course, sometimes teams win a lot of close ones just because they're lucky, and if that's the case for the 12-0 team, then the 9-3 team may well be better. Now, how do you tell whether the 12-0 team's winning the close ones because they're lucky or because they're a team like the one described above? I seriously doubt anyone is able to write a computer program that will discriminate between these two sorts of 12-0 teams. That's why we need a playoff!!

  3. Dylan: Our dispute could be resolved empirically, using "regression discontinuity" (see chapter 10 for a brief description of this method). The idea here would be to fit a model predicting game outcomes given past data, but only using scores as continuous predictors.

    The next step is to add win/loss data to the model and see if it improves predictive power. My hypothesis is that, after including scores as continuous predictors in the model, essentially no information will be added by including won/loss data. This hypothesis could be empirically checked by applying the model to predict future game outcomes.
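
    Here's a rough sketch of that predictive check on simulated data (this is not any of the actual BCS computer rankings; the latent strengths, the 14-point normal margin noise, and the season sizes are all assumed for illustration):

        # Simulate two seasons; build per-team summaries (average margin,
        # win rate) from the first, then test whether win/loss data adds
        # predictive power for the second once average margins are in.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n_teams = 50
        strength = rng.normal(0, 7, size=n_teams)  # assumed latent abilities
        sigma = 14.0                               # assumed margin noise

        def simulate_games(n_games):
            games = []
            for _ in range(n_games):
                i, j = rng.choice(n_teams, size=2, replace=False)
                games.append((i, j, strength[i] - strength[j] + rng.normal(0, sigma)))
            return games

        past, future = simulate_games(600), simulate_games(600)

        # Per-team season summaries, computed from the past games only.
        margin_sum, wins, played = (np.zeros(n_teams) for _ in range(3))
        for i, j, m in past:
            margin_sum[i] += m; margin_sum[j] -= m
            wins[i] += m > 0;   wins[j] += m < 0
            played[i] += 1;     played[j] += 1
        avg_margin, win_rate = margin_sum / played, wins / played

        # Predict future games from past summaries, with and without win/loss.
        y = np.array([m > 0 for i, j, m in future])
        x_margin = np.array([avg_margin[i] - avg_margin[j] for i, j, m in future])
        x_wins = np.array([win_rate[i] - win_rate[j] for i, j, m in future])

        train, test = slice(0, 300), slice(300, 600)
        for name, X in [("margins only", x_margin[:, None]),
                        ("margins + win/loss", np.column_stack([x_margin, x_wins]))]:
            model = LogisticRegression().fit(X[train], y[train])
            print(f"{name}: held-out accuracy = {model.score(X[test], y[test]):.3f}")

    If my hypothesis is right, the second line should come out essentially no better than the first.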

  4. It is even clearer when you set up the situation like this: Team A plays team B twice. The first time Team A beats team B by 1 point. The second time, Team B beats team A by 21 points. Which is the better team? By wins/losses, they've tied. But wins/losses is clearly throwing away important information; everyone would say that team B is better because they scored more points. In this case, the point differential is much more informative.

    Besides, "win" is just a way to say that in a particular matchup, the point differential favored a particular side. Since "win" is just SIGN(X-Y), why not do one better and use the actual point differential, X-Y?

  5. Richard: For inference, yes. But recall item 2 in my blog entry above. Even if the 9-3 team is estimated with high confidence as being better, I still think the trophy and #1 ranking should go to the 12-0 team (again, assuming they played the same schedule), since by the rules of the game, it's winning that should be rewarded. That's the tension.

  6. of course, we could just do away with the idea of trying to find out which team is the "best" and just enjoy watching some bowl games. that'd be a novel idea.

  7. Given that teachers teach to tests, does anyone know if coaches coach to the ranking formula? Has it changed scheduling? I'd expect rankings based on scores to lead to more blowouts and fewer minutes played by second stringers.

    I like Andrew's suggestion to compare predictions with and without a win/loss predictor using held-out (future) evaluation. I like this idea in general, especially as it helps motivate priors. There's some discussion of why using priors leads to better predictors in Andrew and Jennifer's regression book, but I don't remember any held-out or cross-validated evaluations that'd help demonstrate it.

    Albert and Bennett's Curve Ball had a nice example using rookie of the year batting averages. I think it was just a simple year versus year scatterplot of batting averages for all players, showing a strong "regression to the mean". If the book had math, I'm sure this is where the beta prior would've been introduced!
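
    For what it's worth, the mechanics look something like this (made-up numbers, not the book's data): a Beta prior pulls a small-sample batting average toward the league mean much harder than a full-season one.

        # Beta-binomial shrinkage: posterior mean = (a + hits) / (a + b + at_bats).
        a, b = 78.0, 222.0  # assumed prior, centered on a .260 league average

        players = [("rookie, 100 at-bats", 30, 100),    # raw .300
                   ("veteran, 500 at-bats", 150, 500)]  # raw .300

        for name, hits, at_bats in players:
            raw = hits / at_bats
            shrunk = (a + hits) / (a + b + at_bats)
            print(f"{name}: raw {raw:.3f} -> posterior mean {shrunk:.3f}")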

  8. “of course, we could just do away with the idea of trying to find out which team is the "best" and just enjoy watching some bowl games. that'd be a novel idea.”

    Actually, that would be more like the old (i.e., pre-BCS) system, so novel isn't exactly the right word.

    “Given that teachers teach to tests, does anyone know if coaches coach to the ranking formula? Has it changed scheduling?”

    According to some, yes it has. Here is an article that claims that teams looking to compete in the BCS sweepstakes pretty much only ever schedule easy non-conference games anymore.

  9. What it boils down to is politics. Sports should have as little to do with that as possible. Voting is a popularity contest, and as much of that as possible needs to be removed from the equation.

  10. Your hypothetical scenario is worth thinking about, but the more typical problem in comparing the 12-0 team and the 9-3 team is that the 12-0 team didn't play comparable opponents; the coach scheduled weaker opposition knowing that the team would be rewarded for getting the victories.

    In your scenario, it is also possible that the 9-3 team wins by bigger margins because of playing a high-risk/high-reward offense. So the winning margins may or may not be informative in general.
