Recently in Sports Category
Tyler Cowen links to a blog by Paul Kedrosky that asks why winning times in the Boston marathon have been more variable, in recent years, than winning times in New York. This particular question isn't so interesting--when I saw the title of the post, my first thought was "the weather," and, in fact, that and "the wind" are the most common responses of the blog commenters--but it reminded me of a more general question that we discussed the other day, which is how to think about Why questions.
Many years ago, Don Rubin convinced me that it's a lot easier to think about "effects of causes" than "causes of effects." For example, why did my cat die? Because she ran into the street, because a car was going too fast, because the driver wasn't paying attention, because a bird distracted the cat, because the rain stopped so the cat went outside, etc. When you look at it this way, the question of "why" is pretty meaningless.
Similarly, if you ask a question such as, What caused World War 1, the best sort of answers can take the form of potential-outcomes analyses. I don't think it makes sense to expect any sort of true causal answer here.
But, now let's get back to the "volatility of the Boston marathon" problem. Unlike the question of "why did my cat die" or "why did World War 1 start," the question, "Why have the winning times in the Boston marathon been so variable" does seem answerable.
What happens if we try to apply some statistical principles here?
Principle #1: Compared to what? We can't try to answer "why" without knowing what we are comparing to. This principle seems to work in the marathon-times example. The only way to talk about the Boston times as being unexpectedly variable is to know what "expectedly variable" is. Or, conversely, the New York times are unexpectedly stable compared to what was happening in Boston those same years. Either way, the principle holds that we are comparing to some model or another.
Principle #2: Look at effects of causes, rather than causes of effects. This principle seems to break down in the marathon example, where it seems very natural to try to understand why an observed phenomenon is occurring.
What's going on? Perhaps we can understand in the context of another example, something that came up a couple years ago in some of my consulting work. The New York City Department of Health had a survey of rodent infestation, and they found that African Americans and Latinos were more likely than whites to have rodents in their apartments. This difference persisted (albeit at a lesser magnitude) after controlling for some individual and neighborhood-level predictors. Why does this gap remain? What other average differences are there among the dwellings of different ethnic groups?
OK, so now maybe we're getting somewhere. The question on deck now is, how do the "Boston vs. NY marathon" and "too many rodents" problems differ from the "dead cat" problem.
One difference is that we have data on lots of marathons and lots of rodents in apartments, but only one dead cat. But that doesn't quite work as a demarcation criterion (sorry, forgive me for working under the influence of Popper): even if there were only one running of each marathon, we could still quite reasonably answer questions such as, "Why was the winning time so much lower in NY than in Boston?" And, conversely, if we had lots of dead cats, we could start asking questions about attributable risks, but it still wouldn't quite make sense to ask why the cats are dying.
Another difference is that the marathon question and the rodent question are comparisons (NY vs. Boston and blacks/Hispanics vs. whites), while the dead cat stands alone (or swings alone, I guess I should say). Maybe this is closer to the demarcation we're looking for, the idea being that a "cause" (in this sense) is something that takes you away from some default model. In these examples, it's a model of zero differences between groups, but more generally it could be any model that gives predictions for data.
In this model-checking sense, the search for a cause is motivated by an itch--a disagreement with a default model--which has to be scratched and scratched until the discomfort goes away, by constructing a model that fits the data. Said model can then be interpreted causally in a Rubin-like, "effects of causes," forward-thinking way.
Is this the resolution I'm seeking? I'm not sure. But I need to figure this out, because I'm planning on basing my new intro stat course (and book) on the idea of statistics as comparisons.
P.S. I remain completely uninterested in questions such as, What is the cause? Is it A or is it B? (For example, what caused the differences in marathon-time variations in Boston and New York--is it the temperature, the precipitation, the wind, or something else?) Of course if it can be any of these factors, it can be all of them. I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation as a way to draw artificial distinctions, fundamentally in a way no different from the notorious comparisons of statistical significance to non-significance.
This last point has nothing to do with causal inference and everything to do with my preference for continuous over discrete models in applications in which I've worked in social science, environmental science, and public health.
David Park sent this along. I haven't really been following basketball statistics lately, but some of you might find it interesting.
I get so irritated when economists and political scientists try to explain every sort of irrational behavior in life as being part of some utility function.
That's one reason I love this paper by Erik Snowberg and Justin Wolfers, "Explaining the Favorite-Longshot Bias: Is it Risk-Love or Misperceptions." They conclude that, yes, it's misperceptions:
I'm too tired to think about this one, but maybe some of you out there have some ideas.
Chaz Littlejohn writes:
Ubs writes:
I wonder if I can get your thoughts on sabermetric baseball stats. Basically I'm trying to think about them more intelligently so that my instinctive skepticism can be better grounded in real science. There's one specific issue I'm focusing on, but also some more general stuff.
Allen Hurlbert writes:
I saw your 538 post [on the partisan allegiances of sports fans] and it reminded me of some playful data analysis I [Hurlbert] did a couple months ago based on NewsMeat.com's compilation of sports celebrity campaign contributions. Glancing through the list I thought I noticed some interesting patterns in the partisan nature of various sports, so I downloaded the data and created this figure:

I recently played Risk for the first time in decades and was immediately reminded of something that my sister and I noticed when we used to play as kids: the first player has a huge advantage. I think it would be easy to fix by just giving extra armies for the players who don't go first (for example, in the three-player game, giving two extra armies to the player who goes second, and four extras to the player who goes third), but the funny thing to me is that:
1. In the rules there is no suggestion to do this.
2. In all our games of Risk, my sister and I never thought of making the adjustment ourselves.
Sure, a lot of games have a first-mover advantage, but in Risk the advantage is (a) large and (b) easy to correct.
Devin Pope writes:
I wanted to send you an updated version of Jonah Berger and my basketball paper that shows that teams that are losing at halftime win more often than expected. This new version is much improved. It has 15x more data than the earlier version (thanks to blog readers) and analyzes both NBA and NCAA data.
Also, you will notice if you glance through the paper that it has benefited quite a bit from your earlier critiques. Our empirical approach is very similar to the suggestions that you made.
See here and here for my discussion of the earlier version of Berger and Pope's article.
Here's the key graph from the previous version:

And here's the update:

Much better--they got rid of that wacky fifth-degree polynomial that made the lines diverge in the graph from the previous version of the paper.
What do we see from the new graphs?
Ryan Richt writes:
I wondered if you have a quick moment to dig up an old post of your own that I cannot find by searching. I read an entry where you discussed if there really was a difference between a prior of 1/2 meaning that we have no knowledge of a coin flip, or meaning we are exactly certain that its generative distribution is 1/2. I'm only 24 and just got my masters last year, but I now have my own summer interns (who of course I encourage to read ET Jaynes and see the bayesian light) and one of them basically asked that question today.
My reply: The two original blog entries are here and here. Here's my published article. And here's a link discussing actual wrestlers and boxers. (Apparently the wrestler would win.)
I was getting my haircut today, and the TV in the barbershop was set to some kids' channel that was featuring a show about some weird form of basketball where the players can bounce on a trampoline on the way to dunking the ball into the basket. Sort of a cool idea, should definitely appeal to the targeted demographic of 10-year-old boys. It was set up as though it was what we might call a "real" professional sports league, with teams, won-lost records, upcoming games, announcers calling plays, and with players including some retired NBA stars. Not quite as over-the-top as professional wrestling, but that sort of thing.
Anyway, what puzzled me about all this was how little action there was on the screen. There were lots of interviews with players, video features, highlights of previous games, replays, and logos, but very little actual basketball.
Is this what 10-year-old boys want? I'm sure they've done lots of marketing surveys, so the answer is probably yes. But it left me extremely confused. Here you have a made-for-TV sport, the rules can be anything they want--I'd think they'd want there to be as much action as possible--passing, dunking, running, jumping and all the rest. While the ball was in play, the players were impressively athletic. But the ball was almost never in play. To me, it was much less exciting than any random basketball game you might see on ESPN. Again, they can make any rules they want--so why do they do it this way? I'd think kids would prefer to see live action rather than a series of disconnected highlights and replays. Perhaps someone could explain to me?
David Friedman suggests that, instead of limiting a football team to 11 men, you allow the team as many men on the field as they'd like, with the constraint that their total weight be below some 2400 pounds. It's an interesting idea.
Commenters suggest the related idea of limiting the total height of a basketball team to 30 feet. Then we'd find out right away how tall these players really are.
I'm not saying these ideas are perfect, but they're interesting.
Responding to my question about graphing horse race results, Megan Pledger writes:
While waiting up late to snipe at an internet auction, I put together some simple data of a horse race and used ggplot to plot it. It's discrete time race data rather than continuous time and has very simple choice options for the horse. The graph is a starting point!

[The picture doesn't fully fit on the blog window here; right-click and select "view image" to see the whole thing.]
My reply: Very nice--thanks! I won't look a gift horse in the mouth . . . but if I were to be picky, I'd suggest making the dots smaller, the lines thinner, and the colors of the five horses more distinct. All these tricks should make the lines easier to follow. I'd also suggest gray rather than black for the connecting lines.
I think I'd also supplement it with a blown-up version of the last bit (from 80-100 on the x-axis), since that's where some interesting things are happening.
And here's the code:
Speaking of racetrack charts, did anybody make a graph of the positions of the horses over time during the recent Kentucky Derby?
I'm thinking it could be done on a very long 2-d strip (imagining the racecourse laid out as a long strip from beginning to end, with width corresponding to the width of the track), with a different color for each horse--maybe using solid lines for the top 6 finishers and light lines for the others. Also, maybe the positions of the horses every 5 seconds (say) could be connected with light gray lines--then you'd be able to see who was ahead at any given point in time.
Does this graph already exist somewhere? Or are there better ideas?
I was mentioned on ESPN (sort of). Take that, Hal Stern!
The other day I mentioned this article by Lionel Page that found a momentum effect in tennis matches; more specifically: "winning the first set has a significant and strong effect on the result of the second set. A player who wins a close first set tie break will, on average, win one game more in the second set."

I'd display these data with a heat map rather than with overplotted points, but you get the idea.
This looked reasonable to me, but Guy Molyneux sent in some skeptical comments, which I'll give, followed by Page's response. Molyneux writes:
According to Carl Bialik, "za," "qi," and "zzz" were added recently to the list of official Scrabble words. I'm not so bothered by "zzz"--if somebody has two blanks to blow on this one, go for it!--but "za" and "qi"??? I don't even like "cee," let alone "qat," "xu," and other abominations. (I'm also not a big fan of "aw.")
Without further ado, here are my suggestions for reforming Scrabble.
1. Change one of the I's to an O. We've all had the unpleasant experience of having too many I's in our rack. What's the point?
2. Change one of the L's to an H. And change them both to 2-point letters. The H is ridiculously overvalued.
3. V is horrible. Change one of them to an N and let the remaining V be worth 6 points.
4. Regarding Q: Personally, I'd go the Boggle way and have a Qu tile. But I respect that Scrabble traditionalists enjoy the whole hide-the-Q game, so for them I guess I'd have to keep the Q as is.
5. Get rid of a bunch of non-English words such as qat, xu, jo, etc. Beyond this, for friendly games, adopt the Ubs rule, under which, if others aren't familiar with a word you just played, you (a) have to define it, and (b) can't use it this time--but it becomes legal in the future.
6. This brings me to challenges. When I was a kid we'd have huge fights over challenges because of their negative-sum nature: when player A challenges player B, one of them will lose his or her turn. At some point we switched to the mellower rule that, if you're challenged and the word isn't in the dictionary, you get another try--but you have to put your new word down immediately, you get no additional time to think. And if you challenge and you are wrong, you don't lose your turn. (We could've made this symmetric by saying that the challenger would have to play immediately when his or her turn came up--that seems like a reasonable rule to me--but we didn't actually go so far, as challenges were always pretty rare.)
Regarding points 1, 2, and 3 above: I know that traditionalists will say that all these bugs are actually features, that a good Scrabble player will know how to handle a surplus of I's or deal with a V. I disagree. There's enough challenge in trying to make good words without artificially making some of the rare letters too common. I mean, if you really believed that it's a good thing that there are two V's worth only 4 points each, why not go whole hog and get rid of a bunch of E's, T's, A's, N's, and R's, and replace them with B's and C's and suchlike?
P.S. Also interesting is this chart showing the frequencies of letters from several different corpuses. I'm not surprised that, for example, the frequency of letters from a dictionary is different from that of spoken words, but I was struck by the differences in letter frequencies comparing different modern written sources. For example, E represents 12.4% of all letters from a corpus of newspapers, whereas it is only 11.2% in corpuses of fiction and magazines. I wonder how much of this is explained by "the."
Following my skeptical discussion of their article on the probability of a college basketball team winning after being ahead or behind by one point at halftime, Jonah Berger and Devin Pope sent me a long and polite email (with graph attached!) defending their analysis. I'll put it all here, followed by my response. I'm still skeptical on some details, but I think that some of the confusion can be dispelled with a minor writing change, where they make clear that their 6.6% estimate is a comparison to a model.
Berger and Pope's first point was a general discussion about their methods:
John Shonder pointed me to this discussion by Justin Wolfers of this article by Jonah Berger and Devin Pope, who write:
In general, the further individuals, groups, and teams are ahead of their opponents in competition, the more likely they are to win. However, we show that through increasing motivation, being slightly behind can actually increase success. Analysis of over 6,000 collegiate basketball games illustrates that being slightly behind increases a team's chance of winning. Teams behind by a point at halftime, for example, actually win more often than teams ahead by one. This increase is between 5.5 and 7.7 percentage points . . .
This is an interesting thing to look at, but I think they're wrong. To explain, I'll start with their data, which are 6572 NCAA basketball games where the score differential at halftime is within 10 points. Of the subset of these games with one-point gaps at halftime, the team that's behind won 51.3% of the time. To get a standard error on this, I need to know the number of such games; let me approximate this by 6572/10=657. The s.e. is then .5/sqrt(657)=0.02. So the simple empirical estimate with +/- 1 standard error bounds is [.513 +/- .02], or [.49, .53]. Hardly conclusive evidence!
Given this tiny difference of less than 1 standard error, how could they claim that "being slightly behind increases a team's chance of winning . . . by between 5.5 and 7.7 percentage points"?? The point estimate looks too large (6.6 percentage points rather than 1.3) and the standard error looks too small.
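The back-of-the-envelope calculation above is easy to reproduce. Here's a minimal sketch in Python, using only the numbers quoted in the post; the count of one-point games is my rough guess of 6572/10, not a figure from the paper, and the variable names are just for illustration.

```python
import math

n_games_total = 6572               # games within 10 points at halftime (from the paper)
n_one_point = n_games_total / 10   # rough guess from the post, not an actual count
p_hat = 0.513                      # share of one-point games won by the trailing team

# Conservative binomial standard error, plugging in p = 0.5
se = 0.5 / math.sqrt(n_one_point)

# Simple +/- 1 standard error bounds around the empirical estimate
lower, upper = p_hat - se, p_hat + se

# How far the estimate sits from 50%, in standard-error units
z = (p_hat - 0.5) / se

print(f"s.e. = {se:.3f}")
print(f"+/- 1 s.e. interval: [{lower:.2f}, {upper:.2f}]")
print(f"distance from 0.5: {z:.2f} standard errors")
```

If the true number of one-point games is larger than my guess, the standard error shrinks, but it would take roughly ten times as many games to bring it below 1 percentage point.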
What went wrong? A clue is provided by this picture:

As some of Wolfers's commenters pointed out, this graph is slightly misleading because all the data points on the right side are reflected on the left. The real problem, though, is that what Berger and Pope did is to fit a curve to the points on the right half of the graph, extend this curve to 0, and then count that as the effect of being slightly behind.
This is wrong for a couple of reasons.
First, scores are discrete, so even if their curve were correct, it would be misleading to say that being behind increases your chance of winning by 6.6 percentage points. Being behind by a point takes you from a 50% chance of winning at a differential of 0 (the way they set up the data) to 51% (+/- 2%). Even taking the numbers at face value, you're talking 1%, not their claimed 5% or more.
Second, their analysis is extremely sensitive to their model. Looking at the picture above--again, focusing on the right half of the graph--I would think it would make more sense to draw the regression line a bit above the point at 1. That would be natural but it doesn't happen here because (a) their model doesn't even try to be consistent with the point at 0, and (b) they do some ridiculous overfitting with a 5th-degree polynomial. Don't even get me started on this sort of thing.
What would I do?
I'd probably start with a plot similar to their graph above, but coding score differential consistently as "home team score minus visiting team score." Then each data point would represent different games, they could fit a line and see what they get. And I'd fit linear functions (on the logit scale), not 5th-degree polynomials. And I'd get more data! The big issue, though, is that we're talking about maybe a 1% effect, not a 7% effect, which makes the whole thing a bit less exciting.
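To make the "linear on the logit scale" suggestion concrete, here's a rough sketch of such a fit, written in plain Python so it runs without any packages. Everything in it is an assumption for illustration: the counts are synthetic (generated from a made-up curve, not taken from Berger and Pope's data), the names `fit_logit_line`, `TRUE_A`, and `TRUE_B` are hypothetical, and the fitting routine is a bare-bones Newton's method rather than whatever software one would use in practice.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_logit_line(diffs, games, wins, n_iter=25):
    """Fit Pr(home win) = inv_logit(a + b * diff) to grouped counts by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient (g_*) and information matrix (h_*) of the binomial log-likelihood
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for d, n, w in zip(diffs, games, wins):
            p = inv_logit(a + b * d)
            resid = w - n * p
            weight = n * p * (1.0 - p)
            g_a += resid
            g_b += resid * d
            h_aa += weight
            h_ab += weight * d
            h_bb += weight * d * d
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Synthetic grouped data, NOT the real basketball data: 300 games at each
# halftime differential (home minus visitor), with win counts generated
# from an assumed curve with a home-court intercept and a positive slope.
TRUE_A, TRUE_B = 0.4, 0.2
diffs = list(range(-10, 11))
games = [300] * len(diffs)
wins = [round(n * inv_logit(TRUE_A + TRUE_B * d)) for d, n in zip(diffs, games)]

a_hat, b_hat = fit_logit_line(diffs, games, wins)
print(f"intercept = {a_hat:.2f}, slope = {b_hat:.2f}")

# Compare observed and fitted win rates near a tied game
for d in (-1, 0, 1):
    i = diffs.index(d)
    observed = wins[i] / games[i]
    fitted = inv_logit(a_hat + b_hat * d)
    print(f"diff {d:+d}: observed {observed:.3f}, fitted {fitted:.3f}")
```

With real data, the interesting quantity would be the gap between the observed and fitted win rates at a differential of -1, which makes it explicit that any "effect of being behind" is a comparison to a model.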
P.S. It's cool that Berger and Pope tried to do this analysis. I also appreciate that they attempted to combine sports data with a psychological experiment, in the spirit of the (justly) celebrated hot-hand paper. I like that they cited Hal Stern. And, even discounting their exaggerated inferences, it's perhaps interesting that teams up by 1 point at halftime don't do better. This is just what happens when studies get publicized before peer review. Or, to put it another way, the peer review is happening right now! I've put enough first-draft mistakes on my own blogs that I can't hold it against others when they do the same.
P.P.S. Update here.
I got this bit of spam in the email but it's actually sort of cool, would be an excellent topic for discussion in an intro stat class or a Bayesian class:
MEDIA ALERT: NCAA COLLEGE BASKETBALL TOURNAMENT - MARCH MADNESS NCAA College Basketball Tournament Bracket-Picking Tips. RJ Bell of Pregame.com, the top Las Vegas based sports betting authority, provides a simple blueprint to improve anyone's bracket results.
Carl Bialik writes:
There hasn't been a single 7-3 finish in the NFL since the league adopted the two-point conversion rule in 1994 . . . "Football scores are funny," Driner wrote me [Bialik] in an email. "Did you know that teams win more often when they score 13 points than when they score 14? It's a cause-effect thing. In order to get 13, you (usually) need two field goals. And teams don't kick field goals if they're down by 20 points. So teams lose 35-14 more often than they lose 35-13. That's why scoring 13 is better correlated with winning than scoring 14 is."
And, most amazingly,
An NFL game hasn't finished with a score of 7-0 in over a quarter-century.
More boringly, the most common final score is 20-17.
Brad Miner writes:
With the Super Bowl coming up this weekend, I [Miner] want to write about sports, which I consider a key to building a larger conservative coalition in America. . . . If you did a survey of the political philosophies of 75,000 randomly selected Americans you'd expect the usual--if somewhat mystifying--results: "Only about one-in-five Americans currently call themselves liberal (21%), while 38% say they are conservative and 36% describe themselves as moderate." So said the folks at Pew Research, and this was after the November election.
Do that same poll among the fans at Raymond James Stadium in Tampa on Sunday and the results would likely be more like 15% liberal, 30% moderate, and 50% conservative. And a bunch of those liberals would probably be gun owners.
Obviously those numbers are just speculation on my part, but I guarantee that Steelers fans are more conservative than all Pennsylvanians and ditto Cardinals devotees and the rest of Arizona. Which is not to say that these folks cast their ballots in November more for McCain than Obama. That's the problem.
What do the data say?
Yu-Sung and I looked at the "attended sporting event in the past year" item in the General Social Survey. (Unfortunately, the question was only asked once, in the 1993-1996 survey.) 56% of respondents said they "attended an amateur or professional sports event" during the past twelve months. How do they differ from the 44% who didn't?

So, at least in the mid-1990s, sports attenders were quite a bit more Republican than other Americans (the categories in the graph above are Strong Democrat, Democrat, lean Democrat, Independent, lean Republican, Republican, strong Republican), but not much different in their liberal-conservative ideology.
So these data do not appear to support Miner's claim. Miner expected sports fans to label themselves as more conservative but maybe not to be more likely to vote Republican; actually, sports fans were more likely to call themselves Republican but no more likely to describe themselves as conservative.
Some other issues:
1. The sporting event attended could be the Super Bowl or your kid's soccer game. Maybe more dramatic results would be obtained by considering a more restricted group of sports fans.
2. There are lots of surveys of TV watching, so I'm sure there are tons of data that would let you crosstab ideology, voting, and spectator sports watching.
3. More generally, we never want to rely too strongly on just one survey. Still, it's fun to look.
P.S. Sometimes people ask me how much time blogging takes me. This took about an hour: 15 minutes for me to read Miner's article and think about it, 10 minutes for Yu-Sung to get the crosstabs, 20 minutes for me to make the graphs, and 15 minutes for me to write the blog entry.
And, yes, this means I have a lot of real work that I've been putting off. . . .
Kenny sent me this article by Bill James endorsing Hal "Bayesian Data Analysis" Stern's dis of the BCS. I'd like to add a statistical point, which is a point that Hal and I have discussed once or twice: There is an inherent tension between two goals of any such rating system:
1. Ranking the teams by inherent ability.
2. Scoring the teams based on their season performance.
Here's an example. Consider two teams that played identical opponents in the season, with team A having a 12-0 record and team B going 9-3. But here's the hitch: in my story, team B actually had a much better point differential than team A during the season. That is, team A won a bunch of games by scores of 17-16 or whatever, and team B won a bunch of games 21-3 (with three close losses). Also assume that none of the games were true run-up-the-score blowouts.
In that case, I'd expect that team B is actually better than team A. Not just "better" in some abstract sense but also in a predictive sense. If A and B were playing some third team C, I'd guess (in the absence of other information) that B's probability of winning is greater than A's.
But, as a matter of fairness, I think you've gotta give the higher ranking to team A. They won all 12 games--what more can you ask?
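The two criteria in this story can be put side by side with a few lines of Python. This is only a sketch: the 17-16 and 21-3 scores come from the story above, but the 14-17 score for team B's close losses and the function names are invented for illustration.

```python
# Hypothetical seasons as (points for, points against) pairs
team_a = [(17, 16)] * 12                  # 12-0, every win by a single point
team_b = [(21, 3)] * 9 + [(14, 17)] * 3   # 9-3, big wins plus three close losses (assumed score)

def record(games):
    """Return (wins, losses) for a list of (points_for, points_against) games."""
    wins = sum(1 for pf, pa in games if pf > pa)
    return wins, len(games) - wins

def mean_margin(games):
    """Average point differential per game."""
    return sum(pf - pa for pf, pa in games) / len(games)

for name, games in (("A", team_a), ("B", team_b)):
    w, l = record(games)
    print(f"Team {name}: {w}-{l}, average margin {mean_margin(games):+.2f}")

# Rank by season performance (wins) versus by a crude ability proxy (margin)
by_record = max(("A", team_a), ("B", team_b), key=lambda t: record(t[1])[0])[0]
by_margin = max(("A", team_a), ("B", team_b), key=lambda t: mean_margin(t[1]))[0]
print(f"best by record: {by_record}, best by margin: {by_margin}")
```

The two rankings disagree, which is the whole point: a reward-based ranking and a prediction-based ranking are answering different questions.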
OK, you might say you could resolve this particular problem by only using wins/losses, not using score differentials. But this doesn't really solve the general problem, where teams have different schedules, maybe nobody went 12-0, etc.
My real point with this example is not to recommend a particular ranking strategy but to point out the essential tension between inference and reward in this setting. That's why, as Hal notes, it's important to state clearly what are the goals.
P.S. It's been argued that a more appropriate system is to change the rules of football to make it less damaging to the health of the players (see here for a review of some data). I certainly agree that this is a more important issue than the scoring system. In statistics we often use sports examples to illustrate more general principles, but it is always good to be aware of the reality underlying any example. It also makes sense to me that people who are closer than I am to the reality of the situation would be less amused by the thoughts of Bill James and others about the intellectual issues in the idealized system.
This article, by Jim Stallard, is just hilarious. It's at the intersection of politics and basketball. There are just so many funny lines here, I just don't know where to start.
Regarding my article on the boxer, the wrestler, and the coin flip, Steve Hsu writes:
A world class wrestler would easily demolish a top boxer in a no holds barred fight. This has been verified in many experiments (Inoki-Ali doesn't count)!
Steve has more details in this blog entry from 2007:
Ultimate fighting has grown from obscurity to unbelievable levels of popularity. It will soon surpass boxing as the premier combative sport. And it will soon be widely recognized that the baddest man on the planet is not a boxer, but an ultimate fighter. . . . Unarmed single combat -- mano a mano, as they say -- has a long history, and is a subject which fascinates most men, both young and old. As a boy, I can remember serious discussions with my friends concerning which style was most effective -- karate or kung fu, boxing or wrestling, etc. How would Muhammed Ali fare against an Olympic wrestler or Judo player? What about Bruce Lee versus a Navy Seal? Of course, these discussions were completely theoretical, akin to asking whether Superman could beat Galactus in arm wrestling. There was scarcely any data available on which to base a conclusion.
However, thanks to the recent proliferation of "No Rules" or "No Holds Barred" (NHB) fighting tournaments, both in the U.S. and abroad, we finally have some interesting answers to this ancient question.
Somebody asked me for the golf putting data from Don Berry's book, which Deb and I use as an example for nonlinear modeling in this article and our Teaching Statistics book. Here they are:
Yair sends in this plot of the week:

He writes:
This displays the smoothed distribution of shots taken by wing players for the Phoenix Suns in the '07-'08 regular season (Matt Barnes played for the GS Warriors that year). Raja Bell seems like the perfect wing player for the Suns, because he plays defense and then basically sits at the 3-pt line waiting for Steve Nash to give him the ball for a good shot. Leandro Barbosa is similar, but he drives a bit more (especially when Nash is off the floor). Grant Hill didn't fit this mold because he has no 3-pt shot; he is more of a mid-range guy. From this standpoint, Matt Barnes (their free-agent pickup) looks like he could be a better fit. Of course, this plot says nothing about whether he actually hits the threes, but at least his heart is in the right place. Then again, if their offensive system changes because of the new coach, all bets are off.
Pretty graphs, huh? The color scheme seems good for a team called the Suns.
Jim points me to this article by Don Berry, which argues that studies of doping in sports often don't correctly perform probability calculations.
Andrew Oswald sent me this paper by Amanda Goodall, Lawrence Kahn, and himself, called "Why Do Leaders Matter? The Role of Expert Knowledge." Here's the abstract:
Why do some leaders succeed while others fail? This question is important, but its complexity makes it hard to study systematically. We draw on a setting where there are well-defined objectives, small teams of workers, and exact measures of leaders' characteristics and organizational performance. We show that a strong predictor of a leader's success in year T is that person's own level of attainment, in the underlying activity, in approximately year T-20. Our data come from 15,000 professional basketball games and reveal that former star players make the best coaches. This expert knowledge effect is large.
My first thought upon seeing this paper was: What about Isiah Thomas? But a glance through reveals that their data end at 2004, before Isiah took up his Knicks coaching job.
More seriously, Goodall et al.'s findings seem to contradict the conventional wisdom in baseball that the best managers are the mediocre or ok players such as Earl Weaver and Casey Stengel rather than the superstars such as Ted Williams and Ty Cobb. I'd be interested to hear what the authors think about this.
Scatterplot, please! It's not just about an eye-catching result; it's about building confidence in your findings
I won't bother to give my comments on the tables and graphs (except to note that the figures are hard to read for many reasons, starting with the fact that these are bar graphs with lower bounds at 0.4 (?), 0.6 (??), etc.).
What I will say, though, is that I'd like to see a scatterplot, with a dot for each coach/team (four different colors for the four categories of coaches), plotting total winning percentage (on the y-axis) vs. winning percentage in the year or two before the coach joined the team (on the x-axis). This is the usual before-after graph, which can then be embellished with 4 regression lines in the colors corresponding to the four groups of coaches.
When reading such an analysis, I really, really want to see the main patterns in the data. Otherwise I really have to take the results on trust. This is related to my larger point about confidence building.
Following up on our link to an article about educational measurement, Eric Loken pointed me to this:
On the Criteria Corporation blog we [Loken] just posted a look at golf tournament scores. If you take the four rounds as if they were four repeats of the same test, or four parallel items on a test, the usual psychometric analyses would yield a terrible reliability coefficient. The problem of course is restriction of range of true scores among the world's best golfers. We figured since the US Open (this weekend) is sometimes called the Ultimate Test we'd offer a little psychometric analysis of golf.
Despite having published an article on golf, I know almost nothing about the sport--I've never actually played macro-golf--so I'll link to Eric's note without comment.
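For readers who want the restriction-of-range point spelled out, here's a small sketch using the standard classical-test-theory formulas: single-round reliability as the share of score variance due to true differences, and the Spearman-Brown formula for the sum of several parallel rounds. The standard deviations are hypothetical numbers I picked for illustration, not estimates from any tournament, and this is not necessarily the analysis Loken ran.

```python
def reliability(sd_true, sd_error):
    """Single-round reliability: share of score variance due to true differences."""
    return sd_true**2 / (sd_true**2 + sd_error**2)

def spearman_brown(r, k):
    """Reliability of the sum of k parallel rounds, given single-round reliability r."""
    return k * r / (1 + (k - 1) * r)

SD_ERROR = 3.0   # assumed round-to-round noise, in strokes (hypothetical)

# Same noise, very different spread of true ability (both values hypothetical)
fields = {
    "elite tournament field": 1.0,   # true scores tightly bunched
    "all recreational golfers": 10.0,  # true scores widely spread
}

for label, sd_true in fields.items():
    r1 = reliability(sd_true, SD_ERROR)
    r4 = spearman_brown(r1, 4)
    print(f"{label}: one round {r1:.2f}, four rounds {r4:.2f}")
```

Under these assumptions the same four-round "test" looks terrible among the elite field and excellent among golfers in general, even though the measurement noise is identical.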

See here. It took me 3 weeks the first time, about 1 week the second time. I remember setting my alarm to 5am so I could work on the cube for two hours in the morning before going to school. Eventually I got my time down to a little over 2 minutes (which is just about the longest I can concentrate on anything). There were two kinds of cube solvers: those who held the cube in a stationary orientation and spun the edges around, and those who kept turning the cube around in their hands to get just the right orientation for each move. I was of this second type, which I think kept my efficiency down. One of my math professors in college told me that he'd solved the cube in theory--he taught abstract algebra--but had never bothered to do it in practice. This impressed me to no end. A guy down the hall from me had a 4x4x4 cube, which at one point we tried to see if we could solve using only 3x3x3 operators. I don't think we succeeded.
It's been years since I've done the cube. Last time I tried and tried and tried and got stuck. If I ever want to do it again, I think I'll have to figure out some operators again from scratch.
