
See here for what I'm talking about.

Serena Ng sends along this paper by Emanuel Moench, Simon Potter, and herself. Here's the abstract:
Tom Ferguson writes:
At the Democratic Convention, perhaps the most memorable line was by one Barney Smith, who said that he wanted a candidate who cared more about Barney Smith than Smith Barney. Just for the record, Smith Barney is owned by Citigroup (it's Salomon Smith Barney). We all know who sits at the top of Citigroup: one Robert Rubin. The director of research for the Obama campaign is Jason Furman, who is closely associated with Rubin and the latter's Hamilton Project. Citigroup's cash is split massively in favor of Obama: about $400,000 to $260,000 or so.
Overall, the richest Americans give much more to Republicans than to Democrats, but the financial services industry is one of the Democrats' strengths.
Ubs writes that the Republican vice-presidential nominee, Sarah Palin, is extremely popular in her home state of Alaska because of her bipartisan competence. I think Ubs has some interesting things to say, both about Alaska politics and about competence and ideology in general. But I think he may be overinterpreting the poll data on her popularity.
I'll copy over some of Ubs's words and then give my thoughts:
Ubs says:
Holy crap, he actually did pick her! I've had a long, half-composed Sarah Palin post in the back of my mind since about May, when I first saw her mentioned as a running-mate candidate. . . . My initial reaction was that this tells me McCain doesn't expect to win. . . . But then I have to pause ... because my own opinion, in fact, is that Palin would probably be a pretty good president. . . . The most interesting part of that formula -- and unfortunately the part we'll probably hear the least about -- is "popular governor" part. Sarah Palin is not just popular. She is fantastically popular. Her percentage approval ratings have reached the 90s. Even now, with a minor nepotism scandal going on, she's still about 80%. . . . How does one do that? You might get 60% or 70% who are rabidly enthusiastic in their love and support, but you're also going to get a solid core of opposition who hate you with nearly as much passion. The way you get to 90% is by being boringly competent while remaining inoffensive to people all across the political spectrum.
Bipartisanship is a perpetual topic in political punditry, but it is distorted by the media environment. Due to the nature of what makes a story, the news media thrives on partisanship. Everything is viewed through partisan-colored glasses. . . . The real significance of Gov. Palin's success and her phenomenal approval ratings is that they demonstrate her genuine talent as a non-partisan.
I was looking up the governors' popularity numbers on the web, and came across this page from Rasmussen Reports which shows Sarah Palin as the 3rd-most-popular governor. But then I looked more carefully. Janet Napolitano of Arizona is viewed as Excellent by 28% of respondents, Good by 27%, Fair by 26%, and Poor by 27%. That adds up to 108%! What's going on? I'd think they would have a computer program to pipe the survey results directly into the spreadsheet. But I guess not; someone must be entering these numbers by hand. Weird.
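The kind of automated check that would catch this is easy to sketch. The following is only a minimal illustration: the function name, the rounding tolerance, and the data layout are assumptions made here, not anything the pollster is known to use.

```python
def flag_bad_rows(rows, tolerance=2.0):
    """Return (name, total) for rows whose category percentages differ
    from 100 by more than `tolerance` points (an allowance for rounding)."""
    bad = []
    for name, percentages in rows.items():
        total = sum(percentages.values())
        if abs(total - 100.0) > tolerance:
            bad.append((name, total))
    return bad

# The Napolitano row as quoted in the paragraph above.
ratings = {
    "Napolitano": {"Excellent": 28, "Good": 27, "Fair": 26, "Poor": 27},
}

flagged = flag_bad_rows(ratings)
print(flagged)
```

A tolerance of a couple of points allows for honest rounding of four categories; a total of 108 is well outside it.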
P.S. Mark Blumenthal writes that the question of whether to trust Rasmussen is complicated:
On the one hand (as Charles Franklin, Nate Silver and others can attest) their final polls in statewide races usually score as well or better than other pollsters on measures of accuracy. On the other, they break a lot of the rules: They now seem to prefer to do one night samples, make no call backs to non-contacted numbers (even with multi-night polls) and make no effort to randomly select a respondent within each household. Their questionnaire design choices can be...unusual.
And he links to this column which is relevant.
Sanjay Kaul points to this interesting news article by Matthew Herper about statistical controversies involving the drug Vytorin:
A 1,873-patient study called SEAS found there were 50% more cancers among patients who took Vytorin than those who received placebo. Researchers involved in the study put together a hastily organized, company-funded press conference on July 21 to release the data. There, Richard Peto, an Oxford University statistician, quieted the cancer scare before it really began. He pooled data from two much larger ongoing studies of Vytorin and said they showed that the cancer risk was a statistical fluke. He called the contention that Vytorin could cause cancer "bizarre." . . .
If you live in NYC, you can hear me tomorrow (Fri 29 Aug) from 12.30-1 on the Leonard Lopate show on WNYC, 93.9 FM and AM 820. I'll be talking about our Red State, Blue State book.
P.S. The interview is here.
Below are two long questions about variance components and multilevel models.
Shinichi Nakagawa forwards these questions from Holger Schielzeth:
Shinichi Nakagawa sent along this, by Josep Carrasco and Lluis Jover. I haven't looked at it but it might be useful.
This is pretty funny. J. Robert Lennon reports that "Random House has decided to insert the following clause into its boilerplate contract for children's authors:"
If you act or behave in a way which damages your reputation as a person suitable to work with or be associated with children, and consequently the market for or value of the work is seriously diminished, and we may (at our option) take any of the following actions: Delay publication / Renegotiate advance / Terminate the agreement.
I'm wondering if this could be included in contracts for statistics books. If, for example, you publish an article in a leading statistical journal in which you have an unreadable graph, or present results to 8 significant digits when 2 will do, or if you run your simulations for a million iterations without even trying to see if 1000 would've sufficed, then your textbook could get yanked off the shelves!
P.S. No, publishing a false theorem shouldn't count.
See here for Jeremy's comments to my comments. I agree with what he writes. The whole discussion reminds me of a comment made to me once by a statistician who generally works with engineers. He said that when he talks with people about statistical procedures, engineers focus on the algorithm being applied to the data, whereas statisticians are always thinking about the psychology of the person doing the analysis.
According to Wikipedia the current population of South Ossetia is 70,000.
SIAM Review asked me to review Jeff Gill's new book (Bayesian Methods: A Social and Behavioral Sciences Approach, second edition) but they said they'd like a general review essay that would be of interest to their readers, not a mere Siskel-and-Ebert on the book itself. Below is my first draft. Any comments would be appreciated.
Regarding the question of what to call x and y in a regression (see comments here), David writes, "The semantics are ugly, and don't really add much, because we are concerned with the relation of one to the other, not what they themselves are."
I agree that the semantics don't really add much, but they can subtract, I think! First off, the words "dependent" and "independent" sound similar and can lead to confusion in conversation. Second, as commenter Infz noted, people confuse "independent variables" with statistical independence, leading to the incorrect view that multiple regression requires the predictors to be independent.
I agree, though, that the term "parameter" can be confusing; sometimes it's something that you can vary and sometimes it's something you can estimate. And I've already discussed how "marginal" has opposite meanings in statistics and in economics.
John Kastellec sent me this attractive paper:
We [Kastellec et al.] study the relationship between state-level public opinion and the roll call votes of senators on Supreme Court nominees. Applying recent advances in multilevel modeling, we use national polls on nine recent Supreme Court nominees to produce state-of-the-art estimates of public support for the confirmation of each nominee in all 50 states. We show that greater public support strongly increases the probability that a senator will vote to approve a nominee, even after controlling for standard predictors of roll call voting. We also find that the impact of opinion varies with context: it has a greater effect on opposition party senators, on ideologically opposed senators, and for generally weak nominees. These results establish a systematic and powerful link between constituency opinion and voting on Supreme Court nominees.
Another triumph of the Lax/Phillips approach of linking policy to state-level opinion (see also here). Also another example of the synergy that's supposed to happen with an academic department, with Jeff, Justin, John, and myself each bringing unique contributions. I don't think any of this would've happened if we weren't all brought together with repeated interactions on the 7th floor.
Someone writes:
Chris Paulse points to this interesting slideshow.
I realized while reading Red State, Blue State, Rich State, Poor State that I hadn’t seen a book with so many charts and graphs since I struggled through economics and statistics--and that if the textbooks back then had been as interesting as Andrew Gelman’s analysis of the American electorate, I might have done better in college. . . . But how do the Democrats manage to win in the rich states without winning rich voters? This is the Freakonomics-style analysis that every candidate and campaign consultant should read. . . .
That was our aim. . .
I find the National Weather Service display to be much more useful than weather.com and other commercial sites. But its city-search finder is terrible. Go here and enter "Atlanta" and see what you get. It's a list of about 20 cities. And they don't even list Atlanta, Georgia first. It's buried in there, about fifteenth in the list. What's that all about?
Juli linked to this study about Friday the 13th not being more unlucky:
A study published on Thursday by the Dutch Centre for Insurance Statistics (CVS) showed that fewer accidents and reports of fire and theft occur when the 13th of the month falls on a Friday than on other Fridays. . . . In the last two years, Dutch insurers received reports of an average 7,800 traffic accidents each Friday, the CVS study said. But the average figure when the 13th fell on a Friday was just 7,500.
Datacharmer recently made a good comment on this:
Apart from avoiding risky behaviour on Friday the 13th because it is deemed unlucky (which might well be happening), you should also consider that Friday the 13th - unlike other Fridays - CAN'T be Christmas or New Year's (where people get drunk and drive), and it will also be associated with a lower (or higher) probability of falling before a bank holiday weekend (or I guess in the States Independence day, etc). I guess all I'm saying is that it could well be other factors driving this result other than a change in people's behaviour because Friday the 13th is 'unlucky'.
How about accidents on Friday the 12th or Friday the 14th? The article only compares Friday the 13th with an average Friday - in fact, it doesn't even reveal whether the 13th is the least accident-prone Friday in the book...
Howard Wainer writes:
A friend sent me this USA Today article with a graph about HIV:
Radford's a leading researcher in statistical computing. He started a new blog. Radford writes:
Many of my technical posts will point out flaws in research, methods, and tools that are commonly used. Such negative comments are essential to the scientific enterprise, being the social counterpart of the crucial role of self-criticism in individual research. I find that one of the main challenges in supervising PhD students is getting them to constantly ask whether their results, or the results of others, might be wrong. The aim in research is to discard bad ideas quickly, and with minimal effort. For this a blog is much more efficient than formal publication.
I hope he posts some positive things too. I mean, our blog here could be all Kanazawa all the time, and that would be fun for a while, but generally it's much more of a challenge to make a positive contribution than a negative contribution, so I hope Radford applies his blogging talents, as he applies his research talents, to this area.
P.S. Here's my favorite Radford Neal quote. It's from his Ph.D. thesis:
Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well.
I have mixed feelings about this picture

and accompanying note of Jeremy Freese, who writes:
Key findings in quantitative social science are often interaction effects in which the estimated “effect” of a continuous variable on an outcome for one group is found to differ from the estimated effect for another group. . . . Interaction effects are notorious for being much easier to publish than to replicate, partly because it is easy for researchers to forget (?) how they tested many dozens of possible interactions before finding one that is statistically significant and can be presented as though it was hypothesized by the researchers all along. . . . There are so many ways of dividing a sample into subgroups, and there are so many variables in a typical dataset that have low correlation with an outcome, that it is inevitable that there will be all kinds of little pockets for high correlation for some subgroup just by chance.
I take his point, and indeed I've written myself about the perils of fishing for statistical significance in a pond full of weak effects (uh, ok, let's shut down that metaphor right there). And I even cite Freese in my article.
On the other hand, I'm also on record as saying that interactions are important (see also here).
I guess my answer is that interactions are important, but we should look for them where they make sense. Jeremy's graph reproduced above doesn't really give enough context. Also, remember that the correlation between before and after measurements will be higher among controls than among treated units.
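Freese's point about chance "pockets" of correlation can be illustrated with a small simulation. This is a sketch under assumptions chosen here (200 subgroups of 50 units each, pure noise, a Fisher z test at the 5% level); it is not an analysis of any real dataset.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def count_spurious(n_subgroups=200, n_per_group=50, seed=1):
    """Count subgroups whose correlation looks 'significant' even though
    x and y are generated independently in every subgroup."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_subgroups):
        xs = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ys = [rng.gauss(0, 1) for _ in range(n_per_group)]  # independent of xs
        r = correlation(xs, ys)
        z = math.atanh(r) * math.sqrt(n_per_group - 3)  # Fisher z statistic
        if abs(z) > 1.96:
            hits += 1
    return hits

hits = count_spurious()
print(f"{hits} of 200 pure-noise subgroups pass the 5% threshold")
```

With a 5% threshold one expects roughly 10 of the 200 noise-only subgroups to clear the bar, which is the sense in which some "finding" is inevitable if enough subgroups are examined.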
I encountered this. It writes, of our blog and Econlog:
Comments require pre-approval. Here's where the bullshit really starts. I only understand this if there's a clear policy that it's to reduce liability and to prevent posting of illegal, defamatory, or commercially exploitive materials. I don't see such an explicit policy with these blogs. So the feeling is that any comment posted has to also meet the threshold in being something the blog owner is comfortable with. Yuck.
Actually, we require pre-approval because we sometimes get tons of spam. Going in and approving comments once a day (or more frequently if I feel like having more distractions) seemed better than going in and deleting spam once a day. But, I don't know, maybe this other approach is better. There was a time when we were getting dozens of spam comments per day but now we only get a couple of spam comments per day that slip through the filter.
Igor Carron forwards this (see the second item in the linked page). I don't know anything about this but it might be of interest to some of you.
"Miseducation" was so awesome, how come Lauryn Hill has done essentially nothing since then?
Mike Kruger pointed me to this site which estimates the probability you're female and the probability you're male based on your browsing history. It estimates the probability that I'm male as 66%. I don't really know what these "probabilities" are, though. They're between 0 and 1, but I doubt they're calibrated to give a direct probability interpretation. (For example, if you took everybody whose claimed probabilities were 66%, would 66% of these people actually be male?)
And here are Mike's comments. He calls the method "Bayesian" but I'm not so sure.
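The calibration question in the parenthetical above can be made concrete. The sketch below bins claimed probabilities and compares each bin's average claim with the observed frequency. The data are simulated so that they are calibrated by construction; the function name and bin count are choices made here, and nothing in it reflects how the site actually computes its numbers.

```python
import random

def calibration_table(probs, outcomes, n_bins=10):
    """Group predictions into equal-width bins and return, for each
    non-empty bin, (mean claimed probability, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((mean_p, observed, len(members)))
    return table

# Hypothetical data: each outcome is 1 with exactly the claimed probability.
rng = random.Random(0)
probs = [rng.random() for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in probs]

table = calibration_table(probs, outcomes)
for mean_p, observed, n in table:
    print(f"claimed {mean_p:.2f}  observed {observed:.2f}  n={n}")
```

For calibrated scores the two columns track each other; for scores that are merely numbers between 0 and 1, they need not.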
In an article on U.S. foreign policy and domestic politics, Samantha Power writes:
Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm.
So here's the difference between qualitative and quantitative researchers. Samantha Power knows more about foreign policy and politics than I'll ever know. But she could whip off the above sentence without pause. Whereas, when I see it, I think:
- Why start in 1968? Is this just a convenient choice of endpoint? Eisenhower ran as a national security expert, no?
- What evidence can you expect to get about public opinion from the essentially tied elections of 1968, 1976, and 2000?
- Anyway, if you're talking public opinion, it was Gore who won more votes in 2000--so it's funny to be taking that as an exception at all!
- How are "perceived danger" and "relative calm" defined? Was 1988, when George H. W. Bush floored Michael Dukakis, really such a time of "perceived danger"?
I have no expertise to comment on the rest of Power's article; I just think it's funny that she'd throw in a sentence like that. It's just a throwaway comment she made; I wouldn't put it in the class of David Runciman's "but viewed in retrospect, it is clear that it has been quite predictable" or John Yoo writing an entire op-ed on something he appears to know nothing about. It's just one of these things that rings alarm bells to a "quant" such as myself but just passes right by the qualitative analyst.
P.S. On an unrelated note, that same issue of the New York Review of Books had this great line by Michael Dirda: "Real readers always read for excitement; only the nature of that excitement changes through life."
Is this stuff useful? (Link from John Cook.) I pretty much think that any idea in programming can also apply to statistics (not to mention the programming that goes into statistics).
Several years ago I was at the Library of Congress and asked where to go to get to the stacks. The guard told me that the stacks were closed. I asked, when did that happen? He replied that the Library of Congress had never had open stacks. The funny thing is, I knew he was wrong, because in high school I went to the Library of Congress a couple of times and I remember roaming the stacks, which were positioned sort of like spokes in a wheel. It was so cool to go to the stacks and see all the books written by an author. (I also remember looking for the book, "Get Even: The Complete Book of Dirty Tricks," which was in the card catalog--remember those?--but not on the shelves. But only Members of Congress can check books out of the Library of Congress. Hmmm.....) It's annoying how people can be so sure of themselves. The guard had probably been working there 10 years and so he thought he knew everything about the place.
Sandy Gordon sent me this paper, which begins:
The 2007 U.S. Attorney firing scandal has raised the specter of political bias in the prosecution of officials under federal corruption laws. Has prosecutorial discretion been employed to persecute enemies or shield allies? To answer this question, I [Gordon] develop a model of the interaction between officials deciding whether to engage in corruption and a prosecutor deciding whether to pursue cases against them. Biased prosecutors will be willing to file weaker cases against political opponents than against allies. Consequently, the model anticipates that in the presence of partisan bias, sentences of prosecuted opponents will tend to be lower than those of co-partisans. Employing newly collected data on public corruption prosecutions, I find evidence of partisan bias under both the Bush and Clinton Justice Departments. However, additional analysis suggests that these results may understate the extent of bias under Bush, while overstating it under Clinton.
Interesting. This reminds me of Bill James's comment that Major League Baseball's discrimination against blacks could be seen by the fact that black players had much better statistics than whites: under a discriminatory regime, they were taking marginal white players who were worse than the marginal black players. It's also similar to what we found in Section 5.3 of our stop-and-frisk paper: the whites who were stopped were more likely than the blacks to be arrested, which suggests that police were disproportionally stopping minorities, at least with regard to this measure.
Gordon writes,
Employing an approach from economic models of discrimination, I [Gordon] treat partisan bias as a "taste" or preference for prosecuting one's political opponents (or for not prosecuting allies). This approach, pioneered by Becker (1957), has been employed recently to study discrimination against minorities in setting bail (Ayres and Waldfogel 1994; Ayres 2001), racial profiling (Knowles, Persico, and Todd 2001), and discrimination against female candidates in congressional elections (Anzia and Berry 2007).
I actually think this model makes more sense for studying prosecutors (as in the current paper) than for studying racial profiling of police (the subject of my paper with Jeff Fagan and Alex Kiss). Without any direct knowledge of prosecutors or police, I'm only speculating, but the idea of a "taste" or political pressure to prosecute one side or the other makes sense to me, whereas the idea of a police officer having a "taste" for stopping one race or the other sounds a little silly. My impression of police stops is that the police use whatever cues they have, and many of these cues are correlated with race. That to me doesn't seem like the same thing as having a preference for stopping racial minorities per se.
Gordon writes, "The 2007 U.S. Attorney firing scandal raised the possibility that federal corruption laws could be deployed for partisan ends. In this paper, I have sought to move beyond anecdotes to construct a systematic test of partisan bias in corruption prosecutions." This makes sense to me. What I'd also like to see is some work bridging the anecdotes to the quantitative results, giving a sense of who are the people being prosecuted that are driving these results.
Larry Wasserman read my response to his comments on my article on Bayesian statistics and had some responses of his own. I'll post Larry's thoughts and then my response to his response etc. Here's Larry:
1. You correctly point out that if there are systematic errors, then confidence intervals will not have their advertised coverage. But this has nothing to do with the point under discussion: Bayes versus frequentist. All methods fail if there is systematic error. That's important but it is beside the point I was making. Assume the model is correct and there is no systematic error. Then what I said is correct. The frequentist method will cover as advertised (by definition) and the Bayesian method, in general, will not. More importantly, and this is what I was really getting at, the hardcore subjectivist Bayesian will say that coverage is irrelevant. The people who think that coverage is completely irrelevant are being scientifically irresponsible in my opinion.
2. "I can dispose of the first two with a reference to Agresti and Coull (1998)." Frequentist methods have correct coverage or they aren't frequentist methods. That's the definition. And when I said "Bayesian methods don't" I meant, there is nothing in Bayesian inference that automatically guarantees, in general, correct coverage. I did not mean that there don't exist Bayesian methods with correct coverage. The fact that (y+1)/(n+2) has good frequentist properties is fine but in this case we're thinking of it as a frequentist estimator. The fact that it happens to be Bayes for some prior isn't what I mean by a Bayesian procedure.
3. You're right that scientists want it both ways: they want coverage AND they would like to interpret a confidence interval as a posterior probability. So what. I'd like to measure an electron's position and its velocity. But I can't. Physics tells us you can measure one or the other but not both. Tough luck for me. If you explain to a scientist that they can have the comfort of a Bayesian interpretation OR coverage but not both, I'll bet most would pick coverage.
4. Estimating the upper .01 quantile. You say there is no frequentist method for this. First of all, there is (as I discuss below). Second, here is a challenge. Find me one example where:
(i) there is no frequentist method to solve a problem
(ii) there is a Bayesian method
(iii) we can trust the Bayesian method.
I claim you cannot do this. Suppose you do find a Bayes procedure. If it has coverage then you have found a valid frequentist method so (i) is not true. If it does not have coverage then you have failed (iii).
But let's look closer at estimating theta = .01 quantile. We can always find the order statistics X_r and X_s so that:
P(X_r < theta < X_s) >= .95
This is an exact, nonparametric statement. So [X_r,X_s] is a valid 95 percent confidence interval. When you say there is no frequentist method, I suspect you mean that this interval is going to be very wide, unless the sample size n is very large. My reply is: great! There is very little information in the data about theta unless n is large. So the interval should be wide. This is a correct representation of the uncertainty. The Bayesian interval will be narrower but this reflects the prior not the data. Of course, if the prior is reliable that's fine. But many Bayesians would simply crank out an interval, see that it is narrower than the frequentist interval and declare victory. They're deluding themselves because they're sweeping the uncertainty under the carpet. I'd rather they use the frequentist interval so that they are aware of the difficulty of the problem.
The same applies to calibration problems etc where one gets huge intervals (sometimes even the whole real line). This is a virtue not a problem. The Bayesian answer hides the problem.
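The order-statistic interval in point 4 can be sketched in a few lines. The idea is standard: for a continuous distribution, the number of observations below the p-quantile theta is Binomial(n, p), so P(X_r < theta < X_s) is a sum of binomial probabilities. The function name and the equal-tails choice of r and s are assumptions made for this illustration, not something specified in the comments above.

```python
import math

def binom_pmf(n, k, p):
    """P(B = k) for B ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def quantile_ci_ranks(n, p, level=0.95):
    """Return (r, s, coverage) with P(X_(r) < theta < X_(s)) = coverage >= level,
    where theta is the p-quantile of a continuous distribution.
    r == 0 stands for -infinity and s == n + 1 for +infinity."""
    alpha = 1 - level
    pmf = [binom_pmf(n, k, p) for k in range(n + 1)]
    # Largest r with P(B <= r - 1) <= alpha / 2.
    r, lower_tail = 0, 0.0
    while r < n + 1 and lower_tail + pmf[r] <= alpha / 2:
        lower_tail += pmf[r]
        r += 1
    # Smallest s with P(B >= s) <= alpha / 2.
    s, upper_tail = n + 1, 0.0
    while s > r + 1 and upper_tail + pmf[s - 1] <= alpha / 2:
        upper_tail += pmf[s - 1]
        s -= 1
    coverage = sum(pmf[r:s])  # P(r <= B <= s - 1)
    return r, s, coverage

for n in (20, 500):
    r, s, coverage = quantile_ci_ranks(n, 0.01)
    print(n, r, s, round(coverage, 3))
```

With n = 20 the lower rank comes back as 0, meaning the interval has no finite lower end, which is the "very wide unless n is large" behavior described above. This direct formula is only meant for moderate n; a much larger sample would need the binomial probabilities computed on the log scale.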
And here's my reply, which I'll divide into two parts: Points of Agreement, and Points of Disagreement.
I flew to Denver, saw some people, went to my session and gave my talk, and flew back. The talk was fun, and in preparing it I had some general thoughts on presentations:
- You don't have to try to impress the audience; just explain what you did. (Hal Stern gave me that advice 20 years ago, and it's still good.)
- When writing articles, I always tell people not to include anything that you don't want people to read. For example, don't display a table full of numbers if you're not expecting to convey some information with each number. Anyway, when preparing my talk, I realized that I hadn't been following my own advice! I went back and looked at each slide and removed lots of material that I couldn't really expect people to be looking at.
- You can't optimize to every audience. In my talk, I chose to make the big picture clear, but that meant less detail on our data and our models. Sometimes I've seen the advice to start broad and then "drill down" to some interesting detail, but in practice you still have to make some choices. It's ok to give a detailed, technical talk, but then you have to accept that people won't be getting the big picture. If it's going to be technical, get into it right away so you'll have time to explain things.
- Plan to end 5 minutes early. Put extra stuff you might need at the end of the presentation (after the slide you'll end with); then you can use it to answer questions if need be.
After the talk, I rode to the airport in a cab with a statistician who said his dad is a political scientist. Who? Steven Rhoads. That's the guy who wrote "The Economist's View of the World"? Yeah. Wow--I love that book. And then on the flight back, the lady sitting next to me took a look at Red State, Blue State, and said she was going to buy a copy for her son, who's an economics student. That was really cool--she'll either buy the book, which is great, or she was just being polite, which isn't so bad either.
I've seen Jennifer Hill and Ed George give great talks on Bayesian additive regression trees. It looked awesome. So why haven't these papers appeared anywhere? All I can find are preprints.
See here and here for his comments and here for my further thoughts.

See here:
We humans seem to be born with a number line in our head. But a May 30 study in Science suggests it may look less like an evenly segmented ruler and more like a logarithmic slide rule on which the distance between two numbers represents their ratio (when divided) rather than their difference (when subtracted).
This is consistent with our analysis in chapter 5 of our book of decisions of Bangladeshis about whether to switch wells because of arsenic in drinking water. Among households with dangerous wells (arsenic content higher than 50 (in some units)), we predicted whether a household switches wells, given two predictors:
- distance to the nearest safe well;
- arsenic level of their existing well.
The data were consistent with the model that people weight "distance to nearest safe well" linearly but weight "arsenic level" on the log scale. As we discuss in our book, this makes psychological sense: distance is something you perceive directly and linearly, by walking (it takes twice as much time and effort to walk 200m as to walk 100m), whereas arsenic level is just a number and, as such, going from 50 to 100 seems about the same, psychologically, as going from 100 to 200 or 200 to 400--even though, in reality, that last jump is four times as bad as the first (arsenic being a cumulative poison).
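To make the linear-versus-log distinction concrete, here is a small sketch. The arsenic levels are the ones in the paragraph above; the logistic function and its coefficient arguments are placeholders for illustration and are not the fitted values from the book.

```python
import math

def log_steps(levels):
    """Differences between successive levels on the natural-log scale."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]

levels = [50, 100, 200, 400]
linear_steps = [b - a for a, b in zip(levels, levels[1:])]
print(linear_steps)        # raw jumps grow: the last is four times the first
print(log_steps(levels))   # each doubling is the same step, log(2)

def switch_probability(distance_m, arsenic, b0, b_dist, b_log_arsenic):
    """Logistic model with distance entering linearly and arsenic entering
    on the log scale. All coefficients are hypothetical placeholders."""
    eta = b0 + b_dist * distance_m + b_log_arsenic * math.log(arsenic)
    return 1 / (1 + math.exp(-eta))
```

Under such a model, doubling the arsenic level shifts the linear predictor by the same amount whether the starting point is 50 or 200, while each extra meter of distance always counts the same.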
You can see a copy of my new book, Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do, at the CRC Press booth. (It's not actually published by CRC but they kindly agreed to bring one copy so people could look at it.)
I'll only be at the Joint Statistical Meetings for a couple of hours. My talk is on Wed at 3pm. (The session goes from 2-4pm.)
See here for a brief description of what we did, or see here for the full paper, where we say:
Could John Kerry have gained votes in the 2004 Presidential election by more clearly distinguishing himself from George Bush on economic policy? At first thought, the logic of political preferences would suggest not: the Republicans are to the right of most Americans on economic policy, and so in a one-dimensional space with party positions measured with no error, the optimal strategy for the Democrats would be to stand infinitesimally to the left of the Republicans. The median voter theorem suggests that each party should keep its policy positions just barely distinguishable from the opposition.
In a multidimensional setting, however, or when voters vary in their perceptions of the parties’ positions, a party can benefit from putting some daylight between itself and the other party on an issue where it has a public opinion advantage (such as economic policy for the Democrats). We set up a plausible theoretical model in which the Democrats could achieve a net gain in votes by moving to the left on economic policy, given the parties’ positions on a range of issue dimensions. We then evaluate this model based on survey data on voters’ perceptions of their own positions and those of the candidates in 2004.
Under our model, it turns out to be optimal for the Democrats to move slightly to the right but staying clearly to the left of the Republicans’ current position on economic issues.
The material is also in chapter 9 of the Red State, Blue State book.
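The one-dimensional baseline described in the excerpt (not the paper's multidimensional model) can be sketched as follows. The standard normal distribution of voters, the Republican position at 1.0, and the function names are all assumptions made for this illustration.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def dem_share(d, r):
    """Vote share of the party at position d when voters are spread along a
    standard normal line and each votes for the nearer party (needs d < r)."""
    return normal_cdf((d + r) / 2)  # everyone left of the midpoint picks d

r = 1.0  # hypothetical Republican position, right of the median voter at 0
positions = [-1.0, 0.0, 0.5, 0.9, 0.99]
shares = [dem_share(d, r) for d in positions]
for d, share in zip(positions, shares):
    print(f"Democrats at {d:+.2f}: share {share:.3f}")
```

In this stripped-down setting the share keeps rising as the Democratic position approaches the Republican one from the left, which is the "stand infinitesimally to the left" logic; the paper's argument is about what changes once there are several issue dimensions and voters perceive positions with error.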
Tyler Cowen links to a news article about David Galenson, an economist who is "convinced that the type of economic analysis that explains the $4-plus gas at the pump can also explain the greatest artists of the last 100 or so years." I assume that this line about gas prices is just something that the reporter added: at least, the factors that explain gas prices seem much different than the factors that explain great art.
The article continues to say that his "statistical approach . . . is based in part on how frequently an illustration of a work appears in textbooks." That sounds cool to me. I'd also like to see some cross-time analysis, since it seems to me that an analysis of textbooks would also be measuring what's currently trendy in art history. The article says that he analyzes 33 textbooks published between 1990 and 2005; I don't know if that's long enough to get enough variation in trendiness. But he should give it a try and not just lump all the years together.
Galenson then says, "Quantification has been almost totally absent from art history. Art historians hate markets." Whoa! How did he jump from "quantification" to "markets"? It sounds like he's limiting himself if he doesn't also apply quantitative methods to non-market situations.
Continuing, Galenson writes, "Important artists are innovators whose work changes the practices of their successors. The greater the changes, the greater the artist." Who says this sort of thing? Is this a way that art historians talk? It sounds like circular reasoning: it's his personal definition of "greatness."
The article then quotes art professor Michael Rushton as saying that, in science or art, "innovation really requires a market." Huh? Wha?? Tell that to my friend Seth, who spent 10 years on self-experimentation. Heck, tell that to the cave painters. Or check out the American Visionary Art Museum.
It's so frustrating: I think much can be learned from quantitative study of just about everything, but why do people have to overreach and say such silly things?
P.S. Galenson's work on the trajectories of artists' work by age looks interesting. I'm reminded of Dick De Veaux's statement, "Math is like music, statistics is like literature": Why are there no six year old novelists? Statistics, like literature, benefits from some life experience.
There was a dynamic discussion on gender differences in performance a few days ago. Many interesting points were raised, but most of them regarded differences in models (variance, mean), rather than differences in distributions.
One of the comments referred to the Project TALENT database from 1960. It's one of the most exhaustive datasets of its type.
I have been unhappy for quite some time because papers do not show the actual data. For that reason I wrote a small plotting program that allows visual comparisons of histograms. The plentiful TALENT data makes it possible to avoid binning or kernel smoothing. Here are some plots:


The pink histogram is for girls, the blue one for boys, and where the pink and blue overlap, there is grey.
It is interesting to observe the skew, which might indicate incentives, learning curves or unbalanced tests. One of the most striking examples of skew is the difference in reading comprehension between the Catholic/Protestant and Jewish populations, but I also list mechanical reasoning:


Project TALENT's data is from 1960, so things might have changed since then. Nowell & Hedges discuss some trends from 1960-1994.
In the end, let me reiterate that this posting does not make any statements about the causality of these differences - I am merely providing the data as such. The only assumptions were that the missing values can be dropped (boys were overrepresented in this respect) and that both underlying populations are comparable (no systematic effects with respect to extraneous biases such as age).
I did NOT observe boys being overrepresented on the low end of the spectrum for mathematics scores - but this could easily happen if one isn't careful throwing out the missing values coded with "-1" (5.4% among boys, 4.4% among girls).
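To make the missing-value point concrete, here is a minimal sketch of the kind of comparison described above: drop the "-1" codes first, then tabulate one frequency per distinct score so that no binning or smoothing is needed. The function names and the tiny score lists are made up for illustration; this is not the plotting program used for the figures, and the real TALENT file layout is not assumed here.

```python
from collections import Counter

MISSING = -1  # sentinel for a missing score, as described in the post


def missing_share(scores, missing=MISSING):
    """Fraction of records carrying the missing-value code."""
    return sum(1 for s in scores if s == missing) / len(scores)


def score_distribution(scores, missing=MISSING):
    """Return {score: proportion} over the non-missing scores.

    One entry per distinct score value, so nothing is binned or smoothed.
    """
    valid = [s for s in scores if s != missing]
    if not valid:
        return {}
    counts = Counter(valid)
    return {score: counts[score] / len(valid) for score in sorted(counts)}


def overlap(dist_a, dist_b):
    """Shared area of two histograms (the grey region in the plots)."""
    all_scores = set(dist_a) | set(dist_b)
    return sum(min(dist_a.get(s, 0.0), dist_b.get(s, 0.0)) for s in all_scores)


# Hypothetical toy data, not TALENT scores.
boys = [-1, 3, 4, 4, 5]
girls = [2, 3, 4, -1, -1, 4]

boys_dist = score_distribution(boys)
girls_dist = score_distribution(girls)
shared = overlap(boys_dist, girls_dist)
```

If the "-1" codes were left in, they would show up as a spike below the lowest real score, which is exactly the spurious low-end overrepresentation warned about above.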
I read an interesting op-ed by Jennifer Finney Boylan about classification of Olympic athletes as male or female. Apparently, they're now checking the sex of athletes based on physical appearance and blood samples. This should be an improvement over the simple chromosome test which can label a woman as a man because she has a Y chromosome, even if she is developmentally and physically female. But then Boylan writes:
Most efforts to rigidly quantify the sexes are bound to fail. For every supposedly unmovable gender marker, there is an exception. There are women with androgen insensitivity, who have Y chromosomes. There are women who have had hysterectomies, women who cannot become pregnant, women who hate makeup, women whose object of affection is other women.
I'm starting to lose the thread here. Nobody is talking about excluding from Olympic competition women who have had hysterectomies or cannot become pregnant, right? And lesbians are allowed to compete too, no? And makeup might be required for the Miss America competition but not for athletes. Boylan continues:
So what makes someone female then? . . . The only dependable test for gender is the truth of a person’s life . . . The best judge of a person’s gender is what lies within her, or his, heart.
Would this really work? This just seems like a recipe for cheating, for Olympic teams in authoritarian countries to take some of their outstanding-but-not-quite-Olympic-champion caliber male athletes and tell them to live like women. It doesn't seem so fair to female athletes from the U.S., for example, to have to compete with any guy in the world who happens to be willing to say, for the purposes of the competition, that, in his heart, he feels like a woman.
Why do I mention this in a statistics blog?
I think people are often uncomfortable with ambiguity. Boylan correctly notes that sex tests can have problems and that there is no perfect rule, but then she jumps to the recommendation that there be no rules at all.
WSJ reports that people are more likely to provide socially-acceptable answers to survey questions about themselves when interviewed by a person (or even an avatar!) than when responding to an automated survey system or a recording. Such questions relate to politics, hygiene, exercise, health, and so on.
The research is helping refine polling at a university phone center nearby. Activity at the center, which sits in a former school building, picks up around dinnertime when the staff makes calls for university-run surveys from a warren of cubicles. The questioners are asked to speak in even tones, reading from scripts. No one is allowed to say, "How are you?" in case the person on the other end had a bad day. The interviewers don't laugh; they don't want people to treat this as a social call. They are allowed only neutral responses such as "I see" or "Hmm."

There are some interesting demonstrations at Harvard's Implicit project.
Kenneth Burman writes:
Some modern, computer intensive, data analysis methods may look to the non-statistician (or even ourselves) like the equivalent of the notorious Rube Goldberg device for accomplishing an intrinsically simple task. Whereas some variants of the bootstrap or cross validation might fit this situation, mostly the risk of this humiliation is to be found in MCMC-based Bayesian methods. I [Burman] am not at all against such methods. I am only wondering if, to “outsiders” (who may already have a negative impression of statistics and statisticians), these methods may appear like a Rube Goldberg device. You have parameters, likelihoods, hierarchies of fixed effects, random effects, hyper-parameters, then Markov chain Monte Carlo with tuning and burn-in followed by long “chains” of random variables, with possible thinning for lag-correlations, concerns about convergence to an ergodic state. And after all that, newly “armed” now with a “sample” of 100,000 (or more) numbers from a mysterious posterior probability distribution you proceed to analyze these new “data” (where did the real data go? – now you have more numbers than you started with for actual data) by more methods, simple (a mean) or complex (smoothing using kernel density methods, and then pull off the mode). All OK to a suitably trained statistician, but might we be in for ridicule and misunderstanding from the public? If such a charge were leveled at us (“you guys are doing Rube Goldberg statistics”) how would we respond, given the “complaint” comes from people with little or no statistics training? Of course, such folks may not be capable of generating such a critique, but could still realize they have no idea what the statistician is doing to the data to get answers. It does us no good if the public thinks our methods are Rube Goldberg in nature.