Results matching “R”

Hipmunk < Expedia, again

This time on a NY-Cincinnati roundtrip. Hipmunk could find the individual flights but could not put them together. In contrast, Expedia got it right the first time.

See here and here for background. If anybody reading this knows David Pogue, please let him know about this. A flashy interface is fine, but ultimately what I'm looking for is a flight at the right place and the right time.

Jimmy pointed me to this blog post by Drew Conway on word clouds. I don't have much to say about Conway's specifics--word clouds aren't really my thing, but I'm glad that people are thinking about how to do them better--but I did notice one phrase of his that I'll dispute. Conway writes:

The best data visualizations should stand on their own . . .

I disagree. I prefer the saying, "A picture plus 1000 words is better than two pictures or 2000 words." That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don't have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live by this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I'm not doing it here, because with the software I use, it's much easier to type more words than to find, scale, and insert images).

Desecration of valuable real estate

Malecki asks:

Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to.

To connect to some of our recent themes, I agree this is a pretty horrible data display. But it's not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren't so bad.

One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It's Martin Amis vs. George Orwell all over again.

A departmental wiki page?

I was recently struggling with the Columbia University philosophy department's webpage (to see who might be interested in this stuff). The faculty webpage was horrible: it's just a list of names and links with no information on research interests. So I did some searching on the web and found a wonderful Wikipedia page which had exactly what I wanted.

Then I checked my own department's page, and it's even worse than what they have in philosophy! (We also have this page, which is even worse in that it omits many of our faculty and has a bunch of ridiculously technical links for some of the faculty who are included.)

I don't know about the philosophy department, but the statistics department's webpage is an overengineered mess, designed from the outset to look pretty rather than to be easily updated. Maybe we could replace it entirely with a wiki?

In the meantime, if anybody feels like setting up a Wikipedia entry for the research of Columbia's statistics faculty, that would be great. As it is, I think it would be difficult for outsiders who don't know us to have any idea of what we do here!

P.S. The political science department's faculty listing is useless as well. We need a wiki for that one too!

P.P.S. The physics department's wikipage is pretty useless for a potential student's purposes, though--lots on history but nothing much on what the faculty are doing now.

Get the Data

At GetTheData, you can ask and answer data-related questions. Here's a preview:

getthedata.png

I'm not sure a Q&A site is the best way to do this.

My pipe dream is to create a taxonomy of variables and instances, and collect spreadsheets annotated this way. Imagine doing a search of the type "give me datasets where an instance is a person, and the variables are age, gender, and weight" - and out would come datasets, each one tagged with the descriptions of the variables that were held constant for the whole dataset (person_type=student, location=Columbia, time_of_study=1/1/2009, study_type=longitudinal). It would even be possible to automatically convert one variable into another if necessary (like age = time_of_measurement - time_of_birth). Maybe the dream of the Semantic Web will actually be implemented for relatively structured statistical data rather than for much fuzzier "knowledge" - just consider the difficulties of developing a universal Freebase. Wolfram|Alpha is perhaps currently the closest effort to this idea (consider comparing banana consumption between different countries), but I'm not sure how I can upload my own data or do more complicated data queries - also, for some simple variables (like weight), the results are not very useful.
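To make the pipe dream a bit more concrete, here is a minimal sketch in R (my own toy illustration - the dataset names, tags, and the find_datasets function are all invented for this example) of what a tagged registry and a search over it could look like:

# Toy registry: each dataset carries an instance type, its variable names,
# and the tags that are constant for the whole dataset.
registry <- list(
  list(name = "students_2009",
       instance = "person",
       variables = c("age", "gender", "weight"),
       constants = c(person_type = "student", location = "Columbia")),
  list(name = "countries_bananas",
       instance = "country",
       variables = c("year", "banana_consumption", "population"),
       constants = c(source = "made-up")))

# Return the names of datasets whose instance type matches and which
# contain all of the requested variables.
find_datasets <- function(registry, instance, vars) {
  hits <- Filter(function(d) d$instance == instance && all(vars %in% d$variables), registry)
  vapply(hits, function(d) d$name, character(1))
}

find_datasets(registry, "person", c("age", "gender", "weight"))
# [1] "students_2009"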

I've talked about data tools before, as well as about Q&A sites.

Software request

I was posting a couple of comments on Kaiser's blog, and I'm finding the CAPTCHAs there increasingly hard to read. I was thinking that some good software would be helpful.

Is there a browser attachment I could download that would automatically read CAPTCHAs and translate them into letters and numbers? This would be very helpful. I'd appreciate any tips in this direction.

Calibration in chess

Has anybody done this study yet? I'm curious about the results. Perhaps there's some chess-playing cognitive psychologist who'd like to collaborate on this?

English-to-English translation

It's not just for Chaucer (or Mad Max) anymore. Peter Frase writes:

It's a shame that we neglect to re-translate older works into English merely because they were originally written in English. Languages change, and our reactions to words and formulations change. This is obvious when you read something like Chaucer, but it's true to a more subtle degree of more recent writings. There is a pretty good chance that something written in the 19th century won't mean the same thing to us that it meant to its contemporary readers. Thus it would make sense to re-translate Huckleberry Finn into modern language, in the same way we periodically get new translations of Homer or Dante or Thomas Mann. This is a point that applies equally well to non-fiction and social theory: in some ways, English-speaking sociologists are lucky that our canonical trio of classical theorists--Marx, Weber, and Durkheim--all wrote in another language. The most recent translation of Capital is eminently more readable than the older ones--and I know I could have used a modern English translation of Talcott Parsons when I was studying contemporary theory.

Now, one might respond to this by saying that writing loses much in translation, and that some things just aren't the same unless you read them in the original un-translated form. And that's probably true. But it would still be good to establish the "English-to-English translation" as a legitimate category . . .

Good point. I'm hoping someone will translate Bayesian Data Analysis into English. Half the time, I can't be sure what they're trying to say.

Dennis the dentist, debunked?

Devah Pager points me to this article by Uri Simonsohn, which begins:

Three articles published [by Brett Pelham et al.] have shown that a disproportionate share of people choose spouses, places to live, and occupations with names similar to their own. These findings, interpreted as evidence of implicit egotism, are included in most modern social psychology textbooks and many university courses. The current article successfully replicates the original findings but shows that they are most likely caused by a combination of cohort, geographic, and ethnic confounds as well as reverse causality.

From Simonsohn's article, here's a handy summary of the claims and the evidence (click on it to enlarge):

simonsohn1.png

The Pelham et al. articles have come up several times on the blog, starting with this discussion and this estimate and then more recently here. I'm curious what Pelham and his collaborators think of Simonsohn's claims.

Mike Grosskopf writes:

Tyler Cowen discusses his and Bryan Caplan's reaction to that notorious book by Amy Chua, the Yale law professor who boasts of screaming at her children, calling them "garbage," not letting them go to the bathroom when they were studying piano, etc. Caplan thinks Chua is deluded (in the sense of not being aware of research showing minimal effects of parenting on children's intelligence and personality), foolish (in writing a book and making recommendations without trying to learn about the abundant research on child-rearing), and cruel. Cowen takes a middle view in that he doesn't subscribe to Chua's parenting strategies but he does think that his friends' kids will do well (and partly because of his friends' parenting styles, not just from their genes).

Do you view yourself as special?

I have a somewhat different take on the matter, an idea that's been stewing in my mind for awhile, ever since I heard about the Wall Street Journal article that started this all. My story is that attitudes toward parenting are to some extent derived from attitudes about one's own experiences as a child.

Evaluating predictions of political events

Mike Cohen writes:

Statistician cracks Toronto lottery

Christian points me to this amusing story by Jonah Lehrer about Mohan Srivastava (perhaps the same person as R. Mohan Srivastava, coauthor of a book called Applied Geostatistics), who discovered a flaw in a scratch-off game in which he could figure out which tickets were likely to win based on partial information visible on the ticket. It appears that scratch-off lotteries elsewhere have similar flaws in their design.

The obvious question is, why doesn't the lottery create the patterns on the tickets (including which "teaser" numbers to reveal) completely at random? It shouldn't be hard to design this so that zero information is supplied from the outside, in which case Srivastava's trick would be impossible.

So why not put down the numbers randomly? Lehrer quotes Srivastava as saying:

The tickets are clearly mass-produced, which means there must be some computer program that lays down the numbers. Of course, it would be really nice if the computer could just spit out random digits. But that's not possible, since the lottery corporation needs to control the number of winning tickets. The game can't be truly random. Instead, it has to generate the illusion of randomness while actually being carefully determined.

I'd phrase this slightly differently. We're talking about $3 payoffs here, so, no, the corporation does not need to control the number of winning tickets. What they do need to control is the probability of a win, but that can be done using a completely random algorithm.
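Here's a toy sketch in R of what I mean (my own illustration, not the actual lottery's algorithm): each ticket wins independently with probability p, and the visible numbers are drawn at random regardless of whether the ticket wins, so they carry no information about the outcome.

set.seed(1)
make_tickets <- function(n, p_win = 0.1, n_visible = 8) {
  data.frame(
    win = rbinom(n, 1, p_win),  # controls the probability of a win, not the exact count
    visible = replicate(n, paste(sample(1:39, n_visible), collapse = " ")))
}
tickets <- make_tickets(1e4)
mean(tickets$win)  # close to 0.10, though the realized number of winners varies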

From reading the article, I think the real reason the winning tickets could be predicted is that the lottery tickets were designed to be misleadingly appealing. Lehrer writes:

Instead of just scratching off the latex and immediately discovering a loser, players have to spend time matching up the revealed numbers with the boards. Ticket designers fill the cards with near-misses (two-in-a-row matchups instead of the necessary three) and players spend tantalizing seconds looking for their win. No wonder players get hooked.

"Ticket designers fill the cards with near-misses . . .": This doesn't sound like they're just slapping down random numbers. Instead, the system seems to be rigged in the fashion of old-time carnival games in order to manipulate one's intuition that the probability of near-misses should be informative about the underlying probability of hits. (See here for some general discussion of the use of precursors to estimate the probability of extremely rare events.)

In this sense, the story is slightly more interesting than "Lottery designers made a mistake." The mistake they made is directly connected to the manipulations they make in order to sucker people into spending more money.

P.S. Lehrer writes that Srivastava does consulting. This news story should get him all the business he needs for awhile!

Andrew has pointed to Jonathan Livengood's analysis of the correlation between poverty and PISA results, whereby schools with poorer students get poorer test results. I'd have written a comment, but then I couldn't have inserted a chart.

Andrew points out that a causal analysis is needed. This reminds me of an intervention that has been done before: take a child out of poverty, and bring him up in a better-off family. What's going to happen? There have been several studies examining the correlation of children's IQ with the IQ of their adoptive and biological parents (assuming IQ is a test analogous to the math and verbal tests, and that parent IQ is analogous to the quality of instruction - but the point is in the analysis, not in the metric). This is the result (from Adoption Strategies by Robin P. Corley in the Encyclopedia of Life Sciences):

adoptive-birth.png

So, while the intelligence of the adoptive parents did make a difference at an early age, with increasing age of the adopted child it might not be making any difference whatsoever in the long run. At the same time, the high-IQ parents could have been raising their own child, and it would probably take the same amount of resources.

There are conscientious people who might not choose to have a child because they wouldn't be able to afford to provide to their own standard (their apartment is too small, for example, or they don't have enough security and stability while being a graduate student). On the other hand, people with less comprehension might neglect this and impose their child on society without the means to provide for him. Is it good for society to ask the first group to pay taxes, and reallocate the funds to the second group? I don't know, but it's a very important question.

I am no expert, especially not in psychology, education, sociology or biology. Moreover, there is a lot more than just IQ: ethics and constructive pro-social behavior are probably more important, and might be explained a lot better by nurture than nature.

I do know that I get anxious whenever a correlation analysis tries to look like a causal analysis. A frequent scenario introduces an outcome (test performance) with a highly correlated predictor (say poverty), and suggests that reducing poverty will improve the outcome. The problem is that poverty is correlated with a number of other predictors. A solution I have found is to recognize that multiple predictors' information about the outcome overlaps - a tool I use is interaction analysis, whereby we explicate that two predictors' information overlaps (in contrast to regression coefficients, which misleadingly separate the contributions of each predictor). But the real solution is a study of interventions, and the twin and adoptive studies with a longer time horizon are pretty rigorous. I'd be curious about similarly rigorous studies of educational interventions, or about the flaws in the twin and adoptive studies.

[Feb 7, 8:30am] An email points out a potential flaw in the correlation analysis:


The thing which these people systematically missed was that we don't really care at all about the correlation between the adopted child's IQ and that of the adoptive parent. The right measure of effect is to look at the difference in IQ levels.

Example to drive home the point: Suppose the IQ of every adoptive parent is 120, while the IQ of the biological parents is Normal(100,15), as is that of the biological control siblings, but that of the adopted children is Normal(110,15). The correlation between adopted children and adoptive parents would be exactly zero (because the adoptive parents are all so similar), but clearly adoption would have had a massive effect. And, yes, adoptive parents, especially in these studies, are very different from the norm, and similar to each other: I don't know about the Colorado study, but in the famous Minnesota twins study, the mean IQ of the adoptive fathers was indeed 120, as compared to a state average of 105.

The review paper you link to is, so far as I can tell, completely silent about these obvious-seeming points.

I would add that correlations are going to be especially misleading for causal inference in any situation where a variable is being regulated towards some goal level, because, if the regulation is successful, the regulated variable will show little correlation with the things that are controlling it. It's like arguing that the temperature in my kitchen is causally irrelevant to the temperature in my freezer--it's uncorrelated, but only because a lot of complicated machinery does a lot of work to keep it that way! With that thought in mind, read this.


Indeed, the model based on correlation doesn't capture the improvement in average IQ that an adopted child gains from being brought up in a well-functioning family (as probably all adoptive families are) rather than in an orphanage or by unwilling or incapable biological parents (as arguably all children put up for adoption would otherwise be). And comments like these are precisely why we should discuss these topics systematically, so that better models can be developed and studied! As a European I am regularly surprised at how politicized this topic seems to be in the US. It's an important question that needs more rigor.
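To make the emailer's point concrete, here's a quick simulation (my own toy numbers, close to those in the quote, with a little variance added to the adoptive parents so the correlation is defined):

set.seed(1)
n <- 1e4
parent_adoptive <- rnorm(n, 120, 3)   # adoptive parents: high-IQ and homogeneous
child_adopted   <- rnorm(n, 110, 15)  # drawn independently of the adoptive parent in this toy setup
child_control   <- rnorm(n, 100, 15)
cor(parent_adoptive, child_adopted)         # approximately zero
mean(child_adopted) - mean(child_control)   # about 10 points: a big effect anyway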

Thanks for the emails and comments, they're the main reason why I still write these blog posts.

Bidding for the kickoff

Steven Brams and James Jorasch propose a system for reducing the advantage that comes from winning the coin flip in overtime:

Dispensing with a coin toss, the teams would bid on where the ball is kicked from by the kicking team. In the NFL, it's now the 30-yard line. Under Brams and Jorasch's rule, the kicking team would be the team that bids the lower number, because it is willing to put itself at a disadvantage by kicking from farther back. However, it would not kick from the number it bids, but from the average of the two bids.

To illustrate, assume team A bids to kick from the 38-yard line, while team B bids its 32-yard line. Team B would win the bidding and, therefore, be designated as the kick-off team. But B wouldn't kick from 32, but instead from the average of 38 and 32--its 35-yard line.

This is better for B by 3 yards than the 32-yard line that it proposed, because it's closer to the end zone it is kicking towards. It's also better for A by 3 yards to have B kick from the 35-yard line rather than from the 38-yard line it proposed if it were the kick-off team.

In other words, the 35-yard line is a win-win solution--both teams gain a 3-yard advantage over what they reported would make them indifferent between kicking and receiving. While bidding to determine the yard line from which a ball is kicked has been proposed before, the win-win feature of using the average of the bids--and recognizing that both teams benefit if the low bidder is the kicking team--has not. Teams seeking to merely get the ball first would be discouraged from bidding too high--for example, the 45-yard line--as this could result in a kick-off pinning them far back in their own territory.

"Metaphorically speaking, the bidding system levels the playing field," Brams and Jorasch maintain. "It also enhances the importance of the strategic choices that the teams make, rather than leaving to chance which team gets a boost in the overtime period."

This seems like a good idea. Also fun for the fans--another way to second-guess the coach.
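For what it's worth, the rule itself is simple enough to write down in a few lines of R (a sketch of the mechanism as described above, ignoring ties):

kickoff <- function(bid_a, bid_b) {
  kicker <- if (bid_a < bid_b) "A" else "B"                     # lower bidder kicks
  list(kicking_team = kicker, kick_from = (bid_a + bid_b) / 2)  # from the average of the bids
}
kickoff(38, 32)  # team B kicks from the 35-yard line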

Education and Poverty

Jonathan Livengood writes:

There has been some discussion about the recent PISA results (in which the U.S. comes out pretty badly), for example here and here. The claim being made is that the poor U.S. scores are due to rampant individual- or family-level poverty in the U.S. They claim that when one controls for poverty, the U.S. comes out on top in the PISA standings, and then they infer that poverty causes poor test scores. The further inference is then that the U.S. could improve education by the "simple" action of reducing poverty. Anyway, I was wondering what you thought about their analysis.

My reply: I agree this is interesting and I agree it's hard to know exactly what to say about these comparisons. When I'm stuck on this sort of question, I ask, WWJD? In this case, I think Jennifer would ask what are the potential interventions being considered. Various ideas for changing the school system would perhaps have different effects on different groups of students. I think that would be a useful way to focus discussion, to consider the effects of possible reforms in the U.S. and elsewhere. See here and here, for example.

P.S. Livengood has some graphs and discussion here.

Call for book proposals

Rob Calver writes:

In the spirit of Gapminder, the Washington Post created an interactive scatterplot viewer that uses the alpha channel to tell apart overlapping fat dots, doing better than the sort-by-circle-size approach that Gapminder uses:

fat.png

Good news: the rate of fattening of the USA appears to be slowing down. Maybe because of high gas prices? But what's happening with Oceania?

Handy Matrix Cheat Sheet, with Gradients

This post is an (unpaid) advertisement for the following extremely useful resource:

  • Petersen, K. B. and M. S. Pedersen. 2008. The Matrix Cookbook. Technical Report, Technical University of Denmark.

It contains 70+ pages of useful relations and derivations involving matrices. What grabbed my eye was the computation of gradients for matrix operations ranging from eigenvalues and determinants to multivariate normal density functions. I had no idea the multivariate normal had such a clean gradient (see section 8).
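For example, the gradients of the multivariate normal log density (standard results, among those collected in the Cookbook's section 8) are

\[
\frac{\partial}{\partial \mu} \log p(x \mid \mu, \Sigma) = \Sigma^{-1}(x - \mu),
\qquad
\frac{\partial}{\partial \Sigma} \log p(x \mid \mu, \Sigma)
  = \tfrac{1}{2}\left(\Sigma^{-1}(x-\mu)(x-\mu)^{\top}\Sigma^{-1} - \Sigma^{-1}\right),
\]

the latter treating the elements of \(\Sigma\) as unconstrained.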

An addition to the model-makers' oath

Yesterday Aleks posted a proposal for a model makers' Hippocratic Oath. I'd like to add two more items:

1. From Mark Palko: "Our model only describes the data we used to build it; if you go outside of that range, you do so at your own risk."

2. In case you like to think of your methods as nonparametric or non-model-based: "Our method, just like any model, relies on assumptions which we have the duty to state and to check."

(Observant readers will see that I use "we" rather than "I" in these two items. Modeling is an inherently collaborative endeavor.)

Patterns

Pete Gries writes:

I [Gries] am not sure if what you are suggesting by "doing data analysis in a patternless way" is a pitch for deductive over inductive approaches as a solution to the problem of reporting and publication bias. If so, I may somewhat disagree. A constant quest to prove or disprove theory in a deductive manner is one of the primary causes of both reporting and publication bias. I'm actually becoming a proponent of a remarkably non-existent species - "applied political science" - because there is so much animosity in our discipline to inductive empirical statistical work that seeks to answer real world empirical questions rather than contribute to parsimonious theory building. Anyone want to start a JAPS - Journal of Applied Political Science? Our discipline is in danger of irrelevance.

My reply: By "doing data analysis in a patternless way," I meant statistical methods such as least squares, maximum likelihood, etc., that estimate parameters independently without recognizing the constraints and relationships between them. If you estimate each study on its own, without reference to all the other work being done in the same field, then you're depriving yourself of a lot of information and inviting noisy estimates and, in particular, overestimates of small effects.

Model Makers' Hippocratic Oath

Emanuel Derman and Paul Wilmott wonder how to get their fellow modelers to give up their fantasy of perfection. In a Business Week article they proposed, not entirely in jest, a model makers' Hippocratic Oath:

  • I will remember that I didn't make the world and that it doesn't satisfy my equations.
  • Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
  • I will never sacrifice reality for elegance without explaining why I have done so. Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.
  • I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.

Found via Abductive Intelligence.

Peter Bergman writes:

is it possible to "overstratify" when assigning a treatment in a randomized control trial? I [Bergman] have a sample size of roughly 400 people, and several binary variables correlate strongly with the outcome of interest and would also define interesting subgroups for analysis. The problem is, stratifying over all of these (five or six) variables leaves me with strata that have only 1 person in them. I have done some background reading on whether there is a rule of thumb for the maximum number of variables to stratify. There does not seem to be much agreement (some say there should be between N/50-N/100 strata, others say as few as possible). In economics, the paper I looked to is here, which seems to summarize literature related to clinical trials. In short, my question is: is it bad to have several strata with 1 person in them? Should I group these people in with another stratum?

P.S. In the paper I mention above, they also say it is important to include stratum indicators in the regression analysis to ensure the appropriate sized type-I error in the final analysis (i.e. regress outcome on treatment & strata indicators). They demonstrate this through simulation, but is there a reference (or intuition) that shows why these indicators are important theoretically?

My reply: I doubt it matters so much exactly how you do this. If you want, there are techniques to ensure balance over many predictors. In balanced setups, you have ideas such as Latin squares, and similar methods can be developed in unbalanced scenarios. It's ok to have strata with one person in them, but if you think people won't like it, then you should feel free to use larger strata.

In answer to your other question about references: Yes, it's standard advice to include all design information as regression predictors. We discuss this in chapter 7 of BDA, and I'm sure there's some non-Bayesian discussion out there too. I think Pearl discusses this in his Causality book as well. In any case, I don't give a damn about Type 1 error, but the idea is that the sorts of factors that you would be stratifying on are the sorts of things that can be correlated with your outcome, so if you don't adjust for them, any imbalance in the predictors will lead to bias in your estimated treatment effects.
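A small simulated sketch of that advice (my own made-up data, not Bergman's): assign treatment within strata, then include the stratum indicators in the regression.

set.seed(1)
n <- 400
stratum <- factor(sample(letters[1:8], n, replace = TRUE))
# within each stratum, assign roughly half the units to treatment at random
treat <- ave(runif(n), stratum, FUN = function(u) as.integer(rank(u) <= length(u) / 2))
y <- 1 + 0.5 * treat + 0.3 * as.integer(stratum) + rnorm(n)  # strata shift the outcome
coef(summary(lm(y ~ treat + stratum)))["treat", ]            # adjusted treatment estimate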

P.S. Bergman actually wrote "dummies," but I couldn't bear to see that term so I changed it to "indicators."

Alex Tabarrok quotes Randall Morck and Bernard Yeung on difficulties with instrumental variables. This reminded me of some related things I've written.

In the official story the causal question comes first and then the clever researcher comes up with an IV. I suspect that often it's the other way around: you find a natural experiment and look at the consequences that flow from it. And maybe that's not such a bad thing. See section 4 of this article.

More generally, I think economists and political scientists are currently a bit overinvested in identification strategies. I agree with Heckman's point (as I understand it) that ultimately we should be building models that work for us rather than always thinking we can get causal inference on the cheap, as it were, by some trick or another. (This is a point I briefly discuss in a couple places here and also in my recent paper for the causality volume that Don Green etc are involved with.)

I recently had this discussion with someone else regarding regression discontinuity (the current flavor of the month; IV is soooo 90's), but I think the point holds more generally, that experiments and natural experiments are great when you have them, and they're great to aspire to and to focus one's thinking, but in practice these inferences are sometimes a bit of a stretch, and sometimes the appeal of an apparently clean identification strategy masks some serious difficulty mapping the identified parameter to underlying quantities of interest.

P.S. How I think about instrumental variables.

I saw this picture staring at me from the newsstand the other day:

TimeObamaReagan.jpg

Here's the accompanying article, by Michael Scherer and Michael Duffy, which echoes some of the points I made a few months ago, following the midterm election:

Why didn't Obama do a better job of leveling with the American people? In his first months in office, why didn't he anticipate the example of the incoming British government and warn people of economic blood, sweat, and tears? Why did his economic team release overly optimistic graphs such as the one shown here? Wouldn't it have been better to have set low expectations and then exceed them, rather than the reverse?

I don't know, but here's my theory. When Obama came into office, I imagine one of his major goals was to avoid repeating the experiences of Bill Clinton and Jimmy Carter in their first two years.

Clinton, you may recall, was elected with less than 50% of the vote, was never given the respect of a "mandate" by congressional Republicans, wasted political capital on peripheral issues such as gays in the military, spent much of his first two years on centrist, "responsible" politics (budgetary reform and NAFTA) which didn't thrill his base, and then got rewarded with a smackdown on health care and a Republican takeover of Congress. Clinton may have personally weathered the storm but he never had a chance to implement the liberal program.

Carter, of course, was the original Gloomy Gus, and his term saw the resurgence of the conservative movement in this country, with big tax revolts in 1978 and the Reagan landslide two years after that. It wasn't all economics, of course: there were also the Russians, Iran, and Jerry Falwell pitching in.

Following Plan Reagan

From a political (but not a policy) perspective, my impression was that Obama's model was not Bill Clinton or Jimmy Carter but Ronald Reagan. Like Obama in 2008, Reagan came into office in 1980 in a bad economy and inheriting a discredited foreign policy. The economy got steadily worse in the next two years, the opposition party gained seats in the midterm election, but Reagan weathered the storm and came out better than ever.

If the goal was to imitate Reagan, what might Obama have done?

- Stick with the optimism and leave the gloom-and-doom to the other party. Check.
- Stand fast in the face of a recession. Take the hit in the midterms with the goal of bouncing back in year 4. Check.
- Keep ideological purity. Maintain a contrast with the opposition party and pass whatever you can in Congress. Check.

The Democrats got hit harder in 2010 than the Republicans in 1982, but the Democrats had further to fall. Obama and his party in Congress can still hope to bounce back in two years.

Also recall that Reagan, like Roosevelt, was a statistician.

Matthew Yglesias links approvingly to the following statement by Michael Mandel:

Homeland Security accounts for roughly 90% of the increase in federal regulatory employment over the past ten years.

Roughly 90%, huh? That sounds pretty impressive. But wait a minute . . . what if total federal regulatory employment had increased a bit less? Then Homeland Security could've accounted for 105% of the increase, or 500% of the increase, or whatever. The point is that the change in total employment is the sum of a bunch of pluses and minuses. It happens that, if you don't count Homeland Security, the total hasn't changed much--I'm assuming Mandel's numbers are correct here--and that could be interesting.

The "roughly 90%" figure is misleading because, when written as a percent of the total increase, it's natural to quickly envision it as a percentage that is bounded by 100%. There is a total increase in regulatory employment that the individual agencies sum to, but some margins are positive and some are negative. If the total happens to be near zero, then the individual pieces can appear to be large fractions of the total, even possibly over 100%.

I'm not saying that Mandel made any mistakes, just that, in general, ratios can be tricky when the denominator is the sum of positive and negative parts. In this particular case, the margins were large but not quite over 100%, which somehow gives the comparison more punch than it deserves, I think.

We discussed a mathematically identical case a few years ago involving the 2008 Democratic primary election campaign.

What should we call this?

There should be a name for this sort of statistical slip-up. The Fallacy of the Misplaced Denominator, perhaps? The funny thing is that the denominator has to be small (so that the numerator seems like a lot, "90%" or whatever) but not too small (because if the ratio is over 100%, the jig is up).

P.S. Mandel replies that, yes, he agrees with me in general about the problems of ratios where the denominator is a sum of positive and negative components, but that in this particular case, "all the major components of regulatory employment change are either positive or a very tiny negative." So it sounds like I was choosing a bad example to make my point!

Infovis vs. statistical graphics. Tues 1 Feb 2011 1pm, Avery Hall room 114. It's for the Lectures in Planning Series at the School of Architecture, Planning, and Preservation.

Background on the talk (joint with Antony Unwin) is here. And here are more of my thoughts on statistical graphics.

Mike McLaughlin writes:

Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation?

I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really?

It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough).

My reply:

Strictly speaking, "n" is data, and so what you want is a likelihood function p(y,n|theta), where theta represents all the parameters in the model. In a binomial-type example, it would make sense to factor the likelihood as p(y|n,theta)*p(n|theta). Or, to make this even clearer: p(y|n,theta_1)*p(n|theta_2), where theta_1 are the parameters of the binomial distribution (or whatever generalization you're using) and theta_2 are the parameters involving n. The vectors theta_1 and theta_2 can overlap. In any case, the next step is the prior distribution, p(theta_1,theta_2). Prior dependence between theta_1 and theta_2 induces a model of the form that you're talking about.

In practice, I think it can be reasonable to simplify a bit and write p(y|n,theta) and then use a prior of the form p(theta|n). We discuss this sort of thing in the first or second section of the regression chapter in BDA. Whether you treat n as data to be modeled or data to be conditioned on, either way you can put dependence with theta into the model.

New innovations in spam

I received the following (unsolicited) email today:

Hello Andrew,

I'm interested in whether you are accepting guest article submissions for your site Statistical Modeling, Causal Inference, and Social Science? I'm the owner of the recently created nonprofit site OnlineEngineeringDegree.org and am interested in writing / submitting an article for your consideration to be published on your site. Is that something you'd be willing to consider, and if so, what specs in terms of topics or length requirements would you be looking for?

Thanks you for your time, and if you have any questions or are interested, I'd appreciate you letting me know.

Sincerely,
Samantha Rhodes

Huh?

P.S. My vote for most obnoxious spam remains this one, which does its best to dilute whatever remains of the reputation of Wolfram Research. Or maybe that particular bit of spam was written by a particularly awesome cellular automaton that Wolfram discovered? I guess in the world of big-time software it's ok to lie if it might net you a bit of money.

Splitting the data

Antonio Rangel writes:

I'm a neuroscientist at Caltech . . . I'm using the debate on the ESP paper, as I'm sure other labs around the world are, as an opportunity to discuss some basic statistical issues/ideas w/ my lab.

Request: Is there any chance you would be willing to share your thoughts about the difference between exploratory "data mining" studies and confirmatory studies? What I have in mind is that one could use a dataset to explore/discover novel hypotheses and then conduct another experiment to test those hypotheses rigorously. It seems that a good combination of both approaches could be the best of both worlds, since the first would lead to novel hypothesis discovery, and the latter to careful testing. . . it is a fundamental issue for neuroscience and psychology.

My reply:

I know that people talk about this sort of thing . . . but in any real setting, I think I'd want all my data right now to answer any questions I have. I like cross-validation and have used it with success, but I don't think I could bring myself to keep the split so rigorous as you describe. Once I have the second dataset, I'd form new hypotheses, etc.

Every once in awhile, the opportunity presents itself, though. We analyzed the 2000 and 2004 elections using the Annenberg polls. But when we were revising Red State Blue State to cover the 2008 election, the Annenberg data weren't available, so we went with Pew Research polls instead. (The Pew people are great--they post raw data on their website.) In the meantime, the 2008 Annenberg data have been released, so now we can check our results, once we get mrp all set up to do this.

Homework and treatment levels

Interesting discussion here by Mark Palko on the difficulty of comparing charter schools to regular schools, even if the slots in the charter schools have been assigned by lottery. Beyond the direct importance of the topic, I found the discussion interesting because I always face a challenge in my own teaching to assign the right amount of homework, given that if I assign too much, students will simply rebel and not do it.

To get back to the school-choice issue . . . Mark discussed selection effects: if a charter school is popular, it can require parents to sign a contract agreeing they will supervise their children in doing lots of homework. Mark points out that there is a selection issue here, that the sort of parents who would sign that form are different from parents in general. But it seems to me there's one more twist: These charter schools are popular, right? So that would imply that there is some reservoir of parents who would like to sign the form but don't have the opportunity to do so in a regular school. So, even if the charter school is no more effective, conditional on the level of homework assigned, the spread of charter schools could increase the level of homework and thus be a good thing in general (assuming, of course, that you want your kid to do more homework). Or maybe I'm missing something here.

P.S. More here (from commenter ceolaf).

NYT shills for personal DNA tests

Kaiser nails it. The offending article, by John Tierney, somehow ended up in the Science section rather than the Opinion section. As an opinion piece (or, for that matter, a blog), Tierney's article would be nothing special. But I agree with Kaiser that it doesn't work as a newspaper article. As Kaiser notes, this story involves a bunch of statistical and empirical claims that are not well resolved by P.R. and rhetoric.

During our discussion of estimates of teacher performance, Steve Sailer wrote:

I suspect we're going to take years to work the kinks out of overall rating systems.

By way of analogy, Bill James kicked off the modern era of baseball statistics analysis around 1975. But he stuck to doing smaller-scale analyses and avoided trying to build one giant overall model for rating players. In contrast, other analysts such as Pete Palmer rushed into building overall ranking systems, such as his 1984 book, but they tended to generate curious results such as the greatness of Roy Smalley Jr. James held off until 1999 before unveiling his win share model for overall rankings.

I remember looking at Pete Palmer's book many years ago and being disappointed that he did everything through his Linear Weights formula. A hit is worth X, a walk is worth Y, etc. Some of this is good--it's presumably an improvement on counting walks as 0 or 1 hits, also an improvement on counting doubles and triples as equal to 2 and 3 hits, and so forth. The problem--besides the inherent inflexibility of a linear model with no interactions--is that Palmer seemed chained to it. When the model gave silly results, Palmer just kept with it. I don't do that with my statistical models. When I get a surprising result, I look more carefully. And if it really is a mistake of some sort, I go and change the model (see, for example, the discussion here). Now this is a bit unfair: after all, Palmer's a sportswriter and I'm a professional statistician--it's my job to check my models.

Still and all, my impression is that Palmer was locked into his regression models and that it hurt his sportswriting. Bill James had a comment once about some analysis of Palmer that gave players negative values in the declining years of their careers. As James wrote, your first assumption is that when a team keeps a player on their roster, they have a good reason. (I'm excepting Jim Rice from this analysis. Whenever he came up to bat with men on base, it was always a relief to see him strike out, as that meant that he'd avoided hitting into a double play.)

Bill James did not limit himself to linear models. He often used expressions of the form (A+B)/(C+D) or sqrt(A^2+B^2). This gave him more flexibility to fit data and also allowed him more entries into the modeling process: more ways to include prior information than simply to throw in variables.

What about my own work? I use linear regression a lot, to the extent that a couple of my colleagues once characterized my work on toxicology as being linear modeling. True, these were two of my stupider colleagues (and that's saying a lot), but the fact that a couple of Ph.D.'s could confuse a nonlinear differential equation with a linear regression does give some sense of statisticians' insensitivity to functional forms. We tend to focus on what variables go into the model without much concern for how they fit together. True, sometimes we use nonparametric methods--lowess and the like--but it's not so common that we do a Bill James and carefully construct a reasonable model out of its input variables.

But maybe I should be emulating Bill James in this way. Right now, I get around the constraints of linearity and additivity by adding interaction after interaction after interaction. That's fine, but perhaps a bit of thoughtful model construction would be a useful supplement to my usual brute-force approach.
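As a toy example of the two styles (simulated data of my own, and the comparison is rigged, since I generate the outcome from the constructed form):

set.seed(1)
d <- data.frame(A = rpois(200, 30), B = rpois(200, 10),
                C = rpois(200, 60), D = rpois(200, 20))
d$y <- with(d, (A + B) / (C + D)) + rnorm(200, 0, 0.05)
fit_interactions <- lm(y ~ A + B + C + D + A:C + B:D, data = d)  # brute-force linear-plus-interactions style
fit_constructed  <- lm(y ~ I((A + B) / (C + D)), data = d)       # James-style constructed predictor
c(AIC(fit_interactions), AIC(fit_constructed))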

P.S. Actually, I think that James himself could've benefited from the discipline of quantitative models. I don't know about Roy Smalley Jr., but, near the end of the Baseball Abstract period, my impression was that James started to mix in more and more unsupported opinions, for example in 1988 characterizing Phil Bradley as possibly the best player in baseball. That's fine--I'm no baseball expert, and maybe Phil Bradley really was one of the top players of 1987, or maybe he's a really nice guy and Bill James wanted to help him out, or maybe James was just kidding on that one. My guess (based on a lot of things in the last couple of Baseball Abstracts, not just that Phil Bradley article) is simply that James had been right on so many things where others had been wrong that he started to trust his hunches without backing them up with statistical analysis. Whatever. In any case, Win Shares was probably a good idea for Bill James as it kept him close to the numbers.

Lies, Damn Lies...that's pretty much it.

This post is by Phil Price.

We're all used to distortions and misleading statements in political discourse -- the use of these methods is one thing on which politicians are fairly nonpartisan. But I think it's rare to see an outright lie, especially about a really major issue. We had a doozy yesterday, when Congresswoman Michele Bachmann presented a graphic that attributed the 2009 federal budget to the Obama administration. Oddly, most of the other facts and figures she presented were correct, although some of them seem calculatedly misleading. If you're going to lie about something really big, why not just lie about everything?

Joan Nix writes:

Your comments on this paper by Scott Carrell and James West would be most appreciated. I'm afraid the conclusions of this paper are too strong given the data set and other plausible explanations. But given where it is published, this paper is receiving and will continue to receive lots of attention. It will be used to draw deeper conclusions regarding effective teaching and experience.

Nix also links to this discussion by Jeff Ely.

I don't completely follow Ely's criticism, which seems to me to be too clever by half, but I agree with Nix that the findings in the research article don't seem to fit together very well. For example, Carrell and West estimate that the effects of instructors on performance in the follow-on class are as large as the effects on the class they're teaching. This seems hard to believe, and it seems central enough to their story that I don't know what to think about everything else in the paper.

My other thought about teaching evaluations is from my personal experience. When I feel I've taught well--that is, in semesters when it seems that students have really learned something--I tend to get good evaluations. When I don't think I've taught well, my evaluations aren't so good. And, even when I think my course has gone wonderfully, my evaluations are usually far from perfect. This has been helpful information for me.

That said, I'd prefer to have objective measures of my teaching effectiveness. Perhaps surprisingly, statisticians aren't so good about measurement and estimation when applied to their own teaching. (I think I've blogged on this on occasion.) The trouble is that measurement and evaluation take work! When we're giving advice to scientists, we're always yammering on about experimentation and measurement. But in our own professional lives, we pretty much throw all our statistical principles out the window.

P.S. What's this paper doing in the Journal of Political Economy? It has little or nothing to do with politics or economics!

P.P.S. I continue to be stunned by the way in which tables of numbers are presented in social science research papers with no thought of communication--for example, tables with interval estimates such as "(.0159, .0408)." (What were all those digits for? And what do these numbers have to do with anything at all?) If the words, sentences, and paragraphs of an article were put together in such a stylized, unthinking way, the article would be completely unreadable. Formal structures with almost no connection to communication or content . . . it would be like writing the entire research article in iambic pentameter with an a,b,c,b rhyme scheme, or somesuch. I'm not trying to pick on Carrell and West here--this sort of presentation is nearly universal in social science journals.

Andrew Gelman (Columbia University) and Eric Johnson (Columbia University) seek to hire a post-doctoral fellow to work on the application of the latest methods of multilevel data analysis, visualization and regression modeling to an important commercial problem: forecasting retail sales at the individual item level. These forecasts are used to make ordering, pricing and promotions decisions which can have significant economic impact to the retail chain such that even modest improvements in the accuracy of predictions, across a large retailer's product line, can yield substantial margin improvements.

Activities would include model development, programming, and data analysis. This project is to be undertaken with, and largely funded by, a firm which provides forecasting technology and services to large retail chains, and which will provide access to a unique and rich set of proprietary data. The postdoc will be expected to spend some time working directly with this firm, but this is fundamentally a research position.

The ideal candidate will have a background in statistics, psychometrics, or economics and be interested in marketing or related topics. He or she should be able to work fluently in R and should already know about hierarchical models and Bayesian inference and computation.

The successful candidate will become part of the lively Applied Statistics Center community, which includes several postdocs (with varied backgrounds in statistics, computer science, and social science), Ph.D., M.A., and undergraduate students, and faculty at Columbia and elsewhere. We want people who love collaboration and have the imagination, drive, and technical skills to make a difference in our projects.

If you are interested in this position, please send a letter of application, a CV, some of your articles, and three letters of recommendation to the Applied Statistics Center coordinator, Caroline Peters, cp2530@columbia.edu. Review of applications will begin immediately.

Andrew Gelman (Columbia University) and Jennifer Hill (New York University) seek to hire a post-doctoral fellow to work on development of iterative imputation algorithms and diagnostics for missing-data imputation. Activities would include model-development, programming, and data analysis. This project is funded by a grant from the Institute of Education Sciences. Other collaborators on the project include Jingchen Liu and Ben Goodrich. This is a two-year position that would start this summer (2011) or earlier if possible.

The ideal candidate will have a statistics or computer science background, will be interested in statistical modeling, serious programming, and applications. He or she should be able to work fluently in R and should already know about hierarchical models and Bayesian inference and computation. Experience or interest in designing GUIs would be a bonus, since we are attempting to improve our GUI for missing data imputation.

The successful candidate will become part of the lively Applied Statistics Center community, which includes several postdocs (with varied backgrounds in statistics, computer science, and social science), Ph.D., M.A., and undergraduate students, and faculty at Columbia and elsewhere. We want people who love collaboration and have the imagination, drive, and technical skills to make a difference in our projects.

If you are interested in this position, please send a letter of application, a CV, some of your articles, and three letters of recommendation to the Applied Statistics Center coordinator, Caroline Peters, cp2530@columbia.edu. Review of applications will begin immediately.

We need help picking out an automatic differentiation package for Hamiltonian Monte Carlo sampling from the posterior of a generalized linear model with deep interactions. Specifically, we need to compute gradients for log probability functions with thousands of parameters that involve matrix (determinants, eigenvalues, inverses), stats (distributions), and math (log gamma) functions. Any suggestions?

Bayes at the end

John Cook noticed something:

I [Cook] was looking at the preface of an old statistics book and read this:
The Bayesian techniques occur at the end of each chapter; therefore they can be omitted if time does not permit their inclusion.

This approach is typical. Many textbooks present frequentist statistics with a little Bayesian statistics at the end of each section or at the end of the book.

There are a couple ways to look at that. One is simply that Bayesian methods are optional. They must not be that important or they'd get more space. The author even recommends dropping them if pressed for time.

Another way to look at this is that Bayesian statistics must be simpler than frequentist statistics since the Bayesian approach to each task requires fewer pages.

My reaction:

Classical statistics is all about summarizing the data.

Bayesian statistics is data + prior information.

On those grounds alone, Bayes is more complicated, and it makes sense to do classical statistics first. Not necessarily p-values etc., but estimates, standard errors, and confidence intervals for sure.

Trends in partisanship by state

Matthew Yglesias discusses how West Virginia used to be a Democratic state but is now solidly Republican. I thought it would be helpful to expand this to look at trends since 1948 (rather than just 1988) and all 50 states (rather than just one). This would represent a bit of work, except that I already did it a couple years ago, so here it is (right-click on the image to see the whole thing):

My Wall Street Journal story

I was talking with someone the other day about the book by that Yale law professor who called her kids "garbage" and didn't let them go to the bathroom when they were studying piano . . . apparently it wasn't so bad as all that, she was misrepresented by the Wall Street Journal excerpt:

"I was very surprised," she says. "The Journal basically strung together the most controversial sections of the book. And I had no idea they'd put that kind of a title on it. . . . "And while it's ultimately my responsibility -- my strict Chinese mom told me 'never blame other people for your problems!' -- the one-sided nature of the excerpt has really led to some major misconceptions about what the book says, and about what I really believe."

I don't completely follow her reasoning here: just because, many years ago, her mother told her a slogan about not blaming other people, therefore she can say, "it's ultimately my responsibility"? You can see the illogic of this by flipping it around. What if her mother had told her that nothing is really your fault, everything you do is a product of what came before you, etc.? Then would she be able to say that that WSJ article is not her responsibility?

But I digress.

What I really want to say here is that I find completely plausible the claim that the Wall Street Journal sensationalized her book. I say this based on an experience I had last year.

The scalarization of America

Mark Palko writes:

You lose information when you go from a vector to a scalar.

But what about this trick, which they told me about in high school? Combine two dimensions into one by interleaving the decimals. For example, if a=.11111 and b=.22222, then (a,b) = .1212121212.
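For what it's worth, here's the trick written out in R (a sketch for numbers in [0,1), interleaving a fixed number of decimal digits):

interleave <- function(a, b, digits = 5) {
  da <- strsplit(sprintf("%.*f", digits, a), "")[[1]][-(1:2)]  # digits of a after "0."
  db <- strsplit(sprintf("%.*f", digits, b), "")[[1]][-(1:2)]  # digits of b after "0."
  as.numeric(paste0("0.", paste0(rbind(da, db), collapse = "")))  # alternate a's and b's digits
}
interleave(0.11111, 0.22222)  # 0.1212121212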

Third-party Dream Ticket

Who are the only major politicians who are viewed more positively than negatively by the American public?


(See page 3 of this report.)

MS-Bayes?

I received the following email:

Did you know that it looks like Microsoft is entering the modeling game? I mean, outside of Excel. I recently received an email at work from a MS research contractor looking for ppl that program in R, SAS, Matlab, Excel, and Mathematica. . . . So far I [the person who sent me this email] haven't seen anything about applying any actual models. Only stuff about assigning variables, deleting rows, merging tables, etc. I don't know how common knowledge this all is within the statistical community. I did a quick google search for the name of the programming language and didn't come up with anything.

That sounds cool. Working with anything from Microsoft sounds pretty horrible, but it would be useful to have another modeling language out there, just for checking our answers if nothing else.

Elevator shame is a two-way street

Tyler Cowen links to a blog post by Samuel Arbesman mocking people who are so lazy that they take the elevator from 1 to 2. This reminds me of my own annoyance about a guy who worked in my building and did not take the elevator. (For the full story, go here and search on "elevator.")

Sharon Otterman reports:

When report card grades were released in the fall for the city's 455 high schools, the highest score went to a small school in a down-and-out section of the Bronx . . . A stunning 94 percent of its seniors graduated, more than 30 points above the citywide average. . . . "When I interviewed for the school," said Sam Buchbinder, a history teacher, "it was made very clear: this is a school that doesn't believe in anyone failing."

That statement was not just an exhortation to excellence. It was school policy.

By order of the principal, codified in the school's teacher handbook, all teachers should grade their classes in the same way: 30 percent of students should earn a grade in the A range, 40 percent B's, 25 percent C's, and no more than 5 percent D's. As long as they show up, they should not fail.

Hey, that sounds like Harvard and Columbia^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H various selective northeastern colleges I've known. Of course, we^H^H^H they give a lot more than 30% A's!

P.S. In all seriousness, it does appear from the report that the school has problems.

Seeing as the Freakonomics people were kind enough to link to my list of five recommended books, I'll return the favor and comment on a remark from Levitt, who said:

Cars vs. trucks

Anupam Agrawal writes:

I am an Assistant Professor of Operations Management at the University of Illinois. . . . My main work is in the supply chain area, and empirical in nature. . . . I am working with a firm that has two separate divisions - one making cars, and the other making trucks. Four years back, the firm made an interesting organizational change. They created a separate group of ~25 engineers in their car division (from within their quality and production engineers). This group was focused on improving supplier quality and reported to the car plant head. The truck division did not (and still does not) have such an independent "supplier improvement group". Other than this unit in the car division, the organizational arrangements in the two divisions mimic each other. There are many common suppliers to the car and truck divisions.

Data on quality of components coming from suppliers has been collected (for the last four years). The organizational change happened in January 2007.

My focus is to see whether organizational change (and a different organizational structure) drives improvements.
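
One natural starting point, given that the change hit only the car division in January 2007 and that many suppliers serve both divisions, is a difference-in-differences comparison with supplier-level variation. Here's a minimal sketch in R with entirely made-up data and variable names (quality, division, post, supplier), not Agrawal's actual dataset:

# Made-up example data: one row per supplier x division x year
set.seed(1)
supplier_quality <- expand.grid(supplier = 1:40,
                                division = c("car", "truck"),
                                year = 2005:2010)
supplier_quality$post <- as.numeric(supplier_quality$year >= 2007)
supplier_effect <- rnorm(40, 0, 3)
supplier_quality$quality <- 50 + supplier_effect[supplier_quality$supplier] +
  5 * supplier_quality$post * (supplier_quality$division == "car") +
  rnorm(nrow(supplier_quality), 0, 5)

library(lme4)
fit <- lmer(quality ~ division * post + (1 | supplier),
            data = supplier_quality)
summary(fit)
# The division:post interaction captures how differently the two divisions'
# quality changed after 2007, with supplier-level intercepts soaking up the
# fact that the same suppliers appear in both divisions.

That ignores time trends and all sorts of selection issues, so it's only a starting point, not an answer.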

This post is by Phil Price.

An Oregon legislator, Mitch Greenlick, has proposed to make it illegal in Oregon to carry a child under six years old on one's bike (including in a child seat) or in a bike trailer. The guy says, "We've just done a study showing that 30 percent of riders biking to work at least three days a week have some sort of crash that leads to an injury... When that's going on out there, what happens when you have a four-year-old on the back of a bike?" The study is from Oregon Health Sciences University, at which the legislator is a professor.

Greenlick also says, "If it's true that it's unsafe, we have an obligation to protect people. If I thought a law would save one child's life, I would step in and do it. Wouldn't you?"

There are two statistical issues here. The first is in the category of "lies, damn lies, and statistics," and involves the statement about how many riders have injuries. As quoted on a blog, the author of the study in question says that, when it comes to what is characterized as an injury, "It could just be skinning your knee or spraining your ankle, but it couldn't just be a near miss." By this standard, lots of other things one might do with one's child -- such as playing with her, for instance -- might be even more likely to cause injury.

Substantial numbers of people have been taking their children on bikes for quite a while now, so although it may be impossible to get accurate numbers for the number of hours or miles ridden, there should be enough data on fatalities and severe injuries to get a semi-quantitative idea of how dangerous it is to take a child on a bike or in a bike trailer. And when I say "dangerous" I mean, you know, actually dangerous.

The second problem with Greenlick's approach is that it seems predicated on the idea that, in his words, "If I thought a law would save one child's life, I would step in and do it. Wouldn't you?" Well, no, and in fact that is just a ridiculous principle to apply. Any reasonable person should be in favor of saving children's lives, but not at all cost. We could make it illegal to allow children to climb trees, to eat peanuts, to cross the street without holding an adult's hand...perhaps they shouldn't be allowed to ride in cars. Where would it end?

Finally, a non-statistical note: another state rep has commented regarding this bill, saying that "this is the way the process often works: a legislator gets an idea, drafts a bill, introduces it, gets feedback, and then decides whether to try to proceed, perhaps with amendments, or whether to let it die." If true, this is a really wasteful and inefficient system. Better would be "a legislator gets an idea, does a little research to see if it makes sense, introduces it,..." Introducing it before seeing if it makes sense is probably a lot easier in the short run, but it means a lot of administrative hassle in introducing the bills, and it makes people waste time and effort trying to kill or modify ill-conceived bills.

Thiel update

A year or so ago I discussed the reasoning of zillionaire financier Peter Thiel, who seems to believe his own hype and, worse, seems to be able to convince reporters of his infallibility as well. Apparently he "possesses a preternatural ability to spot patterns that others miss."

More recently, Felix Salmon commented on Thiel's financial misadventures:

Peter Thiel's hedge fund, Clarium Capital, ain't doing so well. Its assets under management are down 90% from their peak, and total returns from the high point are -65%. Thiel is smart, successful, rich, well-connected, and on top of all that his calls have actually been right . . . None of that, clearly, was enough for Clarium to make money on its trades: the fund was undone by volatility and weakness in risk management.

There are a few lessons to learn here.

Firstly, just because someone is a Silicon Valley gazillionaire, or any kind of successful entrepreneur for that matter, doesn't mean they should be trusted with other people's money.

Secondly, being smart is a great way of getting in to a lot of trouble as an investor. In order to make money in the markets, you need a weird combination of arrogance and insecurity. Arrogance on its own is fatal, but it's also endemic to people in Silicon Valley who are convinced that they're rich because they're smart, and that since they're still smart, they can and will therefore get richer still. . . .

Just to be clear, I'm not saying that Thiel losing money is evidence that he's some sort of dummy. (Recall my own unsuccess as an investor.) What I am saying is, don't believe the hype.

Bill Harris writes:

I've read your paper and presentation showing why you don't usually worry about multiple comparisons. I see how that applies when you are comparing results across multiple settings (states, etc.).

Does the same principle hold when you are exploring data to find interesting relationships? For example, you have some data, and you're trying a series of models to see which gives you the most useful insight. Do you try your models on a subset of the data so you have another subset for confirmatory analysis later, or do you simply throw all the data against your models?

My reply: I'd like to estimate all the relationships at once and use a multilevel model to do partial pooling to handle the multiplicity issues. That said, in practice, in my applied work I'm always bouncing back and forth between different hypotheses and different datasets, and often I learn a lot when next year's data come in and I can modify my hypotheses. The trouble with the classical hypothesis-testing framework, at least for me, is that so-called statistical hypotheses are very precise things, whereas the sorts of hypotheses that arise in science and social science are vaguer and are not so amenable to "testing" in the classical sense.
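
To make the partial-pooling point concrete, here's a toy simulation in R (the data are entirely made up; this has nothing to do with Bill Harris's particular problem):

library(lme4)
set.seed(1)

J <- 50                              # number of groups (say, states)
n <- 20                              # observations per group
true_effect <- rnorm(J, 0, 0.5)      # small, varying group effects
d <- data.frame(group = rep(1:J, each = n),
                y = rnorm(J * n, rep(true_effect, each = n), 2))

# No pooling: a separate mean for each group
no_pool <- tapply(d$y, d$group, mean)

# Partial pooling: a multilevel model shrinks the group estimates
# toward the grand mean, more so for noisier groups
fit <- lmer(y ~ 1 + (1 | group), data = d)
partial_pool <- coef(fit)$group[, 1]

# The partially pooled estimates are less spread out, so the apparently
# "extreme" groups are no longer exaggerated -- that's what handles the
# multiplicity, without any explicit correction.
c(sd(no_pool), sd(partial_pool))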

Spam is out of control

I just took a look at the spam folder . . . 600 messages in the past hour! Seems pretty ridiculous to me.

Problems with Haiti elections?

Mark Weisbrot points me to this report trashing a recent OAS report on Haiti's elections. Weisbrot writes:

The two simplest things that are wrong with the OAS analysis are: (1) By looking only at a sample of the tally sheets and not using any statistical test, they have no idea how many other tally sheets would also be thrown out by the same criteria that they used, and how that would change the result and (2) The missing/quarantined tally sheets are much greater in number than the ones that they threw out; our analysis indicates that if these votes had been counted, the result would go the other way.

I have not had a chance to take a look at this myself but I'm posting it here so that experts on election irregularities can see this and give their judgments.

P.S. Weisbrot updates:

R Advertised

The R language is definitely going mainstream:

IMG_0113.JPG

Mark Lilla recalls some recent Barack Obama quotes and then writes:

If this is the way the president and his party think about human psychology, it's little wonder they've taken such a beating.

In the spirit of that old line, "That and $4.95 will get you a tall latte," let me agree with Lilla and attribute the Democrats' losses in 2010 to the following three factors:

1. A poor understanding of human psychology;

2. The Democrats holding unified control of the presidency and congress with a large majority in both houses (factors that are historically associated with big midterm losses); and

3. A terrible economy.

I will let you, the readers, make your best guesses as to the relative importance of factors 1, 2, and 3 above.

Don't get me wrong: I think psychology is important, as is the history of ideas (the main subject of Lilla's article), and I'd hope that Obama (and also his colleagues in both parties in congress) can become better acquainted with psychology, motivation, and the history of these ideas. I just think it's stretching things to bring in the election as some sort of outcome of the Democrats' understanding of political marketing.

Later on, Lilla writes of "the Tea Party's ire, directed at Democrats and Republicans alike . . . " Huh? The Tea Party activists are conservative Republicans. Are there any Democrats that the Tea Party participants like? Zell Miller, maybe?

Lilla concludes with an inspiring story of Muhammed Ali coming to Harvard and delivering a two-line poem, at which point, in Lilla's words, "The students would have followed him anywhere." He seems to attribute this to Ali's passion ("In our politics, history doesn't happen when a leader makes an argument, or even strikes a pose. It happens when he strikes a chord. And you don't need charts and figures to do that; in fact they get in the way. You only need two words."), but is that really right? Ali is a culture hero for many reasons, and my guess is the students would've followed him anywhere--even if he'd given them charts and figures. Actually, then maybe they'd have had more of an idea of where he was leading them!

It says in the article linked above that Lilla is a professor at Columbia, and, looking him up, I see that he won an award from the American Political Science Association. So I'm a bit surprised to see him write some of the things he writes above, about the Tea Party and attributing the 2010 election to a lack of understanding of psychology. (I assume the Muhammed Ali story is just poetic license.) Probably I'm missing something here, maybe I can ask him directly at some point.

After reading all the comments here I remembered that I've actually written a paper on the generalized method of moments--including the bit about maximum likelihood being a special case. The basic idea is simple enough that it must have been rediscovered dozens of times by different people (sort of like the trapezoidal rule).

In our case, we were motivated to (independently) develop the (well-known, but not by me) generalized method of moments as a way of specifying an indirectly-parameterized prior distribution, rather than as a way of estimating parameters from direct data. But the math is the same.
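
For readers who haven't seen the connection, the standard statement is roughly as follows (generic textbook notation, nothing specific to our paper). Pick a moment function g and estimate theta by making the sample moments as close to zero as possible:

\hat{\theta} = \arg\min_{\theta} \; \bar{g}_n(\theta)^{\top} W \, \bar{g}_n(\theta),
\qquad \bar{g}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} g(y_i, \theta).

Maximum likelihood is the special case where g is the score function, g(y_i, \theta) = \partial \log p(y_i \mid \theta) / \partial \theta: the likelihood equations set the average score to exactly zero, a just-identified set of moment conditions for which the weight matrix W drops out.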

Jas sends along this paper (with Devin Caughey), entitled Regression-Discontinuity Designs and Popular Elections: Implications of Pro-Incumbent Bias in Close U.S. House Races, and writes:

The paper shows that regression discontinuity does not work for US House elections. Close House elections are anything but random. It isn't election recounts or something like that (we collect recount data to show that it isn't). We have collected much new data to try to hunt down what is going on (e.g., campaign finance data, CQ pre-election forecasts, correct many errors in the Lee dataset). The substantive implications are interesting. We also have a section that compares in detail Gelman and King versus the Lee estimand and estimator.

I had a few comments:

Bayes in China update

Some clarification on the Bayes-in-China issue raised last week:

1. We heard that the Chinese publisher cited the following pages that might contain politically objectionable materials: 3, 5, 21, 73, 112, 201.

2. It appears that, as some commenters suggested, the objection was to some of the applications, not to the Bayesian methods.

3. Our book is not censored in China. In fact, as some commenters mentioned, it is possible to buy it there, and it is also available in university libraries there. The edition of the book which was canceled was intended to be a low-cost reprint of the book. The original book is still available. I used the phrase "Banned in China" as a joke and I apologize if it was misinterpreted.

4. I have no quarrel with the Chinese government or with any Chinese publishers. They can publish whatever books they would like. I found this episode amusing only because I do not think my book on regression and multilevel models has any strong political content. I suspect the publisher was being unnecessarily sensitive to potentially objectionable material, but this is their call. I thought this was an interesting story (which is why I posted the original email on the blog) but I did not, and do not, intend it as any sort of comment on the Chinese government, Chinese society, etc. China is a big country and this is one person at one publisher making one decision. That's all it is; it's not a statement about China in general.

I did not write the above out of any fear of legal action etc. I just think it's important to be fair and clear, and it is possible that some of what I wrote could have been misinterpreted in translation. If anyone has further questions on this, feel free to ask in the comments and I will clarify as best as I can.

Columbia College has for many years had a Core Curriculum, in which students read classics such as Plato (in translation) etc. A few years ago they created a Science core course. There was always some confusion about this idea: On one hand, how much would college freshmen really learn about science by reading the classic writings of Galileo, Laplace, Darwin, Einstein, etc.? And they certainly wouldn't get much out of puzzling over the latest issues of Nature, Cell, and Physical Review Letters. On the other hand, what's the point of having them read Dawkins, Gould, or even Brian Greene? These sorts of popularizations give you a sense of modern science (even to the extent of conveying some of the debates in these fields), but reading them might not give the same intellectual engagement that you'd get from wrestling with the Bible or Shakespeare.

I have a different idea. What about structuring the entire course around computer programming and simulation? Start with a few weeks teaching the students some programming language that can do simulation and graphics. (R is a little clunky and Matlab is not open-source. Maybe Python?)

After the warm-up, students can program simulations each week:
- Physics: simulation of bouncing billiard balls, atomic decay, etc.
- Chemistry: simulation of chemical reactions, cool graphs of the concentrations of different chemicals over time as the reaction proceeds
- Biology: evolution and natural selection
And so forth.

There could be lecture material connecting these simulations with relevant scientific models. This could be great!
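
To make the idea concrete, here is the sort of week-one exercise I have in mind, written in R (as noted above, Python might be the better classroom choice; the specifics here are just an illustration):

# Simulate radioactive decay: each atom has a fixed chance of decaying
# at each time step, and we track how many survive.
set.seed(123)
n_atoms <- 1000
p_decay <- 0.05
n_steps <- 100

alive <- rep(TRUE, n_atoms)
remaining <- numeric(n_steps)
for (t in 1:n_steps) {
  decays <- runif(n_atoms) < p_decay
  alive <- alive & !decays
  remaining[t] <- sum(alive)
}

plot(1:n_steps, remaining, type = "l",
     xlab = "time step", ylab = "atoms remaining",
     main = "Simulated radioactive decay")
lines(1:n_steps, n_atoms * (1 - p_decay)^(1:n_steps), lty = 2)
# The dashed line is the exponential-decay curve; students can see why the
# simulation and the formula agree, and where the randomness shows up.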

This post is by Phil Price.

A reporter once told me that the worst-kept secret of journalism is that every story has errors. And it's true that just about every time I know about something first-hand, the news stories about it have some mistakes. Reporters aren't subject-matter experts, they have limited time, and they generally can't keep revisiting the things they are saying and checking them for accuracy. Many of us have published papers with errors -- my most recent paper has an incorrect figure -- and that's after working on them carefully for weeks!

One way that reporters can try to get things right is by quoting experts. Even then, there are problems with taking quotes out of context, or with making poor choices about what material to include or exclude, or, of course, with making a poor selection of experts.

Yesterday, I was interviewed by an NPR reporter about the risks of breathing radon (a naturally occurring radioactive gas): who should test for it, how dangerous is it, etc. I'm a reasonable person to talk to about this, having done my post-doc and several subsequent years of research in this area, although that ended about ten years ago. Andrew and I, and other colleagues, published several papers, including a decision analysis paper that encompasses most of what I think I know about radon risk in the U.S. In this case, the reporter had a good understanding of the fact that the risk is very small at low concentrations; that the risk per unit exposure is thought to be much higher for smokers than for non-smokers; and that the published estimates of radon deaths are based on the unrealistic comparison to people being exposed to no radon at all. He had a much more sophisticated understanding than most reporters, and perhaps more than some radon researchers! So I hope the piece will come out OK. But I gave him a lot of "on the one hand..., on the other hand..." material, so if he quotes selectively he could make me look extreme in either direction. Not that I think he will, I think he'll do a good job.

The piece will be on NPR's Morning Edition tomorrow (Friday), and available on their archives afterwards.

The Road to a B

A student in my intro class came by the other day with a lot of questions. It soon became clear that he was confused about a lot of things, going back several weeks in the course. What this means is that we did not do a good job of monitoring his performance earlier during the semester. But the question now is: what to do next? I'll sign the drop form any time during the semester, but he didn't want to drop the class (the usual scheduling issues). And he doesn't want to get a C or a D. He's in big trouble and at this point is basically rolling the dice that he'll do well enough on the final to eke out a B in the course. (Yes, he goes to section meetings and office hours, and he even tried hiring a tutor. But it's tough--if you've already been going to class and still don't know what's going on, it's not so easy to pull yourself out of the hole, even if you have a big pile of practice problems ahead of you.)

What we really need for this student, and others like him, is a road to a B: a plan by which a student can get by and attain partial mastery of the material. The way this and other courses are set up is that if you do everything right you get an A, and you get a B if you make some mistakes and miss some things along the way. That's ok, but if you're really stuck, you want some sort of plan that will take you forward. And the existing plan (try lots of practice problems) won't cut it. What you need is some sort of triage, so you can nail down topics one at a time and do what it takes to get that B. And that's not something we have now. I think it needs to be a formal part of the course, in some way.

"Tied for Warmest Year On Record"

The National Climatic Data Center has tentatively announced that 2010 is, get this, "tied" for warmest on record. Presumably they mean it's tied to the precision that they quote (1.12 F above the 20th-century average). The uncertainty in the measurements, as well as some fuzziness about exactly what is being measured (how much of the atmosphere, and the oceans) makes these global-average things really suspect. For instance, if there's more oceanic turnover one year, that can warm the deep ocean but cool the shallow ocean and atmosphere, so even though the heat content of the atmosphere-ocean system goes up, some of these "global-average" estimates can go down. The reverse can happen too. And of course there are various sources of natural variability that are not, these days, what most people are most interested in. So everybody who knows about the climate professes to hate the emphasis on climate records. And yet, they're irresistible. I'm sure we'll see the usual clamor of some people touting this, while other people claim it's due to biased measurements, or that it's true but has nothing to do with anthropogenic carbon dioxide, or whatever. Sigh.

Also noteworthy is how incredibly poor the National Climatic Data Center website is. Fuzzy graphics and logos, jarring colors...looks like it was designed in about 1998...and clicking on "What's New" does not find the press release that they just issued today, about 2010 being tied for warmest year on record. Perhaps they should hire a 12-year-old to spend an hour making at least a better front page.

Chapter 1

On Sunday we were over on 125 St so I stopped by the Jamaican beef patties place but they were closed. Jesus Taco was next door so I went there instead. What a mistake! I don't know what Masanao and Yu-Sung could've been thinking. Anyway, then I had Jamaican beef patties on the brain so I went by Monday afternoon and asked for 9: 3 spicy beef, 3 mild beef (for the kids), and 3 chicken (not the jerk chicken; Bob got those the other day and they didn't impress me). I'm about to pay and then a bunch of people come in and start ordering. The woman behind the counter asks if I'm in a hurry, I ask why, she whispers, For the same price you can get a dozen. So I get two more spicy beef and a chicken. She whispers that I shouldn't tell anyone. I can't really figure out why I'm getting this special treatment. So I walk out of there with 12 patties. Total cost: $17.25. It's a good deal: they're small but not that small. Sure, I ate 6 of them, but I was hungry.

Chapter 2

A half hour later, I'm pulling keys out of my pocket to lock up my bike and a bunch of change falls out. (Remember--the patties cost $17.25, so I had three quarters in my pocket, plus whatever happened to be there already.) I see all three quarters plus a couple of pennies. The change is on the street, and, as I'm leaning down to pick it up, I notice there's a parked car, right in front of me, with its engine running. There's no way the driver can see me if I'm bending down behind the rear wheels. And if he backs up, I'm dead meat.

It suddenly comes to me--this is what they mean when they talk about "picking pennies in front of a steamroller." That's exactly what I was about to do!

After a brief moment of indecision, I bent down and picked up the quarters. I left the pennies where they were, though.

P.S. The last time I experienced an economics cliche in real time was a few weeks ago, when I spotted $5 in cash on the street.

Benedict Carey writes a follow-up article on ESP studies and Bayesian statistics. (See here for my previous thoughts on the topic.) Everything Carey writes is fine, and he even uses an example I recommended:

The statistical approach that has dominated the social sciences for almost a century is called significance testing. The idea is straightforward. A finding from any well-designed study -- say, a correlation between a personality trait and the risk of depression -- is considered "significant" if its probability of occurring by chance is less than 5 percent.

This arbitrary cutoff makes sense when the effect being studied is a large one -- for example, when measuring the so-called Stroop effect. This effect predicts that naming the color of a word is faster and more accurate when the word and color match ("red" in red letters) than when they do not ("red" in blue letters), and is very strong in almost everyone.

"But if the true effect of what you are measuring is small," said Andrew Gelman, a professor of statistics and political science at Columbia University, "then by necessity anything you discover is going to be an overestimate" of that effect.

The above description of classical hypothesis testing isn't bad. Strictly speaking, one would follow "is less than 5 percent" above with "if the null hypothesis of zero effect were actually true," but they have serious space limitations, and I doubt many readers would get much out of that elaboration, so I'm happy with what Carey put there.

One subtlety that he didn't quite catch was the way that researchers mix the Neyman-Pearson and Fisher approaches to inference. The 5% cutoff (associated with Neyman and Pearson) is indeed standard, and it is indeed subject to all the problems we know about, most simply that statistical significance occurs at least 5% of the time, so if you do a lot of experiments you're gonna have a lot of chances to find statistical significance. But p-values are also used as a measure of evidence: that's Fisher's approach and it leads to its own problems (as discussed in the news article as well).

The other problem, which is not so well known, comes up in my quote: when you're studying small effects and you use statistical significance as a filter and don't do any partial pooling, whatever you have that's left standing that survives the filtering process will overestimate the true effect. And classical corrections for "multiple comparisons" do not solve the problem: they merely create a more rigorous statistical significance filter, but anything that survives that filter will be even more of an overestimate.
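
A quick simulation shows the size of the problem (this is a generic illustration of the filtering effect, not a reanalysis of any particular study):

# Many studies of a small true effect, each estimated with noise; keep
# only the ones that reach statistical significance.
set.seed(2011)
true_effect <- 0.1
se <- 0.15
n_studies <- 100000

estimate <- rnorm(n_studies, true_effect, se)
significant <- abs(estimate / se) > 1.96

mean(significant)                 # only about 10% of studies "succeed"
mean(estimate[significant])       # the survivors average several times
                                  # the true effect of 0.1
mean(estimate[significant] < 0)   # and a few even have the wrong sign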

If classical hypothesis testing is so horrible, how is it that it could be so popular? In particular, what was going on when a well-respected researcher like this ESP guy used inappropriate statistical methods?

My answer to Carey was to give a sort of sociological story, which went as follows.

Psychologists have experience studying large effects, the sort of study in which data from 24 participants is enough to estimate a main effect and 50 will be enough to estimate interactions of interest. I gave the example of the Stroop effect (they have a nice one of those on display right now at the Natural History Museum) as an example of a large effect where classical statistics will do just fine.

My point was, if you've gone your whole career studying large effects with methods that work, then it's natural to think you have great methods. You might not realize that your methods, which appear quite general, actually fall apart when applied to small effects. Such as ESP or human sex ratios.

The ESP dude was a victim of his own success: His past accomplishments studying large effects gave him an unwarranted feeling of confidence that his methods would work on small effects.

This sort of thing comes up a lot, and in my recent discussion of Efron's article, I list it as my second meta-principle of statistics, the "methodological attribution problem," which is that people think that methods that work in one sort of problem will work in others.

The other thing that Carey didn't have the space to include was that Bayes is not just about estimating the weight of evidence in favor of a hypothesis. The other key part of Bayesian inference--the more important part, I'd argue--is "shrinkage" or "partial pooling," in which estimates get pooled toward zero (or, more generally, toward their estimates based on external information).

Shrinkage is key, because if all you use is a statistical significance filter--or even a Bayes factor filter--when all is said and done, you'll still be left with overestimates. Whatever filter you use--whatever rule you use to decide whether something is worth publishing--I still want to see some modeling and shrinkage (or, at least, some retrospective power analysis) to handle the overestimation problem. This is something Martin and I discussed in our discussion of the "voodoo correlations" paper of Vul et al.

Should the paper have been published in a top psychology journal?

Real-life psychology researcher Tal Yarkoni adds some good thoughts but then he makes the ridiculous (to me) juxtaposition of the following two claims: (1) The ESP study didn't find anything real, there's no such thing as ESP, and the study suffered many methodological flaws, and (2) The journal was right to publish the paper.

If you start with (1), I don't see how you get to (2). I mean, sure, Yarkoni gives his reasons (basically, the claim that the ESP paper, while somewhat crappy, is no crappier than most papers that are published in top psychology journals), but I don't buy it. If the effect is there, why not have them demonstrate it for real? I mean, how hard would it be for the experimenters to gather more data, do some sifting, find out which subjects are good at ESP, etc. There's no rush, right? No need to publish preliminary, barely-statistically-significant findings. I don't see what's wrong with the journal asking for better evidence. It's not like a study of the democratic or capitalistic peace, where you have a fixed amount of data and you have to learn what you can. In experimental psychology, once you have the experiment set up, it's practically free to gather more data.

P.S. One thing that saddens me is that, instead of using the sex-ratio example (which I think would've been perfect for this article), Carey uses the following completely fake example:

Consider the following experiment. Suppose there was reason to believe that a coin was slightly weighted toward heads. In a test, the coin comes up heads 527 times out of 1,000.

And then he goes on to write about coin flipping. But, as I showed in my article with Deb, there is no such thing as a coin weighted to have a probability p (different from 1/2) of heads.

OK, I know about fake examples. I'm writing an intro textbook, and I know that fake examples can be great. But not this one!

P.P.S. I'm also disappointed he didn't use the famous dead-fish example, where Bennett, Baird, Miller, and Wolford found statistically significant correlations in an MRI of a dead salmon. The correlations were not only statistically significant, they were large and newsworthy!

P.P.P.S. The Times does this weird thing with its articles where it puts auto-links on Duke University, Columbia University, and the University of Missouri. I find this a bit distracting and unprofessional.

I received the following in email from our publisher:

I write with regards to the project to publish a China Edition of your book "Data Analysis Using Regression and Multilevel/Hierarchical Models" (ISBN-13: 9780521686891) for the mainland Chinese market. I regret to inform you that we have been notified by our partner in China, Posts & Telecommunications Press (PTP), that due to various politically sensitive materials in the text, the China Edition has not met with the approval of the publishing authorities in China, and as such PTP will not be able to proceed with the publication of this edition. We will therefore have to cancel plans for the China Edition of your book. Please accept my apologies for this unforeseen development. If you have any queries regarding this, do feel free to let me know.

Oooh, it makes me feel so . . . subversive. It reminds me how, in Sunday school, they told us that if we were ever visiting Russia, we should smuggle Bibles in our luggage because the people there weren't allowed to worship.

Xiao-Li Meng told me once that in China they didn't teach Bayesian statistics because the idea of a prior distribution was contrary to Communism (since the "prior" represented the overthrown traditions, I suppose).

And then there's this.

I think that the next printing of our book should have "Banned in China" slapped on the cover. That should be good for sales, right?

P.S. Update here.

Chartjunk, but in a good cause!

From Dan Goldstein:

agb3.png

Pretty good, but really the pie chart should be three-dimensional, shown at an angle, and with one or two of the slices popping out.

P.S. They seem to have placed a link for the Bill James Historical Baseball Abstract. That book's ok, but what I was really recommending were his Abstracts from 1982-1986, which are something else entirely.

The other day I posted some evidence that, however things used to be, congressional elections are increasingly nationalized, and it's time to retire Tip O'Neill's slogan, "all politics is local." (The discussion started with a remark by O.G. blogger Mickey Kaus; I also explain why I disagree with Jonathan Bernstein's disagreement with me.)

Alan Abramowitz writes in with an analysis of National Election Study data from a recent paper of his:

Average Correlations of House and Senate Votes with Presidential Job Evaluations by Decade

Decade House.Vote Senate.Vote
1972-1980 .31 .28
1982-1990 .39 .42
1992-2000 .43 .50
2002-2008 .51 .57

This indeed seems like strong evidence of nationalization, consistent with other things we've seen. I also like Abramowitz's secret-weapon-style analysis, breaking the data up by decade rather than throwing all the data in at once and trying to estimate a trend.

A colleague recently sent me a copy of some articles on the estimation of treatment interactions (a topic that's interested me for awhile). One of the articles, which appeared in the Lancet in 2000, was called "Subgroup analysis and other (mis)uses of baseline data in clinical trials," by Susan F. Assmann, Stuart J. Pocock, Laura E. Enos, and Linda E. Kasten. . . .

Hey, wait a minute--I know Susan Assmann! Well, I sort of know her. When I was a freshman in college, I asked my adviser, who was an applied math prof, if I could do some research. He connected me to Susan, who was one of his Ph.D. students, and she gave me a tiny part of her thesis to work on.

The problem went as follows. You have a function f(x), for x going from 0 to infinity, that is defined as follows. Between 0 and 1, f(x)=x. Then, for x higher than 1, f'(x) = f(x) - f(x-1). The goal is to figure out what f(x) does. I think I'm getting this right here, but I might be getting confused on some of the details. The original form of the problem had some sort of probability interpretation, I think--something to do with a one-dimensional packing problem, maybe f(x) was the expected number of objects that would fit in an interval of size x, if the objects were drawn from a uniform distribution. Probably not that, but maybe something of that sort.

One of the fun things about attacking this sort of problem as a freshman is that I knew nothing about the literature on this sort of problem or even what it was called (a differential-difference equation, though it can also be formulated as an integral equation). Nor was I set up to do any simulations on the computer. I just solved the problem from scratch. First I figured out the function in the range [1,2], [2,3], and so forth, then I made a graph (pencil on graph paper) and conjectured the asymptotic behavior of f. The next step was to prove my conjecture. It ate at me. I worked on the problem on and off for about eleven months, then one day I finally did it: I had carefully proved the behavior of my function! This accomplishment gave me a warm feeling for years after.

I never actually told Susan Assmann about this--I think that by then she had graduated, and I never found out whether she figured out the problem herself as part of her Ph.D. thesis or whether it was never really needed in the first place. And I can't remember if I told my adviser. (He was a funny guy: extremely friendly to everyone, including his freshman advisees, but one time we were in his office when he took a phone call. He was super-friendly during the call, then after the call was over he said, "What an asshole." After this I never knew whether to trust the guy. If he was that nice to some asshole on the phone, what did it mean that he was nice to us?) I switched advisers. The new adviser was much nicer--I knew him because I'd taken a class with him--but it didn't really matter since he was just another mathematician. I was lucky enough to stumble into statistics, but that's another story.

Anyway, it was funny to see that name--Susan Assmann! I did a quick web search and I'm pretty sure it is the same person. And her paper was cited 430 times--that's pretty impressive!

P.S. The actual paper by Assmann et al. is reasonable. It's a review of some statistical practice in medical research. They discuss the futility of subgroup analysis given that, compared to main effects, interactions are typically (a) smaller in magnitude and (b) estimated with larger standard errors. That's pretty much a recipe for disaster! (I made a similar argument in a 2001 article in Biostatistics, except that my article went in depth for one particular model and Assmann et al. were offering more general advice. And, unlike me, they had some data.) Ultimately I do think treatment interactions and subgroup analysis are important, but they should be estimated using multilevel models. If you try to estimate complex interactions using significance tests or classical interval estimation, you'll probably just be wasting your time, for reasons explained by Assmann et al.

John Talbott points me to this, which I briefly mocked a couple months ago. I largely agree with the critics of this research, but I want to reiterate my point from earlier that all the statistical sophistication in the world won't help you if you're studying a null effect. This is not to say that the actual effect is zero--who am I to say?--just that the comments about the high-quality statistics in the article don't say much to me.

There's lots of discussion of the lack of science underlying ESP claims. I can't offer anything useful on that account (not being a psychologist, I could imagine all sorts of stories about brain waves or whatever), but I would like to point out something that usually doesn't seem to get mentioned in these discussions, which is that lots of people want to believe in ESP. After all, it would be cool to read minds. (It wouldn't be so cool, maybe, if other people could read your mind and you couldn't read theirs, but I suspect most people don't think of it that way.) And ESP seems so plausible, in a wish-fulfilling sort of way. It really feels like if you concentrate really hard, you can read minds, or predict the future, or whatever. Heck, when I play squash I always feel that if I really really try hard, I should be able to win every point. The only thing that stops me from really believing this is that I realize that the same logic holds symmetrically for my opponent. But with ESP, absent a controlled study, it's easy to see evidence all around you supporting your wishful thinking. (See my quote in bold here.) Recall the experiments reported by Ellen Langer, that people would shake their dice more forcefully when trying to roll high numbers and would roll gently when going for low numbers.

When I was a little kid, it was pretty intuitive to believe that if I really tried, I could fly like Superman. There was, of course, abundant evidence--many crashes in the backyard--that it wouldn't work. For something as vague as ESP, that sort of simple test isn't there. And ESP researchers know this--they use good statistics--but it doesn't remove the element of wishful thinking. And, as David Weakliem and I have discussed, classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects.

I think it's naive when people implicitly assume either that the study's claims are correct or that the study's statistical methods must be weak. Generally, the smaller the effects you're studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics.

To put it another way: whatever methodological errors happen to be in the paper in question probably occur in lots of research papers in "legitimate" psychology research. The difference is that when you're studying a large, robust phenomenon, little statistical errors won't be so damaging as in a study of a fragile, possibly zero effect.

In some ways, there's an analogy to the difficulties of using surveys to estimate small proportions, in which case misclassification errors can loom large, as discussed here.

Now to criticize the critics: some so-called Bayesian analysis that I don't really like

I agree with the critics of the ESP paper that Bayesian analysis is a good way to combine the results of this not-so-exciting new finding that people in the study got 53% correct instead of the expected 50% correct, with the long history of research in this area.

But I wouldn't use the Bayesian methods that these critics recommend. In particular, I think it's ludicrous for Wagenmakers et al. to claim a prior probability of 10^-20 for ESP, and I also think that they're way off base when they start talking about "Bayesian t-tests" and point null hypotheses. I think a formulation based on measurement-error models would be far more useful. I'm very disturbed by purportedly Bayesian methods that start with meaningless priors which then yield posterior probabilities that, instead of being interpreted quantitatively, have to be converted to made-up categories such as "extreme evidence," "very strong evidence," "anecdotal evidence," and the like. This seems to me to import some of the most arbitrary aspects of classical statistics. Perhaps I should call this the "no true Bayesian" phenomenon.

And, if you know me at all (in a professional capacity), you'll know I hate statements like this:

Another advantage of the Bayesian test is that it is consistent: as the number of participants grows large, the probability of discovering the true hypothesis approaches 1.

The "true hypothesis," huh? I have to go to bed now (no, I'm not going to bed at 9am; I set this blog up to post entries automatically every morning). If you happen to run into an experiment of interest in which psychologists are "discovering a true hypothesis," (in the statistical sense of a precise model), feel free to wake me up and tell me. It'll be newsworthy, that's for sure.

Anyway, the ESP thing is pretty silly and so there are lots of ways of shooting it down. I'm only picking on Wagenmakers et al. because often we're full of uncertainty about more interesting problems. For example, new educational strategies and their effects on different sorts of students. For these sorts of problems, I don't think that models of null effects, verbal characterizations of Bayes factors, and reassurances about "discovering the true hypothesis" are going to cut it. These methods are important, and I think that, even when criticizing silly studies, we should think carefully about what we're doing and what our methods are actually purporting to do.

I'll be on Radio 4 at 8.40am, on the BBC show "Today," talking about The Honest Rainmaker. I have no idea how the interview went (it was about 5 minutes), but I'm kicking myself because I was planning to tell the hookah story, but I forgot. Here it is:

I was at a panel for the National Institutes of Health evaluating grants. One of the proposals had to do with the study of the effect of water-pipe smoking, the hookah. There was a discussion around the table. The NIH is a United States government organisation; not many people in the US really smoke hookahs; so should we fund it? Someone said, 'Well actually it's becoming more popular among the young.' And if younger people smoke it, they have a longer lifetime exposure, and apparently there is some evidence that the dose you get of carcinogens from hookah smoking might be 20 times the dose of smoking a cigarette. I don't know the details of the math, but it was a lot. So even if not many people do it, if you multiply the risk, you get a lot of lung cancer.

Then someone at the table - and I couldn't believe this - said, 'My uncle smoked a hookah pipe all his life, and he lived until he was 90 years old.' And I had a sudden flash of insight, which was this. Suppose you have something that actually kills half the people. Even if you're a heavy smoker, your chance of dying of lung cancer is not 50 per cent, so therefore, even with something as extreme as smoking and lung cancer, you still have lots of cases where people don't die of the disease. The evidence is certainly all around you pointing in the wrong direction - if you're willing to accept anecdotal evidence - there's always going to be an unlimited amount of evidence which won't tell you anything. That's why the psychology is so fascinating, because even well-trained people make mistakes. It makes you realise that we need institutions that protect us from ourselves.

I think that last bit is the key point--"if you're willing to accept anecdotal evidence, there's always going to be an unlimited amount of evidence which won't tell you anything." Of course, what makes this story work so well is that it's backed up by a personal anecdote!

Damn. I was planning to tell this story but I forgot. Next time I do radio, I'm gonna bring an index card with my key point. Not my 5 key points, not my 3 key points, but my 1 key point. Actually, I'm gonna be on the radio (in Seattle) next Monday afternoon, so I'll have a chance to try this plan then.

Gayle Laackmann reports (link from Felix Salmon) that Microsoft, Google, etc. don't actually ask brain-teasers in their job interviews. They actually ask a lot of questions about programming. (I looked here and was relieved to see that the questions aren't very hard. I could probably get a job as an entry-level programmer if I needed to.)

Laackmann writes:

Let's look at the very widely circulated "15 Google Interview Questions that will make you feel stupid" list [here's the original list, I think, from Lewis Lin] . . . these questions are fake. Fake fake fake. How can you tell that they're fake? Because one of them is "Why are manhole covers round?" This is an infamous Microsoft interview question that has since been so very, very banned at both companies. I find it very hard to believe that a Google interviewer asked such a question.

We'll get back to the manhole question in a bit.

Laackmann reports that she never saw any IQ tests in three years of interviewing at Google and that "brain teasers" are banned. But . . . if brain teasers are banned, somebody must be using them, right? Otherwise, why bother to ban them? For example, one of her commenters writes:

I [the commenter] have been phone screened by Google and so have several colleagues. I can say that the questions are different depending on who is asking them. I went in expecting a lot of technical questions, and instead they asked me one question:
"If I were to give you $1000 to count all the manholes in San Francisco, how would you do it?"

I don't think you can count on one type of phone screen or interview from Google. Each hiring team probably has their own style of screening.

And commenter Bjorn Borud writes:

Though your effort to demystify the interview process is laudable you should know better than to present assumptions as facts. At least a couple of the questions you listed as "fake" were used in interviews when I worked for google. No, I can't remember ever using any of them (not my style), but I interviewed several candidates who had other interviewers ask some of these. Specifically I know a lot of people were given the two eggs problem. Which is not an entirely unreasonable problem to observe problem solving skills.

And commenter Tim writes:

I was asked the manhole cover question verbatim during a Google interview for a Datacenter Ops position.

What we seem to have here is a debunking of a debunking of an expose.

Who do we believe?

You'll be unsurprised to hear that I think there's an interesting statistical question underlying all this mess. The question is: Who should we believe, and what evidence are we using or should be using to make this judgment?

What do we have so far?

- Felix Salmon implicitly endorses the analysis of Laakmann (who he labels as "Technology Woman"). I like Salmon; he seems reasonable and I'm inclined to trust him (even if I still don't know who this Nouriel Roubini person is who Salmon keeps mocking for buying a 5 million dollar house).

- Salmon associated the "fake" interview questions with "Business Insider," an unprofessional-looking website of the sort that clogs the web with recycled content and crappy ads.

- Laackmann's website looks professional (unlike that of Business Insider) and reports her direct experiences at Google. After reading her story, I was convinced.

- There was one thing that bugged me about Laackmann's article, though. It was the very last sentence:

Want to see real Google interview questions, Microsoft interview questions, and more? Check CareerCup.

I followed the link, and CareerCup is Laackmann's commercial website. That's fine--we all have to earn a living. But what bothered me was that the sentence above contained three links (on "Google interview questions," "Microsoft interview questions," and "CareerCup")--and they all linked to the exact same site. That's the kind of thing that spammers do.

Add +1 to Laackmann's spam score.

- I didn't think much of this at first, but then there are the commenters, who report direct experiences of their own that contradict the blog's claims. And I couldn't see why someone would bother to write in with fake stories. It's not like they have something to sell.

- Laackmann has a persuasive writing style, though not the mellow style of Salmon (or myself) but more the in-your-face style of Seth Godin, Clay Shirky, Philip Greenspun, Jeff Jarvis, and other internet business gurus. This ends up being neutral for me: the persuasiveness persuades me, then I resist the pushiness, and the net effect is neither more nor less convincing than if the article were written in a flatter style.

What do I think? I'm guessing that Laackmann is sincere but is overconfident: she's taking the part of the world she knows and is generalizing with too much certainty. On the other hand, she may be capturing much of the truth: even if these wacky interview questions are used occasionally, maybe they're not asked most of the time.

My own story

As part of my application to MIT many years ago, I was interviewed by an alumnus in the area. We talked for awhile--I don't remember what about--and then he said he had to go off and do something in the other room, and while I was waiting I could play with these four colored cubes he had, that you were supposed to line up so that the colors on the outside lined up. It was a puzzle called Instant Insanity, I think. Anyway, he left the room to do whatever, and I started playing with the cubes. After a couple minutes I realized he'd given me an impossible problem: there was no possible way to line up the cubes to get the configuration he'd described. When he returned, I told him the puzzle was impossible, and he gave some sort of reply like, Yeah, I can't figure out what happened--maybe we had two sets and lost a couple of cubes? I still have no idea if he was giving this to me as some kind of test or whether he was just giving me something to amuse myself while he got some work done. He was an MIT grad, after all.

Joscha Legewie points to this article by Lars Ronnegard, Xia Shen, and Moudud Alam, "hglm: A Package for Fitting Hierarchical Generalized Linear Models," which just appeared in the R journal. This new package has the advantage, compared to lmer(), of allowing non-normal distributions for the varying coefficients. On the downside, they seem to have reverted to the ugly lme-style syntax (for example, "fixed = y ~ week, random = ~ 1|ID" rather than "y ~ week + (1|ID)"). The old-style syntax has difficulties handling non-nested grouping factors. They also say they can estimate models with correlated random effects, but isn't that just the same as varying-intercept, varying-slope models, which lmer (or Stata alternatives such as gllamm) can already do? There's also a bunch of stuff on H-likelihood theory, which seems pretty pointless to me (although probably it won't do much harm either).
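
For anyone curious, here is roughly what the same varying-intercept model looks like in the two syntaxes. The data here are made up (y, week, ID), and the hglm call is based only on the syntax quoted above; I haven't checked the package's exact argument list, so treat it as a sketch:

# Made-up longitudinal data: 10 subjects measured over 5 weeks
set.seed(1)
d <- data.frame(ID = factor(rep(1:10, each = 5)),
                week = rep(1:5, times = 10))
d$y <- rnorm(nrow(d), 2 + 0.5 * d$week, 1)

library(lme4)
fit <- lmer(y ~ week + (1 | ID), data = d)   # lmer: one formula for both parts

# The hglm version, per the syntax quoted above, splits the model into
# 'fixed' and 'random' arguments, roughly:
#   hglm(fixed = y ~ week, random = ~ 1 | ID, data = d)
# (I haven't verified the package's exact argument names; see ?hglm.)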

In any case, this package might be useful to some of you, hence this note.

Clarity on my email policy

I never read email before 4. That doesn't mean I never send email before 4.

Cash in, cash out graph

David Afshartous writes:

I thought this graph [from Ed Easterling] might be good for your blog.

The 71 outlined squares show the main story, and the regions of the graph present the information nicely.

Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. Might be interesting to graph the distribution of the actual data for the 71 outlined squares. In addition, I assume that each period begins on Jan 1 so data size could be naturally increased by looking at intervals that start on June 1 as well (where the limit of this process would be to have it at the granularity of one day; while it most likely wouldn't make much difference, I've seen some graphs before where 1 year returns can be quite sensitive to starting date, etc).

I agree that (a) the graph could be improved in small ways--in particular, adding half-year data seems like a great idea--and (b) it's a wonderful, wonderful graph as is. And the NYT graphics people added some nice touches such as the gray (rather than white) background and the thin white lines to separate the decades.

On a (slightly) more substantive note, I don't think growth-adjusted-for-inflation is the best benchmark. Instead of growth minus inflation, I'd like to see growth minus the default interest rate you could get from a savings account or T-bill or something like that. Lots of possibilities here.

Bribing statistics

I Paid a Bribe by Janaagraha, a Bangalore-based not-for-profit, harnesses the collective energy of citizens and asks them to report on the nature, number, pattern, types, location, frequency and values of corruption activities. These reports would be used to argue for improving governance systems and procedures, tightening law enforcement and regulation, and thereby reducing the scope for corruption.

Here's a presentation of data from the application:
bribe.png

Transparency International could make something like this much more widely available around the world.

While awareness is good, follow-up is even better. For example, it's known that New York's subway signal inspections were being falsified. Signal inspections are pretty serious stuff, as failures lead to disasters, such as the one in Washington. Nothing much happened afterward: the person responsible (making $163k a year) was merely reassigned.

5 books

I was asked by Sophie Roell, an editor at The Browser, where every day they ask an expert in a field to recommend the top five books, not by them, in their subject. I was asked to recommend five books on how Americans vote.

The trouble is that I'm really pretty unfamiliar with the academic literature of political science, but it seemed sort of inappropriate for a political scientist such as myself to recommend non-scholarly books that I like (for example, "Style vs. Substance" by George V. Higgins, "Lies My Teacher Told Me," by James Loewen, "The Rascal King" by Jack Beatty, "Republican Party Reptile" by P. J. O'Rourke, and, of course, "All the King's Men," by Robert Penn Warren). I mean, what's the point of that? Nobody needs me to recommend books like that.

Instead, I moved sideways and asked if I could discuss five books on statistics instead. Roell said that would be fine, so I sent her a quick description, which appears below.

The actual interview turned out much better. Readable and conversational. I give Roell credit for this, keeping me from rambling too much. The interview includes the notorious hookah story, which should provoke a wince of recognition from anyone who's ever served on an NIH panel.

Below is my original email; the full interview appears here.

Hipmunk update

Florence from customer support at Hipmunk writes:

Hipmunk now includes American Airlines in our search results. Please note that users will be taken directly to AA.com to complete the booking/transaction. . . . we are steadily increasing the number of flights that we offer on Hipmunk.

As you may recall, Hipmunk is a really cool flight-finder that didn't actually work (as of 16 Sept 2010). At the time, I was a bit annoyed at the NYT columnist who plugged Hipmunk without telling his readers that the site didn't do the job. (I discovered the problem myself because I couldn't believe that my flight options to Raleigh-Durham were really so meager, so I checked on Expedia and found a good flight.)

I do think Hipmunk's graphics are beautiful, though, so I'm rooting for them to catch up.

P.S. Apparently they include Amtrak Northeast Corridor trains, so I'll give them a try, next time I travel. The regular Amtrak website is about as horrible as you'd expect.

Theoretical vs applied statistics

Anish Thomas writes:

Tukey's philosophy

The great statistician John Tukey, in his writings from the 1970s onward (and maybe earlier), was time and again making the implicit argument that you should evaluate a statistical method based on what it does; you should not be staring at the model that purportedly underlies the method, trying to determine whether the model is "true" (or "true enough"). Tukey's point was that models can be great for inspiring methods, but the model is the scaffolding; it is the method that is the building you have to live in.

I don't fully agree with this philosophy--I think models are a good way to understand data and also often connect usefully to scientific models (although not as cleanly as is thought by our friends who work in economics or statistical hypothesis testing).

To put it another way: What makes a building good? A building is good if it is useful. If a building is useful, people will use it. Eventually improvements will be needed, partly because the building will get worn down, partly because the interactions between the many users will inspire new, unforeseen uses, partly for the simple reason that if a building is popular, more space will be desired. At that point, work needs to be done. And, at that point, wouldn't it be great if some scaffolding were already around?

That scaffolding that we'd like to have . . . if we now switch the analogy back from buildings to statistical methods, that scaffolding is the model that was used in constructing the method in the first place.

No statistical method is perfect. In fact, it is the most useful, wonderful statistical methods that get the most use and need improvements most frequently. So I like the model and I don't see the virtue in hiding it and letting the method stand alone. The model is the basis for future improvements in many directions. And this is one reason why I think that one of the most exciting areas in statistical research is the systematization of model building. The network of models and all that.

But, even though I don't agree with the implicit philosophy of late Tukey (I don't agree with the philosophy of early Tukey either, with all that multiple comparisons stuff), I think (of course) that he made hugely important contributions. So I'd like to have this philosophy out there for statisticians and users to evaluate on their own.

I have never seen Tukey's ideas expressed in this way before (and they're just my own imputation; I only met Tukey once, many years ago, and we spoke for about 30 seconds), so I'm posting them here, on the first day of this new decade.

Obituaries in 2010
