Results matching “R”

Jonathan Chait writes that the most important aspect of a presidential candidate is "political talent":

Republicans have generally understood that an agenda tilted toward the desires of the powerful requires a skilled frontman who can pitch Middle America. Favorite character types include jocks, movie stars, folksy Texans and war heroes. . . . [But the frontrunners for the 2012 Republican nomination] make Michael Dukakis look like John F. Kennedy. They are qualified enough to serve as president, but wildly unqualified to run for president. . . . [Mitch] Daniels's drawbacks begin -- but by no means end -- with his lack of height, hair and charisma. . . . [Jeb Bush] suffers from an inherent branding challenge [because of his last name]. . . . [Chris] Christie . . . doesn't cut a trim figure and who specializes in verbally abusing his constituents. . . . [Haley] Barbour is the comic embodiment of his party's most negative stereotypes. A Barbour nomination would be the rough equivalent of the Democrats' nominating Howard Dean, if Dean also happened to be a draft-dodging transsexual owner of a vegan food co-op.

Chait continues:

The impulse to envision one of these figures as a frontman represents a category error. These are the kind of people you want advising the president behind the scenes; these are not the people you put in front of the camera. The presidential candidate is the star of a television show about a tall, attractive person who can be seen donning hard hats, nodding at the advice of military commanders and gazing off into the future.

Geddit? Mike Dukakis was short, ethnic-looking, and didn't look good in a tank. (He did his military service in peacetime.) And did I mention that his middle name was Stanley? Who would vote for such a jerk?

All I can say is that Dukakis performed about as well in 1988 as would be predicted from the economy at the time. Here's a graph based on Doug Hibbs's model:

[Graph based on Doug Hibbs's model]
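For readers curious about the structure of this kind of forecast, here's a minimal sketch in R. The file and column names (growth, incvote) are hypothetical placeholders, not Hibbs's actual dataset; the point is just that the forecast is a simple regression of incumbent-party vote share on pre-election income growth.

```r
# Sketch of a Hibbs-style "economy predicts the vote" regression.
# "hibbs.csv" and its columns (year, growth, incvote) are hypothetical
# placeholders; substitute the real bread-and-peace data.
hibbs <- read.csv("hibbs.csv")

fit <- lm(incvote ~ growth, data = hibbs)
summary(fit)

# Forecast for a year with, say, 2% income growth:
predict(fit, newdata = data.frame(growth = 2), interval = "prediction")

# A plot in the spirit of the figure above:
plot(hibbs$growth, hibbs$incvote,
     xlab = "Income growth", ylab = "Incumbent party's share of the two-party vote")
abline(fit)
```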

Sorry, but I don't think the Democrats would've won the 1988 presidential election even if they'd had Burt Reynolds at the top of the ticket. And, remember, George H. W. Bush was widely considered to be a wimp and a poser until he up and won the election. Conversely, had Dukakis won (which he probably would've, had the economy been slumping that year), I think we'd be hearing about how he was a savvy low-key cool dude.

Let me go on a bit more about the 1988 election.

Dean Eckles writes:

I remember reading on your blog that you were working on some tools to fit multilevel models that also include "fixed" effects -- such as continuous predictors -- that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front?

I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of "fixed" effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I'd want to use some kind of shrinkage.

As far as I can tell, the main option for doing this now is to go fully Bayesian and use a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set, and because I'd like to select the penalty parameter by cross-validation (which is where this isn't very Bayesian, I guess?).

My reply:

We allow informative priors in blmer/bglmer. Unfortunately blmer/bglmer aren't ready yet, but they will be soon, I hope. They'll be in the "arm" package in R. We're also working on a bigger project of multilevel models for deep interactions of continuous predictors. But that won't be ready for a while; we still have to figure out what we want to do there.
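In the meantime, for readers who want to try the Gibbs-sampler route Dean mentions, here's a minimal sketch using MCMCglmm, with synthetic stand-in data; the key piece is the prior on the fixed effects (the B component), which acts like an L2/ridge penalty whose strength is set by the prior variance.

```r
library(MCMCglmm)

# Synthetic stand-in data (purely illustrative): outcome y, continuous
# predictors x1 and x2, grouping factor county.
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  county = factor(sample(1:20, n, replace = TRUE)))
county_eff <- rnorm(20, 0, 0.5)
dat$y <- 1 + 0.3 * dat$x1 - 0.2 * dat$x2 + county_eff[dat$county] + rnorm(n)

# Prior on the fixed effects (B) acts like a ridge penalty: N(0, 1) on the
# slopes, essentially flat on the intercept.
prior <- list(
  B = list(mu = rep(0, 3), V = diag(c(1e8, 1, 1))),
  G = list(G1 = list(V = 1, nu = 0.002)),   # weak prior on county variance
  R = list(V = 1, nu = 0.002)               # weak prior on residual variance
)

fit <- MCMCglmm(y ~ x1 + x2, random = ~ county, data = dat, prior = prior,
                nitt = 13000, burnin = 3000, thin = 10, verbose = FALSE)
summary(fit)
```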

In politics, as in baseball, hot prospects from the minors can have trouble handling big-league pitching.

Of Beauty, Sex, and Power: Statistical Challenges in Estimating Small Effects. At the Institute for Policy Research, Thurs 7 Apr 2011, 3.30pm.

Regular blog readers know all about this topic. (Here are the slides.) But, rest assured, I don't just mock. I also offer constructive suggestions.

My last talk at Northwestern was fifteen years ago. Actually, I gave two lectures then, in the process of being turned down for a job (I mean, enjoying their chilly Midwestern hospitality).

P.S. I searched on the web and also found this announcement which gives the wrong title.

Internal and external forecasting

Some thoughts on the implausibility of Paul Ryan's 2.8% unemployment forecast. Some general issues arise.

P.S. Yes, Democrats also have been known to promote optimistic forecasts!

Marc Tanguay writes in with a specific question that has a very general answer. First, the question:

I [Tanguay] am currently running an MCMC for which I have 3 parameters that are restricted to a specific space. 2 are bounded between 0 and 1 while the third is binary and updated by a Beta-Binomial. Since my priors are also bounded, I notice that, conditional on all the rest (which covers both data and other parameters), the density does not vary a lot within the space of the parameters. As a result, the acceptance rate is high, about 85%, and this despite the fact that all of the parameter space is explored. Since in your book the optimal acceptance rates prescribed are lower than 50% (in the case of multiple parameters), do you think I should worry about getting 85%? Or is this normal given the restrictions on the parameters?

First off: Yes, my guess is that you should be taking bigger jumps. 85% seems like too high an acceptance rate for Metropolis jumping.

More generally, though, my recommendation is to monitor expected squared jumped distance (ESJD), which is a cleaner measure than the acceptance probability. Generally, the higher the ESJD, the happier you should be. See this paper with Cristian Pasarica.

The short story is that if you maximize ESJD, you're minimizing the first-order autocorrelation. And, within any parameterized family of jumping rules, if you minimize the first-order autocorrelation, I think you'll pretty much be minimizing all of your autocorrelations and maximizing your efficiency.

As Cristian and I discuss in the paper, you can use a simple Rao-Blackwellization to compute expected squared jumped distance, rather than simply average squared jumped distance. We develop some tricks based on differentiation and importance sampling to adaptively optimize the algorithm to maximize ESJD, but you can always try the crude approach of trying a few different jumping scales, calculating ESJD for each, and then picking the best to go forward.
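To illustrate the crude approach in that last sentence, here's a minimal sketch in R: a random-walk Metropolis sampler for a standard normal target, run at several jumping scales, reporting the acceptance rate and the plain (not Rao-Blackwellized) average squared jumped distance for each.

```r
# Random-walk Metropolis for a standard normal target; compare jumping scales
# by acceptance rate and average squared jumped distance (the crude version,
# without the Rao-Blackwellization discussed above).
metropolis <- function(scale, n_iter = 5000) {
  theta <- numeric(n_iter)
  accept <- 0
  for (t in 2:n_iter) {
    proposal <- theta[t - 1] + rnorm(1, 0, scale)
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(theta[t - 1], log = TRUE)
    if (log(runif(1)) < log_ratio) {
      theta[t] <- proposal
      accept <- accept + 1
    } else {
      theta[t] <- theta[t - 1]
    }
  }
  c(accept_rate = accept / (n_iter - 1),
    avg_sq_jump = mean(diff(theta)^2))
}

set.seed(1)
sapply(c(0.5, 1, 2.4, 5), metropolis)  # columns correspond to the four scales
```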

We all have opinions about the federal budget and how it should be spent. Infrequently, those opinions are informed by some knowledge about where the money actually goes. It turns out that most people don't have a clue. What about you? Here, take this poll/quiz and then compare your answers to (1) what other people said in a CNN poll that asked about these same items, and (2) the real answers.

Quiz is below the fold.

Tyler Cowen links to this article by Matt Ridley that manages to push all my buttons. Ridley writes:

No joke. See here (from Kaiser Fung). At the Statistics Forum.

This article by Thomas Krag, at Copenhagenize, is marred by reliance on old data, but it's so full of informative graphical displays --- most of them not made by the author, I think --- that it's hard to pick just one. But here ya go. This figure shows fatalities (among cyclists) versus distance cycled, with a point for each year...unfortunately ending way back in 1998, but still:
[Figure: cyclist fatalities versus distance cycled, one point per year, ending in 1998]

This is a good alternative to the more common choice for this sort of plot, which would be overlaying curves of fatalities vs time and distance cycled vs time.

The article also explicitly discusses the fact, previously discussed on this blog, that it's misleading, to the point of being wrong in most contexts, to compare the safety of walking vs cycling vs driving by looking at the casualty or fatality rate per kilometer. Often, as in this article, the question of interest is something like, if more people switched from driving to cycling, how many more or fewer people would die? Obviously, if people give up their cars, they will travel a lot fewer kilometers! According to the article, in Denmark in 1992 (!), cycling was about 3x as dangerous per kilometer as driving, but was essentially equally safe per hour and somewhat safer per trip.

The article also points out that, since cycling is good for you --- if you avoid death or serious injury in a crash! --- it might make sense to look at overall mortality rather than just fatalities in crashes, and sure enough, a ten-year-old study finds that regular cyclists have a much lower death rate (adjusted for sex, smoking status, education, etc.) than non-cyclists. I'm rather suspicious about the ability to quantify this, since it seems virtually impossible to actually control for what you need to control for, but not having actually read the study in question, it's hard to say for sure.

At any rate, check out the small-multiples plot, the other graphics (most of them pretty good), and the relatively sophisticated discussion of statistical principles...sophisticated compared to what one usually sees in the media, anyway. (I'm not sure if this is a reflection of higher standards of statistical literacy in the Netherlands vs here, or if that's reading too much into it.)

This recent story of a wacky psychology professor reminds me of this old story of a wacky psychology professor.

This story of a wacky philosophy professor reminds me of a course I almost took at MIT. I was looking through the course catalog one day and saw that Thomas Kuhn was teaching a class in the philosophy of science. Thomas Kuhn--wow! So I enrolled in the class. I only sat through one session before dropping it, though. Kuhn just stood up there and mumbled.

At the time, this annoyed me a little. In retrospect, though, it made more sense. I'm sure he felt he had better things to do with his life than teach classes. And MIT was paying him whether or not he did a good job teaching, so it's not like he was breaking his contract or anything. (Given the range of instructors we had at MIT, it was always a good idea to make use of the shopping period at the beginning of the semester. I had some amazing classes but only one or two really bad ones. Mostly I dropped the bad ones after a week or two.)

Thinking about the philosophies of Kuhn, Lakatos, Popper, etc., one thing that strikes me is how much easier it is to use their ideas now that they're long gone. Instead of having to wrestle with every silly thing that Kuhn or Popper said, we can just pick out the ideas we find useful. For example, my colleagues and I can use the ideas of paradigms and of the fractal nature of scientific revolutions without needing to get annoyed at Kuhn's gestures in the direction of denying scientific reality.

P.S. Morris also mentioned that Kuhn told him, "Under no circumstances are you to go to those lectures" by a rival philosopher. Which reminds me of when I asked one of my Ph.D. students at Berkeley why he chose to work with me. He told me that Prof. X had told him not to take my course and Prof. Y had made fun of Bayesian statistics in his class. At this point the student got curious. . . . and the rest is history (or, at least, Mister P).

Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would've argued with some of the things in the article. In particular, he writes:

Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved.

Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase "additional information"), but I hate that quote!

My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework--Bayesian or otherwise--is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it's with claiming a positive effect when the true effect is negative, or claiming a large effect when the true effect is small, or claiming a precise estimate of an effect when the true effect is highly variable, or . . . I've probably missed a few possibilities here but you get the idea.

In addition, none of Carl's correspondents mentioned the "statistical significance filter": the idea that, to make the cut of statistical significance, an estimate has to reach some threshold. As a result of this selection bias, statistically significant estimates tend to be overestimates--whether or not a Bayesian method is used, and whether or not there are any problems with fishing through the data.
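A quick simulation makes the filter concrete; the true effect and standard error below are arbitrary numbers chosen for illustration:

```r
# The statistical significance filter: estimates that clear the
# |estimate/se| > 2 threshold systematically overstate the true effect.
set.seed(123)
true_effect <- 0.1   # arbitrary small true effect (illustrative)
se <- 0.1            # arbitrary standard error
est <- rnorm(1e5, mean = true_effect, sd = se)

significant <- abs(est / se) > 2
mean(significant)          # only a modest fraction reach significance
mean(est[significant])     # those that do average far above the true 0.1
```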

Bayesian inference is great--I've written a few books on the topic--but, y'know, garbage in, garbage out. If you start with a model of exactly zero effects, that's what will pop out.

I completely agree with this quote from Susan Ellenberg, reported in the above article:

You have to make a lot of assumptions in order to do any statistical test, and all of those are questionable.

And being Bayesian doesn't get around that problem. Not at all.

P.S. Steve Stigler is quoted as saying, "I don't think in science we generally sanction the unequivocal acceptance of significance tests." Unfortunately, I have no idea what he means here, given the two completely opposite meanings of the word "sanction" (see the P.S. here.)

Bill James and the base-rate fallacy

I was recently rereading and enjoying Bill James's Historical Baseball Abstract (the second edition, from 2001).

But even the Master is not perfect. Here he is, in the context of the all-time 20th-greatest shortstop (in his reckoning):

Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn't an ex-athlete--and that makes athletes seem special. [italics in the original]

Hey, I've met 75-year-olds like that--and none of them are ex-athletes! That's probably because I don't know a lot of ex-athletes. But Bill James . . . he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple bases when he was playing against the Orioles once.

Cognitive psychologists talk about the base-rate fallacy, which is the mistake of estimating probabilities without accounting for underlying frequencies. Bill James knows a lot of ex-athletes, so it's no surprise that the youthful, optimistic, 75-year-olds he meets are likely to be ex-athletes. The rest of us don't know many ex-athletes, so it's no surprise that most of the youthful, optimistic, 75-year-olds we meet are not ex-athletes.
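To put hypothetical numbers on this (all made up, purely for illustration): suppose a lively 75-year-old is ten times more likely to be found among ex-athletes than among everyone else. Bayes' rule then gives very different answers depending on the base rate of ex-athletes in your social world.

```r
# Base-rate arithmetic with made-up numbers (purely illustrative).
p_lively_athlete <- 0.30   # hypothetical: chance an ex-athlete is "lively" at 75
p_lively_other   <- 0.03   # hypothetical: chance anyone else is

post_athlete <- function(base_rate) {
  base_rate * p_lively_athlete /
    (base_rate * p_lively_athlete + (1 - base_rate) * p_lively_other)
}

post_athlete(0.30)    # Bill James's world: ~81% of lively 75-year-olds are ex-athletes
post_athlete(0.001)   # the rest of us: ~1%
```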

The mistake James made in the above quote was to write "You" when he really meant "I." I'm not disputing his claim that athletes are disproportionately likely to become lively 75-year-olds; what I'm disagreeing with is his statement that almost all such people are ex-athletes.

Yeah, I know, I'm being picky. But the point is important, I think, because of the window it offers into the larger issue of people being trapped in their own environment (the "availability heuristic," in the jargon of cognitive psychology). Athletes loom large in Bill James's world--and I wouldn't want it any other way--and sometimes he forgets that the rest of us live in a different world.

So many topics, so little time

As many of you know, this blog is on an approximate one-month delay. I schedule my posts to appear roughly once a day, and there's currently a backlog of about 20 or 30 posts.

Recently I've decided to spend less time blogging, but I have some ideas I'd still like to share. To tweet, if you will. So I thought I'd just put a bunch of ideas out there that interested readers could follow up on. Think of it like one of those old-style dot-dot-dot newspaper columns.

Why Edit Wikipedia?

Zoe Corbyn's article for The Guardian (UK), titled Wikipedia wants more contributions from academics, and the followup discussion on Slashdot got me thinking about my own Wikipedia edits.

The article quotes Dario Taraborelli, a research analyst for the Wikimedia Foundation, as saying "Academics are trapped in this paradox of using Wikipedia but not contributing." Huh? I'm really wondering what man-in-the-street wrote all the great stats stuff out there. And what's the paradox? I use lots of things without contributing to them.

Taraborelli is further quoted as saying "The Wikimedia Foundation is looking at how it might capture expert conversation about Wikipedia content happening on other websites and feed it back to the community as a way of providing pointers for improvement."

This struck home. I recently went through the entry for latent Dirichlet allocation and found a bug in their derivation. I wrote up a revised derivation and posted it on my own blog.

But why didn't I go back and fix the Wikipedia? First, editing in their format is a pain. Second, as Corbyn's article points out, I was afraid I'd put in lots of work and my changes would be backed out. I wasn't worried that Wikipedia would erase whole pages, but apparently it's an issue for some these days. A real issue is that most of the articles are pretty good, and while they're not necessarily written the way I'd write them, they're good enough that I don't think it's worth rewriting the whole thing (also, see point 2).

If you're status conscious in a traditional way, you don't blog either. It's not what "counts" when it comes time for tenure and promotion. And if blogs, which are at least attributed, don't count, what about Wikipedia? Well, encyclopedia articles and such never counted for much on your CV. I did a few handbook-type things and then started turning them down, mainly because I'm not a big fan of the handbook format.

In that sense, it's just like teaching. I was told many times on tenure track that I shouldn't be "wasting" so much time teaching. I was even told by a dean at a major midwestern university that they barely even counted teaching. So is it any surprise we don't want to focus on teaching or writing encyclopedia articles?

Radford writes:

The word "conservative" gets used many ways, for various political purposes, but I would take it's basic meaning to be someone who thinks there's a lot of wisdom in traditional ways of doing things, even if we don't understand exactly why those ways are good, so we should be reluctant to change unless we have a strong argument that some other way is better. This sounds very Bayesian, with a prior reducing the impact of new data.

I agree completely, and I think Radford will very much enjoy my article with Aleks Jakulin, "Bayes: radical, liberal, or conservative?" Radford's comment also fits with my increasing inclination to use informative prior distributions.
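To make the point concrete, here's a minimal sketch (arbitrary numbers, purely for illustration) of the normal-normal case, where the posterior mean is a precision-weighted compromise between the prior mean and the new data:

```r
# Conservatism as shrinkage: normal prior, normal likelihood.
# All numbers are arbitrary, for illustration only.
prior_mean <- 0;  prior_sd <- 1      # "traditional ways of doing things"
y_bar <- 3;       se <- 1.5          # new data: estimate and its standard error

w <- (1 / prior_sd^2) / (1 / prior_sd^2 + 1 / se^2)   # weight on the prior
post_mean <- w * prior_mean + (1 - w) * y_bar
post_sd   <- sqrt(1 / (1 / prior_sd^2 + 1 / se^2))

c(weight_on_prior = w, posterior_mean = post_mean, posterior_sd = post_sd)
# The stronger the prior (smaller prior_sd), the more the new data are discounted.
```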

This is a chance for me to combine two of my interests--politics and statistics--and probably to irritate both halves of the readership of this blog. Anyway...

I recently wrote about the apparent correlation between Bayes/non-Bayes statistical ideology and liberal/conservative political ideology:

The Bayes/non-Bayes fissure had a bit of a political dimension--with anti-Bayesians being the old-line conservatives (for example, Ronald Fisher) and Bayesians having more of a left-wing flavor (for example, Dennis Lindley). Lots of counterexamples at an individual level, but my impression is that on average the old curmudgeonly, get-off-my-lawn types were (with some notable exceptions) more likely to be anti-Bayesian.

This was somewhat based on my experiences at Berkeley. Actually, some of the cranky anti-Bayesians were probably Democrats as well, but when they were being anti-Bayesian they seemed pretty conservative.

Recently I received an interesting item from Gerald Cliff, a professor of mathematics at the University of Alberta. Cliff wrote:

I took two graduate courses in Statistics at the University of Illinois, Urbana-Champaign in the early 1970s, taught by Jacob Wolfowitz. He was very conservative, and anti-Bayesian. I admit that my attitudes towards Bayesian statistics come from him. He said that if one has a population with a normal distribution and unknown mean which one is trying to estimate, it is foolish to assume that the mean is random; it is fixed, and currently unknown to the statistician, but one should not assume that it is a random variable.

Wolfowitz was in favor of the Vietnam War, which was still on at the time. He is the father of Paul Wolfowitz, active in the Bush administration.

To which I replied:

Very interesting. I never met Neyman while I was at Berkeley (he had passed away before I got there) but I've heard that he was very liberal politically (as was David Blackwell). Regarding the normal distribution comment below, I would say:

1. Bayesians consider parameters to be fixed but unknown. The prior distribution is a regularization tool that allows more stable estimates.

2. The biggest assumptions in probability models are typically not the prior distribution but in the data model. In this case, Wolfowitz was willing to assume a normal distribution with no question but then balked at using any knowledge about its mean. It seems odd to me, as a Bayesian, for one's knowledge to be divided so sharply: zero knowledge about the parameter, perfect certainty about the distributional family.

To return to the political dimension: From basic principles, I don't see any strong logical connection between Bayesianism and left-wing politics. In statistics, non-Bayesian ("classical") methods such as maximum likelihood are often taken to be conservative, as compared to the more assumption-laden Bayesian approach, but, as Aleks Jakulin and I have argued, the labeling of a statistical method as liberal or conservative depends crucially on what is considered your default.

As statisticians, we are generally trained to respect conservatism, which can sometimes be defined mathematically (for example, nominal 95% intervals that contain the true value more than 95% of the time) and sometimes with reference to tradition (for example, deferring to least-squares or maximum-likelihood estimates). Statisticians are typically worried about messing with data, which perhaps is one reason that the Current Index to Statistics lists 131 articles with "conservative" in the title or keywords and only 46 with the words "liberal" or "radical."

In that sense, given that, until recently, non-Bayesian approaches were the norm in statistics, it was the more radical group of statisticians (on average) who wanted to try something different. And I could see how a real hardline conservative such as Wolfowitz could see a continuity between anti-Bayesian skepticism and political conservatism, just as, on the other side of the political spectrum, a leftist such as Lindley could ally Bayesian thinking with support of socialism, a planned economy, and the like.

As noted above, I don't think these connections make much logical sense but I can see where they were coming from (with exceptions, of course, as noted regarding Neyman above).

A common aphorism among artificial intelligence practitioners is that A.I. is whatever machines can't currently do.

Adam Gopnik, writing for the New Yorker, has a review called Get Smart in the most recent issue (4 April 2011). Ostensibly, the piece is a review of new books, one by Joshua Foer, Moonwalking with Einstein: The Art and Science of Remembering Everything, and one by Stephen Baker, Final Jeopardy: Man vs. Machine and the Quest to Know Everything (which would explain Baker's spate of Jeopardy!-related blog posts). But like many such pieces in highbrow magazines, the book reviews are just a cover for staking out a philosophical position. Gopnik does a typically New Yorker job in explaining the title of this blog post.

The Conservative States of America

After noting the increasing political conservatism of people in the poorer states, Richard Florida writes:

The current economic crisis only appears to have deepened conservatism's hold on America's states. This trend stands in sharp contrast to the Great Depression, when America embraced FDR and the New Deal.

Liberalism, which is stronger in richer, better-educated, more-diverse, and, especially, more prosperous places, is shrinking across the board and has fallen behind conservatism even in its biggest strongholds. This obviously poses big challenges for liberals, the Obama administration, and the Democratic Party moving forward.

But the much bigger, long-term danger is economic rather than political. This ideological state of affairs advantages the policy preferences of poorer, less innovative states over wealthier, more innovative, and productive ones. American politics is increasingly disconnected from its economic engine. And this deepening political divide has become perhaps the biggest bottleneck on the road to long-run prosperity.

What are my thoughts on this?

First, I think Obama would be a lot more popular had he been elected in 1932, rather than 1930.

Second, transfers from the richer, more economically successful states to the poorer, less developed states are not new. See, for example, this map from 1924 titled "Good Roads Everywhere" that shows a proposed system of highways spanning the country, "to be built and forever maintained by the United States Government."

[Map: "Good Roads Everywhere," National Highways Association, 1924]

The map, made by the National Highways Association, also includes the following explanation for the proposed funding system: "Such a system of National Highways will be paid for out of general taxation. The 9 rich densely populated northeastern States will pay over 50 per cent of the cost. They can afford to, as they will gain the most. Over 40 per cent will be paid for by the great wealthy cities of the Nation. . . . The farming regions of the West, Mississippi Valley, Southwest and South will pay less than 10 per cent of the cost and get 90 per cent of the mileage." [emphasis added] Beyond its quaint slogans ("A paved United States in our day") and ideas that time has passed by ("Highway airports"), the map gives a sense of the potential for federal taxing and spending to transfer money between states and regions.

Bayesian spam!

Cool! I know Bayes has reached the big time when I receive spam like this:

Bayesian networks are rapidly emerging as a new research paradigm . . . With this monthly newsletter, we'll keep you up to date . . . Financial Analytics Webinar . . . will exhibit at this year's INFORMS Analytics Conference in downtown Chicago. Please join us for our Bayesian networks technology workshop on April 10 . . . a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . . . the world's only comprehensive software package for learning, editing and analyzing Bayesian networks . . . If you no longer wish to receive these emails, please reply to this message with "Unsubscribe" in the subject line . . .

You know the saying, "It's not real unless it's on TV"? My saying is: It's not real until it's on spam.

Unfinished business

This blog by J. Robert Lennon on abandoned novels made me think of the more general topic of abandoned projects. I seem to recall George V. Higgins writing that he'd written and discarded 14 novels or so before publishing The Friends of Eddie Coyle.

I haven't abandoned any novels but I've abandoned lots of research projects (and also have started various projects that there's no way I'll finish). If you think about the decisions involved, it really has to be that way. You learn while you're working on a project whether it's worth continuing. Sometimes I've put in the hard work and pushed a project to completion, published the article, and then I think . . . what was the point? The modal number of citations of our articles is zero, etc.

Explaining that plot.

With some upgrades from a previous post.
[Plot: PoliticEst.png]

And with a hopefully clear 40+ page draft paper (see page 16).
Drawing Inference - Literally and by Individual Contribution.pdf

Comments are welcome, though my responses may be delayed.
(Working on how to best render the graphs.)

K?
p.s. Plot was modified so that it might be better interpreted without reading any of the paper - though I would not suggest that - reading at least pages 1 to 17 is recommended.

Wobegon on the Potomac

This story reminds me that, when I was in grad school, the state of Massachusetts instituted a seat-belt law which became a big controversy. A local talk show host made it his pet project to shoot down the law, and he succeeded! There was a ballot initiative and the voters repealed the seat belt law. A few years later the law returned (it was somehow tied in with Federal highway funding, I think, the same way they managed to get all the states to up the drinking age to 21), and, oddly enough, nobody seemed to care the second time around.

It's funny how something can be a big political issue one year and nothing the next. I have no deep insights on the matter, but it's worth remembering that these sorts of panics are nothing new. Recall E.S. Turner's classic book, Roads to Ruin. I think there's a research project in here, to understand what gets an issue to be a big deal and how it is that some controversies just fade away.

Reviewing a research article by Michael Spence and Sandile Hlatshwayo about globalization (a paper with the sobering message that "higher-paying jobs [are] likely to follow low-paying jobs in leaving US"), Tyler Cowen writes:

It is also a useful corrective to the political conspiracy theories of changes in the income distribution. . .

Being not-so-blissfully ignorant of macroeconomics, I can focus on the political question, namely these conspiracy theories.

I'm not quite sure what Cowen is referring to here--he neglects to provide a link to the conspiracy theories--but I'm guessing he's referring to the famous graph by Piketty and Saez showing how very high-end incomes (top 1% or 0.1%) have, since the 1970s, risen much more dramatically in the U.S. than in other countries, along with claims by Paul Krugman and others that much of this difference can be explained by political changes in the U.S. In particular, top tax rates in the U.S. have declined since the 1970s and the power of labor unions has decreased. The argument that Krugman and others make on the tax rates is both direct (the government takes away money from people with high incomes) and indirect (with higher tax rates, there is less incentive to seek or to pay high levels of compensation). And there's an even more indirect argument that as the rich get richer, they can use their money in various ways to get more political influence.

Anyway, I'm not sure what the conspiracy is. I mean, whatever Grover Norquist might be doing in a back room somewhere, the move to lower taxes was pretty open. According to Dictionary.com, a conspiracy is "an evil, unlawful, treacherous, or surreptitious plan formulated in secret by two or more persons; plot."

Hmm . . . I suppose Krugman etc. might in fact argue that there has been some conspiracy going on--for example of employers conspiring to use various illegal means to thwart union drives--but I'd also guess that to him and others on the left or center-left, most of the political drivers of inequality changes have been open, not conspiratorial.

I might be missing something here, though; I'd be interested in hearing more. At this point I'm not sure if Cowen's saying that these conspiracies don't exist, or whether they exist (and are possibly accompanied by similar conspiracies on the other side) but have been ineffective. Also I might be completely wrong in assigning Cowen's allusion to Krugman etc.

This discussion is relevant to this here blog because the labeling of a hypothesis as a "conspiracy" seems relevant to how it is understood and evaluated.

In my discussion of the dentists-named-Dennis study, I referred to my back-of-the-envelope calculation that the effect (if it indeed exists) corresponds to an approximate 1% aggregate chance that you'll pick a profession based on your first name. Even if there are nearly twice as many dentist Dennises as would be expected from chance alone, the base rate is so low that a shift of 1% of all Dennises would be enough to do this. My point was that (a) even a small effect could show up when looking at low-frequency events such as the choice to pick a particular career or live in a particular city, and (b) any small effects will inherently be difficult to detect in any direct way.
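Here's that back-of-the-envelope logic in a couple of lines of R; the base rate is a round illustrative number, not a figure from the study:

```r
# Back-of-the-envelope arithmetic; the base rate is a round illustrative number.
base_rate <- 0.01   # hypothetical share of people who end up dentists
ratio     <- 2      # "nearly twice as many dentist Dennises as expected"

# Share of all Dennises whose career choice would have to be name-driven
# to produce that doubling:
base_rate * (ratio - 1)   # 0.01, i.e., a shift of about 1% of Dennises
```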

Uri Simonsohn (the author of the recent rebuttal of the original name-choice article by Brett Pelham et al.) wrote:

100-year floods

According to the National Weather Service:

What is a 100 year flood? A 100 year flood is an event that statistically has a 1% chance of occurring in any given year. A 500 year flood has a .2% chance of occurring and a 1000 year flood has a .1% chance of occurring.

The accompanying map shows a part of Tennessee that in May 2010 had 1000-year levels of flooding.

[Map: area of Tennessee with 1000-year flood levels, May 2010]

At first, it seems hard to believe that a 1000-year flood would have just happened to occur last year. But then, this is just a 1000-year flood for that particular place. I don't really have a sense of the statistics of these events. How many 100-year, 500-year, and 1000-year flood events have been recorded by the Weather Service, and when have they occurred?
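Part of the answer is just arithmetic; here's a minimal sketch (the number of monitored sites is a made-up figure, not a Weather Service number): a 0.1%-per-year event becomes close to a sure thing once you watch enough places for enough years.

```r
# Chance of seeing at least one "1000-year" (0.1% per year) flood somewhere,
# when monitoring many roughly independent locations. The number of locations
# is a made-up illustration.
p_year  <- 0.001   # annual probability at a single site
n_sites <- 500     # hypothetical number of monitored sites
n_years <- 10

1 - (1 - p_year)^(n_sites * n_years)   # close to 1: some site "should" see one
```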

Sam Stroope writes:

I'm creating county-level averages based on individual-level respondents. My question is, how few respondents are reasonable to use when calculating the average by county? My end model will be a county-level (only) SEM model.

My reply: Any number of respondents should work. If you have very few respondents, you should just end up with large standard errors which will propagate through your analysis.
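A minimal sketch of what I mean (the data frame and variable names are hypothetical): compute each county's mean along with its standard error, and carry that standard error into the next stage rather than treating small-county averages as exact.

```r
# Hypothetical individual-level data frame 'dat' with columns county and y.
# Aggregate to county level, keeping the standard error of each county mean
# so the uncertainty can be carried into the county-level model.
county_stats <- aggregate(y ~ county, data = dat,
                          FUN = function(v) c(mean = mean(v),
                                              se = sd(v) / sqrt(length(v)),
                                              n = length(v)))
county_stats <- do.call(data.frame, county_stats)   # flatten the matrix column
head(county_stats)   # counties with few respondents show up with large y.se
```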

P.S. I must have deleted my original reply by accident so I reconstructed something above.

My last post on albedo, I promise

After seeing my recent blogs on Nathan Myhrvold, a friend told me that, in the tech world, the albedo-obsessed genius is known as a patent troll.

Really?

Yup. My friend writes:

Physics is hard

Readers of this bizarre story (in which a dubious claim about reflectivity of food in cooking transmuted into a flat-out wrong claim about the relevance of reflectivity of solar panels) might wonder how genius Nathan Myhrvold (Ph.D. in theoretical physics from Princeton at age 24, postdoc with Stephen Hawking for chrissake) could make such a basic mistake.

In an earlier comment, I dismissed this with a flip allusion to Wile E. Coyote. But now I'm thinking there's something more going on.

In our blog discussion (see links above), Phil is surprised I didn't take a stronger stance on the albedo issue after reading Pierrehumbert's explanation. Phil asks: Why did I write "experts seem to think the albedo effect is a red herring" instead of something stronger such as, "as Pierrehumbert shows in detail, the albedo effect is a red herring"?

I didn't do this because my physics credentials are no better than Myhrvold's. And, given that Myhrvold got it wrong, I don't completely trust myself to get it right!

I majored in physics in college and could've gone to grad school in physics--I actually almost did so, switching to statistics at the last minute. I could be a Ph.D. in physics too. But I've never had a great physical intuition. I could definitely get confused by a slick physics argument. And I suspect Myhrvold is the same way. Given what he's written on albedo, I doubt his physics intuition is anywhere near as good as Phil's. My guess is that Myhrvold, like me, got good grades and was able to solve physics problems but made a wise choice in leaving physics to do something else.

Now, it's true, I don't think I would've made Myhrvold's particular mistake, because I would've checked--to start with, I would've asked my friends Phil and Upmanu before making any public claims about physics. In that sense, the difference between me and Myhrvold is not that I know more (or less) than he does, but that I have more of a clear sense of my areas of ignorance.

P.S. I'm on a Windows machine but my spell checker keeps flagging "Myhrvold." I'm surprised that in all his years there, he didn't use his influence to put his name in the dictionary. Then again, "Obama" gets flagged as a typo too. But "Clinton" it knows about. Hmm, lemme try some more: "Dukakis" gets flagged. But not "Reagan" or "Nixon" or "Roosevelt." Or "Quayle." If I were Nathan Myhrvold or Mike Dukakis, I'd be pretty annoyed at this point. Getting frozen out by Reagan or Roosevelt, fine. But Quayle??

Ed Glaeser writes:

The late Senator Daniel Patrick Moynihan of New York is often credited with saying that the way to create a great city is to "create a great university and wait 200 years," and the body of evidence on the role that universities play in generating urban growth continues to grow.

I've always thought this too, that, given the total cost, a lot more cities would've benefited, over the years, from maintaining great universities rather than building expensive freeways, RenCens, and so forth.

But Joseph Delaney argues the opposite, considering the case of New Haven, home of what is arguably the second-best university in the country (I assume Glaeser would agree with me on this one):

Baseball's greatest fielders

Someone just stopped by and dropped off a copy of the book Wizardry: Baseball's All-time Greatest Fielders Revealed, by Michael Humphreys. I don't have much to say about the topic--I did see Brooks Robinson play, but I don't remember any fancy plays. I must have seen Mark Belanger but I don't really recall. Ozzie Smith was cool but I only saw him on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn't fielding either.

Anyway, Humphreys was nice enough to give me a copy of his book, and since I can't say much (I didn't have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away.

(Note: Humphreys replies to some of these questions in a comment.)

1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I've always been curious about Bill James's Pythagorean projection, so let me try it out here. If a team scores 700 runs in 162 games, then an extra 10 runs is 710, and Bill James's prediction is Games.Won/Games.Lost = (710/700)^2 = 1.029. Winning 1 extra game gives you an 82-80 record, for a ratio of 82/80=1.025. So that basically lines up.

There must be some more fundamental derivation, though. I don't see where the square comes from in James's model, and I don't see where the 10 comes from in Humphreys. I mean, I can see where it can arise empirically--and the idea that 10 runs = 1 extra win is a good thing to know, partly because it seems like a surprise at first (my intuition would've been that 10 extra runs will win you a few extra games), but I feel like there's some more fundamental relationship from which the 10:1 or Pythagorean relationship can be derived.
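Here's that arithmetic as a few lines of R, using the Pythagorean win percentage RS^2/(RS^2 + RA^2) and my own choice of 700 runs scored and allowed as the baseline:

```r
# Pythagorean expectation: win% = RS^2 / (RS^2 + RA^2).
pythag_wins <- function(rs, ra, games = 162) games * rs^2 / (rs^2 + ra^2)

pythag_wins(700, 700)                           # baseline: 81 wins
pythag_wins(710, 700)                           # about 82.1 wins
pythag_wins(710, 700) - pythag_wins(700, 700)   # roughly 1 extra win per 10 extra runs
```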

2. As I understand it, Humphreys is proposing two methods to evaluate fielders:
- The full approach, given knowledge of where all the balls are hit when a player is in the field.
- The approximate approach using available data.

What I'm wondering is: Are there some simpler statistics that capture much of the substance of Humphreys's more elaborate analysis? For example, Bill James has his A*B/C formula for evaluating offensive effectiveness. But there's also on-base percentage and slugging average, both of which give a pretty good sense of what's going on and serve as a bridge between the basic statistics (1B, 2B, 3B, BB, etc) and the ultimate goal of runs scored. Similarly, I think Humphreys would make many a baseball fan happy if he could give a sense of the meaning of some basic fielding statistics--not just fielding average but also #assists, #double plays, etc. One of my continuing struggles as an applied statistician is to move smoothly between data, model, and underlying substance. In this case, I think Humphreys would be providing a richer picture if he connected some of these dots. (One might say, perversely, that Bill James had an advantage of learning in public, as it were: instead of presenting a fully-formed method, he tried out different ideas each year, thus giving us a thicker understanding of batting and pitching statistics, on top of our already-developed intuition about doubles, triples, wins and losses, etc.)

3. Humphreys makes the case that fielding is more important, as a contribution to winning, than we've thought. But perhaps his case could be made even stronger. Are there other aspects of strong (or weak) fielding not captured in the data? For example, suppose you have a team such as the '80s Cardinals with a fast infield, a fast outfield, and a pitching staff that throws a lot of low pitches leading to ground balls. I might be getting some of these details wrong, but bear with me. In this case, the fielders are getting more chances because the manager trusts them enough to get ground-ball pitchers. Conversely, a team with bad fielders perhaps will adjust their pitching accordingly, taking more chances with the BB and HR. Is this captured in Humphreys's model? I don't know. If not, this is not meant as a criticism, just a thought of a way forward. Also, I didn't read every word of the book so maybe he actually covers this selection issue at some point.

4. No big deal, but . . . I'd like to see some scatterplots. Perhaps start with something simple like some graphs of (estimated) offensive ability vs. (estimated) defensive ability, for all players and for various subsets. Then some time series of fielding statistics, both the raw data of putouts, chances, assists, etc. (see point 2 above) and then the derived statistics. It would be great to see individual career trajectories and also league averages by position.

5. Speaking of time series . . . Humphreys talks a lot about different eras of baseball and argues persuasively that players are much better now than in the old days. This motivates some adjustment for the years in which a player was active, just as with statistics for offense and pitching.

The one thing I'm worried about in the comparison of players from different eras is that I assume that fielding as a whole has been more important in some periods (e.g., the dead-ball era) than in others. If you're fielding in an era where fielding matters more, you can actually save more runs and win more games through fielding. I don't see how Humphreys's method of adjustment can get around that. Basically, in comparing fielders in different eras, you have a choice between evaluating what they did or what they could do. This is a separate issue from expansion of the talent pool and general improvement in skills.

Summary

I enjoyed the book. I assume that is clear to most of you already, as I wouldn't usually bother with a close engagement if I didn't think there was something there worth engaging with. Now I'll send it off to Kenny Shirley who might have something more helpful to say about it.

Remember that bizarre episode in Freakonomics 2, where Levitt and Dubner went to the Batcave-like lair of a genius billionaire who told them that "the problem with solar panels is that they're black"? I'm not the only one who wondered at the time: of all the issues to bring up about solar power, why that one?

Well, I think I've found the answer in this article by John Lanchester:

In 2004, Nathan Myhrvold, who had, five years earlier, at the advanced age of forty, retired from his job as Microsoft's chief technology officer, began to contribute to the culinary discussion board egullet.org . . . At the time he grew interested in sous vide, there was no book in English on the subject, and he resolved to write one. . . . broadened it further to include information about the basic physics of heating processes, then to include the physics and chemistry of traditional cooking techniques, and then to include the science and practical application of the highly inventive new techniques that are used in advanced contemporary restaurant food--the sort of cooking that Myhrvold calls "modernist."

OK, fine. But what does this have to do with solar panels? Just wait:

Notwithstanding its title, "Modernist Cuisine" contains hundreds of pages of original, firsthand, surprising information about traditional cooking. Some of the physics is quite basic: it had never occurred to me that the reason many foods go from uncooked to burned at such speed is that light-colored foods reflect heat better than dark: "As browning reactions begin, the darkening surface rapidly soaks up more and more of the heat rays. The increase in temperature accelerates dramatically."

Aha! Now, I'm just guessing here, but my conjecture is that after studying this albedo effect in the kitchen, Myhrvold was primed to see it everywhere. Of course, maybe it went the other way: he was thinking about solar panels first and then applied his ideas to the kitchen. But, given that the experts seem to think the albedo effect is a red herring (so to speak) regarding solar panels, I wouldn't be surprised if Myhrvold just started talking about reflectivity because it was on his mind from the cooking project. My own research ideas often leak from one project to another, so I wouldn't be surprised if this happens to others too.

P.S. More here and here.

I followed the link of commenter "Epanechnikov" to his blog, where I found, among other things, an uncritical discussion of Richard von Mises's book, "Probability, Statistics and Truth."

The bad news is that, based on the evidence of his book, Mises didn't seem to understand basic ideas of statistical significance. See here. Or at the very least, he was grossly overconfident (which can perhaps be seen from the brash title of his book). This is not the fault of "Epanechnikov," but I just thought that people should be careful about taking too seriously the statistical philosophy of someone who didn't think to do a chi-squared test when it was called for. (This is not a Bayesian/non-Bayesian thing; it's just basic statistics.)

Online James?

Eric Tassone writes:

A commenter wrote (by email):

I've noticed that you've quit approving my comments on your blog. I hope I didn't anger you in some way or write something you felt was inappropriate.

My reply:

I have not been unapproving any comments. If you have comments that have not appeared, they have probably been going into the spam filter. I get literally thousands of spam comments a day and so anything that hits the spam filter is gone forever. I think there is a way to register as a commenter; that could help.

I read this story by Adrian Chen on Gawker (yeah, yeah, so sue me):

Why That 'NASA Discovers Alien Life' Story Is Bullshit

Fox News has a super-exciting article today: "Exclusive: NASA Scientist claims Evidence of Alien Life on Meteorite." OMG, aliens exist! Except this NASA scientist has been claiming to have evidence of alien life on meteorites for years.

Chen continues with a quote from the Fox News item:

[NASA scientist Richard B. Hoover] gave FoxNews.com early access to the out-of-this-world research, published late Friday evening in the March edition of the Journal of Cosmology. In it, Hoover describes the latest findings in his study of an extremely rare class of meteorites, called CI1 carbonaceous chondrites -- only nine such meteorites are known to exist on Earth. . . .

The bad news is that Hoover reported this same sort of finding in various low-rent venues for several years. Replication, huh? Chen also helpfully points us to the website of the Journal of Cosmology, which boasts that it "now receives over 500,000 Hits per month, which makes the Journal of Cosmology among the top online science journals."

So where's the statistics?

OK, fine. So far, so boring. Maybe it's even a bit mean to make fun of this crank scientist guy. The guy is apparently an excellent instrumentation engineer who is subject to wishful thinking. Put this together with tabloid journalism and you can get as many silly headlines as you'd like.

The statistics connection is that Hoover has repeatedly made similar unverified claims in this area, which suggests that this current work is not to be trusted. When someone has a track record of mistakes (or, at best, claims not backed up by additional evidence), this is bad news.

In this case, we have prior information. But not the usual form of prior information on the parameter of interest, theta. Rather, we have prior information on the likelihood--that is, on the model that relates the data to the underlying phenomenon of interest. Any data comes with an implicit model of measurement error--the idea that what you're measuring isn't quite what you want to be measuring. Once we see a researcher or research group having a track record of strong but unsupported claims, this gives us some information about that measurement-error model.

In this particular example, I can't see a good way of quantifying this prior information or this measurement-error model (or any reason to do so). More generally, though, I think there's an important point here. People spend so much time arguing about their priors but often there are big problems with the likelihood. (Consider, for example, the recent ESP studies.)

I put it on the sister blog so you loyal readers here wouldn't be distracted by it.

Under the heading, "Why Preschool Shouldn't Be Like School," cognitive psychologist Alison Gopnik describes research showing that four-year-olds learn better if they're encouraged to discover and show to others, rather than if they're taught what to do. This makes sense, but it's not clear to me why this wouldn't apply to older kids and adults. It's a commonplace in teaching at all levels that students learn by doing and by demonstrating what they can do. Even when a student is doing nothing but improvising from a template, we generally believe the student will learn better by explaining what's going on, by having a mental model of the process to go along with the proverbial 10,000 hours of practice. The challenge is in the implementation, how to get students interested, motivated, and focused enough to put the effort into learning.

So why the headline above? Why does Gopnik's research support the idea that preschool should be different from school? I'm not trying to disagree with Gopnik here. I just don't understand the reasoning.

P.S. One more thing, which certainly isn't Gopnik's fault but it's pretty funny/scary, given that it's the 21st century and all. Slate put this item in the category "Doublex: What women really think about news, politics, and culture." What? It wasn't good enough for "Science"? No, that space was taken by "The eco-guide to responsible drinking." But, sure, I guess it makes sense: kids in school . . . that sounds like it belongs on the women's page, along with Six recipes to get your kids to eat their vegetables, etc.

Chess vs. checkers

Mark Palko writes:

Chess derives most of its complexity through differentiated pieces; with checkers the complexity comes from the interaction between pieces. The result is a series of elegant graph problems where the viable paths change with each move of your opponent. To draw an analogy with chess, imagine if moving your knight could allow your opponent's bishop to move like a rook. Add to that the potential for traps and manipulation that come with forced capture and you have one of the most remarkable games of all time. . . .

It's not unusual to hear masters of both chess and checkers (draughts) admit that they prefer the latter. So why does chess get all the respect? Why do you never see a criminal mastermind or a Bond villain playing in a checkers tournament?

Part of the problem is that we learn the game as children so we tend to think of it as a children's game. We focus on how simple the rules are and miss how much complexity and subtlety you can get out of those rules.

As a person who prefers chess to checkers, I have a slightly different story. To me, checkers is much more boring to play than chess. All checkers games look the same, but each chess game is its own story. I expect this is true at the top levels too, but the distinction is definitely there for casual players. I can play chess (at my low level) without having to think too hard most of the time and still enjoy participating, making plans, attacking and defending. I feel involved at any level of effort. In contrast, when I play a casual game of checkers, it just seems to me that the pieces are moving by themselves and the whole game seems pretty random.

I'm not saying this is true of everyone--I'm sure Palko is right that checkers can have a lot going for it if you come at it with the right attitude--but I doubt my experiences are unique, either. My argument in favor of chess is not a naive "Chess has more possibilities" (if that were the attraction, we'd all be playing 12x12x12 three-dimensional chess by now) but that the moderate complexity of chess allows for a huge variety of interesting positions that are intricately related to each other.

Overall, I think Palko's argument about elegant simplicity applies much better to Go than to checkers.

But what happens next?

I wonder what will happen when (if?) chess is fully solved, so that we know (for example) that with optimal play the game will end in a draw. Or, if they ever make that rules change so that a stalemate is a loss, maybe they'll prove that White can force a win. In a way this shouldn't change the feel of a casual game of chess, but I wonder.

This is pretty amazing.

Jonathan Livengood writes:

I have a couple of questions on your paper with Cosma Shalizi on "Philosophy and the practice of Bayesian statistics."

First, you distinguish between inductive approaches and hypothetico-deductive approaches to inference and locate statistical practice (at least, the practice of model building and checking) on the hypothetico-deductive side. Do you think that there are any interesting elements of statistical practice that are properly inductive? For example, suppose someone is playing around with a system that more or less resembles a toy model, like drawing balls from an urn or some such, and where the person has some well-defined priors. The person makes a number of draws from the urn and applies Bayes theorem to get a posterior. On your view, is that person making an induction? If so, how much space is there in statistical practice for genuine inductions like this?

Second, I agree with you that one ought to distinguish induction from other kinds of risky inference, but I'm not sure that I see a clear payoff from making the distinction. I'm worried because a lot of smart philosophers just don't distinguish "inductive" inferences from "risky" inferences. One reason (I think) is that they have in mind Hume's problem of induction. (Set aside whether Hume ever actually raised such a problem.) Famously, Popper claimed that falsificationism solves Hume's problem. In a compelling (I think) rejoinder, Wes Salmon points out that if you want to do anything with a scientific theory (or a statistical model), then you need to believe that it is going to make good predictions. But if that is right, then a model that survives attempts at falsification and then gets used to make predictions is still going to be open to a Humean attack. In that respect, then, hypothetico-deductivism isn't anti-inductivist after all. Rather, it's a variety of induction and suffers all the same difficulties as simple enumerative induction. So, I guess what I'd like to know is in what ways you think the philosophers are misled here. What is the value / motivation for distinguishing induction from hypothetico-deductive inference? Do you think there is any value to the distinction vis-a-vis Hume's problem? And what is your take on the dispute between Popper and Salmon?

I replied:

My short answer is that inductive inference of the balls-in-urns variety takes place within a model, and the deductive Popperian reasoning takes place when evaluating a model. Beyond this, I'm not so familiar with the philosophy literature. I think of "Popper" more as a totem than as an actual person or body of work. Finally, I recognize that my philosophy, like Popper's, does not say much about where models come from. Crudely speaking, I think of models as a language, with models created in the same way that we create sentences, by working with recursive structures. But I don't really have anything formal to say on the topic.

Livengood then wrote:

The part of Salmon's writing that I had in mind is his Foundations of Scientific Inference. See especially Section 3 on deductivism, starting on page 21.

Let me just press a little bit so that I am sure I'm understanding the proposal. When you say that inductive inference takes place within a model, are you claiming that an inductive inference is justified just to the extent that the model within which the induction takes place is justified (or approximately correct or some such -- I know you won't say "true" here ...)? If so, then under what conditions do you think a model is justified? That is, under what conditions do you think one is justified in making *predictions* on the basis of a model?

My reply:

No model will perform well for every kind of prediction. For any particular kind of prediction, we can use posterior predictive checks and related ideas such as cross-validation to see if the model performs well on the dimensions of interest. There will (almost) always be some assumptions required, some sense in which any prediction is conditional on something. Stepping back a bit, I'd say that scientists get experience with certain models; they work well for prediction until they don't. For an example from my own research, consider opinion polling. Those survey estimates you see in the newspapers are conditional on all sorts of assumptions. Different assumptions get checked at different times, often after some embarrassing failure.
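To make this concrete, here's a minimal sketch in R of the sort of posterior predictive check I have in mind. The data, model, and test statistic are all invented for illustration; the point is just the pattern of simulating replicated data from the fitted model and comparing a summary of interest:

# Posterior predictive check, minimal sketch (fake data, toy conjugate model).
set.seed(123)
y <- rpois(100, lambda = 3)            # pretend these are the observed counts

# Toy model: y ~ Poisson(lambda), lambda ~ Gamma(1, 0.001); posterior is conjugate.
n_sims <- 1000
lambda_draws <- rgamma(n_sims, shape = sum(y) + 1, rate = length(y) + 0.001)

# Simulate replicated datasets and compare a test statistic (here, the variance).
T_rep <- sapply(lambda_draws, function(lam) var(rpois(length(y), lam)))
T_obs <- var(y)
mean(T_rep >= T_obs)                   # posterior predictive p-value for overdispersion

If that tail probability is extreme, the model is missing something on this particular dimension (here, overdispersion), which is exactly the conditional, dimension-specific kind of check I mean.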

Hey, here's a book I'm not planning to read any time soon!

As Bill James wrote, the alternative to good statistics is not "no statistics," it's bad statistics.

(I wouldn't have bothered to bring this one up, but I noticed it on one of our sister blogs.)

Uh-oh

I don't know for sure, but I've long assumed that we get most of our hits from the link on the Marginal Revolution page. The bad news is that in their new design, they seem to have removed the blogroll!

Coauthorship norms

I followed this link from Chris Blattman to an article by economist Roland Fryer, who writes:

I [Fryer] find no evidence that teacher incentives increase student performance, attendance, or graduation, nor do I find any evidence that the incentives change student or teacher behavior.

What struck me were not the findings (which, as Fryer notes in his article, are plausible enough) but the use of the word "I" rather than "we." A field experiment is a big deal, and I was surprised to read that Fryer did it all by himself!

Here's the note of acknowledgments (on the first page of the article):

This project would not have been possible without the leadership and support of Joel Klein. I am also grateful to Jennifer Bell-Ellwanger, Joanna Cannon, and Dominique West for their cooperation in collecting the data necessary for this project, and to my colleagues Edward Glaeser, Richard Holden, and Lawrence Katz for helpful comments and discussions. Vilsa E. Curto, Meghan L. Howard, Won Hee Park, Jörg Spenkuch, David Toniatti, Rucha Vankudre, and Martha Woerner provided excellent research assistance.

Joel Klein was the schools chancellor so I assume he wasn't deeply involved in the study; his role was presumably to give it his OK. I'm surprised that none of the other people ended up as coauthors on the paper. But I guess it makes sense: My colleagues and I will write a paper based on survey data without involving the data collectors as coauthors, so why not do this with experimental data too? I guess I just find field experiments so intimidating that I can't imagine writing an applied paper on the topic without a lot of serious collaboration. (And, yes, I feel bad that it was only my name on the cover of Red State, Blue State, given that the book had five authors.) Perhaps the implicit rules about coauthorship are different in economics than in political science.

P.S. I was confused by one other thing in Fryer's article. On page 1, it says:

Despite these reforms to increase achievement, Figure 1 demonstrates that test scores have been largely constant over the past thirty years.

Here's Figure 1:

testtrends.png

Once you get around the confusingly-labeled lines and the mass of white space on the top and bottom of each graph, you see that math scores have improved a lot! Since 1978, fourth-grade math scores have gone up so much that they're halfway to where eighth grade scores were in 1978. Eighth grade scores also have increased substantially, and twelfth-grade scores have gone up too (although not by as much). Nothing much has happened with reading scores, though. Perhaps Fryer just forgot to add the word "reading" in the sentence above. Or maybe something else is going on in Figure 1 that I missed. I only wish that he'd presented the rest of his results graphically. Even a sloppy graph is a lot easier for me to follow than a table full of numbers presented to three decimal places. I know Fryer can do better; his previous papers had excellent graphs (see here and here).

Secret weapon with rare events

Gregory Eady writes:

I'm working on a paper examining the effect of superpower alliance on a binary DV (war). I hypothesize that the size of the effect is much higher during the Cold War than it is afterwards. I'm going to run a Chow test to check whether this effect differs significantly between 1960-1989 and 1990-2007 (Scott Long also has a method using predicted probabilities), but I'd also like to show the trend graphically, and thought that your "Secret Weapon" would be useful here. I wonder if there is anything I should be concerned about when doing this with a (rare-events) logistic regression. I was thinking to graph the coefficients in 5-year periods, moving a single year at a time (1960-64, 1961-65, 1962-66, and so on), reporting the coefficient in the graph for the middle year of each 5-year range.

My reply:

I don't know nuthin bout no Chow test but, sure, I'd think the secret weapon would work. If you're analyzing 5-year periods, it might be cleaner just to keep the periods disjoint. Set the boundaries of these periods in a reasonable way (if necessary using periods of unequal lengths so that your intervals don't straddle important potential change points). I suppose in this case you could do 1960-64, 65-69, ..., and this would break at 1989/90 so it would be fine. If you're really running into rare events, though, you might want 10-year periods rather than 5-year.
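Here's a rough sketch in R of what I have in mind, on simulated stand-in data; the variable names (war, alliance, year) and the cutpoints are placeholders for whatever is in the real dataset, and with very rare events you might swap in a penalized-likelihood fit for plain glm:

# "Secret weapon" sketch: fit the same logistic regression in disjoint periods
# and plot the estimates side by side. All data here are simulated placeholders.
set.seed(42)
dat <- data.frame(year = sample(1960:2007, 2000, replace = TRUE),
                  alliance = rbinom(2000, 1, 0.3))
dat$war <- rbinom(2000, 1, plogis(-3 + 0.8 * dat$alliance))

breaks <- c(1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2008)  # disjoint periods
dat$period <- cut(dat$year, breaks, right = FALSE)
fits <- lapply(split(dat, dat$period), function(d)
  glm(war ~ alliance, family = binomial, data = d))
est <- sapply(fits, function(f) coef(summary(f))["alliance", 1:2])       # estimate and s.e.

mids <- head(breaks, -1) + diff(breaks) / 2
plot(mids, est[1, ], pch = 16, xlab = "period midpoint", ylab = "coefficient on alliance",
     ylim = range(est[1, ] - 2 * est[2, ], est[1, ] + 2 * est[2, ]))
segments(mids, est[1, ] - 2 * est[2, ], mids, est[1, ] + 2 * est[2, ])   # +/- 2 s.e. bars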

Single or multiple imputation?

Vishnu Ganglani writes:

It appears that multiple imputation appears to be the best way to impute missing data because of the more accurate quantification of variance. However, when imputing missing data for income values in national household surveys, would you recommend it would be practical to maintain the multiple datasets associated with multiple imputations, or a single imputation method would suffice. I have worked on household survey projects (in Scotland) and in the past gone with suggesting single methods for ease of implementation, but with the availability of open source R software I am think of performing multiple imputation methodologies, but a bit apprehensive because of the complexity and also the need to maintain multiple datasets (ease of implementation).

My reply: In many applications I've just used a single random imputation to avoid the awkwardness of working with multiple datasets. But if there's any concern, I'd recommend doing parallel analyses on multiple imputed datasets and then combining inferences at the end.
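For what it's worth, here's a minimal sketch of that parallel-analyses workflow using the mice package in R. The data and variable names are invented, and any multiple-imputation package would do; the key steps are to impute several times, run the same analysis on each completed dataset, and pool:

# Fake survey data with some missing income values (for illustration only).
set.seed(1)
survey_data <- data.frame(age = rnorm(200, 45, 12),
                          region = factor(sample(1:4, 200, replace = TRUE)))
survey_data$log_income <- 10 + 0.01 * survey_data$age + rnorm(200, 0, 0.5)
survey_data$log_income[sample(200, 40)] <- NA

library(mice)
imp <- mice(survey_data, m = 5, printFlag = FALSE)   # 5 imputed datasets
fits <- with(imp, lm(log_income ~ age + region))     # same analysis on each
summary(pool(fits))                                  # combine by Rubin's rules

# A single random imputation, by contrast, just takes one completed dataset:
one_dataset <- complete(imp, 1)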

Rajiv Sethi has some very interesting things to say:

As the election season draws closer, considerable attention will be paid to prices in prediction markets such as Intrade. Contracts for potential presidential nominees are already being scrutinized for early signs of candidate strength. . . .

This interpretation of prices as probabilities is common and will be repeated frequently over the coming months. But what could the "perceived likelihood according to the market" possibly mean?

Prediction market prices contain valuable information about this distribution of beliefs, but there is no basis for the common presumption that the price at last trade represents the beliefs of a hypothetical average trader in any meaningful sense [emphasis added]. In fact, to make full use of market data to make inferences about the distribution of beliefs, one needs to look beyond the price at last trade and examine the entire order book.

Sethi looks at some of the transaction data and continues:

What, then, can one say about the distribution of beliefs in the market? To begin with, there is considerable disagreement about the outcome. Second, this disagreement itself is public information: it persists despite the fact that it is commonly known to exist. . . . the fact of disagreement is not itself considered to be informative, and does not lead to further belief revision. The most likely explanation for this is that traders harbor doubts about the rationality or objectivity of other market participants. . . .

More generally, it is entirely possible that beliefs are distributed in a manner that is highly skewed around the price at last trade. That is, it could be the case that most traders (or the most confident traders) all fall on one side of the order book. In this case the arrival of seemingly minor pieces of information can cause a large swing in the market price.

Sethi's conclusion:

There is no meaningful sense in which one can interpret the price at last trade as an average or representative belief among the trading population.

This relates to a few points that have come up here on occasion:

1. We're often in the difficult position of trying to make inferences about marginal (in the economic sense) quantities from aggregate information.

2. Markets are impressive mechanisms for information aggregation but they're not magic. The information has to come from somewhere, and markets are inherently always living in the phase transition between stability and instability. (It is the stability that makes prices informative and the instability that allows the market to be liquid.)

3. If the stakes in a prediction market are too low, participants have the incentive and ability to manipulate it; if the stakes are too high, you have to worry about point-shaving.

This is not to say that prediction markets are useless, just that they are worth studying seriously in their own right, not to be treated as oracles. By actually looking at and analyzing some data, Sethi goes far beyond my sketchy thoughts in this area.

It's no fun being graded on a curve

Mark Palko points to a news article by Michael Winerip on teacher assessment:

No one at the Lab Middle School for Collaborative Studies works harder than Stacey Isaacson, a seventh-grade English and social studies teacher. She is out the door of her Queens home by 6:15 a.m., takes the E train into Manhattan and is standing out front when the school doors are unlocked, at 7. Nights, she leaves her classroom at 5:30. . . .

Her principal, Megan Adams, has given her terrific reviews during the two and a half years Ms. Isaacson has been a teacher. . . . The Lab School has selective admissions, and Ms. Isaacson's students have excelled. Her first year teaching, 65 of 66 scored proficient on the state language arts test, meaning they got 3's or 4's; only one scored below grade level with a 2. More than two dozen students from her first two years teaching have gone on to . . . the city's most competitive high schools. . . .

You would think the Department of Education would want to replicate Ms. Isaacson . . . Instead, the department's accountability experts have developed a complex formula to calculate how much academic progress a teacher's students make in a year -- the teacher's value-added score -- and that formula indicates that Ms. Isaacson is one of the city's worst teachers.

According to the formula, Ms. Isaacson ranks in the 7th percentile among her teaching peers -- meaning 93 per cent are better. . . .

How could this happen to Ms. Isaacson? . . . Everyone who teaches math or English has received a teacher data report. On the surface the report seems straightforward. Ms. Isaacson's students had a prior proficiency score of 3.57. Her students were predicted to get a 3.69 -- based on the scores of comparable students around the city. Her students actually scored 3.63. So Ms. Isaacson's value added is 3.63-3.69.

Remember, the exam is on a 1-4 scale, and we were already told that 65 out of 66 students scored 3 or 4, so an average of 3.63 (or, for that matter, 3.69) is plausible. The 3.57 is "the average prior year proficiency rating of the students who contribute to a teacher's value added score." I assume that the "proficiency rating" is the same as the 1-4 test score but I can't be sure.

The predicted score is, according to Winerip, "based on 32 variables -- including whether a student was retained in grade before pretest year and whether a student is new to city in pretest or post-test year. . . . Ms. Isaacson's best guess about what the department is trying to tell her is: Even though 65 of her 66 students scored proficient on the state test, more of her 3s should have been 4s."

This makes sense to me. Winerip seems to be presenting this as some mysterious process, but it seems pretty clear to me. A "3" is a passing grade, but if you're teaching in a school with "selective admissions" with the particular mix of kids that this teacher has, the expectation is that most of your students will get "4"s.

We can work through the math (at least approximately). We don't know how this teacher's students did this year, so I'll use the data given above, from her first year. Suppose that x students in the class got 4's, 65-x got 3's, and one student got a 2. To get an average of 3.63, you need 4x + 3(65-x) + 2 = 3.63*66. That is, x = 3.63*66 - 2 - 3*65 = 42.58. This looks like x=43. Let's try it out: (4*43 + 3*22 + 2)/66 = 3.63 (or, to three decimal places, 3.636). This is close enough for me. To get 3.69 (more precisely, 3.697), you'd need 47 4's, 18 3's, and a 2. So the gap would be covered by four students (in a class of 66) moving up from a 3 to a 4. This gives a sense of the difference between a teacher in the 7th percentile and a teacher in the 50th.
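For anyone who wants to check the arithmetic, it's two lines of R:

# Class averages implied by the counts worked out above.
mean(c(rep(4, 43), rep(3, 22), 2))   # 3.636..., matching the reported 3.63
mean(c(rep(4, 47), rep(3, 18), 2))   # 3.697..., matching the predicted 3.69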

I wonder what this teacher's value-added scores were for the previous two years.

John Sides points to this discussion (with over 200 comments!) by political scientist Charli Carpenter of her response to a student from another university who emailed with questions that look like they come from a homework assignment. Here's the student's original email:

Hi Mr. Carpenter,

I am a fourth year college student and I have the honor of reading one of your books and I just had a few questions... I am very fascinated by your work and I am just trying to understand everything. Can you please address some of my questions? I would greatly appreciate it. It certainly help me understand your wonderful article better. Thank you very much! :)

1. What is the fundamental purpose of your article?

2. What is your fundamental thesis?

3. What evidence do you use to support your thesis?

4. What is the overall conclusion?

5. Do you feel that you have a fair balance of opposing viewpoints?

Sincerely,

After a series of emails in which Carpenter explained why she thought these questions were a form of cheating on a homework assignment and the student kept dodging the issues, Carpenter used the email address to track down the student's name and then contacted the student's university.

I have a few thoughts on this.

- Carpenter and her commenters present this bit of attempted cheating as a serious violation on the student's part. I see where she's coming from--after all, asking someone else to do your homework for you really is against the rules--but, from the student's perspective, sending an email to an article's author is just a slightly enterprising step beyond scouring the web for something written on the article. And you can't stop students from searching the web. All you can hope for is that students digest any summaries they read and ultimately spit out some conclusions in their own words.

- To me, what would be most annoying about receiving the email above is how insulting it is:

Will Wilkinson adds to the discussion of Jonathan Haidt's remarks regarding the overwhelming prevalence of liberal or left-wing attitudes among psychology professors. I pretty much agree with Wilkinson's overview:

Folks who constantly agree with one another grow insular, self-congratulatory, and not a little lazy. The very possibility of disagreement starts to seem weird or crazy. When you're trying to do science about human beings, this attitude's not so great.

Wilkinson also reviewed the work of John Jost in this area. Jost is a psychology researcher with the expected liberal/left political leanings, but his relevance here is that he has actually done research on political attitudes and personality types. In Wilkinson's words:

Jost has done plenty of great work that helps explain not only why the best minds in science are liberal, but why most scientists--most academics, even--are liberal. Individuals with the personality trait that most strongly predicts an inclination toward liberal politics also predict an attraction to academic careers. That's why, as Haidt well notes, it's silly to expect the distribution of political opinion in academia to mirror the distribution of opinion in society at large. . . . one of the most interesting parts of Jost's work shows how personality, which is largely hereditary, predicts political affinity. Of the "Big Five" personality traits, "openness to experience" and "conscientiousness" stand out for their effects on political inclination. . . . the content of conservatism and liberalism changes over time. We live in a liberal and liberalizing culture, so today's conservatives, for example, are very liberal compared to conservatives of their grandparents' generation. But there is a good chance they inherited some of their tendency toward conservatism from grandparents.

University professors and military officers

The cleanest analogy, I think, is between college professors (who are disproportionately liberal Democrats) and military officers (mostly conservative Republicans; see this research by Jason Dempsey). In both cases there seems to be a strong connection between the environment and the ideology. Universities have (with some notable exceptions) been centers of political radicalism for centuries, just as the military has long been a conservative institution in most places (again, with some exceptions).

And this is true even though many university professors are well-paid, live well, and send their kids to private schools, and even though the U.S. military has been described as one of the few remaining bastions of socialism in the 21st century.

Assumptions vs. conditions, part 2

In response to the discussion of his remarks on assumptions vs. conditions, Jeff Witmer writes:

If [certain conditions hold] , then the t-test p-value gives a remarkably good approximation to "the real thing" -- namely the randomization reference p-value. . . .

I [Witmer] make assumptions about conditions that I cannot check, e.g., that the data arose from a random sample. Of course, just as there is no such thing as a normal population, there is no such thing as a random sample.

I disagree strongly with both the above paragraphs! I say this not to pick a fight with Jeff Witmer but to illustrate how, in statistics, even the most basic points that people take for granted can't be.

Let's take the claims in order:

1. The purpose of a t test is to approximate the randomization p-value. Not to me. In my world, the purpose of t tests and intervals is to summarize uncertainty in estimates and comparisons. I don't care about a p-value and almost certainly don't care about a randomization distribution. I'm not saying this isn't important, I just don't think it's particularly fundamental. One might as well say that the randomization p-value is a way of approximating the ultimate goal which is the confidence interval.

2. There is no such thing as a random sample. Hey--I just drew a random sample the other day! Well, actually it was a few months ago, but still. It was a sample of records to examine for a court case. I drew random numbers in R and everything.
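For what it's worth, the sampling step itself is just a few lines of R (with stand-in data here, since I can't share the actual records):

# Draw a simple random sample of 100 records to examine.
records <- data.frame(record_id = 1:5000)              # stand-in for the real file
set.seed(20110301)                                     # so the draw is reproducible
sampled <- records[sample(nrow(records), 100), , drop = FALSE]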

Assumptions vs. conditions

Jeff Witmer writes:

I noticed that you continue the standard practice in statistics of referring to assumptions; e.g. a blog entry on 2/4/11 at 10:54: "Our method, just like any model, relies on assumptions which we have the duty to state and to check."

I'm in the 6th year of a three-year campaign to get statisticians to drop the word "assumptions" and replace it with "conditions." The problem, as I see it, is that people tend to think that an assumption is something that one assumes, as in "assuming that we have a right triangle..." or "assuming that k is even..." when constructing a mathematical proof.

But in statistics we don't assume things -- unless we have to. Instead, we know that, for example, the validity of a t-test depends on normality, which is a condition that can and should be checked. Let's not call normality an assumption, lest we imply that it is something that can be assumed. Let's call it a condition.

What do you all think?

Responding to a proposal to move the journal Political Analysis from double-blind to single-blind reviewing (that is, authors would not know who is reviewing their papers but reviewers would know the authors' names), Tom Palfrey writes:

I agree with the editors' recommendation. I have served on quite a few editorial boards of journals with different blinding policies, and have seen no evidence that double blind procedures are a useful way to improve the quality of articles published in a journal. Aside from the obvious administrative nuisance and the fact that authorship anonymity is a thing of the past in our discipline, the theoretical and empirical arguments in both directions lead to an ambiguous conclusion. Also keep in mind that the editors know the identity of the authors (they need to know for practical reasons), their identity is not hidden from authors, and ultimately it is they who make the accept/reject decision, and also lobby their friends and colleagues to submit "their best work" to the journal. Bias at the editorial level is far more likely to affect publication decisions than bias at the referee level, and double blind procedures don't affect this. One could argue then that perhaps the main thing double blinding does is shift the power over journal content even further from referees and associate editors to editors. It certainly increases the informational asymmetry.

Another point of fact is that the use of double blind procedures in economics and political science shares essentially none of the justifications for it with the other science disciplines from which the idea was borrowed. In these other disciplines, like biology, such procedures exist for different (and good) reasons. Rather than a concern about biasing in favor of well-known versus lesser-known authors, in these other fields it is driven by a concern of bias because of the rat-race competition over a rapidly moving frontier of discovery. Because of the speed at which the frontier is moving, authors of new papers are intensely secretive (almost paranoid) about their work. Results are kept under wrap until the result has been accepted for publication - or in some cases until it is actually published. [Extra, Extra, Read All About It: PNAS article reports that Caltech astronomer Joe Shmoe discovered a new planet three months ago...] Double blind is indeed not a fiction in these disciplines. It is real, and it serves a real purpose. Consider the contrast with our discipline, in which many researchers drool over invitations from top places to present their newest results, even if the paper does not yet exist or is in very rough draft form. Furthermore, financial incentives for bias in these other disciplines are very strong, given the enormous stakes of funding. [Think how much a new telescope costs.] Basically none of the rationales for double blinding in those disciplines applies to political science. One final note. In those disciplines, editors are often "professional" editors. That is, they do not have independent research careers. This may have to do with the potential bias that results from intense competition in disciplines where financial stakes are enormous and the frontier of discovery moves at 'blinding' speed.

Tom's comparison of the different fields was a new point to me and it seems sensible.

I'd also add that I'm baffled by many people's attitudes toward reviewing articles for journals. As I've noted before, I don't think people make enough of the fact that editing and reviewing journal articles is volunteer work. Everyone's always getting angry at referees and saying what they should or should not do, but, hey--we're doing it for free. In this situation, I think it's important to get the most you can out of all participants.

Mark Palko asks what I think of this news article by John Tierney. The article's webpage is given the strange incomplete title above.

My first comment is that the headline appears false. I didn't see any evidence presented of liberal bias. (If the headline says "Social psychologists detect," I expect to see some detection, not just anecdotes.) What I did see was a discussion of the fact that most academic psychologists consider themselves politically liberal (a pattern that holds for academic researchers in general), along with some anecdotes of moderates over the years who have felt their political views disrespected by the liberal majority.

  1. Have data graphics progressed in the last century?
    The first addresses familiar subjects to readers of the blog, with some nice examples of where infographics emphasize the obvious, or increase the probability of an incorrect insight.
  2. Your Help Needed: the Effect of Aesthetics on Visualization
    I borrow the term ‘insight’ from the second link, a study by a group of design & software researchers based around a single interactive graphic. This is similar in spirit to Unwin’s ‘caption this graphic’ assignment.

Timothy Noah reports:

At the end of 2007, Harvard announced that it would limit tuition to no more than 10 percent of family income for families earning up to $180,000. (It also eliminated all loans, following a trail blazed by Princeton, and stopped including home equity in its calculations of family wealth.) Yale saw and raised to $200,000, and other wealthy colleges weighed in with variations.

Noah argues that this is a bad thing because it encourages other colleges to give tuition breaks to families with six-figure incomes, thus sucking up money that could otherwise go to reduce tuition for lower-income students. For example:

Roger Lehecka, a former dean of students at Columbia, and Andrew Delbanco, director of American studies there, wrote in the New York Times that Harvard's initiative was "good news for students at Harvard or Yale" but "bad news" for everyone else. "The problem," they explained, "is that most colleges will feel compelled to follow Harvard and Yale's lead in price-discounting. Yet few have enough money to give more aid to relatively wealthy students without taking it away from relatively poor ones."

I don't follow the reasoning here. Noah also writes that Harvard received "35,000 applications for fewer than 1,700 slots," so I don't see why these other schools have to match Harvard at all. Why not just compete for the 33,300 kids who get rejected from Harvard (not to mention those who don't apply to the big H at all)? Sure, there's Yale too, but still, there's something about this story that's bothering me.

Ultimately, this doesn't seem like it's about income at all. I mean, suppose Harvard, Yale, etc., took the big step of zeroing out their tuitions entirely, so that even Henry Henhouse III could send little Henry IV to Harvard without paying a cent (ok, maybe something for room and board, but really that could be free too, if Harvard wanted to do it that way). Now maybe this wouldn't be a good move for the university--I'm sure the money would be more effectively spent as a salary increase for the statistics and political science faculty--but let's not worry about the details. The point is, if Harvard and Yale became free, Noah's argument would continue to hold. But is it really right to criticize a rich institution for giving things out for free? I'm planning to publish my intro statistics book for free. Does this mean I'm a bad guy because I'm depriving Cambridge University Press of the free money that they can use to subsidize worthy but unprofitable books on classical studies? I don't think so.

To put it another way, it seems pretty weird to me to say that Harvard has an obligation to keep its tuition high, just to give other colleges a break. If Harvard and Yale want to cut tuition costs, or if MIT wants to stream lectures online for free, that's good, no?


But I think I'm missing something. At the end of his essay, Noah says he wants college costs to decrease ("surely the answer is to curb the inflation of this commodity's price"), which seems to contradict his earlier complaints about Harvard and Yale's tuition-cutting. I'd be interested in hearing from him (and from Lehecka and Delbanco at Columbia) what their ideal Harvard and Yale tuition plans would be. These institutions already charge very little for kids from low-income families, so if you want to cut the cost of tuition, but not to offer discounts for the upper middle class, then what exactly are they recommending? It's hard for me to imagine they want Harvard to cut tuition for rich kids, but that seems like the only option left. I'm confused.

The new R environment RStudio looks really great, especially for users new to R. In teaching, these are often people new to programming anything, much less statistical models. The R GUIs were different on each platform, with (sometimes modal) windows appearing and disappearing and no unified design. RStudio fixes that and has already found a happy home on my desktop.

Initial impressions

I've been using it for the past couple of days. For me, it replaces the niche that R.app held: looking at help, quickly doing something I don't want to pollute a project workspace with; sometimes data munging, merging, and transforming; and prototyping plots. RStudio is better than R.app at all of these things. For actual development and papers, though, I remain wedded to emacs+ess (good old C-x M-c M-Butterfly).

Favorite features in no particular order

  • plots seamlessly made in new graphics devices. This is huge— instead of one active plot window named something like quartz(1) the RStudio plot window holds a whole stack of them, and you can click through to previous ones that would be overwritten and ‘lost’ in R.app.
  • help viewer. Honestly I use this more than anything else in R.app and the RStudio one is prettier (mostly by being not set in Times), and you can easily get contextual help from the source doc or console pane (hit tab for completions, then F1 on what you want).
  • workspace viewer with types and dimensions of objects. Another reason I sometimes used R.app instead of emacs. This one doesn’t seem much different from the R.app one, but its integration into the environment is better than the floaty thing that R.app does.
  • ‘Import Dataset’ menu item and button in the workspace pane. For new R users, the answer to “How do I get data into this thing?” has always been “Use one of the textbook package’s included datasets until you learn to read.csv()”. This is a much better answer.
  • obviously, the cross-platform nature of RStudio took the greatest engineering effort. The coolest platform is actually that it will run on a server and you access it using a modern browser (i.e., no IE). (“While RStudio is compatible with Internet Explorer, other browsers provide an improved user experience through faster JavaScript performance and more consistent handling of browser events.” more).

It would be nice if…

  • indents worked like emacs. I think my code looks nice largely because of emacs+ess. The default indent of two spaces is nice (see the Google style guide), but the way emacs lines up newlines by default is pretty helpful in avoiding silly typing errors (omitted commas, unclosed parentheses).
  • you could edit data.frames, which I’ll guess they are working on. It must be hard, since the R.app one and the X one that comes up in emacs are so abysmal (the R.app one is the least bad). RStudio currently says “ Editing of matrix and data.frame objects is not currently supported in RStudio.” :-(

Overall, really great stuff!

Dikran Karagueuzian writes:

What Zombies see in Scatterplots

This video caught my interest - news video clip
(from this
post2)

http://www.stat.columbia.edu/~cook/movabletype/archives/2011/02/on_summarizing.html

The news commentator did seem to be trying to point out what a couple of states had to say about the claimed relationship - almost on their own.

Some methods have been worked out for zombies to do just this!

So I grabbed the data as close as I quickly could, modified the code slightly, and here's the zombie view of it.

PoliticInt.pdf

North Carolina is the bolded red curve, Idaho the bolded green curve.
Mississippi and New York are the bolded blue.

As ugly as it is, this is the Bayesian marginal picture - exactly (given MCMC error).

K?
p.s. you will get a very confusing picture if you forget to centre the x (i.e. see chapter 4 of Gelman and Hill book)

This was just bizarre. It's an interview with Colin Camerer, a professor of finance and economics at Caltech. The topic is Camerer's marriage, but what's weird is that he doesn't say anything specific about his wife at all. All we get are witticisms of the sub-Henny-Youngman level. For example, in response to the question, "Any free riding in your household?", Camerer says:

No. Here's why: I am one of the world's leading experts on psychology, the brain and strategic game theory. But my wife is a woman. So it's a tie.

Also some schoolyard evolutionary biology ("men signaling that they will help you raise your baby after conception, and women signaling fidelity" blah blah blah) and advice for husbands in "upper-class marriages with assets." (No advice to the wives, but maybe that's a good thing.) And here are his insights on love and marriage:

Marriage is like hot slow-burning embers compared to the flashy flames of love. After the babies, the married brain has better things to do--micromanage, focus on those babies, create comfort zones. Marriage love can then burrow deeper, to the marrow.

To the marrow, eh? And what about couples who have no kids? Then maybe you're burrowing through the skin and just to the surface of the bone, I guess.

It seems like a wasted opportunity, really: this dude could've shared insights from his research and discussed its applicability (or the limitations of its applicability) to love, a topic that everybody cares about. (In contrast, another interview in this Economists in Love series, by Daniel Hamermesh, was much more to the point.)

Yeah, sure, I'm a killjoy, the interview is just supposed to be fluff, etc. Still, what kind of message are you sending when you define yourself as "one of the world's leading experts on psychology" and define your wife as "a woman"? Yes, I realize it's supposed to be self-deprecating, but to me it comes off as self-deprecating along the lines of, "Yeah, my cat's much smarter than I am. She gets me to do whatever she wants. Whenever she cries out, I give her food."

I'm not talking about political correctness here. I'm more worried about the hidden assumptions that can sap one's research, as well as the ways in which subtle and interesting ideas in psychology can become entangled with various not-so-subtle, not-so-well-thought-out ideas on sex roles etc.

I'm being completely unfair to Camerer

I have no idea how this interview was conducted but it could well have been done over the phone in ten minutes. Basically, Camerer is a nice guy and when these reporters called him up to ask him some questions, he said, Sure, why not. And then he said whatever came to his mind. If I were interviewed without preparation and allowed to ramble, I'd say all sorts of foolish things too. So basically I'm slamming Camerer for being nice enough to answer a phone call and then having the misfortune to see his casual thoughts spread all over the web (thanks to a link from Tyler Cowen, who really should've known better). So I take it all back.

P.S. Camerer's webpage mentions that he received his Ph.D. in 1981 at the age of 22. Wouldn't it be more usual to simply give your birth year (1958 or 1959, in this case)? Perhaps it's some principle of behavioral economics, that if people have to do the subtraction they'll value the answer a bit more.

Heat map

Jarad Niemi sends along this plot:

heat.png

and writes:

2010-2011 Miami Heat offensive (red), defensive (blue), and combined (black) player contribution means (dots) and 95% credible intervals (lines) where zero indicates an average NBA player. Larger positive numbers for offensive and combined are better while larger negative numbers for defense are better.

In retrospect, I [Niemi] should have plotted -1*defensive_contribution so that larger was always better. The main point with this figure is that this awesome combination of James-Wade-Bosh that was discussed immediately after the LeBron trade to the Heat has a one-of-these-things-is-not-like-the-other aspect. At least according to my analysis, Bosh is hurting his team compared to the average player (although not statistically significant) due to his terrible defensive contribution (which is statistically significant).

All fine so far. But the punchline comes at the end, when he writes:

Anyway, a reviewer said he hated the figure and demanded to see a table with the actual numbers instead.

Oof!

John Cook links to a blog by Ben Deaton arguing that people often waste time trying to set up ideal working conditions, even though (a) your working conditions will never be ideal, and (b) the sorts of constraints and distractions that one tries to avoid, can often stimulate new ideas.

Deaton seems like my kind of guy--for one thing, he works on nonlinear finite element analysis, which is one of my longstanding interests--and in many ways his points are reasonable and commonsensical (I have little doubt, for example, that Feynman made a good choice in staying clear of the Institute for Advanced Study!), but I have a couple of points of disagreement.

1. In my experience, working conditions can make a difference. And once you accept this, it could very well make sense to put some effort into improving your work environment. I like to say that I spent twenty years reconstructing what it felt like to be in grad school. My ideal working environment has lots of people coming in and out, lots of opportunities for discussion, planned and otherwise. It's nothing like I imagine the Institute for Advanced Study (not that I've ever been there) but it makes me happy. So I think Deaton is wrong to generalize from "don't spend time trying to keep a very clean work environment" to "don't spend time trying to get a setup that works for you."

2. Also consider effects on others. I like to feel that the efforts I put into my work environment have positive spillovers on others--the people I work with, the other people they work with, etc.--as well as setting an example for others in the department. In contrast, people who want super-clean work conditions (the sort of thing that Deaton, rightly, is suspicious of) can impose negative externalities on others. For example, one of the faculty in my department once removed my course listings from the department webpage. I never got a straight answer on why this happened, but I assumed it was because he didn't like what I taught, and it offended his sensibilities to see these courses listed. Removing the listing had the advantage from his perspective of cleanliness (I assume) but negatively impacted potential students and others who might have been interested in our course offerings. That is an extreme case, but I think many of us have experienced work environments in which intellectual interactions are discouraged in some way. This is clear from Deaton's stories as well.

3. Deaton concludes by asking his readers, "How ideal is ideal enough for you to do something great?" I agree with his point that there are diminishing returns to optimization and that you shouldn't let difficulties with your workplace stop you from doing good work (unless, of course, you're working somewhere where your employer gets possession of everything you do). But I am wary of his implicit statement that "you" (whoever you are) can "do something great." I think we should all try to do our best, and I'm sure that almost all of us are capable of doing good work. But is everyone out there really situated in a place where he or she can "do something great"? I doubt it. Doing something "great" is a fine aspiration, but I wonder if some of this go-for-it advice can backfire for the people out there who really aren't in a position to achieve greatness.

About 12 years ago Greg Wawro, Sy Spilerman, and I started an M.A. program here in Quantitative Methods in Social Sciences, jointly between the departments of history, economics, political science, sociology, psychology, and statistics. We created a bunch of new features for the program, including an interdisciplinary course based on this book.

Geen Tomko asks:

Can you recommend a good introductory book for statistical computation? Mostly, something that would help make it easier in collecting and analyzing data from student test scores.

I don't know. Usually, when people ask for a starter statistics book, my recommendation (beyond my own books) is The Statistical Sleuth. But that's not really a computation book. ARM isn't really a statistical computation book either. But the statistical computation books that I've seen don't seem so relevant for the analyses that Tomko is looking for. For example, the R book of Venables and Ripley focuses on nonparametric statistics, which is fine but seems a bit esoteric for these purposes.

Does anyone have any suggestions?

Seeing Sarah Palin's recent witticism:

It's no wonder Michelle Obama is telling everybody you need to breast feed your babies ... the price of milk is so high!

I was reminded of Dan Quayle's quip during the 1988 campaign:

The governor of Massachusetts, he lost his top naval adviser last week. His rubber ducky drowned in the bathtub.

And this got me wondering: how often do legitimate political figures--not talk show hosts, but actual politicians--communicate via schoolyard-style taunts?

5 seconds of every #1 pop single

This is pretty amazing. Now I want to hear volume 3. Also is there a way to download this as I play it so I can listen when I'm offline?

P.S. Typo in title fixed.

P.P.S. I originally gave a different link but was led to the apparently more definitive link above (which allows direct download) from a commenter. Thanks!

John Sides discusses how his scatterplot of unionization rates and budget deficits made it onto cable TV news:

It's also interesting to see how he [journalist Chris Hayes] chooses to explain a scatterplot -- especially given the evidence that people don't always understand scatterplots. He compares pairs of cases that don't illustrate the basic hypothesis of Brooks, Scott Walker, et al. Obviously, such comparisons could be misleading, but given that there was no systematic relationship depicted that graph, these particular comparisons are not.

This idea--summarizing a bivariate pattern by comparing pairs of points--reminds me of a well-known statistical identity, which I refer to in a paper with David Park:

weightedavg.png

John Sides is certainly correct that if you can pick your pair of points, you can make extremely misleading comparisons. But if you pick every pair of points, and average over them appropriately, you end up with the least-squares regression slope.

Pretty cool, and it helps develop our intuition about the big-picture relevance of special-case comparisons.
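If I'm remembering the identity correctly, the least-squares slope is the weighted average of the pairwise slopes (y_i - y_j)/(x_i - x_j), with weights proportional to (x_i - x_j)^2. Here's a quick numerical check in R, on fake data:

# OLS slope as a weighted average of pairwise slopes, with weights (x_i - x_j)^2.
set.seed(1)
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
ij <- combn(20, 2)                      # all pairs of points
i <- ij[1, ]; j <- ij[2, ]
slopes  <- (y[i] - y[j]) / (x[i] - x[j])
weights <- (x[i] - x[j])^2
sum(weights * slopes) / sum(weights)    # weighted average of pairwise slopes ...
coef(lm(y ~ x))["x"]                    # ... equals the least-squares slope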

A statistical version of Arrow's paradox

Amy Cohen points me to this blog by Jim Manzi, who writes:

Harold Pollack writes:

Antony Unwin writes:

I [Unwin] find it an interesting exercise for students to ask them to write headlines (and subheadlines) for graphics, both for ones they have drawn themselves and for published ones. The results are sometimes depressing, often thought-provoking and occasionally highly entertaining.

This seems like a great idea, both for teaching students how to read a graph and also for teaching how to make a graph. I've long said that when making a graph (or, for that matter, a table), you want to think about what message the reader will get out of it. "Displaying a bunch of numbers" doesn't cut it.

Statisticians vs. everybody else

Statisticians are literalists.

When someone says that the U.K. boundary commission's delay in redistricting gave the Tories an advantage equivalent to 10 percent of the vote, we're the kind of person who looks it up and claims that the effect is less than 0.7 percent.

When someone says, "Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm," we're like, Hey, really? And we go look that one up too.

And when someone says that engineers have more sons and nurses have more daughters . . . well, let's not go there.

So, when I was pointed to this blog by Michael O'Hare making the following claim, in the context of K-12 education in the United States:

My [O'Hare's] favorite examples of this junk [educational content with no workplace value] are spelling and pencil-and-paper algorithm arithmetic. These are absolutely critical for a clerk in an office of fifty years ago, but being good at them is unrelated to any real mental ability (what, for example, would a spelling bee in Chinese be?) and worthless in the world we live in now. I say this, by the way, aware that I am the best speller that I ever met (and a pretty good typist). But these are idiot-savant abilities, genetic oddities like being able to roll your tongue. Let's just lose them.

My first reaction was: Are you sure? I also have no systematic data on this, but I strongly doubt that being able to spell and add are "unrelated to any real world abilities" and are "genetic oddities like being able to roll your tongue." For one thing, people can learn to spell and add but I think it's pretty rare for anyone to learn how to roll their tongue! Beyond this, I expect that one way to learn spelling is to do a lot of reading and writing, and one way to learn how to add is to do a lot of adding (by playing Monopoly or whatever). I'd guess that these are indeed related to "real mental ability," however that is defined.

My guess is that, to O'Hare, my reactions would miss the point. He's arguing that schools should spend less time teaching kids spelling and arithmetic, and his statements about genetics, rolling your tongue, and the rest are just rhetorical claims. I'm guessing that O'Hare's view on the relation between skills and mental ability, say, is similar to Tukey's attitude about statistical models: they're fine as an inspiration for statistical methods (for Tukey) or as an inspiration for policy proposals (for O'Hare), but should not be taken literally. The things I write are full of qualifications, which might be a real hindrance if you're trying to propose policy changes.

Weather visualization with WeatherSpark

WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful:

weatherspark.png

weatherspark2.png

Via Jure Cuhalev.

"The best living writer of thrillers"

On the back of my yellowing pocket book of "The Mask of Dimitrios" is the following blurb:

'Eric Ambler is the best living writer of thrillers.' -- News Chronicle

What I'm wondering is, why the qualifier "living"? Did the News Chronicle think there was a better writer of thrillers than Ambler who was no longer alive? I can't imagine who that could be, considering that Ambler pretty much defined the modern thriller.

Val has reported success with the following trick:

Get to the classroom a few minutes earlier and turn on soft music. Then set everything up and, the moment it's time for class to begin, put a clicker question on the screen and turn off the music. The students quiet down and get to work right away.

I've never liked the usual struggle with students to get them to settle down in class, as it seemed to set up a dynamic in which I was trying to get the students to focus and they were trying to goof off. Turning off the music seems like a great non-confrontational way to send the signal that class is starting.

Steve Hsu has posted a series of reflections here, here, and here on the dominance of graduates of HYPS (Harvard, Yale, Princeton, and Stanford (in that order, I believe)) in various Master-of-the-Universe-type jobs at "elite law firms, consultancies, and I-banks, hedge/venture funds, startups, and technology companies." Hsu writes:

In the real world, people believe in folk notions of brainpower or IQ. ("Quick on the uptake", "Picks things up really fast", "A sponge" ...) They count on elite educational institutions to do their g-filtering for them. . . .

Most top firms only recruit at a few schools. A kid from a non-elite UG school has very little chance of finding a job at one of these places unless they first go to grad school at, e.g., HBS, HLS, or get a PhD from a top place. (By top place I don't mean "gee US News says Ohio State's Aero E program is top 5!" -- I mean, e.g., a math PhD from Berkeley or a PhD in computer science from MIT -- the traditional top dogs in academia.) . . .

I teach at U Oregon and out of curiosity I once surveyed the students at our Honors College, which has SAT-HSGPA characteristics similar to Cornell or Berkeley. Very few of the kids knew what a venture capitalist or derivatives trader was. Very few had the kinds of life and career aspirations that are *typical* of HYPS or techer kids. . . .

I have just a few comments.

1. Getting into a top college is not the same as graduating from said college--and I assume you have to have somewhat reasonable grades (or some countervailing advantage). So, yes, the people doing the corporate hiring are using the educational institutions to do their "g-filtering," but it's not all happening at the admissions stage. Hsu quotes researcher Lauren Rivera as writing, "it was not the content of an elite education that employers valued but rather the perceived rigor of these institutions' admissions processes"--but I don't know if I believe that!

2. As Hsu points out (but maybe doesn't emphasize enough), the selection processes at these top firms don't seem to make a lot of sense even on their own terms. Here's another quote from Rivera: "[T]his halo effect of school prestige, combined with the prevalent belief that the daily work performed within professional service firms was "not rocket science" gave evaluators confidence that the possession of an elite credential was a sufficient signal of a candidate's ability to perform the analytical capacities of the job." The reasoning seems to be: The job isn't so hard, so the recruiters can hire whoever they want as long as they pass a moderately stringent IQ threshold; thus they can pick the HYPS graduates who they like. It seems like a case of the lexicographic fallacy: the idea that you first screen on IQ (based on the school) and then pick on clubbability, etc., among the subset of applicants who remain.

3. I should emphasize that academic hiring is far from optimal. We never know who's going to apply for our postdoc positions. And, when it comes to faculty hiring, I think Don Rubin put it best when he said that academic hiring committees all too often act as if they're giving out an award rather than trying to hire someone to do a job. And don't get me started on tenure review committees.

4. Regarding Hsu's last point above, I've long been glad that I went to MIT rather than Harvard, maybe not overall--I was miserable in most of college--but for my future career. Either place I would've taken hard classes and learned a lot, but one advantage of MIT was that we had no sense--no sense at all--that we could make big bucks. We had no sense of making moderately big bucks as lawyers, no sense of making big bucks working on Wall Street, and no sense of making really big bucks by starting a business. I mean, sure, we knew about lawyers (but we didn't know that a lawyer with technical skills would be a killer combination), we knew about Wall Street (but we had no idea what they did, other than shout pork belly prices across a big room), and we knew about tech startups (but we had no idea that they were anything to us beyond a source of jobs for engineers). What we were all looking for was a good solid job with cool benefits (like those companies in California that had gyms at the office). I majored in physics, which my friends who were studying engineering thought was a real head-in-the-clouds kind of thing to do, not really practical at all. We really had no sense that a physics degree from MIT with good grades was a hot ticket.

And it wasn't just us, the students, who felt this way. It was the employers too. My senior year I applied to some grad schools (in physics and in statistics) and to some jobs. I got into all the grad schools and got zero job interviews. Not just zero jobs. Zero interviews. And these were not at McKinsey, Goldman Sachs, etc. (none of which I'd heard of). They were places like TRW, etc. The kind of places that were interviewing MIT physics grads (which is how I thought of applying for these jobs in the first place). And after all, what could a company like that do with a kid with perfect physics grades from MIT? Probably not enough of a conformist, eh?

This was fine for me--grad school suited me just fine. I'm just glad that big-buck$ jobs weren't on my radar screen. I think I would've been tempted by the glamour of it all. If I'd gone to college 10 or 20 years later, I might have felt that as a top MIT grad, I had the opportunity--even the obligation, in a way--to become some sort of big-money big shot. As it was, I merely thought I had the opportunity and obligation to make important contributions in science, which is a goal that I suspect works better for me (and many others like me).

P.S. Hsu says that "much of (good) social science seems like little more than documenting what is obvious to any moderately perceptive person with the relevant life experience." I think he might be making a basic error here. If you come up with a new theory, you'll want to do two things: (a) demonstrate that it predicts things you already know, and (b) use it to make new predictions. To develop, understand, and validate a theory, you have to do a lot of (a)--hence Hsu's impression--in order to be ready to do (b).

A simpler response to Hsu is that it's common for "moderately perceptive persons with the relevant life experience" to disagree with each other. In my own field of voting and elections, even someone as renowned as Michael Barone (who is more than moderately perceptive and has much more life experience than I do) can still get things embarrassingly wrong. (My reflections on "thinking like a scientist" may be relevant here.)

P.P.S. Various typos fixed.

Annals of really really stupid spam

This came in the inbox today:

Chris Masse points me to this response by Daryl Bem and two statisticians (Jessica Utts and Wesley Johnson) to criticisms by Wagenmakers et al. of Bem's recent ESP study. I have nothing to add but would like to repeat a couple of bits of my discussions from last month, here:

Classical statistical methods that work reasonably well when studying moderate or large effects (see the work of Fisher, Snedecor, Cochran, etc.) fall apart in the presence of small effects.

I think it's naive when people implicitly assume that the study's claims are correct, or the study's statistical methods are weak. Generally, the smaller the effects you're studying, the better the statistics you need. ESP is a field of small effects and so ESP researchers use high-quality statistics.

To put it another way: whatever methodological errors happen to be in the paper in question probably occur in lots of research papers in "legitimate" psychology research. The difference is that when you're studying a large, robust phenomenon, little statistical errors won't be as damaging as in a study of a fragile, possibly zero effect.

In some ways, there's an analogy to the difficulties of using surveys to estimate small proportions, in which case misclassification errors can loom large.
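Here's a toy illustration of that survey point in R (the prevalence and error rates are made-up numbers, purely for illustration):

p_true <- 0.001   # true prevalence of a rare trait
fp <- 0.01        # false-positive rate: 1% of non-cases get recorded as cases
fn <- 0.10        # false-negative rate: 10% of true cases get missed
p_obs <- p_true * (1 - fn) + (1 - p_true) * fp
p_obs             # about 0.011, roughly ten times the true rate of 0.001

Almost all of the apparent signal is classification error, which is structurally the same problem as trying to detect a tiny ESP effect through measurement errors that are larger than the effect itself.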

And here:

[One thing that Bem et al. and Wagenmakers et al. both miss] is that Bayes is not just about estimating the weight of evidence in favor of a hypothesis. The other key part of Bayesian inference--the more important part, I'd argue--is "shrinkage" or "partial pooling," in which estimates get pooled toward zero (or, more generally, toward their estimates based on external information).

Shrinkage is key, because if all you use is a statistical significance filter--or even a Bayes factor filter--when all is said and done, you'll still be left with overestimates. Whatever filter you use--whatever rule you use to decide whether something is worth publishing--I still want to see some modeling and shrinkage (or, at least, some retrospective power analysis) to handle the overestimation problem. This is something Martin and I addressed in our discussion of the "voodoo correlations" paper of Vul et al.
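Here's a minimal simulation of that overestimation problem in R (the effect size, standard error, and prior scale are all made-up numbers for illustration, not anything from the papers under discussion):

set.seed(123)
true_effect <- 0.02                  # a small true effect
se <- 0.10                           # standard error in each hypothetical replication
est <- rnorm(1e5, true_effect, se)   # 100,000 hypothetical study estimates
signif <- abs(est / se) > 1.96       # the statistical significance filter
mean(abs(est[signif])) / true_effect # surviving estimates exaggerate the effect roughly tenfold

# Partial pooling toward zero, here via a normal(0, tau) prior with tau = 0.05
# (an arbitrary but modest choice), pulls the surviving estimates back down:
tau <- 0.05
shrunk <- est * tau^2 / (tau^2 + se^2)
mean(abs(shrunk[signif])) / true_effect

The filtered-then-shrunk estimates still aren't perfect, but they're far less inflated than the raw filtered estimates, which is the point of doing the modeling rather than relying on the filter alone.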

Finally, my argument for why a top psychology journal should never have published Bem's article:

I mean, how hard would it be for the experimenters to gather more data, do some sifting, find out which subjects are good at ESP, etc.? There's no rush, right? No need to publish preliminary, barely-statistically-significant findings. I don't see what's wrong with the journal asking for better evidence. It's not like a study of the democratic or capitalistic peace, where you have a fixed amount of data and you have to learn what you can. In experimental psychology, once you have the experiment set up, it's practically free to gather more data.

I made this argument in response to a generally very sensible paper by Tal Yarkoni on this topic.

P.S. Wagenmakers et al. respond (to Bem et al., that is, not to me). As Tal Yarkoni would say, I agree with Wagenmakers et al. on the substantive stuff. But I still think that both they and Bem et al. err in setting up their models so starkly: either there's ESP or there's not. Given the long history of ESP experiments (as noted by some of the commenters below), it seems more reasonable to me to suppose that these studies have some level of measurement error of magnitude larger than that of any ESP effects themselves.

As I've already discussed, I'm not thrilled with the discrete models used in these discussions and I am for some reason particularly annoyed by the labels "Strong," "Substantial," "Anecdotal" in figure 4 of Wagenmakers et al. Whether or not a study can be labeled "anecdotal" seems to me to be on an entirely different dimension than what they're calculating here. Just for example, suppose you conduct a perfect randomized experiment on a large random sample of people. There's nothing anecdotal at all about this (hypothetical) study. As I've described it, it's the opposite of anecdotal. Nonetheless, it might very well be that the effect under study is tiny, in which case a statistical analysis (Bayesian or otherwise) is likely to report no effect. It could fall into the "anecdotal" category used by Wagenmakers et al. But that would be an inappropriate and misleading label.
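To put rough numbers on that hypothetical (this is my own back-of-the-envelope calculation with a simple normal prior, not the default Bayesian t-test that Wagenmakers et al. actually use): suppose the huge, clean experiment yields an estimate of 0.01 with standard error 0.01.

est <- 0.01                      # estimated effect from the hypothetical experiment
se  <- 0.01                      # its standard error (tiny, because the study is huge)
tau <- 0.03                      # prior sd on the effect under H1; an arbitrary modest choice
bf10 <- dnorm(est, 0, sqrt(se^2 + tau^2)) / dnorm(est, 0, se)
bf10                             # about 0.5, i.e., a Bayes factor of about 2 in favor of the null

On a Jeffreys-style scale of the sort used in that figure, a Bayes factor between 1/3 and 3 lands in the "anecdotal" band. The label is tracking the size of the effect relative to the prior, not the quality of the evidence-gathering, which is exactly why it seems misleading here.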

That said, I think people have to use what statistical methods they're comfortable with, so it's sort of silly for me to fault Wagenmakers et al. for not using the sorts of analysis I would prefer. The key point that they and other critics have made is that the Bem et al. analyses aren't quite as clean as a casual observer might think, and it's possible to make that point coming from various statistical directions. As I note above, my take on this is that if you study very small effects, then no amount of statistical sophistication will save you. If it's really true, as commenter Dean Radin writes below, that these studies "took something like 6 or 7 years to complete," then I suppose it's no surprise that something turned up.

What are the trickiest models to fit?

John Salvatier writes:

What do you and your readers think are the trickiest models to fit? If I had an algorithm that I claimed could fit many models with little fuss, what kinds of models would really impress you? I am interested in testing different MCMC sampling methods to evaluate their performance and I want to stretch the bounds of their abilities.

I don't know what the trickiest models are, but just about anything I work on in a serious way gives me some trouble. This reminds me that we should finish our Bayesian Benchmarks paper already.
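In the meantime, one family of examples that reliably gives generic samplers trouble is the hierarchical-variance geometry of Neal's funnel: v ~ N(0, 3^2), x_k | v ~ N(0, exp(v/2)^2). Here's a small sketch in R showing how a plain random-walk Metropolis sampler fails on it; this is my own toy example, not anything from our benchmarks project:

# Log posterior of Neal's funnel in 10 dimensions: theta = (v, x_1, ..., x_9)
log_post <- function(theta) {
  v <- theta[1]
  x <- theta[-1]
  dnorm(v, 0, 3, log = TRUE) + sum(dnorm(x, 0, exp(v / 2), log = TRUE))
}

# Plain random-walk Metropolis with a fixed step size
set.seed(1)
n_iter <- 20000
theta <- rep(0, 10)
lp <- log_post(theta)
draws <- matrix(NA_real_, n_iter, 10)
for (i in 1:n_iter) {
  prop <- theta + rnorm(10, 0, 0.5)
  lp_prop <- log_post(prop)
  if (log(runif(1)) < lp_prop - lp) {
    theta <- prop
    lp <- lp_prop
  }
  draws[i, ] <- theta
}

# The marginal distribution of v is exactly N(0, 3^2), so its 1% quantile
# should be near -7; the sampler typically never gets close to the funnel's
# narrow neck, and that gap is one simple way to see a "tricky" posterior.
quantile(draws[, 1], c(0.01, 0.5, 0.99))
qnorm(c(0.01, 0.5, 0.99), 0, 3)

Any algorithm that claims to fit many models with little fuss should at least handle this sort of varying-curvature geometry, which is essentially what you get in a hierarchical model with a group-level scale parameter and only a few groups.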
