Results matching “R”

Jeff Lax sends along this good catch from Ben Somberg, who noticed this from Washington Post writer Lori Montgomery:

If Congress doesn't provide additional stimulus spending, economists inside and outside the administration warn that the nation risks a prolonged period of high unemployment or, more frightening, a descent back into recession. But a competing threat -- the exploding federal budget deficit -- seems to be resonating more powerfully in Congress and among voters.

Somberg is skeptical, though, at least of the part about "resonating among voters." He finds that in four out of five recent polls, people are much more concerned about jobs than about the deficit:

Paired comparisons

Mark Palko writes:

I've got a stat problem I'd like to run past you. It's one of those annoying problems that feels like it should be obvious but the solution has evaded me and the colleagues I've discussed it with. I'm working on a project where the metric of interest is defined in relation to pairs of data points. It has nothing to do with sports or betting but the following analogy (which I also post on the blog) covers the basic situation:

"You want to build a model predicting the spread for games in a new football league. Because the line-up of teams is still in flux, you decide to use only stats from individual teams as inputs (for example, an indicator variable for when the Ambushers play the Ravagers would not be allowed)."

Is there a standard approach for modeling this kind of data?

My reply: I don't quite understand your question, but are you familiar with the Bradley-Terry and Thurstone-Mosteller models for paired comparisons? These are old--from the 1920s and 1940s, I believe--but they might do what you need. Interesting work has been done on these models recently by Hal Stern, Mark Glickman, and others, to allow the underlying parameters to vary over time.
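For concreteness, here's a minimal sketch of the Bradley-Terry idea in base R, with made-up teams and game outcomes (everything below is hypothetical): each game contributes a row with +1 in the column of the first team and -1 in the column of the second, and a logistic regression with no intercept gives one strength parameter per team.

# Made-up game results: winA = 1 if the first listed team won.
games <- data.frame(teamA = c("Ambushers", "Ambushers", "Ravagers"),
                    teamB = c("Ravagers", "Marauders", "Marauders"),
                    winA  = c(1, 0, 1))
teams <- sort(unique(c(games$teamA, games$teamB)))
X <- matrix(0, nrow(games), length(teams), dimnames = list(NULL, teams))
X[cbind(seq_len(nrow(games)), match(games$teamA, teams))] <-  1
X[cbind(seq_len(nrow(games)), match(games$teamB, teams))] <- -1
# Drop one column so the model is identified (that team's strength is the reference).
fit <- glm(games$winA ~ X[, -1] - 1, family = binomial)
summary(fit)

Swapping in a probit link, family = binomial(link = "probit"), gives the Thurstone-Mosteller version.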

As part of my continuing research project with Grazia and Roberto, I've been reading papers on happiness and life satisfaction research. I'll share with you my thoughts on some of the published work in this area.

Grazia Pittau, Roberto Zelli, and I came out with a paper investigating the role of economic variables in predicting regional disparities in reported life satisfaction of European Union citizens. We use multilevel modeling to explicitly account for the hierarchical nature of our data, respondents within regions and countries, and for understanding patterns of variation within and between regions. Here's what we found:

- Personal income matters more in poor regions than in rich regions, a pattern that still holds for regions within the same country.

- Being unemployed is negatively associated with life satisfaction even after controlling for income variation. Living in high-unemployment regions does not alleviate the unhappiness of being out of work.

- After controlling for individual characteristics and modeling interactions, regional differences in life satisfaction still remain.
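In lme4 notation, the kind of multilevel model described above looks roughly like this (the variable names below are placeholders, not the ones in the paper):

library(lme4)
# satisfaction = reported life satisfaction; income = individual income (scaled);
# unemployed = 0/1; region_income = average income of the respondent's region.
fit <- lmer(satisfaction ~ income * region_income + unemployed +
              (1 + income | region) + (1 | country),
            data = eu_data)
summary(fit)

The income x region_income interaction is what lets the effect of personal income differ between poor and rich regions.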

Here's a quick graph; there's more in the article:

The course outline

ZombieCourseOutline.rtf

Hints/draft R code for implementing this for a regression example from D. Pena
x=c(1:10,17,17,17)
y=c(1:10,25,25,25)
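One way the hinted example might continue (a guess on my part, not necessarily what was intended): fit the regression with and without the three repeated points and plot both lines, so the leverage of those points is visible.

plot(x, y)
abline(lm(y ~ x), lty = 1)                 # fit using all 13 points
abline(lm(y ~ x, subset = 1:10), lty = 2)  # fit using only the first 10 points
legend("topleft", lty = 1:2, legend = c("all points", "without the 3 repeated points"))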

ZombieAssign1.txt

The assignment is to provide a legend that explains all the lines and symbols in this plot.

ZombieAssign1.pdf

With a bonus assignment to provide better R code and/or techniques.

And a possible graduate student assignment: investigate what percentage of examples in graduate stats texts (e.g., Cox & Hinkley) could be displayed this way (reducing the number of parameters to the smallest number possible).

K?
p.s. might have been a better post for Friday the 13th
p.s.2 background material from my thesis (passed in 2007)
ThesisReprint.pdf


SAT stories

I received a bunch of interesting comments on my blog on adjusting SAT scores. Below I have a long comment from a colleague with experience in the field.

But first, this hilarious (from a statistical perspective) story from Howard Wainer:

Some years ago when we were visiting Harvard [as a parent of a potential student, not in Howard's role as educational researcher], an admissions director said two things of relevance (i) the SAT hasn't got enough 'top' for Harvard -- it doesn't discriminate well enough at the high end. To prove this she said (ii) that Harvard had more than 1500 'perfect 1600s' apply. Some were rejected. I mentioned that there were only about 750 1600s from HS seniors in the US -- about 400 had 1600 in their junior year (and obviously didn't retake) and about 350 from their senior year. So, I concluded, she must be mistaken.

Then I found out that they allowed applicants to pick and choose their highest SAT-V score and their highest SAT-M score from separate administrations, and so constructed their 1500. I stopped talking at that point, deciding against discussing the probability of throwing snake eyes if you could throw dice many times and pick out a one from one toss and the other one from another.

My other colleague sent in the following thoughts:

After reading the Rewarding Strivers book, I had some thoughts about how to make the college admissions system more fair to students from varying socioeconomic backgrounds. Instead of boosting up the disadvantaged students, why not pull down the advantaged students?

Here's the idea. Disadvantaged students are defined typically not by a bad thing that they have, but rather by good things that they don't have: financial resources, a high-quality education, and so forth. In contrast, advantaged students get all sorts of freebies. So here are my suggestions:

Several years ago, I heard about a project at the Educational Testing Service to identify "strivers": students from disadvantaged backgrounds who did unexpectedly well on the SAT (the college admissions exam formerly known as the "Scholastic Aptitude Test" but apparently now just "the SAT," in the same way that Exxon is just "Exxon" and that Harry Truman's middle name is just "S"), at least 200 points above a predicted score based on demographic and neighborhood information. My ETS colleague and I agreed that this was a silly idea: From a statistical point of view, if student A is expected ahead of time to do better than student B, and then they get identical test scores, then you'd expect student A (the non-"striver") to do better than student B (the "striver") later on. Just basic statistics: if a student does much better than expected, then probably some of that improvement is noise. The idea of identifying these "strivers" seemed misguided and not the best use of the SAT.

So, when I recently heard about a new book called Rewarding Strivers (by Richard Kahlenberg, Edward Fiske, Anthony Carnevale, and Jeff Strohl), I was interested. My first reaction was: No, not more of that Strivers crap! But then I thought that maybe I was being unfair: Even if there are statistical problems with the original idea, there may be some policy benefits that I was missing. I sent off a request for a review copy of the book, warning the book's editor, Richard Kahlenberg, ahead of time that I was likely to be critical, given my earlier discussions on the topic, but that the review would be thoughtful in any case. To Kahlenberg's credit, he sent me a copy anyway.

RSS mess

Apparently some of our new blog entries are appearing as old entries on the RSS feed, meaning that those of you who read the blog using RSS may be missing a lot of good stuff. We're working on this. But, in the meantime, I recommend you click on the blog itself to see what's been posted in the last few weeks. Enjoy.

Oil spill and corn production

See here.

Thomas Ferguson and Robert Johnson write:

Financial crises are staggeringly costly. Only major wars rival them in the burdens they place on public finances. Taxpayers typically transfer enormous resources to banks, their stockholders, and creditors, while public debt explodes and the economy runs below full employment for years. This paper compares how relatively large, developed countries have handled bailouts over time. It analyzes why some have done better than others at containing costs and protecting taxpayers. The paper argues that political variables - the nature of competition within party systems and voting turnout - help explain why some countries do more than others to limit the moral hazards of bailouts.

I know next to nothing about this topic, so I'll just recommend you click through and read the article yourself. Here's a bit more:

Many recent papers have analyzed financial crises using large data bases filled with cases from all over the world. Our [Ferguson and Johnson's] interest here is different. We deliberately limit our concern to relatively large, developed countries that suffered systemic banking crisis involving the actual collapse (or near-collapse) of big financial houses.

They conclude their paper with some reflections on reactions to financial crises in the 1930s.

Christian points me to this interesting (but sad) analysis by Diego Valle, with an impressive series of graphs. There are a few things I'd change: notably the R default settings, which result in ridiculously over-indexed y-axes; the axes for homicide rates, which should (but do not) go down to zero (and sometimes, bizarrely, go negative); and the lack of coherent ordering of the 32 states (including D.F.).

I'm no expert on Mexico (despite having coauthored a paper on Mexican politics) so I'll leave it to others to evaluate the substantive claims in Valle's blog. Just looking at what he's done, though, it seems impressive to me. To put it another way, it's like something Nate Silver might do.

Seth, a retired university professor who, during his employment at an elite school, spent a lot of time doing research with the goal of improving people's lives, writes:

Professors, especially at elite schools, dislike doing research with obvious value. It strikes them as menial. "Practical" and "applied" are terms of disparagement, whereas "pure" research (research without obvious value) is good.

Given that Seth isn't that way himself, I assume he'd say that this claim applies to "many" professors or "most" professors but surely not all?

What I've noticed, though, is more the opposite, that even people who do extremely theoretical work like to feel that it is applied, practical, and useful. I think that, among other things, Seth is confusing what people want to do with what they actually can do. For example, he criticizes biologists for researching stem cells and prions rather than prevention of disease. But preventing diseases is difficult! That's why the scientists are doing research, because they don't know how to cure diseases.

Why do I bother arguing with this? I guess because I'm impressed at the effort Seth has put into doing research that might end up directly changing people's lives. I agree that most professors don't do this, even those of us in departments such as psychology or political science that might seem to have a lot of practical relevance. But I think he's making an old, old mistake by assuming that, just because people are not doing a certain thing, it's because they don't want to do it. Similar to his earlier claim that people write badly on purpose. Research, like writing, is hard. Especially given that so many of the low-hanging research fruit have been plucked. If you want to knock academic research for being useless, fine, but it seems to be (mistakenly) adding insult to injury to say that we're all being useless on purpose!

If you accept that professors want to be useful and don't always succeed, that's much more interesting (and, I think, true) than stating that they're being useless on purpose.

David Shor writes:

I'm fitting a state-space model right now that estimates the "design effect" of individual pollsters (Ratio of poll variance to that predicted by perfect random sampling). What would be a good prior distribution for that?

My quickest suggestion is to start with something simple, such as a uniform from 1 to 10, and then to move to something hierarchical, such as a lognormal on (design.effect - 1), with the hyperparameters estimated from data.
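A minimal sketch of those two priors in R, just to make them concrete (the lognormal hyperparameters below are arbitrary placeholders; in the hierarchical version they'd be estimated from the data):

n_sims <- 10000
de_flat <- runif(n_sims, 1, 10)                        # uniform prior on [1, 10]
de_hier <- 1 + rlnorm(n_sims, meanlog = 0, sdlog = 1)  # lognormal on (design.effect - 1)
quantile(de_hier, c(0.025, 0.5, 0.975))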

My longer suggestion is to take things apart. What exactly do you mean by "design effect"? There are lots of things going on, both in sampling error (the classical "design effect" that comes from cluster sampling, stratification, weighting, etc.) and nonsampling error (nonresponse bias, likely voter screening, bad questions, etc.). It would be best if you could model both pieces.

Is it 1930?

Lawrence Mishel of the Economic Policy Institute reports:

Goldman Sachs' latest forecast (and they've been pretty accurate so far) is that unemployment will rise to 9.9% by early 2011 and trend down to 9.7% for the last quarter of 2011. Obviously, this is a simply awful scenario but it seems one that is being accepted. That is, we seem to be in the process of accepting the unacceptable. Note that this scenario probably assumes the passage of the limited efforts now being considered in Congress.

One might be surprised that Obama and congressional Democrats are not doing more to try to bring unemployment down. On the other hand, just to speak in generalities (not knowing any of the people involved), I would think that Obama would be much much more worried about the economy doing well in 2010 and then crashing in 2012. A crappy economy through 2011 and then improvement in 2012--that would be his ideal, no? Not that he would have the ability to time this sort of thing.

But perhaps Mishel is saying that the Democrats are reading from the wrong script. Here's what I wrote a few months ago:

"Too much data"?

Chris Hane writes:

I am a scientist needing to model a treatment effect on a population of ~500 people. The dependent variable in the model is the difference in a person's pre-treatment 12 month total medical cost versus post-treatment cost. So there is large variation in costs, but not so much when using the difference between the pre and post treatment costs. The issue I'd like some advice on is that the treatment has already occurred, so there is no possibility of creating a fully randomized control now. I do have a very large population of people to use as possible controls via propensity scoring or exact matching.

If I had a few thousand people to possibly match, then I would use standard techniques. However, I have a potential population of over a hundred thousand people. An exact match of the possible controls to age, gender and region of the country still leaves a population of 10,000 controls. Even if I use propensity scores to weight the 10,000 observations (understanding the problems that poses) I am concerned there are too many controls to see the effect of the treatment.

Would you suggest using narrower matching criteria to get the "best" matches, would weighting the observations be enough, or should I also consider creating many models by sampling from both treatment and control and averaging their results? If you could point me to some papers that tackle similar issues that would be great.

My reply: Others know more about this than me, but my quick reaction is . . . what's wrong with having 10,000 controls? I don't see why this would be a problem at all. In a regression analysis, having more controls shouldn't create any problems. But, sure, match on lots of variables. Don't just control for age, sex, and region; control for as many relevant pre-treatment variables as you can get.
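In R this could be as simple as the following sketch (all variable names hypothetical): estimate a propensity score on the matched pool and then regress the cost difference on treatment plus the pre-treatment variables, keeping all 10,000 controls.

# treat = 0/1 treatment indicator; cost_diff = post minus pre 12-month cost.
ps_fit <- glm(treat ~ age + sex + region + pre_cost,
              family = binomial, data = matched_data)
matched_data$pscore <- fitted(ps_fit)
out_fit <- lm(cost_diff ~ treat + pscore + age + sex + region + pre_cost,
              data = matched_data)
summary(out_fit)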

People keep telling me that Sas isn't as bad as everybody says, but then I see (from Christian Robert) this listing from the Sas website of "disadvantages in using Bayesian analysis":

There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results. . . . From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.

That is so tacky! As if least squares, logistic regressions, Cox models, and all those other likelihoods mentioned in the Sas documentation are so automatically convincing to subject matter experts.

P.S. For some more serious objections to Bayesian statistics, see here and here.

P.P.S. In case you're wondering why I'm commenting on month-old blog entries . . . I have a monthlong backlog of entries, and I'm spooling them out day by day. I actually wrote this one on 7 May.

Warning - this blog post is meant to encourage some loose, fuzzy and possibly distracting thoughts about the practice of statistics in research endeavours. There may be spelling and grammatical errors as well as a lack of proper sentence structure. It may not be understandable to many or even possibly any readers.

But somewhat more seriously, it's better that "ConUnMax"

So far I have six maxims

1. Explicit models of uncertainty are useful but - always wrong and can always be made less wrong
2. If the model is formally a probability model - always use probability calculus (Bayes)
3. Always useful to make the model a formal probability model - no matter what (Bayesianism)
4. Never use a model that is not empirically motivated and strongly empirically testable (Frequentist - of the anti-Bayesian flavour)
5. Quantitative tools are always just a means to grasp and manipulate models - never an end in itself (i.e. don't obsess over "baby" mathematics)
6. If one really understood statistics, they could always successfully explain it to any zombie

K?

Rajiv Sethi offers a fascinating discussion of the incentives involved in paying people zillions of dollars to lie, cheat, and steal. I'd been aware for a long time of the general problem of the system of one-way bets in which purported risk-takers can make huge fortunes with little personal risk, but Rajiv and his commenters go further by getting into the specifics of management at financial firms.

George Leckie points to this free online course from the Centre for Multilevel Modelling (approx 600 pages of materials covering theory and implementation in MLwiN and Stata).

Matthew Yglesias noticed something interesting in a political story today that reminds me of one of our arguments in Red State, Blue State. I have the feeling that most readers of this blog are less fascinated than I am by U.S. politics, so I'll put the rest below the fold.

Hey, where's my kickback?

I keep hearing about textbook publishers who practically bribe instructors to assign their textbooks to students. And then I received this (unsolicited) email:

Sof[t]

Joe Fruehwald writes:

I'm working with linguistic data, specifically binomial hits and misses of a certain variable for certain words (specifically whether or not the "t" sound was pronounced at the end of words like "soft"). Word frequency follows a power law, with most words appearing just once, and with some words being hyperfrequent. I'm not interested in specific word effects, but I am interested in the effect of word frequency.

A logistic model fit is going to be heavily influenced by the effect of the hyperfrequent words which constitute only one type. To control for the item effect, I would fit a multilevel model with a random intercept by word, but like I said, most of the words appear only once.

Is there a principled approach to this problem?

My response: It's ok to fit a multilevel model even if most groups only have one observation each. You'll want to throw in some word-level predictors too. Think of the multilevel model not as a substitute for the usual thoughtful steps of statistical modeling but rather as a way to account for unmodeled error at the group level.
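A minimal sketch of that model using lme4 (variable names are hypothetical; td = 1 if the final "t" was pronounced):

library(lme4)
fit <- glmer(td ~ log(word_freq) + (1 | word),
             family = binomial, data = tokens)
summary(fit)

Here log(word_freq) is the word-level predictor of interest, and the (1 | word) term soaks up the remaining word-to-word variation, even for words observed only once.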

Both R and Stata

A student I'm working with writes:

I was planning on getting an applied stat text as a desk reference, and for that I'm assuming you'd recommend your own book. Also, being an economics student, I was initially planning on doing my analysis in STATA, but I noticed on your blog that you use R, and apparently so does the rest of the statistics profession. Would you rather I do my programming in R this summer, or does it not matter? It doesn't look too hard to learn, so just let me know what's most convenient for you.

My reply: Yes, I recommend my book with Jennifer Hill. Also the book by John Fox, An R and S-plus Companion to Applied Regression, is a good way to get into R. I recommend you use both Stata and R. If you're already familiar with Stata, then stick with it--it's a great system for working with big datasets. You can grab your data in Stata, do some basic manipulations, then save a smaller dataset to read into R (using R's read.dta() function). Once you want to make fun graphs, R is the way to go. It's good to have both systems at your disposal.
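For example, a minimal version of that workflow (file name hypothetical):

library(foreign)
d <- read.dta("mydata.dta")   # the smaller dataset saved from Stata
summary(d)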

This one was so dumb I couldn't resist sharing it with you.

TEMPLETON BOOK FORUM invites you to "Is the Cyber Mob a Threat to Freedom?" featuring Ron Rosenbaum, Slate, Lee Siegel, The New York Observer, moderated by Michael Goodwin, The New York Post

New Threats to Freedom

Today's threats to freedom are "much less visible and obvious than they were in the 20th century and may even appear in the guise of social and political progress," writes Adam Bellow in his introduction to the new essay collection that he has edited for the Templeton Press. Indeed, Bellow suggests, the danger often lies precisely in our "failure or reluctance to notice them."

According to Ron Rosenbaum and Lee Siegel, in their provocative contributions to the volume, the extraordinary advances made possible by the Internet have come at a sometimes worrisome cost. Rosenbaum focuses on how online anonymity has become a mask encouraging political discourse that is increasingly distorted by vitriol, abuse, and thuggishness. Siegel argues that the Internet has undermined long-established standards of excellence, promoting participation and popularity over talent and originality. Both writers warn against the growing influence of what Siegel calls "interactive mobs." . . .

The John Templeton Foundation serves as a philanthropic catalyst for research and discoveries relating to the Big Questions of human purpose and ultimate reality. We support work at the world's top universities in such fields as theoretical physics, cosmology, evolutionary biology, cognitive science, and social science relating to love, forgiveness, creativity, purpose, and the nature and origin of religious belief. We also seek to stimulate new thinking about freedom and free enterprise, character development, and exceptional cognitive talent and genius.

"Extreme views weakly held"

Alan and Felix.

Observational Epidemiology

Sometimes I follow up on the links of commenters and it turns out they have their own blogs. Here's Joseph Delaney's, on which I have a few comments:

Delaney's co-blogger Mark writes:

There are some serious issues that need to be addressed (but almost never are) when comparing performance of teachers. Less serious but more annoying is the reporters' wide-eyed amazement at common classroom techniques. Things like putting agendas on the board or calling on students by name without asking for volunteers (see here) or having students keep a journal and relate lessons to their own life (see any article on Erin Gruwell). Things that many or most teachers already do. Things that you're taught in your first education class. Things that have their own damned boxes on the evaluation forms for student teachers.

These techniques are very common and are generally good ideas. They are not, however, great innovations (with a handful of exceptions -- Polya comes to mind) and they will seldom have that big of an impact on a class (again with exceptions like Polya and possibly Saxon). Their absence or presence won't tell you that much and they are certainly nothing new.

To which I must reply: Yes, but. They never taught me this stuff when I was in grad school. And our students don't learn it either (unless they take my class). Lots and lots of college teachers just stand up at the board and lecture. Maybe things are better in high school. So even if it's "certainly nothing new," it's certainly new to many of us.

And here's another one from Mark, reporting on a lawsuit under which a scientist, if he were found to have manipulated data, could have to return his research money--plus damages--to the state. This seems reasonable to me. I just hope nobody asks me to return all the grant money I've received for projects I've begun with high hopes but never successfully finished. I always end up making progress on related work, and that seems to satisfy the granting agencies, but if they ever were to go back and see if we've followed up on all our specific aims, well, then we'd be in big trouble.

Valencia: Summer of 1991

With the completion of the last edition of Jose Bernardo's Valencia (Spain) conference on Bayesian statistics--I didn't attend, but many of my friends were there--I thought I'd share my strongest memory of the Valencia conference that I attended in 1991. I contributed a poster and a discussion, both on the topic of inference from iterative simulation, but what I remember most vividly, and what bothered me the most, was how little interest there was in checking model fit. Not only had people mostly not checked the fit of their models to data, and not only did they seem uninterested in such checks, even worse was that many of these Bayesians felt that it was basically illegal to check model fit.

I don't want to get too down on Bayesians for this. Lots of non-Bayesian statisticians go around not checking their models too. With Bayes, though, model checking seems particularly important because Bayesians rely on their models so strongly, not just as a way of getting point estimates but to get full probability distributions.

I remember feeling very frustrated and disillusioned at that 1991 conference, to see all these people who seemed to have no interest in going back to first principles and thinking about what they were doing. It's like I tell our students: Grad school is the best and most open time. After that, most people are just stuck in their ways.

P.S. I'm not claiming any special virtue on my part. The above were just my reactions, and I'm sure that others since then have had similar reactions to my own mistakes.

Mister P goes on a date

I recently wrote something on the much-discussed OK Cupid analysis of political attitudes of a huge sample of people in their dating database. My quick comment was that their analysis was interesting, but participants on an online dating site must certainly be far from a random sample of Americans.

But suppose I want to not just criticize but also think in a positive direction. OK Cupid's database is huge, and one thing statistical methods are good at--Bayesian methods in particular--is combining a huge amount of noisy, biased data with a smaller amount of good data. This is what we did in our radon study, using a high-quality survey of 5000 houses in 125 counties to calibrate a set of crappier surveys totaling 80,000 houses in 3000 counties.

How would it work for OK Cupid? We'd want to take their data and poststratify on:

Age
Sex
Marital/family status
Education
Income
Partisanship
Ideology
Political participation
Religion and religious attendance
State
Urban/rural/suburban
Probably some other key variables that I'm not thinking of right now.

We'd do multilevel regression and poststratification (MRP, "Mister P"), with enough cells that it's reasonable to think of the OK Cupid people as being a random sample within each cell. This is not a trivial project--it would involve also including Census data and large public opinion surveys such as Annenberg or Pew--but it could be worth it. The goal would be to get the flexibility and power of the OK Cupid analyses, but with the warm feelings that come from matching their sample to the U.S. population.
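In outline, and with everything below hypothetical, the computation might look something like this:

library(lme4)
# Step 1: multilevel regression on the OK Cupid-style data.
fit <- glmer(y ~ (1 | age_group) + (1 | education) + (1 | state) + female,
             family = binomial, data = okc)
# Step 2: predict the outcome for every poststratification cell.
cells$pred <- predict(fit, newdata = cells, type = "response",
                      allow.new.levels = TRUE)
# Step 3: average the cell predictions, weighting by Census counts N in each cell.
state_est <- with(cells, tapply(pred * N, state, sum) / tapply(N, state, sum))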

Inferences would necessarily be strongly model-based--for example, any claims about married people would be essentially 100% based on regression-based extrapolation--but, hey, that's the way it is. The goal is to be as honest as possible with the data available.

Pay for an A?

Judah Guber writes about his new company:

What we have done with Ultrinsic is created a system of incentives for students to allow them to invest in their ability to achieve a certain grade and when they achieve that grade we reward them with a cash incentive on top of receiving their original investment. This helps remove one of the large barriers students have to studying and staying motivated over the course of long semesters of college by giving them rewards on a much more immediate basis.

We have been doing a pilot program in 2 schools, NYU and Penn, for the past year or so, and are currently in the process of a major roll out of our services to 37 schools all across the country. This is due to our popularity and inquiries from students in tons of schools all around the country regarding getting Ultrinsic's services in their school. In the Fall 2010 semester, Ultrinsic will be revolutionizing student motivation on a grand scale. This is the dream of many economists: to change the cost benefit analysis of people to cause them to improve themselves and to even allow them to put them in control of this change.

Our system is forever sustainable because we earn a profit on motivating students to improve their performance; there will always be a motivation to sustain this program. Another very important aspect to our market, the college student, is that it's one thing that's never going to go out of style.

Guber asked if I wanted to interview him, which I did by email. Here are my questions and his responses:

AG: How will your system make money?

A Wikipedia whitewash

After hearing a few times about the divorce predictions of researchers John Gottman and James Murray (work that was featured in Blink with a claim that they could predict with 83 percent accuracy whether a couple would be divorced--after meeting with them for 15 minutes) and feeling some skepticism, I decided to do the Lord's work and amend Gottman's wikipedia entry, which had a paragraph saying:

Gottman found his methodology predicts with 90% accuracy which newlywed couples will remain married and which will divorce four to six years later. It is also 81% accurate in predicting which marriages will survive after seven to nine years.

I added the following:

Gottman's claim of 81% or 90% accuracy is misleading, however, because the accuracy is measured only after fitting a model to his data. There is no evidence that he can predict the outcome of a marriage with high accuracy in advance. As Laurie Abraham writes, "For the 1998 study, which focused on videotapes of 57 newlywed couples . . . He knew the marital status of his subjects at six years, and he fed that information into a computer along with the communication patterns turned up on the videos. Then he asked the computer, in effect: Create an equation that maximizes the ability of my chosen variables to distinguish among the divorced, happy, and unhappy. . . . What Gottman did wasn't really a prediction of the future but a formula built after the couples' outcomes were already known. . . . The next step, however--one absolutely required by the scientific method--is to apply your equation to a fresh sample to see whether it actually works. That is especially necessary with small data slices (such as 57 couples), because patterns that appear important are more likely to be mere flukes. But Gottman never did that. Each paper he's published heralding so-called predictions is based on a new equation created after the fact by a computer model."

I was thinking this would just get shot down right away, but I checked on it every now and then and it was still up.
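To make the statistical point concrete, here's a little simulation of my own (pure noise, nothing to do with Gottman's actual data): with 57 couples and a handful of noise predictors, an in-sample classification "accuracy" well above chance comes for free.

set.seed(1)
n <- 57
x <- matrix(rnorm(n * 10), n, 10)   # 10 predictors of pure noise
y <- rbinom(n, 1, 0.5)              # outcomes unrelated to the predictors
fit <- glm(y ~ x, family = binomial)
mean((fitted(fit) > 0.5) == y)      # in-sample "accuracy," typically well above 0.5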

Finally, on 21 May, my paragraph was completely removed by contributor Annsy5, who also wrote:

Full disclosure: I [Annsy5] work for The Gottman Relationship Institute, which was co-founded by John Gottman, and we would like a change made to the Wikipedia entry on him.

The 3rd paragraph is made up largely of Laurie Abraham's claims about Dr. Gottman's research. Ms. Abraham's claims are inaccurate, and thorough citations can be found here: http://www.gottman.com/49853/Research-FAQs.html. We would like the paragraph removed, or at least moved to a section where the details of Dr. Gottman's research can be expanded upon.

I know that it would be a violation of the Conflict of Interest policy for me to just go in and make the changes, so I would like other editors' input. We're not trying to bury anything "bad" about Dr. Gottman, we just want the information that is out there to be accurate! Please advise...

I don't know enough about Wikipedia to want to add my paragraph back in, but what's going on here? On 23:57, 20 May 2010, Annsy5 writes "I know that it would be a violation of the Conflict of Interest policy for me to just go in and make the changes," and then on 23:13, 21 May 2010, Annsy5 goes and removes the paragraph and all references to criticisms of Gottman's work.

That doesn't seem right to me. A link to a rebuttal by Gottman would be fine. But removing all criticism while leaving the disputed "90% accuracy" claim . . . that's a bit unscholarly, no?

P.S. A commenter asked why I posted this on the blog rather than doing this on wikipedia. The reason is that I'm more interested in the wikipedia aspect of this than the marriage-counseling aspect, and I thought the blog entry might get some interesting discussion. I know nothing about Gottman and Murray beyond what I've written on the blog, and I'm certainly not trying to make any expert criticism of their work. What does seem to be happening is that they get their claims out in the media and don't have much motivation to tone down the sometimes overstated claims made on their behalf. Whatever the detailed merits of Abraham's criticisms, I thought it was uncool for them to be removed from the wikipedia pages: Her reporting is as legitimate as Gladwell's. But I'm not the one to make any technical additions here.

A New York Times article reports the opening of a half-mile section of bike path, recently built along the west side of Manhattan at a cost of $16M, or roughly $30 million per mile. That's about $5700 per linear foot. Kinda sounds like a lot, doesn't it?

Well, $30 million per mile for about one car-lane mile is a lot, but it's not out of line compared to other urban highway construction costs. The Doyle Drive project in San Francisco --- a freeway to replace the current old and deteriorating freeway approach to the Golden Gate Bridge --- is currently under way at $1 billion for 1.6 miles...but hey, it will have six lanes each way, so that isn't so bad, at $50 million per lane-mile. And there are other components to the project, too, not just building the highway (there will also be bike paths, landscaping, on- and off-ramps, and so on). All in all it seems roughly in line with the New York bike lane project.

Speaking of the Doyle Drive project, one expense was the cost of moving a bush called a "San Francisco Manzanita" out of the way. Maybe I shouldn't say A San Francisco Manzanita, I should say THE San Francisco Manzanita: this species hadn't been reported since 1947 and was presumed extinct, but a single bush was found last year, in the path of the Doyle Drive project. It was moved this January at a reported cost of $175,000. If you look at the photos linked to that article, you'll see that the move involved a heavy crane, a bunch of workers, etc.; and I think it's understandable that when you are working with possibly the very last example of a species, you hire some consultants to make sure that the place you're moving it to will actually support it. But still. (I'm a big fan of preserving endangered species, and if this is what it costs then so be it, but...is this really what it costs? Well, maybe it is).

It's very tempting to be snarky about these high prices, but I'm hardly in a position to criticize, considering that my work time (including overhead and benefits) is charged at about $320,000 per year, which, trust me on this, is far more than my salary.

And of course, anyone who has work done on their house will discover that it costs about 3 times more than you think it should, and that is true even after you have inflated your initial guess by a factor of 3.

And if you're a cyclist, like me, you know that buying a nice (but by no means top-of-the-line) bicycle can easily cost $1800, and that the local bike shop where you buy it is probably struggling to make ends meet (which might also be true of the manufacturer).

If there's a point to all of this (which, I must admit, I'm not sure there is), I guess it's that some things cost much, much more than you think they should, but you're probably kidding yourself if you think you could do them cheaper yourself.

Hank Aaron at the Brookings Institution, who knows a lot more about policy than I do, had some interesting comments on the recent New York Times article about problems with the Dartmouth health care atlas, which I discussed a few hours ago. Aaron writes that much of the criticism in that newspaper article was off-base, but that there are real difficulties in translating the Dartmouth results (finding little relation between spending and quality of care) to cost savings in the real world.

Aaron writes:

Perhaps because of these discussions, I was pointed toward an article on "Rethinking Darfur" by Marc Gustafson, which was written up in a news story here. From the publicity email:

How best to learn R?

Alban Zeber writes:

I am wondering whether there is a reference (online or book) that you would recommend to someone who is interested in learning how to program in R.

Any thoughts?

P.S. If I had a name like that, my books would be named, "Bayesian Statistics from A to Z," "Teaching Statistics from A to Z," "Regression and Multilevel Modeling from A to Z," and so forth.

Reed Abelson and Gardiner Harris report in the New York Times that some serious statistical questions have been raised about the Dartmouth Atlas of Health Care, an influential project that reports huge differences in health care costs and practices in different places in the United States, suggesting large potential cost savings if more efficient practices are used. (A claim that is certainly plausible to me, given this notorious graph; see here for background.)

Here's an example of a claim from the Dartmouth Atlas (just picking something that happens to be featured on their webpage right now):

Medicare beneficiaries who move to some regions receive many more diagnostic tests and new diagnoses than those who move to other regions. This study, published in the New England Journal of Medicine, raises important questions about whether being given more diagnoses is beneficial to patients and may help to explain recent controversies about regional differences in spending.

Abelson and Harris raise several points that suggest the Dartmouth claims may be overstated because of insufficient statistical adjustment. Abelson and Harris's article is interesting, thoughtful, and detailed, but along the way it reveals a serious limitation of the usual practices of journalism, when applied to evaluating scientific claims.

John Lawson writes:

I have been experimenting using Bayesian Methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method of moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML.

Postdoc #1. Hierarchical Modeling and Computation: We are fitting hierarchical regression models with deep interactions. We're working on new models with structured prior distributions, and this also requires advances in Bayesian computation. Applications include public opinion, climate reconstruction, and education research.

Postdoc #1 is funded by grants from the Department of Energy, Institute of Education Sciences, and National Science Foundation.

Postdoc #2. Hierarchical Modeling and Statistical Graphics: The goal of this research program is to investigate the application of the latest methods of multi-level data analysis, visualization and regression modeling to an important commercial problem: forecasting retail sales at the individual item level. These forecasts are used to make ordering, pricing and promotions decisions which can have significant economic impact on the retail chain, such that even modest improvements in the accuracy of predictions, across a large retailer's product line, can yield substantial margin improvements.

Project #2 is to be undertaken with, and largely funded by, a firm which provides forecasting technology and services to large retail chains, and which will provide access to a unique and rich set of proprietary data. The postdoc will be expected to spend some time working directly with this firm, but it is fundamentally a research position.

Ideally, postdoc #1 will have a statistics or computer science background, will be interested in statistical modeling, serious programming, and applications. Ideally, postdoc #2 will have a background in statistics, psychometrics, or economics and be interested in marketing or related topics. Both postdocs should be able to work fluently in R and should already know about hierarchical models and Bayesian inference and computation.

Both projects will be at the Applied Statistics Center at Columbia University, also with connections outside Columbia. Collaborators on Postdoc #1 include Jinchen Liu, Jennifer Hill, Matt Schofield, Upmanu Lall, Chad Scherrer, Alan Edelman, and Sophia Rabe-Hesketh. Collaborators on Postdoc #2 include Eric Johnson.

If you're interested in either, please send a letter of application, a c.v., some of your articles, and three letters of recommendation to the Applied Statistics Center coordinator, Caroline Peters, cp2530@columbia.edu.

A data visualization manifesto

Details matter (at least, they do for me), but we don't yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I've thought about it too, but we have a ways to go.)

I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward.

When thinking about visualization, how important are the details?

Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, "A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure." They make some reasonable points, but a big problem I have with the article is in the details of the actual visualizations they show. Briefly:

Figure 1A looks like it should be on a log scale; it also has an unclear y-axis (I don't think "Gain/Loss Factor" is a standard term) and a time axis that is not fully labeled (you have to reconstruct it from the title of the graph).

Figure 1B has that notorious alphabetical order, also some weird visual artifacts that get created by stacking curves, and an x-axis that is not fully labeled. (What is the point labeled "2001"? Is it Jan 1, July 1, or some other date?) Yes, I realize that one purpose of the article is to criticize such graphs ("While such charts have proven popular in recent years, they do have some notable limitations. . . . stacking may make it difficult to accurately interpret trends that lie atop other curves."). Still, it doesn't help to list the industries in alphabetical order.

Figure 1C (see below) seems just wrong. If you look at the graphs, unemployment seems to have gone up by something like a factor of 10 in almost every sector! Something went terribly wrong here; perhaps each graph was rescaled to its own range, which wouldn't make much sense in a small multiples plot. (For unemployment rates, I'd think you'd want zero as a baseline, or maybe some conventional "natural rate" such as 3%.) On a more minor note, it would help to put the labels on the upper left of each series rather than below the axes. Also, the colors don't seem to add any information, and it's a bit odd to list "Other" as the second or third category of industries--I still can't figure out how that happened!

Heer_fig1c.png
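To illustrate the common-scale point, here's a quick ggplot2 sketch with made-up data: the panels share one y-axis and the baseline is forced down to zero, so the sectors can actually be compared.

library(ggplot2)
d <- data.frame(year   = rep(2000:2010, 3),
                sector = rep(c("Construction", "Finance", "Government"), each = 11),
                rate   = runif(33, 2, 12))   # made-up unemployment rates
ggplot(d, aes(year, rate)) +
  geom_line() +
  facet_wrap(~ sector) +    # panels share the same scale by default
  expand_limits(y = 0)      # keep zero as the baseline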

I could keep going here through all the other graphs in the article. But maybe these criticisms are irrelevant. On one hand, they don't matter because the writers of the article are simply trying to give an example of each sort of graph. On the other hand, I worry that people will see this sort of authoritatively-written article and take the graphs as models for their own work.

Is it important to get the details "right"?

What harm is done, if any, by having ambiguous labels, uninformative orderings of variables, inconsistent scaling of axes, and all the rest? From a psychological or graphical perception perspective, maybe these create no problem at all. Perhaps such glitches (from my perspective) are either irrelevant to the general message of the graph or, from the other direction, force the reader to look at the graph and read the surrounding text more carefully to figure out what's going on. After all, a graph isn't a TV show, readers aren't passive, so maybe it's actually good to make them work to figure out what's going on.

At a statistical level, though, I think the details are very important, because they connect the data being graphed with the underlying questions being studied. For example, if you want to compare unemployment rates for different industries, you want them on the same scale. If you're not interested in an alphabetical ordering, you don't want to put it on a graph. If you want to convey something beyond simply that big cars get worse gas mileage, you'll want to invert the axes on your parallel coordinate plot. And so forth. When I make a graph, I typically need to go back and forth between the form of the plot, its details, and the questions I'm studying.

If you wanted to say I'm wrong, you could perhaps invoke an opportunity cost argument, that the time I spend worrying about where to label the lines on a graph (not to mention the time I spend blogging about it!) is time I could be spending doing statistical modeling and data analysis. For me, the details of the graphing are absolutely necessary to the statistical analysis--decades ago, before I did everything on the computer, I spent lots and lots of time making graphs by hand, using colored pens and all the rest--but for others, maybe not.

Dot plots, line plots, and scatterplots

My biggest complaint about the Heer et al. article is that it doesn't mention what are perhaps the three most important kinds of graphs: dot plots, line plots, and scatterplots. See here for a dotplot (from Jeff and Justin), and here for some line plots and scatterplots. (I just picked these for convenience; there are dozens more in Red State, Blue State and all over the place in the statistical literature.) Perhaps the authors felt that readers would be already familiar with these ideas and didn't need to see them again. But I think, No, the readers do need to see these again! A clearer understanding of line plots would've been a big help in making Figure 1C, for example. And some dot plotting principles would've helped with Figure 4C (coming up with an ordering more sensible than alphabetical, and displaying the "KB" numbers as dots on a scale; as is, you can pretty much only read the size of each number, which really means we're seeing the numbers on a very crude logarithmic scale).
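For what it's worth, all three can be made in a line or two of base R (data made up for illustration):

rates <- c(Mining = 3.2, Finance = 5.1, Retail = 7.5, Construction = 11.1)
dotchart(sort(rates), xlab = "Unemployment rate (%)")              # dot plot
plot(2000:2010, rnorm(11, 5, 1), type = "l",
     xlab = "Year", ylab = "Rate")                                 # line plot
plot(rnorm(100), rnorm(100), xlab = "x", ylab = "y")               # scatterplot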

Do I have anything constructive to say here?

OK, OK, I'm not trying to be a grump. Different people have different perspectives, and that's fine. My point, I think, is that there's something missing in many discussions--even well-informed discussions--of visualization. What's missing is the link between the substantive questions (what are the reasons for making the graph in the first place?) and the details of the graph. It's a weakness of our software, and of our conceptual frameworks for thinking about graphs, that we don't usually have a systematic way of making that link. Instead we go through menus of possibilities (actual forced options on computer packages, or mental menus in which we make choices based on what we've seen before) and then have to go back and fix things.

We should be able to do better. I'm not faulting Heer et al. for not doing better, since I don't have my own general solution either. Rather, I'm using their article as an opportunity to push for further thinking on all of this.

P.S. I wrote this in my standard blog style, which was to start with something I'd seen and go from there. Once it was done, I changed the title, "When thinking about visualization, how important are the details?" to the grabbier "A data visualization manifesto" (snappier than "A statistical graphics manifesto," perhaps?) and appended the very first two paragraphs above as an intro. This should be better, right? Readers should be more interested in my point than in how I got there. I didn't feel like revising the whole piece, but I guess I will if I want to rewrite the article for publication somewhere, which maybe I'll do if I find the right coauthor.

Rodney Sparapani writes:

My Windows buddies have been bugging me about BRUGS and how great it is. Now, running BRUGS on OS X may be possible. Check out this new amazing software by Amit Singh.

Personally, I'd go with R2jags, but I thought I'd pass this on in case others are interested.

Mark Palko writes:

Stupid legal crap

From the website of a journal where I published an article:

In Springer journals you have the choice of publishing with or without open access. If you choose open access, your article will be freely available to everyone everywhere. In exchange for an open access fee of € 2000 / US $3000 you retain the copyright and your article will carry the Creative Commons License. Please make your choice below.

Hmmm . . . pay $3000 so that an article that I wrote and gave to the journal for free can be accessed by others? Sounds like a good deal to me!

Roth and Amsterdam

I used to think that fiction is about making up stories, but in recent years I've decided that fiction is really more of a method of telling true stories. One thing fiction allows you to do is explore what-if scenarios. I recently read two books that made me think about this: The Counterlife by Philip Roth and Things We Didn't See Coming by Steven Amsterdam. Both books are explicitly about contingencies and possibilities: Roth's tells a sequence of related but contradictory stories involving his Philip Roth-like (of course) protagonist, and Amsterdam's is based on an alternative present/future. (I picture Amsterdam's book as being set in Australia, but maybe I'm just imagining this based on my knowledge that the book was written and published in that country.) I found both books fascinating, partly because of the characters' voices but especially because they both seemed to exemplify George Box's dictum that to understand a system you have to perturb it.

So, yes, literature and statistics are fundamentally intertwined (as Dick De Veaux has also said, but for slightly different reasons).

Yesterday we had a spirited discussion of the following conditional probability puzzle:

"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

This reminded me of the principle, familiar from statistics instruction and the cognitive psychology literature, that the best way to teach these sorts of examples is through integers rather than fractions.

For example, consider this classic problem:

"10% of persons have disease X. You are tested for the disease and test positive, and the test has 80% accuracy. What is the probability that you have the disease?"

This can be solved directly using conditional probability but it appears to be clearer to do it using integers:

Start with 100 people. 10 will have the disease and 90 will not. Of the 10 with the disease, 8 will test positive and 2 will test negative. Of the 90 without the disease, 18 will test positive and 72 will test negative (72 = 0.8 x 90). So, out of the original 100 people, 26 have tested positive, and 8 of these actually have the disease. The probability is thus 8/26.
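The same arithmetic in R, just to make the counts explicit:

diseased  <- 0.10 * 100        # 10 people have the disease
healthy   <- 100 - diseased    # 90 do not
true_pos  <- 0.80 * diseased   # 8 of the diseased test positive
false_pos <- 0.20 * healthy    # 18 of the healthy test positive
true_pos / (true_pos + false_pos)   # 8/26, about 0.31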

OK, fine. But here's my new (to me) point. Expressing the problem using a population distribution rather than a probability distribution has an additional advantage: it forces us to be explicit about the data-generating process.

Consider the disease-test example. The key assumption is that everybody (or, equivalently, a random sample of people) is tested. Or, to put it another way, we're assuming that the 10% base rate applies to the population of people who get tested. If, for example, you get tested only if you think it's likely you have the disease, then the above simplified model won't work.

This condition is a bit hidden in the probability model, but it jumps out (at least, to me) in the "population distribution" formulation. The key phrases above: "Of the 10 with the disease . . . Of the 90 without the disease . . . " We're explicitly assuming that all 100 people will get tested.

Similarly, consider the two-boys example that got our discussion started. The crucial unstated assumption was that, every time someone had exactly two children with at least one boy born on a Tuesday, he would give you this information. It's hard to keep this straight, given the artificial nature of the problem and the strange bit of linguistics ("I have two children" = "exactly two," but "One is a boy" = "exactly one"). But if you do it with a population distribution (start with 4 x 49 = 196 families and go from there), then it's clear that you're assuming that everyone in this situation is telling you this particular information. It becomes less of a vague question of "what are we conditioning on?" and more clearly an assumption about where the data came from.
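That population-distribution reasoning can be checked by brute force in R, enumerating all 4 x 49 = 196 equally likely (sex, weekday) combinations for the two children and conditioning on families with at least one Tuesday boy:

kids <- expand.grid(sex1 = c("B", "G"), day1 = 1:7,
                    sex2 = c("B", "G"), day2 = 1:7)   # 196 combinations
tues_boy  <- with(kids, (sex1 == "B" & day1 == 2) | (sex2 == "B" & day2 == 2))  # day 2 = Tuesday
both_boys <- with(kids, sex1 == "B" & sex2 == "B")
sum(both_boys & tues_boy) / sum(tues_boy)   # 13/27, about 0.48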

Douglas Anderton informed us that, in a Linux system, you can't call OpenBugs from R using bugs() from the R2Winbugs package. Instead, you should call Jags using jags() from the R2jags package.
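For anyone making the switch, a minimal sketch of the call as I understand the R2jags interface (the data, parameter, and file names here are made up):

library(R2jags)
fit <- jags(data = list(y = y, n = length(y)),
            parameters.to.save = c("mu", "tau"),
            model.file = "mymodel.txt",
            n.chains = 3, n.iter = 2000)
print(fit)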

P.S. Not the Rotter's Club guy.

Jason Kottke posts this puzzle from Gary Foshee that reportedly impressed people at a puzzle-designers' convention:

I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?

The first thing you think is "What has Tuesday got to do with it?" Well, it has everything to do with it.

I thought I should really figure this one out myself before reading any further, and I decided this was a good time to apply my general principle that it's always best to solve such problems from scratch rather than trying to guess at the answer.

Intellectual property

Somebody should warn Doris Kearns Goodwin not to take any of this guy's material. . . .

In response to the post The bane of many causes in the context of mobile phone use and brain cancer, Robert Erikson wrote:


The true control here is the side of the head of the tumor: same side as phone use or opposite side. If that is the test, the data from the study are scary. Clearly tumors are more likely on the "same" side, at whatever astronomical p value you want to use. That cannot be explained away by misremembering, since an auxiliary study showed misremembering was not biased toward cell phone-tumor consistency.

A strong signal in the data pointed to by Prof. Erikson is that the tumors are overwhelmingly likelier to appear on the same side of the head as where the phone is held. I've converted the ratios into percentages, based on an assumption that the risk for tumors would be a priori equal for both sides of the head.

lateral1.png

There is a group of people with low-to-moderate exposure and high lateral bias, but the bias does increase quite smoothly with increasing exposure. It's never below 50%.

But even with something apparently simple like handedness, there are possible confounding factors. For example, left-handed and ambidextrous people have a lower risk of brain cancer, perhaps because they zap their brain with cell phones more evenly across both sides, reducing the risk that a single DNA strand will be zapped one too many times, but they also earn more. I've written about handling multiple potential causes at the same time a few years ago.

The authors also point out that people might be inclined to blame it all on the phones and to report phone use on the side where the tumor was identified. This could be resolved if the controls were led to think that they have a tumor too, or if, instead of asking how the phone is held, the interviewers made a call and observed the subject, or asked about a value-neutral attribute such as handedness. Still, even in papers that reject the influence of phones on brain tumors, it's always the case that more tumors are on the right side, just as more people are right-handed than left-handed.

In light of this investigation, I fully agree with Prof. Erikson that there is something going on.

I've recently decided that statistics lies at the intersection of measurement, variation, and comparison. (I need to use some cool Venn-diagram-drawing software to show this.) I'll argue this one another time--my claim is that, to be "statistics," you need all three of these elements; no two will suffice.

My point here, though, is that as statisticians, we teach all of these three things and talk about how important they are (and often criticize/mock others for selection bias and other problems that arise from not recognizing the difficulties of good measurement, attention to variation, and focused comparisons), but in our own lives (in deciding how to teach and do research, administration, and service--not to mention our personal lives), we think about these issues almost not at all. In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. We do not evaluate our plans systematically nor do we typically even record what we're doing. We draw all sorts of conclusions based on sample sizes of 1 or 2. And so forth.

We say it, and we believe it, but we don't live it. So maybe we don't believe it. So maybe we shouldn't say it? Now I'm working from a sample size of 0. Go figure.

Looking for Sister Right

Several people have asked me what I thought of this. For example, one colleague wrote:

I thought you might find data of this quantity and quality as interesting as I do.

In their latest blog entry, okcupid uses its very large dating-site userbase to assess the progression of social and economic views with age, and (taking into account self-specified importance of economic and social views) the resulting political affiliations. The final assessment is that the spectrum of views encompassed under "democrat" ends up being much broader than those under "republican". I've seen this mentioned many times before, but I have never seen such pretty data illustrating the phenomenon. Perhaps this is more graphics to impress, rather than graphics in a typical statistical setting, but I find them captivating nonetheless. In any case, I would be curious to hear your opinions of this.

My reply:

It's interesting stuff, although I wonder a lot about the representativeness of their sample. For example, they talk about libertarian views, which happen to be held by only a small fraction of the population (but a larger fraction of people who are online).

Along similar lines, I think their two-dimensional scheme is misleading; I'd prefer to just label the axes as left/right on economic ideology and left/right on social ideology. For example, where does government support for health insurance fit into this? I don't see it as "economically restrictive." Or what about support for a graduated income tax or an estate tax, vs. support for a flat tax or a sales tax? Conditional on total tax revenues being fixed, I don't see the former as any more or less economically restrictive than the latter--it's just restrictive to different people.

Their graphs are pretty but I'm not sure about their interpretation. They interpret a cross-sectional pattern as age effects, but it could just as well reflect cohort effects. People in their 40s came of political age around the early eighties, when Reagan was president. Not to mention the fact that people on a dating site are far from typical of the population--I'm assuming the people in their 40s are mostly either divorced or never married; either way, that's a different perspective than that of a married person.

Similarly, the trend in social attitudes has got to be, at least to some extent, a cohort effect. Although I suppose it depends on the issue. Americans have become much more liberal on gay rights but not on abortion.

Also note that, before 2006, the 18-30-year-olds were not such overwhelmingly Democratic voters.

Finally, the pretty graph that they show with red and blue is a bit misleading: it's not a graph of Democrats and Republicans, it's a graph of average opinions of people at different ages.

Getting to the specifics, I also don't know why they're so sure that nuns won't have liberal views on gay marriage. Lots of nuns are gay, no? I once met a woman whose reason for joining an order of nuns was that, in her words, "I want to meet Sister Right."

In any case, I agree that age is a good variable to look at. And, considering that these guys are starting from scratch on their analyses, many of the things they're saying are reasonable.

P.S. For a deeper look at systematic asymmetries between the Democratic and Republican parties in their political support, see Jonathan Rodden and Chris Warshaw's article, "Why the Democrats Need Boll Weevils and Blue Dogs: The Distribution of Political Preferences across U.S. House Districts."

Blogging

Rajiv Sethi quotes Bentley University economics professor Scott Sumner writing on the first anniversary of his blog:

Be careful what you wish for. Last February 2nd I [Sumner] started this blog with very low expectations... I knew I wasn't a good writer . . . And I was also pretty sure that the content was not of much interest to anyone.

Now my biggest problem is time--I spend 6 to 10 hours a day on the blog, seven days a week. Several hours are spent responding to reader comments and the rest is spent writing long-winded posts and checking other economics blogs. . . .

I [Sumner] don't think much of the official methodology in macroeconomics. Many of my fellow economists seem to have a Popperian view of the social sciences. You develop a model. You go out and get some data. And then you try to refute the model with some sort of regression analysis. . . .

My problem with this view is that it doesn't reflect the way macro and finance actually work. Instead the models are often data-driven. Journals want to publish positive results, not negative. So thousands of macroeconomists keep running tests until they find a "statistically significant" VAR model, or a statistically significant "anomaly" in the EMH. Unfortunately, because the statistical testing is often used to generate the models, and determine which get published, the tests of statistical significance are meaningless.

I'm not trying to be a nihilist here, or a Luddite who wants to go back to the era before computers. I do regressions in my research, and find them very useful. But I don't consider the results of a statistical regression to be a test of a model, rather they represent a piece of descriptive statistics, like a graph, which may or may not usefully supplement a more complex argument that relies on many different methods . . .

I [Sumner] like Rorty's pragmatism; his view that scientific models don't literally correspond to reality, or mirror reality. Rorty says that one should look for models that are "coherent," that help us to make sense of a wide variety of facts. . . .

Interesting, especially given my own veneration of Popper (or, at least the ideal version of Popper as defined in Lakatos's writings). Sumner is writing about macroeconomics, which I know nothing about. In any case, I should probably read something by Rorty. (I've read the name "Rorty" before--I'm pretty sure he's a philosopher and I think his first name is "Richard," but that's all I know about him.)

Sumner also writes:

I suppose it wasn't a smart career move to spend so much time on the blog. If I had ignored my commenters I could have had my manuscript revised by now. . . . And I really don't get any support from Bentley, as far as I know the higher ups don't even know I have a blog. So I just did 2500 hours of uncompensated labor.

I agree with Sethi that Sumner's post is interesting and captures much of the blogging experience. But I don't agree with that last bit about it being a bad career move. Or perhaps Sumner was kidding? (It's notoriously difficult to convey intonation in typed speech.) What exactly is the marginal value of his having a manuscript revised? It's not like Bentley would be compensating him for that either, right? For someone like Sumner (or, for that matter, Alex Tabarrok or Tyler Cowen or my Columbia colleague Peter Woit), blogging would seem to be an excellent career move, both by giving them and their ideas much wider exposure than they otherwise would've had, and also (as Sumner himself notes) by being a convenient way to generate many thousands of words that can be later reworked into a book. This is particularly true of Sumner (more than Tabarrok or Cowen or, for that matter, me) because he tends to write long posts on common themes. (Rajiv Sethi, too, might be able to put together a book or some coherent articles by tying together his recent blog entries.)

Blogging and careers, blogging and careers . . . is blogging ever really bad for an academic career? I don't know. I imagine that some academics spend lots of time on blogs that nobody reads, and that could definitely be bad for their careers in an opportunity-cost sort of way. Others such as Steven Levitt or Dan Ariely blog in an often-interesting but sometimes careless sort of way. This might be bad for their careers, but quite possibly they've reached a level of fame in which this sort of thing can't really hurt them anymore. And this is fine; such researchers can make useful contributions with their speculations and let the Gelmans and Fungs of the world clean up after them. We each have our role in this food web. (Personally I think I'm as careful in everything I blog as in my published research--take this one however you want!--and I welcome blogging as a way to put ideas out there and often get useful criticism. My impression is that Sumner and Sethi feel the same way, but authors who have reached the bestseller level probably just don't have the time to read their blog comments.)

And then of course there are the many many bloggers, academic and otherwise, whose work I assume I would've encountered much more rarely were they not blogging.

The other issue that Sethi touches on is the role of blogging in economic discourse. Which brings us to the ("reverse causal") question of why there are so many prominent academic bloggers in economics (also sociology and law, it appears) but not so many in political science or psychology or, for that matter, statistics.

I guess the last one of these is easy enough to answer: there aren't so many statisticians out there, most of them don't seem to really enjoy writing, and statistics isn't particularly newsworthy. I had a conversation about this the other day after writing something for Physics Today. Physics Today is the monthly magazine of the American Physical Society, and it's fun to read. It was a pleasure to write for it. But could there be Statistics Today? It wouldn't be so easy! In physics there's news every month, exciting new experiments, potential path-breaking theories, and the like. Somebody somewhere is building a microscope that can look inside a quark, and somebody else is figuring out how to generalize Heisenberg's uncertainty principle to account for this. Meanwhile, in statistics, there's . . . a new efficient estimator for Poisson regression? News about the Census? No, when statisticians try to be entertaining, they typically end up writing about statistical errors made by non-statisticians. (Oops, I've done that too!). This can be fun now and then, but you can't make a monthly magazine out of it.

The bane of many causes

One of the newsflies buzzing around today is an article "Brain tumour risk in relation to mobile telephone use: results of the INTERPHONE international case-control study".

The results, shown in this pretty table below, appear to be inconclusive.

nonlinearity.png

A limited amount of cellphone radiation is good for your brain, but not too much? It's unfortunate that the extremes are truncated. The commentary at Microwave News blames bias:

The problem with selection bias --also called participation bias-- became apparent after the brain tumor risks observed throughout the study were so low as to defy reason. If they reflect reality, they would indicate that cell phones confer immediate protection against tumors. All sides agree that this is extremely unlikely. Further analysis pointed to unanticipated differences between the cases (those with brain tumors) and the controls (the reference group).

The second problem concerns how accurately study participants could recall the amount of time and on which side of the head they used their phones. This is called recall bias.

Mobile phones are not the only cause of the development and detection of brain tumors. There are lots of factors--age, profession, genetics--all of them affecting the development of tumors. It's too hard to match everyone on all of these, but it's a lot easier to study multiple effects at the same time.

We'd see, for example, that healthy younger people at lower risk of brain cancer tend to use mobile phones more, and that older people sick with cancer that might spread to the brain don't need mobile phones. Something similar could hold for alcohol consumption (social drinkers tend to be healthy and social, but here the drinking is an effect, not a cause) and other potential risk factors.
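Here's a toy R sketch of what studying several factors at once might look like (simulated data only, nothing to do with the INTERPHONE numbers): age drives both phone use and tumor risk, so the unadjusted phone coefficient is misleading and the adjusted one is not.

set.seed(1)
n <- 5000
age   <- rnorm(n, 50, 15)
phone <- rexp(n, rate = 1 / pmax(1500 - 10 * age, 50))     # younger people use phones more (made up)
tumor <- rbinom(n, 1, plogis(-7 + 0.06 * age))             # risk depends on age only in this fake world
coef(summary(glm(tumor ~ phone, family = binomial)))       # phone looks "protective"
coef(summary(glm(tumor ~ phone + age, family = binomial))) # the association disappears once age is in the model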

Here's a plot of the relative risk based on cumulative phone usage:

risks.png

It seems that the top 10% of users have a much higher risk. If the data weren't discretized into just 10 categories, there could be interesting information here, beyond the obvious point that you need to be old and wealthy enough to accumulate 1600 hours of mobile phone usage.

[Changed the title from "many effects" to "many causes" - thanks to a comment by Cyrus]

Of home runs and grand slams

I've occasionally mocked academic economists for their discussions of research papers as "singles" or "home runs" (a world in which, counter to the findings of sabermetrics, one of the latter is worth more than four of the former). The best thing, of course, is a "grand slam," a term that I always found particularly silly as it depends not just on the quality of the hit but also on external considerations ("men on base"). But then I was thinking about this again, and I decided the analogy isn't so bad: For a research paper to be really influential and important, it has to come at the right time, and the field has to be ready for it. That's what turns a home run into a grand slam. So in this case the reasoning works pretty well.

J. Robert Lennon writes:

At the moment I [Lennon] am simultaneously working on two magazine articles, each requiring me to assess not just a book, but (briefly) a writer's entire career. The writers in question are both prominent, both widely published, read, and appreciated. And yet neither, I think, enjoys a full appreciation of their career--its real scope, with all its twists and turns, its eccentricities intact.

In one case, the writer had one smash hit, and one notorious book everyone hates. In the other, the writer has somehow become known as the author of one really serious book that gets taught a lot in college classes, and a bunch of other stuff generally thought to be a little bit frivolous. But close readings of each (hell, not even that close) reveals these reputations to be woefully inadequate. Both writers are much more interesting than their hits and bombs would suggest.

This naturally got me thinking about statisticians. Some statisticians are famous (within the statistics world) without having made any real contributions (as far as I can tell). And then there are the unappreciated (such as the psychometrician T. L. Kelley) and the one-hit wonders (Wilcoxon?).

I was also curious who Lennon's subjects are. Feel free to place your guesses below. I'll send a free book to the first person who guesses both authors correctly (if you do it before Lennon announces it himself).

Boris was right

Boris Shor in January:

the pivotal Senator will now be a Republican, not a Democrat . . . Brown stands to become the pivotal member of the Senate.

The New York Times today:

The Senate voted on Thursday afternoon to close debate on a far-reaching financial regulatory bill . . . In an interesting twist, the decisive vote was supplied by Senator Scott Brown, the Republican freshman of Massachusetts . . .

There's this idea called "domain specificity" that I learned from my sister many years ago: it's the idea that a skill might work in some domains but not others. For example, knowing (some) Spanish helps a bit with my French but it doesn't do anything for my Chinese. Traditionally in psychometrics, this sort of thing is studied using correlation--scores on various math tests are highly correlated with each other, and these are also correlated (but less so) with scores on English tests. Michael Jordan would probably be good at just about any sport he tried (even if he couldn't quite hit the curveball). Top actors can typically sing, dance, do magic, etc.--they're great all-around performers. And so on.

I was reminded of domain specificity after reading a blog by Felix Salmon arguing that Robert Rubin and Lawrence Summers made big mistakes when they served as Treasury Secretaries during the Clinton administration. Salmon writes:

I'll defer to Nate on the details but just wanted to add a couple of general thoughts.

My quick answer is that you can't learn much from primary elections. They can be important in their effects--both directly on the composition of Congress and indirectly in how they can affect the behavior of congressmembers who might be scared of being challenged in future primaries--but I don't see them as very informative indicators of the general election vote. Primaries are inherently unpredictable and are generally decided by completely different factors, and by completely different electorates, than those that decide general elections.

The PA special election is a bit different since it's a Dem vs. a Rep, but it's also an n of 1, and it's an election now rather than in November. Nate makes a convincing case that it's evidence in favor of the Democrats, even if not by much.

Here are solutions to about 50 of the exercises from Bayesian Data Analysis. The solutions themselves haven't been updated; I just cleaned up the file: some change in LaTeX had resulted in much of the computer code running off the page, so I went in and fixed the formatting.

I wrote most of these in 1996, and I like them a lot. I think several of them would've made good journal articles, and in retrospect I wish I'd published them as such. Original material that appears first in a book (or, even worse, in homework solutions) can easily be overlooked.

Updated R code and data for ARM

Patricia and I have cleaned up some of the R and Bugs code and collected the data for almost all the examples in ARM. See here for links to zip files with the code and data.

What visualization is best?

Jeff Heer and Mike Bostock gave Mechanical Turk workers a problem they had to answer using different types of charts. The lower the error the workers made, the better the visualization. Here are some results from their paper Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design:

visi-quality.png

They also looked at various settings, like density, aspect ratio, spacing, etc.

Visualization has become an empirical science, no longer just an art.

Dan Lakeland asks:

When are statistical graphics potentially life threatening? When they're poorly designed, and used to make decisions on potentially life threatening topics, like medical decision making, engineering design, and the like. The American Academy of Pediatrics has dropped the ball on communicating to physicians about infant jaundice. Another message in this post is that bad decisions can compound each other.

It's an interesting story (follow the link above for the details), and it would be great for a class in decision analysis or statistical communication. I have no idea how to get from A to B here, in the sense of persuading hospitals to do this sort of thing better. I'd guess the first step is to carefully lay out costs and benefits. When doctors and nurses take extra precautions for safety, it could be useful to lay out the ultimate goals and estimate the potential costs and benefits of different approaches.

The 1.6 rule

In ARM we discuss how you can go back and forth between logit and probit models by dividing by 1.6. Or, to put it another way, logistic regression corresponds to a latent-variable model with errors that are approximately normally distributed with mean 0 and standard deviation 1.6. (This is well known, it's nothing original with our book.) Anyway, John Cook discusses the approximation here.
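Here's a quick check of the approximation in R (my own sketch, not from Cook's post):

x <- seq(-5, 5, 0.01)
max(abs(plogis(x) - pnorm(x / 1.6)))        # the discrepancy stays below about 0.02 over this range
curve(plogis(x), -5, 5, ylab = "Pr(y = 1)")
curve(pnorm(x / 1.6), add = TRUE, lty = 2)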

Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following:

Guidelines for designing information charts often state that the presentation should reduce 'chart junk'--visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people's accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better.

As the above-linked blogger puts it, "chartjunk is more useful than plain graphs. . . . Tufte is not going to like this."

I can't speak for Ed Tufte, but I'm not gonna take this claim about chartjunk lying down.

I have two points to make which I hope can stop the above-linked study from being slashdotted and taken as truth.

1. The non-chartjunk graphs in the paper are not so good. Figure 1 is a time series of dollars that is unhelpfully presented as a bar chart and that is either unadjusted for inflation or, if adjusted, not labeled as such. Figure 2a is a lineplot whose y-axis should go down to 0 but doesn't. Both graphs also use the nonstandard strategy of labeling the y-axis on the right rather than the left. Figure 2b is an impossible-to-read pie chart with one of the wedges popping out of the circle. Regular readers of this blog will know what I think of that. Figures 2c and 2d are blurry and have no axis labels. Figure 2d is particularly bad because it's a time-series graph in which time is presented on the y-axis; it also has the inflation-adjustment problem noted earlier. Figures 4-9, presenting the authors' own findings, are not particularly easy to read either.

Chartjunk aside, it's hard to make good graphs, so I can't really blame Bateman et al. for their performance here. They're doing about as well as might be expected in routine psychology research. And maybe they're right that crappy chartjunk graphs are better than crappy non-chartjunk graphs. But I don't think it's appropriate to generalize to the claim that chartjunk graphs are better than good graphs.

2. This brings me to my second point, which is that a huge, huge drawback of chartjunk is that it limits the amount of information you can display in a graph. If all you want is to display a sequence of 5 numbers, then, sure, go for the chartjunk, I don't really care. But why limit yourself to only displaying 5 numbers? Consider the graphs in Red State, Blue State (or in our other research publications, or on this blog). Sure, you can do pretty instead of plain (see this discussion with examples), but here the graphics design is used to enhance the points made in the graph, not as a distraction.

Around the time I was finishing up my Ph.D. thesis, I was trying to come up with a good title--something more grabby than "Topics in Image Reconstruction for Emission Tomography"--and one of the other students said: How about something like, Female Mass Murderers: Babes Behind Bars? That sounded good to me, and I was all set to use it. I had a plan: I'd first submit the thesis with the boring title--that's how it would be recorded in all the official paperwork--but then at the last minute I'd substitute the new title page before submitting to the library. (This was in the days of hard copies.) Nobody would notice at the time; then later on, if anyone went into the library to find my thesis, they'd have a pleasant surprise. Anyway, as I said, I was all set to do this, but a friend warned me off. He said that at some point, someone might find it, and the rumor would spread that I'm a sexist pig. So I didn't.

I was thinking about this after hearing this report based on a reading of Supreme Court nominee Elena Kagan's undergraduate thesis. Although in this case, I suppose the title was unobjectionable, it was the content that bothered people.

I think youall are probably getting sick of this by now so I'll put it all below the fold.

The official announcement:

The Excellence in Statistical Reporting Award for 2010 is presented to Felix Salmon for his body of work, which exemplifies the highest standards of scientific reporting. His insightful use of statistics as a tool to understanding the world of business and economics, areas that are critical in today's economy, sets a new standard in statistical investigative reporting.

Here are some examples:

Tiger Woods

Nigerian spammers

How the government fudges job statistics

This one is important to me. The idea is that "statistical reporting" is not just traditional science reporting (journalist talks with scientists and tries to understand the consensus) or science popularization or silly feature stories about the lottery. Salmon is doing investigative reporting using statistical thinking.

Also, from a political angle, Salmon's smart and quantitatively sophisticated work (as well as that of others such as Nate Silver) is an important counterweight to the high-tech mystification that surrounds so many topics in economics.

Causal inference in economics

Aaron Edlin points me to this issue of the Journal of Economic Perspectives that focuses on statistical methods for causal inference in economics. (Michael Bishop's page provides some links.)

To quickly summarize my reactions to Angrist and Pischke's book: I pretty much agree with them that the potential-outcomes or natural-experiment approach is the most useful way to think about causality in economics and related fields. My main amendments to Angrist and Pischke would be to recognize that:

1. Modeling is important, especially modeling of interactions. It's unfortunate to see a debate between experimentalists and modelers. Some experimenters (not Angrist and Pischke) make the mistake of avoiding models: Once they have their experimental data, they check their brains at the door and do nothing but simple differences, not realizing how much more can be learned. Conversely, some modelers are unduly dismissive of experiments and formal observational studies, forgetting that (as discussed in Chapter 7 of Bayesian Data Analysis) a good design can make model-based inference more robust.

2. In the case of a "natural experiment" or "instrumental variable," inference flows forward from the instrument, not backwards from the causal question. Estimates based on instrumental variables, regression discontinuity, and the like are often presented with the researcher having a causal question and then finding an instrument or natural experiment to get identification. I think it's more helpful, though, to go forward from the intervention and look at all its effects. Your final IV estimate or whatever won't necessarily change, but I think my approach is a healthier way to get a grip on what you can actually learn from your study.
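To fix ideas about the mechanics (this is a generic illustration with simulated data, not anything from the JEP articles), here's a two-stage least squares fit in R where the instrument is valid by construction:

library(AER)                        # provides ivreg()
set.seed(42)
n <- 1000
z <- rbinom(n, 1, 0.5)              # the instrument (say, a randomized encouragement)
u <- rnorm(n)                       # unobserved confounder
x <- 0.5 * z + 0.8 * u + rnorm(n)   # treatment depends on the instrument and the confounder
y <- 1.0 * x + 1.5 * u + rnorm(n)   # true effect of x on y is 1.0
coef(lm(y ~ x))                     # ordinary regression is biased by the confounding
coef(ivreg(y ~ x | z))              # the IV estimate should land near 1.0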

Now on to the articles:

Jenny writes:

The Possessed made me [Jenny] think about an interesting workshop-style class I'd like to teach, which would be an undergraduate seminar for students who wanted to find out non-academic ways of writing seriously about literature. The syllabus would include some essays from this book, Geoff Dyer's Out of Sheer Rage, Jonathan Coe's Like a Fiery Elephant - and what else?

I agree with the commenters that this would be a great class, but . . . I'm confused on the premise. Isn't there just a huge, huge amount of excellent serious non-academic writing about literature? George Orwell, Mark Twain, Bernard Shaw, T. S. Eliot (if you like that sort of thing), Anthony Burgess, Mary McCarthy (I think you'd call her nonacademic even though she taught the occasional college course), G. K. Chesterton, etc etc etc? Teaching a course about academic ways of writing seriously about literature would seem much tougher to me.

Visualization in 1939

Willard Cope Brinton's second book Graphic Presentation (1939) surprised me with the quality of its graphics. Prof. Michael Stoll has some scans at Flickr. For example:


1939-g.jpg

The whole book can be downloaded (at a lower resolution) from Archive.Org.

Trips to Cleveland

Helen DeWitt writes about The Ask, the new book by Sam Lipsyte, author of a hilarious book I read a couple years ago about a loser guy who goes to his high school reunion. I haven't read Lipsyte's new book but was interested to see that he teaches at Columbia. Perhaps I can take him to lunch (either before or after I work up the courage to call Gary Shteyngart and ask him about my theory that the main character of that book is a symbol of modern-day America).

In any case, in the grand tradition of reviewing the review, I have some thoughts inspired by DeWitt, who quotes from this interview:

LRS: I was studying writing at college and then this professor showed up, a disciple of Gordon Lish, and we operated according to the Lish method. You start reading your work and then as soon as you hit a false note she made you stop.

Lipsyte: Yeah, Lish would say, "That's bullshit!"

If they did this for statistics articles, I think they'd rarely get past the abstract. The methods are so poorly motivated. You're doing a so-called "exact test" because . . . why? And that "uniformly most powerful test" is a good idea because . . . why again? Because "power" is good? And that "Bayes factor"? Etc.

The #1 example of motivation I've ever seen was in the movie The Grifters. In a very early scene, John Cusack gets punched in the stomach and is seriously injured, and that drives everything else in the plot.

DeWitt quotes Gerald Howard:

Lish's influence can been seen in Sam's obvious concentration on the crafting of his sentences and his single-minded focus on style, a quality less prevalent in the work of younger American writers than it should be. (Savor the perfectly pitched ear required to turn a simple phrase like "a dumpling, some knurled pouch of gristle.") Sam replies that "Gordon said many things that I will never forget, but the one thing that I always think about is that he said once, 'There is no getting to the good part. It all has to be the good part.' And so I think that when people are writing their novels they are just thinking about the story, about what has to happen so their character can get to Cleveland. . . ."

The way I put it (from the perspective of nonfiction writing) is "Tell 'em what they don't know." And, ever since having read The Princess Bride many years ago, I've tried to put in only the "good parts" in all my books. That was one thing that was fun about writing Teaching Statistics: A Bag of Tricks. We felt no obligation to be complete or to include boring stuff just because we were supposed to. Most textbooks I've seen have way too many trips to Cleveland.

One thing I say about statistics is: I always try to fit the dumbest, simplest possible model for any problem I'm working on. But, unfortunately, the simplest method that is even plausibly appropriate for any problem is typically just a little bit more complicated than the most complicated thing I know how to fit.

I guess there's a similar principle in writing: You restrict yourself to the good stuff, but there's just a bit too much good stuff to fit in whatever container you have in mind. And then you must, as the saying goes, kill your darlings.

P.S. To connect to another of our common themes: Ed Tufte's mother, of all people, wrote a good book about the construction of sentences. Sentences are important and, to the best of my knowledge, nonalgorithmic. That is, I have no clean method for constructing clear sentences. I often have to rephrase to avoid the notorious garden-path phenomenon. I wonder how Vin Scully did it. Was it just years of practice?

P.P.S. One thing I love about Marquand is his chapter titles. I can't usually hope to match him, but he's my inspiration for blog entry titles such as this one.

Marty McKee at Wolfram Research appears to have a very very stupid colleague. McKee wrote to Christian Robert:

Your article, "Evidence and Evolution: A review", caught the attention of one of my colleagues, who thought that it could be developed into an interesting Demonstration to add to the Wolfram Demonstrations Project.

As Christian points out, adapting his book review into a computer demonstration would be quite a feat! I wonder what McKee's colleague could be thinking? I recommend that Wolfram fire McKee's colleague immediately: what an idiot!

P.S. I'm not actually sure that McKee was the author of this email; I'm guessing this was the case because this other very similar email was written under his name.

P.P.S. To head off the inevitable comments: Yes, yes, I know this is no big deal and I shouldn't get bent out of shape about it. But . . . Wolfram Research has contributed such great things to the world, that I hate to think of them wasting any money paying the salary of Marty McKee's colleague, somebody who's so stupid that he or she thinks that a book review can be developed into an interactive computer demonstration. I'd like to feel that I'm doing my part by alerting them so they can fire this incompetent colleague person.

P.P.P.S. Hey, this new Zombies category is really coming in handy!

Dan Goldstein did an informal study asking people the following question:

When two baseball teams play each other on two consecutive days, what is the probability that the winner of the first game will be the winner of the second game?

You can make your own guess and then continue reading below.
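If you want to play along with a simulation, here's a rough R sketch under a simple (and entirely made-up) model in which each matchup has a fixed single-game win probability drawn from a Beta(20, 20) distribution:

set.seed(7)
n_sims <- 1e5
p <- rbeta(n_sims, 20, 20)          # team A's chance of winning any one game of the matchup
game1 <- rbinom(n_sims, 1, p)
game2 <- rbinom(n_sims, 1, p)
mean(game1 == game2)                # a bit above 0.5 under these assumptions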

When Sonia Sotomayor was nominated for the Supreme Court, and there was some discussion of having 6 Roman Catholics on the court at the same time, I posted the following historical graph:

court.png

It's time for an update:

court2.png

It's still gonna take a while for the Catholics to catch up. . . .

And this one might be relevant too:

court3.png

It looks as if Jews and men have been overrepresented, as have Episcopalians (who, as I noted earlier, are not necessarily considered Protestant in terms of religious doctrine but whom I counted as such for the ethnic categorization). Religion is an interesting political variable because it's nominally about religious belief but typically seems to be more about ethnicity.

Update on the spam email study

A few days ago I reported on the spam email that I received from two business school professors (one at Columbia)! As noted on the blog, I sent an email directly to the study's authors at the time of reading the email, but they have yet to respond.

This surprises me a bit. Certainly if 6300 faculty each have time to respond to one email on this study, the two faculty have time to respond to 6300 email replies, no? I was actually polite enough to respond to both of their emails! If I do hear back, I'll let youall know!

P.S. Paul Basken interviewed me briefly for a story in the Chronicle of Higher Education on the now-notorious spam email study. Basken's article is reasonable--he points out that (a) the study irritated a lot of people, but (b) is ultimately no big deal.

One interesting thing about the article is that, although some people felt that the spam email study was ethical, nobody came forth with an argument that the study was actually worth doing.

P.P.S. In all seriousness . . . everyone makes mistakes. Heck, I'm a full professor of statistics but I published a false theorem once. And the notorious Frank Flynn is a professor of business at Stanford, so I'm sure he's done a lot of great stuff. I also imagine the designers of the spam email survey have a lot of good ideas in them: they're young researchers who made a silly mistake, and it's ultimately no big deal.

Vlad Kogan writes:

I've been using your book on regression and multilevel modeling and have a quick R question for you. Do you happen to know if there is any R package that can estimate a two-stage (instrumental variable) multilevel model?

My reply: I don't know. I'll post this on the blog and maybe there will be a response. You could also try the R help list.
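For illustration only, here's one naive two-step version using lme4 with simulated data (my own sketch, not a recommendation and not a proper estimator: plugging in first-stage fitted values ignores their uncertainty, so the second-stage standard errors will be too small):

library(lme4)
set.seed(2)
g <- rep(1:20, each = 50)                 # 20 groups of 50
a <- rnorm(20)[g]                         # group-level intercepts
z <- rnorm(1000)                          # instrument
u <- rnorm(1000)                          # unobserved confounder
x <- 0.6 * z + u + rnorm(1000)
y <- 1.0 * x + u + a + rnorm(1000)        # true effect of x is 1.0
d <- data.frame(y, x, z, g)
s1 <- lmer(x ~ z + (1 | g), data = d)     # stage 1: endogenous predictor on the instrument
d$x_hat <- fitted(s1)
s2 <- lmer(y ~ x_hat + (1 | g), data = d) # stage 2: plug in the first-stage fitted values
fixef(s2)                                 # the x_hat coefficient should be roughly 1.0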

Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington's Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center's anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it's an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically:

Adam Gurri writes:

Any chance you could do a post explaining Popper's propensity theory of probability? I have never understood it.

My reply: I'm a big fan of Popper (search this blog for details), especially as interpreted by Lakatos, but as far as I can tell, Popper's theory of probability is hopeless. We've made a lot of progress on probability in the past 75 years, and I don't see any real need to go back to the bad old days.

The (U.S.) "President's Cancer Panel" has released its 2008-2009 annual report, which includes a cover letter that says "the true burden of environmentally induced cancer has been grossly underestimated." The report itself discusses exposures to various types of industrial chemicals, some of which are known carcinogens, in some detail, but gives nearly no data or analysis to suggest that these exposures are contributing to significant numbers of cancers. In fact, there is pretty good evidence that they are not.

U.S. male cancer mortality by year for various cancers

For "humanity, devotion to truth and inspiring leadership" at Columbia College. Reading Jenny's remarks ("my hugest and most helpful pool of colleagues was to be found not among the ranks of my fellow faculty but in the classroom. . . . we shared a sense of the excitement of the enterprise on which we were all embarked") reminds me of the comment Seth made once, that the usual goal of university teaching is to make the students into carbon copies of the instructor, and that he found it to me much better to make use of the students' unique strengths. This can't always be true--for example, in learning to speak a foreign language, I just want to be able to do it, and my own experiences in other domains is not so relevant. But for a worldly subject such as literature or statistics or political science, then, yes, I do think it would be good for students to get involved and use their own knowledge and experiences.

One other statement of Jenny's caught my eye. She wrote:
