Results matching “R”

Cool science experiment videos

From Jason Kottke:

An illustration of how insanely effective water is at absorbing heat: you can hold a water balloon over a candle without popping it. The rest of Robert Krampf's videos are worth a look as well.

We need some cool statistics videos too.

A rant on the virtues of data mining

I view data analysis as summarization: using the machine to work with large quantities of data that would otherwise be hard to deal with by hand. I am also curious about what the data would suggest, and open to suggestions. Automated model selection can be used to list a few hypotheses that stick out from the crowd: I was not using model selection to select anything, but merely to quantify how much a hypothesis sticks out from the morass of the null.

The response from several social scientists has been rather unappreciative, along the following lines: "Where is your hypothesis? What you're doing isn't science! You're doing DATA MINING!" Partially as a result of this and of failures of the government monitoring technology programs, data mining researchers are trying to rename themselves to sound more politically unbiased.

Of course, I'm doing data mining, and I'm proud of it. Because data mining is summarization, it's journalism, it's surveying, it's mapping. That's where one gets ideas and impressions from. Of course, what data mining found isn't "true". The models underlying data mining are most definitely not "true". But a mean is informative even if the distribution isn't symmetric.

The "scientific" approach corresponds to picking The One and Only Holy Hypothesis. Then you collect the data. Then you fit the model and verify whether it works or not. Then you write a paper. The good thing about the "scientific" approach is that you don't have to think, and that you need very little common sense. But real science is curiosity and pursuit of improved understanding of the world, not mindlessly following algorithms that can be taught even to imbeciles.

Let me analyze where the problem lies. There is data D. And there are multiple models M. In confirmatory data analysis (CDA) high prior probabilities are assigned to a single model and its negative (null): so it is very easy to establish which of the two is better. In exploratory data analysis (EDA) and data mining the prior over models is relatively flat. Yes, there are models underlying EDA too: if you rotate your scatter plot in three dimensions to get a good view of the phenomenon, your parameters are the rotations and you're doing kernel density estimation with your eyes. When you see a fit, you stop and save the snapshot. The problem is that no model in particular sticks out, so it's hard to establish the best one. Yes, it's hard to establish what "truth" is. "Truth" is the domain of religion. "Model", "data" and "evidence" are the domain of science.

Many of the hypotheses generated by people from theory might be understood as deserving higher prior probability: after all, they are based on experience. In turn, a flat prior includes many models that are unlikely. For that matter, one should use a bit of common sense in interpreting EDA results: because the prior was flat, if something looks fishy, subtract a little bit from it and study it in more detail. On the other hand, if you don't see something you think you should, add a little and study it in more detail. A CDA that tells you everything you already knew doesn't deserve a paper. But it's better to just eyeball the results with an implicit prior in your mind than to try to cook up a complex prior that will do the same. But once you've found a surprise, throw all the CDA you've got at it.

They started me out on SPSS . . .

Yph writes,

They started me out on SPSS, then quickly moved me to Stata. Now I'm learning R. Do you think R is it? Will I have to learn a new programming language in the future? I get apprehensive about investing time learning new technology when the turnover rate of programming language seems so high.

My reply:

"P" sends along a link to this paper by Sebastian Zollner and Jonathan Pritchard:

Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the “winner’s curse.” The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power.

This is the statistical phenomenon we were writing about here in the context of sex-ratio studies: when sample sizes are small, "statistically significant" estimates will tend to be much larger than the true parameters being estimated. Zollner and Pritchard have a statistical correction procedure, conditioning on the observation of a statistically significant result. I think a Bayesian approach would work better--this might be worth trying on Zollner and Pritchard's examples.
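Just to illustrate the basic phenomenon (a toy simulation with made-up numbers, not Zollner and Pritchard's correction method): if many underpowered studies estimate the same small effect, the subset of estimates that cross the significance threshold will be biased upward.

# Toy simulation of the winner's curse: a small true effect, many underpowered
# studies, and we look only at the "statistically significant" estimates.
set.seed(123)
true_effect <- 0.2    # hypothetical true effect
se <- 0.15            # standard error in each underpowered study
estimates <- rnorm(10000, mean = true_effect, sd = se)
significant <- abs(estimates / se) > 1.96

mean(estimates)                # close to the true 0.2
mean(estimates[significant])   # much larger: conditioning on significance inflates the estimate
mean(significant)              # the (low) power of each study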

Data visualization

Eric Tassone writes,

Have you seen this impressive article, "Data Visualization: Modern Approaches"? There are some nice visualizations there so I think you will find it worth a browse, and it should balance my account since I once forwarded you that ultra-ugly Treasury graph that made its way onto your blog.

Oddly enough, I was distracted by the ads at the beginning of the linked article. (This was odd because I'm rarely distracted by ads in print magazines.) I wasn't particularly impressed by the examples in the article (except for the Rosling talk which I already knew about; see here and here).

Also I like the baby names site.

Graphs from tables

Samantha Ross writes:

David Weakliem sent along this paper coauthored with Robert Biggert:

Regional differences in party support have attracted a good deal of attention since the 2000 election. A striking feature of the current pattern is that Democratic support is higher in more affluent states. At the individual level, income is associated with Republican support, but in a recent paper, Gelman et al. (2006) find that this relationship is weaker in more affluent states. In affluent states, people with high and low incomes both tend to vote Democratic; in poorer states, people with low incomes vote Democratic while people with high incomes vote Republican. This paper extends Gelman et al.'s analysis by considering both education and income. We find that the effects of income and college education both vary among states, in a largely independent manner. Variation in the effects of college education is related to the educational composition of the state: where college education is more common, it is more strongly associated with support for the Democrats. Overall, regional differences are largest in the middle classes, contrary to the claims of some popular and theoretical accounts. There is some evidence that a pattern of weaker class divisions is associated with more support for the Democrats.

Looks interesting to me--we'll have to look into this some more.

Hierarchical Bayes Compiler

I was very excited to see Hal Daume announce the Hierarchical Bayes Compiler. The version is 0.1, but from taking an initial look at the source code, Hal seems to be using the latest from the computer-science candy store, such as Haskell. For one, his applications are in natural language processing, where scaling to large quantities of data is vital.

This is very relevant for us, as we're building an application based on a hierarchical model and currently rely on WinBUGS to do the estimation. But WinBUGS and OpenBUGS are both based on the Oberon environment, which is very closed in addition to being centered on the user interface. A more plug-and-play alternative is JAGS, but I have not tried it out. Another alternative is to use Jouni's Umacs (universal MC sampler), which has quite a few adaptive sampling tricks. However, R is very slow for such things, and I've been reluctant to adopt it for our purpose. HBC works as a compiler, generating the sampler in C. I do not know how well this works, but we can expect at least an order or two of magnitude improvement over interpreted samplers.

Will report more when I test it out.

Ralph Blair sent this in. It's so horrible that I have to put it in the continuation part of the blog entry. I recommend you all stop reading right here.

Stop . . . It's not too late!!!!!!!!!!!

R-squared: useful or evil?

I had the following email exchange with Gary King.

Generic advice for Bugs bugs

Peter Park writes,

There's this funny thing that "conservative" is seen as a compliment, even coming from political liberals. For example, Nicholas Lemann writes in the New Yorker that Karl Rove "was never a real conservative, except in the liberal-hating sense, because the idea that everybody who participates in politics expects something from government was at the heart of his thinking." This seems to me to be a funny definition of "conservative." I would think a conservative would be realistic enough to expect that "everybody who participates in politics expects something from government." To continue, I'd think a conservative would want to preserve the existing social order as much as possible. There are different flavors of this; Rove's involves reducing taxes and business regulations, both of which seem like pretty mainstream conservative goals. I'm not saying that Lemann or others should necessarily support Rove--one might instead prefer goals such as redistribution, environmental protection, etc.--but I don't see how you can say that Rove was never a "real conservative."

In general, I think these sorts of labels are a topic worth studying: how do words like "conservative" get used differently at different times, and by people of different political persuasions?

Standardized coefficients

Denis Cote writes,

Using experts' ranges

Doug McNamara writes,

Encouraged by the success of his self-experimentation to help his sleep, mood, and weight concerns, Seth Roberts has been experimenting with the effects of drinking flaxseed oil. Here's an example of his results:

[Image: sethgraph.jpg]

Commenting on another recent one of Seth's self-experiments, I wrote,

Seth, Not to be a wet blanket or anything, but aren’t you worried that your findings might be due to expectation effects: you knew which oil you were taking when doing the tests, right?

Seth replied,

Andrew, no, I’m not worried that the results are due to expectations. If the results always conformed to my expectations, I’d be worried, but they haven’t — see my post about eggs. Moreover, this particular result confirms a result that was a surprise. In other words, I’ve gotten the same result when I was expecting it and when I wasn’t expecting it.

I'm still concerned, though. Seth is saying that it's not just an expectation effect because he wasn't always expecting the results. But I could see a bias arising from positive feedback, as follows: You try a new treatment and then see what happens after, with no expectations except that things might change. There is some noise to this measurement--just at random, it will be higher or lower than before. Having seen this, you adjust your expectations; this then affects your next measurement, etc.

I'm not saying this is definitely happening, but it could be.

To Seth: maybe you could get a partner in experimentation, someone who lives or works nearby, and he or she could give you a randomly assigned oil. That is, your partner would know which oil you're getting, but you wouldn't. In fact, you wouldn't even know if you were being given something new that day. (It wouldn't be hard to set up some complicated randomization scheme so that, for example, you would get one oil for several days, then another, etc.) You, of course, could provide the same service for your self-experimentation partner. This also has the virtue that you'll get twice as many measurements.

The Political Brain

Boris pointed me to this review by David Brooks of Drew Westen's book, ``The Political Brain: The Role of Emotion in Deciding the Fate of the Nation.'' Brooks writes:

How do you summarize logistic regressions and other nonlinear models? The coefficients are only interpretable on a transformed scale. One quick approach is to divide logistic regression coefficients by 4 to convert to the probability scale--that works for probabilities near 1/2--and another approach is to compute changes with other predictors held at average values (as we did for Figure 8 in this paper). A more general strategy is to average over the distribution of the data--this will make more sense, especially with discrete predictors. Iain Pardoe and I wrote a paper on this which will appear in Sociological Methodology:

In a predictive model, what is the expected difference in the outcome associated with a unit difference in one of the inputs? In a linear regression model without interactions, this average predictive comparison is simply a regression coefficient (with associated uncertainty). In a model with nonlinearity or interactions, however, the average predictive comparison in general depends on the values of the predictors. We consider various definitions based on averages over a population distribution of the predictors, and we compute standard errors based on uncertainty in model parameters. We illustrate with a study of criminal justice data for urban counties in the United States. The outcome of interest measures whether a convicted felon received a prison sentence rather than a jail or non-custodial sentence, with predictors available at both individual and county levels. We fit three models: a hierarchical logistic regression with varying coefficients for the within-county intercepts as well as for each individual predictor; a hierarchical model with varying intercepts only; and a non-hierarchical model that ignores the multilevel nature of the data. The regression coefficients have different interpretations for the different models; in contrast, the models can be compared directly using predictive comparisons. Furthermore, predictive comparisons clarify the interplay between the individual and county predictors for the hierarchical models, as well as illustrating the relative size of varying county effects.

The next step is to program it in general in R.
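In the meantime, here is a minimal sketch of the basic averaging idea for a logistic regression, using glm() and hypothetical simulated data (this is just the point estimate; the method in the paper also propagates uncertainty in the model parameters):

# Average predictive comparison for a binary input x, averaging over the
# observed values of the other predictor z. (Hypothetical data.)
n <- 500
z <- rnorm(n)
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + 0.8*x + 0.5*z))
fit <- glm(y ~ x + z, family = binomial(link = "logit"))

coef(fit)["x"] / 4    # the divide-by-4 shortcut: an upper bound on the change in Pr(y = 1)

# Average predictive comparison: switch x from 0 to 1 for everyone and average the difference
mean(predict(fit, newdata = data.frame(x = 1, z = z), type = "response") -
     predict(fit, newdata = data.frame(x = 0, z = z), type = "response"))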

In The Strange Death of Tory England, a book full of great lines, Geoffrey Wheatcroft writes,

Just as the labour movement had never been quite sure whether the capitalist system was on its last legs and needed only a final push to be toppled, or was healthy enough to be milked over and again, so the cultural-intellectual left had never quite decided whether it liked increasing prosperity or not.

Early primaries

Frank DiTaglia writes,

Do you know of any research into the influence of early primaries? I've been wondering about the extent to which bandwagon effects give the first few primaries undue weight, but it seems like a relatively difficult problem to study. Any thoughts?

My reply: I agree that this is a difficult problem to study. I think, like most people, that early primaries are more influential because of the winnowing process as the number of candidates gets reduced. There's "bandwagon" (which I interpret as voters supporting a candidate because they hear good things about the candidate, or even because the fact that the candidate gets support indicates that he or she has something good to offer). And then there's "electability" and there's "strategic voting" (or, as they say in England, "tactical voting"), which is related but slightly different. I expect there's lots of research on these topics.

3D social network visualization

Juli sent this link. My pet peeve is calling things "3D." It's a 2D display--it's on a flat screen, after all. Or I guess it is actually 3D, counting time as the third dimension. In any case, it looks cool, but I'm skeptical about its usefulness for understanding networks (for the reasons Matt has given in the past).

Percent Changes?

Benjamin Kay writes about a problem that seems simple but actually is not:

I've come across a pair of problems in my work into which you may have some insight. I am looking at the percentage change in earnings per share (EPS) of various large American companies over a 3 year period. I am interested in doing comparisons of how other attributes influence the median value of earnings per share. For example, it might be that high paying companies have higher EPS growth than low paying ones. I am aware that this model might not fully take advantage of the data but I'm preparing it for an audience with limited statistical education.

The problems occur in ranking percentages. If you calculate percentages as (New - Old)/Old then there are two major problems:
1) Anything near zero explodes
2) Companies which go from negative to positive EPS appear to have negative growth rates: ($1 - (-$1)) / (-$1) = -200%

The first problem is seemingly intractable as long as I am using percent changes, but I cannot use dollar changes because it ignores the issue of scale. A company with 100 shares and $100 in earnings has $1 EPS, and one with 20 shares (and the same earnings) has $5 EPS. If both companies double their earnings to $200, they've performed identically. However, in absolute changes the former shows a $1 change and the latter $5. I'm stuck with what to do here; maybe there is another measure of change that I haven't considered or another way of doing this entirely.

One thing I've considered for the second problem is taking the absolute value of companies whose EPS changes sign. That seems equivalent to claiming that a change from $1 to $3 EPS is equivalent to a -$1 to $1 change in EPS. Is that a standard approach to treating percent changes? Are there any other assumptions lurking underneath when doing this?

Is there a classic reference to doing order statistic work like this on percentile data?

My reply: This is an important problem that comes up all the time. The percent-change approach is even worse than you suggest, because it will blow up if the denominator approaches zero. Similar problems arise with marginal cost-benefit ratios, LD50 in logistic regression (see chapter 3 of Bayesian Data Analysis for an example), instrumental variables, and the Fieller-Creasy problem in theoretical statistics. I've actually been planning for a while to write a paper on estimation of ratios where the denominator can be positive or negative.

In general, the story is that the ratio completely changes in interpretation when the denominator changes sign (as you illustrated in your example). But yeah, dollar values can't be right either. I have a couple questions for you:

a. How important are the signs to you? For example, if a given company changes from -$1 to $1, is that more important to you than a change from $1 to $3, or from $3 to $5?

b. For any given company, do you want to use the same scaling for all three years? I imagine the answer is Yes (so you don't have to worry about funny things happening, such as an increase of 25% followed by a decrease of 25% not bringing things back to the initial value).

One approach might be to rescale based on some relevant all-positive variable such as total revenue. I'm sure many other good options are available, once you get away from trying to rescale based on a variable that can be positive or negative.
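For what it's worth, here is a quick R sketch of the two failure modes and of the rescale-by-a-positive-variable idea; the numbers are invented:

# The trouble with (new - old)/old when 'old' can be near zero or negative
old <- c(1, -1, 0.01)    # hypothetical EPS in the base year
new <- c(3,  1, 1.00)    # hypothetical EPS three years later
(new - old) / old        # 200%, then -200% (sign flip!), then 9900% (blow-up near zero)

# One workaround: scale the change by a strictly positive size measure
revenue_per_share <- c(20, 15, 30)    # hypothetical, always positive
(new - old) / revenue_per_share       # finite and comparable across firms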

Ken Rice on conditioning in 2x2 tables

At the bottom of this entry I wrote that the so-called Fisher exact test for categorical data does not make sense. Ken Rice writes:

It turns out the standard conditional likelihood argument (which to me always looked prima facie contrived and artificial) is in fact exactly what you get from a carefully considered random-effects approach.

Data Analysis Using etc. etc.

Timothy Hellwig wrote a nice review of our book in The Political Methodologist.

My only complaint is that he writes, "Instructors of first-year graduate methods courses should consider complementing their texts with material from Part I." I think we should be the primary text! In all seriousness, it would be helpful to know what he (and others) think our book should have so that it can become appropriate for a primary text. (We're preparing to put together a shorter version suitable for basic classes in regression.) I agree completely with Hellwig in preferring "learning by doing" to "learning by lecturing."

Jeremy Miles writes,

The most recent issue of Significance had a very interesting article by Stephen Senn, in which he wrote about the TeGenero tgn1412 drug trial catastrophe which occurred in March 2006, when 6 volunteers received the drug, and two received a placebo. The 6 volunteers almost immediately had massive immune system reactions - specifically a cytokine storm, and were hospitalised for at least a month.

What we have here is the potential for a statistical analysis. We've got a 2x2 table, so let's do the stats.

                     Placebo   Drug
Cytokine storm  Yes     0        6
                No      2        0

A 2x2 table. We obviously can't do a chi-square test, as the sample is too small. But we can do a Fisher's exact test. If we do that we get a one-tailed p of 0.036. It's a one-tailed test, so our p-value cut off is 0.025, so we don't have evidence that the drug caused the cytokine storm, and all the subsequent ills.

But that's got to be a silly thing to say. It's obvious that the drug did cause the cytokine storm. It's not just barely significant; it's really, really obvious. Why is it so obvious? It's obvious because people don't have cytokine storms every day. In fact, if you haven't got the Spanish Flu, we're pretty safe saying that you will never have a cytokine storm. In other words, it's not just the data that we have obtained here that we need to take into account. We need to take into account that the probability of ever having a cytokine storm is very low. In other words, we need to take into account the prior probability. And so we have just done a Bayesian analysis.

This is a great example. I have no knowledge of cytokine storms, so I'm not quite sure how to put a prior distribution on this. One way to think about this problem is to imagine how you would analyze the data if you only had the 6/6 data from the treated group, and no data from the controls. Or what if the treated group were only 3/6? If cytokine storms are really rare, then even 3/6 cases (even 1/6?) would be evidence of a problem. (Conversely, even one cytokine storm among the controls would cast doubt upon the prior assumption that cytokine storms are extremely rare in the population represented by the study.)
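To make that reasoning concrete, here is a quick back-of-the-envelope calculation; the 1-in-1000 background rate is purely an assumption for illustration:

# If the drug were inert and cytokine storms occurred at an assumed background
# rate of 1 in 1000 per person over the study period, then:
background_rate <- 1/1000
dbinom(6, size = 6, prob = background_rate)      # Pr(all 6 treated get storms by chance): about 1e-18
1 - dbinom(0, size = 6, prob = background_rate)  # Pr(at least 1 storm among 6): about 0.006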

A question

I'm curious what Stephen Senn (or Jeremy Miles) would conclude if all we had were data on 6 controls, where, say, 1 had cytokine storms and 5 didn't. It sounds like this would be enough to stop the clinical trial. In that case, the prior distribution would certainly be relevant!

Just for laffs

I analyzed the data using our default weakly informative prior distribution. In R:

> library(arm)
> y <- rep (c(1,0), c(6,2))   # outcome: 6 cytokine storms, 2 without
> x <- y                      # treatment indicator; here it coincides exactly with y (complete separation)
> M1 <- bayesglm (y~x, family=binomial(link="logit"))
> display(M1)
bayesglm(formula = y ~ x, family = binomial(link = "logit"))
            coef.est coef.se
(Intercept) -1.86     1.88  
x            4.80     2.31  
  n = 10, k = 2
  residual deviance = 1.2, null deviance = 9.0 (difference = 7.8)

So the difference is statistically significant! As noted above, this doesn't really resolve the real issue, since apparently the problem would arise even with only 1 storm among the treated units and no control data at all. But I was just curious how this would work out.

By the way, don't do the so-called Fisher exact test

Senn discusses how the so-called Fisher exact test does not give a statistically significant result in this example. In any case, as I've written elsewhere (see Section 3.3 of this paper), the so-called Fisher exact test makes no sense in this sort of problem, where only one of the two margins is specified by the experimental design.
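For the record, the one-sided p-value quoted above is easy to reproduce in R (reproducing it, not endorsing it):

# The TGN1412 2x2 table: columns = drug / placebo, rows = cytokine storm yes / no
tab <- matrix(c(6, 0, 0, 2), nrow = 2,
              dimnames = list(storm = c("yes", "no"), group = c("drug", "placebo")))
fisher.test(tab, alternative = "greater")$p.value   # about 0.036, as quoted above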

In this discussion of Allegra Goodman's novel Intuition, Barry wrote, "brilliant people are at least as capable of being dishonest as ordinary people." The novel is loosely based on some scientific fraud scandals from the 1980s. One of its central characters, a lab director, is portrayed as brilliant and a master of details, but she makes a mistake by brushing aside evidence of fraud by a postdoc in her lab. One might describe the lab director's behavior as "soft cheating" since, given the context of the novel, she had to have been deluding herself by ignoring the clear evidence of a problem.

Anyway, the question here is: are brilliant scientists at least as likely to cheat? I have no systematic data on this and am not sure how to get this information. One approach would be to randomly sample scientists, index them by some objective measure of "brilliance" (even something like asking their colleagues to rate their brilliance on a 1-10 scale and then taking averages would probably work), then do a thorough audit of their work to look for fraud, and then regress Pr(fraud) on brilliance. This would work if the prevalence of cheating were high enough. Another approach would be to do a case-control study of cheaters and non-cheaters, but the selection issues would seem to be huge here, since you'd only be counting the cheaters who got caught. Data might also be available within colleges on the GPA's and SAT scores of college students who were punished for cheating; we could compare these to the scores of the general population of students. And there might be useful survey data of students, asking questions like "do you cheat" and "what's your SAT" or whatever. I guess there might even be a survey of scientists, but it seems harder to imagine they'd admit to cheating.

Imputing categorical regressors

Jonathan writes,

A grad student, Gabriel Katz (no relation), and I are working on some MCMC code with survey data to handle misreporting issues in voting data. (An old idea of mine that I presented an early draft of at Columbia years ago.) Since we have coded up the models in Bugs/JAGS, we decided we might as well try to also handle some of the missing data in covariates. We are, however, having a little trouble with the imputation of missing categorical covariates. We have been trying ordered probit-logit priors (so as to avoid taking the easy way out of using normal priors), but the problem is that it is quite hard to get BUGS/JAGS to bracket-slice for certain categories of the ordered variables. The problem seems to be in choosing good priors for the thresholds and means of the categorical variables. We don't know of any principled way to do this. We have tried several different values for the variance of the priors, but in our model we have 5 categorical variables with a total of 22 thresholds, so trial-and-error seems hopeless. Since this is really just a secondary problem for us, perhaps we should ignore the missingness problem as most researchers do. If you have any suggestions or pointers, it would be much appreciated.

IRB watch

This is amusing (from Seth). It can get a bit Kafka-esque. I've been on NIH panels and it's amazing the things people bring up as human subjects objections. On the other hand, it's in reaction to real abuses in the past. Here's the article by Fredric Coe with the story about the scrap urine.

Anna Reimondos writes,

Racial bias in baseball umpiring?

Some economists from McGill University and UT-Austin just wrote a paper (Parsons et al., 2007) that purports to find racial bias in baseball umpiring, specifically ball/strike calls on pitches at which the batter does not swing. Here's the payoff quote:


The highest percentage of called strikes occurs when both umpire and pitcher are White, while the lowest percentage is when a White umpire is judging a Black pitcher. What is intriguing is that Black umpires judge Hispanic pitchers harshly, relative to how they are judged by White and Hispanic umpires; but Hispanic umpires treat Black pitchers nearly identically to the way Black umpires treat them. Minority umpires treat Asian pitchers far worse than they treat White pitchers.

(Personally, I'm not sure I agree that the apparent bias of black umpires against hispanics and vice versa is more (or less) intriguing than the apparent bias of whites against blacks. But Tom Lehrer's National Brotherhood Week comes to mind.)

Another bad chart

I don't want to be doing this every day, but I have to agree with Brendan, who writes that "the differing tilt and skew of the two Y axes makes it really hard to interpret." I'll do you all a favor and not repeat the graph here. It comes from this article. Visually, though, it is weirdly compelling.

P.S. I fixed the link to the graph.

This (by Aleks, Grazia, Yu-Sung, and myself) is really cool. Here's the abstract:

We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. We implement a procedure to fit generalized linear models in R with this prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several examples, including a series of logistic regressions predicting voting preferences, an imputation model for a public health data set, and a hierarchical logistic regression in epidemiology.

We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small) and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.

It solves the separation problem and now I use it in my routine applied work. It's implemented as bayesglm() in the "arm" package in R.
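A minimal usage sketch with hypothetical separated data (prior.scale = 2.5 and prior.df = 1, i.e. the Cauchy, are the defaults; they are written out here just to show where they go):

library(arm)

# Hypothetical data with complete separation: x predicts y perfectly
x <- c(rep(0, 10), rep(1, 10))
y <- x

# glm() would return huge, unstable coefficients here; bayesglm() with the
# default Cauchy(0, 2.5) prior on the rescaled coefficients stays finite.
fit <- bayesglm(y ~ x, family = binomial(link = "logit"),
                prior.scale = 2.5, prior.df = 1)
display(fit)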

Here's a pretty picture from the paper showing the performance of different Student-t prior distributions on cross-validation with a corpus of datasets:

[Image: log-.png]

The Cauchy with scale 0.8 does the best, but we go with the Cauchy with scale 2.5 because it is more "conservative," as statisticians would say. (See here for more discussion of conservatism in statistics.)

Worse than a pie chart?

John sends in this horrible example:

[Image: earmarkspercap.jpg]

Among other problems, the graph uses areas to represent per-capita numbers. On the plus side, the graph did what was necessary, which was to get attention. For that purpose, the graph is excellent.

Seth points to another example here on taboo research (for background on taboo research, see here and here).

Alice Dreger writes:

Just by coincidence, I happened to get this (unsolicited) email today. It relates to the second item on Steven Pinker's "taboo questions" list ("Were the events in the Bible fictitious -- not just the miracles, but those involving kings and empires?"). It indeed seems to be taboo in the sense that he was discussing.

Names and affiliations on the emails are redacted for politeness.

Medians?

Jeff noticed this news article by Gina Kolata:

EVERYONE knows men are promiscuous by nature. It's part of the genetic strategy that evolved to help men spread their genes far and wide. The strategy is different for a woman, who has to go through so much just to have a baby and then nurture it. She is genetically programmed to want just one man who will stick with her and help raise their children.

Surveys bear this out. In study after study and in country after country, men report more, often many more, sexual partners than women.

One survey, recently reported by the federal government, concluded that men had a median of seven female sex partners. Women had a median of four male sex partners. Another study, by British researchers, stated that men had 12.7 heterosexual partners in their lifetimes and women had 6.5.

But there is just one problem, mathematicians say. It is logically impossible for heterosexual men to have more partners on average than heterosexual women. Those survey results cannot be correct.
...

Jeff's response: MEDIANS??!!

Indeed, there's no reason the two distributions should have the same median. I gotta say, it's disappointing that the reporter talked to mathematicians rather than statisticians. (Next time, I'd recommend asking David Dunson for a quote on this sort of thing.) I'm also surprised that they considered that respondents might be lying but not that they might be using different definitions of sex partner. Finally, it's amusing that the Brits report more sex partners than Americans, contrary to stereotypes.
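A toy illustration of the median point (invented numbers): in a closed population with equal numbers of men and women, the totals, and hence the means, must match, but the medians can easily differ if one distribution is more skewed.

# Invented partner counts for 10 men and 10 women with the same total
men   <- c(0, 1, 2, 5, 7, 8, 9, 10, 14, 24)
women <- c(1, 2, 3, 3, 4, 4, 4,  5,  6, 48)

sum(men); sum(women)        # same total, so same mean
median(men); median(women)  # but medians of 7.5 vs 4: no contradiction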

Intuition, by Allegra Goodman

I read this novel, which is loosely based on various scientific fraud scandals from the 1980s. It was readable, sort of like John Updike in the general themes and similar to Scott Turow in writing style and characterization. (Everything fits into place a bit too cleanly, with each character given some small quirk, a sort of hyper-realism that is just a bit too reasonable to be quite convincing. But, as with Turow, this style actually helps in keeping the reader focused on the ideas of the story rather than on individual characters). Spoilers below . . .

Charlie Gibbons writes,

I will be a TA for intermediate micro this fall and am looking for a program to use to draw graphs for my handouts (eg, utility curves, budget sets, etc). I would like to be able to:

Draw the curves "freehand" (smooth curves based upon path points, but not by plotting, say, y = ln(x)).
Save in vector format (especially encapsulated PostScript so that it is compatible with LaTeX).

This sounds good to me. Does anyone know of any software out there that does this? My quick thought is that it's not actually so hard to write R code to make graphs that look right--I just play with the functional form a bit (using curve(), that convenient hack in R) until it looks how I want. But, yeah, it would be useful to make sketches and put them in documents. I'm sure something's available that does so.
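Here's a rough sketch of that approach in R, written to an encapsulated PostScript file so it drops into LaTeX; the functional forms and control points are placeholders to be fiddled with until the picture looks right:

# Sketch a micro-style figure and save it as EPS for inclusion in LaTeX
postscript("micro_figure.eps", width = 4, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
curve(6/x, from = 1, to = 6, xlim = c(0, 6), ylim = c(0, 6),
      xlab = "good 1", ylab = "good 2")   # an indifference-curve-ish shape; tweak to taste
abline(a = 6, b = -1)                     # a linear budget line
xspline(x = c(1, 2, 3.5, 6), y = c(6, 3, 2, 1.5),
        shape = -1, lty = 2)              # a "freehand" smooth curve through chosen points
dev.off()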

And now, the unsolicited advice

But . . . maybe my real question to Charlie should be: Should you really be making handouts at all? Teaching assistants always want to lecture and prepare handouts. Really, though, textbook writers are professionals, and they've put everything you need right in the book.

My advice is: take the time you were going to put into these handouts, and instead spend it supervising the students in active learning: mostly working in pairs or small groups on homework or homework-like problems. Or, if you really want to prepare some extra material, prepare some drills so you can do your best to make sure that all your students get the basics down.

Many faces

Juan-José Gibaja-Martíns sends in this link. The graphs aren't all how I would do them, but I like that people are taking graphics seriously.

Alban Z writes,

Bill Wilkerson writes:

I am a semi-regular reader of your blog and I teach our applied research course [Dept of Political Science, College at Oneonta]. I have been teaching for 16 years and am generally pretty good at it, but I still feel like I am new at the course although I have taught it four times. I just don't have a feel for what my students need to get out of the course. My colleagues are not quantitative and few of our students go on to grad school in political science or public policy. (Most that continue their formal educations go to law school or go for an MPA.) I know what they don't want: anything with numbers. This is true despite the fact that we require stats 101. Numbers seem to be part of the deal with a social science methods course. :-) And it is a comfort and knowledge with numbers that will set them apart from many of their colleagues in the work force.

Anyway, I am curious about your recent discussion of graphs versus tables. Should I be teaching my undergraduate students to graph this data? If so what tool should they use? I used STATA when writing my diss, but have settled on SPSS at Oneonta as that is what we have a license for. I have never used R in any serious way. I have even toyed with using Excel as that is what they are likely to use in an office setting. Thoughts?

My reply: That's a good question. I have the luxury of being associated with particular topics that our grad students want to learn: multilevel models, statistical graphics, Bayesian statistics, statistical consulting, sample surveys, decision analysis, and the teaching of statistics. I did teach intro stat to the undergrads for several years (which motivated our Teaching Statistics book), but I became so dissatisfied with what I was covering that I've taken a break from teaching it until I can redo the entire course--new textbook, new workbook for the students and T.A.'s, etc.

I think you might be in a slightly better position: at least your students are all from the political science department and have some common interests. I have no firm views on your software question. Stata is an excellent tool in social science research, but Dick De Veaux said he's had lots of success teaching using JMP-In, which is a version of SAS (fixed; thanks to the commenters). But, yeah, I think they should have to make graphs by hand if necessary. I hate, hate, hate the Excel graphs. More generally, when considering "what students need to get out of the course," maybe you could survey some alumni. This could be an excellent class project, and something the students could be motivated to do! It would also introduce them to some qualitative research techniques. One of my difficulties when setting up our Quantitative Methods in Social Sciences program at Columbia several years ago is that the students just wanted to download data from the internet rather than use their personal expertise (whatever it was, for each student) to learn something new. (It's sort of like that anecdote about the creative writing teacher who asks the students to write about what they know, and then gets lots of sub-Tarantino screenplays.)

P.S. That last remark is not meant to mock students, rather to indicate the challenge that anyone--student or practitioner--has in trying to connect coursework to real life.

I prefer dotplots to barplots

Masanao made this:

[Image: pointgrid.png]

and this:

[Image: bargrid.png]

for a paper I'm involved in. Unfortunately, neither one is going in the paper. In any case, I prefer the one with the dots.

I would only make a few little changes, mostly to make everything smaller (while keeping fonts readable), pulling the lines closer together and also writing the leftmost labels on two lines so they'll fit.

Oscar winners do not live longer

Modeling on the original or log scale

Shravan writes,

Here is a typical problem I keep running into. I'm analyzing eyetracking data of the sort you have already seen in the polarity paper. Specifically, I am analyzing re-reading times at a particular word as a function of some experimental conditions that I will call c1 and c2. I expect an effect of c1 and c2, and an interaction. I get it when I analyze on raw reading times (milliseconds) but get only the interaction when I analyze on the log RTs. The logs' residuals are normally distributed and the raw RTs' are not. I am inclined to trust the log RTs more because of the normal residuals (theory, however, is more in line with raw RT-based results). But reviewers keep insisting I analyze on untransformed (raw) reading times, and your book also advises the reader to ignore residuals.

My reply:

JudgeIt II

JudgeIt is a computer program for evaluating election returns and redistricting plans. Gary King and I wrote it around 1990 for the purpose of estimating seats-votes curves for congressional and state legislative elections, and at the same time adapted it to do other things such as district-by-district forecasting. Gary and others have used it many times since to evaluate actual redistricting plans (in actual court cases, or so I've heard). The method is described in detail in our 1994 AJPS paper--it's simple enough that we reprogrammed the basics from scratch in this paper on the 2006 House--but it's convenient to have it already programmed.

Anyway, Andrew Thomas and Gary translated Judgeit into R, and here it is! It's all open-source and so we're hoping people will improve it and criticize it as well as make it easier to use. The next step is to link it with a big database of elections. Also, we'd like to update our 1994 APSR paper to estimate the effects of the 1990 and 2000 redistrictings--people say that these were more nefarious, anti-competitive affairs than in the past, but my guess is that when we crunch the numbers, we'll find again that redistricting (usually) enhances democracy.

What is a taboo question?

This is some mix of political science and sociology, I'm not quite sure which...

From Greg Mankiw I saw this newspaper article by Steven Pinker, "In defense of dangerous ideas: In every age, taboo questions raise our blood pressure and threaten moral panic. But we cannot be afraid to answer them":

Do women, on average, have a different profile of aptitudes and emotions than men?

Were the events in the Bible fictitious -- not just the miracles, but those involving kings and empires?

Has the state of the environment improved in the last 50 years?

From British Psychological Society research digest, a fascinating article by Merrill Hiscock on the Flynn effect. Here's the abstract:

Evidence from several nations indicates that performance on mental ability tests is rising from one generation to the next, and that this "Flynn effect" has been operative for more than a century. No satisfactory explanation has been found. Nevertheless, the phenomenon has important implications for clinical utilization of IQ tests. This article summarizes the empirical basis of the Flynn effect, arguments about the nature of the skill that is increasing, and proposed explanations for the cause of the increase. Ramifications for clinical neuropsychology are discussed, and some of the broader implications for psychology and society are noted.

Among other things, Hiscock notes that Flynn and others have found the Flynn effect, and the related occasional re-norming of IQ scores, to cause jumps in the number of people classified as mentally retarded (conventionally, an IQ of 70, which is two standard deviations below the mean if the mean is scaled at 100). When they rescale the tests, the proportion of people labeled "retarded" jumps up. Seems like a natural experiment that might be a good opportunity to study effects of classifying people in this way on the margin. If the renorming is done differently in different states or countries, this would provide more opportunity for identifying treatment effects.
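The size of that jump is easy to see with a little normal-curve arithmetic; the 3-point drift below is just an illustrative number:

# Fraction of the population below the conventional IQ-70 cutoff (sd = 15):
pnorm(70, mean = 100, sd = 15)   # about 2.3%, right after a test is re-normed to mean 100
pnorm(70, mean = 103, sd = 15)   # about 1.4%, once raw performance has drifted up by 3 points
# Re-norming pushes the classified proportion back up from roughly 1.4% to 2.3%.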

See here for a discussion of Flynn's thought on why meritocracy is logically impossible (also here for related thoughts).

Relative prices of different liquids

OK, this is great (see below). What I want is a cleaner graph (a horizontal dotplot, please, instead of a *&^!@&*#@ 3-d color barplot), on the logarithmic scale (base 10, please), also including some other liquids of interest, such as fresh water (e.g., the total cost of maintaining the NYC water system divided by the total amount used each year), mercury, pig's blood, olive oil, Coca-Cola, epoxy, liquid nitrogen, etc etc.

[Image: BloodInk.jpg]

Tim Penn links to this paper by Christopher Zorn and Jeff Gill. Here's the abstract:

Since its introduction in 1973, major league baseball’s designated hitter (DH) rule has been the subject of continuing controversy. Here, we investigate the political and socio–demographic determinants of public opinion toward the DH rule, using data from a nationwide poll conducted during September 1997. Our findings suggest that it is in fact Democrats, not Republicans, who tend to favor the DH. In addition, we find no effect for respondents’ proximity to American or National League teams, though older respondents were consistently more likely to oppose the rule.

My first thought is: this is amusing but why is it in a top political science journal? But, reading the article, I realize that it indeed has more general implications. In particular, if we can make the assumption that causality only goes in one direction here--that a change in the view on the designated hitter will not affect one's political preferences--then this is a clean study, a way of estimating the coherence of political ideology into non-political areas.

Ranking is a trap

Ranks have lots of problems. They're statistically unstable (see the work of Tom Louis) and can mask nonlinearity. I was recently reminded of these patterns in seeing two sets of graphs reproduced by Kaiser:

The worst graph ever made?

This (found by Kaiser Fung from this horrible BBC site) is possibly the worst graph I've ever seen:

[Image: skillswise.png]

As Kaiser writes, "The use of patterns for shading is especially disconcerting. The graphic also lacks self-sufficiency as we have trouble comparing countries without referencing the underlying data." And commenter Chris Hibbert points out that "they seem to be using a pie chart to compare independent data points. The chart about GP consultations should be a bar chart, since the point is to compare the countries side-by-side rather than to emphasize that they make up a single whole (they don't.)" Actually, I'd prefer a dotplot (following Bill Cleveland's general advice), but that's just quibbling.

Bob Shapiro sent along this paper:

Does unipolarity per se free the United States to use force abroad and thus make war more likely? Hardly. As the United States is learning, an imprudent war can still be costly for a sole superpower. The price tag for the Iraq occupation alone has been projected to exceed a trillion dollars. This is true not only in fact, as they say, but even more important, in theory. According to rational bargaining theories of war, an increase in the power of a hegemonic state should in itself have no effect on the likelihood of war. As long as all actors share common information about the change in power, the states losing relative power should simply give proportionately more concessions in disputes.

If so, the main effect of unipolarity on the likelihood of war, if any, should come from its effects on domestic politics and ideology, which could cause the expectations of the opponents to diverge. Under unipolarity, the immediate, self-evident costs and risks of war are more likely to seem manageable, especially for a hegemonic power like the U.S. that commands more military capacity than the rest of the world combined. This does not necessarily make the use of force cheap or wise, but it means that the costs and risks of the use of force are comparatively indirect, long-term, and thus highly subject to interpretation. This interpretive leeway may open the door to domestic political impulses that lead the hegemon to overreach its capabilities. If opponents sense that the hegemon is overplaying a weak hand, this increases the chance that the hegemon will need to fight hard to try to get its way.

Electrosensitivity

A blogger writes,

Standard errors from lmer()

Jarrett Byrnes writes,

This is something that Suresh Venkatasubramanian did. I'm not quite sure what it's all about, but since I gave him some help on it, I thought I'd link to it. The thing that impresses me is that apparently the papers in these conferences have median citation counts of about 20. I think the median citation count of all my published papers is zero.

Matt Franklin sent in this improved version of this picture:

[Image: votemap.jpg]

Thanks, Matt!

Shravan writes,

Masanao: Aleks and I did a PCA analysis on 2008 Presidential Election Candidates on the Issues data and plotted the 2 principal components scores against each other and got this nice result:
[Image: President2008.png]
The horizontal axis is the 1st principal component score; it represents the degree to which a candidate supports Iraq War and Homeland Security (Guantanamo), and opposes Iraq War Withdrawal, Universal Healthcare, and Abortion Rights. The vertical axis is the 2nd principal component score, which represents the degree to which a candidate supports Iraq War Withdrawal and Energy & Oil (ANWR Drilling), and opposes Death Penalty, Iran Sanctions, and Iran Military Action as Option.

The first principal component is the dividing axis for the Democrats and the Republicans. When we reorder the loadings according to the 1st component we get the following:
[Image: President2008loadings2.png]
So for the first principal component, Republicans generally support red variables and the Democrats the blue colored variables. Ron Paul appears to be the only candidate that does not deviate much from the middle.

The second principal component is a little more difficult to interpret. Here most of the candidates are clustered around the middle except for candidate Ron Paul who supports Iraq withdrawal, Energy & Oil (ANWR Drilling), Immigration (Border Fence) but does not support other issues.
Here are the loadings ordered by the 2nd component:
[Image: President2008loadings1.png]

Aleks: With the exception of Paul, there is a lot of polarization on the first component. To some extent the polarization is a consequence of the data expressing candidates' opinions in terms of binary supports/opposes. When a candidate did not express an opinion, we have assumed that the opinion is unknown (so we use imputation), in contrast to a candidate refusing to take an opinion on an issue. When it comes to the issue of polarization: Delia Baldassarri and Andrew have suggested that it's the parties that are creating polarization, not the general public.

In fact, I think polarization is a runaway consequence of political wedging: in the spirit of Caesar's divide et impera, one party wants to insert a particular issue to split the opposing party. This gives rise to the endless debates on rights of homosexuals, biblical literalism, gun toting, weed smoking, stem cells and abortion rights: these debates are counter-productive (especially at the federal level), but the real federal-level problems of special-interest influence, level of interventionism, the economy, and health care get glossed over. It just saddens me that the candidates are classified primarily by a bunch of wedge issues. A politician needs a wedge issue just as much as a soldier needs a new gun: it's good for him, but once both sides come up with guns, the soldier loses. In the end, it's better for all politicians to get rid of wedge issues every now and then by refusing to take a stance on a wedge issue. In summary, it would be refreshing if the candidates jointly decided not to take positions on these runaway wedge issues, on which people will continue to disagree, and delegate them to the state level, while focusing on the important stuff.

Masanao: Although the candidates' opinions in the spreadsheet are probably not their final ones, it's interesting to see the current political environment. If there were similar data on the general public, it would be interesting to overlay the two on top of each other to see who is more representative of the public.
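For anyone who wants to play with this kind of thing, here is a minimal sketch of the analysis described above, using a made-up candidates-by-issues matrix coded +1 (supports) / -1 (opposes), with missing opinions mean-imputed before running prcomp():

# Hypothetical candidates-by-issues matrix: +1 = supports, -1 = opposes, NA = no stated opinion
issues <- matrix(c( 1,  1, -1, -1,
                   -1, NA,  1,  1,
                    1, -1, -1,  1,
                   -1, -1,  1, NA),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("Cand A", "Cand B", "Cand C", "Cand D"),
                                 c("IraqWar", "Guantanamo", "Withdrawal", "Healthcare")))

# Crude imputation: replace each missing opinion with that issue's mean position
for (j in seq_len(ncol(issues)))
  issues[is.na(issues[, j]), j] <- mean(issues[, j], na.rm = TRUE)

pca <- prcomp(issues, scale. = TRUE)
round(pca$rotation[, 1:2], 2)            # loadings: which issues define each component
plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(issues))   # candidate scores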

Details of methodology:

[Image: DecisionChainB.png]

Some browsers don't seem to be able to display this image very well for some reason, so you may need to try the PDF version.
This is a rather complicated graphic that relates an acceptable risk of contracting anthrax (upper left) to the number of samples you should take to check for anthrax in a building (lower right). The relationship between risk and airborne concentration of spores is very uncertain; the relationship between airborne and surface concentrations is highly variable, depending on activity levels and surface type and other factors; detection probabilities can vary quite a bit; and you might decide that you want to be 99% sure or 99.9% sure that, if the building is contaminated at an unacceptable level, you will obtain at least one 'positive' sample.
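The last link in that chain, from per-sample detection probability to number of samples, is simple enough to write down; a quick sketch with a made-up detection probability:

# If each sample (independently) comes back positive with probability p when the
# building is contaminated at the unacceptable level, the number of samples
# needed to get at least one positive with 99% or 99.9% assurance is:
p <- 0.2                                 # hypothetical per-sample detection probability
ceiling(log(1 - 0.99)  / log(1 - p))     # 21 samples for 99%
ceiling(log(1 - 0.999) / log(1 - p))     # 31 samples for 99.9%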

I like this graphic: I think that by playing around with it for a few minutes, you can understand how the various assumptions affect the outcome. I think it is more effective than, say, 8 different plots of "number of samples needed" versus "acceptable risk level", representing different combinations of assumptions for dose-response, resuspension, and detection probability. But a statistician friend says it's "too clever by half."

What do you think?

Shravan writes,

In his book, Fooled by Randomness, Taleb essentially rejects the notion that past results [knowledge] can increase information incrementally. For example, he says [approvingly] that Popper "refused to blindly accept the notion that knowledge can always increase with incremental information--which is the foundation of statistical inference" (p. 127). Isn't this the same thing as saying that informative priors are not really informative? Informative priors represent our previous knowledge--Taleb rejects that as a basis for predicting the future. I see his point regarding trading practices, but I wonder if his position would extend to any statistically driven inference in, say, experimental psychology. I would think not.

My reply: I think the point is that the model itself continually needs to be reassessed, and in good work it ends up getting revised at irregular intervals; see here.

The difference between ...

Bruce McCullough points out this blog entry from Eric Falkenstein:

Recently the Wall Street Journal has had several articles about estrogen's link to heart disease in women, highlighting a recent New England Journal of Medicine article showing that it lowers risk of arterial sclerosis. Then last week, the Journal did a story concentrating on how the Women's Health Initiative (WHI) misread the data by focusing on the increased heart attack risk for women over 70, while neglecting the lowered rate of heart attack for women under 60 (since the WHI's 2002 report arguing that estrogen therapy actually raised heart disease--opposite sign to previous findings--hormone sales have plummeted 30%). The WHI shot back in a letter to the WSJ, arguing they stand by their interpretation of the data, which they think is somewhat mixed, and in their words, the differences in heart disease between the older and younger (one up, one down!) is not 'statistically significant'. If the difference isn't statistically significant, I can't see how the old cohort can be thought to have a higher than average risk (eg, if the sample estimate for the old is +14%, for the young, -30%, if the difference is noise, the +14% is certainly noise). As Paul Feyerabend argued, there are no definitive tests in science, as people just ignore evidence that goes against them, emphasizing the consistent results.

I don't really have anything new to say about the Women's Health Initiative, but I did want to point this out since it's an interesting reminder about the difficulty of using statistical significance as a measure of effect size.

Just a couple of weeks ago I was meeting with some people who were doing a health study where effect A was positive and not statistically significant, effect B was negative and not stat signif., but the difference was stat. signif. They had another comparison in their study where A was positive and stat signif, B was negative and not signif, and the difference was not stat signif. They were struggling to figure out how to explain all these things. Rather than give some sort of "multiple comparisons correction" answer, I suggested the opposite: to graphically display all their comparisons of interest in a big grid, to get a better understanding of what their study said. Then they could go further and fit a model if they want.
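The underlying point, that the difference between "significant" and "not significant" is not itself statistically significant, comes down to a two-line calculation; the numbers here are hypothetical:

# Hypothetical estimates and standard errors for effects A and B
est_A <- 25; se_A <- 10    # z = 2.5: "statistically significant"
est_B <- 10; se_B <- 10    # z = 1.0: "not significant"

(est_A - est_B) / sqrt(se_A^2 + se_B^2)   # about 1.06, assuming independent estimates:
                                          # the difference between them is nowhere near significant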

Confidence building

Confidence-building is an under-researched area in statistics. Some pieces of confidence-building:

Gabor gets the grab

Actually a few weeks ago.

Jim Gibson sent me this paper. Here's the abstract:

Conventional political science wisdom holds that contemporary American politics is characterized by deep and profound partisan and ideological divisions. Unanswered is the question of whether those divisions have spilled over into threats to the legitimacy of American political institutions, such as the United States Supreme Court. Since the Court is often intimately involved in making policy in many issue areas that divide Americans—including the contested 2000 presidential election—it is reasonable to hypothesize that loyalty toward the institution depends upon policy and/or ideological agreement and partisanship. Using data stretching from 1987 through 2005, the analysis reveals that Court support among the American people has not declined. Nor is it connected to partisan and ideological identifications. Instead, support is embedded within a larger set of relatively stable democratic values. Institutional legitimacy may not be obdurate, but it does not seem to be caught up in the divisiveness that characterizes so much of American politics — at least not at present.

My comments:

Kanazawarama

Thomas Volscho writes:

David Weakliem mentioned your blog posting on one of Kanazawa's papers and its methodological shortcomings. I wrote a critique of one of his papers for The Sociological Quarterly and the editor gave him a chance to respond and allowed me to write a reply. He said he would respond but never did. The article appeared in The Sociological Quarterly in 2005.

Jeremy Freese has also had a couple of printed critiques of Kanazawa's research, appearing in the American Journal of Sociology and in Evolution and Human Behavior.

Copernican probability estimates

Benjamin Kay writes,

The article Survival Imperative for Space Colonization applies what they call Copernican probability estimates. Essentially, this is a way of getting a probability estimate of the life span being in an interval by assuming there is nothing special about its being alive today. Is this like choosing a very particular prior such that the posterior distribution is uniform, and essentially updating it with a single observation? I'm just wondering what this is exactly, and I'd be interested in seeing a Statistical Modeling, Causal Inference, and Social Science post on this subject if you were looking for something to write about.

Is this statistical inference, or really more of a parlor trick?
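(Background for other readers: the "Copernican" estimate in the linked article appears to be Gott's delta-t argument. A toy sketch, assuming only that the present moment is uniformly distributed over the total lifetime; the numbers below are invented, not from the article:)

# If the fraction f = (past lifetime) / (total lifetime) is uniform on (0, 1),
# then with probability p the remaining lifetime lies between
# past * (1 - p) / (1 + p) and past * (1 + p) / (1 - p).
copernican_interval <- function(past, p = 0.95) {
  c(lower = past * (1 - p) / (1 + p),
    upper = past * (1 + p) / (1 - p))
}
copernican_interval(past = 200000, p = 0.95)   # e.g., 200,000 years of past lifetime
# gives roughly 5,100 to 7.8 million years of future lifetime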

My reply: I think this is the same as the "doomsday argument," and I think it's wrong. To quote myself from a couple years ago:

Questions not to ask me

I got the following email:

Dick De Veaux gave a talk for us a few years ago, getting to some general points about statistics teaching by starting with the question, Why are there no six-year-old novelists? Statistics, like literature, benefits from some life experience. Dick writes,

We haven’t evolved to be statisticians. Our students who think statistics is an unnatural subject are right. This isn’t how humans think naturally. But it is how humans think rationally. And it is how scientists think. This is the way we must think if we are to make progress in understanding how the world works and, for that matter, how we ourselves work.

Here's the talk. I recommend it to everyone who teaches statistics.

Isolationists and internationalists

Matt Winters points us to this paper by Brian Rathbun. Matt writes: "I came across this article today that reminded me of last week's discussion in the playroom regarding isolationists and internationalists. I haven't read it, but he appears to use principal components analysis of survey questions to identify people as caring about community or hierarchy at the international level or else being isolationists, and then he looks at how people's attitudes affect their responses with regard to proposed actions in hypothetical scenarios."

tables2graphs.com

John Kastellec writes,

Eduardo Leoni and I have created a web site, located at http://tables2graphs.com, accompanying our paper, "Using Graphs Instead of Tables in Political Science," which is available here.

The site contains complete and annotated R code for all the graphs that appear in the paper. We hope that readers interested in turning tables into graphs can use this code to produce their own graphs in R.

We also would like your help. Because so many social scientists use Stata, we would also like to provide Stata code for creating each graph (if possible). Neither of us is fully versed in Stata graphics, however, and the site currently provides Stata code for only one of the graphs. If you have Stata code that we could apply to some of our graphs and don't mind sharing it with us, we would greatly appreciate it. (Our email addresses can be found on the site).

Regular (or even irregular) readers of this blog will be able to guess that I am supportive of this project.
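To give a flavor of the sort of display the paper recommends, here is a minimal sketch of my own (not code from the site) that turns a small, made-up regression table into a dot plot with plus-or-minus two standard-error bars:

# Made-up coefficient table, displayed as a dot plot instead of a table.
est <- c(Intercept = 0.12, Age = -0.35, Income = 0.48, Education = 0.20)
se  <- c(0.05, 0.10, 0.15, 0.08)

ord <- order(est)                                   # sort for easier reading
plot(est[ord], seq_along(est), pch = 19,
     xlim = range(est - 2 * se, est + 2 * se),
     yaxt = "n", xlab = "Coefficient estimate (+/- 2 se)", ylab = "")
segments(est[ord] - 2 * se[ord], seq_along(est),
         est[ord] + 2 * se[ord], seq_along(est))    # interval bars
abline(v = 0, lty = 2)                              # reference line at zero
axis(2, at = seq_along(est), labels = names(est)[ord], las = 1)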

Neal writes,

As I start your Bayesian stuff, can I ask you the same question I asked Boris a few years ago? Namely, as you note, noninformative priors simply represent the situation where we know very little and want the data to speak (so in the end not too far from the classical view). Can you point me to any social science (closer to political science is better) where people actually update, so that the prior in a second study is the posterior of the first (whether or not the two studies are done by the same person)?

Equivalently: point me to a study which uses non-inf priors (as more than a toy; I know the piece by Gill and his student).

Btw, do you know the old piece by Harry Roberts, saying that as a scientist all we can report is the likelihood, and that everyone should put in their own prior and then produce their own posterior? So all articles would just be a computer program which takes as input my prior and produces my posterior, given the likelihood surface estimated by the author?

My reply: These days I like weakly informative priors, but that's new since our books. Regarding informative priors in applied research, we can distinguish three categories:

(1) Prior distributions giving numerical information that is crucial to estimation of the model. This would be a traditional informative prior, which might come from a literature review or explicitly from an earlier data analysis.

(2) Prior distributions that are not supplying any controversial information but are strong enough to pull the data away from inappropriate inferences that are consistent with the likelihood. This might be called a weakly informative prior.

(3) Prior distributions that are uniform, or nearly so, and basically allow the information from the likelihood to be interpreted probabilistically. These are noninformative priors, or maybe, in some cases, weakly informative.

I have examples of (1), (2), and (3) in my own applied research. Category (3) is the most common for me, but an example of (2) is my 1990 paper with King on seats-votes curves, where we fit a mixture model and used an informative prior to constrain the locations, scales, and masses of the three components. An example of (1) is my 1996 paper with Bois and Jiang, where we used informative prior distributions for several parameters in a toxicology model. We were careful to parameterize the model so that these priors made sense, and the model also had an interesting two-level structure, which we discuss in that paper and also in Section 9.1 of Bayesian Data Analysis.
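To make category (2) concrete, here is a minimal sketch using bayesglm() in the arm package. The data are made up to exhibit complete separation, where the maximum likelihood estimate is infinite but a Cauchy prior with scale 2.5 keeps the coefficient at a reasonable value:

library(arm)

# Made-up data with complete separation: x perfectly predicts y.
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
x <- c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)

fit_mle   <- glm(y ~ x, family = binomial)               # coefficient blows up, with warnings
fit_bayes <- bayesglm(y ~ x, family = binomial,
                      prior.scale = 2.5, prior.df = 1)   # independent Cauchy(0, 2.5) priors
display(fit_bayes)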

Regarding your question about models where people actually update: we did this in our radon analysis (see here), where the posterior distribution from a national data analysis (based on data from over 80,000 houses) gives an inference for each county in the U.S. That county-level inference is in turn used as the prior distribution for the radon level in your house, which can then be updated if you have a measurement from your own home.
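As a toy version of that last updating step (the numbers here are invented; the real analysis is hierarchical and works on the log scale), the county posterior acts as the prior for a house, and a single measurement updates it:

# Toy conjugate-normal update on the log-radon scale (numbers invented).
prior_mean <- log(1.2)   # county-level geometric mean, say 1.2 pCi/L
prior_sd   <- 0.4        # county uncertainty plus between-house variation (log scale)
meas       <- log(3.0)   # one measurement in your house
meas_sd    <- 0.25       # measurement error on the log scale

post_prec <- 1 / prior_sd^2 + 1 / meas_sd^2
post_mean <- (prior_mean / prior_sd^2 + meas / meas_sd^2) / post_prec
post_sd   <- sqrt(1 / post_prec)

exp(post_mean + c(-2, 2) * post_sd)   # rough 95% interval for your home's radon level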

One of the convenient things about doing applied statistics is that eventually I can come up with an example for everything from my own experience. (This also makes it fun to write books.)

Regarding your last comment: yes, there is an idea that a Bayesian wants everyone else to be non-Bayesian so that he or she can do cleaner analyses. I discuss that idea in this talk from 2003 which I've been too lazy to write up as a paper.
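Roberts's proposal is also easy to mock up. Here is a toy grid version (the binomial example and the Beta(2,2) prior are mine, just to show the mechanics):

# The "article" reports a likelihood (here, 7 successes in 20 binomial trials),
# and each reader plugs in their own prior and turns the crank on a grid.
theta      <- seq(0.001, 0.999, by = 0.001)
likelihood <- dbinom(7, size = 20, prob = theta)   # reported by the author

my_prior  <- dbeta(theta, 2, 2)                    # the reader's prior (any prior will do)
posterior <- my_prior * likelihood
posterior <- posterior / sum(posterior * 0.001)    # normalize on the grid

plot(theta, posterior, type = "l", xlab = "theta", ylab = "posterior density")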

To be honest, I have no idea what this stuff is all about, but it certainly seems important and worth a few dozen Ph.D. theses. Any students around here who can figure this all out and want to explain it to me, I'd be glad to be your advisor, or be on your committee, or whatever. I can't figure out what's going on with this one-pixel camera thing but it certainly looks cool.

[Image: cscam.gif]

P.S. Apparently David Dunson's on the job. Cool.

Michael Sobel sent me this paper which will appear in the Journal of Educational and Behavioral Statistics. It's about mediation: a crucial issue in causal inference and a difficult issue to think about. The usual rhetorical options here are:

- Blithe acceptance of structural equation models (of the form, "we ran the analysis and found that A mediates the effects of X on Y")

- Blanket dismissal (of the form, "estimating mediation requires uncheckable assumptions, so we won't do it")

- Claims of technological wizardry (of the form, "with our new method you can estimate mediation from observational data")

For example, in our book, Jennifer and I illustrate that regression estimates of mediation make strong assumptions, and we vaguely suggest that something better might come along. We don't provide any solutions or even much guidance.
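For reference, here is the regression calculation that the first option above takes at face value (simulated data; the point is that interpreting the product a*b as a causally mediated effect requires strong, uncheckable assumptions about how the mediator is assigned):

# Standard "product of coefficients" mediation estimate on simulated data.
set.seed(1)
n <- 500
x <- rnorm(n)                        # treatment
m <- 0.5 * x + rnorm(n)              # mediator
y <- 0.3 * m + 0.2 * x + rnorm(n)    # outcome

a <- unname(coef(lm(m ~ x))["x"])    # treatment -> mediator
fit_y <- lm(y ~ m + x)
b <- unname(coef(fit_y)["m"])        # mediator -> outcome, holding treatment fixed
c(indirect = a * b, direct = unname(coef(fit_y)["x"]))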

Michael has thought hard about these problems for a long time. (For example, see here and here, or for some laffs, here.) Michael's also notorious for pointing out that the phrase "causal effect" is redundant: all effects are causal. Anyway, I was interested to see what he has to say about mediation. Here's the abstract of the paper:

Two different people (Christopher Mann and Jeff Lax) pointed me to this graph in the Wall Street Journal that features a goofy regression line. My expertise on taxes and economic growth is zero, and the statistical problems with the regression line are apparent, so I don't really have anything to say here. Hey, if all roads go through Rome, it's only fair that all lines go through Norway.

But, to get serious for a minute . . . Setting aside the concerns with the regression line or with measurement issues in defining the variables being graphed, it's an interesting reminder of the duality between descriptive vs. causal inference and aggregate vs. individual-level analysis (or, as would be said in psychology, between-subject vs. within-subject analysis). I'm not criticizing the use of graphs such as these (or corresponding regression models) that use between-country comparisons to make implicit causal inferences about policies--it's just helpful to remember the assumptions needed to draw these conclusions.

Calibration in chess

Daniel Kahneman posted the following on the Judgment and Decision Making site:

Have there been studies of the calibration of expert players in judgments of chess situations -- e.g., probability that white will win?

In terms of the amount and quality of experience and feedback, chess players are at least as privileged as weather forecasters and racetrack bettors -- but they don't have the experience of expressing their judgments in probabilities. I [Kahneman] am guessing that the distinction between a game that is "certainly lost" and one that is "probably lost" is one that very good players can make reliably, but I know of no evidence.

[Image: chess.jpg]

Despite knowing much less about decision making and (likely) less about chess than Kahneman, I have three conjectures:

Matt Wand asks,

I'm wondering if you have any pointers to putting priors on shape parameters (e.g. the positive one that extends the Poisson to negative binomial)? I've already started taking advice for variance components from you.

My thoughts:

As implied by Matt's question, shape parameters are closely connected to hierarchical variance parameters. For example, the negative binomial shape parameter can be mapped to the coefficient of variation of a latent gamma distribution for the underlying probabilities. Thus, one approach would be to put a prior distribution on this coefficient of variation parameter--which is very close to a hierarchical sd--and then transform that back to an implication for the negative binomial shape parameter.
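As a quick check of that mapping (simulation only; the parameter names here are mine):

# Gamma-Poisson mixture view of the negative binomial.
set.seed(1)
mu <- 5; a <- 2                                    # mean and shape of the negative binomial
lambda <- rgamma(1e5, shape = a, rate = a / mu)    # latent gamma rates, cv = 1/sqrt(a)
y <- rpois(1e5, lambda)                            # marginally negative binomial (size = a, mean = mu)

c(sim_var = var(y), theory_var = mu + mu^2 / a)              # overdispersion check
c(sim_cv = sd(lambda) / mean(lambda), theory_cv = 1 / sqrt(a))
# So a prior on the coefficient of variation (e.g., a half-Cauchy, as for
# hierarchical sd's) implies a prior on the shape parameter a = 1 / cv^2.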

In our paper on social networks, we used uniform prior distributions on the negative binomial overdispersion parameters, and I think we could've done better, in this case using a hierarchical model on these parameters. (We had 32 of them in our example.) This is the approach recommended in Section 6 of our paper.

P.S. See Dana's suggestion in comments: it's a model that seems to allow the distribution to be underdispersed, which is slightly different from (and maybe better than) my model. Dana suggests an exponential prior distribution for the coefficient of variation parameter, which seems like it could work, but I have a sentimental attachment to the half-Cauchy because it is flat near zero and allows for occasional large values.

David Kane writes,

You posted before on the Burnham et al (2006) study on Iraqi mortality. I [Kane] have an R package with some preliminary analysis and comments.

My question concerns the confidence intervals reported in the prior study, Roberts et al (2004), by many of the same authors.

Jonah Piovia-Scott writes,

Goals and plans in decision making

For years, Dave Krantz has been telling me about his goal-based model of decision analysis. It's always made much more sense to me than the usual framework of decision trees and utility theory (which, I agree with Dave, is not salvaged by bandaids such as nonlinear utilities and prospect theory). But, much as I love Dave's theory, or proto-theory, I always get confused when I try to explain it to others (or to myself): "it's, uh, something about defining decisions based on goals, rather than starting with the decision options, uh, ...." So I was thrilled to find that Dave and Howard Kunreuther just published an article describing the theory. Here's the abstract:

We propose a constructed-choice model for general decision making. The model departs from utility theory and prospect theory in its treatment of multiple goals and it suggests several different ways in which context can affect choice.

It is particularly instructive to apply this model to protective decisions, which are often puzzling. Among other anomalies, people insure against non-catastrophic events, underinsure against catastrophic risks, and allow extraneous factors to influence insurance purchases and other protective decisions. Neither expected-utility theory nor prospect theory can explain these anomalies satisfactorily. To apply this model to the above anomalies, we consider many different insurance-related goals, organized in a taxonomy, and we consider the effects of context on goals, resources, plans and decision rules.

The paper concludes by suggesting some prescriptions for improving individual decision making with respect to protective measures.

Going to their paper, Table 1 shows the classical decision-analysis framework, and Table 2 shows the new model, which I agree is better. I want to try to apply it to our problem of digging low-arsenic wells for drinking water in Bangladesh.

Is vs. should

I have a couple of qualms about Dave's approach, though, which involve distinguishing between descriptive and normative concerns. This comes up in all models of decision making: on one hand, you can't tell people what to do (at best, you can point out inconsistencies in their decisions or preferences), but on the other hand these theories are supposed to provide guidance, not just descriptions of our flawed processes.

Anyway, I'm not so thrilled with goals such as the one in Krantz and Kunreuther's Table 5, "avoid regretting a modest loss." The whole business of including "regret" in a decision model has always seemed to me too clever by half, especially given all the recent research on the difficulties of anticipating future regret. I'd rather focus on more stably measurable outcomes.

Also, Figure 4 is a bit scary to me. All those words in different sizes! It looks like one of those "outsider art" things:

[Image: krantzmap.png]

In all seriousness, though, I think this paper is great. The only model of decision making I've seen that has the potential to make sense.

Need a better name

But I wish they wouldn't call their model "Aristotelian." As a former physics student, I don't have much respect for Aristotle, who seems to have gotten just about everything wrong. Can't they come up with a Galilean model?

Tom Moertel writes,

I wanted to let you know that I have packaged the CRAN "arm" package and its prerequisite R packages for Fedora Linux, making it easy to install using Fedora's integrated package-management system. Normally, installing "arm" is somewhat tricky because of the dependency on R2WinBUGS, which requires BRugs, which doesn't currently build on Linux. (I got around this problem by making R2WinBUGS not require BRugs, which is the best work-around I could come up with. The resulting functionality is therefore incomplete, but at least Linux users get the functionality that is possible on their platform rather than a failed installation.)

I don't know how many of your students or readers use Fedora Linux, but if any do, feel free to point them to my packages: http://blog.moertel.com/articles/2007/04/25/new-fedora-core-rpms-for-cran-packages

Just a warning: we've been updating "arm" occasionally, mostly improvements to bayesglm() and fixing the calls to lmer().
