I had a couple of email exchanges with Jan-Emmanuel De Neve and James Fowler, two of the authors of the article on the gene associated with life satisfaction that we blogged about the other day. (Bruno Frey, the third author of the article in question, is out of town according to his email.) Fowler also commented directly on the blog.
I won't go through all the details, but now I have a better sense of what's going on. (Thanks, Jan and James!) Here's my current understanding:
1. The original manuscript was divided into two parts: an article by De Neve alone published in the Journal of Human Genetics, and an article by De Neve, Fowler, Frey, and Nicholas Christakis submitted to Econometrica. The latter paper repeats the analysis from the Adolescent Health survey and also replicates with data from the Framingham heart study (hence Christakis's involvement).
The Framingham study measures a slightly different gene and uses a slightly different life-satisfaction question compared to the Adolescent Health survey, but De Neve et al. argue that they're close enough for the study to be considered a replication. I haven't tried to evaluate this particular claim, but it seems plausible enough. They find an association with a p-value of exactly 0.05. That was close! (For some reason they don't control for ethnicity in their Framingham analysis; maybe that would pull the p-value to 0.051 or something like that?)
2. Their gene is correlated with life satisfaction in their data and the correlation is statistically significant. The key to getting statistical significance is to treat life satisfaction as a continuous response rather than to pull out the highest category and call it a binary variable. I have no problem with their choice; in general I prefer to treat ordered survey responses as continuous rather than discarding information by combining categories.
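To see why keeping the full scale can help, here is a small Python sketch with made-up response distributions for two genotype groups (the probabilities are hypothetical, not from the paper). Each comparison is summarized as a standardized difference; with equal group sizes the z-statistic is proportional to it, so the sample size needed for a given z scales with one over its square. How much dichotomizing costs depends on how the shift is spread across categories; this is just one plausible pattern.

```python
# Hypothetical response distributions over the 1-5 life-satisfaction scale.
# These probabilities are invented for illustration; they are not from the paper.
SCALE = [1, 2, 3, 4, 5]
p_zero_alleles = [0.05, 0.10, 0.25, 0.35, 0.25]
p_two_alleles = [0.04, 0.08, 0.23, 0.37, 0.28]


def mean_and_var(probs):
    """Mean and variance of the scale score under a categorical distribution."""
    mean = sum(p * y for p, y in zip(probs, SCALE))
    var = sum(p * (y - mean) ** 2 for p, y in zip(probs, SCALE))
    return mean, var


def standardized_gap_continuous(pa, pb):
    """Mean difference divided by the pooled SD, treating 1-5 as continuous."""
    mean_a, var_a = mean_and_var(pa)
    mean_b, var_b = mean_and_var(pb)
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    return (mean_b - mean_a) / pooled_sd


def standardized_gap_top_category(pa, pb):
    """Difference in P(top category) divided by the pooled Bernoulli SD."""
    top_a, top_b = pa[-1], pb[-1]
    pooled = (top_a + top_b) / 2
    return (top_b - top_a) / (pooled * (1 - pooled)) ** 0.5


mean_a, _ = mean_and_var(p_zero_alleles)
mean_b, _ = mean_and_var(p_two_alleles)
d_cont = standardized_gap_continuous(p_zero_alleles, p_two_alleles)
d_bin = standardized_gap_top_category(p_zero_alleles, p_two_alleles)

print(f"mean difference on the 1-5 scale: {mean_b - mean_a:.2f}")
print(f"standardized gap, continuous: {d_cont:.3f}, top category only: {d_bin:.3f}")
# Sample size needed for the same z-statistic scales with 1 / d**2.
print(f"extra sample needed by the binary version: {(d_cont / d_bin) ** 2:.2f}x")
```

With these made-up numbers the mean shift is about 0.12 points, and throwing away everything but the top category needs roughly two and a half times the sample to reach the same z-statistic.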
3. But given their choice of a continuous measure, I think it would be better for the researchers to stick with it and present results as points on the 1-5 scale. From their main regression analysis on the Adolescent Health data, they estimate the effect of having two (compared to zero) "good" alleles as 0.12 (+/- 0.05) on a 1-5 scale. That's what I think they should report, rather than trying to use simulation to wrestle this into a claim about the probability of describing oneself as "very satisfied."
They claim that having the two alleles increases the probability of describing oneself as "very satisfied" by 17%. That's not 17 percentage points, it's 17%, thus increasing the probability from 41% to 1.17*41% = 48%. This isn't quite the 46% that's in the data but I suppose the extra 2% comes from the regression adjustment. Still, I don't see this as so helpful. I think they'd be better off simply describing the estimated improvement as 0.1 on a 1-5 scale. If you really really want to describe the result for a particular category, I prefer percentage points rather than percentages.
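The gap between a relative change and an absolute one is easy to check. A short Python sketch using the 41% baseline and the 17% relative increase quoted above:

```python
baseline = 0.41           # probability of "very satisfied", as quoted above
relative_increase = 0.17  # the reported "17%" (a relative change)

new_prob = baseline * (1 + relative_increase)
point_change = (new_prob - baseline) * 100  # the same change in percentage points

print(f"{baseline:.0%} -> {new_prob:.0%}")
print(f"relative change: {relative_increase:.0%}; "
      f"absolute change: {point_change:.1f} percentage points")
```

A 17% relative increase on a 41% baseline is about 7 percentage points, which is the number a reader comparing two groups actually needs.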
4. Another advantage of describing the result as 0.1 on a 1-5 scale is that it is more consistent with intuitive notions of 1% of variance explained. It's good they have this 1% in their article; I should present such R-squared summaries in my own work, to give a perspective on the sizes of the effects that I find.
5. I suspect the estimated effect of 0.1 is an overestimate. I say this for the usual reason, discussed often on this blog, that statistically significant findings, by their very nature, tend to be overestimates. I've sometimes called this the statistical significance filter, although "hurdle" might be a more appropriate term.
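A quick simulation shows the filter at work. This Python sketch assumes a hypothetical small true effect measured with a standard error of the same size (neither number comes from the paper), draws many replicate estimates, and keeps only those that are statistically significant in the expected direction.

```python
import random
import statistics

# Hypothetical setup: a small true effect measured with a standard error of the
# same size, as in a noisy study. Neither number comes from the paper.
TRUE_EFFECT = 0.05
STD_ERROR = 0.05
N_STUDIES = 100_000

rng = random.Random(2010)
estimates = [rng.gauss(TRUE_EFFECT, STD_ERROR) for _ in range(N_STUDIES)]

# Keep only the "publishable" estimates: significant and in the expected direction.
significant = [est for est in estimates if est / STD_ERROR > 1.96]

power = len(significant) / N_STUDIES
exaggeration = statistics.mean(significant) / TRUE_EFFECT
print(f"share of studies reaching significance: {power:.2f}")
print(f"average significant estimate is {exaggeration:.1f}x the true effect")
```

Under these assumptions only about one replication in six clears the bar, and the ones that do overstate the true effect by a factor of roughly two to three.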
6. Along with the 17% number comes a claim that having one allele gives an 8% increase. 8% is half of 17% (subject to rounding) and, indeed, their estimate for the one-allele case comes from their fitted linear model. That's fine--but the data aren't really informative about the one-allele case! I mean, sure, the data are perfectly consistent with the linear model, but the nature of leverage is such that you really don't get a good estimate on the curvature of the dose-response function. (See my 2000 Biostatistics paper for a general review of this point.) The one-allele estimate is entirely model-based. It's fine, but I'd much prefer simply giving the two-allele estimate and then saying that the data are consistent with a linear model, rather than presenting the one-allele estimate as a separate number.
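To see why the one-allele number is model-based, here is a Python sketch with hypothetical group sizes and mean scores by allele count (made up, not from the paper), where the one-allele group is deliberately placed off the straight line. A weighted least-squares line through the group means gives the same line as individual-level regression of the score on allele count.

```python
# Hypothetical group sizes and mean scores by number of "good" alleles (0, 1, 2).
# The middle group is deliberately placed off the straight line.
alleles = [0, 1, 2]
group_size = [600, 1200, 700]
group_mean = [3.60, 3.70, 3.72]


def weighted_linear_fit(x, y, w):
    """Least-squares line through group means weighted by group size; this
    matches individual-level OLS of the score on allele count."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = weighted_linear_fit(alleles, group_mean, group_size)
fitted = [intercept + slope * k for k in alleles]
print("observed means:   ", group_mean)
print("linear-model fits:", [round(f, 3) for f in fitted])
# The one-allele fit is always the midpoint of the zero- and two-allele fits,
# no matter where the observed one-allele mean falls.
```

Whatever the observed one-allele mean, the linear model reports the one-allele effect as exactly half the two-allele effect; the data only enter through the slope.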
7. The news reports were indeed horribly exaggerated. No fault of the authors, but still something to worry about. The Independent's article was titled "Discovered: the genetic secret of a happy life," and the Telegraph's was not much better: "A 'happiness gene' which has a strong influence on how satisfied people are with their lives, has been discovered." An effect of 0.1 on a 1-5 scale: an influence, sure, but a "strong" influence?
8. There was some confusion with conditional probabilities that made its way into the reports as well. From the Telegraph:
The results showed that a much higher proportion of those with the efficient (long-long) version of the gene were either very satisfied (35 per cent) or satisfied (34 per cent) with their life - compared to 19 per cent in both categories for those with the less efficient (short-short) form.
After looking at the articles carefully and having an email exchange with De Neve, I can assure you that the above quote is indeed wrong, which is really too bad because it was an attempted correction of an earlier mistake. The correct numbers are not 35, 34, 19, 19. Rather, they are 41, 46, 37, 44. A much less dramatic difference: changes of 4% and 2% rather than 16% and 15%. The Telegraph reporter was giving P(gene|happiness) rather than P(happiness|gene). What seems to have happened is that he misread Figure 2 in the Human Genetics paper. He then may have got stuck on the wrong track by expecting to see a difference of 17%.
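The mix-up between P(gene|happiness) and P(happiness|gene) comes down to dividing by the wrong total. This Python sketch uses a table of entirely hypothetical counts (not the study's data) and computes both conditionals from it: row proportions for P(answer | genotype) and column proportions for P(genotype | answer).

```python
# Hypothetical counts (NOT the study's data): rows are genotypes,
# columns are answers to the life-satisfaction question.
counts = {
    "short-short": {"very satisfied": 350, "satisfied": 450, "other": 200},
    "short-long": {"very satisfied": 780, "satisfied": 900, "other": 320},
    "long-long": {"very satisfied": 420, "satisfied": 460, "other": 120},
}


def p_answer_given_genotype(answer, genotype):
    """P(answer | genotype): divide by the genotype's row total."""
    row = counts[genotype]
    return row[answer] / sum(row.values())


def p_genotype_given_answer(genotype, answer):
    """P(genotype | answer): divide by the answer's column total."""
    column_total = sum(row[answer] for row in counts.values())
    return counts[genotype][answer] / column_total


for answer in ("very satisfied", "satisfied"):
    for genotype in ("long-long", "short-short"):
        print(
            f"P({answer} | {genotype}) = {p_answer_given_genotype(answer, genotype):.2f}   "
            f"P({genotype} | {answer}) = {p_genotype_given_answer(genotype, answer):.2f}"
        )
```

The column proportions are dragged down by the large heterozygous group, so they look nothing like the row proportions even though both come from the same table.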
9. The abstract for the Human Genetics paper reports a p-value of 0.01. But the baseline model (Model 1 in Table V of the Econometrica paper) reports a p-value of 0.02. The lower p-values are obtained by models that control for a big pile of intermediate outcomes.
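For a sense of where p-values like these come from, here is a Python sketch that turns an estimate and its standard error into a z-score and a two-sided p-value under a normal approximation. It reuses the 0.12 (+/- 0.05) estimate from the Adolescent Health regression and assumes the 0.05 is one standard error; it is a back-of-the-envelope check, not a reproduction of any model in either paper.

```python
import math

# Assumes the "+/- 0.05" on the 0.12 estimate is one standard error.
estimate = 0.12
std_error = 0.05

z = estimate / std_error
# Two-sided p-value under a normal approximation: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))
low, high = estimate - 1.96 * std_error, estimate + 1.96 * std_error

print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
print(f"approximate 95% interval on the 1-5 scale: [{low:.2f}, {high:.2f}]")
```

A z-score of 2.4 corresponds to p of about 0.016, and the interval runs from barely above zero to about 0.2 points, which is another way of seeing how loosely the data pin down the effect.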
10. In section 3 of the Econometrica paper, they compare identical to fraternal twins (from the Adolescent Health survey, it appears) and estimate that 33% of the variation in reported life satisfaction is explained by genes. As they say, this is roughly consistent with estimates of 50% or so from the literature. I bet their 33% has a big standard error, though: one clue is that the difference in correlations between identical and fraternal twins is barely statistically significant (at the 0.03 level, or, as they quaintly put it, 0.032). They also estimate 0% of the variation to be due to common environment, but again that 0% is gonna be a point estimate with a huge standard error.
I'm not saying that their twin analysis is wrong. To me the point of these estimates is to show that the Adolescent Health data are consistent with the literature on genes and happiness, thus supporting the decision to move on with the rest of their study. I don't take their point estimates of 33% and 0% seriously but it's good to know that the twin results go in the expected direction.
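The usual back-of-the-envelope version of a twin analysis is Falconer's formula: genes account for about 2(r_MZ - r_DZ) of the variance and common environment for about 2r_DZ - r_MZ. The papers may well have fit a formal ACE model instead; Falconer's approximation is used here only as a simpler stand-in. The correlations and pair counts in this Python sketch are hypothetical, chosen so the point estimates come out at 33% and 0%, and the standard error for each correlation is a rough large-sample approximation.

```python
import math

# Hypothetical twin correlations and pair counts, chosen only so that the
# point estimates come out at 33% and 0%; they are not taken from the paper.
r_mz, n_mz = 0.33, 300    # identical (monozygotic) twin pairs
r_dz, n_dz = 0.165, 300   # fraternal (dizygotic) twin pairs


def corr_se(r, n):
    """Rough large-sample standard error of a correlation coefficient."""
    return (1 - r ** 2) / math.sqrt(n)


# Falconer's approximations to the ACE decomposition.
h2 = 2 * (r_mz - r_dz)   # additive genetic share ("A")
c2 = 2 * r_dz - r_mz     # common-environment share ("C")

# Propagate the two independent sampling errors through the linear formulas.
se_h2 = 2 * math.hypot(corr_se(r_mz, n_mz), corr_se(r_dz, n_dz))
se_c2 = math.hypot(corr_se(r_mz, n_mz), 2 * corr_se(r_dz, n_dz))

print(f"genes: {h2:.2f} +/- {se_h2:.2f}")
print(f"common environment: {c2:.2f} +/- {se_c2:.2f}")
```

Because each share is a difference of two noisy correlations multiplied by 2, the standard errors come out around 0.12 to 0.15 even with a few hundred pairs in each group.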
11. One thing that puzzles me is why De Neve et al. only studied one gene. I understand that this is the gene that they expected to relate to happiness and life satisfaction, but . . . given that it only explains 1% of the variation, there must be hundreds or thousands of genes involved. Why not look at lots and lots? At the very least, the distribution of estimates over a large sample of genes would give some sense of the variation that might be expected. I can't see the point of looking at just one gene, unless cost is a concern. Are other gene variants already recorded for the Adolescent Health and Framingham participants?
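Here is a rough idea of what a distribution of estimates over many genes would show even when nothing is going on. This Python sketch simulates a hypothetical scan of variants with no true effect, each estimated with the 0.05 standard error quoted for the single gene. Real variants are correlated with each other and not all null, so this is only a baseline for comparison.

```python
import random

# Hypothetical genome scan: many variants with NO true effect, each estimated
# with a standard error of 0.05 on the 1-5 scale.
N_VARIANTS = 10_000
STD_ERROR = 0.05

rng = random.Random(11)
null_estimates = [rng.gauss(0.0, STD_ERROR) for _ in range(N_VARIANTS)]
hits = [est for est in null_estimates if abs(est) / STD_ERROR > 1.96]

print(f"{len(hits)} of {N_VARIANTS} null variants reach p < 0.05")
print(f"largest null estimate: {max(null_estimates, key=abs):+.3f} on the 1-5 scale")
```

About 5% of purely null variants clear p < 0.05, and the largest null estimates exceed 0.12 points, which is why a reference distribution from many genes would help in judging a single one.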
12. My struggles (and the news reporters' larger struggles) with the numbers in these articles make me feel, even more strongly than before, the need for a suite of statistical methods for building from simple comparisons to more complicated regressions. (In case you're reading this, Bob and Matt3, I'm talking about the network of models.)
As researchers, we should make transparency our goal. This is sometimes hindered by scientific journals' policies of brevity: you can end up having to remove lots of the details that make a result understandable.
13. De Neve concludes the Human Genetics article as follows:
There is no single 'happiness gene.' Instead, there is likely to be a set of genes whose expression, in combination with environmental factors, influences subjective well-being.
I would go even further. Accepting their claim that between one-third and one-half of the variation in happiness and life satisfaction is determined by genes, and accepting their estimate that this one gene explains as much as 1% of the variation, and considering that this gene was their #1 candidate (or at least a top contender) for the "happiness gene" . . . my guess is that the set of genes that influence subjective well-being is a very large number indeed! The above disclaimer doesn't seem disclaimery-enough to me, in that it seems to leave open the possibility that this "set of genes" might be just three or four. Hundreds or thousands seems more like it.
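The arithmetic behind "hundreds or thousands" starts from a simple lower bound. This Python sketch assumes, for illustration, that each gene's share of variance simply adds up (no interactions or overlap), and takes the heritable share of one-third to one-half and the 1% for the top candidate from the figures above.

```python
import math
from fractions import Fraction

# Shares of variance in life satisfaction, from the figures quoted above.
# Assumes gene contributions add up with no overlap (a simplification).
top_gene_share = Fraction(1, 100)                    # the one candidate gene
heritable_shares = [Fraction(1, 3), Fraction(1, 2)]  # "one-third to one-half"

# Even if every contributing gene were as strong as the top candidate,
# this many would be needed; weaker genes push the count higher.
minimum_genes = {h2: math.ceil(h2 / top_gene_share) for h2 in heritable_shares}

for h2, n_genes in minimum_genes.items():
    print(f"heritable share {float(h2):.0%}: at least {n_genes} genes")
```

Even the most generous case needs 34 to 50 genes each as strong as the best candidate; since most genes will be weaker than the top contender, the realistic count is far larger.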
I'm reminded of the recent analysis that found that the simple approach of predicting a child's height using a regression model given parents' average height performs much better than a method based on combining 54 genes.
14. Again, I'm not trying to present this as any sort of debunking, merely trying to fit these claims in with the rest of my understanding. I think it's great when social scientists and public health researchers can work together on this sort of study. I'm sure that in a couple of decades we'll have a much better understanding of genes and subjective well-being, but you have to start somewhere. This is a clean study that can be the basis for future research.
Hmmm . . . could I publish this as a letter in the Journal of Human Genetics? Probably not, unfortunately.
P.S. You could do this all yourself! This and my earlier blog on the happiness gene study required no special knowledge of subject matter or statistics. All I did was tenaciously follow the numbers and pull and pull until I could see where all the claims were coming from. A statistics student, or even a journalist with a few spare hours, could do just as well. (Why I had a few spare hours to do this is another question. The higher procrastination, I call it.) I probably could've done better with some prior knowledge--I know next to nothing about genetics and not much about happiness surveys either--but I could get pretty far just tracking down the statistics (and, as noted, without any goal of debunking or any need to make a grand statement).
P.P.S. See comments for further background from De Neve and Fowler!