“Genomics” vs. genetics

John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write:

54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . .

In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . .

The message is that the simple approach of predicting a child’s height with a regression on the parents’ average height performs much better than their method based on combining 54 genes.

They also find that, if you start with the prediction based on parents’ heights and then throw the genetic profile information into the model, you can do better–but not much better. Parents’ height + genetic profile is only very slightly better, as a predictor, than parents’ height alone.
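To make the mid-parental regression concrete, here is a minimal sketch on simulated data. It is not the authors' analysis: the families, the population mean and standard deviation, the slope of 0.8, and the helper names `fit_line`, `r_squared`, and `simulate_families` are all assumptions made up for illustration, and heights are treated as already adjusted for sex and age.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Share of the variance of y explained by a straight-line fit on x."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def simulate_families(n, mean=170.0, sd=7.0, slope=0.8, seed=0):
    """Hypothetical families: child = mean + slope*(midparent - mean) + noise.

    All parameter values are invented; they are not taken from the paper.
    """
    rng = random.Random(seed)
    # Residual sd chosen so that children have the same variance as parents.
    resid_sd = (sd**2 - slope**2 * sd**2 / 2) ** 0.5
    midparent, child = [], []
    for _ in range(n):
        mid = (rng.gauss(mean, sd) + rng.gauss(mean, sd)) / 2
        midparent.append(mid)
        child.append(mean + slope * (mid - mean) + rng.gauss(0, resid_sd))
    return midparent, child

midparent, child = simulate_families(5000)
intercept, slope = fit_line(midparent, child)
print(f"slope = {slope:.2f}, R^2 = {r_squared(midparent, child):.2f}")
```

Under these invented settings the mid-parent value has half the variance of an individual height, so a slope of 0.8 corresponds to a population R² of about 0.8² / 2 = 0.32. That is the same order as the 40% quoted above, but the agreement is a consequence of the chosen parameters, not a reproduction of the study.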

I have a few thoughts on this study.

1. The most important point, I think, is that made by Delaney: The predictive power of parents’ heights on child height is, presumably, itself mostly genetic in this population. Thus, the correct interpretation of the study is not that genetics doesn’t predict height, but that the particular technique described in the paper doesn’t work well. Galton’s predictor also uses a combination of genes.

2. How exactly did the researchers combine those 54 genes to get their predictor? I looked at their paper but couldn’t follow all the details. Here’s what they write:

The genomic profile, based on 54 recently identified loci, was computed as the sum of the number of height-increasing alleles carried by a person, similar to Weedon et al. This profile explained 3.8% of the sex- and age-adjusted variation of height in the Rotterdam Study (Figure 2a). We also estimated the upper explanatory limit of the 54-loci allelic profile by defining the profile as a weighted sum of height-increasing alleles, with weights proportional to the effects estimated in our own data using a multivariable model.

Is it possible that a savvier use of this genetic information could give a much better predictor? I have no idea.
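For readers who want to see what the two profiles in the quoted passage amount to, here is a rough sketch on simulated genotypes. Everything in it is an assumption for illustration: the allele frequencies, the per-allele effects, the noise level, and the function names are invented, and the weighted profile uses the true simulated effects as weights instead of effects estimated from a multivariable model as in the paper.

```python
import random

def simulate(n_people, n_loci=54, seed=1):
    """Simulate allele counts (0, 1, or 2 per locus) and heights.

    Frequencies, effect sizes (cm per allele), and noise are made up.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    effects = [rng.uniform(0.1, 0.6) for _ in range(n_loci)]
    genotypes, heights = [], []
    for _ in range(n_people):
        # Two independent draws per locus give the count of height-increasing alleles.
        g = [(rng.random() < f) + (rng.random() < f) for f in freqs]
        genetic = sum(e * a for e, a in zip(effects, g))
        genotypes.append(g)
        heights.append(genetic + rng.gauss(0, 6.5))
    return genotypes, heights, effects

def variance_explained(score, outcome):
    """Squared Pearson correlation between a score and the outcome."""
    n = len(score)
    ms, mo = sum(score) / n, sum(outcome) / n
    sso = sum((o - mo) ** 2 for o in outcome)
    sss = sum((s - ms) ** 2 for s in score)
    sp = sum((s - ms) * (o - mo) for s, o in zip(score, outcome))
    return sp * sp / (sss * sso)

genotypes, heights, effects = simulate(5000)

# Unweighted profile: plain count of height-increasing alleles.
count_profile = [sum(g) for g in genotypes]

# Weighted profile: each allele weighted by its (here, known) effect.
weighted_profile = [sum(e * a for e, a in zip(effects, g)) for g in genotypes]

r2_count = variance_explained(count_profile, heights)
r2_weighted = variance_explained(weighted_profile, heights)
print(f"count profile R^2 = {r2_count:.3f}, weighted R^2 = {r2_weighted:.3f}")
```

Both scores are purely additive, so neither can pick up interactions between loci or the many variants outside the 54. That limitation is one reason a savvier model might do better, though the sketch itself cannot say by how much.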

3. The 5748 people in the study come from “a prospective cohort study that started in 1990 in Ommoord, a suburb of Rotterdam, among 10 994 men and women aged 55 and over.” In this homogeneous population (?), maybe these 54 genes don’t discriminate so well. But maybe things would look different if they were studying a more diverse group.

P.S. Usually I like to list all the authors of any articles I cite–but this one has 12 authors. C’mon!

11 thoughts on ““Genomics” vs. genetics”

  1. Another advantage to using the parents' heights is that the parents were likely subject to many of the same environmental factors, particularly in stable times.

    Presumably Galton's method would work well for predicting heights of children in North or South Korea. The two populations are genetically identical, though the people in the north are much shorter.

  2. Andrew, you are 'complaining' about a paper with 12 authors – but if you look at some of the genetics papers coming out lately you will find papers with 50+ authors. And the way things are moving soon there will be genetics papers with 200+ authors!!

  3. Maybe unobservables are driving differences in both parental height and child height, which accounts for the extra predictive power. i.e. my parents were born into low-income households with poor nutrition and/or were also overweight (which is connected with earlier pubescence and shorter growth spurts) and thus my household environment is likely to be similar, resulting in similar outcomes.

  4. Galton's mid-parent method would work just about as well in predicting the height of their biological kids who were reared by adoptive parents. So the superiority of the mid-parent method mostly comes from capturing all genetic effects, vs. the genomic method of singling out specific genes and looking at their additive effects.

    Another way to view this is as a split between the rationalistic or mechanistic approach vs. the empirical or black-box approach.

    Genomics specifies a mechanism of heredity — genes — and looks for which particular ones affect a trait, and in what way (weighted sum, interactions, etc.). It provides you with a specific reason why offspring resemble parents.

    The mid-parent approach should probably be called hereditarian rather than genetic, as Galton and the biometricians had no idea about genes and initially fought against the Mendelians. They couldn't care less about what the unit of heredity is, let alone which ones influence a trait, let alone in what form. Their findings "don't make sense" to a skeptic since they don't tell you how the thing works. But they are better at predicting real-world outcomes.

    The genomic approach is geared toward convincing intellectuals, who care about figuring things out. The hereditarian approach is more for use by actors making decisions in the real world — someone sizing up a potential mate to forecast what their kids would be like, farmers and herders who want to breed their animals in certain directions, etc.

  5. It's now well known in human genetics that gene studies like this one only pull out a small amount of the genetic variation (there are better methods of estimating genetic variance than mid-parent regression). So I don't think this is a great surprise.

    BTW, the term regression was, of course, invented by Galton when studying heredity. But perhaps less well known is that the methods used in this study to connect genes to phenotype have their beginnings in Fisher's 1918 paper where he showed how Mendelism could explain variation in continuous traits. This was also the paper where the term "analysis of variance" was introduced.

  6. There are methods in statistical epidemiology that in a sense try to 'separate' the contribution of the environment shared by, say, parents and their children from the contribution of genetics itself. For example, you can measure how similar in height monozygotic twins are to each other vs. how similar dizygotic twins are (the assumption here is that whatever environment monozygotic twins share, dizygotic twins share as well), and the difference can tell you what proportion of the variance in height can be explained by genetic effects, even though you do not discover the relevant genes themselves.

    Using these types of methods, it is estimated that in Western societies around 80% of the variance in height is genetic.

    It is not surprising that parents' height serves as a better predictor of a child's height than the bunch of genes (actually SNPs) discovered so far.
    The 54 loci mentioned above are the ones found to have the strongest effects on height, thus having the lowest p-values, passing multiple comparison correction etc.
    Height, like many other complex traits, is polygenic, and one or a few genes will not explain the majority of variance in this trait.
    There are most probably numerous other such SNPs which affect height, but they typically have smaller effect sizes and thus current studies did not have enough statistical power to discover them.
    Discovering additional SNPs, as well as correctly modeling the interactions between them, will likely improve predictive power. In a sense, one of the long-term goals of these kinds of studies is to find all genes related to a specific trait and explain all the genetic variance (i.e. for height, reaching an ability to explain ~80% of variance).
    A useful benchmark to aspire to is how well you can predict a trait for a person given that you know the trait's value for his/her monozygotic twin (since they essentially share their entire genetic information, you cannot hope to do better than that).
    Another goal of this type of study, no less important, is simply to find the genes and pathways influencing a given trait. So even if finding these genes does not improve predictive power and is thus not very interesting from a statistician's point of view, they may be very valuable for the basic understanding of the molecular pathways affecting the trait, for developing drugs which interact with these genes (this is of course more relevant for traits other than height such as, say, BMI or certain diseases), etc.
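The twin comparison described in comment 6 is often summarized with Falconer's formula, which the commenter does not name but which matches the description. The sketch below is only an illustration: the two twin correlations are hypothetical round numbers chosen for the example, not estimates from any study, and the formula assumes purely additive genetic effects and equal shared environments for both kinds of twins.

```python
def falconer(r_mz, r_dz):
    """Split trait variance using Falconer's approximation.

    r_mz: trait correlation between monozygotic twins
    r_dz: trait correlation between dizygotic twins
    Returns (h2, c2, e2): heritability, shared environment, and
    unshared environment plus measurement error.
    """
    h2 = 2 * (r_mz - r_dz)   # MZ twins share about twice the segregating genes of DZ twins
    c2 = 2 * r_dz - r_mz     # similarity left over after the genetic part
    e2 = 1 - r_mz            # whatever makes even MZ twins differ
    return h2, c2, e2

# Hypothetical correlations, picked only so the arithmetic is easy to follow.
h2, c2, e2 = falconer(r_mz=0.90, r_dz=0.50)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

The three components always sum to one by construction, and the monozygotic correlation `r_mz` is also the prediction ceiling mentioned in the comment: knowing a person's identical twin is as much genetic information as a predictor can have.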

  7. If you tossed in uncles', aunts', and grandparents' heights too, that would make the Galtonian hereditarian system even more accurate.

    The general point is that the new technologies, such as genome scans and brain scans, are very slowly working their way back to a level of predictive accuracy that had already been achieved through much simpler methods, typically methods pioneered by Galton and his followers.

  8. @Or Zuk: I would never argue that there is not value in genetic studies to learn more about pathways. SNPs that are detectable predictors of the outcome in genome-wide association can reveal new biological pathways that we can predict.

    It's pretty valuable stuff. The one paper I participated in that was a GWAS was extremely interesting (even if I'd never use the SNP we found for prediction). But I see the second goal that you articulated as the main contribution to science that GWAS is making (and it is a huge one), whereas I am a bit less convinced that traits that are hard to measure (i.e. outcomes) are tractable problems, due to measurement error in the outcomes themselves.

    But it is a really interesting area of research!

  9. The idea isn't to predict people's phenotype from their genotype — if someone says that's what they're doing, hold on firmly to your wallet. The idea is to screen genes (well, SNPs) as targets for basic molecular biology research.

  10. BTW this just came out in Nature yesterday:
    http://www.nature.com/nature/journal/vaop/ncurren

    So the number of loci found has increased from 54 to 180, and the variance explained has increased to ~10%.
    This highlights two points mentioned in comments above:

    1. Progress was achieved mainly through the larger sample size of a meta-analysis, which gives greater statistical power.

    2. As more loci are found, they begin to fall into pathways (a more precise statement is that one starts seeing statistically significant enrichment in several pathways) which should give new insights into the basic underlying molecular biology.
