Overestimating the magnitude of parameters when variation is high

“P” sends along a link to this paper by Sebastian Zollner and Jonathan Pritchard:

Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the “winner’s curse.” The actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates an estimate of the frequency of a variant and its penetrance parameters. The method produces a point estimate and confidence region for the parameter estimates. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power.

This is the statistical phenomenon we were writing about here in the context of sex-ratio studies: when sample sizes are small, “statistically significant” estimates will tend to be much larger than the true parameters being estimated. Zollner and Pritchard have a statistical correction procedure that conditions on the observation of a statistically significant result. I think a Bayesian approach would work better; this might be worth trying on Zollner and Pritchard’s examples.
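
To see the size of the problem, here is a minimal simulation sketch. It is not the correction procedure from the paper, only an illustration of the selection effect itself. The numbers are assumptions chosen for illustration: a true effect of 0.1, a standard error of 0.5 (a noisy, small-sample study), and a two-sided 5% significance threshold. The function name `simulate_significant_estimates` is hypothetical.

```python
import random
import statistics


def simulate_significant_estimates(true_effect=0.1, se=0.5, n_sims=20000,
                                   z_crit=1.96, seed=0):
    """Draw unbiased, normally distributed estimates of true_effect and
    return (all estimates, the subset that is 'statistically significant')."""
    rng = random.Random(seed)
    # Each simulated study yields one unbiased estimate with standard error se.
    all_estimates = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    # Keep only the studies whose estimate is more than z_crit standard
    # errors from zero, i.e. the ones that would be reported as significant.
    significant = [e for e in all_estimates if abs(e) > z_crit * se]
    return all_estimates, significant


TRUE_EFFECT = 0.1  # assumed value, for illustration only
SE = 0.5           # assumed value, for illustration only

all_est, sig = simulate_significant_estimates(true_effect=TRUE_EFFECT, se=SE)

mean_all = statistics.mean(all_est)                   # close to the true effect
mean_abs_sig = statistics.mean(abs(e) for e in sig)   # inflated by selection
exaggeration = mean_abs_sig / TRUE_EFFECT             # inflation factor
wrong_sign_share = sum(e < 0 for e in sig) / len(sig)

print(f"mean of all estimates:             {mean_all:.3f}")
print(f"mean |estimate| given significant: {mean_abs_sig:.3f}")
print(f"exaggeration factor:               {exaggeration:.1f}")
print(f"share of significant with wrong sign: {wrong_sign_share:.2f}")
```

Averaged over all simulated studies, the estimate is unbiased. But a significant estimate must be at least 1.96 × 0.5 = 0.98 in absolute value, so under these assumptions every reported estimate overstates the true effect by a factor of nearly 10 or more, and some of them have the wrong sign.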