Correlations and absolute differences

“Perpetually Statistically Curious” writes:

Say you have two variables, Y1 and Y2, whose correlation depends on the value of a third dichotomous variable, X. Now say you take the absolute value of the difference between Y1 and Y2, and regress that absolute difference on the dichotomous (indicator) variable, X. My sense is that the expected value of the coefficient for the variable X in the regression would be related in a deterministic way with the gap between the correlations between Y1 and Y2 at the different values of X. But how?

This comes up in research on identical and fraternal twins, where the chief research interest is in the degree of similarity on some trait between identical twins relative to similarity on some trait between fraternal twins.

So for example, if the correlation between identical twins on some behavior was 0.5, and the correlation between fraternal twins on that same behavior was 0.25, then the rough heritability estimate for that trait is twice the difference between the correlations, or 0.5. But say you were using a different methodology, regressing the absolute value of the difference between the co-twins on a dummy variable for zygosity. I would think that the expected value of the coefficient for that dummy variable could be derived from the difference in correlations, without ever running the regression. But how?

My reply: The relation between |y1-y2| and corr(y1,y2) depends on the model. In some contexts it’s easier to work with the differences. For example, in spatial statistics, we often work with variograms rather than correlation functions: correlations are defined in terms of variances, which for some spatial processes aren’t well defined. In your example, though, I agree that analyses based on differences and on correlations should be essentially the same. If the measures are continuous, the exact correspondence between the two analyses will depend on their distribution.
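
To make the dependence on the model concrete, here is a rough sketch for one particular case. It assumes (and these are assumptions for illustration, not something the question specifies) that within each group y1 and y2 are bivariate normal with equal means and a common standard deviation sigma. Then y1-y2 is normal with mean 0 and variance 2*sigma^2*(1-rho), so E|y1-y2| = 2*sigma*sqrt((1-rho)/pi), and the expected coefficient on the indicator is the difference of that quantity between the two groups. With the correlations from the twin example (0.5 and 0.25) and sigma = 1, this gives about -0.18. The function names below are made up for this sketch.

```python
import math
import random


def expected_abs_diff(rho, sigma=1.0):
    """E|y1 - y2| for a bivariate normal pair (ASSUMED model):
    equal means, common sd sigma, correlation rho."""
    return 2.0 * sigma * math.sqrt((1.0 - rho) / math.pi)


def expected_coefficient(rho_x1, rho_x0, sigma=1.0):
    """Expected slope when |y1 - y2| is regressed on a 0/1 indicator X.
    With a single binary predictor, the least-squares slope is the
    difference between the two group means."""
    return expected_abs_diff(rho_x1, sigma) - expected_abs_diff(rho_x0, sigma)


def simulate_mean_abs_diff(rho, n, rng, sigma=1.0):
    """Monte Carlo estimate of E|y1 - y2| from n simulated normal pairs."""
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y1 = sigma * z1
        # Build y2 so that corr(y1, y2) = rho and sd(y2) = sigma.
        y2 = sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        total += abs(y1 - y2)
    return total / n


rng = random.Random(1)
r_mz, r_dz = 0.5, 0.25  # correlations from the twin example; X = 1 for identical twins

theory = expected_coefficient(r_mz, r_dz)
simulated = (simulate_mean_abs_diff(r_mz, 200_000, rng)
             - simulate_mean_abs_diff(r_dz, 200_000, rng))

print(f"theoretical coefficient: {theory:.4f}")
print(f"simulated coefficient:   {simulated:.4f}")
```

Under this model the coefficient is a function of sqrt(1-rho) in each group rather than of the difference in correlations itself, and it also scales with sigma. With a different distribution the constant would change. By contrast, the expected squared difference, 2*sigma^2*(1-rho), is linear in rho and needs only equal means and variances, not normality.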

P.S. People sometimes ask why I answer just about any statistical question that comes in. My quick reply is that it’s easier to reply than to say no. And, once I’m replying, I might as well put it on the blog to share with others. Also, it’s good discipline–it makes me work a little to have to think about something new.

1 thought on “Correlations and absolute differences”

  1. It's also worth pointing out that, in the twin studies case, the regression depends on the variance among twins, which reduces to (y1-y2)^2 terms.

    Variances make more sense to use in twin studies because they can be directly related to the biological models. This all started with Fisher's 1918 paper, in which, as a by-product, he introduced the term "analysis of variance."
