Modeling sexual concurrency

Posted on March 5, 2006 6:47 AM by Andrew

Caroline Korves writes,

We have dyadic data on sexual partnerships between individuals and want to analyze the data at the partnership (dyad) level. The data come from a structured social network questionnaire. The outcome measure of interest is sexual concurrency practiced by at least one member of the dyad (so if one or both partners have concurrent partners then response = 1, otherwise response = 0). We want to look at what factors are associated with concurrency, since concurrency drives the spread of HIV and other STIs. As an example, one factor might be whether or not there is an age difference of at least five years between partners.

Here is an example of how the data appear. Individuals may appear in more than one partnership. The dyads in the database are as follows, where each letter represents a distinct person:
Partnership 1: A-B
Partnership 2: C-D
Partnership 3: E-F*
Partnership 4: E-G*
Partnership 5: H-I
Partnership 6: J-K**
Partnership 7: J-L**

*Partnerships 3 and 4 are correlated because partner E appears in both
**Partnerships 6 and 7 are correlated because partner J appears in both

An important point: individuals in other partnerships may report concurrent partners, but these partners were not included in the survey; therefore, even though these individuals appear in only one partnership above, they still have concurrent partners.

The question: How do I handle this correlation? Do I treat partnerships 3, 4, 6, and 7 as arising from a separate cluster from the others, since these partnerships involve people who appear in more than one partnership in the study? Or do we consider partnerships 3 and 4 as arising from a separate cluster from partnerships 6 and 7?

My reply: I agree that the correlation is an issue: these are not independent measurements. If you know the time ordering of the partnerships, then one approach would be to fit a model for the probability that someone in an existing partnership forms a new partnership (before dissolving the first partnership, I suppose?). The data would then come in sequentially, so the correlation wouldn’t be an issue. Then again, serial monogamy could also pose public health risks, so you might want to know about past partnerships too.

I’m not quite sure what would be the best way to model the data as a static graph; perhaps others have helpful suggestions. The work of Peter Hoff, Adrian Raftery, and Mark Handcock might be relevant: they focus on modeling whether two people are connected, not whether an individual person has two different connections, but the modeling ideas seem similar.

At a simpler level, you could just fit a logistic regression at the dyad level, get your estimates and standard errors, then adjust your standard errors using some sort of jackknifing on the individual persons in the study (that is, re-fitting the model, leaving each person out, then using the jackknife formula to compute s.e.’s).
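
To make the jackknife idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the data are simulated (60 made-up persons, 80 random dyads, one binary predictor standing in for "age gap of at least five years"), and the names fit_logistic and jackknife_by_person are hypothetical helpers, not anything from a particular package. The logistic regression is fit by a plain Newton-Raphson loop with a tiny ridge term for numerical stability. One caveat: each dyad has two members and so gets dropped twice, once per person, which means the usual jackknife formula can be conservative here; treat the numbers as a first approximation.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Logistic regression via Newton-Raphson; returns the coefficient vector.
    The small ridge term is only there to keep the Hessian invertible."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)          # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def jackknife_by_person(X, y, person_a, person_b):
    """Delete-one-person jackknife standard errors for dyad-level coefficients.
    For each person, drop every dyad that person belongs to and refit."""
    persons = np.unique(np.concatenate([person_a, person_b]))
    n = len(persons)
    estimates = np.empty((n, X.shape[1]))
    for i, p in enumerate(persons):
        keep = (person_a != p) & (person_b != p)
        estimates[i] = fit_logistic(X[keep], y[keep])
    mean_est = estimates.mean(axis=0)
    # Standard jackknife variance formula, with persons as the units
    var = (n - 1) / n * ((estimates - mean_est) ** 2).sum(axis=0)
    return np.sqrt(var)

# --- Simulated data (purely illustrative, not the survey data) ---
rng = np.random.default_rng(0)
n_persons, n_dyads = 60, 80
person_a = rng.integers(0, n_persons, n_dyads)
# Offset by 1..59 so the two members of a dyad are always different people
person_b = (person_a + rng.integers(1, n_persons, n_dyads)) % n_persons
age_gap = rng.integers(0, 2, n_dyads).astype(float)   # 1 = gap of 5+ years
X = np.column_stack([np.ones(n_dyads), age_gap])
true_logit = -0.5 + 1.0 * age_gap
y = (rng.random(n_dyads) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta_hat = fit_logistic(X, y)
se_jack = jackknife_by_person(X, y, person_a, person_b)
print("coefficients:", beta_hat)
print("jackknife s.e.:", se_jack)
```

With real data you would replace the simulated arrays with one row per partnership, two columns of person identifiers, and whatever dyad-level predictors you care about. Leaving out a person who appears in several partnerships (like E or J in the example above) removes all of those partnerships at once, which is how the within-person correlation enters the standard errors.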