Lazy ways of modeling proportions

Andrew Therriault writes:

I’m creating a model of issue emphasis in political campaigns as a product of public opinion (so candidates choose what to discuss strategically based on which issue will help them most), and the data I’m using combines candidates’ ad spending (coded by issue) with the public’s issue positions in the candidates’ districts. Thus far, I’ve used percentage of ad spending per issue for each candidate as my DV in OLS and tobit models. I know that this specification is not optimal, though, because of the correlation between each candidate’s observations (since they are constrained to sum to 100).

One alternative I’ve been thinking of is to switch the unit of analysis to each ad airing, and use multinomial logit with the DV being a categorical variable reflecting the issue emphasized. I worry, though, that the SEs would be off because I would basically be creating many duplicate observations for every ad that’s aired multiple times. There’s also the issue of each candidate’s ads being correlated with each other (since we’d expect more variance in strategy *between* campaigns than *within* them). Does this call for a multilevel model, or is there something else you might recommend instead?

Thanks for any advice you can give.

My advice:

First, do a global search-and-replace to change “DV” to “outcome” and to change “OLS” to “linear regression.”

Now, on to the specifics: I’m lazy, so one thing I might try is a logit sort of thing, either considering each issue compared to all the others, or by bifurcating the issue space, so that you can first consider Type A vs Type B, then Type A1 vs Type A2, etc.
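To make the bifurcation idea concrete, here is a minimal sketch in Python. It takes a candidate's issue-spending shares and a hypothetical two-level grouping of issues, and decomposes the shares into a sequence of binary log-odds, one per split. The issue names, the tree, and the function names are all made up for illustration; the shares are assumed to be positive and sum to 1.

```python
import math

def leaves(node):
    # Flatten a nested tuple tree into its leaf issue names.
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out += leaves(child)
    return out

def binary_splits(shares, node, path=""):
    """Walk the issue tree; at each internal node, emit the log-odds of
    spending on the left branch vs the right branch, conditional on
    being inside this node.  Assumes every share is strictly positive."""
    if isinstance(node, str):
        return []  # a leaf: nothing left to split
    left, right = node
    p_left = sum(shares[i] for i in leaves(left))
    p_right = sum(shares[i] for i in leaves(right))
    splits = [(path or "root", math.log(p_left / p_right))]
    return (splits
            + binary_splits(shares, left, path + "L")
            + binary_splits(shares, right, path + "R"))

# Hypothetical example: four issues, split first into two broad types.
shares = {"economy": 0.4, "jobs": 0.2, "crime": 0.3, "health": 0.1}
tree = (("economy", "jobs"), ("crime", "health"))
print(binary_splits(shares, tree))
```

The payoff is that each of the resulting log-odds is an unconstrained number, so each split can be modeled with its own ordinary regression (on district opinion, say), and the sum-to-100 constraint on the raw shares is no longer a problem.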

P.S. to those applying to Ph.D. programs: If you come to Columbia you can get this kind of cranky advice whenever you want. And you don’t even have to wait for it to show up on the blog!