Prior distribution for design effects

David Shor writes:

I’m fitting a state-space model right now that estimates the “design effect” of individual pollsters (Ratio of poll variance to that predicted by perfect random sampling). What would be a good prior distribution for that?

My quickest suggestion is to start with something simple, such as a uniform from 1 to 10, and then move to something hierarchical, such as a lognormal on (design.effect – 1), with the hyperparameters estimated from the data.
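To make those two suggestions concrete, here is a minimal Python sketch (my illustration, not from the original post) that draws from both priors; mu and sigma are placeholder hyperparameters that the hierarchical version would estimate from the collection of pollsters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simple starting point: design effect ~ Uniform(1, 10)
deff_uniform = rng.uniform(1.0, 10.0, size=10_000)

# Hierarchical-style alternative: (design effect - 1) ~ Lognormal(mu, sigma),
# which keeps the design effect above 1. The hyperparameters below are
# placeholders; a hierarchical model would estimate them from the data.
mu, sigma = 0.0, 1.0
deff_lognormal = 1.0 + rng.lognormal(mean=mu, sigma=sigma, size=10_000)

print(np.median(deff_uniform), np.median(deff_lognormal))
```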

My longer suggestion is to take things apart. What exactly do you mean by "design effect"? There are lots of things going on, both in sampling error (the classical "design effect" that comes from cluster sampling, stratification, weighting, etc.) and nonsampling error (nonresponse bias, likely voter screening, bad questions, etc.). It would be best if you could model both pieces.
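To see the decomposition in numbers, here is a hedged simulation (my sketch, with made-up values for the design effect and the bias): it inflates the simple-random-sampling error by an assumed design effect, adds a fixed nonsampling bias, and checks that the total mean squared error splits into the two pieces.

```python
import numpy as np

rng = np.random.default_rng(1)

p_true, n = 0.52, 1000          # true proportion and poll sample size
deff, bias = 2.0, 0.015         # assumed design effect and nonsampling bias

# Sampling error under SRS, then inflated by the design effect
srs_sd = np.sqrt(p_true * (1 - p_true) / n)
polls = p_true + bias + rng.normal(0, srs_sd * np.sqrt(deff), size=100_000)

# Empirical MSE vs. its decomposition: deff * SRS variance + squared bias
mse = np.mean((polls - p_true) ** 2)
print(mse, deff * srs_sd**2 + bias**2)
```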

4 thoughts on “Prior distribution for design effects”

  1. This isn't really a "design effect" (some of the sampling literature uses "misspecification effect" for the ratio of the actual sampling variance to the misspecified variance estimator; the classical design effect is just the ratio of the sampling variance of a complex estimator to the sampling variance of an SRS estimate from the same population). For the sort of polls you're considering (mostly RDD, with low response rates), I'd argue the designs are pretty close to SRS, but a good deal of weighting occurs to reduce nonresponse bias.

    The big problem you face is that pollsters normally do not report the variances of their weights and use different (mostly ad hoc) procedures for trimming. To a first approximation, the squared error is

    E(poll – actual)^2 ≈ (1 + cv(weights)^2) * σ^2/n + bias^2,

    where σ^2 is the unit-level variance (p(1 – p) for a proportion) and cv(weights) is the coefficient of variation of the weights.

    You could view the squared bias as an attribute of the pollster (a pollster random effect) and try using this as the basis for a regression, as Steve Ansolabehere and I did a few years ago.
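    As a small sketch of the approximation above (my code, with hypothetical weights, sample size, and bias), the weighting component can be computed with Kish's 1 + cv^2 formula:

    ```python
    import numpy as np

    weights = np.array([0.6, 0.8, 1.0, 1.3, 2.5, 3.1])  # made-up poll weights
    cv2 = weights.var() / weights.mean() ** 2           # squared coefficient of variation

    n = 800                     # hypothetical sample size
    sigma2 = 0.25               # unit-level variance, p(1 - p) at p = 0.5
    bias2 = 0.02 ** 2           # hypothetical squared pollster bias

    mse_approx = (1 + cv2) * sigma2 / n + bias2
    print(1 + cv2, mse_approx)  # weighting design effect and approximate MSE
    ```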

  2. Doug:

    Your suggestion about viewing bias as an attribute of the pollster makes sense and is, I think, consistent with what Nate Silver has been doing recently.

    Regarding the definition of design effect, I think that it's not so easy to separate design from nonresponse corrections, which is why I prefer to handle them together (see, for example, my papers with Tom Little, Hao Lu, and others, along with my more recent Struggles paper).

    To me, the clearest example of the tangle between the two sorts of adjustments is household size, where the adjustment for sampling bias goes in the opposite direction from the adjustment for nonresponse bias. In this particular setting, I see no advantage whatsoever to even thinking about sampling bias or pure design effects (in your sense), since it all gets overwritten by the poststratification.

  3. Sure, you should model both design and nonresponse simultaneously. But, if you buy into the design-based religion, models are illegitimate and can never be trusted.

    For example, consider the following procedure (the example is real!): households are selected by RDD (so, ignoring nonresponse, the selection probabilities are proportional to the ratio of the number of phone lines to the number of persons); within each household, you select males 75% of the time and females 25% of the time. The resulting sample is close to 50/50 male/female but, in any event, is poststratified by gender with the base weights assumed to be one. I'd call this a model-based procedure, where the model is y = a + b * female + e (with y being any survey variable, female an indicator for women, and e assumed to have mean zero and constant variance).

    A pure design-based approach would insist that the weights be proportional to the reciprocal of the product of the gender-based selection probability and the RDD selection rate (# of phone lines / # of persons in the HH). This will give you horrible results (because of differential nonresponse by gender and the negative correlation you mentioned between # of persons and nonresponse). Usually, they bow to reality and poststratify by gender, but using the base weight. This is probably worse than the preceding method (which assumes the base weights are one), and it's not obvious exactly what model it corresponds to.

    Your approach, at least as I'd interpret it, is that they should also include household size, # of phone lines, and probably some interactions of these with gender in the regression, and, if the model gets too complicated, shrink the coefficients a bit. It would be interesting to see an empirical comparison of the three approaches.
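    Here is a hedged simulation comparing the weighting schemes discussed above (entirely my construction: the population, the response propensities, and the outcome model are invented for illustration, so only the qualitative ranking is meant to be suggestive):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical population
    N = 200_000
    female = rng.integers(0, 2, N)                 # 0 = male, 1 = female
    persons = rng.integers(1, 5, N)                # adults per household
    lines = np.where(rng.random(N) < 0.2, 2, 1)    # phone lines per household
    y = 50 + 5 * female - 2 * persons + rng.normal(0, 10, N)
    y_true = y.mean()

    # Hypothetical selection and nonresponse: RDD makes a person's selection
    # probability ~ lines / persons; interviewers pick males 75% of the time;
    # women and larger households respond less often in this invented world.
    p_sel = (lines / persons) * np.where(female == 1, 0.25, 0.75)
    p_resp = np.where(female == 1, 0.4, 0.6) / persons
    prob = p_sel * p_resp
    sample = rng.random(N) < prob * (2000 / prob.sum())  # expected n ~ 2000

    ys, fs = y[sample], female[sample]
    base_w = persons[sample] / lines[sample]       # reciprocal of RDD rate

    # 1) Poststratify by gender, base weights assumed to be one
    est1 = 0.5 * ys[fs == 0].mean() + 0.5 * ys[fs == 1].mean()

    # 2) Pure design weights: base weight times 1 / gender selection prob
    w2 = base_w / np.where(fs == 1, 0.25, 0.75)
    est2 = np.average(ys, weights=w2)

    # 3) Poststratify by gender, keeping the base weight within cells
    est3 = (0.5 * np.average(ys[fs == 0], weights=base_w[fs == 0])
            + 0.5 * np.average(ys[fs == 1], weights=base_w[fs == 1]))

    for name, est in [("poststrat, unit weights", est1),
                      ("pure design weights", est2),
                      ("poststrat + base weights", est3)]:
        print(f"{name:25s} error = {est - y_true:+.2f}")
    ```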

  4. Doug:

    You can look at my article for the numbers, but the short answer is: if you're going to adjust for household size, you can do a lot better than the so-called design weight. For political surveys, the hh size adjustment isn't crucial, because the relation between hh size and political attitudes is nonlinear (people in households with 1 or 3+ adults are more likely to be Democrats; people in households with 2 adults are more likely to be Republicans), but for other purposes, skipping the adjustment is not such a great idea.

    In that paper and in the Struggles paper we talk about various examples where the pure design-based approach makes no sense.
