Convergent interviewing and Markov chain simulation

Bill Harris writes,

MCMC is a technique for sampling high-dimensional spaces efficiently. By using a Markov chain to select the next sample point, MCMC gathers information about the important parts of that space in settings where purely random sampling would likely fail to hit any points of interest.
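To make that concrete, here is a minimal random-walk Metropolis sampler in Python. It is only a sketch of the general idea: the target density, step size, and number of draws are illustrative choices, not anything specific to Bill's note.

    import numpy as np

    def log_target(x):
        # Illustrative unnormalized log-density: a banana-shaped 2-D distribution.
        return -0.5 * (x[0] ** 2 + (x[1] - x[0] ** 2) ** 2)

    def metropolis(n_draws=5000, step=0.5, seed=0):
        rng = np.random.default_rng(seed)
        x = np.zeros(2)                   # current state of the Markov chain
        draws = np.empty((n_draws, 2))
        for i in range(n_draws):
            proposal = x + step * rng.standard_normal(2)   # propose near the current state
            # Accept with probability min(1, target(proposal) / target(current)).
            if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
                x = proposal
            draws[i] = x
        return draws

    samples = metropolis()
    print(samples.mean(axis=0))           # rough means of the target, estimated from the chain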

Convergent interviewing is a way to select the next person or people to interview and the next questions to use when gathering information from a group of people. It “combines some of the features of structured and unstructured interviews, and uses a systematic process to refine the information collected.”

In particular, people are selected by a simple process:

Decide on the person “most representative” of the population. She will be the first person interviewed. Then nominate the person “next most representative, but in other respects as unlike the first person as possible”; then the person “next most representative, but unlike the first two” … and so on. This sounds “fuzzy,” but in practice most people use it quite easily.
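That nomination rule reads like a greedy maximum-diversity selection. Here is a minimal sketch of that reading, assuming, purely for illustration, that each candidate can be summarized by a numeric representativeness score and a feature vector; the function name and the toy data are invented.

    import numpy as np

    def select_interviewees(features, representativeness, k):
        """Greedy pick: most representative person first, then people who are
        still fairly representative but as unlike those already chosen as possible."""
        chosen = [int(np.argmax(representativeness))]
        while len(chosen) < k:
            remaining = [i for i in range(len(features)) if i not in chosen]
            def score(i):
                # Distance to the nearest already-chosen person, weighted by representativeness.
                d = min(np.linalg.norm(features[i] - features[j]) for j in chosen)
                return representativeness[i] * d
            chosen.append(max(remaining, key=score))
        return chosen

    # Toy data: six candidates, three descriptive features each, plus a representativeness score.
    rng = np.random.default_rng(1)
    features = rng.normal(size=(6, 3))
    rep = rng.uniform(0.5, 1.0, size=6)
    print(select_interviewees(features, rep, k=3))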

Each person is asked largely “content-free” questions on the general topic at hand. Probe questions are added in later interviews to test the extent of apparent agreement between people and to explain apparent disagreements.

At first glance, there seems to be a metaphorical similarity between the two processes, as both seek to extract desired information from a high-dimensional space in reasonable time with a guided sampling process that may or may not converge.

I sometimes wonder if there might not even be a deeper connection, although I’m not sufficiently educated in Gibbs sampling and the like yet to be able to test that conjecture.

My response: Regarding MCMC, there has been some work on “antithetic sampling” (I think that’s what it’s called), where there is a deliberate effort to make new samples different from earlier samples. There’s also hybrid Monte Carlo, or Hamiltonian dynamics sampling, which Radford Neal has written about (extending methods that have been used in computational physics) and which tries to move faster through parameter space.
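The simplest concrete version of the “deliberately different samples” idea is antithetic variates in ordinary Monte Carlo integration: each uniform draw u is paired with 1 - u, the two function values are negatively correlated, and their average has lower variance. A toy sketch in Python (the integrand is just an example):

    import numpy as np

    rng = np.random.default_rng(0)

    def f(u):
        return np.exp(u)      # example integrand on [0, 1]; the true integral is e - 1

    n = 10_000
    u = rng.uniform(size=n)

    plain = f(u).mean()                             # ordinary Monte Carlo estimate
    antithetic = 0.5 * (f(u) + f(1 - u)).mean()     # pair each u with its "opposite" 1 - u

    print(plain, antithetic, np.e - 1)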

Regarding convergent interviewing, the key idea seems to be the technique of the interview itself. (My disclaimer here is that I’ve never done a personal interview of this sort, so I’m just speculating based on books I’ve read and conversations I’ve had with experts.) The sampling method seems fine. In practice the real worry is getting people who are too much alike, so an extra effort is made to get people who are different. This makes sense to me. I suspect it probably won’t do better than sampling people at random from the population (unless n is really small), but in many settings you can’t really get a random sample, so it sounds like a good idea to intentionally diversify. Another approach is to use network sampling and use statistical methods to correct for sampling biases (as in Heckathorn et al.’s work here).
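As a generic illustration of correcting for sampling biases (not Heckathorn et al.’s actual estimator), one can reweight respondents by the inverse of their estimated probability of ending up in the sample; the responses and inclusion probabilities below are made up.

    import numpy as np

    # Hypothetical data: survey responses plus each respondent's estimated
    # probability of being reached (e.g. from a model of the referral network).
    responses = np.array([3.0, 4.0, 2.0, 5.0, 4.0])
    inclusion_prob = np.array([0.8, 0.5, 0.2, 0.6, 0.3])

    weights = 1.0 / inclusion_prob       # down-weight the easy-to-reach people
    weighted_mean = np.sum(weights * responses) / np.sum(weights)

    print(weighted_mean, responses.mean())   # weighted vs. unweighted estimate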

5 thoughts on “Convergent interviewing and Markov chain simulation”

  1. This convergent interviewing sounds a lot like a problem we were trying to solve for an information retrieval task. In our problem, we wanted to find not only representative documents given a search query but also a ranking that is diverse, in the sense that two top-ranking documents would highlight different aspects of the search query. We used a technique based on random walks with absorbing states to introduce diversity (a rough sketch of that idea appears after the comments below).

    I'm interested to learn what techniques are used to introduce diversity in the choice of people when doing convergent interviewing. And how do you measure similarity between people?

  2. Andrew,

    Thanks for posting this. I work in township government in Michigan, where we do land-use planning, and one pervasive cancer is the industry of unqualified — in my opinion, of course — pollsters performing public-opinion surveys with no discernible attempt at any sort of validity. Not only do the methods I've seen appear universally shoddy to me; even the basic idea of proper sampling appears to be ignored.

    Part of the problem, I personally believe, is the lack of error correction: there's no Dewey Defeats Truman when asking a community about land-use issues; the problem is multi-dimensional indeed, covering lot sizes, landscaping requirements, signs, outdoor lighting, farmland, wetlands, shorelines, traffic, cell towers, views, garden sheds, and so on. Since people become very passionate when politics gets down to telling them whether they can have a garden shed in their side yard, and since this level of politics does indeed determine such personal issues, bad opinion surveys are worse than useless.

    I'm curious to learn more about this topic. I've never heard of it before, and it sounds interesting. Of course, I could be wrong about the current state of affairs; I'm merely reporting my perception. Nonetheless, even improving good methods is beneficial.

  3. Andrew, thanks for elaborating on other sampling techniques that may be related in purpose. From what I've done and what I've read, I gather that the purpose is to begin to make sense of situations where a good random sample might be hard to obtain (randomizing might be hard to do, or a sufficiently large n might be expensive) and where it's not yet clear what questions should be asked (in John's example, is it the garden shed in the side yard, or is it something else that's really bothering folks? If you only ask about garden sheds, you may never know, and if you have a structured survey about everything, people may get bored and stop answering).

    As for Jurgen's questions about selecting people, see step 5 of the document Andrew linked to at the start of his article.

  4. Bill Harris—

    I want to say thanks for the way you paraphrased my remark. I had assumed that the opinion space could be more or less defined a priori; however, you reminded me that in land-use, at least, we are dealing not only with myriad dimensions of opinion but also with opinions one may not even imagine when designing the interview questions.

    Previously I had only thought of question crafting in terms of avoiding bias (some "surveys" for land-use have questions bordering on push-polling). But there's so much more than that, isn't there?

    I've yet to follow through on looking into this topic. I need to make good on that goal.
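The random walks with absorbing states mentioned in the first comment can be sketched roughly as follows: rank items by how often a random walk on a similarity graph visits them, and after each pick make the chosen item absorbing, so that items similar to it lose score and the next pick is pushed toward something different. This is only an illustration of the general idea, with an invented similarity matrix, not the commenter's actual system.

    import numpy as np

    def diverse_ranking(similarity, k):
        """Pick k items; each pick becomes an absorbing state so similar items are penalized."""
        n = similarity.shape[0]
        P = similarity / similarity.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
        ranked = []
        for _ in range(k):
            Q = P.copy()
            for j in ranked:
                Q[j, :] = 0.0
                Q[j, j] = 1.0              # already-ranked items absorb the walk
            # Approximate expected visit counts with a truncated power series.
            visits = np.ones(n) / n
            total = np.zeros(n)
            for _ in range(200):
                total += visits
                visits = visits @ Q
            total[ranked] = -np.inf        # never re-pick an item
            ranked.append(int(np.argmax(total)))
        return ranked

    # Toy similarity matrix for four documents (two near-duplicate pairs).
    S = np.array([[1.0, 0.9, 0.1, 0.2],
                  [0.9, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.0, 0.8],
                  [0.2, 0.1, 0.8, 1.0]])
    print(diverse_ranking(S, k=3))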

Comments are closed.