Context is important: a question I don’t know how to answer, leading to general thoughts about consulting

Someone writes in with a question that I can’t answer but which reminds me of a general point about interactions between statisticians and others.

It seems that there should be an easy answer to this question, yet I cannot find a satisfactory one (i.e., one that satisfies my committee).

1) I have a vector of values V and I find the median of it m.
2) I feed vector V to a simulation and it returns a value for each iteration of the simulation r_t which is stacked into a vector R.
3) I take the mean of R, which is r_bar.

I now want to be able to compare m and r_bar, and to say whether they are statistically different.

V cannot be assumed to be normal, and the simulation is stochastic, but not random.

Currently I am constructing a confidence interval around r_bar as:

r_bar +/- 1.96*sd(R)

But this does not seem right, considering I cannot assume normality of the original “data,” nor can I assume the simulation amounts to “random sampling.”
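For concreteness, the calculation as described can be sketched like this. Everything here is a hypothetical stand-in: `V` is made-up non-normal data, and `simulate` is a placeholder for whatever the writer's simulation actually does (here, just the median of a resampled copy of V).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data vector V (deliberately non-normal) and a placeholder
# simulation that returns one value r_t per iteration.
V = rng.exponential(scale=10.0, size=500)

def simulate(v, rng):
    # Stand-in for the writer's stochastic simulation.
    return np.median(rng.choice(v, size=v.size, replace=True))

m = np.median(V)                                        # step 1: median of V
R = np.array([simulate(V, rng) for _ in range(1000)])   # step 2: stacked r_t
r_bar = R.mean()                                        # step 3: mean of R

# The interval exactly as described in the question. Note it uses sd(R),
# the spread of individual iterations, not sd(R)/sqrt(len(R)), which would
# be the standard error of r_bar itself.
ci = (r_bar - 1.96 * R.std(ddof=1), r_bar + 1.96 * R.std(ddof=1))
print(m, r_bar, ci)
```

As the writer suspects, whether this interval means anything depends entirely on what V and the simulation represent, which is the point of the reply below.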

What would you recommend?

My reply: I’m sorry but I don’t understand what you are asking. My generic advice is that it’s hard to solve such problems without having more information on the context. This happens all the time to statisticians, that people try to help us out by giving us what is essentially a probability problem, stripping out all content. But almost always a good answer depends on what these random variables actually represent.

Once when I was teaching at Chicago I overheard a discussion of some students and faculty about some consulting problem that had come in, something about estimating the probability of a sequence of successive “heads” in some specified number of coin flips. It turned out to be for the goal of computing a p-value, and looking into the example more, it also became clear that this was not a very good analysis of these particular data.

A tough balance

It’s a tough balance being a statistician: we can’t go to the experimenters and start bossing them around–we have to respect what they’re saying–but we also have to extract from them what is really important and cause them to question their statistical assumptions. I’ve seen statistical consultants err in both directions: either basically ignoring the client and trying to cram every problem into some narrow methodological framework, or taking all the client’s words too seriously and becoming a technician, applying an inappropriate method without question.

3 thoughts on “Context is important: a question I don’t know how to answer, leading to general thoughts about consulting”

  1. Thanks for answering. When I present the problem I do try to strip it of any unnecessary detail to keep people from being overwhelmed.

    Don’t feel obligated to answer, but here is a little more detail.

    What I am doing is this:

    1) Creating a vector of voters with ideal points [0,100] calibrated from income data.

    2) I split them into districts using an algorithm.

    3) I let each district "elect" a legislator (the median voter of the district).

    4) Then I take the median of the legislature.

    Steps 2–4 are done 1000 times using the same algorithm, and I take the mean of those 1000 results as the legislative outcome. What I want to compare is the legislative outcome and the outcome found if I just take the median of the vector of voters.
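    The steps above can be sketched as follows. The numbers and the districting rule are assumptions for illustration only: the commenter's actual districting algorithm is unspecified, so a random assignment into equal-sized districts stands in for it.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical sizes; the commenter's actual values are not given.
    n_voters, n_districts, n_reps = 9000, 30, 1000

    # Step 1: voters with ideal points on [0, 100] (here uniform, as a
    # stand-in for points calibrated from income data).
    voters = rng.uniform(0, 100, size=n_voters)

    def legislature_median(voters, rng):
        # Step 2: split voters into districts (random equal-sized split
        # stands in for the unspecified districting algorithm).
        order = rng.permutation(voters.size)
        districts = order.reshape(n_districts, -1)
        # Step 3: each district "elects" its median voter.
        legislators = np.median(voters[districts], axis=1)
        # Step 4: take the median of the legislature.
        return np.median(legislators)

    # Repeat steps 2-4 and average, as described.
    R = np.array([legislature_median(voters, rng) for _ in range(n_reps)])
    legislative_outcome = R.mean()
    popular_median = np.median(voters)   # the quantity to compare against
    print(legislative_outcome, popular_median)
    ```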

  2. Around 15 years ago I was working in a statistical consulting group in a large toxicological research laboratory. A PI came to me with some data from an in vitro assay, involving the closure of rat palates that had been removed from rat fetuses and grown in culture. The point was to add toxicants to the culture in different concentrations, and model the concentration-response. The data she brought me involved counts of closed versus non-closed palates after some time period.

    The data indicated a phenomenal lack of effect of treatment on palate closure, but when I gave the PI my analysis, she said "I'm sorry, but I know there is a big effect here; maybe the statistics cannot show it, but it is there." The dissonance between what I saw in the data and the degree of her belief (and the fact that she has a reputation as a careful investigator) triggered me to ask what she had seen that had so convinced her.

    It turns out that in her lab, they score each palate as open, closed anteriorly, or closed posteriorly. However, a statistician she had worked with in the past (at another institution) had told her they could not deal with a multinomial response, so, before she gave me the data, she collapsed the two "closed" categories. In her hands, in this assay, normally you get (I've forgotten the details, so this may be reversed) anterior closure. What happened in this data set was that the fraction of palates that closed anteriorly decreased with increasing dose, while the fraction with posterior closure (which almost never happens) increased with increasing dose; the two processes balanced each other out [it may even be that posterior closure prevented anterior closure in an individual palate]. This explanation was clearly borne out by the data.

    Sorry for the long comment, but that experience has burned into my own practice a couple of principles: 1) experienced scientists generally know what they are doing — take them seriously, even if that does not make sense at first; 2) Always probe about what was actually measured, and try to understand the experiment in the scientist's terms.

  3. It seems to me that if you're a person looking for a consultant, you should feel confident giving the consultant all the information, and trust the consultant to separate the important from the unimportant. Isn't that sort of expertise part of what one is paying for in that situation?

Comments are closed.