Series of p-values

A finance professor writes,

I am currently working on a project and am looking for a test. Unfortunately, none of my colleagues can answer my question. I have a series of regressions of the form Y= a + b1*X1 + b2*X2. I am attempting to test whether the restriction b1=b2 is valid over all regressions. So far, I have an F-test based on the restriction for each regression, and also the associated p-value for each regression (there are approximately 600 individual regressions). So far, so good.

Is there a way to test whether the restriction is valid “on average”? I had thought of treating the p-values as uniformly distributed and testing them against a null hypothesis that the mean p-value is some level (i.e. 5%).

I figure that there should be a better way. I recall someone saying that a sum of uniformly distributed random variates is distributed Chi-squared (or was that a sum of squared uniforms?). In either case, I can’t find a reference.

My response: if the key question is comparing b1 to b2, I’d reparameterize as follows:
y = a + B1*z1 + B2*z2 + error, where z1=(X1+X2)/2, and z2=(X1-X2)/2. (as discussed here)
Now you’re comparing B2 to zero, which is more straightforward: no need for F-tests; you can just look at the confidence intervals for B2 in each case. And you can work with estimated regression coefficients (which are clean) rather than p-values (which are ugly).
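A minimal sketch of this reparameterization on simulated data (the sample size, coefficient values, and variable names here are all illustrative, not the poster's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X1, X2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * X1 + 0.5 * X2 + rng.normal(scale=0.1, size=n)  # true b1 = b2

# Reparameterize: z1 = (X1 + X2)/2 carries the common effect (B1 = b1 + b2),
# z2 = (X1 - X2)/2 carries the difference (B2 = b1 - b2).
z1, z2 = (X1 + X2) / 2, (X1 - X2) / 2
X = np.column_stack([np.ones(n), z1, z2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS standard errors for the three coefficients.
resid = y - X @ coef
sigma2 = resid @ resid / (n - 3)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# If b1 = b2 holds, the estimate of B2 should sit within a couple of
# standard errors of zero -- no F-test needed.
print(coef[2], se[2])
```

Since B2 = b1 − b2 exactly, checking whether the interval coef[2] ± 2·se[2] covers zero is the same comparison the F-test makes, but on an interpretable scale.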

At this point I’d plot the estimates and se’s vs. some group-level explanatory variable characterizing the 600 regressions. (That’s the “secret weapon.”) More formal steps would include running a regression of the estimated B2’s on relevant group-level predictors. (Yes, if you have 600 cases, you certainly must have some group-level predictors.) And the next step, of course, is a multilevel model. But at this point I think you’ve probably already solved your immediate problem.

3 thoughts on “Series of p-values”

  1. Andrew,
    To start and maybe be useful, the random variable Y = -2*ln(X), where X is uniformly distributed zero to one, is distributed as a Chi-square of 2 degrees of freedom. Thus if you sum M of these independent Y’s, you will have a Chi-square of 2M degrees of freedom, and I presume it could be used to test a hypothesis on some average tendency of the p-values. [The sum of two uniforms is a triangular distribution – I think. I suppose one could figure out the distribution of the sum of M uniforms in a similar manner – I am not certain that it is a big triangular distribution, and I am getting too old to do that algebra anymore.]

    However, and maybe I am missing the point, but why isn’t a straightforward, albeit big, Wald test applicable? There are M equations, each with N(i) observations. Run them with and without the restrictions. Take the standard SSE_R and SSE_U (restricted and unrestricted sums of squared errors), which are standard OLS output (or easy to get if the regressions are done in the user’s code). And then form the big Wald F as the sum of all the SSE_R-SSE_U’s over the sum of the SSE_U’s (with the appropriate degrees of freedom). If the dependent variables are independent, I don’t see why that doesn’t work.

    If the dependent variables are not independent, there are problems, but those problems apply to looking at most cross-equation estimates. Note I am also assuming that the equations are not some set of simultaneous equations; testing in that setting is covered by the tests of simultaneous-equation estimates (whose finite-sample properties can be problematic).

    The big Wald test doesn’t address the grouping relationship, but the problem as stated simply asked if the restriction was valid in some average sense. I think the Wald test will answer that, although it may be a good idea to jackknife the sample of SSE_R’s and SSE_U’s, leaving out 1, 2, 3 or so equations at a time, to see if a rejection is driven by a small number of individual equations.

    Marty

  2. Marty,

    Sure, all these things are possible, but I think they take you away from the substance of the problem and toward uninteresting "theoretical statistics" issues. I prefer my approach of looking at (and possibly modeling) the B2's, because this puts things more on the scale of the comparisons of interest.

  3. Nothing wrong with either of these suggestions, but you might also consider a Kolmogorov-Smirnov test of the p-values against a uniform distribution. Just plotting the cumulative distribution of the p-values would probably tell you a lot, particularly (as Andrew implicitly suggests) when supplemented with some group-level data.
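The chi-square combination described in the first comment is Fisher's method; a short sketch, with simulated uniform p-values standing in for the 600 real F-test p-values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
M = 600
p = rng.uniform(size=M)          # under the joint null, p-values are Uniform(0, 1)

# Each -2*ln(p) is chi-square with 2 df under the null, so the sum over
# M independent tests is chi-square with 2M df.
stat = -2.0 * np.log(p).sum()
combined_p = stats.chi2.sf(stat, df=2 * M)
print(stat, combined_p)
```

A large stat (small combined_p) says the restriction fails somewhere among the 600 regressions, though not where, and not “on average” in the effect-size sense Andrew's approach targets.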
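Marty's pooled F can be sketched as follows, on simulated data where the restriction b1 = b2 actually holds; the equation count, sample sizes, and degrees-of-freedom bookkeeping (one restriction per equation, three parameters per unrestricted fit) are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(3)
M, n = 50, 100                    # M equations, n observations each
sse_u_total = sse_r_total = 0.0
for _ in range(M):
    X1, X2 = rng.normal(size=n), rng.normal(size=n)
    y = 1.0 + 0.5 * X1 + 0.5 * X2 + rng.normal(size=n)  # b1 = b2 = 0.5
    sse_u_total += sse(np.column_stack([np.ones(n), X1, X2]), y)   # unrestricted
    sse_r_total += sse(np.column_stack([np.ones(n), X1 + X2]), y)  # b1 = b2 imposed

df1 = M                           # one restriction per equation
df2 = M * (n - 3)                 # pooled unrestricted residual df
F = ((sse_r_total - sse_u_total) / df1) / (sse_u_total / df2)
p_value = stats.f.sf(F, df1, df2)
print(F, p_value)
```

Since the restriction holds in this simulation, F should land near 1 and the test should not reject; the independence-across-equations caveat from the comment applies unchanged.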
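The Kolmogorov-Smirnov check from the third comment is a one-liner; again simulated p-values stand in for the real ones:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p_values = rng.uniform(size=600)  # stand-in for the 600 F-test p-values

# Compare the empirical distribution of the p-values to Uniform(0, 1).
ks_stat, ks_p = stats.kstest(p_values, "uniform")
print(ks_stat, ks_p)
```

Plotting np.sort(p_values) against a 45-degree line gives the cumulative picture the commenter mentions; systematic bowing below the line would indicate an excess of small p-values.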

Comments are closed.