Judge Alito and the use of statistics in racial discrimination cases, well no, actually a technical point about hypothesis testing in 2-way tables


Jim Greiner has an interesting note on the use of statistics in racial discrimination cases. As both a lawyer and a statistician, Jim has a more complete perspective on these issues than most people have. I won't comment on the substance of Jim's comments (basically, he claims that the statistical analyses in these cases, on both sides, are so crude that judges can pretty much ignore the quantitative evidence when making their decisions) since I know nothing about the case in question. But I do have a technical point, which in fact has nothing really to do with racial discrimination and everything to do with statistical hypothesis testing.

Jim writes,

The facts of the specific case, which concerned the potential use of race in peremptory challenges in a death penalty trial, are less important than Judge Alito's approach to statistics and the burden of proof.

Schematically, the facts of the case follow this pattern: Party A has the burden of proof on an issue concerning race. Party A produces some numbers that look funny, meaning instinctively unlikely in a race-neutral world, but conducts no significance test or other formal statistical analysis. The opposing side, Party B, doesn't respond at all, or if it does respond, it simply points out that a million different factors could explain the funny-looking numbers. Party B does not attempt to show that such innocent factors actually do explain the observed numbers, just that they could, and that Party A has failed to eliminate all such alternative explanations.

. . .

Is there a middle way? Perhaps. In the above situation, what about requiring some sort of significance test from Party A, but not one that eliminates alternative explanations? In the specific facts of Riley, the number-crunching necessary for "some sort of significance test" is the statistical equivalent of riding a tricycle: a two-by-two hypergeometric with row totals of 71 whites and 8 blacks, column totals of 31 strikes and 48 non-strikes, and an observed value of 8 black strikes yields a p-value of 0.
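For readers who want to see the tricycle ride spelled out, here is a minimal sketch of the one-sided hypergeometric calculation using only the margins quoted above. The function name and the choice of an upper-tail probability are my own assumptions for illustration, not anything taken from Jim's note or the case record.

```python
from math import comb


def hypergeom_upper_tail(black, white, strikes, observed_black_strikes):
    """P(X >= observed) when `strikes` people are drawn without
    replacement from a pool of `black` + `white` (both margins fixed)."""
    total = black + white
    max_k = min(black, strikes)  # cannot strike more blacks than exist
    favorable = sum(
        comb(black, k) * comb(white, strikes - k)
        for k in range(observed_black_strikes, max_k + 1)
    )
    return favorable / comb(total, strikes)


# Margins from the quoted passage: 8 blacks, 71 whites, 31 strikes,
# and all 8 blacks struck.
p_value = hypergeom_upper_tail(
    black=8, white=71, strikes=31, observed_black_strikes=8
)
print(f"one-sided hypergeometric p-value: {p_value:.6f}")
```

Because all 8 blacks were struck, the upper tail contains a single table, and the probability comes out at roughly 0.0003.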

OK, now my little technical comment. I don't think the hypergeometric distribution is appropriate, since it conditions on both margins. The relevant margin to condition on is the number of whites and blacks, since that was determined before the lawyers got to the problem. In a hypothesis-testing framework in which p-values represent the probability of various hypothetical alternatives (this is the framework I like; it can be interpreted classically or Bayesianly), the number of strikes was not fixed in advance, so the hypothetical replications should let it vary. To put it another way, the so-called Fisher exact test isn't really "exact" at all.
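To make the distinction concrete, here is a rough sketch of a p-value that conditions only on the numbers of blacks and whites. It assumes a single common strike probability under the null, plugs in the pooled strike rate as its estimate, and uses the gap in strike rates as the test statistic. All three choices, and the function names, are assumptions made for illustration; other reasonable choices would give somewhat different numbers.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_margin_p_value(black, white, black_struck, white_struck):
    """P(gap in strike rates >= observed gap), holding fixed only the
    numbers of blacks and whites; the total number of strikes varies."""
    # Assumption: plug in the pooled strike rate for the unknown
    # common strike probability under the null.
    p_hat = (black_struck + white_struck) / (black + white)
    observed_gap = black_struck / black - white_struck / white
    total = 0.0
    # Enumerate every possible pair of strike counts exactly.
    for b in range(black + 1):
        for w in range(white + 1):
            gap = b / black - w / white
            if gap >= observed_gap - 1e-12:  # tolerance for float ties
                total += binom_pmf(b, black, p_hat) * binom_pmf(w, white, p_hat)
    return total


p_value = one_margin_p_value(black=8, white=71, black_struck=8, white_struck=23)
print(f"p-value conditioning on race totals only: {p_value:.6f}")
```

The reference set here is every table with 8 blacks and 71 whites, not just the tables with exactly 31 strikes, which is the whole point of the complaint about conditioning on both margins.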

This is just a rant I go on occasionally; it really has nothing to do with Jim's note except that it reminded me of the issue. For the fuller version of this argument, see Section 3.3 of my paper on Bayesian goodness-of-fit testing in the International Statistical Review. Also, Jasjeet Sekhon wrote a paper recently on the same topic.

For Jim's specific example, I'd be happy just doing a chi-squared test with 1 degree of freedom. His calculation is fine too--the hypergeometric is a reasonable approximation to a Bayesian posterior p-value with noninformative prior distribution.
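Here is a quick sketch of the chi-squared test with 1 degree of freedom for this table, written with only the standard library. The cell counts are the ones implied by the quoted margins (8 blacks all struck; 23 of 71 whites struck). The function name is made up for illustration, and I've assumed the plain Pearson statistic with no continuity correction. With an expected count near 3 in one cell, the chi-squared approximation is rough, so treat the result as an order-of-magnitude check.

```python
from math import erfc, sqrt


def chi2_test_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: blacks, whites. Columns: struck, not struck.
stat, p_value = chi2_test_2x2(8, 0, 23, 48)
print(f"chi-squared = {stat:.2f}, p-value = {p_value:.6f}")
```

The statistic is about 13.8, so the p-value lands in the same general neighborhood as the hypergeometric calculation.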

P.S.

See also this item in Chance News.

4 Comments

There's also the issue of endogeneity: if the lawyer doing the striking knows such a test will be applied, they can respond by pushing right up to the significance level. This gaming, of course, ruins all the assumptions implicit in our tests of significance.

As someone who has testified a number of times using the Fisher Exact Test, I would note several things. First, both margins are fixed more often than you think. In lots of termination cases, the racial composition of the employees is obviously fixed, but often so is the percentage of workers to be fired. Even where that isn't quite true, it is often "true enough." In an observational study, the second fixed margin is often a matter more of rhetoric than anything else. Second, the Fisher Exact Test is easy to describe to laymen (hint: think of urns and avoid the word "hypergeometric"). I agree that integration over an uninformed prior would in most cases give the same answer, but walking into court with the word "prior" in your report, even an uninformed one, is, for better or worse, if not a death sentence for your expert report, liable to lead to a long, painful pathway for your client that contributes nothing to his cause. Third, Fisher Exact tests are easy to stratify over relevant causal strata using StatXact. Fourth, Fisher Exact tests are exact in small and/or unbalanced samples; chi-squared statistics are not.

In response to Patrick above, and to add one point I didn't mention before: the coverage of the Fisher Exact Test is an issue only for those for whom statistical-significance Type I error rates are written in stone. (I wrote the previous comment before I read the Sekhon paper.) Calculate p-values, which are exact and have no coverage issues at all, explain your results to the finder of fact (judge or jury), and patiently explain why the 5 percent significance level is inappropriate outside of Neyman-Pearson testing, inappropriate in a legal context, and not properly the subject of expert witness testimony. Then Patrick's claim falls as well.

Jonathan,

Good points; thanks. I've worked on some legal cases but never actually testified. (But I was on jury duty once for a slip-and-fall...)
