Some thoughts on connections between biostatistics and statistics, prompted by an announcement for a meeting that I won’t be able to attend

This looks interesting. Yi Li writes of a panel discussion at the Harvard biostatistics department. My own thoughts are below; first here’s Li’s description. There’s some good stuff:

1) Should Biostatistics continue to be a separate discipline from Statistics? Should Departments of Biostatistics and Statistics merge? In other words, are we seeing a convergence of biostatistics and statistics? Biostatisticians develop statistical methodology, statisticians are getting involved in biological/clinical data. Even at Harvard we are considering moving closer to Cambridge, and some say that the move might lead to the eventual merge of the biostat and stat departments. What are your thoughts on the division between the disciplines of stat and biostat in general, whether it is widening or closing, and how it may affect our careers and career choices, especially for starting faculty, postdocs, students?

2) How should stats/biostats as a field respond to the increasing development of statistical and related methods by non-statisticians, in particular computer scientists?

It strikes me [Li] to some extent that statisticians get involved in applied problems in a rather arbitrary fashion based on haphazard personal connections and whether the statistician’s personal methodological research fits with the applied problem. Are statisticians sufficiently involved in the most important scientific problems in the world today (at least of those that could make use of statistical methods) and if not, is there some mechanism that could be developed by which we as a profession can make our expertise available to the scientists tackling those problems?

3) How do we close the gap between the sophistication of methods developed in our field and the simplicity of methods actually used by many (if not most) practitioners? Some scientific communities use sophisticated statistical tools and are up to date with the newest developments. Examples are clinical trials, brain imaging, genomics. Other communities routinely use the simplest statistical tools, such as single two-sample tests. Examples are experimental biology and chemistry, cancer imaging, and many other fields outside statistics. How do we explain this gap and what can we do to close it?

4) What makes a statistical methodology successful? Some modern statistical methods have gotten to be very well known in the scientific world, even though they are not usually part of any basic statistics course for non-statisticians. The best examples might be the bootstrap, regression trees, wavelet thresholding. Even Kaplan-Meier and Cox model are not in elementary stat books! But most statistical methods, even when they are good enough to be published in a good statistical journal, might get referenced a few times within the statistics literature and then forgotten, never making it outside the statistics community. What makes a statistical methodology gain widespread popularity?

5) Where should computational biology and bioinformatics sit in relation to biostatistics, both at Harvard and elsewhere? Should these subjects be taught as part of cross-department programs of which biostat is a part or should they be housed within an expanded biostat department?

6) Terry Speed recently published an IMS column entitled “statistics without probability”. He stated that “… the most prominent features of the data were systematic. Roughly speaking, we saw in the data very pronounced effects associated with almost every feature of the microarray assay, clear spatial, temporal, reagent and other effects, all dwarfing the random variation. These were what demanded our immediate attention, not the choice of two-sample test statistic or the multiple testing problem. What did we do? We simply did things that seemed sensible, to remove, or at least greatly reduce, the impact of these systematic effects, and we devised methods for telling whether or not our actions helped, none of it based on probabilities. In the technology-driven world that I now inhabit, I have seen this pattern on many more occasions since then. Briefly stated, the exploratory phase here is far more important than the confirmatory… How do we develop the judgement, devise the procedures, and pass on this experience? I don’t see answers in my books on probability-based statistics.”

My thoughts:

1) I think there are advantages to having two departments but they should certainly coordinate with each other. Here at Columbia, people are hired in one department or the other and nobody in the other department even hears about it, and we also have a biostatistics group in the psychiatry department. The trouble is, everybody’s so busy. One idea is to have each department have a person whose job (“committee assignment”) is to keep track of what’s happening in the sister department and then report back to the others. There are just so many opportunities for collaboration and shared work with students and faculty, it’s a shame to not take advantage.

2) I’m not supposed to go around saying that computer scientists are smarter than statisticians, but I think it’s ok for me to say that computer science is great, and I welcome that field’s involvement in statistical problems. I don’t know that we have to “respond” in any way except by cross-listing courses and updating the curriculum every now and then.

Li makes an excellent point about statisticians getting involved in problems “in a rather arbitrary fashion based on haphazard personal connections.” One way to do better, I think, is to post all the collaborative projects in an easy-to-hash format so that people can get involved in projects that best suit them. We’re starting that here with our Applied Statistics Center but we have a ways to go, even at Columbia. At the very least, I recommend that other universities follow our path and start listing things.

3) There’s a need for more research into simple methods. Simple doesn’t have to mean stupid. Beyond that, I’m in favor of “closing the gap” one application at a time. But maybe that’s not the most efficient way, given that millions of scientific papers are published each year.

4) I think applied Bayesian methods are “very well known in the scientific world, even though they are not usually part of any basic statistics course for non-statisticians.” I’m surprised Li didn’t mention Bayesian methods in the list: this suggests that the first step is for statisticians and biostatisticians to become aware of the important methods in our own fields!

To answer the question more generally, I think for a method to gain widespread popularity it needs to give people answers that they want, and ideally be easy to use and theoretically justified. One reason Xiao-Li, Hal, and I wrote our paper on posterior predictive checking was to place this very useful method in a theoretical framework with theta, y, and y.rep.
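For readers who haven't seen posterior predictive checking, here is a minimal sketch of the idea in terms of theta, y, and y.rep. It is not the setup from our paper: I'm assuming a toy normal model with known sigma and a flat prior on theta, and the function name `posterior_predictive_pvalue` is just something made up for this illustration. You draw theta from its posterior, simulate replicated data y.rep given theta, and see how often a test statistic on y.rep is at least as extreme as on the observed y.

```python
import random
import statistics


def posterior_predictive_pvalue(y, test_stat, n_sims=2000, sigma=1.0, seed=0):
    """Estimate Pr(T(y_rep) >= T(y) | y) by simulation.

    Assumed toy model (for illustration only): y_i ~ Normal(theta, sigma)
    with sigma known and a flat prior on theta, so the posterior is
    theta | y ~ Normal(mean(y), sigma / sqrt(n)).
    """
    rng = random.Random(seed)
    n = len(y)
    y_bar = statistics.fmean(y)
    t_obs = test_stat(y)
    exceed = 0
    for _ in range(n_sims):
        # Draw theta from its posterior distribution.
        theta = rng.gauss(y_bar, sigma / n ** 0.5)
        # Simulate a replicated dataset y_rep given that theta.
        y_rep = [rng.gauss(theta, sigma) for _ in range(n)]
        # Compare the replicated test statistic to the observed one.
        if test_stat(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_sims


# Hypothetical data with one value the normal model cannot reproduce.
y = [0.1, -0.3, 0.5, 0.2, -0.1, 8.0]
p_max = posterior_predictive_pvalue(y, max)
p_mean = posterior_predictive_pvalue(y, statistics.fmean)
print("p-value for max(y):", p_max)
print("p-value for mean(y):", p_mean)
```

With data like these, the check based on max(y) should give a p-value near zero, flagging that the model does not capture the outlier, while the check based on the mean should land near one half, since the model fits the mean by construction. The choice of test statistic is where the applied judgment comes in.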

5) I have no opinion on this one.

6) It’s funny that Terry Speed said this because, when I used to teach at Berkeley, I heard lots of people in the statistics department say that sort of thing. But at the same time they would teach extremely theoretical courses and discourage the Ph.D. students from learning about applied methods (outside of a few specific statistics-heavy fields such as biology). I don’t think they were aware of statistical methods that bridge between science and theory. The Bayesian approach is one way (at least, we try to do this in our book) but lots of non-Bayesian methods focus on systematic effects also. Consider all the work in economics on program evaluation and causal inference. In our recent book on regression, Jennifer and I emphasize the importance of the deterministic part of the model. I can’t say that we yet have a method to “develop the judgement, devise the procedures, and pass on this experience,” but we’ve definitely advanced beyond the 1950s-style “choice of two-sample test statistic or the multiple testing problem.” So I don’t think things are as bad as Terry thinks, at least not in social science!

When, where, who

The panel discussion will be on Thurs 3 Apr from 2-3.30 in Kresge 213 (at the Harvard School of Public Health in Boston), and it will be led by Brad Efron, Colin Begg and David Harrington. It sounds like fun (it reminds me a bit of our symposium on statistical consulting), but I don’t know how they expect to cover all of that in only an hour and a half!

2 thoughts on “Some thoughts on connections between biostatistics and statistics, prompted by an announcement for a meeting that I won’t be able to attend”

  1. I was surprised that Li listed clinical trials as an area where practitioners use sophisticated and up-to-date tools. Some clinical trials use up-to-date methods but these are decidedly in the minority.

  2. This brings up an interesting question (well, at least to me): aside from the focus on applied work for biology, what do you see as the major differences between statistics & biostatistics as disciplines?
