Bayesian adaptive methods for clinical trials

Scott Berry, Brad Carlin, Jack Lee, and Peter Muller recently came out with a book with the above title.

The book packs a lot into its 280 pages and is fun to read as well (even if they do use the word “modalities” in their first paragraph, and later on they use the phrase “DIC criterion,” which upsets my tidy, logical mind). The book starts off fast on page 1 and never lets go.

Clinical trials are a big part of statistics and it’s cool to see the topic taken seriously and being treated rigorously. (Here I’m not talking about empty mathematical rigor (or, should I say, “rigor”), so-called optimal designs and all that, but rather the rigor of applied statistics, mapping models to reality.)

Also I have a few technical suggestions.

1. The authors fit a lot of models in Bugs, which is fine, but they go overboard on the WinBUGS thing. There’s WinBUGS, OpenBUGS, JAGS: they’re all Bugs. They also recommend running Bugs from R using the clunky BRugs interface rather than the smoother bugs() function, which has good defaults and conveniently returns graphical summaries and convergence diagnostics. The result is to get tangled in software complications and distance the user from statistical modeling.

2. On page 61 they demonstrate an excellent graphical summary that reveals that, in a particular example, their posterior distribution is improper–or, strictly speaking, that the posterior depends strongly on the choice of an arbitrary truncation point in the prior distribution. But then they stick with the bad model! Huh? This doesn’t seem like such a good idea.

3. They cover all of Bayesian inference in a couple chapters, which is fine–interested readers can learn the whole thing from the Carlin and Louis book–but in their haste they sometimes slip up. For example, from page 5:

Randomization minimizes the possibility of selection bias, and it tends to balance the treatment groups over covariates, both known and unknown. There are differences, however, in the Bayesian and frequentist views of randomization. In the latter, randomization serves as the basis for inference, whereas the basis for inference in the Bayesian approach is subjective probability, which does not require randomization.

I get their general drift but I don’t agree completely. First, randomization is a basis for frequentist inference, but it’s not fair to call it the basis. There’s lots of frequentist inference for nonrandomized studies. Second, I agree that the basis for Bayesian inference is probability but I don’t buy the “subjective” part (except to the extent that all science is subjective). Third, the above paragraph leaves out why a Bayesian would want to randomize. The basic reason is robustness, as we discuss in chapter 7 of BDA.

4. I was wondering what the authors would say about Sander Greenland’s work on multiple-bias modeling. Greenland uses Bayesian methods and has thought a lot about bias and causal inference in practical medical settings. I looked up Greenland in the index and all I could find was one page, which referred to some of his more theoretical work:

Greenland, Lanes, and Jara (2008) explore the use of structural nested models and advocate what they call g-estimation, a form of test-based estimation adhering to the ITT principle and accommodating a semiparametric Cox partial likelihood.

Nothing on multiple-bias modeling. Also I didn’t see any mention of this paper by John “no relation” Carlin and others. Finally, the above paragraph is a bit odd in that “test-based estimation” and “semiparametric Cox partial likelihood” are nowhere defined in the book (or, at least, I couldn’t find them in the index). I mean, sure, the reader can google these things, but I’d really like to see these ideas presented in the context of the book.

5. The very last section covers subgroup analysis and then mentions multilevel models (the natural Bayesian approach to the problem) but then doesn’t really follow through. They go into a long digression on decision analysis. That’s fine, but I’d like to see a worked example of a multilevel model for subgroup analysis, instead of just the reference to Hodges et al. (2007).
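To make the multilevel idea concrete, here is a minimal sketch of partial pooling for subgroup treatment effects under a normal-normal model. It is not taken from the book or from Hodges et al. The subgroup estimates, their standard errors, and the between-subgroup standard deviation `tau` are all made-up numbers, and `tau` is simply fixed rather than given a prior, so this is a plug-in simplification and not a full Bayesian analysis.

```python
# Partial pooling of subgroup treatment effects, normal-normal model.
# All data values and tau are hypothetical, chosen only for illustration.


def pooled_mean(estimates, std_errors, tau):
    """Precision-weighted estimate of the overall effect mu, given tau."""
    weights = [1.0 / (se**2 + tau**2) for se in std_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def partial_pool(estimates, std_errors, tau):
    """Conditional posterior mean and sd of each subgroup effect.

    Conditions on tau and on the plug-in value mu_hat, so the reported
    sds ignore uncertainty about mu and tau.
    """
    mu_hat = pooled_mean(estimates, std_errors, tau)
    results = []
    for y, se in zip(estimates, std_errors):
        prec_data = 1.0 / se**2    # precision of the raw subgroup estimate
        prec_group = 1.0 / tau**2  # precision of the subgroup-level model
        post_mean = (prec_data * y + prec_group * mu_hat) / (prec_data + prec_group)
        post_sd = (prec_data + prec_group) ** -0.5
        results.append((post_mean, post_sd))
    return mu_hat, results


subgroup_effects = [0.10, 0.45, -0.05, 0.30]  # hypothetical raw estimates
subgroup_ses = [0.20, 0.25, 0.15, 0.30]       # hypothetical standard errors
tau = 0.15                                    # assumed, not estimated

mu_hat, pooled = partial_pool(subgroup_effects, subgroup_ses, tau)
print(f"overall effect (plug-in): {mu_hat:.3f}")
for raw, se, (mean, sd) in zip(subgroup_effects, subgroup_ses, pooled):
    print(f"raw {raw:+.2f} (se {se:.2f}) -> pooled {mean:+.3f} (sd {sd:.3f})")
```

Each subgroup estimate is pulled toward the overall effect, and the noisier subgroups are pulled hardest. That shrinkage is what makes the multilevel approach a natural guard against overinterpreting extreme subgroups.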

In summary, I like this book and it left me wanting even more. I hope that everyone working on clinical trials reads it and that it has a large influence.

And, just to be clear, most of my criticisms above are of the form, “I like it and want more.” In particular, my own books don’t have anything to say on multiple-bias models, test-based estimation, semiparametric Cox partial likelihood, multilevel models for subgroup analysis, or various other topics I’m asking for elaboration on. As it stands, Berry, Carlin, Lee, and Muller have packed a lot into 280 pages.

6 thoughts on “Bayesian adaptive methods for clinical trials”

  1. Andrew,

    You suggest that "so-called optimal designs" are "empty mathematical rigor". Could you elaborate more? I am doing some work on optimal design so I'm curious what your thoughts are, especially since it doesn't look like you have discussed it previously on the blog.

    Matt

  2. @ Matt:

    One issue with optimal designs is that they're optimal in one very specific sense (maximising the Fisher information, or the expected KL divergence between posterior and prior, or whatever is chosen as the criterion) but they are not necessarily good in any other sense. For instance, if in a regression you believe the "true" model has mean linear in x, and can choose values of x on the range [L,U], then it's optimal (to maximise the det of the Fisher information) to put half your xs at L and half at U. This design, however, does not give you any ability to check the assumption of linearity, and if the linear assumption is okay for middling values of x but a bit dodgy for extreme values of x then clearly the "optimal" design is leading you astray.

    In contrast, Box and Draper (1975, Biometrika 62:347–52) provide a list of 14 desiderata for a good design, and conclude the paper with

    …we do not recommend choosing a design solely on the basis of any single criterion. In general, it seems more appropriate to select suitable measures … and to use them to make sensible subjective compromises. For, while it may not be possible simultaneously to optimise several criteria, it may nevertheless be possible to obtain satisfactory values for several.

    That's not to say optimal design is completely pointless, and I've found it useful in my own work, but it may well be over-rated.
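The regression example in this comment can be checked numerically. The sketch below assumes unit noise variance, a design region [L, U] = [0, 1], and n = 10 observations. These values and the helper names `moment_matrix` and `det` are illustrative choices, not anything from the comment. It compares the determinant of the information matrix X'X for the endpoint design against an evenly spread design, first under the linear model and then under a quadratic model that one might fit to check linearity.

```python
# D-optimality check for polynomial regression on [0, 1] with n = 10.
# Design region, sample size and helper names are illustrative assumptions.


def moment_matrix(xs, degree):
    """X'X for polynomial regression of the given degree (unit noise variance)."""
    size = degree + 1
    return [[sum(x ** (i + j) for x in xs) for j in range(size)]
            for i in range(size)]


def det(m):
    """Determinant by cofactor expansion; adequate for 2x2 and 3x3 matrices."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * det(minor)
    return total


lower, upper, n = 0.0, 1.0, 10
endpoint = [lower] * (n // 2) + [upper] * (n // 2)  # half at L, half at U
spread = [lower + (upper - lower) * i / (n - 1) for i in range(n)]

det_linear_endpoint = det(moment_matrix(endpoint, 1))
det_linear_spread = det(moment_matrix(spread, 1))
det_quad_endpoint = det(moment_matrix(endpoint, 2))
det_quad_spread = det(moment_matrix(spread, 2))

print("linear model:    endpoint", det_linear_endpoint, "spread", det_linear_spread)
print("quadratic model: endpoint", det_quad_endpoint, "spread", det_quad_spread)
```

Under the linear model the endpoint design has the larger determinant, as the comment says. Under the quadratic model its information matrix is singular, because two distinct x values cannot identify a curvature term, while the spread design still gives a positive determinant.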

  3. JAGS, OpenBUGS and WinBUGS may all be attempting to sample graphical models, but they're very different from each other in terms of syntax and supported operations. The upshot is that models are not portable. Their performance also varies by model relative to each other.

    Gelman et al.'s ARM and BDA are also both WinBUGS-centric in the sense that the models will only run in WinBUGS and the appendices are about how to install WinBUGS. I don't much care for books that try to show you the same thing in multiple languages at the same time. So what's an author to do?

    Maybe the next books will be JAGS oriented. I like JAGS more than WinBUGS now because it (a) is open source C++, (b) runs multi-platform, (c) is faster and seems slightly more robust than WinBUGS, (d) is better integrated with R (doesn't spawn an unkillable unresponsive process and provides some feedback as it runs), and (e) has slightly better debug info (though still not great).

    Unfortunately, walking the dependency graph and interpreting arithmetic operations and loops in all these programs makes them very slow compared to a compiler like HBC.

  4. Andrew: For MBA, Berry et al are focusing mainly on a randomization setting.

    I'll have to get their book.

    Failures of randomization are inevitable and perhaps should be addressed – though maybe better done in a second book?

    The Greenland, Lanes, and Jara (2008) paper would be a better fit there, it is a very practical paper (I have even used it once) and it holds on to the control of the achievable type 1 error (given randomization and ITT) while nicely allowing for less biased estimation.

    The MBA would come in for more serious randomization failures or consideration of multiple failures and here I would also suggest Wolpert and Mengersen. "Adjusted Likelihoods for Synthesizing Empirical Evidence from Studies That Differ in Quality and Design: Effects of Environmental Tobacco Smoke." Statistical Science 19.3 (August, 2004) for a more formal and complete Bayesian view.

    And I just can’t believe Peter Muller went along with this statement – “Bayesian approach is subjective probability, which does not require randomization”.

    Though they did not call WinBUGS optimally from R – that is likely the code to give – if you want many others to be able to actually redo the examples for themselves (and I wrote this while Bob was posting similar concerns).

    K?

  5. Hi Andrew — sorry to be slow in replying. Thanks for blogging the book as promised; as usual I find myself agreeing with many of your remarks! Taking them in order:

    1) I'm sorry you continue to dislike BRugs; I got into it because it was promoted as the seamless solution to integrating R and BUGS. It then vanished from CRAN for a while because the R folks disliked the fact that it "secretly" installed OpenBUGS on your computer at the time you loaded it from CRAN. The BUGS Core Team has been working on this problem and my understanding is this is being remedied. You can still get BRugs for R 2.11.1 (and I just used it in a short course I taught at the Deming Conference) but I'm open to suggestions. Really this is a matter of personal taste I think, and crusty old faculty members like us will never be able to keep up with our grad students, who will always find the most efficient ways to do such things.

    Having said all that, I also don't think BUGS (or JAGS) is the answer for everything; a lot of the work guys like Scott Berry do requires a *ton* of simulation, where you're checking the frequentist properties of your Bayesian design for hundreds of different true parameter values. The work is so intense that any system calling a high-level language like BUGS is going to be prohibitively slow. Very often, Scott is still writing code in Fortran (yes, Fortran) to get things to move quickly enough; he's also partnered with a software company to develop their own (very expensive) commercial package for this. Anyway the point in our book was to use BRugs-OpenBUGS to illustrate the basic idea (checking frequentist properties through simulation) and then if you get into this, it's up to you to find your own favorite computing method.

    2) That example is one I got from Don Berry, who was originally involved in the project, but wound up leaving (more on this anon). The posterior shown is proper because the prior is; it's truncated to a finite region. But you're right, the posterior "wants to be improper" (since untruncated it would be), and I left that plot in there partly to show folks what this situation looks like graphically. It also gave me a chance to plug an old paper by a former student (and Scott Berry classmate at CMU), Petros Hadjicostas, who proved the impropriety that results from the untruncated hyperprior.
    The downloadable code for this problem on the book's website also has a couple Gamma(2,2) hyperpriors in there you can pick instead, and this does dramatically change the shape of the bivariate posterior — again, kind of neat to see.

    3) This is another issue arising from Don's former involvement, and requires me to tell you the backstory of the book. Basically I've been rallying support for this project since at least Feb 2006, when I went down to MD Anderson and had a big pow wow with the 3 authors plus Don, Peter Thall, and a few others. The upshot was my Fall 2008 sabbatical at MDACC, where Don gave me some older "white papers" he'd written on this subject for introductory audiences, but never published. I took these and edited/re-cut them into Chapter 1 and parts of Chapter 2 (the rest of which is largely a very brief review of the CL3 text for those who don't own it or Gelman et al). But because these "white papers" were older, they reflected Don's much more subjective Bayesian views of 10-15 years ago. I should have been more careful in editing out some of this stuff, because it's not the way he or I or any "working Bayesian" thinks any more (heck, the FDA won't permit it anyway, whether you personally care about robustness or not). But this point of view was still feasible back then; see e.g. Jay Kadane's book for a purely subjective Bayes take (unsurprising given its author) on clinical trials.

    Just to finish the thread here, Don wound up not having time to really contribute meaningfully to the book, and thus left the author list (he would have been first author, since "Don" comes before "Scott" alphabetically). But even had he stayed, I don't think he would have said Bayesians don't care about randomization or Type I error, so I need to fix this in the next edition. While it is true that a purely subjective Bayesian approach doesn't "require" randomization, I don't think any Bayesians nowadays would simply do without it.

    4) On not giving Sander's stuff sufficient space, I plead guilty as charged. It's always a problem when you run across an important topic and none of the 4 authors on the team are expert in the material; somebody needs to step up, dig into it, and then summarize it for a general audience. We ran out of time for this (Sander's stuff can be notoriously difficult to read), but this is on the list of things to improve in the 2nd edition. This is a handbook, not a textbook, so we expect to revise, expand, and re-release it from time to time as necessary; this version was only the first step.

    I found g-estimation to be a pretty slippery idea, so I probably lost interest in it at that point. But I'll have a look at the multiple bias stuff. By the way, thanks for the ref to the paper by my "Australian twin," John Carlin ;) I'm descended from a French bastard left on the (Irish) Carlin family doorstep in the 1700s, so we really are probably "no relation" ;)

    5) Again, not much more to add here except I agree that the QB (me) is recharging his batteries and hopes to expand on this topic in the next edition.

    In summary, thanks for a good review and I'm glad you agree it's a good start. Certainly the book's reception has been very favorable; we were CRC's biggest seller at the 2010 JSM in Vancouver (outselling even Gelman et al!) and the phone is ringing off the hook with short course offers, one of which will be at JSM 2011 in Miami (we got invited back to reprise the 2010 course, apparently based on positive CE reviews).

    Cheers, Brad
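The "basic idea" in point 1) of this comment, checking the frequentist properties of a Bayesian design by simulation over a grid of true parameter values, can be sketched in a few lines. This is not the book's BRugs-OpenBUGS code. It assumes a hypothetical single-arm trial with a binary response, a Beta(1, 1) prior, a maximum of 40 patients, looks every 10 patients, a null response rate of 0.3, and a rule that declares efficacy when the posterior probability of beating the null exceeds 0.975. All of those settings and function names are illustrative.

```python
# Simulating operating characteristics of a simple Bayesian stopping rule.
# The design settings below are hypothetical, chosen only for illustration.
import random
from math import comb


def posterior_prob_exceeds(successes, failures, p0):
    """P(p > p0 | data) under a Beta(1, 1) prior, computed exactly.

    The posterior is Beta(a, b) with a = successes + 1, b = failures + 1.
    For integer a and b, P(p > p0) = P(Binomial(a + b - 1, p0) <= a - 1).
    """
    a, b = successes + 1, failures + 1
    n = a + b - 1
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(a))


def simulate_trial(true_p, rng, max_n=40, look_every=10, p0=0.3, threshold=0.975):
    """Run one trial; return True if efficacy is declared at any look."""
    successes = 0
    for patient in range(1, max_n + 1):
        if rng.random() < true_p:
            successes += 1
        if patient % look_every == 0:
            failures = patient - successes
            if posterior_prob_exceeds(successes, failures, p0) > threshold:
                return True
    return False


def operating_characteristics(true_ps, n_sims=2000, seed=1):
    """Estimate the probability of declaring efficacy for each true rate."""
    rng = random.Random(seed)
    return {p: sum(simulate_trial(p, rng) for _ in range(n_sims)) / n_sims
            for p in true_ps}


rates = operating_characteristics([0.3, 0.45, 0.6])
for true_p, prob in rates.items():
    print(f"true response rate {true_p:.2f}: P(declare efficacy) ~ {prob:.3f}")
```

The value at a true rate of 0.3 estimates the type I error of the design, and the values at higher rates estimate its power. A realistic design study repeats this over hundreds of scenarios, which is why the comment notes that interpreted tools become prohibitively slow.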

  6. Brad: slipping into the abyss of "g-estimation" did not seem necessary for using the approach in Greenland, Lanes, and Jara (2008).

    Also might be a good place to start as it's so close to RCTs and ITT – perhaps starting with simpler outcomes than survival time.

    But slipperiness is in the legs of the walker ;-)

    Your book is being ordered for me, but has not arrived yet, thanks for the background info.

    K?
