The proliferation of MCMC (and, for that matter, optimization) methods: is there an underlying unity, or does that not make sense?

Radford is speaking in the statistics seminar on Monday 11 Dec (noon at 903 Social Work Bldg, for you locals):

Constructing Efficient MCMC Methods Using Temporary Mapping and Caching

I [Radford] describe two general methods for obtaining efficient Markov chain Monte Carlo methods – temporarily mapping to a new space, which may be larger or smaller than the original, and caching the results of previous computations for re-use. These methods can be combined to improve efficiency for problems where probabilities can be quickly recomputed when only a subset of ‘fast’ variables have changed. In combination, these methods also allow one to effectively adapt tuning parameters, such as the stepsize of random-walk Metropolis updates, without actually changing the Markov chain transitions used, thereby avoiding the issue that changing the transitions could undermine convergence to the desired distribution. Temporary mapping and caching can be applied in many other ways as well, offering a wide scope for development of useful new MCMC methods.
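To make the caching idea concrete (this is just the most basic version of the trick, not Radford's construction): even plain random-walk Metropolis can store the current point's log-density and re-use it, so each iteration evaluates the target only once, for the proposal. A minimal sketch:

```python
import numpy as np

def rw_metropolis(log_prob, x0, step, n_iter, rng=None):
    """Random-walk Metropolis with the current log-density cached,
    so each iteration calls log_prob only once (for the proposal)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lp = log_prob(x)  # cached; re-used until a proposal is accepted
    samples = np.empty((n_iter, x.size))
    for i in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = log_prob(prop)  # the only density evaluation this iteration
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop  # accept: the cache is updated
        samples[i] = x
    return samples

# Example: sample a standard normal
draws = rw_metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(1), 2.4, 5000)
```

The payoff grows when, as in the abstract, the target splits into slow and fast parts: cache the expensive slow-variable terms and recompute only the fast ones.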

This reminds me of a general question I have about simulation algorithms (and also about optimization algorithms): should we try to have a toolkit of different algorithms and methods and put them together in different ways for different problems, or does it make sense to think about a single super-algorithm that does it all?

Usual hedgehog/fox reasoning would lead me to prefer a toolkit to a superalgorithm, but the story is not so simple. For one thing, insight can be gained by working within a larger framework. For example, we used to think of importance sampling, Gibbs, and Metropolis as three different algorithms, but they can all be viewed as special cases of Metropolis (see my 1992 paper, although I guess the physicists had been long aware of this). Anyway, Radford keeps spewing out new algorithms (I mean “spew” in the best sense, of course), and I wonder where he thinks this is all heading.
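As a small illustration of that unification (my own sketch, not taken from the 1992 paper): a Gibbs update, viewed as a Metropolis-Hastings proposal drawn from the full conditional, is accepted with probability one — the target and proposal ratios cancel exactly. Numerically, for a bivariate normal:

```python
import numpy as np

# Bivariate normal with correlation rho: check that a Gibbs update,
# treated as a Metropolis-Hastings proposal, is always accepted.
rho = 0.8

def log_joint(x1, x2):
    # log density of N(0, [[1, rho], [rho, 1]]), up to a constant
    return -(x1**2 - 2 * rho * x1 * x2 + x2**2) / (2 * (1 - rho**2))

def log_cond(x1, x2):
    # log p(x1 | x2): N(rho * x2, 1 - rho^2), up to a constant
    return -(x1 - rho * x2) ** 2 / (2 * (1 - rho**2))

rng = np.random.default_rng(1)
x1, x2 = 0.5, -1.0
prop = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()  # draw from p(x1' | x2)

# Metropolis-Hastings log acceptance ratio: target ratio plus reverse/forward proposal ratio
log_alpha = (log_joint(prop, x2) - log_joint(x1, x2)
             + log_cond(x1, x2) - log_cond(prop, x2))
print(abs(log_alpha) < 1e-12)  # True: the Gibbs move is accepted with probability 1
```

The same bookkeeping shows independence-Metropolis with the target as proposal is exact sampling, which is the sense in which these "different" algorithms sit inside one framework.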

P.S. The talk was great; slides are here.

4 thoughts on “The proliferation of MCMC (and, for that matter, optimization) methods: is there an underlying unity, or does that not make sense?”

  1. Reading your post, my first hunch is that the most promising approach would be a 'meta-algorithm' that determines for itself which algorithm is most efficient for the task at hand and can switch to a different one on the fly when the current choice appears inefficient. Different algorithms are probably efficient for different kinds of problems (as Radford's summary also suggests), but it would be quite demanding if users had to know all the algorithms and make these decisions themselves. At least, that holds as long as all the algorithms work on all problems, some efficiently and some inefficiently, but all with acceptable results.

  2. While a superalgorithm is nice for theorists, in practice, "plug and play" depending on the problem is standard. One reason is that problems have different complexities.

    Consider an optimization problem that has a network structure. By definition, it is also a linear program. However, using network-specific code is much more efficient, which means I can solve much bigger problems.

    I actually came across a scenario where NLP code was used to solve LPs. Not only was the solution time much worse, but the NLP code sometimes couldn't even find a solution. In this case, the NLP solver would be the superalgorithm, since it includes the LP as a special case.

  3. Sounds like a combination of ideas deriving from his short-cut Metropolis, fast variable dragging, and non-reversible chain papers. Does the department post electronic versions of seminars? I've been thinking about applications of caching in MCMC recently.
