How many simulations do you need?

Ernest Sergenti writes with a question about how to determine how many simulation iterations are “enough” to calculate quantities of interest once burn-in has been achieved:

We [Sergenti et al.] have a computational model of party competition on a 2D policy space. The vote-maximizing rule that we are testing is constantly updating its position on the 2D policy grid. At each point in time it has a location and a heading. The party knows the vote share it received last period and the period before that. If the party gained votes from its last move, it continues to move in the same direction by one unit. If it lost votes, it picks a random heading from within a 180-degree arc directly opposite the previous heading and moves one unit. We have 4 such parties, and, given a set of parameters, we are interested in estimating quantities of interest, such as the mean position of the 4 parties.
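
For concreteness, one party's move under this kind of rule might look roughly like the Python sketch below. This is not the authors' code; the one-unit step size comes from the description, while the function name, the use of radians for headings, and the random seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility

def update_party(position, heading, gained_votes, step=1.0):
    """One move of the vote-seeking rule: keep the old heading after a vote gain,
    otherwise turn into the 180-degree arc directly opposite the old heading."""
    if not gained_votes:
        # uniform draw from the half-circle centered directly opposite the old heading
        heading = heading + np.pi + rng.uniform(-np.pi / 2, np.pi / 2)
    position = position + step * np.array([np.cos(heading), np.sin(heading)])
    return position, heading
```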

As at least one of the rules (in fact, in this case, all of the rules) contains a random component, the process that generates the statistic “mean location of the 4 parties” is ergodic, and thus the statistic tends to a limiting distribution. We are interested in estimating the mean and the standard deviation of the “mean location of the 4 parties” (as well as other statistics) for different parameter settings.

Although this is not Bayesian analysis, we were struck by the similarities between our estimation needs and the methods employed in MCMC simulation. In fact, we use the procedure that you recommend in Chapter 11 of Bayesian Data Analysis to determine the number of simulations needed to ensure burn-in to the limiting distribution. Namely, we overdisperse the locations of the parties based on our prior assumptions about where they should be in the long run, run 5 separate runs of the model, and use the Rhat statistic to come up with a conservative estimate that 5,000 iterations of the model are needed to reach the limiting distribution.
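
In rough outline, that convergence check might look like the following simplified version of the Rhat (potential scale reduction) statistic from BDA, applied to, say, the mean party location recorded in each of the five overdispersed runs. The array shape is an assumption, and the 1.1 cutoff in the final comment is a common rule of thumb rather than a detail from Sergenti's description.

```python
import numpy as np

def rhat(draws):
    """Simplified potential scale reduction factor for one scalar summary,
    given draws of shape (n_chains, n_iterations_per_chain)."""
    n = draws.shape[1]
    chain_means = draws.mean(axis=1)
    W = draws.var(axis=1, ddof=1).mean()   # average within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled estimate of the marginal variance
    return np.sqrt(var_plus / W)           # values near 1 indicate the runs have mixed

# e.g., treat burn-in as complete once rhat(post_burn_in_draws) < 1.1
# for every summary statistic of interest.
```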

The problem we have been running into concerns how long to run the model after burn-in. We would like our estimates of quantities of interest to conform to the following two desiderata:

1. Quantities of interest, which a priori we do not expect to be zero or close to zero, should be significantly different from zero; and

2. The estimated effect given one parameter setting should be statistically different from the estimated effect of a second parameter setting.

Through trial and error, we have come up with the estimate that we require 250,000 post-burn-in iterations of our model to come up with good estimates of quantities of interest. We’d like to find a more scientific way of making this determination. Someone mentioned repeated-measures power analysis, but I’m not sure that would work.

My reply:

1. Yes, you are simulating a Markov chain, and so it has a stationary distribution (a “target distribution” as we call it).

2. I’m not quite sure what you mean by “significantly different from zero.” In general, we distinguish between (a) quantities of interest (qoi’s) or functions of the parameters (or state variables), and (b) functions of the target distribution. For example, if the state variables in your model are a,b,c,d, then any of these is a qoi, and so are a/b, sqrt((a+b)/c), and so forth. But E(a), or E(a/b), or var(b+c) are functions of the target distribution.

For a qoi, the output of the simulations is a posterior distribution, possibly represented by a 95% interval. The sd (or the width of this interval) stabilizes but does _not_ go to zero as the number of simulations increases to infinity. For a function of the target distribution, the output is a single value, along with a simulation standard error which indeed does go to zero as the number of simulations approaches infinity.
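
As a toy illustration of that distinction (using independent normal draws as a stand-in for post-burn-in simulation output, which in practice is autocorrelated), the sd of the draws settles down while the simulation standard error of the estimated mean keeps shrinking:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (1_000, 10_000, 100_000):
    a = rng.normal(loc=0.05, scale=2.0, size=n)  # simulated draws of a qoi
    sd = a.std(ddof=1)                           # stabilizes near 2 as n grows
    se = sd / np.sqrt(n)                         # simulation standard error of E(a)
    print(f"n = {n:>7}   sd(a) = {sd:.3f}   se of E(a) = {se:.4f}")
```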

I don’t think this directly answers your questions (I’m not quite sure exactly what you’re asking), but I suspect that if you carefully state your hypotheses it will be clear how to go from here. Generally, I think that people need fewer iterations than they think.

P.S. Damn it’s hard to type “qoi”! I can’t stop from typing the u right after the q.

1 thought on “How many simulations do you need?”

  1. Hi Andrew,

    Thanks a lot for getting back to me on this.

    We are indeed interested in functions of the target distribution, namely E(a) and sd(a). Perhaps I could provide a concrete example. Say we have a parameter, the position of an intransigent ideological party: it never moves in the ideological 2D space. In addition, we have four other parties that adjust their position in the space to maximize the number of votes they receive. We're interested in E(mean position of the other parties) and, to some extent, sd(mean position of the other parties). The estimate of the sd is roughly 2 for all settings of the intransigent party, while the estimate of the mean varies, but not by much. For example, with 250 thousand iterations, with the intransigent party set at 0, E(mean position of the other parties) is .004; while with the intransigent party set at 2, E(mean position of the other parties) is .052. The sd's are roughly 2 for both.

    Given the high sd relative to the variation in the means, we need a lot of iterations to get the simulation standard error on E(mean position of the other parties) down to the point where (1) the value of E(mean position) is statistically different from zero, when it should be, and (2) E(mean position) with the intransigent party set at 0 is statistically different from E(mean position) with the intransigent party set at 2.

    The question we've been grappling with is: how small do we want our simulation standard errors on E() to be? Once we've determined that, we can run our simulations until we've reached that level of precision. Since I first wrote you, I've had a chance to consult some of the work done by scholars using simulations in Operations Research, such as Law and Kelton's Simulation Modeling and Analysis, 3rd Edition (2000). The consensus in this literature appears to be either (1) to pick an absolute level of precision, namely the half-length of the 95% confidence interval, or (2) to pick a relative level of precision, namely the half-length of the 95% confidence interval relative to the estimated mean. Note that both of these measures are just functions of the simulation standard error.

    I haven't been completely satisfied with either of these criteria. The absolute measure poses problems if the values of the summary stats vary from target distribution to target distribution, while the relative measure poses problems for expected values close to zero. Instead, I have been focusing on power and a new measure I thought up: the ratio of the simulation standard error to the estimated standard deviation of the target distribution, se/sd. For power, I'm thinking we could make sure that, given the number of iterations used, the power to detect that E(mean position) is different from zero, when it should be, is greater than some cutoff value, such as .8. Likewise for a t-test of the difference between E(mean position) when the intransigent party is set at 0 and E(mean position) when it is set at 2. The se-to-sd ratio is designed more to ensure consistency across different target distributions, both from the same run with the same parameter settings and from different runs with different parameter settings.

    What do you think? Do you have any thoughts or advice on this issue? Have you come across such a problem with your work?

    Thanks a lot for any help!

    Ernest
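
The stopping rules and the power criterion described in the comment above can be put into rough formulas. The sketch below is one possible reading, not Sergenti's code: the effective-sample-size correction for autocorrelation and the two-sample z-approximation for the power calculation are assumptions, and the numbers plugged in at the end are simply the values reported in the comment.

```python
import numpy as np
from scipy.stats import norm

def half_width(draws, n_eff, z=1.96):
    """Half-width of the ~95% confidence interval for E(draws), using an
    effective sample size n_eff to allow for autocorrelated simulation output."""
    return z * draws.std(ddof=1) / np.sqrt(n_eff)

# Rule (1): run until half_width(draws, n_eff) <= some absolute tolerance.
# Rule (2): run until half_width(draws, n_eff) / abs(draws.mean()) <= some relative tolerance.

def n_eff_needed(delta, sd, alpha=0.05, power=0.8):
    """Effectively independent draws per parameter setting so that a two-sample
    z-test detects a true difference of `delta` with the given power."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sd / delta) ** 2

# Plugging in the reported values (means of about .004 and .052, sd of about 2):
print(n_eff_needed(delta=0.052 - 0.004, sd=2.0))   # roughly 27,000 per setting
```

Because successive iterations are autocorrelated, the number of raw iterations needed is larger than this by roughly the integrated autocorrelation time, which could plausibly push the total toward the 250,000 figure found by trial and error. Note also that se/sd is just 1/sqrt(n_eff), so fixing the se-to-sd ratio amounts to fixing an effective sample size in advance.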
