Assessing the convergence of Markov Chain Monte Carlo methods: An example from evaluation of diagnostic tests in absence of a gold standard

https://doi.org/10.1016/j.prevetmed.2007.01.003

Abstract

The accessibility of Markov Chain Monte Carlo (MCMC) methods for statistical inference has improved with the advent of general purpose software. This enables researchers with limited statistical skills to perform Bayesian analysis. Using MCMC sampling to do statistical inference requires convergence of the MCMC chain to its stationary distribution. There is no certain way to prove convergence; it is only possible to ascertain when convergence definitely has not been achieved. The available diagnostic methods are rather subjective and are not implemented as automatic safeguards in general MCMC software. This paper considers a pragmatic approach towards assessing the convergence of MCMC methods, illustrated by a Bayesian analysis of the Hui–Walter model for evaluating diagnostic tests in the absence of a gold standard. The Hui–Walter model has two optimal solutions, a property which causes problems with convergence when the solutions are sufficiently close in the parameter space. Using simulated data, we demonstrate tools to assess the convergence and mixing of MCMC chains using examples with and without convergence. Suggestions to remedy the situation when the MCMC sampler fails to converge are given. The epidemiological implications of the two solutions of the Hui–Walter model are discussed.

Introduction

Until recently, the use of Bayesian methods was hampered by the need for conjugate distributions where the prior distribution together with the observed data provide a posterior distribution from which estimates can be obtained analytically. An example is the Beta-Binomial, where using a Beta-distribution, e.g., p ~ Beta(a, b) as a prior for the probability (p) used in a Binomial-distribution modelling the y successes of n trials, e.g., (y|p) ~ Bin(p, n) will give a Beta posterior distribution for the probability p ~ Beta(a + y, b + n − y). However, since computing power has become readily available, the Markov Chain Monte Carlo (MCMC) methods, which can sample from most forms of posterior distribution, have become an alternative (see Gilks et al., 1996 for an introduction to some of the classic MCMC methods, such as the Gibbs Sampler or the Metropolis–Hastings Algorithm). Another factor which has increased the popularity of Bayesian methods is the advent of general purpose software such as WinBUGS (Spiegelhalter et al., 2003). The flexibility of WinBUGS allows one to specify and subsequently sample from virtually any model. This raises at least two issues for concern: First, is the model identifiable, i.e., are there enough degrees of freedom in the data to estimate the parameters, and second, how best to assess whether the chain has converged.
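
The conjugate Beta-Binomial update described above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function names and the numbers (a uniform Beta(1, 1) prior, 7 successes in 10 trials) are hypothetical and do not come from the paper.

```python
def beta_binomial_posterior(a, b, y, n):
    """Return the Beta posterior parameters (a + y, b + n - y)."""
    if not 0 <= y <= n:
        raise ValueError("y must lie between 0 and n")
    return a + y, b + n - y


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: Beta(1, 1) prior, y = 7 successes in n = 10 trials.
post_a, post_b = beta_binomial_posterior(1, 1, 7, 10)
posterior_mean = beta_mean(post_a, post_b)
print(post_a, post_b, posterior_mean)
```

Because the posterior is available in closed form here, no sampling is needed; MCMC becomes relevant once the model no longer has this conjugate structure.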

This paper focuses on the second issue, that of assessing whether or not the current chain has reached a stationary state from which samples may be obtained. We illustrate some adverse consequences of using MCMC methods and present some of the tools that can be used to avoid these outcomes in epidemiological assessments of diagnostic tests. Using a Bayesian version of the Hui–Walter model for evaluating diagnostic tests in the absence of a gold standard (Hui and Walter, 1980), we use a combination of test properties and prevalences for which convergence is easy to obtain and a set where convergence is unlikely ever to be achieved. Although the latter scenario probably has little practical importance, it allows us to compare and discuss convergence and inference for two models which are identical in structure and supported by data of the same sample size but still differ widely. We start by briefly introducing a Bayesian version of the Hui–Walter model, followed by the quintessence of MCMC sampling in order to explain the need for methods used for monitoring convergence.

Section snippets

A Bayesian Hui–Walter model

The Hui–Walter paradigm (Hui and Walter, 1980) has more or less been accepted as the standard method for evaluation of diagnostic tests in the absence of a “gold standard” test, i.e., a test with sensitivity (Se = Pr(T+|D+)=100%) and specificity (Sp = Pr(T−|D−)=100%), where Pr(T·|D·) is the probability of a certain test result given a certain disease status. Essentially the Hui–Walter method is a latent class analysis which requires that two or more tests must be evaluated in two or more
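
To make the latent class structure concrete, the sketch below computes the four cross-classified test-result probabilities for two tests in one population, under the standard Hui–Walter assumption that the tests are conditionally independent given disease status. It then shows that relabelling diseased and non-diseased animals (prevalence p becomes 1 − p, each Se becomes 1 − Sp and each Sp becomes 1 − Se) yields exactly the same probabilities, which is one way a second, mirror-image solution can arise. The function names and parameter values are hypothetical.

```python
def cell_probabilities(prev, se1, sp1, se2, sp2):
    """Probabilities of (T1+,T2+), (T1+,T2-), (T1-,T2+), (T1-,T2-)
    in one population, assuming conditional independence of the tests."""
    pp = prev * se1 * se2 + (1 - prev) * (1 - sp1) * (1 - sp2)
    pn = prev * se1 * (1 - se2) + (1 - prev) * (1 - sp1) * sp2
    np_ = prev * (1 - se1) * se2 + (1 - prev) * sp1 * (1 - sp2)
    nn = prev * (1 - se1) * (1 - se2) + (1 - prev) * sp1 * sp2
    return (pp, pn, np_, nn)


def mirror(prev, se1, sp1, se2, sp2):
    """Swap the disease labels: prev -> 1 - prev, Se -> 1 - Sp, Sp -> 1 - Se."""
    return (1 - prev, 1 - sp1, 1 - se1, 1 - sp2, 1 - se2)


# Hypothetical parameter values, chosen only for illustration.
params = (0.30, 0.90, 0.95, 0.80, 0.85)
original = cell_probabilities(*params)
mirrored = cell_probabilities(*mirror(*params))
print(original)
print(mirrored)
```

Since the data cannot distinguish the two parameter sets, a sampler without further constraints or informative priors is free to visit both.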

The quintessence of Markov Chain Monte Carlo sampling

The Monte Carlo method is essentially a numerical integration which uses a random element to obtain, for example, the mean of a random variable, or any other measure of interest. As a simple example, consider the random (continuous) variable X with density function g(x). The mean E(X) is then defined as the integral: $E(X) = \int x\, g(x)\, dx$. The Monte Carlo estimate of the mean ($E_{MC}(X)$) is obtained by sampling a sequence of random variables $X_1, X_2, \ldots, X_n$ all having the density function g(x) and then
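
As a rough illustration of the idea, the sketch below uses the usual Monte Carlo estimator, the average of n independent draws, to approximate E(X). The choice of g(x) as an exponential density with rate 2 (true mean 0.5), the seed and the sample size are assumptions made for this example only.

```python
import random


def monte_carlo_mean(sampler, n):
    """Approximate E(X) by averaging n independent draws from sampler()."""
    total = 0.0
    for _ in range(n):
        total += sampler()
    return total / n


# Assumed example density: exponential with rate 2, so E(X) = 1 / 2.
rng = random.Random(12345)
rate = 2.0
estimate = monte_carlo_mean(lambda: rng.expovariate(rate), 100_000)
print(estimate)
```

In MCMC the draws are not independent but come from a Markov chain, so the same average is only trustworthy once the chain has reached its stationary distribution.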

Assessing the performance of MCMC chains in WinBUGS

To start the sampling, the MCMC algorithm must be supplied with starting values. It is customary to run the chain for a number of iterations, the burn-in period, before starting to sample. Unfortunately there is no certain way of establishing whether a chain has converged to the distribution from which we wish to sample. It is only possible to say when it definitely has not converged. When trying to determine whether or not a chain has converged, all variables should be monitored, even if only a
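
One widely used numerical check, not necessarily the one applied in this paper, is to run several chains from dispersed starting values and compare between-chain and within-chain variability with the Gelman–Rubin potential scale reduction factor. The sketch below implements the basic form of that statistic. The "chains" are independent normal draws standing in for real sampler output, and the means 0.8 and 0.2 are hypothetical values chosen to mimic chains stuck in two different modes.

```python
import random
import statistics


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance; B: n times the variance of chain means.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(2007)
# Stand-in for three chains exploring the same mode.
mixed = [[rng.gauss(0.8, 0.05) for _ in range(1000)] for _ in range(3)]
# Stand-in for chains stuck in different modes (hypothetical means).
stuck = [[rng.gauss(mu, 0.05) for _ in range(1000)] for mu in (0.8, 0.2, 0.8)]

r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
print(r_mixed, r_stuck)
```

A value close to 1 is consistent with, but does not prove, convergence; a value well above 1 indicates that the chains disagree and that pooled summaries should not be trusted.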

Improving the Bayesian Hui–Walter model

The lack of convergence of the Hui–Walter model for Scenario A cannot be solved by thinning or running the chain(s) for a longer period. Despite our best efforts, the Monte Carlo estimates of the mean Se, Sp and prevalences end up being approximately 0.5 and all distributions are bimodal (for the prevalences these peaks will be much less visible than for the test properties). Still, it would be desirable to analyze Scenario A without having to resort to a maximum likelihood approach, as faith in the

Discussion

The objective of this paper was to illustrate the importance of assessing convergence of MCMC methods and to present techniques to readily determine if convergence has been achieved. As an illustrative model we used the two-test two-population Hui–Walter model, and the problems we have discussed in this paper were caused by a combination of poor diagnostic tests, insufficient differences in prevalences, but mostly by a naive implementation of the model. Hence one obvious lesson from this

Concluding remarks

MCMC sampling can be dangerous. To the authors' knowledge, none of the current general purpose MCMC software packages automatically monitors the convergence of the samples. All the techniques presented in this paper are implemented in, e.g., WinBUGS, but it is up to the user to apply these in order to assess convergence of the MCMC samples before making inferences about the parameters of interest. The accessibility of the different MCMC software is now at a level where it no longer takes a lot of

Acknowledgements

This research was funded as part of the Wellcome Trust International Partnership Research Awards in Veterinary Epidemiology. Nils Toft was supported by a grant from The Danish Research Agency.
