Assessing the convergence of Markov Chain Monte Carlo methods: An example from evaluation of diagnostic tests in absence of a gold standard
Introduction
Until recently, the use of Bayesian methods was hampered by the need for conjugate distributions where the prior distribution together with the observed data provide a posterior distribution from which estimates can be obtained analytically. An example is the Beta-Binomial, where using a Beta-distribution, e.g., p ∼ Beta(a, b) as a prior for the probability (p) used in a Binomial-distribution modelling the y successes of n trials, e.g., (y|p) ∼ Bin(p, n) will give a Beta posterior distribution for the probability p ∼ Beta(a + y, b + n − y). However, since computing power has become readily available, the Markov Chain Monte Carlo (MCMC) methods, which can sample from most forms of posterior distribution, have become an alternative (see Gilks et al., 1996 for an introduction to some of the classic MCMC methods, such as the Gibbs Sampler or the Metropolis–Hastings Algorithm). Another factor which has increased the popularity of Bayesian methods is the advent of general purpose software such as WinBUGS (Spiegelhalter et al., 2003). The flexibility of WinBUGS allows one to specify and subsequently sample from virtually any model. This raises at least two issues for concern: First, is the model identifiable, i.e., are there enough degrees of freedom in the data to estimate the parameters, and second, how best to assess whether the chain has converged.
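The Beta-Binomial update described above can be written out as a short sketch. The function names and the numbers used (a uniform Beta(1, 1) prior, 7 successes in 10 trials) are illustrative assumptions and do not come from the paper.

```python
def beta_binomial_posterior(a, b, y, n):
    """Conjugate update: prior p ~ Beta(a, b), data y successes in n trials.

    Returns the parameters (a + y, b + n - y) of the Beta posterior.
    """
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    return a + y, b + n - y


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: uniform prior Beta(1, 1), 7 successes in 10 trials.
post_a, post_b = beta_binomial_posterior(1.0, 1.0, 7, 10)
posterior_mean = beta_mean(post_a, post_b)
```

No sampling is needed here because the posterior is available in closed form; MCMC becomes relevant once the model no longer has this conjugate structure.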
This paper focuses on the second issue, that of assessing whether the current chain has reached a stationary state from which samples may be obtained or not. We illustrate some adverse consequences of using MCMC methods and present some of the tools that can be used to avoid these outcomes in epidemiological assessments of diagnostic tests. Using a Bayesian version of the Hui–Walter model for evaluating diagnostic tests in the absence of a gold standard (Hui and Walter, 1980), we use a combination of test properties and prevalences for which convergence is easy to obtain and a set where convergence is unlikely ever to be achieved. Although the latter scenario probably has little practical importance, it allows us to compare and discuss convergence and inference for two models which are identical in structure and supported by data of the same sample size but still differ widely. We start by briefly introducing a Bayesian version of the Hui–Walter model, followed by the quintessence of MCMC sampling in order to explain the need for methods used for monitoring convergence.
A Bayesian Hui–Walter model
The Hui–Walter paradigm (Hui and Walter, 1980) has more or less been accepted as the standard method for evaluation of diagnostic tests in the absence of a “gold standard” test, i.e., a test with sensitivity (Se = Pr(T+|D+)=100%) and specificity (Sp = Pr(T−|D−)=100%), where Pr(T·|D·) is the probability of a certain test result given a certain disease status. Essentially the Hui–Walter method is a latent class analysis which requires that two or more tests must be evaluated in two or more
The quintessence of Markov Chain Monte Carlo sampling
The Monte Carlo method is essentially a numerical integration which uses a random element to obtain, for example, the mean of a random variable, or any other measure of interest. As a simple example, consider the random (continuous) variable X with density function g(x). The mean E(X) is then defined as the integral:
E(X) = ∫ x g(x) dx
The Monte Carlo estimate of the mean (E_MC(X)) is obtained by sampling a sequence of random variables X1, X2, …, Xn all having the density function g(x) and then
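As a minimal sketch of the idea, the standard Monte Carlo estimator of a mean is the average of independent draws from g(x). The exponential density with rate 2 (true mean 0.5), the sample size and the seed are assumptions chosen for illustration only.

```python
import random


def monte_carlo_mean(sampler, n):
    """Average n independent draws returned by sampler()."""
    total = 0.0
    for _ in range(n):
        total += sampler()
    return total / n


# Hypothetical target: exponential density with rate 2, so E(X) = 0.5.
rng = random.Random(12345)
estimate = monte_carlo_mean(lambda: rng.expovariate(2.0), 100_000)
```

With independent draws the error of the estimate shrinks at the rate 1/sqrt(n). Draws from a Markov chain are correlated, which is one reason convergence and mixing need to be checked before such averages are trusted.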
Assessing the performance of MCMC chains in WinBUGS
To start the sampling, the MCMC algorithm must be supplied with starting values. It is customary to run the chain for a number of iterations, the burn-in period, before starting to sample. Unfortunately, there is no certain way of establishing whether a chain has converged to the distribution from which we wish to sample; it is only possible to say when it definitely has not converged. When trying to determine whether or not a chain has converged, all variables should be monitored, even if only a
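One widely used check that compares several chains started from different values is the Gelman-Rubin potential scale reduction factor. The sketch below implements its basic form on simulated draws. The excerpt does not state which diagnostics the paper applies, so the choice of this statistic, the function name and the simulated chains are all assumptions.

```python
import random
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for equal-length chains of one parameter.

    Values close to 1 are compatible with convergence; values well above 1
    show that the chains disagree, i.e. that convergence has not been reached.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(1)
# Hypothetical chains that all explore the same distribution.
mixed = [[rng.gauss(0.5, 0.1) for _ in range(1000)] for _ in range(3)]
# Hypothetical chains that are stuck in two different regions.
stuck = [[rng.gauss(mu, 0.02) for _ in range(1000)] for mu in (0.2, 0.8)]

rhat_mixed = potential_scale_reduction(mixed)
rhat_stuck = potential_scale_reduction(stuck)
```

In line with the text above, a value near 1 does not prove convergence; the statistic is only able to flag chains that clearly have not converged.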
Improving the Bayesian Hui–Walter model
The lack of convergence of the Hui–Walter model for Scenario A cannot be solved by thinning or by running the chain(s) for a longer period. Despite our best efforts, the Monte Carlo estimates of the mean Se, Sp and prevalences end up being approximately 0.5 and all distributions are bimodal (for the prevalences these peaks will be much less visible than for the test properties). Still, it would be desirable to analyze Scenario A without having to resort to a maximum likelihood approach, as faith in the
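A known property of the two-test latent class model with conditionally independent tests is consistent with the bimodality described above: replacing (prevalence, Se, Sp) by (1 − prevalence, 1 − Sp, 1 − Se) leaves every cell probability unchanged, so the likelihood has two mirror-image solutions. The sketch below checks this numerically for one population. The parameter values are hypothetical, and the excerpt does not confirm that this symmetry is the explanation the authors give.

```python
def cell_probabilities(prev, se1, sp1, se2, sp2):
    """Probabilities of (T1+,T2+), (T1+,T2-), (T1-,T2+), (T1-,T2-) in one
    population, assuming the tests are independent given disease status."""
    pp = prev * se1 * se2 + (1 - prev) * (1 - sp1) * (1 - sp2)
    pn = prev * se1 * (1 - se2) + (1 - prev) * (1 - sp1) * sp2
    np_ = prev * (1 - se1) * se2 + (1 - prev) * sp1 * (1 - sp2)
    nn = prev * (1 - se1) * (1 - se2) + (1 - prev) * sp1 * sp2
    return (pp, pn, np_, nn)


def mirror(prev, se1, sp1, se2, sp2):
    """Swap the roles of diseased and non-diseased animals."""
    return (1 - prev, 1 - sp1, 1 - se1, 1 - sp2, 1 - se2)


# Hypothetical parameter values, for illustration only.
theta = (0.3, 0.9, 0.95, 0.8, 0.85)
original = cell_probabilities(*theta)
mirrored = cell_probabilities(*mirror(*theta))
max_diff = max(abs(a - b) for a, b in zip(original, mirrored))
```

Because both parameter sets predict the same data, a sampler with priors that do not favour one of them can visit both solutions, and averaging over the two pulls the posterior means towards 0.5.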
Discussion
The objective of this paper was to illustrate the importance of assessing convergence of MCMC methods and to present techniques to readily determine whether convergence has been achieved. As an illustrative model we used the two-test two-population Hui–Walter model, and the problems we have discussed in this paper were caused by a combination of poor diagnostic tests and insufficient differences in prevalences, but mostly by a naïve implementation of the model. Hence one obvious lesson from this
Concluding remarks
MCMC sampling can be dangerous. To the authors' knowledge, none of the current general purpose MCMC software packages automatically monitors the convergence of the samples. All the techniques presented in this paper are implemented in, e.g., WinBUGS, but it is up to the user to apply them in order to assess convergence of the MCMC samples before making inferences about the parameters of interest. The accessibility of the different MCMC software is now at a level where it no longer takes a lot of
Acknowledgements
This research was funded as part of the Wellcome Trust International Partnership Research Awards in Veterinary Epidemiology. Nils Toft was supported by a grant from The Danish Research Agency.
References (14)
- et al. Estimation of diagnostic-test sensitivity and specificity through Bayesian modeling. Prevent. Vet. Med. (2005)
- et al. Diagnosing diagnostic tests: evaluating the assumptions underlying the estimation of sensitivity and specificity in the absence of a gold standard. Prevent. Vet. Med. (2005)
- Re: Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am. J. Epidemiol. (1997)
- et al. Estimating disease prevalence in the absence of a gold standard. Stat. Med. (2002)
- et al. Alternative methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat. (1998)
- et al. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics (2001)
- et al. Correlation-adjusted estimation of sensitivity and specificity of two diagnostic tests. J. R. Stat. Soc. Ser. C: Appl. Stat. (2003)