Approximate Bayesian inference for hierarchical Gaussian Markov random field models
Introduction
A Gaussian Markov random field (GMRF) is an n-dimensional Gaussian random vector x = (x_1, ..., x_n)^T with additional conditional independence, or Markov, properties. The conditional independence properties can be represented using an undirected graph G = (V, E) with vertices V = {1, ..., n} and edges E. Two nodes, i and j, are conditionally independent given the remaining elements of x if and only if {i, j} is not an edge of G. We then say that x is a GMRF with respect to G. The edges of G are in one-to-one correspondence with the non-zero elements of the precision matrix Q of x, in the sense that Q_ij ≠ 0 if and only if {i, j} ∈ E, for i ≠ j. When {i, j} ∈ E we say that i and j are neighbours, which we denote by i ~ j.
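As a concrete illustration of the correspondence between graph edges and non-zero precision entries, the following sketch (a stationary first-order autoregression standing in for a generic GMRF; the function name and parameter values are ours, not the paper's) builds a tridiagonal precision matrix and inspects its dense inverse:

```python
import numpy as np

def ar1_precision(n, phi, kappa=1.0):
    """Precision matrix Q of a stationary AR(1) process with lag-one
    correlation parameter phi and innovation precision kappa.

    Q is tridiagonal: node t is conditionally independent of every node
    except its neighbours t-1 and t+1, so Q[i, j] = 0 whenever |i - j| > 1.
    """
    Q = np.zeros((n, n))
    for t in range(n):
        Q[t, t] = kappa * (1.0 + (phi**2 if 0 < t < n - 1 else 0.0))
        if t + 1 < n:
            Q[t, t + 1] = Q[t + 1, t] = -kappa * phi
    return Q

Q = ar1_precision(5, 0.8)
Sigma = np.linalg.inv(Q)   # dense: marginal dependence between all pairs
```

Here Q[0, 2] = 0 expresses that x_0 and x_2 are conditionally independent given the rest, while the covariance matrix Sigma has no zero entries: sparseness lives in the precision, not the covariance.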
GMRFs are also known as conditional auto-regressions (CARs), following the seminal work of Besag (1974, 1975). GMRFs (and their intrinsic versions) are broadly used in statistics, with important applications in structural time-series analysis, analysis of longitudinal and survival data, graphical models, semiparametric regression and splines, image analysis and spatial statistics. For references and examples, see Rue and Held (2005, Chapter 1).
One of the main areas of application for GMRFs is that of (Bayesian) hierarchical models. A hierarchical model is characterised by several stages of observables and parameters. The first stage typically consists of distributional assumptions for the observables conditional on latent parameters. For example, if we observe a time series of counts y = (y_1, ..., y_n)^T, we may assume, for each i, a Poisson distribution with unknown mean λ_i. Given the parameters of the observation model, we often assume the observations to be conditionally independent. The second stage consists of a prior model for the latent parameters or, more often, for a particular function of them. For example, in the Poisson case we can choose an exponential link λ_i = exp(x_i) and model the random variables x = (x_1, ..., x_n)^T as a GMRF. At this stage GMRFs provide a flexible tool to model the dependence between the latent parameters and thus, implicitly, the dependence between the observed data. This dependence can be of various kinds, such as temporal, spatial, or even spatiotemporal. The third stage consists of prior distributions for the unknown hyperparameters θ; these are typically precision parameters in the GMRF. The posterior of interest is then

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_i π(y_i | x_i).   (1)

Most hierarchical GMRF-models can be written in this form. If there are unknown parameters also in the likelihood, then the last term in (1) depends on θ as well. Such an extension makes only a slight notational difference in what follows.
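The three stages can be sketched in code as follows. The AR(1) structure matrix, the Gamma(a, b) hyperprior and all parameter values here are our own illustrative choices, not those of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Structure matrix R of a stationary AR(1) GMRF, so that
# x | kappa ~ N(0, (kappa * R)^(-1)); R is tridiagonal (Markov property).
phi = 0.9
R = np.zeros((n, n))
for t in range(n):
    R[t, t] = 1.0 + (phi**2 if 0 < t < n - 1 else 0.0)
    if t + 1 < n:
        R[t, t + 1] = R[t + 1, t] = -phi

# Stage 3: hyperparameter (precision scale) with a Gamma(a, b) prior.
a, b = 2.0, 0.01
kappa = rng.gamma(a, 1.0 / b)

# Stage 2: latent GMRF x | kappa, sampled via the Cholesky factor of Q.
L = np.linalg.cholesky(kappa * R)
x = np.linalg.solve(L.T, rng.standard_normal(n))   # x ~ N(0, (kappa R)^-1)

# Stage 1: conditionally independent Poisson counts with a log link.
y = rng.poisson(np.exp(x))

def log_posterior(x, kappa, y):
    """Unnormalised log pi(x, kappa | y) as in (1): log hyperprior
    + log GMRF prior + Poisson log-likelihood, constants dropped."""
    lp = (a - 1) * np.log(kappa) - b * kappa                 # hyperprior
    lp += 0.5 * n * np.log(kappa) - 0.5 * kappa * x @ R @ x  # GMRF prior
    lp += np.sum(y * x - np.exp(x))                          # likelihood
    return lp
```

The factorisation in `log_posterior` mirrors (1) term by term; note that the likelihood contribution for each i involves x_i only.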
The main goal is often to compute posterior marginals, like

π(x_i | y) = ∫ π(x_i | θ, y) π(θ | y) dθ   (2)

for each i, and (sometimes also) posterior marginals for the hyperparameters θ. Since analytical integration is usually not possible for the posterior (1), it is common to use Markov chain Monte Carlo (MCMC)-based inference to estimate the posterior marginals. These marginals can then be used to compute marginal expectations of various statistics. Although single-site schemes, updating each element of (x, θ) individually, are always possible, they may converge slowly due to the hierarchical structure of the problem. We refer to Rue and Held (2005, Chapter 4) for further discussion. (In some cases reparametrisation may solve the convergence problems due to the hierarchical structure (Gelfand et al., 1995; Papaspiliopoulos et al., 2003), but see also Wilkinson, 2003.) In the case of disease mapping, Knorr-Held and Rue (2002) discuss various blocking strategies for updating all the unknown variables jointly to improve convergence, and Rue and Held (2005, Chapter 4) develop these ideas further. Even if blocking strategies improve convergence, MCMC techniques require a large number of samples to achieve precise estimates. In this paper we propose a deterministic alternative to MCMC-based inference which has the advantage of being computed almost instantly and which, in our examples, proves to be quite accurate. The key to the fast computing time lies in the sparseness of the precision matrix Q due to the Markov properties of the GMRF. This characteristic allows the use of efficient numerical algorithms and, as explained in Section 2, makes it possible to compute marginal variances without inverting Q.
One way to introduce our approximation technique is to start from the blocking strategies proposed in Knorr-Held and Rue (2002) and Rue and Held (2005, Chapter 4). The main idea behind these is that the full conditional for the zero-mean GMRF x,

π(x | θ, y) ∝ exp( −(1/2) x^T Q x + Σ_i log π(y_i | x_i) ),   (3)

can often be well approximated by a Gaussian distribution, obtained by matching the mode and the curvature at the mode. The resulting approximation is

π̃(x | θ, y) ∝ exp( −(1/2) (x − x*)^T (Q + diag(c)) (x − x*) ),   (4)

where x* is the mode of π(x | θ, y). Note that x* and c (and hence (4)) depend on θ, but we suppress this dependence to simplify the notation. The terms of the vector c are due to the second-order terms in the Taylor expansion of Σ_i log π(y_i | x_i) at the modal value x*, and these terms are zero for nodes not directly observed through the data. We call the approximation in (4) the GMRF-approximation. The GMRF-approximation is itself a GMRF with respect to the graph G since, by assumption, each y_i depends only on x_i, a fact that is important computationally.
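A minimal sketch of the GMRF-approximation for Poisson observations with a log link, so that log π(y_i | x_i) = y_i x_i − exp(x_i) up to a constant and the curvature terms are c_i = exp(x_i*). Dense linear algebra is used for clarity; in practice the solve would use sparse-matrix algorithms:

```python
import numpy as np

def gmrf_approximation(Q, y, n_iter=30, tol=1e-10):
    """GMRF-approximation (4): fit a Gaussian to
    pi(x | theta, y) ∝ exp(-x^T Q x / 2 + sum_i [y_i x_i - exp(x_i)])
    by matching the mode and the curvature at the mode (Newton iterations)."""
    x = np.zeros(len(y))
    for _ in range(n_iter):
        c = np.exp(x)                  # minus second derivative of Poisson log-lik
        b = y - c + c * x              # linearised gradient at the current x
        x_new = np.linalg.solve(Q + np.diag(c), b)
        if np.max(np.abs(x_new - x)) < tol:
            x = x_new
            break
        x = x_new
    return x, Q + np.diag(np.exp(x))   # mode x* and precision Q + diag(c)
```

At convergence the returned mode satisfies the stationarity condition Q x* = y − exp(x*), and the returned precision matrix inherits the sparsity pattern of Q plus its diagonal.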
Following Knorr-Held and Rue (2002) and Rue and Held (2005, Chapter 4), we can often construct a one-block sampler for (θ, x), which proposes the new candidate (θ′, x′) by

θ′ ∼ q(θ′ | θ),  x′ ∼ π̃(x | θ′, y),   (5)

and then accepts or rejects (θ′, x′) jointly. This one-block algorithm is made possible, in practice, by the outstanding computational properties of GMRFs through numerical algorithms for sparse matrices (Rue, 2001; Rue and Held, 2005). Even very large GMRFs are indeed tractable.
In those cases where the dimension of θ is small (less than three, say) it is possible to derive an independence sampler by reusing (4) to build an approximation of the marginal posterior for θ. The starting point is the identity

π(θ | y) ∝ π(x, θ, y) / π(x | θ, y).   (6)

By approximating the denominator via expression (4) and evaluating the right-hand side at the modal value x*(θ) (for each θ), we obtain an approximation for the marginal posterior, which we denote by π̃(θ | y). This approximation is in fact the Laplace approximation suggested by Tierney and Kadane (1986), who showed that its relative error is O(N^(−3/2)) after renormalisation. (Here, N is the number of observations.) The approximation π̃(θ | y) then replaces π(θ | y) in the one-block algorithm above. The independence sampler uses the approximation

π̃(θ, x | y) = π̃(θ | y) π̃(x | θ, y).

A natural question arises here: if we can use π̃(θ, x | y) to construct an independence sampler to explore π(θ, x | y), why not just compute approximations to the marginals from π̃(θ, x | y) directly?
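A sketch of how π̃(θ | y) can be evaluated for a single hyperparameter θ = log κ, combining identity (6) with the GMRF-approximation (4). The model (Gamma hyperprior, AR(1)-type structure matrix R, Poisson observations) and all parameter values are illustrative assumptions, not the paper's:

```python
import numpy as np

def log_posterior_theta(log_kappa, y, R, a=1.0, b=0.01):
    """Unnormalised log pi~(theta | y), theta = log kappa, for the model
    kappa ~ Gamma(a, b), x | kappa ~ N(0, (kappa R)^(-1)),
    y_i | x_i ~ Poisson(exp(x_i)); additive constants are dropped."""
    kappa = np.exp(log_kappa)
    Q = kappa * R
    # Newton iterations for the mode x* of pi(x | theta, y), as in (4).
    x = np.zeros(len(y))
    for _ in range(50):
        c = np.exp(x)
        x_new = np.linalg.solve(Q + np.diag(c), y - c + c * x)
        if np.max(np.abs(x_new - x)) < 1e-10:
            x = x_new
            break
        x = x_new
    P = Q + np.diag(np.exp(x))                # precision of the GMRF-approximation
    _, logdet_Q = np.linalg.slogdet(Q)
    _, logdet_P = np.linalg.slogdet(P)
    log_prior = a * log_kappa - b * kappa     # Gamma prior + Jacobian of theta = log kappa
    log_x = 0.5 * logdet_Q - 0.5 * x @ Q @ x  # log pi(x* | theta)
    log_lik = np.sum(y * x - np.exp(x))       # Poisson terms at the mode
    log_gauss = 0.5 * logdet_P                # log pi~(x* | theta, y); exponent is 0 at the mode
    return log_prior + log_x + log_lik - log_gauss
```

Evaluating this function over a grid of θ-values and renormalising gives the discrete approximation to π(θ | y) used by the independence sampler, and below by (8).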
Since (4) is Gaussian, its marginals

π̃(x_i | θ, y) = N(x_i; μ_i(θ), σ_i²(θ))   (7)

are available, so it is, theoretically, always possible to (approximately) compute the marginal for the x_i's as

π̃(x_i | y) = Σ_k π̃(x_i | θ_k, y) π̃(θ_k | y) Δ_k,   (8)

by simply summing out θ by some numerical integration rule, where Δ_k is the weight associated with θ_k. The approximated marginal posterior (8) is a mixture of Gaussians whose weights, means and variances are computed from (7). However, the dimension n of x is usually large, so obtaining the marginal variances σ_i²(θ) is computationally intensive (recall that only the precision matrix is explicitly known). Therefore, the marginals in (8) are, in practice, possible to compute only for GMRFs since, in these cases, efficient computations are possible. A recursion algorithm to efficiently compute marginal variances for GMRFs is described in Section 2.
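Given the conditional means μ_i(θ_k), variances σ_i²(θ_k) and log-weights log π̃(θ_k | y) on a grid of hyperparameter values, the mixture (8) is straightforward to evaluate. This sketch assumes those inputs have already been computed (function and argument names are ours):

```python
import numpy as np

def mixture_marginal(x_grid, means, variances, log_weights):
    """Evaluate the approximation (8): a finite mixture of Gaussians
    sum_k pi~(x_i | theta_k, y) * w_k over a grid of hyperparameter values.

    means[k], variances[k]: conditional mean/variance of x_i given theta_k;
    log_weights[k]: log pi~(theta_k | y) plus the log integration weight.
    """
    w = np.exp(log_weights - np.max(log_weights))   # stabilised exponentiation
    w /= w.sum()                                    # renormalise the weights
    dens = np.zeros_like(x_grid, dtype=float)
    for m, v, wk in zip(means, variances, w):
        dens += wk * np.exp(-0.5 * (x_grid - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
    return dens
```

With a single grid point the mixture collapses to one Gaussian; with several, the spread of the means and variances across θ_k propagates hyperparameter uncertainty into the marginal.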
Although any MCMC algorithm will guarantee the correct answer in the end, the question is what happens in finite time. The Monte Carlo error is O(M^(−1/2)), where M is the (effective) number of samples; hence, the strength of the MCMC approach is to provide rough, (nearly) unbiased estimates rather quickly, while precise estimates may take an unreasonably long time. Any (deterministic) approximate inference can in fact compete with an MCMC approach as long as its squared "bias", or error, is comparable with the Monte Carlo error. The most interesting aspect of approximation (8) is that it can be computed almost instantly, compared with the time any MCMC algorithm would have to run to obtain any decent accuracy.
The aim of this paper is to investigate how accurate (8) is for some typical examples of hierarchical GMRF-models. In Section 3 we report some experiments using models for disease mapping on a varying scale of difficulty. We compare the marginals of interest as approximated by (8) and as estimated from very long MCMC runs. The results are very positive. Before presenting the examples, we discuss in Section 2 how to efficiently compute the marginal variances needed in expression (8) for GMRFs. This section also explains (implicitly) why fast computations for GMRFs are possible using numerical methods for sparse matrices. Section 2 is unavoidably somewhat technical, but it is not needed to appreciate the results in Section 3. We end with a discussion in Section 4.
Computing marginal variances for a GMRF
GMRFs are nearly always specified by their precision matrix Q, meaning that the covariance matrix Σ = Q^(−1) is only implicitly known. Although we can formally invert Q, the dimension n is typically large, so inverting Q directly is costly and inconvenient. In this section we discuss a simple and fast algorithm to compute marginal variances, applicable for GMRFs of large dimension. The starting point is a not-well-known matrix identity which appeared in an IEEE conference proceedings…
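The kind of recursion alluded to above can be illustrated in dense form: writing Q = L L^T for the Cholesky factor L, one obtains Σ_ij = δ_ij / L_ii² − (1/L_ii) Σ_{k>i} L_ki Σ_kj, which can be solved from the last row upwards. A small dense sketch follows; for a sparse Q, in practice, only the entries matching the Cholesky fill-in would be computed, which is what makes large GMRFs tractable:

```python
import numpy as np

def marginal_variances(Q):
    """Compute Sigma = Q^(-1) from the Cholesky factor Q = L L^T via the
    recursion Sigma_ij = delta_ij / L_ii^2 - (1/L_ii) sum_{k>i} L_ki Sigma_kj,
    filled in from the last row upwards (dense illustration)."""
    L = np.linalg.cholesky(Q)
    n = Q.shape[0]
    S = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i - 1, -1):
            s = (1.0 if i == j else 0.0) / L[i, i] ** 2
            s -= (L[i + 1:, i] @ S[i + 1:, j]) / L[i, i]   # uses rows below i
            S[i, j] = S[j, i] = s                           # exploit symmetry
    return S
```

The diagonal of the returned matrix gives the marginal variances; no explicit matrix inversion is performed, only one Cholesky factorisation and back-substitutions.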
Examples
In this section, we present some results for the approximate marginal posteriors computed from (7), and compare them with estimates obtained from very long MCMC runs. We restrict ourselves to the well-known BYM-model for disease mapping (Section 3.1). The BYM-model is a hierarchical GMRF-model with Poisson distributions at the first stage. We use two different data sets, which we describe as "easy" (many counts) and "hard" (few counts). The comparison of the marginal…
Discussion
In this report we have investigated how marginal posterior densities can be approximated using the GMRF-approximation in (8). We apply the GMRF-approximation to the full conditional for the latent GMRF component in hierarchical GMRF-models, and use it to approximate both the marginal posteriors for the hyperparameters and the marginal posteriors for the components of the latent GMRF itself. We have also discussed how to compute marginal variances for GMRFs with and without linear constraints, and…
Acknowledgement
The authors are grateful to the reviewers and the Editor for their helpful comments.
References

- Bernardinelli, L., Pascutto, C., Best, N.G., Gilks, W.R., 1997. Disease mapping with errors in covariates. Statist. Med.
- Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B.
- Besag, J., 1975. Statistical analysis of non-lattice data. Statistician.
- Besag, J., 1981. On a system of two-dimensional recurrence equations. J. Roy. Statist. Soc. Ser. B.
- Besag, J., York, J., Mollié, A., 1991. Bayesian image restoration with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math.
- Diggle, P.J., Tawn, J.A., Moyeed, R.A., 1998. Model-based geostatistics (with discussion). J. Roy. Statist. Soc. Ser. C.
- Erisman, A.M., Tinney, W.F., 1975. On computing certain elements of the inverse of a sparse matrix. Commun. ACM.
- Gelfand, A.E., Sahu, S.K., Carlin, B.P., 1995. Efficient parameterisations for normal linear mixed models. Biometrika.
- Knorr-Held, L., Raßer, G., 2000. Bayesian detection of clusters and discontinuities in disease maps. Biometrics.
- Knorr-Held, L., Rue, H., 2002. On block updating in Markov random field models for disease mapping. Scand. J. Statist.