Approximate Bayesian inference for hierarchical Gaussian Markov random field models

https://doi.org/10.1016/j.jspi.2006.07.016

Abstract

Many commonly used models in statistics can be formulated as (Bayesian) hierarchical Gaussian Markov random field (GMRF) models. These are characterised by assuming an (often large) GMRF as the second stage in the hierarchical structure and a few hyperparameters at the third stage. Markov chain Monte Carlo (MCMC) is the common approach for Bayesian inference in such models. The error of the Monte Carlo estimates is $O_P(M^{-1/2})$, where $M$ is the number of samples in the chain, so in order to obtain precise estimates of marginal densities, say, we need $M$ to be very large.

Inspired by the fact that one-block and independence samplers can often be constructed for hierarchical GMRF-models, we investigate in this work whether MCMC is really needed to estimate marginal densities, which is often the goal of the analysis. By making use of GMRF-approximations, we show through typical examples that marginal densities can indeed be estimated very precisely by deterministic schemes. The methodological and practical consequences of these findings are indeed positive. We conjecture that for many hierarchical GMRF-models there is really no need for MCMC-based inference to estimate marginal densities. Further, by making use of numerical methods for sparse matrices, these deterministic schemes are nearly instant to compute compared with the MCMC alternative. In particular, we discuss in detail the issue of computing marginal variances for GMRFs.

Introduction

A Gaussian Markov random field (GMRF) $x = \{x_i : i \in V\}$ is an $n = |V|$-dimensional Gaussian random vector with additional conditional independence, or Markov, properties. Assume that $V = \{1, \dots, n\}$. The conditional independence properties can be represented using an undirected graph $G = (V, E)$ with vertices $V$ and edges $E$. Two nodes $x_i$ and $x_j$ are conditionally independent given the remaining elements of $x$ if and only if $\{i, j\} \notin E$. Then we say that $x$ is a GMRF with respect to $G$. The edges in $E$ are in one-to-one correspondence with the non-zero elements of the precision matrix $Q$ of $x$, in the sense that $\{i, j\} \in E$ if and only if $Q_{ij} \neq 0$ for $i \neq j$. When $\{i, j\} \in E$ we say that $i$ and $j$ are neighbours, which we denote by $i \sim j$.

GMRFs are also known as conditional auto-regressions (CARs), following the seminal work of Besag (1974, 1975). GMRFs (and their intrinsic versions) are in broad use in statistics, with important applications in structural time-series analysis, analysis of longitudinal and survival data, graphical models, semi-parametric regression and splines, image analysis and spatial statistics. For references and examples, see Rue and Held (2005, Chapter 1).

One of the main areas of application for GMRFs is that of (Bayesian) hierarchical models. A hierarchical model is characterised by several stages of observables and parameters. The first stage typically consists of distributional assumptions for the observables conditionally on latent parameters. For example, if we observe a time series of counts $y$, we may assume for $y_i$, $i \in D \subseteq V$, a Poisson distribution with unknown mean $\lambda_i$. Given the parameters of the observation model, we often assume the observations to be conditionally independent. The second stage consists of a prior model for the latent parameters $\lambda_i$ or, more often, for a particular function of them. For example, in the Poisson case we can choose a log link and model the random variables $x_i = \log(\lambda_i)$. At this stage GMRFs provide a flexible tool to model the dependence between the latent parameters and thus, implicitly, the dependence between the observed data. This dependence can be of various kinds, such as temporal, spatial, or even spatio-temporal. The third stage consists of prior distributions for the unknown hyperparameters $\theta$, which are typically precision parameters in the GMRF. The posterior of interest is then
$$\pi(x, \theta \mid y) \propto \pi(x \mid \theta)\, \pi(\theta) \prod_{i \in D} \pi(y_i \mid x_i). \qquad (1)$$
Most hierarchical GMRF-models can be written in this form. If there are unknown parameters also in the likelihood, then the last term in (1) also depends on $\theta$. Such an extension makes only a slight notational difference in what follows.

The main goal is often to compute posterior marginals, such as
$$\pi(x_i \mid y) = \int_\theta \int_{x_{-i}} \pi(x, \theta \mid y) \, \mathrm{d}x_{-i} \, \mathrm{d}\theta \qquad (2)$$
for each $i$ and (sometimes also) posterior marginals for the hyperparameters $\theta_j$. Since analytical integration is usually not possible for the posterior $\pi(x, \theta \mid y)$, it is common to use Markov chain Monte Carlo (MCMC)-based inference to estimate the posterior marginals. These marginals can then be used to compute marginal expectations of various statistics. Although single-site schemes, updating each element of $(x, \theta)$ individually, are always possible, they may converge slowly due to the hierarchical structure of the problem; we refer to Rue and Held (2005, Chapter 4) for further discussion. (In some cases reparametrisation may solve the convergence problems due to the hierarchical structure (Gelfand et al., 1995; Papaspiliopoulos et al., 2003), but see also Wilkinson (2003).) In the case of disease mapping, Knorr-Held and Rue (2002) discuss various blocking strategies for updating all the unknown variables jointly to improve convergence, and Rue and Held (2005, Chapter 4) develop these ideas further. Even when blocking strategies improve convergence, MCMC techniques require a large number of samples to achieve precise estimates. In this paper we propose a deterministic alternative to MCMC-based inference which has the advantage of being computed almost instantly and which, in our examples, proves to be quite accurate. The key to the fast computing time lies in the sparseness of the precision matrix $Q$ due to the Markov properties of the GMRF. This characteristic allows the use of efficient algorithms and, as explained in Section 2, makes it possible to compute marginal variances without inverting $Q$.

One way to introduce our approximation technique is to start from the blocking strategies proposed in Knorr-Held and Rue (2002) and Rue and Held (2005, Chapter 4). The main idea behind these is to exploit the fact that the full conditional for the zero-mean GMRF $x$,
$$\pi(x \mid \theta, y) \propto \exp\!\Big(-\tfrac{1}{2} x^T Q x + \sum_{i \in D} \log \pi(y_i \mid x_i)\Big), \qquad (3)$$
can often be well approximated by a Gaussian distribution obtained by matching the mode and the curvature at the mode. The resulting approximation is
$$\tilde{\pi}(x \mid \theta, y) \propto \exp\!\big(-\tfrac{1}{2} (x - \mu)^T (Q + \operatorname{diag}(c)) (x - \mu)\big), \qquad (4)$$
where $\mu$ is the mode of $\pi(x \mid \theta, y)$. Note that $\mu$ and $Q$ (and hence (4)) depend on $\theta$, but we suppress this dependence to simplify the notation. The terms of the vector $c$ come from the second-order terms in the Taylor expansion of $\log \pi(y_i \mid x_i)$ around the modal value $\mu$, and these terms are zero for the nodes not directly observed through the data. We call the approximation in (4) the GMRF-approximation. The GMRF-approximation is itself a GMRF with respect to the graph $G$ since, by assumption, each $y_i$ depends only on $x_i$, a fact that is important computationally.
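To make the construction of (4) concrete, the following is a minimal sketch (ours, not from the paper) of computing the GMRF-approximation by Newton iteration for the Poisson model with log link described above; the function name and interface are hypothetical, and sparse matrices are used throughout so that each step costs a sparse solve rather than a dense factorisation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def gmrf_approximation(Q, y, obs_idx, n_iter=20, tol=1e-8):
    """Sketch of the GMRF-approximation (4) to pi(x | theta, y) for a
    Poisson likelihood with log link, y_i ~ Poisson(exp(x_i)), i in obs_idx.

    Newton iteration: expand each log pi(y_i | x_i) to second order around
    the current mu, solve the resulting Gaussian system, and repeat.
    Returns the mode mu and the precision Q + diag(c) of the approximation.
    """
    n = Q.shape[0]
    mu = np.zeros(n)
    for _ in range(n_iter):
        lam = np.exp(mu[obs_idx])
        c = np.zeros(n)              # -d^2/dx^2 log pi(y_i | x_i) = exp(x_i)
        b = np.zeros(n)              # linear term of the quadratic expansion
        c[obs_idx] = lam
        b[obs_idx] = y - lam + lam * mu[obs_idx]
        mu_new = spla.spsolve((Q + sp.diags(c)).tocsc(), b)
        if np.max(np.abs(mu_new - mu)) < tol:
            mu = mu_new
            break
        mu = mu_new
    # curvature at the final mode defines diag(c) in (4)
    c = np.zeros(n)
    c[obs_idx] = np.exp(mu[obs_idx])
    return mu, Q + sp.diags(c)
```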

Following Knorr-Held and Rue (2002) and Rue and Held (2005, Chapter 4), we can often construct a one-block sampler for $(x, \theta)$, which proposes a new candidate $(x', \theta')$ by
$$\theta' \sim q(\theta' \mid \theta) \quad \text{and} \quad x' \sim \tilde{\pi}(x \mid \theta', y), \qquad (5)$$
and then accepts or rejects $(x', \theta')$ jointly. This one-block algorithm is made possible, in practice, by the outstanding computational properties of GMRFs through numerical algorithms for sparse matrices (Rue, 2001; Rue and Held, 2005). GMRFs of size up to $10^5$ are indeed tractable.
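As an illustration of the joint accept/reject step, here is a sketch of one iteration of the one-block sampler (5); every callable below is a hypothetical user-supplied interface, not something defined in the paper.

```python
import numpy as np

def one_block_step(x, theta, log_post, q_sample, q_logpdf,
                   approx_sample, approx_logpdf, rng):
    """One Metropolis-Hastings iteration of the one-block sampler (5):
    propose theta' ~ q(. | theta) and x' ~ pi~(. | theta', y), then
    accept or reject (x', theta') jointly."""
    theta_new = q_sample(theta, rng)
    x_new = approx_sample(theta_new, rng)
    # log acceptance ratio: target ratio times reverse/forward proposal ratio
    log_alpha = (log_post(x_new, theta_new) - log_post(x, theta)
                 + q_logpdf(theta, theta_new) - q_logpdf(theta_new, theta)
                 + approx_logpdf(x, theta) - approx_logpdf(x_new, theta_new))
    if np.log(rng.uniform()) < log_alpha:
        return x_new, theta_new
    return x, theta
```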

In those cases where the dimension of $\theta$ is small (less than three, say) it is possible to derive an independence sampler by reusing (4) to build an approximation to the marginal posterior of $\theta$. The starting point is the identity
$$\pi(\theta \mid y) = \frac{\pi(x, \theta \mid y)}{\pi(x \mid \theta, y)}. \qquad (6)$$
By approximating the denominator via expression (4) and evaluating the right-hand side at the modal value of $x$ (for each $\theta$), we obtain an approximation to the marginal posterior, which we denote by $\tilde{\pi}(\theta \mid y)$. This approximation is in fact the Laplace approximation suggested by Tierney and Kadane (1986), who showed that its relative error is $O(N^{-3/2})$ after renormalisation. (Here, $N$ is the number of observations.) The approximation $\tilde{\pi}(\theta \mid y)$ then replaces $q(\theta' \mid \theta)$ in the one-block algorithm above. The independence sampler uses the approximation
$$\tilde{\pi}(x, \theta \mid y) = \tilde{\pi}(\theta \mid y)\, \tilde{\pi}(x \mid \theta, y). \qquad (7)$$
A natural question arises here: if we can use $\tilde{\pi}(x, \theta \mid y)$ to construct an independence sampler to explore $\pi(x, \theta \mid y)$, why not just compute approximations to the marginals from $\tilde{\pi}(x, \theta \mid y)$ directly?
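As a sketch of how $\tilde{\pi}(\theta \mid y)$ can be evaluated in practice (on the log scale, up to an additive constant), the code below reuses the hypothetical gmrf_approximation() from the earlier sketch; build_Q and log_prior_theta are assumed user-supplied, and the Poisson log-likelihood drops the constant log(y_i!) terms.

```python
import numpy as np
from scipy.sparse.linalg import splu

def log_marginal_theta(theta, log_prior_theta, build_Q, y, obs_idx):
    """Unnormalised Laplace approximation log pi~(theta | y) via (6):
    evaluate log pi(x, theta | y) - log pi~(x | theta, y) at x = mu(theta)."""
    Q = build_Q(theta)                      # sparse prior precision of x
    mu, Qc = gmrf_approximation(Q, y, obs_idx)
    # log-likelihood at the mode (Poisson, log link, y! terms dropped)
    log_lik = np.sum(y * mu[obs_idx] - np.exp(mu[obs_idx]))
    # log-determinants via sparse LU; |det| = prod |U_ii| since L has unit diagonal
    logdet_Q = np.sum(np.log(np.abs(splu(Q.tocsc()).U.diagonal())))
    logdet_Qc = np.sum(np.log(np.abs(splu(Qc.tocsc()).U.diagonal())))
    # log pi(mu | theta) minus log pi~(mu | theta, y); (2*pi)^(n/2) factors cancel
    log_ratio = 0.5 * logdet_Q - 0.5 * mu @ (Q @ mu) - 0.5 * logdet_Qc
    return log_prior_theta(theta) + log_lik + log_ratio
```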

Since (4) is Gaussian, it is in principle always possible to (approximately) compute the marginal for the $x_i$'s as
$$\hat{\tilde{\pi}}(x_i \mid y) = \sum_j \tilde{\pi}(x_i \mid \theta_j, y)\, \tilde{\pi}(\theta_j \mid y)\, \Delta_j, \qquad (8)$$
by simply summing out $\theta$ with some numerical integration rule, where $\Delta_j$ is the weight associated with $\theta_j$. The approximated marginal posterior $\hat{\tilde{\pi}}(x_i \mid y)$ is a mixture of Gaussians whose weights, means and variances are computed from (7). However, the dimension of $x$ is usually large, so obtaining the marginal variances of $x_i \mid \theta, y$ is computationally intensive (recall that only the precision matrix $Q$ is explicitly known). Therefore, the marginals in (8) are, in practice, possible to compute only for GMRFs, since in these cases efficient computations are possible. A recursion algorithm to efficiently compute marginal variances for GMRFs is described in Section 2.
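Given a grid $\{\theta_j\}$ with weights $\tilde{\pi}(\theta_j \mid y)\Delta_j$ and, for each grid point, the mean and marginal standard deviation of $x_i \mid \theta_j, y$, evaluating the mixture (8) is straightforward. A minimal sketch with a hypothetical interface:

```python
import numpy as np
from scipy.stats import norm

def mixture_marginal(x_vals, weights, means, sds):
    """Evaluate the Gaussian-mixture approximation (8) to pi(x_i | y) on a
    grid x_vals. weights[j] is proportional to pi~(theta_j | y) * Delta_j;
    means[j], sds[j] are the mean and marginal standard deviation of
    x_i | theta_j, y, taken from the GMRF-approximation (4)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # renormalise the integration weights
    return sum(wj * norm.pdf(x_vals, loc=mj, scale=sj)
               for wj, mj, sj in zip(w, means, sds))
```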

Although any MCMC algorithm will guarantee the correct answer in the end, the question is what happens in finite time. The Monte Carlo error is $O_P(M^{-1/2})$, where $M$ is the (effective) number of samples; hence the strength of the MCMC approach is to provide rough, nearly unbiased estimates rather quickly, whereas precise estimates may take an unreasonably long time. Any (deterministic) approximate inference can in fact compete with an MCMC approach as long as its squared “bias”, or error, is comparable with the Monte Carlo error. The most interesting aspect of approximation (8) is that it can be computed almost instantly compared with the time any MCMC algorithm would have to run to obtain any decent accuracy.

The aim of this paper is to investigate how accurate (8) is for some typical examples of hierarchical GMRF-models. In Section 3 we report some experiments using models for disease mapping on a varying scale of difficulty. We compare the marginals of interest as approximated by (8) and as estimated from very long MCMC runs. The results are very positive. Before presenting the examples, we discuss in Section 2 how to efficiently compute the marginal variances needed in expression (8) for GMRFs. This section also explains (implicitly) why fast computations for GMRFs are possible using numerical methods for sparse matrices. Section 2 is unavoidably somewhat technical, but it is not necessary for appreciating the results in Section 3. We end with a discussion in Section 4.

Section snippets

Computing marginal variances for a GMRF

GMRFs are nearly always specified by their precision matrix $Q$, meaning that the covariance matrix $\Sigma = Q^{-1}$ is only implicitly known. Although we can formally invert $Q$, the dimension $n$ is typically large ($10^3$–$10^5$), so inverting $Q$ directly is costly and inconvenient. In this section we discuss a simple and fast algorithm to compute marginal variances, applicable to GMRFs of large dimension. The starting point is a not-well-known matrix identity which appeared in an IEEE conference proceedings (
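Although the snippet is truncated here, the underlying identity follows from the Cholesky factorisation $Q = LL^T$: since $L^T \Sigma = L^{-1}$, one obtains $\Sigma_{ij} = \delta_{ij}/L_{ii}^2 - (1/L_{ii}) \sum_{k>i} L_{ki} \Sigma_{kj}$ for $j \geq i$, which can be solved from $i = n$ down to $i = 1$ (cf. Erisman and Tinney, 1975, in the reference list). The following is a dense illustrative sketch of ours; efficient implementations run the recursion only over the sparsity pattern of $L$, which is what makes large GMRFs tractable.

```python
import numpy as np

def marginal_variances(Q):
    """Marginal variances Sigma_ii of a Gaussian with precision Q, via the
    recursion Sigma_ij = delta_ij / L_ii^2 - (1/L_ii) * sum_{k>i} L_ki Sigma_kj,
    j >= i, computed for i = n, ..., 1 (dense version, for clarity only)."""
    L = np.linalg.cholesky(Q)            # Q = L L^T
    n = Q.shape[0]
    Sigma = np.zeros((n, n))
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i - 1, -1):
            s = L[i + 1:, i] @ Sigma[i + 1:, j]
            Sigma[i, j] = (1.0 if i == j else 0.0) / L[i, i] ** 2 - s / L[i, i]
            Sigma[j, i] = Sigma[i, j]    # symmetry fills entries below the diagonal
    return np.diag(Sigma).copy()

# quick check against direct inversion on a small SPD "precision" matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Q = A @ A.T + 6.0 * np.eye(6)
assert np.allclose(marginal_variances(Q), np.diag(np.linalg.inv(Q)))
```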

Examples

In this section, we will present some results for the approximations to the marginal posteriors computed from (7), and compare them with estimates obtained from very long MCMC runs. We will restrict ourselves to the well-known BYM-model for disease mapping (Section 3.1). The BYM-model is a hierarchical GMRF-model with Poisson distributions at the first stage. We will use two different data sets, which we describe as “easy” (many counts) and “hard” (few counts). The comparison of the marginal

Discussion

In this report we have investigated how marginal posterior densities can be approximated using the GMRF-approximation in (8). We apply the GMRF-approximation to the full conditional of the latent GMRF component in hierarchical GMRF-models, and use this to approximate both the marginal posteriors for the hyperparameters and the marginal posteriors for the components of the latent GMRF itself. We have also discussed how to compute marginal variances for GMRFs with and without linear constraints, and

Acknowledgement

The authors are grateful to the reviewers and the Editor for their helpful comments.

References (21)

  • Bernardinelli, L., Pascutto, C., Best, N.G., Gilks, W.R., 1997. Disease mapping with errors in covariates. Statist. …
  • Besag, J., 1974. Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B.
  • Besag, J., 1975. Statistical analysis of non-lattice data. Statistician.
  • Besag, J., 1981. On a system of two-dimensional recurrence equations. J. Roy. Statist. Soc. Ser. B.
  • Besag, J., et al., 1991. Bayesian image restoration with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math.
  • Diggle, P.J., et al., 1998. Model-based geostatistics (with discussion). J. Roy. Statist. Soc. Ser. C.
  • Erisman, A.M., et al., 1975. On computing certain elements of the inverse of a sparse matrix. Commun. ACM.
  • Gelfand, A.E., et al., 1995. Efficient parameterisations for normal linear mixed models. Biometrika.
  • Knorr-Held, L., et al., 2000. Bayesian detection of clusters and discontinuities in disease maps. Biometrics.
  • Knorr-Held, L., et al., 2002. On block updating in Markov random field models for disease mapping. Scand. J. Statist.
There are more references available in the full text version of this article.
