Background
Statistical methods that assess the impact of a spatial structure on the occurrence of a particular health event have been developed in many areas over the recent decades [
1]. These methods allow detecting clusters of disease cases and mapping observations or estimations [
2]. They combine techniques from geography, epidemiology, and public health to better understand health needs and allocate resources.
Currently, various epidemiological information systems are used to collect and analyze health-related data and guide political decisions. In France, cancer cases are collected by the Institut National de Veille Sanitaire and the Institut National du Cancer in collaboration with Francim network of cancer registries [
2]. Mapping cancer cases may have a significant impact on the perception of excess rates in particular regions, but the heterogeneity in population density between administrative areas may affect the interpretation of mapping results, especially in case of small areas [
3]. To display this heterogeneity, maps may be produced by operating transformations that better reflect the spatial distribution of the disease [
1]. Still, whatever the parametric or non-parametric method used to account for uncertainty in the spatial distribution of the disease, the choice of the measure to be shown remains a key issue [
4].
In a given geographical area, some health indicators may reveal an excess number of cases. This excess is often estimated by the standardized incidence ratio (SIR) [
5]. Generally, the SIR estimates the risk of disease in a given spatial unit and depends on the existence of either a spatial autocorrelation (i.e., lack of independence between observations) or a spatial heterogeneity. Comparisons of SIRs between neighboring spatial units may suggest grouping sets of spatial units into classes or clusters.
Two types of methods may be used to detect disease clusters; i.e., aggregates of cases [
6]. Local methods are able to detect and locate clusters with or without a predetermined source point. In addition to spatial location, some local methods allow for confounding factors that may affect the spatial distribution of the disease [
7]. Global methods look for the presence –but not the location– of a clustering pattern [
8,
9]. For example, Moran’s autocorrelation coefficient is a global method that measures the spatial autocorrelation weighted by a function of the distance between two close points defined by their centroids (“average X, average Y”) [
10].
Several local detection methods allow identifying clusters with particular shapes within the study area: i) the spatial scan statistic of Kulldorff (SaTScan) [
11] performs one or several circular or elliptic scans; (ii) regression trees such as Spatial Oblique Decision Tree (SpODT) perform oblique cuts; and, iii) hierarchical Bayesian spatial modeling (HBSM) [
4] produces a real smoothing of the SIR. Many Bayesian applications have been already used in infectious diseases and cancer; they were able to distinguish random fluctuations from true changes in the incidence of the disease.
The first applications of SpODT were made in the field of infectious diseases; they contributed to the detection of spatial classes, for example, different risks of malaria in a Malian village [
12] and a spatial pattern of Buruli disease in Cameroon [
13]. In comparison with SaTScan and classification regression trees (CART), SpODT provided complementary information, and, in some cases, was more accurate [
12,
14]. However, SpODT has never been applied to detection of clusters of cancer cases.
Concerning the geographic distribution of cancer cases, spatial clustering seems to exist in lung, prostate, bladder, and colon-rectum cancers but this clustering depends mainly on the available data [
15‐
18]. Furthermore, the clustering of cancer in a given area may depend on factors such as the socioeconomic status [
19] and on unknown risk factor common to other diseases [
20].
The main objective of the present study was to compare empirically different cluster detection methods by assessing their abilities to find spatial clusters of cancer cases. Secondarily, the study aimed also to evaluate the impact of the Townsend index of socioeconomic status on cancer incidence.
Using global detection methods with data on four cancers, we sought first, for the presence of particular spatial patterns. Then, we compared the results with those obtained with local methods (SaTScan, SpODT, and HBSM) with and without taking into account a confounding factor; the Townsend index. Thereafter, we compared the abilities of the three approaches to estimate random changes in the incidence of each cancer. Finally, using a multivariate HBSM, we examined whether factors common to the four cancers could increase the reliability of the results.
Discussion
Different methods of spatial analysis suitable for cluster detection and epidemiological monitoring in small areas were used here to: i) describe spatial heterogeneity and autocorrelation; ii) evaluate the impact of heterogeneity on global spatial autocorrelation; and: iii) search for an effect of the socioeconomic status on geographical differences in cancer incidence by analyzing the overall spatial structure or detecting high-risk areas. More precisely, the work aimed at examining whether deprivation is an explanatory or a confusion factor of the spatial distribution of some cancers. This study highlights the importance of using both global [
42] and local methods of cluster detection taking into account heterogeneity [
43]. “Naive” spatial autocorrelation and heterogeneity were found only with prostate cancer data. The adjusted Moran’s I method [
43] detected mainly a spatial autocorrelation in lung and bladder cancer as well as in prostate cancer by taking into account the spatial heterogeneity. In all cancers, “true” Moran’s I value was greater than “naive” Moran I value. This shows that it is important to include small spatial units in the calculation of spatial units in the calculation of spatial test statistics to be able to detect spatial autocorrelation.
SpODT is an approach recently applied to spatial distribution of cancer risk. Like SaTScan, one advantage of SpODT is its ability to overcome the administrative boundaries; another is that its implementation does not require the use of a proximity matrix, which avoids the problems related to the choice of this matrix (as in HBSM) [
12]. According to the algorithm stopping criteria, these two methods require sensitivity analyses. With SaTScan, the optimal clusters were found after sensitivity analyses regarding the size of the window. In the previous literature, few authors have mentioned these sensitivity analyses or the search for optimal parameters. Depending on the user settings, the lack of a sensitivity analysis is not in favor of a method’s reproducibility [
44]. In disease mapping, Bayesian smoothing remains important because it allows taking into account spatial and non-spatial effects in risk estimation [
1,
4,
45]. In small-area studies, problems of robustness of the estimates can be overcome by the use of hierarchical Bayesian spatial modeling; this warrants a better understanding of the risk levels in spatial epidemiology. In a simulation study, Aamodt et al. [
32] have shown that BYM model is better than SaTScan for local cluster detection in case of high relative risks [
46]. Bayesian models allow both global and local detection through criteria such as the variance of the autocorrelation, heterogeneity components, and DIC. The multivariate disease mapping approach through joint modeling [
15,
34,
36,
37] provides also a considerable improvement of spatial analysis by including information on correlations between diseases and by reducing smoothing effects. The present study shows the specificities of each method that will be discussed according to the results by cancer type.
The case of colon-rectum cancer (where global methods found neither heterogeneity nor spatial autocorrelation) allowed an evaluation of local clustering methods. In this case, the use of a spatial model is superfluous; indeed, all the methods agreed on the absence of clusters of colon-rectum cancers. Only SaTScan detected clusters of high risk with RRs <1.5 when the analysis was carried out without covariate. Guttmann et al. [
7] have shown in a simulation that the performance of SaTScan increased with the size of the population. Likewise, in small areas when Commune is a proxy for patient exact location, our results corroborate those of Lemke et al., Jeffery et al., and Ozonoff et al. [
32,
47,
48]. These studies demonstrated that the power of detecting clusters with SaTScan decreased together with the level of spatial resolution. The shape of the cluster was also discussed by Goujon-Bellec et al. [
49] who found that the elliptic scan method seems more appropriate than the circular scan method in detecting clusters of rare diseases over large regions. With simulation studies, other authors, such Aamodt et al. [
46] have found that SaTScan was more efficient than BYM model in detecting clusters with relatively low relative risks. This was corroborated in colon-rectum cancer. The HBSM confirmed its ability to detect a homogeneous risk with colon-rectum cancer and seemed to be less affected by population size, spatial resolution, or cluster shape. Furthermore, the use of an additional covariate (here, the Townsend index) reduced greatly the performance of SaTScan in terms of specificity [
50].
In the case of prostate cancer, all the methods converged to the same conclusion. The global clustering methods found a spatial autocorrelation and a spatial heterogeneity and all the local methods showed coherent clusters. SaTScan failed to detect an effect of the socioeconomic status. SpODT as well as the univariate HBSM detected coherent clusters. Our results with prostate cancer data raised the problem of edge effects in local cluster detection as previously found by Johnson [
17]. An edge effect can be defined as an impact on the results of features specific to the boundaries of the study area, such as spatial censoring. Precisely, some subjects may not be observed because they are out of the study area and thus excluded from the spatial analysis [
4]. Indeed, the cluster of prostate cancer cases detected by SaTScan in the center of the area is probably erroneous because other clusters were also located at the boundaries of that area. Actually, Guttmann et al. [
42] have shown that false clusters are numerous when edge effects are important. To correct these effects, the area under sensitivity analyses may be extended to other neighboring areas (here, an extension from Département Isère to the whole Rhône-Alpes Region). The use of more homogeneous spatial units than the current Communes, such as the French “Ilots Regroupés pour l’Information Statistique” (IRIS), may also eliminate or reduce the edge effects [
51]. Little and Rubin have also proposed to solve this problems by the use of methods that consider the external areas as missing data [
4]. We may mention here that SpODT was able to detect more precise clusters than SaTScan, especially when the Townsend index was taken into account in presence of autocorrelation. Poisson model and HBSM found that larger socioeconomic inequalities decreased the incidence of prostate cancer. In fact, deprived patients are often diagnosed at symptomatic stages, a fact that has been precisely detected by SpODT in the Southwestern part of Isère. This should be kept in mind because, in deprived people, other cancers, such as skin melanoma and breast cancer are often diagnosed at advanced stages [
52].
In the cases of lung and bladder cancers, EBI showed “true” spatial autocorrelation while Moran’s I test failed to find autocorrelation. These results highlight the importance of taking into account the heterogeneity in small areas when attempting to identify the spatial pattern of a disease. Contrarily to SaTScan, SpODT did not find clusters of lung cancer. Lung cancer results showed that a lower DIC (with vs. without introducing the Townsend index into the model) has identified an effect for that index on the geographical variations of the incidence in terms of spatial heterogeneity. In lung cancer, the Townsend index influenced greatly the randomcomponent whereas, in the bladder cancer, it was spatial autocorrelation that influenced the spatial analysis. The Poisson model and the univariate HBSM coefficient have shown that, in these two cancers, the incidence increases together with the socioeconomic inequalities. In the specific case of lung cancer, the socioeconomic status seemed to be a surrogate for various lifestyle factors (e.g., alcohol/tobacco consumption). Thus, as in previous studies, the socioeconomic status should not be overlooked, as a risk factor, in examining lung cancer etiology [
53]. One, now classical observation, is that bladder cancer shares common risk factors with lung cancer (e.g., tobacco consumption). This was shown by Cassetti and al. [
18] in a spatial study in Umbria, Italy. The multivariate modeling found also a correlation between these two cancers (posterior mean estimation: 0.6). In terms of DIC, the multivariate BYM model was the best model. These results corroborate those of Martinez-Beneito [
20] who recommended multivariate disease mapping models to epidemiologists interested in the spatial variations of several diseases. Changes in the DIC with HBSM may thus be used to identify the most credible spatial model vs. other competing spatial competing models and detect the cluster of high risk. Indeed, we have used the DIC on Colonna and Sauleau [
5] updated data to choose the best univariate Bayesian model and found similar results. However, some covariates and spatial patterns may be mixed up with the random effects; their inclusion in a spatial analysis can lead to biased estimates of the fixed effects [
54].
Using CAR models, some authors such as Reich et al. or Hughes and Haran [
55,
56], advise the use of a model without confounding random effects even if its DIC is greater than that of the usual spatial model when the goal is to study the association between any covariate and the disease under study. Here, in all the approaches we used, we did not check the existence of spatial confounding.
Limitations
In this empirical assessment of the efficacy of cluster detection methods, the results were consistent across all methods only in the case of prostate cancer. This raises questions in terms of power and precision of spatial cluster detection methods and suggests that power and precision would increase together with the event rate. However, checking both these hypotheses and assessing the efficacy of the discussed methods in other plausible epidemiological situations require analyses conducted in a systematic way. These limitations could be solved properly by simulation studies.
Acknowledgements
The authors thank the Cancer Registry of Isère for the access to the data. They also thank Jean Iwaz (Hospices Civils de Lyon, France) for the thorough revision of the final manuscript.