Background
Urinary tract cancers comprise primarily cancers of the urinary bladder and kidney, the former accounting for approximately two-thirds of all cases diagnosed. Bladder cancer is the ninth most common type of cancer worldwide (~360,000 cases per year) and the 13th most common cause of death from cancer (~145,000 deaths per year worldwide) [1, 2]. Kidney cancer is comparatively less common, ranking twelfth and accounting for approximately 150,000 new cases and 78,000 deaths annually [3, 4].
Internationally, the incidence rates for bladder and kidney cancer have been reported to vary by as much as ten-fold between countries. Incidence tends to be higher in Southwestern Europe, North Africa (Egypt) and North America, and lower in South America and Asia [1, 4, 5]. Parkin [2] reports the highest estimated mortality rates to be in Egypt, where the world-standardized rate of 34 per 100,000 (in men) is more than three times higher than the highest rates in Europe (Denmark 10.4, Spain 9.7) and eight times that in the United States (US) (3.4).
Several countries show increasing incidence for both bladder and kidney cancers, although with evidence of some stabilization or even decreases during the 1990s [2, 4]. Recent trends in stage-specific incidence rates for bladder cancer in some US populations suggest, however, that rates may be stabilizing in late-stage disease but continue to increase in noninvasive, predominantly low-grade disease [6]. Regardless of space, time or stage at diagnosis, rates are consistently higher for males than females [4, 5, 7–9]. In fact, in most developed countries, men have at least a three- to five-fold greater risk than women.
Past variations in the prevalence of known etiological factors, whether genetic, environmental, occupational or behavioural, may to some extent contribute to the reported temporal and geographical variations of urinary tract cancers among populations worldwide. In addition, differences in the scope of case ascertainment between national cancer registries may result in some countries reporting solely invasive diagnoses while others include non-invasive or in situ disease. Some countries count only one primary cancer in subjects with multiple cancers in the urinary tract. In the Netherlands, such practice is thought to reduce the reported incidence of bladder cancer by up to 10 % [2]. Finally, variations in rates within and/or between countries can be partly driven by the introduction of new imaging techniques enabling the detection of pre-symptomatic tumours.
In Canada, bladder cancer incidence rates increased from 1970 to 1981 and have since gradually declined or stabilized [10–12]. Kidney cancer incidence rates have also stabilised in recent years among females, but continue to increase at a rate of about 1.3 % among males [10, 11, 13, 14]. Rates of both bladder and kidney cancer are particularly high in Nova Scotia (NS), a province of 900,000 people in Atlantic Canada. NS consistently has some of the highest rates of cancer in Canada for both males and females and continues to show increases in the age-standardized incidence rates of both bladder and kidney cancers. For bladder cancer, age-adjusted incidence rates estimated for 2015 exceed the national average by about 25 and 30 % among males and females, respectively [11]. Similarly, for kidney cancer, excesses of 30 and 45 % have been reported among males and females, respectively. This excess burden of urinary tract malignancies in NS is unlikely to result from health-system-related factors (e.g. scope of case registration, imaging technology) given the relative uniformity of health care delivery within the country.
This study thus describes spatial and spatio-temporal variations in the risk of bladder and kidney cancer for NS in order to identify those areas where rates are higher than would be expected given the prevalence of known risk factors. This is an important step to guide both etiological research and public health interventions in the province. We use two geospatial methods for modelling disease risk, both of which are appropriate for low-density populations such as NS. The first approach is a Community-level analysis using a spatial autoregression (the Besag, York and Mollié model), a Bayesian method that models disease risk for spatially aggregated case counts [15, 16]. The second approach estimates spatially continuous variation in risk using a Local Expectation Maximization (local-EM) smoothing algorithm, an emerging geostatistical method developed by Fan, Stafford and Brown [17], which models spatial and temporal variation in risk when cases are aggregated to time-varying spatial boundaries. To our knowledge, this is the first attempt to model the risk of bladder and kidney cancer in NS and one of the first epidemiological applications of the local-EM algorithm for cancer mapping in Canada.
Methods
Data sources
The Community-level (BYM) analysis was restricted to Cohort 1 because the proportion of cases with incomplete residential addresses (i.e. no civic street address) was fairly large prior to 1998. During those early years, most cases were assigned to a town or a six-digit postal code, units which vary greatly in size, especially between urban and rural settings. Depending on the spatial scale of analysis, one postal code may belong to several geographic units or one unit of geography may contain several postal codes, resulting in the potential misclassification of the spatially aggregated data. The spatially continuous, grid-based (local-EM) analysis was able to accommodate data from the entire 30-year period (Cohort 2) because the method both allows for changes in the spatial distribution of risk over time and accounts for uncertainty in the location of cases where civic street addresses are missing but postal codes or administrative regions are known.
The Nova Scotia Civic Address File (NSCAF) was used to assign spatial locations (i.e. longitude-latitude coordinates) to all cases for which a civic street address was available. When a civic address was unavailable, the Desktop Mapping Technologies Inc (DMTI) conversion file was used to geo-reference postal codes. For the Community-level model, where the postal code was unavailable or located in a rural area, a gazetteer of place names was used to georeference the centroid of the town. For the spatially-continuous local-EM analysis, where the postal code was available, case locations were treated as spatially censored, lying somewhere within one of the census regions containing at least one address with the postal code in question. Where the postal code was unavailable, the local-EM analysis used the Census Division boundaries as a second type of spatial censoring. Proportions of cases by spatial data type, including the numbers of cases excluded from each analysis due to uncertainty in their spatial location, are shown in Table 1.
For the modelling of risk using the spatial autoregressive model, population estimates were aggregated at the Community level, a set of geographic administrative units which represent groupings of neighbourhoods with a degree of shared identity and social processes [18]. This level of spatial aggregation represents the finest unit of geography for which boundaries are stable over time. There were 311 Communities in NS over the study period, with population counts up to 30,900 persons. In total, 36 Communities (30 First Nations Communities and 6 wilderness and park Communities) were excluded due to unavailable population information.
The spatially-continuous (local-EM) analysis used population counts by age and sex group at the finest level of geography for which digitized spatial boundary data were available. These were census subdivisions (CSD) for the 1981 and 1986 census years, enumeration areas (EA) for the 1991 and 1996 census years, and dissemination areas (DA) from the 2001 census onward. There were 113 CSD in 1981 and 118 CSD in 1986. The number of EA/DA ranged from 1379 to 1645 between the 1991 and 2011 census periods; their size varied to target a population of 400 to 700 individuals.
It was assumed that populations were uniformly distributed within these finest levels of census regions, a not unreasonable assumption if one accepts that these census regions generally follow physical boundaries, such as major streets and waterways, and are designed to be fairly homogeneous. An exception is regions which are indicated by Statistics Canada to be partially uninhabited, or lying outside the population ecumene, in which case the population is assumed to be homogeneously distributed within the inhabited portion.
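The uniformity assumption can be illustrated with a small numeric sketch. Under it, the population attributed to any part of a census region is proportional to the share of the inhabited area that part covers. The function name, the areas and the population count below are hypothetical and are not taken from the study data.

```python
def population_in_subarea(region_population, inhabited_area_km2, subarea_km2):
    """Population allocated to part of a census region, assuming residents
    are spread uniformly over the inhabited portion of the region."""
    if not 0 <= subarea_km2 <= inhabited_area_km2:
        raise ValueError("sub-area must lie within the inhabited area")
    density = region_population / inhabited_area_km2  # persons per km^2
    return density * subarea_km2


# Hypothetical dissemination area: 600 residents, of which only 8 km^2
# lies inside the population ecumene. A 2 km^2 grid cell inside the
# inhabited portion receives a quarter of the residents.
grid_cell_population = population_in_subarea(600, 8.0, 2.0)
print(grid_cell_population)  # 150.0
```

Dividing by the inhabited area rather than the total area is what keeps uninhabited land from diluting the population density.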
Data analyses
The Besag, York and Mollié (BYM) model (see [15, 16]), a popular and convenient spatial autoregressive model for count data referenced to discrete spatial regions, was used to perform the Community-level analysis. The approach treats the case counts by Community, rather than Standardized Incidence Ratios (SIR), as response variables, because the latter are unstable when computed from low counts. This is particularly important in this study due to the low population density of NS and the rarity of the health outcomes measured. Possible spatial dependence in the data, with pairs of nearby Communities tending to be more similar than Communities situated far apart, is accounted for by including a spatially autocorrelated random effect term. The BYM model treats the case counts as Poisson distributed and supports Bayesian inference for model fitting, which in this study was performed separately for each data set (bladder male, bladder female, kidney male, kidney female) using Integrated Nested Laplace Approximations [31]. Further details pertaining to this analytical approach are described in Additional file 1.
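The instability of the SIR at low counts can be made concrete with a short sketch. It assumes only that the observed count is Poisson with mean equal to the expected count times the true relative risk, in which case the standard deviation of SIR = observed / expected is sqrt(mean) / expected. The function name and the expected counts are hypothetical illustrations, not values from the study.

```python
import math


def sir_summary(expected_cases, true_relative_risk=1.0):
    """Sampling behaviour of SIR = observed / expected when the observed
    count is Poisson with mean expected_cases * true_relative_risk."""
    mean_count = expected_cases * true_relative_risk
    sd_of_sir = math.sqrt(mean_count) / expected_cases
    prob_zero_cases = math.exp(-mean_count)  # chance that the SIR is exactly 0
    return sd_of_sir, prob_zero_cases


# Hypothetical expected counts for a small, a medium and a large Community.
for expected in (0.5, 5.0, 50.0):
    sd, p0 = sir_summary(expected)
    print(f"E={expected:5.1f}  sd(SIR)={sd:.2f}  P(SIR=0)={p0:.3f}")
```

With a true relative risk of 1, the spread of the SIR shrinks as 1 / sqrt(expected), so Communities with few expected cases can show extreme ratios by chance alone. Modelling the counts directly, with the expected count as an offset, avoids treating such ratios as if they were equally reliable.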
Spatially-continuous analysis
The local-EM kernel smoothing was used to perform the spatially-continuous analysis. The method developed by Fan, Stafford and Brown [17] was extended by Lee et al. (Lee J, Nguyen P, Brown P, Stafford J, Saint-Jacques N: Local-EM Algorithm for Spatio-Temporal Analysis with application in Southwestern Nova Scotia. Submitted in Ann Appl Stat; [32]) to accommodate the requirements of modelling the cancer incidence data presented here. Collected between 1980 and 2010, the data were subject to aggregation boundaries changing over time and were geocoded with varying degrees of precision. Exact spatial locations were derived from full residential civic street addresses for most of the recent cancer cases, though the proportion of cases spatially referenced with a partial street address (i.e. postal code) or with census regions increased with the age of the data. Where the exact location is unavailable, the local-EM kernel smoothing algorithm produces an optimal risk surface which averages over all the possible locations at which each case could be located. The bandwidth of the smoothing kernel is chosen by cross-validation (see Additional files 2 and 3) and determines the degree of smoothing in the risk surfaces. A detailed description of the methodology is contained in Lee et al. (submitted) and in Nguyen et al. [32], and is summarized in Additional file 1.
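The idea of averaging over possible case locations can be sketched in one dimension. The toy below is not the authors' implementation: it assumes equally spaced cells on a line, a Gaussian kernel with a fixed bandwidth (rather than one chosen by cross-validation), no temporal dimension, and hypothetical populations and cases. Each iteration spreads a spatially censored case over its candidate cells in proportion to the current intensity estimate (E-step), then divides kernel-smoothed cases by kernel-smoothed population (M-step).

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Unnormalised Gaussian kernel weight for a given distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_em_1d(cases, population, bandwidth=1.5, iterations=50):
    """Toy local-EM style smoother on a line of equally spaced cells.

    cases: one list of candidate cell indices per case (a single index
           means the location is known exactly).
    population: persons at risk in each cell (must be positive).
    Returns the smoothed rate per cell and the expected case counts
    from the last E-step.
    """
    n_cells = len(population)
    rate = [1.0] * n_cells  # flat starting surface
    for _ in range(iterations):
        # E-step: allocate each case across its candidate cells in
        # proportion to population x current rate.
        expected_counts = [0.0] * n_cells
        for candidates in cases:
            weights = [population[j] * rate[j] for j in candidates]
            total = sum(weights)
            for j, w in zip(candidates, weights):
                expected_counts[j] += w / total
        # M-step: kernel-smoothed cases over kernel-smoothed population.
        new_rate = []
        for j in range(n_cells):
            k = [gaussian_kernel(j - m, bandwidth) for m in range(n_cells)]
            numerator = sum(kw * c for kw, c in zip(k, expected_counts))
            denominator = sum(kw * p for kw, p in zip(k, population))
            new_rate.append(numerator / denominator)
        rate = new_rate
    return rate, expected_counts


# Hypothetical data: six cells, three exact cases and three censored ones.
population = [100.0] * 6
cases = [[0], [0], [1], [0, 1, 2], [3, 4, 5], [5]]
rate, expected_counts = local_em_1d(cases, population)
print([round(r, 4) for r in rate])
```

Because every case contributes a total weight of one, censored cases are neither dropped nor double counted; they are only blurred over the region in which they are known to lie.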
In this study, local-EM analyses focused on two regions of the province where the BYM models suggested risk was particularly high, so as to describe localized patterns in risk. Two models were applied: (1) a spatial model testing for significant variation in risk over space; and (2) where a spatial effect was detected, a spatio-temporal model to determine whether risk also varied significantly over time. Maps were produced where statistically significant spatial or spatio-temporal effects were detected. Estimated risk surfaces based on local-EM are not presented, to minimize the risk of disclosure of personal health information. Rather, a p-value for testing whether the relative risk is lower than 1.1 (risk less than 10 % above the population average) at each location and time is presented. These p-values were computed with a parametric bootstrap: 100 synthetic datasets were simulated with a constant relative risk of λ(s,t) = 1.1, and for each s and t the p-value is the proportion of these datasets for which the local-EM algorithm yields a risk estimate exceeding the estimate produced by the data. Shown are exceedance probabilities, or one minus the p-values, which are large when risk is believed to exceed 1.1.
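The bootstrap logic can be sketched for a single location. As a loud simplification, the sketch replaces the local-EM estimate with the crude ratio observed / expected, so it illustrates only how the p-value and exceedance probability are formed, not the study's estimator. The function names, the seed and the counts are hypothetical.

```python
import math
import random


def draw_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def exceedance_probability(observed, expected, null_risk=1.1,
                           n_sim=100, seed=1):
    """One minus the bootstrap p-value for the null of relative risk 1.1.

    observed / expected stands in for the local-EM risk estimate
    (an assumption made for illustration only).
    """
    rng = random.Random(seed)
    estimate = observed / expected
    n_exceeding = 0
    for _ in range(n_sim):
        # Synthetic count simulated under a constant relative risk of 1.1.
        simulated = draw_poisson(null_risk * expected, rng)
        if simulated / expected > estimate:
            n_exceeding += 1
    p_value = n_exceeding / n_sim
    return 1.0 - p_value


# Hypothetical location with 10 expected cases and 18 observed.
print(exceedance_probability(observed=18, expected=10.0))
```

With 100 synthetic datasets the exceedance probability is resolved only to the nearest 0.01, which is adequate for the 0.8 threshold used to flag high-risk areas.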
The software used was R version 3.1.1 (http://www.r-project.org) in combination with the disease mapping package [33] and the INLA software [34]. This study received ethics approval from the Capital Health Research Ethics Board. The study was a secondary analysis of anonymised cancer registry data obtained from the NS Provincial Cancer Registry, and a waiver of consent was approved.
Discussion
Summary of findings
This study showed evidence of spatial variation in the risk of bladder and kidney cancer in Nova Scotia. Posterior summaries for regression and variance parameters suggested that much of the heterogeneity in risk was related to unmeasured risk factors. High risk areas for bladder cancer were predominantly distributed along a southwest to northeast gradient. Kidney cancer risk followed a similar distribution, although areas of elevated risk were also detected in various northeast Communities of Cape Breton, for both genders. Focusing on aggregated spatial units (Communities), the study showed that areas identified as having a high probability of exceedance (BYM: Pr[exp(U_i) > 1.1 | data] > 80 %) in the risk of male (28 Communities) or female (9 Communities) bladder cancer had 33 % (males) and 52 % (females) more cases diagnosed over the 12-year period than expected. Similarly, high risk areas for male (11 Communities) or female (8 Communities) kidney cancer had 52 % (males) and 57 % (females) more cases diagnosed than expected. From a public health perspective, this translates into an excess of nearly 200 urinary tract cancer (UTC) cases (150 bladder; 45 kidney) diagnosed in those high risk areas where the estimated risk was at least 10 % above the NS average rate. Over a 12-year period, this corresponds to an additional 16 UTC cases annually, a conservative figure given that areas with exceedance probabilities above 80 % and 95 % had a much larger spatial extent when derived from the spatially-continuous analysis than from the Community-level model. This was true for both sexes and both cancer sites. Focusing on localized spatial patterns, this study also highlighted significant spatial and spatio-temporal variations in the risk of male bladder cancer within SW NS, with areas of elevated risk along the Fundy shore and south shore of the region. Elevated risks of both male and female kidney cancer were also observed along the south shore of SW NS. In addition, risk for both male bladder and kidney cancer varied significantly in CB, although areas of elevated risk did not always overlap. Overall, spatial patterns were generally stable over time.
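The annual figure quoted above follows from simple arithmetic on the reported excess counts, shown here as a brief check. Only the counts (150 bladder, 45 kidney) and the 12-year period come from the text; the function name is hypothetical.

```python
def annual_excess(excess_cases_by_site, years):
    """Total excess cases and the average number of excess cases per year."""
    total = sum(excess_cases_by_site.values())
    return total, total / years


total, per_year = annual_excess({"bladder": 150, "kidney": 45}, years=12)
print(total, round(per_year, 2))  # 195 16.25
```

The total of 195 is the "nearly 200" excess cases, and 16.25 per year rounds to the 16 additional UTC cases annually.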
Interpretation of spatial patterns
Patterns of spatiotemporal heterogeneity in risk provide clues to the occurrence and influence of extrinsic factors involved in the rise or fall of a disease. In this study, patterns of spatial variation in bladder and kidney cancer risk were stable over time, suggesting persistent risk exposure. The exception was male bladder cancer, for which the results pointed to a temporal effect. However, the pattern of spatial variation in risk remained stable over a 13-year period, possibly also reflecting persistent effects. Similarly, a study of space-time patterns of bladder cancer incidence in Utah, US, detected high risk areas that were persistent over time [35]. These high relative risk areas were subsequently found to be associated with the presence of Toxic Release Inventory sites, where the risk was observed to range between 1.14 and 1.82 for both genders combined and between 1.12 and 1.47 for males only. While the processes generating the elevated risk in NS are unknown, the magnitude of the estimated risk in high risk areas for NS was similar to that reported in Utah, ranging between 1.24–1.56 and 1.38–1.69 among males and females, respectively, based on BYM, and between 1.48–1.99 and 1.48–1.95 among males from SW NS and CB, respectively, based on local-EM. The higher lower bounds of the NS estimates are attributable to the more conservative exceedance probability rule applied in NS for the determination of high risk areas (NS: P_i(10 %) > 0.8 and P(s; 10 %) > 0.8; Utah: P(exp(s_i) > 1.0 | data) > 0.8). Both studies suggest an increased effect in females.
Several factors affect the incidence of urinary tract cancers worldwide. Exposure to tobacco smoke, occupational toxins and environmental sources of heavy metals, such as arsenic in drinking water, are amongst the well established risk factors for bladder cancer, in particular transitional cell carcinoma, which accounts for 90 % of the bladder cancer cases diagnosed in developed countries [5, 7, 19]. Tobacco smoking [5, 9, 36–41] and long-term exposure to high levels of arsenic in drinking water also increase kidney cancer risk [19, 42], along with obesity [38, 43, 44], hypertension [38], the use of phenacetin-containing analgesics and exposure to trichloroethylene and polycyclic aromatic hydrocarbons [38, 45–47]. Whether measured independently or synergistically, the magnitude of influence of these risk factors on the development of UTC varies. However, meta-analyses of over 30 years of epidemiological studies suggest, for instance, that tobacco smoking could increase the risk of bladder and kidney cancer by at least 270 and 50 %, respectively, in current smokers compared to non-smokers [37, 48]. Exposure to arsenic in drinking water shows effects of similar magnitude, increasing the risk of bladder cancer by about 40, 230 and 310 % at exposure levels of 10, 50 and 150 μg/L, respectively [19]. Obesity has been reported to account for 30–40 % of kidney cancer cases in Europe and the United States, and is known to increase the risk of renal cell carcinoma in a dose–response fashion [12, 49].
In this study, residual spatial variation and the resulting probabilities of exceedance for bladder and kidney cancer risk suggest that smoking is not the only factor contributing to the observed spatial patterns. This is because the proxy measures of smoking included in the analyses (i.e. social and material deprivation indices) did not change the spatial variations in risk or its magnitude. As well, the heterogeneity in bladder and kidney cancer risk observed in high risk areas was greater than could be accounted for by known spatial variations in smoking prevalence in Nova Scotia. Nonetheless, synergistic relationships between smoking and other unmeasured risk factors cannot and should not be ruled out. This is especially important in Nova Scotia, a province known for its high prevalence of tobacco smoking [50] and obesity [51], and where inorganic arsenic in drinking water was observed to be a major contributor to arsenic body burden in a study population [52].
Overall, the two spatial approaches used to model disease risk provided consistent and complementary results. Inclusion of a time-varying component in the spatially-continuous models made it possible to determine whether high average risk in a given location was sustained or changed over time, two different situations that could arise from the same number of accumulated cases in an area over a set time period. As described by Abellan et al. [53], the epidemiologic interpretations of these two situations are important. In one scenario, spatial patterns are more likely to occur in a constant manner over time and hence could be induced by environmental or socio-demographic risk factors that act in a sustained manner. In the second scenario, the rate of case accumulation may be more temporally clustered with distinct variability, possibly reflecting emerging short-latency risk factors that would generate high excess cases in shorter time intervals or, alternatively, artificial or sudden variations associated with changes in disease coding or screening practices (see details in Abellan et al. [53]). Hence, it would not be unreasonable to suggest that the observed heterogeneity in the spatial distribution of high-risk areas for bladder and kidney cancer in both SW NS and CB supports a scenario in which risk factors act in a relatively sustained manner over time.
Strengths and limitations
This study has important strengths. First, it is based on 30 years of cancer incidence data obtained from a population-based cancer registry adhering to the registration standards of both the Canadian Cancer Registry and the North American Association of Central Cancer Registries. Those standards allow for consistency in disease coding over time and ensure case ascertainment and completeness through a network of activities including automated and manual edit processes, record linkages and data audits. In addition, the systematic collection of spatial information at time of diagnosis enabled 100 % of cases in Cohort 1 and 95 % of cases in Cohort 2 to be successfully geo-referenced with a high degree of certainty, thus minimizing location misclassification (Cohort 1, ~ 85 % exact location; Cohort 2, ~ 50 %). Second, the two statistical methods used in this study accounted for spatial dependence (random effects) in risk estimates, which reduces the likelihood of Type I error (declaring an area as having elevated risk when in fact its underlying true rate equals the background level) [54]. Third, the exceedance probability rules used here to classify spatial risk, P_i(10 %) > 0.8, P(s; 10 %) > 0.8 and P(s,t; 10 %) > 0.8, have high specificity even when data are sparse, further reducing the risk of false alarms, although perhaps increasing the likelihood of Type II error (declaring an area as having average risk when in fact its underlying true rate is elevated relative to background levels) [54]. Fourth, the application of the local-EM algorithm treated risk as a continuously varying process in space and time, and so was not constrained to arbitrary administrative boundaries, which often change between census periods [52]. This allows irregularly aggregated or point-location data to be integrated and used within a single framework and minimizes loss of information. It presents a real advantage for the estimation of disease risk in small-area analyses, or for rare diseases that require the monitoring and accumulation of cases over a long time period, as it maximizes statistical power and results in more meaningful inference [55]. As such, it is reasonable to suggest that applying the local-EM framework improved the sensitivity of the study, offering a balance to the Community-level autoregressive model, a more conservative approach with generally lower sensitivity (see [54, 55]). Finally, modelling the spatio-temporal variation in risk with the local-EM algorithm provided useful insights about the stability of the estimated spatial patterns of disease. It also produced predictions that were generally less spatially smooth and, as such, is a more sensitive tool for the detection of localized areas of elevated risk, which ultimately better informs health service planning, public health interventions and resource allocation.
Nonetheless, this study has limitations. First, location at time of diagnosis was used as a surrogate for the location where a person was thought to be exposed to factors which increased their risk of cancer. This is a common approach in the geographic analyses of many disease outcomes given the difficulty of obtaining a full history of residence and building estimates of lifetime exposure. The consequent exposure misclassification can result in less informative maps that impede hypothesis generation or the identification of environmentally or sociologically driven processes occurring over long time periods. Second, individual-level information on important risk factors such as smoking frequency and duration was not available, as cancer registries do not routinely collect information unrelated to patient care. This study used neighbourhood social and material deprivation as a proxy for smoking prevalence. As a result, it is possible that maps of posterior mean relative risks include some residual confounding due to smoking. Third, current algorithms for local-EM estimation do not allow for the inclusion of covariates. Fourth, the method is computationally intensive. Finally, although the local-EM analyses benefited from the inclusion of cases diagnosed over a longer time period, the number of cases for the Cape Breton region was still quite low, which resulted in unstable results. This was particularly evident when determining optimal spatial and temporal bandwidths for female risk, for which incidence counts were about 1.5 to 3 times lower than for males.
Conclusion
Modeling the geographical distribution of disease within a population is essential to public health surveillance. It permits the quantification of the risk of disease relative to expected background levels, and the identification of unusually high and low risk areas, which can guide health service planning, public health intervention and resource allocation. The current approach further permits the estimation of residual spatial dependence resulting from exposure to unmeasured risk variables and, as such, helps identify areas where other etiological factors may be at play. In this study, spatial analyses demonstrated evidence of spatial heterogeneity in the risk of both bladder and kidney cancers in Nova Scotia. The temporal component of the spatially-continuous approach permitted the determination of the relative time scales of high average risk in a given area, and hence provided an understanding of the stability of the spatial patterns of the estimated risk and supported the generation of hypotheses about the nature of possible exposure. Based on this information, we suggest that the excess bladder and kidney cancer risk for males and, potentially, females in south-western NS may be driven by exposure to unknown risk factors that act in a sustained manner over time. Further research may uncover the nature of these factors and lead to future opportunities for disease prevention.
The findings from this study warrant further investigation in three main areas. First, further work is required in the area of exposure modeling in order to elucidate the potential factors driving the observed patterns of variation in the risk of UTC in NS. Second, the findings highlight the need for the development of local-EM methods that incorporate individual- and neighborhood-level covariates. Finally, they reaffirm the need for the establishment of a public health platform that would enable the collection of individual- and/or neighborhood-level information relating to disease-causing risk factors, such as behavioural, occupational and environmental factors. Such information permits more accurate quantification and understanding of disease risk.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
NSJ extracted the case files, georeferenced cases, conducted all analyses relating to the BYM application, constructed tables and figures, and drafted and revised the manuscript. JL modified existing local-EM methods to incorporate temporality and carried out all work relating to the local-EM based analysis. PB devised the study, drafted the section describing local-EM methods, reviewed the article critically for important statistical content, provided assistance in the interpretation of the results, and supervised NSJ and JL for statistical work. JS assisted JL in developing the local-EM methodology, supervised JL, and reviewed the article critically for important statistical content. LP devised the study, reviewed the article critically for important intellectual content and provided assistance in the interpretation. TD devised the study, supervised the overall work, reviewed the article critically for important intellectual content and provided assistance in the interpretation. All authors read and approved the final manuscript.