Background
Mortality, an important outcome in epidemiological studies, generally has to be ascertained over long follow-up periods. This can be achieved either via prospective active follow-up, which is labor intensive, expensive and potentially biased due to losses to follow-up, or via linkage to a regional or national death registry, which has become more frequent due to the electronic availability of registry data [1–6]. Incomplete enumeration of persons in a census, undocumented migration and data errors can, however, lead to incomplete linkage and incomplete mortality follow-up, which in turn might introduce bias in analyses of all cause and cause-specific mortality rates and determinants of mortality [7,8]. Incomplete mortality ascertainment leads to an underestimation of mortality rates mainly because the total number of deaths is too small (not all deaths are counted) and because the total person-time is too large (person-time under observation is not stopped without a date of death).
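The direction of this bias can be illustrated with a short calculation. The sketch below uses invented numbers, not SNC data: the missed fraction and the surplus follow-up per missed death are assumptions chosen only to show that both mechanisms push the observed rate downwards.

```python
def mortality_rate(deaths, person_years):
    """Crude mortality rate per 1,000 person-years."""
    return 1000.0 * deaths / person_years

# Hypothetical cohort (illustrative values only).
true_deaths = 1000
true_person_years = 100_000.0
missed_fraction = 0.05         # assumed share of deaths not ascertained
extra_years_per_missed = 5.0   # assumed surplus follow-up per missed death

missed = true_deaths * missed_fraction
observed_deaths = true_deaths - missed                    # numerator too small
observed_person_years = (true_person_years
                         + missed * extra_years_per_missed)  # denominator too large

true_rate = mortality_rate(true_deaths, true_person_years)
observed_rate = mortality_rate(observed_deaths, observed_person_years)
relative_bias = observed_rate / true_rate - 1.0

print(f"true rate:     {true_rate:.2f} per 1,000 person-years")
print(f"observed rate: {observed_rate:.2f} per 1,000 person-years")
print(f"relative bias: {100 * relative_bias:.1f}%")
```

In this example the relative underestimation is slightly larger than the share of missed deaths, because the surplus person-time adds to the effect of the missing numerator events.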
When the focus is on cause-specific mortality rates (e.g. site-specific cancer mortality), additional issues relating to the cause of death classification need to be considered. Changes in cause of death coding policy, for example switching from one version of the International Classification of Diseases (ICD) to another, can affect the time trends of cause-specific mortality rates, as previously documented for respiratory diseases, circulatory diseases and cancer [9–14]. In older age-groups, where mortality is highest, both unascertained deaths and coding changes may dramatically affect absolute rates.
We investigated the bias introduced by incomplete ascertainment of deaths and changes in coding in the Swiss National Cohort (SNC) [15,16], a census-based cohort study in which mortality is ascertained via linkage to the national death registry with about 95% completeness. We included the unlinked deaths using a pragmatic linkage algorithm and used Poisson regression models to account for changes in Swiss Federal Statistical Office (SFSO) coding of causes of death.
Methods
Swiss National Cohort (SNC)
The anatomy of the SNC has been described in detail elsewhere [15]. Briefly, the SNC is a longitudinal study of the entire resident population of Switzerland, based on national census information. The SNC includes 6.8 million people at the 1990 census and 7.3 million at the 2000 census. Regularly updated mortality and migration files are linked with the 1990 and 2000 census records. In the period 1991–2000, 621,389 death certificates were recorded by the national death registry at the SFSO, and 432,004 certificates were recorded in the period 2001–2007, for a total of 1,053,393 deaths. In the absence of a unique personal identifier, both deterministic and probabilistic methods of record linkage were used, based on sex, date of birth, marital status, religion, nationality, place of residence and other variables when available (e.g. date of birth of mother or spouse). If the census and death records that refer to the same person are recorded several years apart, then place of residence, marital status and nationality could have changed and will disagree on the two records. Linkage will be less successful the more these characteristics have changed. Causes of death were coded at the national death registry of the SFSO according to the eighth revision of the ICD (ICD-8) until 1994 and according to the 10th revision (ICD-10) since 1995. Ethical approval was obtained from the Ethics Committees of the Cantons of Zurich and Bern.
Unlinked deaths
Among the 1,053,393 deaths recorded between 5th December 1990 and 31st December 2007, 56,413 (5.4%) could not be linked to a census or migration record. Deaths relating to persons born between censuses were not considered as unlinked (e.g. a 1998 death of a child born in 1994 was not linkable to the SNC population because the child was born after census 1990 and died before census 2000). Persons whose deaths could not be linked were younger at death, less likely to be Swiss nationals and more likely to be women and single, as described in detail elsewhere [15].
We implemented a pragmatic two-step procedure to allocate unlinked death records to census records. We applied rules to prevent impossible matches, for example when attributing deaths with a gender-specific cause of death (e.g. prostate or breast cancer). In the first step, an unlinked death was allocated to a census record if the two records matched on gender, canton of residence, nationality, age (same birth date or maximally 3 months apart) and civil status (identical or plausible change, such as married at census and widowed at time of death). If more than one census record fulfilled these criteria, we randomly allocated the death to one of them. If no census record was found, we used less stringent matching criteria in a second step: gender, region (Central, Eastern, Zurich, the Espace Mittelland, Lake Geneva, Northwestern, or Ticino) and birth date within one year. We again randomly selected one census record that matched the death record on these criteria.
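The two-step allocation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the SNC: the record fields, the set of plausible civil-status transitions, the day thresholds (92 days for "3 months", 366 days for "one year") and the rule that a census record is used at most once are all assumptions introduced here.

```python
import random
from datetime import date

# Assumed plausible civil-status changes (census -> death); the text gives
# only "married -> widowed" as an example.
PLAUSIBLE_CHANGES = {("married", "widowed"), ("single", "married"),
                     ("married", "divorced")}

def civil_status_ok(at_census, at_death):
    return at_census == at_death or (at_census, at_death) in PLAUSIBLE_CHANGES

def days_apart(d1, d2):
    return abs((d1 - d2).days)

def step1(death, census):
    """Stringent criteria: gender, canton, nationality, birth date, civil status."""
    return (death["sex"] == census["sex"]
            and death["canton"] == census["canton"]
            and death["nationality"] == census["nationality"]
            and days_apart(death["birth"], census["birth"]) <= 92  # ~3 months (assumed)
            and civil_status_ok(census["civil"], death["civil"]))

def step2(death, census):
    """Relaxed criteria: gender, region, birth date within one year."""
    return (death["sex"] == census["sex"]
            and death["region"] == census["region"]
            and days_apart(death["birth"], census["birth"]) <= 366)

def allocate(deaths, census_records, seed=0):
    """Return {death id: (census id, step used)} for deaths that found a match."""
    rng = random.Random(seed)
    available = {c["id"]: c for c in census_records}
    links = {}
    for death in deaths:
        for step, rule in ((1, step1), (2, step2)):
            candidates = [cid for cid, c in available.items() if rule(death, c)]
            if candidates:
                chosen = rng.choice(candidates)   # random pick among equal candidates
                links[death["id"]] = (chosen, step)
                del available[chosen]             # assumption: one death per census record
                break
    return links

census = [
    {"id": "c1", "sex": "F", "canton": "BE", "region": "Espace Mittelland",
     "nationality": "CH", "birth": date(1920, 5, 1), "civil": "married"},
    {"id": "c2", "sex": "M", "canton": "ZH", "region": "Zurich",
     "nationality": "IT", "birth": date(1931, 2, 10), "civil": "single"},
]
deaths = [
    {"id": "d1", "sex": "F", "canton": "BE", "region": "Espace Mittelland",
     "nationality": "CH", "birth": date(1920, 5, 1), "civil": "widowed"},
    {"id": "d2", "sex": "M", "canton": "ZH", "region": "Zurich",
     "nationality": "CH", "birth": date(1930, 9, 1), "civil": "married"},
    {"id": "d3", "sex": "F", "canton": "GE", "region": "Lake Geneva",
     "nationality": "FR", "birth": date(1950, 1, 1), "civil": "single"},
]
links = allocate(deaths, census)
print(links)
```

In this toy data the first death is matched in step 1, the second only in step 2 (nationality disagrees), and the third remains unallocated. Recording the step used makes it possible to compare the reliability of the two kinds of link later on.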
Official mortality rates and SNC rates including and excluding unlinked deaths
We first calculated age- and gender-specific official cause-specific mortality rates by dividing all deaths of a specific cause of death recorded in Switzerland by the official midyear population data from the SFSO for each year of the period 1991–2007 (separately for males and females, in 10-year age categories up to age 84 and a final 85+ category). These rates are hereafter referred to as reference rates.
We then calculated age- and gender-specific mortality rates based on the SNC (hereafter SNC rates), measuring time from the date of the census (5th December 1990 or 5th December 2000) to either the date of death, date of emigration, or 31st December 2007, whichever came first. We calculated the total person-time separately for each calendar year 1991–2007, gender and age-group and divided the corresponding number of deaths by the number of person-years. We did calculations both including and excluding the unlinked deaths.
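The person-time calculation described above can be sketched as follows. This is a simplified illustration, not the SNC code: it ignores stratification by gender and age-group, converts days to years by dividing by 365.25, and uses hypothetical record fields (`census`, `death`, `emigration`).

```python
from collections import defaultdict
from datetime import date

STUDY_END = date(2007, 12, 31)

def follow_up_end(death=None, emigration=None):
    """Earliest of death, emigration and the administrative end of follow-up."""
    return min(d for d in (death, emigration, STUDY_END) if d is not None)

def person_years_by_year(start, end):
    """Split the follow-up interval [start, end] into person-years per calendar year."""
    out = {}
    for year in range(start.year, end.year + 1):
        seg_start = max(start, date(year, 1, 1))
        seg_end = min(end, date(year, 12, 31))
        days = (seg_end - seg_start).days + 1   # the end day counts as observed
        out[year] = days / 365.25
    return out

def rates_per_100k(persons):
    """Deaths divided by person-years, per calendar year, per 100,000."""
    person_time = defaultdict(float)
    death_count = defaultdict(int)
    for p in persons:
        end = follow_up_end(p.get("death"), p.get("emigration"))
        for year, py in person_years_by_year(p["census"], end).items():
            person_time[year] += py
        if p.get("death") == end:               # death observed during follow-up
            death_count[end.year] += 1
    return {y: 1e5 * death_count[y] / person_time[y] for y in sorted(person_time)}

census_2000 = date(2000, 12, 5)
persons = [
    {"census": census_2000, "death": date(2002, 7, 1)},
    {"census": census_2000, "emigration": date(2001, 12, 31)},
    {"census": census_2000},
]
rates = rates_per_100k(persons)
print(rates[2002])
```

An unlinked death corresponds to a person like the third one in the example: follow-up continues to the end of 2007 and no event is counted, which is exactly the mechanism by which incomplete linkage lowers the rate.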
We show results for selected causes of death: deaths from all causes and for all cancer causes (ICD-8: 140–209, ICD-10: C00-C97), all cardiovascular causes (ICD-8: 390–429, ICD-10: I00-I52), and suicides (ICD-8: E950-E959, ICD-10: X60-X84). As over 50% of deaths occur in the age-groups 75–84 years and 85+ years, we provide descriptive statistics for the percentage difference between the two versions of SNC rates (excluding or including unlinked deaths) and the reference rate for age-groups 75–84 years and 85+ years.
Accounting for change in official cause of death coding policy
In Switzerland and elsewhere the underlying cause of death on the death certificate is defined as “(a) the disease or injury which initiated the train of morbid events leading directly to death, or (b) the circumstances of the accident or violence which produced the fatal injury” [17] and is generally considered the most meaningful cause from a public health standpoint. Although the notion of the underlying cause of death appears to be straightforward, the determination of the sequence of causes may be difficult when a number of diseases and conditions are involved. The reporting physicians can list up to four additional diseases related to the death of the person. This information is used by the SFSO to assign the official cause of death. Through 1994 the SFSO official cause of death coding policy used ICD-8 combined with internal rules giving priority to some causes (accident, poisoning or trauma; influenza; cancer). In 1995, SFSO changed to ICD-10 and decided to strictly follow ICD coding [14]. A sudden change in mortality rates between 1994 and 1995 was observed, most pronounced in cancers with long survival (e.g. breast and prostate cancer) [14,18]. For example, from 1995 onwards the mention of breast cancer on the death certificate of an elderly woman resulted less often in breast cancer being the official cause of death than in the preceding years [18].
We used Poisson regression models that included a variable to account for the change in rates resulting from the 1995 change in coding of causes of death. We modeled the natural logarithm of the number of events and included the natural logarithm of the person-time at risk as a fixed offset [19]. The dataset consisted of records for each calendar year from 1991 to 2007 with the number of deaths (all cause or cause-specific) and the person-time at risk calculated from the SNC for males and females for a specific age category. We included restricted cubic splines using predefined equally spaced connecting knots at 1990, 1995, 2000, 2004 to flexibly model time trends of absolute rates [20,21]. These models allowed estimating absolute mortality rates with 95% confidence intervals (95% CI) for the years before 1995 as if the post-1995 official cause of death coding policy had been used during the earlier years. In addition, the estimated parameter for the sudden change in official cause of death coding policy can be understood as a multiplication factor by which the rate calculated in the year 1994 would need to be multiplied to be comparable to the rate calculated in the year 1995. We illustrate the impact of the change in coding policy for breast cancer (ICD-8: 174–175, ICD-10: C50), prostate cancer (ICD-8: 185, ICD-10: C61), all cancer causes (ICD-8: 140–209, ICD-10: C00-C97), and suicides (ICD-8: E950-E959, ICD-10: X60-X84) for age-groups 75–84 years and 85+ years. We also present the estimated multiplication factors and their 95% CI.
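The structure of such a model can be sketched in plain Python. The example below is a simplification under stated assumptions and not the Stata analysis used for the SNC: the time trend is a single linear term instead of restricted cubic splines, the data are synthetic (constructed from a known factor of 0.85), and confidence intervals are omitted. The function names `solve` and `fit_poisson` are introduced here for illustration.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poisson(X, deaths, person_years, iterations=25):
    """Newton-Raphson fit of log E[deaths] = log(person_years) + X beta.

    The first column of X is assumed to be the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(deaths) / sum(person_years))   # start at the crude rate
    for _ in range(iterations):
        mu = [pt * math.exp(sum(b * x for b, x in zip(beta, row)))
              for row, pt in zip(X, person_years)]
        score = [sum(row[j] * (y - m) for row, y, m in zip(X, deaths, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * m for row, m in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

# Synthetic yearly data for one sex and age-group (illustrative values only).
years = list(range(1991, 2008))
person_years = [50_000.0] * len(years)
true_beta = [math.log(0.01), -0.01, math.log(0.85)]
# Columns: intercept, linear trend centred at 1995, indicator for post-1995 coding.
X = [[1.0, float(y - 1995), 1.0 if y >= 1995 else 0.0] for y in years]
deaths = [pt * math.exp(sum(b * x for b, x in zip(true_beta, row)))
          for row, pt in zip(X, person_years)]

beta = fit_poisson(X, deaths, person_years)
factor = math.exp(beta[2])    # multiplication factor for pre-1995 rates
i94 = years.index(1994)
rate_1994 = deaths[i94] / person_years[i94]
rate_1994_adjusted = rate_1994 * factor
print(round(factor, 3), rate_1994, rate_1994_adjusted)
```

The exponentiated coefficient of the coding indicator plays the role of the multiplication factor: multiplying a pre-1995 rate by it gives the rate expected under the post-1995 coding policy, conditional on the modelled time trend.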
Hazard ratios by education, marital status and nationality
We analyzed the association of education, marital status, and nationality with all cause, all cancer, all cardiovascular, and suicide mortality using multivariable Cox regression models. We investigated how estimated hazard ratios (HR) differed if we included or excluded unlinked deaths in the analysis. In addition to education, marital status and nationality all models included the categorical variables language region, religion, and degree of urbanization of the place of residence. All analyses were done using Stata 11.1 and 12.1 (StataCorp, College Station, Texas).
Discussion
Mortality rates calculated in the SNC, a large population-based study with mortality follow-up ascertained through probabilistic record linkage, showed substantial differences when compared to official mortality statistics from the Swiss Federal Statistical Office (SFSO) as illustrated for all cause, all cancer, all cardiovascular, and suicide mortality. The discrepancies were removed after including the initially unlinked deaths through pragmatic linkage that only required matching for gender, age in years and geographical region but not community of residence. The lower levels of agreement of information on census and on death certificate for key variables showed that this method of allocating unlinked deaths resulted in much less reliable links than the initial more refined SNC linkage.
Changes in official cause of death coding policies must be accounted for when describing time trends of cause-specific absolute mortality rates. We achieved this by incorporating a specific parameter for the change in official cause of death coding policy in Poisson regression models with flexible restricted cubic splines to model time trends [20,21]. This allowed us to quantify the impact of the change in Switzerland and to estimate a multiplication factor by which cause-specific mortality rates in the years preceding 1995 would need to be multiplied to be comparable to those from 1995 onwards while flexibly accounting for existing time trends. Our approach quantifies the overall sudden change in cause-specific mortality from 1994 to 1995. With our method it is not possible to disentangle the effect of the change in ICD coding from other possible causes for mortality changes occurring at the same time. Still, the interpretation of this multiplication factor is similar to the comparability ratio which has been estimated in bridging studies in the US and UK for the change of cause of death coding from ICD-9 to ICD-10 [10–13]. The comparability ratio was estimated in two steps: first by coding the same death certificates by both coding systems, and then by dividing the number of deaths due to a certain cause (e.g. prostate cancer) as classified by ICD-10 by the number of deaths due to this cause as classified by ICD-9 [10–13]. Similar to our multiplication factor, the comparability ratio may be used to adjust cause-specific mortality rates classified by the earlier coding system for comparison with cause-specific mortality rates classified under the later coding system [10]. In the US and the UK comparability ratios clearly different from 1 were observed for deaths due to pneumonia with values of 0.70 for the US and 0.62 for England and Wales [10,12]. In contrast to the Swiss situation with multiplication factors of less than 0.9 for breast and prostate cancer in the 85+ age group, comparability ratios for breast (1.01 in US, 1.03 in England and Wales) and prostate cancer (1.01 in US, 1.04 in England and Wales) were close to 1, with hardly any variation across age groups [10,11]. Variation of the comparability ratio across age groups was however observed for deaths due to ischemic heart disease and myocardial infarction in England and Wales, with 0.946 for deaths in women under 75 years of age and 0.894 for women aged 85 years and older [13]. In Switzerland, no such bridging studies were conducted.
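As a brief illustration of how a comparability ratio from a bridging study is computed and applied, consider the following sketch. The death counts and the rate are invented for the example; only the ratio of 0.70 echoes the US pneumonia value cited above.

```python
def comparability_ratio(deaths_new_coding, deaths_old_coding):
    """Deaths assigned to a cause under the new coding system divided by
    deaths assigned to the same cause under the old one, for the same
    set of dual-coded death certificates."""
    return deaths_new_coding / deaths_old_coding

def adjust_rate(rate_old_coding, ratio):
    """Express a rate obtained under the old coding on the new coding's scale."""
    return rate_old_coding * ratio

# Hypothetical dual-coded certificates: 1,000 deaths attributed to the cause
# under the old system, 700 under the new one.
ratio = comparability_ratio(700, 1000)
adjusted = adjust_rate(100.0, ratio)   # hypothetical rate per 100,000 under old coding
print(ratio, adjusted)
```

The multiplication factor estimated from the Poisson model is applied in the same way, but it is derived from the observed discontinuity in the time series and not from dual coding of the same certificates.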
We examined hazard ratios to gain an understanding of the potential impact on results when including the pragmatically linked deaths in analyses of the SNC. We considered various outcomes (all cause, all cancer, all cardiovascular, and suicide mortality) and several independent variables (education, marital status, and nationality). These analyses reflected common mortality outcomes and important socio-demographic determinants of mortality. In all these analyses hazard ratios were very similar when including or excluding the unlinked deaths, regardless of the chosen outcome. As Greenland et al. explain [22], in some situations measurement error in the form of non-differential misclassification of a binary outcome variable (e.g. death yes/no) does not result in biased risk ratios. This happens when specificity of outcome assessment is 100% and sensitivity is the same across exposure levels. Including deaths linked to census records with perfect agreement on several identifying variables will result in a high specificity (close to 100%) of outcome ascertainment, but errors in identifying information such as marital status or community of residence will result in a sensitivity below 100%.
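The argument about sensitivity and specificity can be checked numerically. The sketch below uses hypothetical risks and ascertainment probabilities, and it works with simple risk ratios, not with the hazard ratios from Cox models reported in this study.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Probability that a death is recorded, given imperfect ascertainment:
    true deaths detected plus survivors falsely recorded as dead."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)

def observed_risk_ratio(risk_exposed, risk_unexposed,
                        sens_exposed, sens_unexposed, specificity):
    return (observed_risk(risk_exposed, sens_exposed, specificity)
            / observed_risk(risk_unexposed, sens_unexposed, specificity))

# Hypothetical true risks: 20% in the exposed, 10% in the unexposed (RR = 2).
rr_true = 0.20 / 0.10

# Perfect specificity, equal sensitivity: the risk ratio is preserved.
rr_ideal = observed_risk_ratio(0.20, 0.10, 0.95, 0.95, 1.00)

# Imperfect specificity (false links): the ratio moves towards 1.
rr_low_spec = observed_risk_ratio(0.20, 0.10, 0.95, 0.95, 0.98)

# Sensitivity differing by exposure level: the ratio is biased as well.
rr_diff_sens = observed_risk_ratio(0.20, 0.10, 0.90, 0.95, 1.00)

print(rr_true, rr_ideal, rr_low_spec, rr_diff_sens)
```

With perfect specificity the sensitivity cancels out of the ratio as long as it is the same in both groups; any false-positive links or exposure-dependent sensitivity break this cancellation.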
In the SNC, the proportion of initially unlinked deaths varied somewhat by educational attainment, marital status and nationality. Sensitivity of outcome ascertainment was thus not the same across exposure levels and one would expect that hazard ratios for these exposures might be biased [23]. By including the pragmatically linked deaths we improve sensitivity but also reduce specificity of outcome ascertainment, which also will bias results from survival analyses if sensitivity and specificity vary by levels of exposure. The way we included the initially unlinked deaths guarantees that the links are correct with regard to age (within 1 year), sex and region of residence within Switzerland, and no bias is therefore to be expected for these exposures. In the initial and the additional pragmatic linkage we could not match on education, a powerful predictor of mortality [24–26], because education is not recorded on the death certificates. Therefore we cannot know whether sensitivity and specificity of mortality ascertainment in the SNC varied by educational level. However, the very similar results when including or excluding the initially unlinked deaths in the models for education can be interpreted in two ways: first, that the level of unlinked deaths was so low that results could hardly have been affected when including them, or second, that the unlinked deaths did not importantly change sensitivity and specificity of mortality ascertainment by educational level.
Our study has several strengths and limitations. The main strength is that the rates and models were based on one of the largest longitudinal datasets worldwide [15] and included a long follow-up period (17 years). Several limitations result from the SNC’s reliance on routine mortality data for outcomes. First, the official underlying cause of death might not be 100% accurate. This limitation is common to all studies that rely on cause of death information provided by a national death registry. The underlying cause of death describes the “disease or injury which initiated the train of morbid events leading directly to death”, or “the circumstances of the accident or violence which produced the fatal injury” [17] and its determination may be difficult for deaths in which a number of diseases and conditions are involved. A further limitation might be that mortality rates for immigrants and foreigners may be under- or overestimated because of informative censoring. This could happen if older individuals tend to return to their countries of origin after retirement and if returning to the country of origin is prognostic for death. This bias would also affect the official mortality rates for persons of foreign nationality reported by the SFSO. The extent of this potential bias cannot be assessed because mortality follow-up of persons moving out of Switzerland is not possible.
Acknowledgements
This manuscript contains original material not previously published. The work was supported by funding from the Swiss National Science Foundation (grant number 3347C0-108806) and Oncosuisse (grant number OCS-02288-08-2008). The members of the SNC Study Group are Felix Gutzwiller (Chairman of the Executive Board), Matthias Bopp (Zurich, Switzerland); Matthias Egger (Chairman of the Scientific Board), Adrian Spoerri (Bern, Switzerland); Nino Künzli (Basel, Switzerland); Fred Paccaud (Lausanne, Switzerland); and Michel Oris (Geneva, Switzerland). We also thank the Swiss Federal Statistical Office, whose support made the SNC and these analyses possible.
Competing interest
The authors declare that they have no conflict of interest. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Authors’ contribution
MZ conceived the additional linkage of unlinked deaths and the use of splines with ICD coding parameter in the Poisson regression model for the analysis of time trends, and finalized the manuscript. KS did the additional linkage, the data analysis and the writing of the first draft. CK, AS and ME contributed to the writing of the manuscript. All authors contributed to the interpretation of the results and approved the final version of the manuscript.