Background
Chronic conditions were estimated to account for 63% of 54.6 million global deaths in 2008, and this proportion increased to 71% of 56.7 million in 2016 [
1,
2]. Cardiovascular diseases (CVDs) and cancers are the two leading causes of mortality, accounting for 31.8% and 17.1% of global deaths in 2017, respectively [
3]. In the UK, 89.7% of total mortality was attributed to non-communicable diseases in 2017, with cancers, CVDs, and dementia as the three leading contributors [
4]. Of all deaths, 77.0%, 17.3%, and 5.7% occurred among people aged 70 years or older, 50–69 years, and younger than 50 years, respectively [
4]. Among people who died before 70 years of age, cancer is the leading cause of mortality, followed by CVD; among those who died at 70 years or older, CVD is the leading cause, followed by cancer [
4]. In the UK, life expectancy at birth is estimated to have increased rapidly until 2010 but only slowly since then [
5]. One explanation for this is that the population is reaching the biological limits of longevity, but addressing the causes of premature mortality may help promote longevity.
Research shows that most individuals over 50 years have not one but several comorbidities (defined as multimorbidity) [
6,
7], and multimorbidity is associated with an increased risk of mortality [
8]. Trajectory analysis, an emerging method, has been used to identify temporal disease progression patterns for predicting and preventing future diseases [
9,
10]. Most people may die because of multiple diseases. A previous study based on the UK Biobank cohort investigated disease trajectories during follow-up and mortality among individuals with depression [
11]. However, that study did not identify temporal trajectories of important diseases across the life course.
Prevention and reduction of premature mortality risk are of paramount importance for achieving further substantial increases in life expectancy. Using the UK Biobank, we aimed to examine the association between important diseases and incident mortality. We then identified disease trajectories in the life course based on the diseases that contributed most to mortality.
Methods
Study population
The UK Biobank is a population-based cohort study of 502,505 participants aged 40–73 years at baseline between 2006 and 2010. Participants were recruited at one of 22 assessment centers throughout the UK [
12]. The study design, recruitment flow, and population have been described in detail elsewhere [
12]. A baseline assessment was conducted among 502,505 out of approximately 9.2 million people invited. Participants provided information on geographic factors, lifestyle, and other health-related aspects through comprehensive baseline questionnaires, interviews, and physical measurements.
The UK Biobank study’s ethical approval was granted by the National Information Governance Board for Health and Social Care and the NHS North West Multicentre Research Ethics Committee. All participants provided informed consent through electronic signature at baseline assessment. The present study was conducted under application number 62443 of the UK Biobank resource [
13].
Ascertainment of diseases
A disease was recorded as present if participants reported that a doctor had ever told them they had it (the field code for each disease is listed in Additional file
1: Table S1). Participants who reported a diagnosis were then asked, “What was your age when the disease was first diagnosed?” Checks were performed to confirm that the reported age at diagnosis fell within a plausible range. Individuals who were uncertain about the age at diagnosis provided an estimate or selected “Do not know”. Sixty major diseases, including CVD, cancer, diabetes, dementia, and chronic kidney disease (CKD), were included in the analysis.
Additional disease cases at baseline and follow-up were defined using inpatient data. The Hospital Episode Statistics database, the Scottish Morbidity Record, and the Patient Episode Database were used to capture inpatient hospital records in England, Scotland, and Wales, respectively [
12]. Inpatient hospital data for UK Biobank participants have been available since 1997 [
12]. The International Classification of Diseases (ICD) codes for each of the 60 diseases are listed in Additional file
1: Table S2. Age at diagnosis (years) was then computed as the number of days from the date of birth to the date of first diagnosis, divided by 365.25. Incident cases of these 60 diseases during follow-up were identified using ICD codes.
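As an illustration only, the age-at-diagnosis calculation described above can be sketched in a few lines of Python. The function name and the example dates are hypothetical and are not taken from the study data; only the 365.25 divisor comes from the text.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # divisor stated in the Methods


def age_at_diagnosis(birth_date: date, diagnosis_date: date) -> float:
    """Return age at first diagnosis in years: (diagnosis date - birth date) / 365.25."""
    if diagnosis_date < birth_date:
        # An implausible record; a real pipeline would flag it for checking.
        raise ValueError("diagnosis date precedes birth date")
    return (diagnosis_date - birth_date).days / DAYS_PER_YEAR


# Hypothetical participant born on 1 January 1950 and first diagnosed on 1 January 2000.
example_age = age_at_diagnosis(date(1950, 1, 1), date(2000, 1, 1))
print(round(example_age, 2))
```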
Ascertainment of mortality
Mortality data for participants in England and Wales were obtained from the National Health Service Digital, and the mortality data for the participants in Scotland were obtained from the National Health Service Central Register [
12]. Specific causes of death, based on the primary diagnosis, were identified using ICD codes [
14]. Person-years were calculated from the date of baseline assessment (2006–2010) to the date of death, or the end of follow-up (31 December 2020 for England/Wales and 18 January 2021 for Scotland), whichever came first.
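The person-years rule above (baseline to death or the regional end of follow-up, whichever comes first) can be sketched as follows. This is a minimal illustration: the function name and region labels are hypothetical, and converting days to years by dividing by 365.25 is an assumption borrowed from the age-at-diagnosis calculation, since the text does not state the divisor used for person-years.

```python
from datetime import date
from typing import Optional

# End-of-follow-up dates stated in the Methods.
END_OF_FOLLOW_UP = {
    "England/Wales": date(2020, 12, 31),
    "Scotland": date(2021, 1, 18),
}


def person_years(baseline: date, death: Optional[date], region: str) -> float:
    """Follow-up time in years from baseline to death or censoring, whichever is first."""
    end = END_OF_FOLLOW_UP[region]
    exit_date = end if death is None else min(death, end)
    # Assumption: days are converted to years with a 365.25-day year.
    return (exit_date - baseline).days / 365.25


# Hypothetical participant assessed on 1 January 2008 who was alive at the end of follow-up.
print(round(person_years(date(2008, 1, 1), None, "England/Wales"), 2))
```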
Covariates
Data on age, gender, ethnicity, education, and income were collected using a touch-screen computer. A detailed questionnaire on lifestyle factors, including diet, physical activity, smoking status, and frequency of alcohol consumption was also completed. We divided sleep duration into three groups: <7, 7–9, and >9 h [
15]. Excess metabolic equivalent (MET)-hours/week of physical activity during work and leisure time were estimated using questions similar to those in the short form of the International Physical Activity Questionnaire [
16]. A healthy diet score was computed based on seven commonly eaten food groups following recommendations on dietary priorities for cardiometabolic health [
17], with a higher score representing a healthier diet. In the present analysis, high diet quality was defined as a diet score ≥4. A genetic risk score (GRS) for longevity was computed using 78 single-nucleotide polymorphisms [
18].
BMI was calculated as measured weight in kilograms divided by measured height in meters squared. Glycated hemoglobin (HbA1c) was measured using high-performance liquid chromatography on a Bio-Rad Variant II Turbo. Total cholesterol, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglycerides were measured by direct enzymatic methods (Konelab, Thermo Fisher Scientific, Waltham, Massachusetts).
Statistical analysis
Cox proportional hazards regression models were used to examine the association between each of the 60 major diseases at baseline (including both self-reported and inpatient data) and incident mortality. Covariates were selected based on clinical knowledge, and potential multicollinearity was tested. Covariates with a variance inflation factor greater than 5 (LDL-C and total cholesterol) were excluded from the analysis. Model 2 was adjusted for age, gender, ethnicity, education, income, BMI, smoking, physical activity, alcohol consumption, sleep duration, diet, blood pressure, longevity GRS, HDL-C, triglycerides, and HbA1c. For the analysis of each disease, the full model (Model 3) was additionally adjusted for the other diseases. No multicollinearity was found among the individual diseases (all variance inflation factors <2). Individuals who died within the first year of follow-up were excluded from the analysis. Log-minus-log plots were used to test the proportional hazards assumption. We then used Cox proportional hazards regression models to examine the association between each major disease diagnosed in the life course (including diagnoses reported at baseline and during follow-up) and mortality. Individuals with a disease diagnosed in the last year before death were excluded from this analysis. The Benjamini-Hochberg procedure was used to control the false discovery rate at the 5% level for multiple comparisons [
19].
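For readers unfamiliar with the Benjamini-Hochberg step-up procedure, a minimal Python sketch is given here. It is a generic illustration of the procedure, not the SAS code used in the study, and the P values in the example are invented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per hypothesis: True if rejected at false discovery rate q."""
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Invented P values for four hypothetical disease-mortality associations.
print(benjamini_hochberg([0.039, 0.001, 0.205, 0.008]))
```

Because the rule is step-up, a hypothesis can be rejected even when its own P value exceeds its rank-specific threshold, provided a larger P value further down the ranking meets its threshold.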
We then classified the diseases that were significantly associated with mortality into groups. Individuals who died during follow-up were divided into groups according to the number of types of diseases: 0, 1, 2, 3, 4, 5, and ≥6. Baseline data were expressed as means ± standard deviations, medians (interquartile ranges [IQRs]), or frequencies (percentages) according to the number of types of diseases. Analysis of variance (ANOVA) for normally distributed continuous variables, the Wilcoxon rank-sum test for skewed continuous variables, and the chi-square test for categorical variables were used to examine differences in baseline characteristics across the number of diseases in the life course.
Temporal disease trajectories were identified using the permutation of diseases (multiple diseases in order of age at diagnosis) among individuals who were diagnosed with two or more diseases before death. The age at diagnosis was based on self-reported or inpatient data. For example, among those who were diagnosed with two diseases in the life course, the primary disease was defined as the disease diagnosed at a younger age and the secondary disease as the disease diagnosed at an older age. This analysis was also conducted separately for those with 3, 4, 5, and ≥6 diseases. Disease trajectories were identified separately for all-cause mortality, cancer mortality, and CVD mortality.
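The ordering step can be illustrated with a short Python sketch. The data structure (one dictionary per person mapping disease group to age at diagnosis), the function names, and the example values are all hypothetical; how ties in age at diagnosis were handled is not described in the text, so the tie-breaking here (input order) is an assumption.

```python
from collections import Counter


def trajectory(diagnoses):
    """Order one person's disease groups by age at diagnosis (youngest first)."""
    # sorted() is stable, so diseases with equal ages keep their input order (assumption).
    ordered = sorted(diagnoses.items(), key=lambda item: item[1])
    return tuple(name for name, _age in ordered)


def count_trajectories(cohort):
    """Count each ordered sequence among people with two or more diseases."""
    counts = Counter()
    for diagnoses in cohort:
        if len(diagnoses) >= 2:
            counts[trajectory(diagnoses)] += 1
    return counts


# Invented records: disease group -> age at diagnosis in years.
cohort = [
    {"hypertension": 52.0, "cancer": 61.5},
    {"cancer": 66.0, "hypertension": 49.3},
    {"cancer": 58.0, "CVD": 63.2, "CKD": 67.0},
    {"CVD": 60.0},  # a single disease, so no trajectory is formed
]
for sequence, n in count_trajectories(cohort).most_common():
    print(" -> ".join(sequence), n)
```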
For chronic diseases at baseline that were associated with a lower risk of incident mortality, a matched analysis was conducted to test these associations with controls matched by age and gender. For each individual with the disease, one control was randomly selected from those free of the corresponding disease.
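A minimal sketch of 1:1 matching on age and gender is shown here. The text does not specify whether age was matched exactly or within a tolerance, or whether controls were sampled with or without replacement; the sketch assumes exact matching on age in whole years and sampling without replacement, and all names and records are hypothetical.

```python
import random
from collections import defaultdict


def match_controls(participants, disease, seed=0):
    """Return {case_id: control_id}, matching each case to one random disease-free control."""
    rng = random.Random(seed)  # fixed seed so the random selection is reproducible
    # Pool of disease-free participants, keyed by (age, gender).
    pool = defaultdict(list)
    for person in participants:
        if disease not in person["diseases"]:
            pool[(person["age"], person["gender"])].append(person["id"])
    pairs = {}
    for person in participants:
        if disease in person["diseases"]:
            candidates = pool[(person["age"], person["gender"])]
            if candidates:
                # Assumption: each control is used at most once.
                pairs[person["id"]] = candidates.pop(rng.randrange(len(candidates)))
    return pairs


# Invented participants.
people = [
    {"id": 1, "age": 60, "gender": "F", "diseases": {"migraine"}},
    {"id": 2, "age": 60, "gender": "F", "diseases": set()},
    {"id": 3, "age": 60, "gender": "M", "diseases": set()},
    {"id": 4, "age": 55, "gender": "M", "diseases": {"migraine"}},
]
print(match_controls(people, "migraine"))
```

Cases without an eligible control (participant 4 in the example) are left unmatched in this sketch.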
The percentage of participants with missing values in physical activity, household income, education, BMI, alcohol consumption, smoking, sleep duration, LDL-C, triglycerides, blood pressure, and HbA1c was 19.9%, 15.3%, 2.0%, 2.0%, 0.3%, 0.6%, 0.8%, 14.4%, 6.6%, 6.0%, and 7.2%, respectively. Missing values for categorical variables were assigned as a single category. Missing values for continuous covariates were assigned as the mean. Sensitivity analysis for associations between individual diseases and mortality was conducted among participants with complete data.
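The two imputation rules described above can be sketched as follows. This is an illustration under simple assumptions: missing values are represented as `None`, the category label "Missing" is hypothetical, and the example values are invented.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing continuous values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_category(values, label="Missing"):
    """Assign missing categorical values (None) to a single separate category."""
    return [label if v is None else v for v in values]


# Invented BMI values (kg/m^2) and smoking categories.
print(impute_mean([24.0, None, 30.0]))
print(impute_category(["never", None, "current"]))
```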
All data analyses were conducted using SAS 9.4 (SAS Institute Inc.), and P values were two-sided with statistical significance set at <0.05.
Discussion
We found that more than half of the participants died with cancer and more than one fifth died with CVD. More than 90% of the individuals were diagnosed with two or more diseases of interest in the life course. A larger number of diseases diagnosed in the life course was associated with greater longevity. Hypertension was more likely to be diagnosed before CVD and cancer, whilst CKD was more likely to be diagnosed after CVD and cancer. This trend became more pronounced as the number of diseases diagnosed in the life course increased. There were significant interplays between cancer and CVD. Similar results were found for individuals who died with cancer or CVD.
Our analysis is consistent with a previous study showing that cancer, CVD, diabetes, neurological disorders, mental disorders, chronic respiratory diseases, and digestive diseases play an important role in mortality [
3]. An increasing number of studies have linked cataract to increased mortality risk [
20‐
22]. This is in line with our study demonstrating that cataract was associated with an increased risk of mortality independent of geographic factors, lifestyle, biomarkers, and other chronic diseases. A recent meta-analysis showed that findings for the association between glaucoma and mortality remained inconsistent between previous studies [
22]. In our analysis, glaucoma was not significantly associated with mortality risk after adjustment for other diseases, suggesting that any risk associated with glaucoma depended on its association with other diseases. We also observed that endometriosis, prostate disorders, and migraine at baseline were associated with a lower risk of mortality, but these associations were attenuated to non-significance in the matched analysis. A prospective cohort study of Finnish women (49,956 with endometriosis, 98,824 age- and municipality-matched controls) with a median follow-up of 17 years reported that endometriosis diagnosed by surgery was associated with a lower risk of all-cause mortality (HR (95% CI): 0.73 (0.69–0.77)) [
23]. We found endometriosis (defined by both self-reported and inpatient data) was associated with a decreased risk of mortality, but this association was not significant in the age-matched analysis. The inconsistent results between the previous study and our analysis may be due to different methods used for the diagnosis of endometriosis. A recent large prospective study of 27,844 women with a median follow-up of 22.7 years showed that migraine was not significantly associated with all-cause mortality (HR (95% CI) 0.96 (0.89–1.04)) [
24]. We found that migraine was associated with a reduced risk of mortality, but this association was attenuated to non-significance after adjustment for other chronic diseases. This suggests that the apparent protective association between migraine and mortality may be due to confounding. An analysis based on the Oxford Record Linkage Study and English national data demonstrated that benign prostatic hyperplasia was associated with a lower risk of mortality, although the effect size was minimal [
25]. Likewise, we found that prostate disorders (excluding prostate cancer) were associated with a decreased risk of mortality, but this association was not significant after adjustment for other diseases.
There was a small proportion of individuals who were not diagnosed with any disease in the life course, a large proportion of whom died with CVD or neurodegenerative/mental disorders. Although a relatively larger proportion of these individuals died from external causes (10.1%), they had shorter longevity than those who were diagnosed with one or more diseases in the life course, even when those who died from external causes were excluded from the analysis. As the reduction in longevity was possibly due to unawareness of diseases in those individuals, it is imperative to screen for diseases, especially CVD and neurodegenerative/mental disorders, among middle-aged adults. Higher total cholesterol but lower HbA1c was observed in this subgroup of individuals. As further analysis showed that individuals with fewer diseases were more likely to have fewer deadly diseases (Additional file
1: Table S9), the shorter longevity among individuals who were diagnosed with fewer diseases may be because they were less likely to seek health checks and care. Therefore, health screening is important among these participants in mid-life.
Cancer is the leading cause of mortality and is also the most prevalent of the diseases of interest (60.2%) in the life course among individuals who died prematurely. This is consistent with a previous study demonstrating that cancer was the leading cause of life years lost and that life years lost due to cancer increased by 16% from 1995 to 2015 [
26]. Around one quarter of those who were diagnosed with cancer had no other disease diagnosed beforehand in the life course, and the others had at least one disease (including hypertension, digestive disorders, or painful conditions) diagnosed before cancer. Likewise, previous prospective studies have shown that hypertension was associated with an increased risk of cancer [
27,
28]. Digestive disorders including anorexia may increase the risk of cancer [
29‐
31]. We found that CVD was the second leading cause of premature mortality. Although CVD contributed to a much smaller proportion of premature mortality than cancer in our study, a larger proportion of mortality caused by CVD (80%) than of that caused by cancer (47%) was related to modifiable risk factors [
32]. It is well known that hypertension and hypercholesterolemia are primary causes of CVD [
33], whilst blood pressure lowering and cholesterol lowering have been shown to be beneficial for the prevention of CVD [
34,
35]. There were significant interplays between cancer and CVD [
36], which might explain why a large proportion of CVD was diagnosed following cancer. This is in line with previous studies showing that cancer clustered with hypertension, CVD, and/or digestive disorders is a common multimorbidity pattern in European populations [
37‐
40]. However, the temporal trajectories of these conditions in the life course need to be investigated in more prospective cohort studies.
A systematic analysis for the Global Burden of Disease Study showed that hypertension is the leading contributor to global mortality [
41]. We found that, although hypertension is not the leading cause of mortality, it is the most prevalent of the diseases of interest. The association between hypertension and mortality is probably attributable to the fact that a large proportion of cancer and CVD (the leading causes of mortality) was diagnosed following hypertension. Several recent studies have shown that the cluster of hypertension and/or CVD and CKD is a frequently seen multimorbidity pattern [
9,
37,
42]. Hypertension is more likely to precede other conditions before mortality whilst CKD is more likely to occur following other conditions [
43,
44]. This suggests the importance of screening for more severe conditions such as cancer and CVD among those with hypertension; the prevention of CKD also deserves attention among those with one or more existing diseases. Given the interactions between various shared risk factors and the known significant hormonal shifts across this period, longitudinal research spanning the prodrome of disease development is central to improving our understanding of the evolution of multimorbidity in the life course. It is also important to identify time windows for potential risk and preventative factors that may contribute to premature mortality.
To our knowledge, this is the first study to examine the disease trajectory in the life course based on a large population cohort. There are several potential limitations in our study. Firstly, most cases of many chronic conditions at baseline including hypertension, high cholesterol, stroke, asthma, depression, and anxiety were captured by self-reported data (Additional file
1: Table S10), whilst those conditions diagnosed during follow-up were largely captured by inpatient data given that self-reported data during follow-up were available in only a small subgroup of the UK Biobank cohort. Data on age at diagnosis of disease until recruitment among some individuals (without initial diagnosis records in the inpatient data) were based on self-reported questionnaires, which might have biased the associations. Secondly, UK Biobank participants are more likely to have better general health (lower prevalence of main chronic diseases and unhealthy behaviors). However, a previous study has demonstrated that findings regarding exposure-disease relationships may be generalized to other populations [
45]. Thirdly, we present the disease trajectories for the first six and the last diseases among individuals who were diagnosed with seven or more diseases in their life course. The permutations of diseases diagnosed between the seventh and the last diseases were not displayed because their number was too large. Finally, the severity of the diseases could not be captured in the data and thus was not included in the analysis. This might have biased the associations, as diseases of different severity may carry different mortality risks or lead to different combinations of diseases, which in turn carry different mortality risks.