Background
Life expectancy is a well-known concept quantifying the average number of years an individual is expected to live. For cancer patients, life expectancy (\(LE_C\)) quantifies the average number of life years remaining at diagnosis, while the loss in life expectancy (LLE) is the average number of life years a cancer patient loses due to cancer. The LLE for cancer patients is estimated as the difference between the life expectancy cancer patients would have if they did not have cancer, \(LE_{\text {exp}}\), and the life expectancy of cancer patients, \(LE_C\):
$$\begin{aligned} LLE (Z) = LE_{\text {exp}}(Z_2) - LE_C(Z), \end{aligned}$$
where \(LE_{\text {exp}}(Z_2)\) and \(LE_{C}(Z)\) are calculated as the area under the corresponding survival curve: the survival cancer patients would have if they did not have cancer, \(S^*(t | Z_2)\) (also referred to as expected survival), and the observed (all-cause) survival for cancer patients, \(S(t|Z)\), respectively:
$$\begin{aligned} LE_{\text {exp}} (Z_2) = \int _0^{t^*} S^*(u | Z_2) du \\ LE_C (Z) = \int _0^{t^*} S(u | Z) du \end{aligned}$$
Using these, the above equation for LLE becomes:
$$\begin{aligned} LLE (Z) = \int _0^{t^*} S^*(u | Z_2) du - \int _0^{t^*} S(u | Z) du, \end{aligned}$$
(1)
where \(t^*\) is the time point by which both survival curves, the expected survival \(S^*(t|Z_2)\) and the all-cause survival \(S(t|Z)\), are effectively zero. \(Z_2\) denotes the set of covariates for the life expectancy of cancer patients if they did not have cancer, while \(Z\) represents the covariates for the life expectancy of cancer patients at cancer diagnosis and includes \(Z_2\).
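As a numerical illustration of Eq. (1), the following sketch integrates two survival curves with the trapezoidal rule. The Gompertz-type expected hazard, the extra constant hazard in the cancer cohort and all parameter values are hypothetical and were chosen only to give plausible-looking curves; none of them is taken from the study data.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Hypothetical inputs (assumptions for illustration only)
a0 = 60.0                            # age at diagnosis
t_star = 60.0                        # horizon by which survival is ~0
u = np.linspace(0.0, t_star, 6001)   # years since diagnosis

# Expected survival S*(u | Z_2) from a Gompertz hazard b * exp(c * age)
b, c = 5e-5, 0.095
H_exp = b / c * (np.exp(c * (a0 + u)) - np.exp(c * a0))
S_exp = np.exp(-H_exp)

# All-cause survival S(u | Z): same curve with an extra hazard of 0.05/year
S_all = np.exp(-(H_exp + 0.05 * u))

le_exp = trapezoid(S_exp, u)   # area under the expected survival curve
le_c = trapezoid(S_all, u)     # area under the all-cause survival curve
lle = le_exp - le_c            # Eq. (1)
print(f"LE_exp = {le_exp:.2f}, LE_C = {le_c:.2f}, LLE = {lle:.2f} years")
```

The horizon `t_star` has to be large enough for both curves to be effectively zero; otherwise the integrals are restricted means rather than life expectancies.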
In practice, expected survival \(S^*(t|Z_2)\) is assumed to be the same as the survival in the general population, obtained from general population life tables stratified by sociodemographic covariates \(Z_2\) such as age, sex and calendar year. The uncertainty in estimates of expected measures based on the entire general population, i.e. all people living in a country, region or the catchment area of the population-based cancer registry, is negligible compared with the uncertainty in the much smaller cancer population and is therefore usually ignored [2].
Estimation of \(LE_C (Z)\) most often requires extrapolation of \(S(t|Z)\) in the cancer cohort beyond the study period, since follow-up until the death of all cancer patients, i.e. until the observed survival curve is effectively zero, is not feasible. For most cancer types this extrapolation has been shown to perform well in a relative survival framework [8]. Within the relative survival framework, the all-cause mortality rate for cancer patients, \(h(t|Z)\), is partitioned into the mortality rate due to cancer and the mortality rate due to other causes, \(h(t|Z) = h^*(t|Z_2) + \lambda (t|Z_1)\). The mortality rate due to other causes is assumed to be the same as the mortality rate of an individual in the general population, matched on age, sex, calendar year and possibly other covariates, and is referred to as the expected mortality, \(h^*(t |Z_2)\). The mortality rate due to cancer is referred to as the excess mortality, \(\lambda (t | Z_1)\), i.e. the mortality rate in excess of the expected mortality. \(Z_1\) represents the covariates for cancer-specific death and \(Z\) is the combination of \(Z_1\) and \(Z_2\). Very often \(Z\), \(Z_1\) and \(Z_2\) will be the same. The extrapolation of the all-cause mortality is performed separately for the expected and excess mortality rates. On the survival scale, the all-cause survival for cancer patients, \(S(t|Z)\), is the product of the expected survival \(S^*(t |Z_2)\) and the relative survival \(R(t | Z_1)\):
$$\begin{aligned} S(t | Z) = R(t | Z_1) \cdot S^*(t | Z_2) \end{aligned}$$
(2)
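The link between the additive partition of the mortality rate and the multiplicative form in Eq. (2) can be checked numerically. The sketch below is only an illustration: both hazard functions are hypothetical, and the helper `cumulative_trapezoid` is defined here rather than taken from a library.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    steps = (y[1:] + y[:-1]) * np.diff(x) / 2.0
    return np.concatenate(([0.0], np.cumsum(steps)))

t = np.linspace(0.0, 20.0, 2001)     # years since diagnosis

# Hypothetical hazards (assumptions for illustration only)
h_exp = 0.01 * np.exp(0.09 * t)      # expected mortality rate h*(t)
lam = 0.2 * np.exp(-0.3 * t)         # excess mortality rate lambda(t)

S_exp = np.exp(-cumulative_trapezoid(h_exp, t))        # S*(t)
R = np.exp(-cumulative_trapezoid(lam, t))              # R(t)
S_all = np.exp(-cumulative_trapezoid(h_exp + lam, t))  # S(t) from h* + lambda

# Eq. (2): S(t) = R(t) * S*(t), up to floating-point error
max_gap = float(np.max(np.abs(S_all - R * S_exp)))
print(f"largest deviation from Eq. (2): {max_gap:.2e}")
```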
The relative survival can be estimated from a flexible parametric relative survival model (FPRM) [9]. The log cumulative excess hazard \(\ln {[\Lambda (t|Z_1)]}\) within an FPRM is expressed as:
$$\begin{aligned} \ln [\Lambda (t| Z_1)] = s(\ln (t)|\varvec{\gamma _1},\varvec{k_1}) + \varvec{\beta _1 Z_1}, \end{aligned}$$
(3)
where \(t\) represents time since cancer diagnosis, \(s(\ln (t)|\varvec{\gamma _1},\varvec{k_1})\) is a restricted cubic spline function of \(\ln (t)\) used to estimate the baseline log cumulative excess hazard [10], and \(\varvec{Z_1}\) represents a vector of covariates for excess mortality. Model (3) is a proportional excess hazards model, but time-dependent effects can be incorporated by including interactions between covariates and a spline function of log time [11]. The estimates of the parameters (\(\widehat{\varvec{\beta _1}}\), \(\widehat{\varvec{\gamma _1}}\)) from Model (3) are obtained using maximum likelihood, where the contribution of the \(i\)-th individual to the log-likelihood \(l\) can be written as:
$$\begin{aligned}{} & {} l_i (\varvec{\beta _1}, \varvec{\gamma _1} | t_i, \varvec{Z_i}) = d_i\ln [h^*(t_i | \varvec{Z_{2_i}} ) + \lambda (t_i | \varvec{Z_{1_i}}, \varvec{\beta _1}, \varvec{\gamma _1})] + \ln [S^*(t_i | \varvec{Z_{2_i}})]\\{} & {} + \ln [R(t_i | \varvec{Z_{1_i}}, \varvec{\beta _1}, \varvec{\gamma _1})], \end{aligned}$$
where
\(d_i\) is the death indicator.
We assume that \(h^*(t | Z_2)\) and \(S^*(t | Z_2)\) are known, i.e. measured without uncertainty. As the term \(\ln [S^*(t | Z_2)]\) does not depend on the model parameters, it can be dropped from the log-likelihood, and \(l_i\) can be rewritten as:
$$\begin{aligned} l_i (\varvec{\beta _1}, \varvec{\gamma _1} | t_i, \varvec{Z_i}) = d_i\ln [h^*(t_i | \varvec{Z_{2_i}}) + \lambda (t_i | \varvec{Z_{1_i}}, \varvec{\beta _1}, \varvec{ \gamma _1})] + \ln [R(t_i | \varvec{Z_{1_i}}, \varvec{\beta _1}, \varvec{ \gamma _1})] \end{aligned}$$
(4)
Here, for each cancer patient, i, their expected mortality rate, \(h^*(t_i | Z_2)\), given covariates \(Z_2\) at the time of death due to any cause, \(t_i\), is assumed to be known, and most often obtained from life tables based on the entire general population. We denote a variance-covariance matrix of \(\widehat{\varvec{\beta _1}}\) and \(\widehat{\varvec{\gamma _1}}\) as \(V_1\).
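To make Eq. (4) concrete, the sketch below evaluates the log-likelihood contributions for a toy data set. For brevity it assumes a constant excess hazard in place of the spline-based FPRM; the data values and the names `loglik_contribution` and `neg_loglik` are hypothetical.

```python
import numpy as np

def loglik_contribution(d, h_exp, lam, R):
    """Eq. (4): d_i * ln[h*(t_i) + lambda(t_i)] + ln[R(t_i)]."""
    d = np.asarray(d, dtype=float)
    return d * np.log(np.asarray(h_exp) + np.asarray(lam)) + np.log(np.asarray(R))

def neg_loglik(lam0, t, d, h_exp):
    """Negative log-likelihood under a constant excess hazard lam0
    (a simplifying assumption, not the FPRM of Model (3))."""
    lam = np.full_like(t, lam0)
    R = np.exp(-lam0 * t)            # relative survival for a constant hazard
    return -float(np.sum(loglik_contribution(d, h_exp, lam, R)))

# Toy data: follow-up time, death indicator, expected rate at t_i
t = np.array([1.2, 3.5, 0.8, 5.0])
d = np.array([1, 0, 1, 0])
h_exp = np.array([0.02, 0.03, 0.015, 0.04])

value = neg_loglik(0.1, t, d, h_exp)
print(f"negative log-likelihood at lam0 = 0.1: {value:.4f}")
```

The expected rate only enters for individuals who die (\(d_i = 1\)), which is why it needs to be known only at the time of death.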
Using estimates from Model (3) and the relationship between the cumulative hazard function and the survival function, \(\widehat{R}(t | Z_1)\) can be obtained by
$$\begin{aligned} \widehat{R}(t|Z_1)=\exp \left( -\exp (\ln [\widehat{\Lambda }(t|Z_1)])\right) \end{aligned}$$
(5)
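A minimal sketch of how Eqs. (3) and (5) fit together is given below. It assumes the restricted cubic spline basis commonly used for models of this type (linear beyond the boundary knots); the text above does not spell out the basis, and the knot positions, coefficients and function names are hypothetical.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis with columns [x, v_1(x), ..., v_{K-2}(x)].
    Assumed form: v_j(x) = (x-k_j)^3_+ - w_j (x-k_min)^3_+ - (1-w_j) (x-k_max)^3_+,
    with w_j = (k_max - k_j) / (k_max - k_min)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    k_min, k_max = k[0], k[-1]
    cols = [x]
    for kj in k[1:-1]:
        w = (k_max - kj) / (k_max - k_min)
        cols.append(np.clip(x - kj, 0.0, None) ** 3
                    - w * np.clip(x - k_min, 0.0, None) ** 3
                    - (1.0 - w) * np.clip(x - k_max, 0.0, None) ** 3)
    return np.column_stack(cols)

def relative_survival(t, gamma0, gamma, knots, beta, z):
    """Eq. (3) for ln Lambda(t | Z_1), then Eq. (5) for R(t | Z_1)."""
    basis = rcs_basis(np.log(t), knots)
    log_cum_excess = gamma0 + basis @ gamma + float(np.dot(beta, z))
    return np.exp(-np.exp(log_cum_excess))

# Hypothetical knots (on the log-time scale) and coefficients
knots = np.log([0.1, 1.0, 5.0, 10.0])
gamma0, gamma = -1.5, np.array([0.8, 0.01, -0.005])
beta, z = np.array([0.3]), np.array([1.0])

t = np.linspace(0.05, 15.0, 300)
R = relative_survival(t, gamma0, gamma, knots, beta, z)
```

Because the basis is linear beyond the last knot, the log cumulative excess hazard is extrapolated linearly in \(\ln (t)\), which is the property that the extrapolation of \(\widehat{R}(t | Z_1)\) relies on.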
LLE can be estimated in the relative survival setting as
$$\begin{aligned} \widehat{LLE}(Z) = \int _0^{t^*} S^*(u|Z_2) du - \int _0^{t^*} \widehat{R}(u|Z_1) \cdot S^*(u|Z_2) du, \end{aligned}$$
(6)
Since
\(S^*(t|Z_2)\) is treated as fixed, it does not contribute to the variance of LLE, i.e.:
$$\begin{aligned}{} & {} Var(\widehat{LLE}) = Var(\widehat{LE_{\text {exp}}} - \widehat{LE_C}) = Var(\widehat{LE_{\text {exp}}}) + Var(\widehat{LE_C}) -2\cdot COV(\widehat{LE_{\text {exp}}},\widehat{LE_C}) \nonumber \\{} & {} = Var(\widehat{LE_C}) = Var \left[ \int _0^{t^*} \widehat{R}(u|Z_1) \cdot S^*(u|Z_2) du \right] \nonumber \\{} & {} = Var\left[ \int _0^{t^*} \exp (-\exp (\ln [\widehat{\Lambda }(u|Z_1)]))\cdot S^*(u|Z_2) du \right] , \end{aligned}$$
(7)
which can be obtained using the delta method [12]. In this case, the uncertainty of the LLE comes solely from the uncertainty in the excess mortality.
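The delta-method step in Eq. (7) can be sketched as follows. This is an illustration under simplifying assumptions: a Weibull-type model (\(\ln [\Lambda (t)] = \gamma _0 + \gamma _1 \ln (t) + \beta _1 z\)) stands in for the spline-based FPRM, the gradient is obtained by central finite differences rather than analytically, and the parameter estimates and the matrix `V1` are made-up numbers.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def numeric_gradient(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad

# Fixed expected survival S*(u | Z_2): hypothetical Gompertz curve from age 60
a0, b, c = 60.0, 5e-5, 0.095
u = np.linspace(0.0, 60.0, 6001)
S_exp = np.exp(-b / c * (np.exp(c * (a0 + u)) - np.exp(c * a0)))

z = 1.0                                   # covariate value (hypothetical)
theta_hat = np.array([-2.0, 0.9, 0.4])    # (gamma_0, gamma_1, beta_1), made up
V1 = np.array([[0.010, -0.002, 0.001],    # made-up covariance matrix
               [-0.002, 0.004, 0.000],
               [0.001, 0.000, 0.020]])

def le_c(theta):
    """LE_C as a function of the excess-mortality parameters only."""
    g0, g1, beta = theta
    cum_excess = np.exp(g0 + beta * z) * u ** g1   # Lambda(u | Z_1)
    return trapezoid(np.exp(-cum_excess) * S_exp, u)

G = numeric_gradient(le_c, theta_hat)     # d LE_C / d theta
var_le_c = float(G @ V1 @ G)              # delta-method variance
se_lle = var_le_c ** 0.5                  # equals SE(LLE) when S* is fixed
```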
In situations where there may be concerns about the extrapolation of survival curves, for example for young cancer patients or for long follow-up times, restricted mean survival times (RMST) can be obtained instead [13]. The expected restricted mean survival time (\(RMST_{\text {exp}}\)), the observed restricted mean survival time for cancer patients (\(RMST_C\)) and the difference (loss) between restricted mean survival times (LRMST) for cancer patients are estimated within a predefined time window.
Estimation of \(LE_{\text {exp}}\) including uncertainty in the expected survival and mortality of the general population
It has been shown that the uncertainty in the expected survival should be taken into account when the estimates are based on a sample from the general population [2]. An example of such a sample is the set of comparators in a matched cohort study, where cancer patients are matched on age to comparators from the general population. By fitting a survival model to estimate mortality for the comparators, the predicted rates can be used in place of \(h^*(t | Z_2)\), and the uncertainty of the estimates can be obtained.
We suggest using a flexible parametric survival model (FPM) [14] with attained age as the time scale to estimate the mortality rate for the comparators:
$$\begin{aligned} \ln [H(a|Z_2)] = s(\ln (a)|\varvec{\gamma _2},\varvec{k_2}) + \varvec{\beta _2 Z_2}, \end{aligned}$$
(8)
where \(a\) is the attained age, \(\varvec{Z_2}\) is a vector of covariates for the expected survival, \(H(a|Z_2)\) is the cumulative expected hazard, and \(s(\ln (a)|\varvec{\gamma _2},\varvec{k_2})\) is a restricted cubic spline function of \(\ln (a)\) used to estimate the baseline log cumulative hazard. Model (8) is a proportional hazards model but can easily be extended to non-proportional hazards by incorporating interactions between covariates and spline terms for \(\ln (a)\).
Parameter estimates \(\widehat{\varvec{\beta _2}}\) and \(\widehat{\varvec{\gamma _2}}\) from Model (8) are obtained by maximum likelihood, allowing for potential delayed entry (left truncation). The contribution of the \(i\)-th individual to the log-likelihood can be written as follows:
$$\begin{aligned}{} & {} l_i (\varvec{\beta _2}, \varvec{\gamma _2} | a_{0_i}, a_i, \varvec{Z_{2_i}}) = d_i\ln [h^*(a_i | \varvec{Z_{2_i}}, \varvec{\beta _2}, \varvec{\gamma _2})] + \ln [S^*(a_i | \varvec{Z_{2_i}}, \varvec{\beta _2}, \varvec{\gamma _2})] \\{} & {} - \ln [S^*(a_{0_i} | \varvec{Z_{2_i}}, \varvec{\beta _2}, \varvec{\gamma _2})] , \end{aligned}$$
where \(a_{0_i}\) is the age at the beginning of the follow-up period for the \(i\)-th individual.
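The effect of delayed entry on the log-likelihood can be illustrated with a small sketch. A Gompertz model on the attained-age scale is used here purely as a stand-in for the FPM of Model (8); the parameter values, ages and function names are hypothetical.

```python
import numpy as np

def gompertz_hazard(a, b, c):
    """Hypothetical expected hazard h*(a) on the attained-age scale."""
    return b * np.exp(c * a)

def gompertz_survival(a, b, c):
    """Corresponding survival from age 0, S*(a)."""
    return np.exp(-b / c * (np.exp(c * a) - 1.0))

def loglik_delayed_entry(a0, a, d, b, c):
    """d_i * ln h*(a_i) + ln S*(a_i) - ln S*(a_0i) for each individual."""
    a0, a, d = (np.asarray(v, dtype=float) for v in (a0, a, d))
    return (d * np.log(gompertz_hazard(a, b, c))
            + np.log(gompertz_survival(a, b, c))
            - np.log(gompertz_survival(a0, b, c)))

# Toy comparators: age at entry (matching), age at exit, death indicator
entry_age = np.array([60.0, 70.0, 55.0])
exit_age = np.array([65.0, 82.0, 61.5])
died = np.array([0, 1, 1])

contrib = loglik_delayed_entry(entry_age, exit_age, died, b=5e-5, c=0.095)
total_loglik = float(contrib.sum())
```

Subtracting \(\ln [S^*(a_{0_i})]\) conditions each contribution on survival up to the entry age, so an individual only contributes information over the ages actually observed.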
Using the general relationship between cumulative hazard, hazard and survival, \(\widehat{S^*}(a|Z_2)\) can be obtained by:
$$\begin{aligned} \widehat{S^*}(a|Z_2) = \exp \left( -\exp (\ln [\widehat{H}(a|Z_2)])\right) \end{aligned}$$
(9)
Then
\(\widehat{LE_{\text {exp}}}(Z_2)\) with attained age as time scale is estimated as:
$$\begin{aligned} \widehat{LE_{\text {exp}}}(Z_2) = \int _{a_0}^{a_0 + t^*} \frac{\widehat{S^*}(u' | Z_2) }{\widehat{S^*}(a_0 | Z_2)} du', \end{aligned}$$
(10)
where \(t^*\) is the maximum follow-up time, by which everyone is expected to have died, and \(a_0\) is the age at matching (age at diagnosis for the corresponding matched cancer patient). We can rewrite Eq. (10) with time since diagnosis as the time scale by noting that attained age is a function of time, i.e. \(a = a_0 + t\). Substituting \(u = u' - a_0\), and writing \(\widehat{S^*}(u + a_0 | Z_2, a_0) = \widehat{S^*}(u + a_0 | Z_2) / \widehat{S^*}(a_0 | Z_2)\) for the survival to age \(u + a_0\) conditional on survival to age \(a_0\), gives:
$$\begin{aligned} \widehat{LE_{\text {exp}}}(Z_2) = \int _{a_0}^{a_0 + t^*} \frac{\widehat{S^*}(u' | Z_2) }{\widehat{S^*}(a_0 | Z_2)} du' = \int _0^{t^*} \widehat{S^*}(u + a_0 | Z_2, a_0) du. \end{aligned}$$
(11)
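The change of time scale between Eqs. (10) and (11) can be verified numerically. In the sketch below the Gompertz survival function, its parameters and the grids are hypothetical; two different grids are used so that the agreement is not an artefact of evaluating the same points twice.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule for samples y on grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def surv_age(a, b=5e-5, c=0.095):
    """Hypothetical expected survival S*(a) on the attained-age scale."""
    return np.exp(-b / c * (np.exp(c * a) - 1.0))

a0, t_star = 60.0, 60.0   # age at matching and horizon (assumed values)

# Eq. (10): integrate over attained age u' from a0 to a0 + t*
age = np.linspace(a0, a0 + t_star, 7001)
le_age_scale = trapezoid(surv_age(age) / surv_age(a0), age)

# Eq. (11): integrate over time since diagnosis u = u' - a0
u = np.linspace(0.0, t_star, 5001)
le_time_scale = trapezoid(surv_age(u + a0) / surv_age(a0), u)

print(f"{le_age_scale:.4f} vs {le_time_scale:.4f}")
```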
The variance of
\(\widehat{LE_{\text {exp}}}\) can be obtained using the delta method:
$$\begin{aligned} Var(\widehat{LE_{\text {exp}}})=G_E^T \cdot V_2 \cdot G_E \end{aligned}$$
(12)
where \(V_2\) is the variance-covariance matrix for \(\widehat{\varvec{\beta _2}}\) and \(\widehat{\varvec{\gamma _2}}\) from Model (8) and \(\varvec{G_E}\) is the vector of first derivatives of \(LE_{\text {exp}}\) (Eq. (11)) with respect to each of the parameters \(\varvec{\beta _2}\) and \(\varvec{\gamma _2}\).
Estimation of \(LE_C\) including uncertainty in the expected survival and mortality of the general population
Recall that \(LE_C(Z) = \int _0^{t^*} R(u | Z_1) \cdot S^*(u | Z_2) du\) (Eq. (6)). Using the estimates of \(\widehat{R}(t|Z_1)\) from Model (3) and the estimates of \(\widehat{S^*}(t|Z_2)\) from Model (8), \(\widehat{LE_C}(Z)\) can be written as:
$$\begin{aligned}{} & {} \widehat{LE_C}(Z) = \int _0^{t^*} \widehat{R}(u | Z_1) \cdot \widehat{S^*}(u + a_0 | Z_2) du \nonumber \\{} & {} = \int _0^{t^*} \exp \left( -\exp (\ln [\widehat{\Lambda }(u|Z_1)])\right) \cdot \exp \left( -\exp (\ln [\widehat{H}(u + a_0 | Z_2)])\right) du, \end{aligned}$$
(13)
where
\(\Lambda (t | Z_1)\) is the cumulative excess mortality, while
\(H(t + a_0 | Z_2)\) is the cumulative expected mortality.
The relative survival \(R(t)\) is interpreted as net survival, i.e. survival from a specific cancer in a hypothetical world where a cancer patient can die only from the cancer of interest, provided that the conditional independence assumption holds, i.e. that conditional on covariates, cancer-specific mortality and mortality due to other causes are independent [15]. Death due to cancer and death due to other causes are competing, mutually exclusive events. Therefore, so that existing Stata software can be used for implementation, Eq. (13) can be specified in terms of a competing risks approach [16], where the all-cause survival \(S(t)\) can be presented as:
$$\begin{aligned} S(t) = 1 - \left( Cr_{cancer}(t) + Cr_{other}(t)\right) \end{aligned}$$
Here, \(Cr_{cancer}(t)\) is the crude probability of death due to cancer, interpreted as the probability of dying from cancer by time \(t\) while also being at risk of dying from other causes, and \(Cr_{other}(t)\) is the crude probability of death due to other causes, interpreted as the probability of dying from causes other than the cancer of interest by time \(t\) while also being at risk of dying from cancer [17]. Note that the term crude probability of death is used in the relative survival framework, while the same quantity is known as the cumulative incidence function in competing risks terminology [18]. The crude probabilities of death due to cancer and due to other causes can be estimated as:
$$\begin{aligned}{} & {} \widehat{Cr}_{cancer} (t|Z) = \int _0^{t} \widehat{S^*}(u + a_0|Z_2) \cdot \widehat{R}(u|Z_1) \cdot \widehat{\lambda }(u|Z_1) du \nonumber \\{} & {} \widehat{Cr}_{other} (t|Z) = \int _0^{t} \widehat{S^*}(u + a_0|Z_2) \cdot \widehat{R}(u|Z_1) \cdot \widehat{h^*}(u + a_0|Z_2) du \end{aligned}$$
The life expectancy for cancer patients is then estimated as:
$$\begin{aligned} \widehat{LE_C}(Z) = \int _0^{t^*} S(u | Z) du = t^* - \int _0^{t^*} (\widehat{Cr}_{cancer}(u |Z) + \widehat{Cr}_{other}(u|Z)) du, \end{aligned}$$
where \(t^*\) is a pre-defined time point after cancer diagnosis by which we expect all individuals to have died. This use of the competing risks approach (i.e. rewriting \(LE_C\) in terms of the crude probabilities of death) allows us to use the Stata command standsurv [19] to obtain \(\widehat{LE_C}\), its standard error and the vector of first partial derivatives of \(\widehat{LE_C}\) with respect to each parameter from both Models (3) and (8), i.e. with respect to the vector \((\varvec{\beta _1, \gamma _1, \beta _2, \gamma _2})^T\). We denote this vector of first partial derivatives by \(\varvec{G_C}\).
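The competing risks formulation can be checked numerically: the two crude probabilities, computed as cumulative integrals up to time \(t\), should add up to \(1 - S(t)\), and both expressions for \(LE_C\) should agree. The sketch below uses hypothetical hazards and plain NumPy; it does not reproduce the standsurv implementation.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    steps = (y[1:] + y[:-1]) * np.diff(x) / 2.0
    return np.concatenate(([0.0], np.cumsum(steps)))

a0, t_star = 60.0, 60.0                   # assumed age at diagnosis and horizon
u = np.linspace(0.0, t_star, 60001)       # years since diagnosis

# Hypothetical hazards (assumptions for illustration only)
h_exp = 5e-5 * np.exp(0.095 * (a0 + u))   # expected rate at attained age a0 + u
lam = 0.2 * np.exp(-0.3 * u) + 0.01       # excess rate

S_exp = np.exp(-cumulative_trapezoid(h_exp, u))   # expected survival from a0
R = np.exp(-cumulative_trapezoid(lam, u))         # relative survival
S_all = S_exp * R                                 # all-cause survival

cr_cancer = cumulative_trapezoid(S_all * lam, u)    # crude prob., cancer
cr_other = cumulative_trapezoid(S_all * h_exp, u)   # crude prob., other causes

gap = float(np.max(np.abs(S_all - (1.0 - cr_cancer - cr_other))))
le_direct = float(cumulative_trapezoid(S_all, u)[-1])
le_from_cr = t_star - float(cumulative_trapezoid(cr_cancer + cr_other, u)[-1])
```

At the horizon the two crude probabilities sum to approximately one, which is a convenient check that \(t^*\) was chosen large enough.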
Estimation of Var(LLE) including uncertainty in the expected survival and mortality of the general population
Recall that the loss in life expectancy is obtained as the difference between the life expectancy of cancer patients and their life expectancy if they did not have cancer. Therefore, to get the variance of LLE, we need the variance of \(LE_{\text {exp}}\), the variance of \(LE_C\) and their covariance (Eq. (7)). \(Var(\widehat{LE_{\text {exp}}})\) is obtained as shown in Eq. (12). \(Var(\widehat{LE_C})\) is obtained as described above.
To obtain \(Cov(\widehat{LE_{\text {exp}}}, \widehat{LE_C})\), let \(\varvec{G}\) denote the matrix of observation-specific first derivatives of \(\widehat{LE_{\text {exp}}}\) and \(\widehat{LE_C}\) with respect to each of the parameters from both Model (8) and Model (3), i.e. with respect to \((\varvec{\beta _1, \gamma _1, \beta _2, \gamma _2})^T\):
$$\begin{aligned} \varvec{G} = \left( \begin{array}{c} \varvec{G_E^*}\\ \varvec{G_C} \end{array}\right) \end{aligned}$$
Note that \(\varvec{G_E^*}\) is the vector of observation-specific first derivatives of \(\widehat{LE_{\text {exp}}}\) with respect to \((\beta _1, \gamma _1, \beta _2, \gamma _2)^T\). That is, \(\varvec{G_E^*}\) consists of \(\varvec{G_E}\), the vector of first derivatives of \(\widehat{LE_{\text {exp}}}\) with respect to the parameters (\(\varvec{\beta _2}, \varvec{\gamma _2}\)), and a vector of zeros, the first derivatives of \(\widehat{LE_{\text {exp}}}\) with respect to the parameters (\(\varvec{\beta _1}, \varvec{\gamma _1}\)). The latter are zero because Models (8) and (3) do not have shared parameters.
Let \(\varvec{V}\) denote the block-diagonal combination of the two variance-covariance matrices \(\varvec{V_1}\) and \(\varvec{V_2}\) from Models (3) and (8), respectively:
$$\begin{aligned} \varvec{V} = \left( \begin{array}{cc} \varvec{V_2} &{} \varvec{0} \\ \varvec{0} &{} \varvec{V_1} \end{array}\right) \end{aligned}$$
Note that the \(\varvec{0}\) blocks in \(\varvec{V}\) convey that Models (3) and (8) do not have shared parameters.
Finally, let \(\varvec{\Sigma }\) be the result of the matrix multiplication:
$$\begin{aligned} \varvec{\Sigma } = \varvec{G^T} \cdot \varvec{V} \cdot \varvec{G}, \end{aligned}$$
(14)
where \(\varvec{\Sigma }\) can be written as:
$$\begin{aligned} \varvec{\Sigma } = \left( \begin{array}{cc} \sigma ^2_{\widehat{LE_{\text {exp}}}} &{} \sigma _{\widehat{LE_{\text {exp}}},\widehat{LE_C}} \\ &{}\\ \sigma _{\widehat{LE_C},\widehat{LE_{\text {exp}}}} &{} \sigma ^2_{\widehat{LE_C}} \end{array}\right) . \end{aligned}$$
(15)
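The estimates from Matrix (15) are used to calculate \(Var(\widehat{LLE})\) following Eq. (7), i.e. \(Var(\widehat{LLE}) = \sigma ^2_{\widehat{LE_{\text {exp}}}} + \sigma ^2_{\widehat{LE_C}} - 2 \sigma _{\widehat{LE_{\text {exp}}},\widehat{LE_C}}\).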
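The matrix calculation in Eqs. (14) and (15) can be sketched as follows. All numbers are made up for illustration. Two conventions are assumptions of this sketch rather than statements from the text: the gradients are stored as the columns of `G`, so that \(\varvec{G^T} \cdot \varvec{V} \cdot \varvec{G}\) is a 2 x 2 matrix, and the parameters are ordered with those of Model (8) first, so that the gradient entries line up with the blocks of \(\varvec{V}\).

```python
import numpy as np

# Hypothetical gradients; order: (beta_2, gamma_2) of Model (8),
# then the three parameters of Model (3)
G_E = np.array([-1.5, 0.8])                      # d LE_exp / d(beta_2, gamma_2)
G_E_star = np.concatenate([G_E, np.zeros(3)])    # zeros for Model (3) parameters
G_C = np.array([-1.1, 0.6, -2.0, -0.7, -1.3])    # d LE_C / d(all parameters)

# Hypothetical variance-covariance matrices of the two fitted models
V2 = np.array([[0.004, 0.001],
               [0.001, 0.002]])
V1 = np.array([[0.010, -0.002, 0.001],
               [-0.002, 0.004, 0.000],
               [0.001, 0.000, 0.020]])
V = np.block([[V2, np.zeros((2, 3))],
              [np.zeros((3, 2)), V1]])           # block-diagonal V

G = np.column_stack([G_E_star, G_C])             # one column per estimand
Sigma = G.T @ V @ G                              # Eq. (14), 2 x 2

var_le_exp = float(Sigma[0, 0])
var_le_c = float(Sigma[1, 1])
cov_exp_c = float(Sigma[0, 1])
var_lle = var_le_exp + var_le_c - 2.0 * cov_exp_c   # Eq. (7)

# Cross-check: delta method applied directly to LLE = LE_exp - LE_C
g_lle = G_E_star - G_C
var_lle_direct = float(g_lle @ V @ g_lle)
```

The covariance term is non-zero even though the two models share no parameters, because \(\widehat{LE_C}\) depends on the parameters of Model (8) through \(\widehat{S^*}\).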