In the relative survival framework it is assumed that the overall all-cause mortality rate, \(h(t|\pmb{X}_{i})\), for an individual with covariate pattern \(\pmb{X}_{i}\), is the sum of the expected mortality rate, \(h^{*}(t|\pmb{X}_{i})\), and the excess mortality rate, \(\lambda(t|\pmb{X}_{i})\).
$$ h(t|\pmb{X}_{i}) = h^{*}(t|\pmb{X}_{i}) + \lambda(t|\pmb{X}_{i}) $$
(1)
For simplicity it is assumed that covariates, \(\pmb{X}_{i}\), are the same for the expected and excess mortality rates, but this can be relaxed. Expected mortality rates are stratified by age, sex, calendar year and potentially other demographic covariates. The relative survival for covariate pattern \(\pmb{X}_{i}\) is
$$ {R}(t|\pmb{X}_{i}) = \exp\left(-\int_{0}^{t}{\lambda(u|\pmb{X}_{i}) du}\right) $$
The marginal relative survival, \(R_{m}(t|\pmb{X})\), is the expectation over covariates, \(\pmb{X}\), i.e. \(E_{\pmb{X}}[R(t|\pmb{X})]\). This can be estimated in a modelling framework when incorporating these covariates, \(\pmb{X}\), by averaging the individual estimates, \(\widehat {R}(t|\pmb {X}_{i})\).
$$ \widehat{R}_{m}(t|\pmb{X}) = \frac{1}{N}\sum_{i=1}^{N}{\widehat{R}(t|\pmb{X}_{i})} $$
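As a minimal sketch of this averaging step, the Python snippet below assumes a hypothetical Weibull-type cumulative excess hazard whose scale depends on age through a log-linear term. The parameter values, the ages and the function names (`relative_survival`, `marginal_relative_survival`) are illustrative assumptions and are not estimates from any fitted model.

```python
import math

def relative_survival(t, age, shape=0.8, log_scale=-2.0, beta_age=0.03):
    """R(t|X_i) = exp(-Lambda(t|X_i)) for a hypothetical Weibull excess hazard.

    Lambda(t|X_i) = exp(log_scale + beta_age * (age - 60)) * t**shape.
    All parameter values are illustrative assumptions.
    """
    cum_excess = math.exp(log_scale + beta_age * (age - 60.0)) * t ** shape
    return math.exp(-cum_excess)

def marginal_relative_survival(t, ages):
    """Average the individual relative survival values over the sample."""
    return sum(relative_survival(t, a) for a in ages) / len(ages)

ages = [45, 58, 63, 71, 80]  # hypothetical covariate patterns
for t in (1.0, 5.0, 10.0):
    print(t, round(marginal_relative_survival(t, ages), 4))
```

The marginal estimate always lies between the smallest and largest individual relative survival values at the same time point, since it is a simple mean over the observed covariate distribution.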
The marginal excess mortality rate function, \(\lambda_{m}(t|\pmb{X})\), can be obtained through the usual transformation from survival to hazard function, \(h(t) = -\frac {d \ln [S(t)]}{dt}\).
$$ \lambda_{m}(t|\pmb{X}) = \frac{E_{\pmb{X}}\left[R(t|\pmb{X})\lambda(t|\pmb{X})\right]}{E_{\pmb{X}}[R(t|\pmb{X})]} $$
(2)
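The intermediate steps behind Eq. (2) can be sketched as follows, assuming that differentiation and expectation can be interchanged and using \(\frac{d}{dt}R(t|\pmb{X}) = -\lambda(t|\pmb{X})R(t|\pmb{X})\), which follows from the definition of relative survival given above.

$$ \begin{aligned} \lambda_{m}(t|\pmb{X}) &= -\frac{d}{dt}\ln\left[R_{m}(t|\pmb{X})\right] = -\frac{\frac{d}{dt}E_{\pmb{X}}\left[R(t|\pmb{X})\right]}{E_{\pmb{X}}\left[R(t|\pmb{X})\right]} \\ &= \frac{E_{\pmb{X}}\left[R(t|\pmb{X})\lambda(t|\pmb{X})\right]}{E_{\pmb{X}}\left[R(t|\pmb{X})\right]} \end{aligned} $$

The marginal excess mortality rate is therefore a weighted average of the individual excess mortality rates, with weights proportional to the individual relative survival at time \(t\).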
A conditional model with no covariates for the excess mortality rate
Consider the conditional model in Eq. (1) without including covariates \(\pmb{X}_{i}\) for the excess mortality rate.
$$ h(t|\pmb{X}_{i}) = h^{*}(t|\pmb{X}_{i}) + \lambda(t) $$
(3)
This assumes that the excess mortality rate, \(\lambda(t)\), is the same for all individuals. This would mean that the all-cause mortality rate varies between individuals only due to variation in the expected mortality rates and not due to variation in excess mortality rates. This differs from the marginal excess mortality rate defined in Eq. (2), where the individual excess mortality rates vary between individuals.
Likelihood
We adopt a fully parametric approach and therefore model how excess mortality rates vary over time from diagnosis and by covariates. For an observed all-cause survival/censoring time, \(t_{i}\), and event indicator for death due to any cause, \(d_{i}\), the log-likelihood contribution of the \(i\)th individual with covariate pattern \(\pmb{X}_{i}\) for a relative survival model is
$$ \ln L_{i} = d_{i}\ln\left[h^{*}(t_{i}|\pmb{X}_{i}) + \lambda(t_{i}|\pmb{X}_{i},\pmb{\beta},\pmb{\gamma}) \right] - \Lambda(t_{i}|\pmb{X}_{i},\pmb{\beta},\pmb{\gamma}) $$
where \(\Lambda(t|\pmb{X}_{i},\pmb{\beta},\pmb{\gamma})\) is the cumulative excess mortality function with parameters, \(\pmb{\beta}\), modelling covariate effects and, \(\pmb{\gamma}\), modelling the effect of time from diagnosis [11, 13].
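The log-likelihood contribution above can be illustrated with a short Python sketch. It assumes a hypothetical Weibull excess hazard with a single covariate; the function names, parameter values and data are made up for illustration, and in practice the expected rate \(h^{*}(t_{i}|\pmb{X}_{i})\) would be looked up in a population life table rather than typed in.

```python
import math

def excess_hazard(t, x, beta, gamma):
    """Hypothetical Weibull excess hazard: exp(g0 + beta*x) * g1 * t**(g1 - 1)."""
    log_scale, shape = gamma
    return math.exp(log_scale + beta * x) * shape * t ** (shape - 1.0)

def cumulative_excess_hazard(t, x, beta, gamma):
    """Matching cumulative excess hazard: exp(g0 + beta*x) * t**g1."""
    log_scale, shape = gamma
    return math.exp(log_scale + beta * x) * t ** shape

def loglik_contribution(t_i, d_i, x_i, expected_rate_i, beta, gamma):
    """ln L_i = d_i * ln[h*(t_i|X_i) + lambda(t_i|X_i)] - Lambda(t_i|X_i)."""
    lam = excess_hazard(t_i, x_i, beta, gamma)
    cum = cumulative_excess_hazard(t_i, x_i, beta, gamma)
    return d_i * math.log(expected_rate_i + lam) - cum

# One death at 2.5 years and one censoring at 4 years (made-up data).
gamma = (-1.5, 0.9)   # assumed (log scale, shape) parameters
beta = 0.4            # assumed log excess hazard ratio
ll_death = loglik_contribution(2.5, 1, 1.0, 0.02, beta, gamma)
ll_censored = loglik_contribution(4.0, 0, 1.0, 0.03, beta, gamma)
print(ll_death, ll_censored)
```

The cumulative expected hazard does not involve \(\pmb{\beta}\) or \(\pmb{\gamma}\), which is why only the expected rate at the event time enters the contribution.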
For a marginal model with no covariates the marginal excess mortality rate, \(\lambda_{m}(t|\pmb{X})\), as defined in Eq. (2), needs to be directly estimated; \(\pmb{X}\) thus denotes covariates that can impact on both expected and excess mortality rates. Rather than incorporating \(h^{*}(t_{i}|\pmb{X}_{i})\), the individual expected hazard for the \(i\)th individual at time \(t_{i}\), a suitable estimate of the marginal expected mortality rate needs to be incorporated. A naive way to do this would be to include the mean of \(h^{*}(t_{i}|\pmb{X}_{i})\) among those at risk at time \(t_{i}\). However, net survival is defined in the hypothetical world where it is not possible to die from other causes, but it is estimated in the real world where individuals may die from both their cancer and from other causes. This means that with increasing time from diagnosis individuals with a higher risk of dying from other causes will be underrepresented. This should be taken into account both in the likelihood and when estimating the mean expected mortality rate. An idea similar to that proposed by Pohar Perme et al. [6] for the non-parametric estimator is used: individuals are upweighted by the inverse of their expected survival, \(S^{*}(t|\pmb{X}_{i})\), where
$$S^{*}(t|\pmb{X}_{i}) = \exp\left(-\int_{0}^{t}{h^{*}(u|\pmb{X}_{i}) du}\right) $$
and defining the individual level, time-dependent weights
\(w_{i}^{*}(t)\) as
$$ w_{i}^{*}(t) = \frac{1}{S^{*}(t|\pmb{X}_{i})} $$
The mean expected mortality rate at time \(t_{i}\) incorporating weights, \(w_{i}^{*}(t_{i})\), is
$$ \bar{h}^{*}(t_{i})=\frac{\sum\limits_{j\in \mathcal{R}(t_{i})} { w_{j}^{*}(t_{i}) h^{*}(t_{i}|\pmb{X}_{j})}}{\sum\limits_{j\in \mathcal{R}(t_{i})} { w_{j}^{*}(t_{i})}} $$
(4)
where \(\mathcal {R}(t_{i})\) is the set of those at risk at time \(t_{i}\).
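A small Python sketch of Eq. (4) is given below for illustration. It assumes, purely to keep the example short, that each individual has a constant expected mortality rate, so that \(S^{*}(t|\pmb{X}_{i}) = \exp(-h^{*}t)\); real expected rates change with attained age and calendar year. The cohort and the function names are hypothetical.

```python
import math

# Hypothetical cohort: (follow-up time, constant expected mortality rate).
cohort = [(1.2, 0.010), (3.5, 0.015), (4.0, 0.060), (6.1, 0.020), (7.3, 0.120)]

def expected_survival(t, rate):
    """S*(t|X_i) = exp(-integral of h*), here with an assumed constant rate."""
    return math.exp(-rate * t)

def weighted_mean_expected_rate(t, cohort):
    """Eq. (4): weighted mean of h* over those still at risk at time t."""
    at_risk = [(time, rate) for time, rate in cohort if time >= t]
    weights = [1.0 / expected_survival(t, rate) for _, rate in at_risk]
    numerator = sum(w * rate for w, (_, rate) in zip(weights, at_risk))
    return numerator / sum(weights)

def unweighted_mean_expected_rate(t, cohort):
    """Naive mean of h* among those at risk, for comparison."""
    rates = [rate for time, rate in cohort if time >= t]
    return sum(rates) / len(rates)

t_event = 4.0
print(unweighted_mean_expected_rate(t_event, cohort))
print(weighted_mean_expected_rate(t_event, cohort))
```

In this toy cohort the weighted mean exceeds the naive mean, because individuals with higher expected mortality receive larger weights to compensate for their underrepresentation in the risk set.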
The weighted marginal expected mortality rates, \(\bar {h}^{*}(t_{i})\), can then be incorporated into the weighted likelihood rather than \(h^{*}(t_{i}|\pmb{X}_{i})\), together with weights, \(w_{i}^{*}(t_{i})\),
$$ \begin{aligned} \ln L_{i} &= d_{i} w_{i}^{*}(t_{i})\ln\left[\bar{h}^{*}(t_{i}) + \lambda_{m}(t_{i}|\pmb{\gamma}) \right] \\&\quad- \int_{0}^{t_{i}}{w_{i}^{*}(u)\lambda_{m}(u|\pmb{\gamma}) du} \end{aligned} $$
(5)
Note that Eq. (4) only needs to be calculated at event times and is not needed for individuals with censored times. The integral in Eq. (5) will generally not be analytically tractable. A numerical integration method, such as Gaussian quadrature, could be incorporated into the estimation process. However, we choose to split the time-scale into a number of intervals and assume that the weight is constant within each interval. The likelihood then becomes
$$ \begin{aligned} \ln L_{i} &= d_{i} w_{i}^{*}(t_{i})\ln\left[\bar{h}^{*}(t_{i}) + \lambda_{m}(t_{i}|\pmb{\gamma}) \right] \\&\quad- \sum_{k=1}^{M_{i}} {w_{i}^{*}(t_{k})\left(\Lambda_{m}(t_{i(k)}|\pmb{\gamma}) - \Lambda_{m}(t_{i(k-1)}|\pmb{\gamma})\right)} \end{aligned} $$
(6)
where \(M_{i}\) is the number of intervals for the \(i\)th subject. An advantage of this approach is that after splitting the time-scale and calculating the weights, standard parametric relative survival models can be used. This requires the software to incorporate both weights and left truncation into the likelihood. A choice needs to be made about how finely to split the time-scale. As the weights depend upon expected mortality rates, they vary continuously, so a choice also needs to be made about the point within the interval at which to calculate the weight. We use the mid-point of the interval [14]. More time intervals will result in greater precision, but will increase computational time. The choice of the number of time intervals is investigated in the example in the "Results" section. The weights vary within individuals, which leads to within-subject correlation. Therefore, a cluster robust sandwich estimator of the variance is used [15]. This is similar to other methods that use time-dependent weights, such as the Fine and Gray subhazard model [16] or the parametric equivalent [17].
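The effect of the number of intervals on the second term of Eq. (6) can be sketched in Python as follows. The sketch assumes a hypothetical Weibull cumulative marginal excess hazard and a constant expected mortality rate, with equal-width intervals and weights evaluated at interval mid-points; all names and parameter values are illustrative assumptions.

```python
import math

def cum_excess(t, log_scale=-1.5, shape=1.2):
    """Hypothetical Weibull cumulative marginal excess hazard."""
    return math.exp(log_scale) * t ** shape

def weight(t, expected_rate=0.05):
    """w*(t) = 1 / S*(t), assuming a constant expected mortality rate."""
    return math.exp(expected_rate * t)

def split_integral(t_i, n_intervals):
    """Second term of Eq. (6): weights held constant at interval mid-points."""
    width = t_i / n_intervals
    total = 0.0
    for k in range(1, n_intervals + 1):
        lower, upper = (k - 1) * width, k * width
        midpoint = 0.5 * (lower + upper)
        total += weight(midpoint) * (cum_excess(upper) - cum_excess(lower))
    return total

t_i = 8.0
for m in (1, 4, 16, 64):
    print(m, round(split_integral(t_i, m), 6))
```

As the number of intervals increases, the sum approaches the weighted integral in Eq. (5); the printed values give a feel for how quickly the approximation stabilises under these assumed inputs.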
External age-standardization
In order to compare estimates of marginal relative survival between different population groups it is necessary to age-standardize to the same age distribution. In the non-parametric setting the usual approach is to estimate marginal relative survival separately within age groups and then obtain a weighted average of the age-specific estimates, with weights equal to the proportion within each age group in the reference population. In a modelling framework regression standardization is performed with each individual up or downweighted using the ratio of the proportion in the age group to which the individual belongs and the proportion in the reference age group [2]. A similar idea can be used within the marginal model that enables externally age-standardized estimates to be obtained without the need to model, or stratify by, age.
Let \(p^{a}_{i}\) be the proportion in the age group to which the \(i\)th individual belongs and \(p^{R}_{i}\) be the corresponding proportion in the reference population. Weights can be defined to upweight or downweight individuals relative to the reference population.
$$ w_{i}^{a} = \frac{p^{R}_{i}}{p^{a}_{i}} $$
(7)
These weights can then be combined with the inverse expected survival weights,
$$ w_{i}(t) = w_{i}^{a} w_{i}^{*}(t) $$
These weights are the same as those defined by Sasieni and Brentnall [7] for use in non-parametric relative survival estimators. The weights need to be used when calculating \(\bar {h}^{*}(t_{i})\), by replacing \(w_{i}^{*}(t)\) with \(w_{i}(t)\) in Eq. (4), and in the likelihood in Eq. (6). It is common just to standardize by age, but the approach is applicable when standardizing over multiple covariates.
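The construction of the standardization weights in Eq. (7) and their combination with the inverse expected survival weights can be sketched in Python as below. The reference proportions, the sample and the constant expected rate are made-up values for illustration, and `age_weights` and `combined_weight` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical reference age distribution (proportions sum to 1).
reference = {"15-44": 0.07, "45-54": 0.12, "55-64": 0.23, "65-74": 0.29, "75+": 0.29}

def age_weights(age_groups, reference):
    """Eq. (7): w_i^a = p_i^R / p_i^a for each individual."""
    counts = Counter(age_groups)
    n = len(age_groups)
    return [reference[g] / (counts[g] / n) for g in age_groups]

def combined_weight(w_age, t, expected_rate):
    """w_i(t) = w_i^a * w_i^*(t), assuming a constant expected mortality rate."""
    return w_age * math.exp(expected_rate * t)

# Made-up sample in which the oldest age group is overrepresented.
age_groups = ["45-54", "55-64", "55-64", "65-74", "75+", "75+", "75+", "15-44"]
w_a = age_weights(age_groups, reference)
print([round(w, 3) for w in w_a])
print(round(combined_weight(w_a[0], 5.0, 0.01), 3))
```

When every reference age group is represented in the sample, the age weights sum to the sample size, so the standardization redistributes weight between age groups without changing the total.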
Choice of parametric model
The likelihood defined in Eq. (6) could be used for a variety of parametric models. Here we use flexible parametric survival models on the log-cumulative excess hazard scale [9] that incorporate restricted cubic splines to model the effect of time from diagnosis. An advantage of modelling on the log-cumulative excess hazard scale is that it provides an analytical form for the cumulative excess hazard that is required for the likelihood in Eq. (6). The model for the log of the cumulative excess hazard, \(\Lambda(t|\pmb{k_{0}},\pmb{\gamma})\), where \(\pmb{k_{0}}\) is a vector of knots and \(\pmb{\gamma}\) the associated spline parameters, is
$$\ln\left[\Lambda(t|\pmb{k_{0}},\pmb{\gamma})\right] = \eta(t|\pmb{k_{0}},\pmb{\gamma}) = s\left(\ln(t)|\pmb{k_{0}},\pmb{\gamma}\right) $$
where \(s(\ln(t)|\pmb{k_{0}},\pmb{\gamma})\) is a restricted cubic spline function of log time. The number of parameters to model the baseline is determined by the number of knots for the restricted cubic spline function, with the number of parameters (including the intercept) being equal to the number of knots. Simulation studies have shown that the models give negligible bias when estimating survival functions across a wide range of scenarios [18, 19]. For more details on these models see Royston and Lambert [13].
After fitting a model, the marginal relative survival and marginal excess mortality functions can be estimated as
$$\begin{aligned} \widehat{R}_{m}(t|\pmb{X}) &= \exp\left[-\exp\left(\widehat{\eta}(t|\pmb{k_{0}},\widehat{\pmb{\gamma}})\right)\right] \\ \widehat{\lambda}_{m}(t|\pmb{X}) &= \frac{d s\left(\ln(t)|\pmb{k_{0}},\widehat{\pmb{\gamma}}\right)}{dt} \exp\left[\widehat{\eta}(t|\pmb{k_{0}},\widehat{\pmb{\gamma}})\right] \end{aligned} $$
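These two transformations can be sketched in Python as follows. The sketch uses one common (non-orthogonalized) parameterization of the restricted cubic spline basis, which may differ from the basis used by any particular software implementation; the knot positions and spline parameters are made-up values, and `rcs_basis` and `marginal_estimates` are hypothetical function names.

```python
import math

def pos(u, power):
    """Truncated power function (u)_+ ** power."""
    return max(u, 0.0) ** power

def rcs_basis(x, knots):
    """Restricted cubic spline basis and its derivative at x = ln(t).

    Returns (basis, dbasis) with an intercept and a linear term followed by
    one term per interior knot, so len(basis) == len(knots).
    """
    k_min, k_max = knots[0], knots[-1]
    basis, dbasis = [1.0, x], [0.0, 1.0]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        basis.append(pos(x - k, 3) - lam * pos(x - k_min, 3)
                     - (1.0 - lam) * pos(x - k_max, 3))
        dbasis.append(3.0 * (pos(x - k, 2) - lam * pos(x - k_min, 2)
                             - (1.0 - lam) * pos(x - k_max, 2)))
    return basis, dbasis

def marginal_estimates(t, knots, gamma):
    """Return (R_m, lambda_m) at time t from the log cumulative excess hazard."""
    basis, dbasis = rcs_basis(math.log(t), knots)
    eta = sum(g * b for g, b in zip(gamma, basis))
    # Chain rule: d s(ln t)/dt = s'(ln t) / t
    deta_dt = sum(g * d for g, d in zip(gamma, dbasis)) / t
    cum_excess = math.exp(eta)
    return math.exp(-cum_excess), deta_dt * cum_excess

# Hypothetical knots on the log-time scale and made-up spline parameters.
knots = [math.log(0.1), math.log(1.0), math.log(3.0), math.log(10.0)]
gamma = [-1.5, 0.9, 0.02, -0.03]
for t in (0.5, 1.0, 5.0):
    r_m, lam_m = marginal_estimates(t, knots, gamma)
    print(t, round(r_m, 4), round(lam_m, 4))
```

Setting the parameters of the interior-knot terms to zero reduces the model to a Weibull form, which gives a simple check of the two transformations.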
In Appendix I, we describe how a semi-parametric marginal model could be fitted through estimation of a separate parameter for each event type using Poisson regression and how, when covariates are not modelled, this is equivalent to the Pohar Perme non-parametric estimate; this demonstrates how the methods are related. However, we do not advocate this approach due to the computational intensity of splitting the time scale at unique time points and estimating a separate parameter for each time interval.