The proposed models
The new model combining the excess mortality hazard model and the competing risk model may be written:
where, in the case of $K$ covariates, $b_1 = 0$ and $\beta_{1k} = 0$, $k = 1, \ldots, K$, for uniqueness.
To simplify the interpretation of model (3), we consider, as an example, $J = 3$ different events, with the event "death" denoted by $j = 3$. The model assumes a common pattern for the baseline hazards through $\lambda_1(t)$, which represents the baseline hazard for individuals experiencing the 'reference' event 1 and with a vector of covariates $x$ equal to 0. In model (3), the log of the baseline hazard of event 1, $\log(\lambda_1(t))$, is modelled by a cubic regression spline with one knot located at 1 year. The interior knot is placed at 1 year because, in many cancers, a large proportion of events are observed during the first year after diagnosis. However, the user may either choose another knot location based on substantive knowledge about the disease or, in the absence of such knowledge, locate the interior knot at the median of the sample distribution of uncensored event times, which ensures equal data support for both functional segments. A cubic regression spline is a smooth piecewise polynomial function of order 4 (degree 3), constrained so that the function and its first two derivatives are continuous at the knots where adjacent polynomial pieces join [24, 25]. Since the baseline hazard for event 1 is $\lambda_1(t)$, the baseline hazard for event $j \neq 1$ is simply $\lambda_j(t, x = 0) = \lambda_1(t)\exp(b_j)$. For the event death, the baseline hazard is $\lambda_1(t)\exp(b_3)$, which represents the baseline of the "event-free excess death hazard". Model (3) assumes PH effects of covariates
$x$ on the event-specific hazards. For a one-unit increase in a given covariate $x_k$, the effect is split into a common (or shared) effect, through the regression parameter $\alpha_k$, and a differential event-specific effect, through the regression parameter $\beta_{jk}$. Accordingly, the HRs are estimated by $\exp(\alpha_k)$ and $\exp(\alpha_k + \beta_{jk})$ for events 1 and $j$, respectively. The difference between the covariate effects on event 1 and on event $j$ can therefore be tested directly through $H_0$: $\beta_{jk} = 0$ using the classical Wald test or the likelihood ratio (LR) test. Moreover, when the effect of a covariate on one event type is not significantly different from the common effect, a simpler and more parsimonious variant of model (3) can be fitted, including only the common effect $\alpha_k$ of this covariate.
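Collecting the definitions above, the event-specific hazards of model (3) can be sketched as follows. This is an illustrative reconstruction assembled from the constraints, baseline hazards, and HRs described in the text; the explicit summation over covariates and the symbol $\lambda_P$ for the expected (general population) mortality hazard are notational assumptions, not taken from the text.

```latex
\begin{align*}
% Event-specific hazards sharing the baseline pattern \lambda_1(t)
\lambda_j(t, x) &= \lambda_1(t)\,
  \exp\Big( b_j + \sum_{k=1}^{K} (\alpha_k + \beta_{jk})\, x_k \Big),
  \qquad j = 1, \ldots, J, \\
% Identifiability constraints
b_1 &= 0, \qquad \beta_{1k} = 0, \quad k = 1, \ldots, K, \\
% Baseline hazard of event j (all covariates equal to 0)
\lambda_j(t, x = 0) &= \lambda_1(t)\exp(b_j), \\
% HRs for a one-unit increase in x_k, for event 1 and for event j
\mathrm{HR}_{1k} &= \exp(\alpha_k), \qquad
\mathrm{HR}_{jk} = \exp(\alpha_k + \beta_{jk}), \\
% Assumed notation: for death (j = 3), \lambda_3 is an excess hazard added
% to the expected hazard \lambda_P to give the observed death hazard
\lambda_3^{\mathrm{obs}}(t, x) &= \lambda_P(t) + \lambda_3(t, x).
\end{align*}
```

Under this form, testing $\beta_{jk} = 0$ amounts to testing whether $\mathrm{HR}_{jk} = \mathrm{HR}_{1k}$ for covariate $x_k$.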
However, the assumption of a common pattern for event-specific hazards through $\lambda_1(t)$ may seem dubious [17, 20]. To overcome this limitation of model (3), we propose introducing a time-dependent log HR, $b_j(t)$, between event-specific hazards.
The new flexible model may be written:
where, in the case of $K$ covariates, $b_1(t) = 0$ and $\beta_{1k} = 0$, $k = 1, \ldots, K$, for uniqueness.
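With the constant $b_j$ of model (3) replaced by the time-dependent log HR $b_j(t)$ and the covariate structure left unchanged, the flexible model (4) can be sketched as follows. This is an illustrative form inferred from the stated constraints; the explicit summation over covariates is a notational assumption.

```latex
\begin{align*}
% Event-specific hazards with a time-dependent log HR between baselines
\lambda_j(t, x) &= \lambda_1(t)\,
  \exp\Big( b_j(t) + \sum_{k=1}^{K} (\alpha_k + \beta_{jk})\, x_k \Big),
  \qquad j = 1, \ldots, J, \\
% Identifiability constraints
b_1(t) &= 0, \qquad \beta_{1k} = 0, \quad k = 1, \ldots, K, \\
% The ratio of baseline hazards is no longer constant over time
\frac{\lambda_j(t, x = 0)}{\lambda_1(t)} &= \exp\big(b_j(t)\big).
\end{align*}
```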
In the flexible model (4), the log of the baseline hazard of event 1, $\log(\lambda_1(t))$, and the time-dependent log HRs between event-specific hazards, $b_j(t)$, are modelled by cubic regression splines, each with one knot located at 1 year. (Note that, while all spline functions in (3) and (4) use the same cubic B-spline basis, the resulting functional estimates may differ substantially in their values and shapes, depending on the estimated spline coefficients.) Each time-dependent effect is thus modelled by a 5-degree-of-freedom (df) function [26], and an LR test with 4 df can be used to compare a model with a constant $b_j$ against a model with $b_j(t)$ (i.e., to compare model (3) with the flexible model (4)). Therefore, when the time-dependent effects $b_j(t)$, $j = 2, \ldots, J$, are not significant, the simpler model (3) may be used. Moreover, model (4) is an important and flexible alternative because some HRs between baseline hazards may be time-dependent (e.g., death and local recurrence) whereas others may be constant over time (e.g., for events close in nature, such as local recurrence and distant metastasis).
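The df count above can be illustrated with a minimal Python sketch. It assumes a truncated power basis, which spans the same space of cubic splines with one interior knot as the cubic B-spline basis mentioned in the text but is not the basis the authors used. The function names and the log-likelihood values in the usage note are hypothetical. The p-value uses the closed form of the chi-square survival function with 4 df, exp(-x/2)(1 + x/2).

```python
import math


def cubic_spline_basis(t, knot=1.0):
    """Truncated power basis of a cubic spline with one interior knot.

    Returns 5 terms (1, t, t^2, t^3, (t - knot)^3_+), i.e. a 5-df function.
    Assumption: this basis stands in for the B-spline basis of the text.
    """
    tail = max(t - knot, 0.0)
    return [1.0, t, t ** 2, t ** 3, tail ** 3]


def spline_value(coefs, t, knot=1.0):
    """Evaluate the spline, e.g. b_j(t), for 5 given coefficients."""
    return sum(c * b for c, b in zip(coefs, cubic_spline_basis(t, knot)))


def lr_test_4df(loglik_constant, loglik_time_dependent):
    """LR test of constant b_j (1 df) against b_j(t) (5 df): 4 df.

    Uses the closed-form chi-square survival function for 4 df.
    """
    stat = max(2.0 * (loglik_time_dependent - loglik_constant), 0.0)
    p_value = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
    return stat, p_value
```

Setting the last four coefficients to zero reduces `spline_value` to a constant, which is why the nested comparison has 4 df. For example, with hypothetical log-likelihoods, `lr_test_4df(-1004.744, -1000.0)` gives a statistic of about 9.488 and a p-value of about 0.05.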
Estimation procedure
In both models (3) and (4), the maximum likelihood estimates are obtained using the data duplication trick. A detailed description of data duplication and coding can be found in references [4, 16, 17, 27]. This trick allows both models to be fitted with any tool available in statistical software for a single survival outcome, such as the Cox model [17]. In the present work, the maximum likelihood estimates are obtained using an iteratively reweighted least squares procedure developed for a previous spline-based model [23]. This procedure is based on split data: the contribution of each individual to the full log-likelihood is approximated by a sum of Poisson terms over time intervals that are sufficiently small for the assumption of a constant rate to be acceptable [23, 28]. The parameters can then be estimated within the framework of generalized linear models, assuming a Poisson distribution for the observed number of deaths. However, as pointed out by Dickman et al., the user has to specify a particular link function for the generalized linear model in order to take the general population mortality into account in the estimation procedure [29].
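The two data-preparation steps described above, duplication by event type and splitting of follow-up time, can be sketched in Python as follows. This is a simplified illustration, not the authors' procedure: the record layout, function names, and the 0.25-year interval width are all hypothetical. The Poisson mean follows the general form in which expected deaths are added to the modelled excess deaths, which is what motivates a non-standard link of the type log(mu - d*); for non-death events the expected count is taken as 0.

```python
import math


def duplicate_for_events(subject, n_events=3):
    """Data duplication: one row per candidate event type.

    The status is 1 only on the row whose event type matches the
    observed first event (subject["event"] is 0 if censored).
    """
    return [
        {
            "id": subject["id"],
            "event_type": j,
            "time": subject["time"],
            "status": 1 if subject["event"] == j else 0,
            "x": subject["x"],
        }
        for j in range(1, n_events + 1)
    ]


def split_follow_up(row, width=0.25):
    """Split follow-up into intervals of at most `width` years.

    Only the last interval can carry the event indicator.
    """
    n_pieces = max(1, math.ceil(row["time"] / width))
    pieces = []
    for i in range(n_pieces):
        start = i * width
        stop = min((i + 1) * width, row["time"])
        is_last = i == n_pieces - 1
        pieces.append({
            **row,
            "start": start,
            "stop": stop,
            "person_time": stop - start,
            "status": row["status"] if is_last else 0,
        })
    return pieces


def poisson_mean(eta, person_time, expected_deaths=0.0):
    """Poisson mean for one interval: mu = d* + exp(eta) * y.

    eta is the linear predictor of the (excess) hazard, y the person-time,
    and d* the expected deaths (assumed 0 for non-death events).
    """
    return expected_deaths + math.exp(eta) * person_time
```

For instance, a subject followed for 1.1 years whose first event is of type 2 yields three duplicated rows, each split into five intervals, with the event indicator set only on the last interval of the type-2 row.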
Simulation studies were conducted to assess the performance of the estimators obtained from model (4) in the case of three competing events (one of which was death), with different sample sizes and censoring rates. Data generation, simulation design, and results are detailed in Additional file 1.
Briefly, the times to the events were assumed to depend on three independent prognostic factors. Different rates of drop-out censoring (0%, 15%, and 30%) and different sample sizes (N = 400 and N = 1000) were considered. The relative biases (RBs) were close to zero (range: -0.047 to 0.05) whatever the sample size and the drop-out censoring rate (Additional file 1, Table S1). As expected, the RBs increased with the drop-out censoring rate for most parameter estimates, and the impact of the drop-out censoring rate was greater with N = 400 than with N = 1000. Whatever the sample size and the drop-out censoring rate, the empirical coverage rates (ECRs) were close to the nominal level of 95% (range: 91.8% to 96.6%), although they were slightly below 95% for the parameter estimates of the excess mortality hazard function (Additional file 1, Table S1). Graphically, the means of the estimates of the baseline hazard functions of the three competing events were close to the true baseline hazard functions (Additional file 1, Figures S1 and S2). The performance of the estimated time-dependent HRs relative to event 2 and event 3 (excess death) was similar.
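The two performance measures reported above can be computed as in the following Python sketch. It assumes the usual definitions: RB as the mean estimate minus the true value, divided by the true value, and ECR as the percentage of replications whose 95% Wald confidence interval contains the true value. The exact definitions used in the study are given in Additional file 1 and may differ. The function names and the numbers in the usage note are hypothetical.

```python
def relative_bias(estimates, true_value):
    """Relative bias: (mean of estimates - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return (mean_estimate - true_value) / true_value


def empirical_coverage_rate(estimates, std_errors, true_value, z=1.959964):
    """Percentage of 95% Wald intervals (estimate +/- z * SE) covering the truth."""
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return 100.0 * covered / len(estimates)
```

With four hypothetical replications, estimates 0.9, 1.0, 1.1, and 1.3, a common standard error of 0.1, and a true value of 1.0, the RB is 0.075 and the ECR is 75%.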