Background
CD) and non-cancer-death (NCD) times. We simulated data with negative, null, and positive correlations, thereby covering a wide range of dependence assumptions. This study is the first to investigate the impact of censoring and, most importantly, of misclassification of causes of death on the behaviour of these tests.
CD, taking into account the fact that patients can meanwhile die of other causes, and that an increased risk of NCD is possible in the treatment arm, due to chemotherapy toxicity.
CD and NCD.
Methods
Tests for competing causes of death
Peto (Pe)
NCD. Those concerning CD are then obtained by taking the difference between the former two. Another relevant peculiarity of this approach is that all deaths due to an unknown cause and all those occurring after a relapse are ascribed to the cancer, even if explicitly declared as due to another cause.
Cause-specific (CS)
Gray (Gr)
Plots
CD, as well as those of an unknown cause. NCDs preceded by a recurrence are censored when the recurrence occurs. According to the Peto method, the survival probability is first computed per year for the two arms combined. The survival probability for each arm is then obtained by adding to it or subtracting from it a quantity that depends on the logarithm of the yearly risk ratio [16, 17]. Finally, the Aalen–Johansen estimates of the CIFs [34], corresponding to the Gray test, are plotted. It is noteworthy that, in the case of overall survival, one minus the CIF corresponds to the Kaplan–Meier estimate.
Simulation study
CD and NCD is the focus of the researcher’s interest. Sometimes, the classification of causes of death implicitly requires the occurrence of a recurrence (Rec). Although the treatment evaluation is done directly on times to CD and NCD, the tests differ in the manner of classifying causes of death. In particular, the Peto test requires information on the times to recurrence.
CD and NCD (Figure 1). We generated them from two exponential distributions, possibly with positive or negative dependence, in two steps. First, a bivariate normal random variable $Z = (Z_1, Z_2)^\top$ was generated with zero means, unit variances and correlation $\rho$. Then, the times to death were computed as $T_{CD} = -\log(\Phi(Z_1))/\lambda_{CD}$ and $T_{NCD} = -\log(\Phi(Z_2))/\lambda_{NCD}$, where $\Phi(\cdot)$ is the standard normal distribution function [27]. Since $\Phi(Z_1)$ and $\Phi(Z_2)$ are uniform on (0, 1), $T_{CD} \sim \mathrm{Exp}(\lambda_{CD})$ and $T_{NCD} \sim \mathrm{Exp}(\lambda_{NCD})$. In the control group of the IALT trial, which we describe below, we estimated that the CD rate is about five-fold higher than the NCD rate. Therefore, we set the two rates so that $\lambda_{CD} = 5 \times \lambda_{NCD}$. The time to death for each subject is then $T_D = \min(T_{CD}, T_{NCD})$. Finally, we assumed that, conditional on the time to CD, the time to recurrence $T_{Rec}$ follows a uniform distribution between 0 and $T_{CD}$. Hence, a recurrence is observed whenever $T_{Rec} < T_D$ and is censored only when $T_{Rec} > T_{NCD}$. This method allowed us to study the effect of the reclassification done by Peto: in our simulations about half of the NCDs were preceded by a recurrence. We did not consider the case of unknown causes of death, which were very marginal in our real dataset.
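The two-step generation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the seed, the sample size and the numerical rates are assumptions, and only the ratio between the two rates is taken from the text.

```python
import math
import random

def phi(z):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def simulate_subject(rng, rho, lam_cd, lam_ncd):
    """Draw (T_CD, T_NCD, T_D, T_Rec) for one subject."""
    # Bivariate standard normal with correlation rho (zero means, unit variances).
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Phi(Z) is uniform on (0, 1), so -log(Phi(Z)) / lambda is Exp(lambda).
    t_cd = -math.log(phi(z1)) / lam_cd
    t_ncd = -math.log(phi(z2)) / lam_ncd
    t_d = min(t_cd, t_ncd)
    # Recurrence time: uniform between 0 and T_CD, conditional on T_CD.
    t_rec = rng.uniform(0.0, t_cd)
    return t_cd, t_ncd, t_d, t_rec

# Hypothetical rates: only the ratio LAM_CD = 5 * LAM_NCD comes from the text.
LAM_NCD = 0.05
LAM_CD = 5 * LAM_NCD

rng = random.Random(2024)
sample = [simulate_subject(rng, rho=0.375, lam_cd=LAM_CD, lam_ncd=LAM_NCD)
          for _ in range(20000)]
mean_t_cd = sum(s[0] for s in sample) / len(sample)
mean_t_ncd = sum(s[1] for s in sample) / len(sample)
print(round(mean_t_cd, 2), round(mean_t_ncd, 2))  # close to 1/LAM_CD and 1/LAM_NCD
```

The marginal means should stay close to the reciprocals of the rates whatever the value of rho, since the correlation only affects the joint distribution of the two times.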
CD and NCD times. On the other hand, this does not affect the correlation between CD and recurrence times, which can be shown to be constant. In this respect, no difference exists between the scenarios. In order to investigate the properties of the tests in a wide range of situations, we chose five values for $\rho$, covering very negative and very positive dependence, passing through weak and no dependence: -0.75, -0.375, 0, 0.375, 0.75.
CD and NCD ($HR_{CD} = HR_{NCD} = 1$),
CD ($HR_{CD} = 1$) and an increased NCD risk ($HR_{NCD} = 1.25$),
CD ($HR_{CD} = 0.8$) and a null effect on NCD ($HR_{NCD} = 1$),
CD ($HR_{CD} = 0.8$) and an increased NCD risk ($HR_{NCD} = 1.25$).
CD. Finally, the fourth one is a scenario that could occur for chemotherapy and radiotherapy regimens in oncology, as their efficacy against CD implies a cost in terms of an increased NCD hazard. The hazard ratios for the treatment effect in the four scenarios are illustrated in Figure 2. In addition to the situation with complete data, we replicated simulations with 25% and 50% of censored observations. Censoring times were generated from uniform random variables between zero and a given bound. For each scenario, this upper bound was chosen numerically in order to attain the desired proportion of censored times to death. As in clinical practice the causes of death can be misrecorded, we also repeated all the tests after swapping the cause (CD vs. NCD) of 20% of deaths.
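One way to carry out the numerical choice of the censoring bound and the swap of recorded causes is sketched below. The bisection scheme, the function names and the toy inputs are assumptions made for illustration. In particular, the text does not say whether exactly 20% of deaths were swapped or whether each death was swapped with probability 0.2; the sketch swaps an exact share.

```python
import random

def censored_fraction(death_times, unif, bound):
    """Share of subjects whose censoring time bound * u falls before death."""
    return sum(bound * u < t for t, u in zip(death_times, unif)) / len(death_times)

def calibrate_bound(death_times, target, rng, tol=1e-3):
    """Bisection on the upper bound b of censoring times C ~ Uniform(0, b)."""
    unif = [rng.random() for _ in death_times]  # common random numbers
    lo, hi = 0.0, max(death_times)
    # Enlarge hi until censoring is no more frequent than the target.
    while censored_fraction(death_times, unif, hi) > target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if censored_fraction(death_times, unif, mid) > target:
            lo = mid  # too much censoring: a larger bound is needed
        else:
            hi = mid
    return hi, unif

def misclassify(causes, rate, rng):
    """Swap CD and NCD for a randomly chosen share `rate` of the deaths."""
    flipped = list(causes)
    for i in rng.sample(range(len(causes)), round(rate * len(causes))):
        flipped[i] = "NCD" if causes[i] == "CD" else "CD"
    return flipped

rng = random.Random(7)
deaths = [rng.expovariate(0.3) for _ in range(5000)]  # hypothetical death times
bound, unif = calibrate_bound(deaths, target=0.25, rng=rng)
frac = censored_fraction(deaths, unif, bound)
causes = ["CD" if rng.random() < 5 / 6 else "NCD" for _ in deaths]
noisy = misclassify(causes, rate=0.20, rng=rng)
n_changed = sum(a != b for a, b in zip(causes, noisy))
print(round(frac, 3), n_changed)
```

Reusing the same uniform draws at every bisection step makes the censored fraction a monotone function of the bound, which is what lets the search converge.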
The International Adjuvant Lung Cancer Trial
Cause of death | Chemotherapy | Control | Total | |
---|---|---|---|---|
Cancer Deaths | 438 | 480 | 918 | |
Non-Cancer Deaths | 107 | 72 | 179 | (Of which 71 after a relapse) |
Deaths from unknown cause | 33 | 38 | 71 | (Of which 26 after a relapse) |
All Deaths | 578 | 590 | 1168 |
Results and discussion
CD and NCD, three possible proportions of censoring, and presence or absence of misclassification of the cause of death. For each of these 4 × 5 × 3 × 2 = 120 situations, 10 000 data sets of size 1000 were generated. The three tests were performed on each of them and the empirical rejection probabilities at a 5% nominal size were computed across the 10 000 replications. The null hypothesis of no treatment effect holds in scenarios 1 and 2 for CD and in scenarios 1 and 3 for NCD. In these cases the empirical rejection probabilities estimate the size of the tests. In all the other situations, the null hypothesis does not hold and the rejection probabilities represent the empirical power of the tests. Of note, the rate of misclassified causes of death (20%) is quite high compared with clinical practice, but it is useful in this context to study its role in a somewhat extreme situation.
CD, but it is harmful in terms of NCD, because of toxicity. Figure 3 shows the main results with complete data, whereas full details with 25% and 50% censored observations are provided in Additional file 1: Table A.3 and Figure A.4. We first consider the results when there is no misclassification of the cause of death. Under these conditions, Gr (Gray test) has an inflated size for CD (0.10 < α < 0.19, complete data), whereas the other two tests have better empirical sizes in general (0.04 < α ≤ 0.12 for Pe [Peto test] and 0.05 < α < 0.08 for CS [Cause-Specific test], complete data). Due to the set-up of our simulation study, with a CD rate about five-fold higher than the NCD rate, the three tests have moderate power for detecting an effect on NCD (0.12 < 1 - β < 0.41, complete data), with CS outperforming its two competitors and Pe being the least powerful (1 - β < 0.23). As censoring increases, all the rejection probabilities generally decrease and move closer to each other, so that the differences between them become less pronounced. CS seems to be the most reliable choice in this context. When 20% of the causes of death are misrecorded (see also Additional file 1: Figure A.5 and Table A.4), the size of Gr is closer to the nominal level (α ∈ [0.06, 0.08], complete data) and the three tests lose power for detecting the effect on NCD, notably CS (1 - β < 0.26) and Gr (1 - β < 0.13).
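The empirical rejection probabilities reported in this section are proportions of replications with a p-value below the nominal level. A minimal sketch of that computation, together with the binomial Monte Carlo standard error that follows from the number of replications, is given below; the function names and the toy p-values are assumptions.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Share of replications in which the test rejects at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

def monte_carlo_se(rate, n_rep):
    """Binomial standard error of an estimated rejection probability."""
    return math.sqrt(rate * (1.0 - rate) / n_rep)

# Toy p-values, for illustration only.
toy_rate = rejection_rate([0.01, 0.20, 0.04, 0.50])

# Standard error for a test that holds its 5% size exactly, with 10 000 replications.
se_nominal = monte_carlo_se(0.05, 10_000)
print(toy_rate, round(se_nominal, 4))
```

With 10 000 replications this standard error is about 0.0022, so empirical sizes such as those above 0.10 reported for Gr lie far outside what simulation noise alone would produce around 0.05.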
CD, without any effect on NCD. Under these conditions and without misclassified causes of death, the results in Figure 4 (see also Additional file 1: Figure A.6 and Table A.5) suggest that Gr has the lowest power for CD (0.54 < 1 - β < 0.93 for Gr, while 0.86 < 1 - β for Pe and CS; complete data) and often by far the highest size for NCD (0.16 < α; complete data). CS and Pe are largely equivalent for CD. Either CS or Pe is preferable for NCD (0.05 < α < 0.17 for CS, 0.05 < α < 0.11 for Pe; complete data), depending on the correlation. Again, censoring causes a contraction of the empirical rejection probabilities, irrespective of whether the null hypothesis holds or not. In this scenario Pe and CS are broadly equivalent, whereas Gr should not be preferred. When misclassification of the cause of 20% of deaths is introduced (see also Additional file 1: Figure A.7 and Table A.6), CS is less powerful for CD (0.77 < 1 - β; complete data) and has a very inflated size for NCD (0.10 < α < 0.33); Gr has very poor power for CD (1 - β < 0.37, complete data) but is closer to the nominal size for NCD (0.04 < α < 0.09); again, Pe is less sensitive to misclassification as it reclassifies at least some of the deaths as due to the cancer when a recurrence occurs, irrespective of the declared cause.
CD, but at the cost of harm in terms of NCD hazard. Gr is uniformly the most powerful in this scenario. In particular, for NCD it is in general 35–40% more powerful than its competitors (0.62 < 1 - β < 0.89 for Gr, 0.16 < 1 - β < 0.74 for CS and 0.22 < 1 - β < 0.40 for Pe; complete data). The rejection probabilities are far more similar for CD, with high power ranging from 0.73 to 1.00 for all tests (complete data). In all the scenarios, the tests are generally more powerful for CD than for NCD because the baseline hazard for CD is considerably higher than for NCD ($\lambda_{CD} = 5 \times \lambda_{NCD}$). Even though censoring attenuates differences between the three tests, Gr is clearly preferable under these conditions. On the other hand, Gr has the highest loss of power due to misclassification of the cause of death (see also Additional file 1: Figure A.9 and Table A.8), notably for CD (1 - β < 0.57, complete data); for NCD the largest power loss is for CS (1 - β < 0.09).
CD and NCD is of primary interest. Plots in the first row of Figure 6 show the Nelson–Aalen estimate of the cumulative risk (a), the cumulative yearly rates estimated by the Peto method (b), and the cumulative incidence function (c), respectively, for overall mortality by treatment arm. Note that, as no competing event exists for overall survival, plot 6(c) corresponds to one minus the Kaplan–Meier estimate. Chemotherapy seems to provide a benefit up to five years after randomization, after which the two curves overlap. Under a proportional hazards assumption, the estimated hazard ratio between the chemotherapy and the control groups is 0.95 (95% CI: 0.84-1.06) and the log-rank test has a p-value equal to 0.34. Note that, for the sake of simplicity, we did not adjust for any of the prognostic factors used in previous publications about the IALT study. The desired and expected action of cisplatin-based chemotherapy is to reduce the risk of CD, while having no effect on, or moderately increasing, the risk of NCD. Figures 6(g)–6(i) show the same quantities as (a)–(c) but only for CD; risk and incidence are consistently lower in the chemotherapy group than in the control group. On the other hand, Figures 6(d)–6(f) show that the two treatment arms are overall equivalent with respect to non-cancer mortality; an increased NCD rate and incidence are observed for the experimental group after five years. Then, we compared the results of testing the effect of chemotherapy on the competing causes of death by means of the three test statistics considered thus far: Pe, CS and Gr (Table 2). The increase observed in NCD in the treatment arm (see Figure 6(d)) is significant according to the three tests: p = 0.029 for Pe, p = 0.041 for CS and p = 0.015 for Gr. One should keep in mind that Pe reclassifies as CD a total of 97 deaths: 26 NCDs – which could attenuate the differences between treatment arms – and 71 deaths from an unknown cause. These deaths from an unknown cause are censored for both causes of death by CS, whilst they make up a third group according to Gr.
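The different handling of declared causes by the three tests, as described in this paragraph and in the Methods, can be summarised in a small sketch. The function name, the string codes and the toy records are assumptions; the rules encode only what the text states (Pe ascribes unknown-cause deaths and deaths after a relapse to cancer, CS censors unknown-cause deaths, Gr keeps them as a third event type).

```python
def event_code(declared, relapsed, method):
    """Map one death to the event type analysed by a given test.

    declared: "CD", "NCD" or "unknown" (cause as recorded)
    relapsed: True if a recurrence preceded the death
    method:   "Pe", "CS" or "Gr"
    """
    if method == "Pe":
        # Unknown causes and any death after a relapse count as cancer deaths.
        return "CD" if declared == "unknown" or relapsed else declared
    if method == "CS":
        # Unknown causes are censored for both CD and NCD.
        return "censored" if declared == "unknown" else declared
    if method == "Gr":
        # Unknown causes form a third competing event.
        return declared
    raise ValueError(f"unknown method: {method}")

# Toy records (declared cause, relapse before death), for illustration only.
records = [("CD", True), ("NCD", True), ("NCD", False), ("unknown", False)]
coded = {m: [event_code(c, r, m) for c, r in records] for m in ("Pe", "CS", "Gr")}
for method, codes in coded.items():
    print(method, codes)
```

Only the second and fourth toy records are treated differently across the tests, which mirrors the two groups of reclassified deaths discussed above.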
CD and NCD, in the IALT study
Test | X² (CD) | p-value (CD) | X² (NCD) | p-value (NCD) |
---|---|---|---|---|
Pe | 3.72 | 0.054 | 4.77 | 0.029 |
CS | 3.44 | 0.064 | 4.19 | 0.041 |
Gr | 4.52 | 0.033 | 5.89 | 0.015 |
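As a consistency check on Table 2, the p-values can be recomputed from the test statistics. The sketch below assumes that each statistic is referred to a chi-square distribution with one degree of freedom, which is not stated in the text but agrees with the reported figures; the function and variable names are illustrative.

```python
import math

def chi2_1df_pvalue(x2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(x2 / 2.0))

# (statistic, reported p-value) pairs copied from Table 2.
table2 = {
    "Pe": {"CD": (3.72, 0.054), "NCD": (4.77, 0.029)},
    "CS": {"CD": (3.44, 0.064), "NCD": (4.19, 0.041)},
    "Gr": {"CD": (4.52, 0.033), "NCD": (5.89, 0.015)},
}

recomputed = {
    (test, cause): chi2_1df_pvalue(x2)
    for test, row in table2.items()
    for cause, (x2, _) in row.items()
}
for (test, cause), p in recomputed.items():
    print(test, cause, round(p, 4), "reported:", table2[test][cause][1])
```

Small discrepancies in the last digit are expected, because the statistics in the table are themselves rounded to two decimals.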
CD, with p-values ranging from 0.033 to 0.064. This suggests that the effects on the risks of CD and NCD are in opposite directions, and that they compensate each other, at least partially, when deaths from all causes are considered together. Gr, based on the CIF, detects a statistically significant difference at the 5% level (p = 0.033), whereas the other two are borderline but not significant (p = 0.054 for Pe, p = 0.064 for CS). Most likely, the net increase (i.e. in the cause-specific hazard) in the risk of NCD in the chemotherapy arm contributes to reducing the incidence of CD in that group, amplifying the reduction in the risk of CD when measured in terms of the CIF, although the differences between the test statistics are small.
CS and the Pe tests treat death from other causes as independent censoring, which is not realistic in most practical situations. Gr does not require such an assumption, but on the other hand its estimated effect on each competing event also reflects the effect on the others. Thus, both approaches have a possible drawback, but neither of the two clearly prevailed in the simulation study: assuming independent censoring can be a serious issue in the case of strong correlation, whereas using the hazard of the subdistribution can be misleading whenever the treatment changes the hazard of only one of the competing events.
CD) but the higher the risk of pulmonary complications (and then of NCD).
Conclusions
Gr seemed to be the most reliable in the situation of a therapy that reduced the risk of CD and increased that of NCD, provided that causes of death are correctly recorded; otherwise, it performed substantially worse and Pe should be recommended. In all the other situations Gr had the poorest performance, both in terms of the preservation of the nominal size and in terms of power.
CS should be preferred whenever the treatment is expected to be ineffective against the risk of CD and possibly harmful in terms of NCD. A cancer treatment is required to be effective against the risk of CD but not against that of NCD. In that case, Pe was comparable to CS, except that CS had a very inflated size for NCD in the presence of a high rate of misrecorded causes of death. In our study, Pe did not outperform its competitors in any situation in which the causes of death were correctly classified, whereas it was often the most reliable when the misclassification rate was high.
Gr indicated a statistically significant benefit against the risk of CD (p = 0.033), whereas CS and Pe were borderline. We showed how the natural graphical representations for the three tests are the Nelson–Aalen estimate of the cumulative cause-specific hazard, the cumulative yearly rates as estimated by Peto, and the Aalen–Johansen estimate of the cumulative incidence function.
CD and NCD. To keep things simple, we chose not to generate times to death from unknown causes. In such cases, multiple imputations or inverse probability weighting techniques exist (see for instance [35]).