From the naive analyses based on standard Cox models, a significant benefit associated with RIC Allo-SCT was observed for MM patients, with an estimated hazard ratio (HR) of death of 0.38 (95% confidence interval [95%CI]: 0.18;0.80), and for HD patients (HR = 0.33, 95%CI: 0.12;0.87), whereas Allo-SCT appeared to be deleterious in FL patients (HR = 2.55, 95%CI: 1.37;4.75). No significant benefit was found in terms of EFS (HR = 1.21, 95%CI: 0.68;2.18 for FL and HR = 0.71, 95%CI: 0.38;1.35 for HD).
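Hazard ratios and their 95%CIs, such as those reported above, are usually related on the log scale. The sketch below assumes Wald-type intervals of the form exp(β ± 1.96·SE), where β is the Cox log-hazard ratio; the interval method is an assumption, as it is not stated here. Under that assumption the point estimate is the geometric mean of the interval bounds, and the standard error and a two-sided p-value can be recovered from the published bounds (the function names are hypothetical).

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_hr_and_se(lower: float, upper: float) -> tuple[float, float]:
    """Recover the log-HR and its standard error from a 95% CI, assuming the
    interval was computed as exp(beta +/- Z_975 * SE) (Wald-type interval)."""
    beta = (math.log(lower) + math.log(upper)) / 2  # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return beta, se


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for H0: log-HR = 0, using the normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


# Naive Cox estimate for MM patients reported above: HR 0.38, 95%CI 0.18;0.80
beta, se = log_hr_and_se(0.18, 0.80)
print(round(math.exp(beta), 2))       # 0.38
print(wald_p_value(beta, se) < 0.05)  # True: the CI excludes 1
```

An interval that excludes 1, such as 0.18;0.80, corresponds to p < 0.05 under this approximation, whereas an interval containing 1, such as the EFS interval 0.68;2.18 in FL, does not.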
Matched propensity score-based approach
The matching procedure drastically reduced the size of the PS-matched samples. From the original datasets, 21 matched pairs (91% of RIC Allo-SCT patients, 15% of controls) could be constituted from MM patients, compared with 19 (68% of Allo-SCT patients, 17% of controls) from FL patients and 15 (48% of Allo-SCT patients, 79% of controls) from HD patients. This reflects both the original differences in sample sizes and the non-overlapping covariate values (Table 1). As a result, baseline imbalances between the two matched sets were reduced (Figure 3). Note that imbalance was also reduced for covariates not included in the PS, especially age at diagnosis and age at transplantation in the FL cohort.
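The loss of subjects described above is typical of matching within a caliper. The matching algorithm is not specified in this section, so the sketch below assumes a common choice: greedy 1:1 nearest-neighbour matching without replacement on the logit of the PS, with a caliper of 0.2 pooled standard deviations of the logit PS (the function name greedy_caliper_match, the caliper width and the toy values are assumptions). Any treated or control subject without a sufficiently close partner is simply dropped, which is how a drastic reduction in sample size arises when the PS distributions overlap poorly.

```python
import math
from statistics import variance


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def greedy_caliper_match(ps_treated, ps_control, caliper_sd=0.2):
    """Hypothetical greedy 1:1 nearest-neighbour matching without replacement
    on the logit of the propensity score (PS). Returns (treated, control)
    index pairs; subjects without a partner within the caliper are dropped."""
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    # Caliper width: caliper_sd times the pooled SD of the logit PS
    pooled_sd = math.sqrt((variance(lt) + variance(lc)) / 2)
    caliper = caliper_sd * pooled_sd
    available = set(range(len(lc)))
    pairs = []
    for i, x in enumerate(lt):  # treated subjects processed in input order
        if not available:
            break
        j = min(available, key=lambda k: abs(x - lc[k]))  # closest unused control
        if abs(x - lc[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Toy PS values: the treated subject with PS 0.99 has no comparable control
pairs = greedy_caliper_match([0.5, 0.6, 0.99], [0.5, 0.6, 0.05, 0.1])
print(pairs)  # [(0, 0), (1, 1)]
```

Greedy matching depends on the order in which treated subjects are processed; optimal matching is an alternative that minimizes the total within-pair distance.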
Based on these PS-matched samples, we observed a significant survival benefit of Allo-SCT compared with non-Allo-SCT treatment in MM patients, with an estimated HR of death of 0.35 (95%CI: 0.14;0.88), as well as in HD patients (HR = 0.23, 95%CI: 0.07;0.80). No such benefit was found in FL patients (HR = 1.28, 95%CI: 0.43;3.77). No significant benefit was found for EFS, with estimated HRs of event of 0.45 (95%CI: 0.17;1.21) in FL and 0.47 (95%CI: 0.20;1.09) in HD.
IPW approach
Using the IPW approach, imbalances in the pseudo-cohorts were also reduced, though slightly less effectively than with PS matching (Figure 3). In fact, the distribution of the covariates in the weighted samples (pseudo-population) was close to that observed in the original datasets (Table 1).
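The behaviour of the weighted pseudo-population can be illustrated with a toy example. The sketch below assumes unstabilized ATE weights, 1/e(x) for treated and 1/(1 - e(x)) for untreated subjects, where e(x) is the PS, and measures imbalance with the standardized mean difference, a common balance diagnostic; the function names and the cohort are hypothetical. With the true PS, the weighted covariate distribution in each arm matches that of the whole original sample, which is the property noted above.

```python
import math


def ate_weights(treated, ps):
    """Unstabilized inverse probability of treatment weights targeting the ATE."""
    return [1 / p if t else 1 / (1 - p) for t, p in zip(treated, ps)]


def weighted_mean_var(x, w):
    sw = sum(w)
    m = sum(wi * xi for xi, wi in zip(x, w)) / sw
    v = sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sw
    return m, v


def standardized_difference(x, treated, w=None):
    """Standardized mean difference (%) of covariate x between treated and
    untreated subjects, optionally computed in the weighted pseudo-population."""
    w = w if w is not None else [1.0] * len(x)
    g1 = [(xi, wi) for xi, ti, wi in zip(x, treated, w) if ti]
    g0 = [(xi, wi) for xi, ti, wi in zip(x, treated, w) if not ti]
    m1, v1 = weighted_mean_var(*zip(*g1))
    m0, v0 = weighted_mean_var(*zip(*g0))
    return 100 * (m1 - m0) / math.sqrt((v1 + v0) / 2)


# Hypothetical cohort: binary covariate x (e.g. older age), PS = 0.2 if x = 1
# else 0.6; treatment counts are chosen to match these probabilities exactly.
x = [1] * 10 + [0] * 10
treated = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4
ps = [0.2 if xi else 0.6 for xi in x]
w = ate_weights(treated, ps)

print(round(standardized_difference(x, treated), 1))        # -92.1 (crude)
print(abs(standardized_difference(x, treated, w)) < 1e-9)   # True (weighted)

# Weighted proportion of x = 1 among treated subjects vs. the whole sample
xt = [xi for xi, t in zip(x, treated) if t]
wt = [wi for wi, t in zip(w, treated) if t]
print(round(weighted_mean_var(xt, wt)[0], 3), sum(x) / len(x))  # 0.5 0.5
```

In real data the PS is estimated, so weighted balance is approximate rather than exact, and subjects with PS close to 0 or 1 can receive very large weights.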
Despite similar trends, the survival benefit associated with Allo-SCT in MM and HD patients in the PS-matched analyses disappeared in the IPW-based analyses, which yielded estimated HRs of death of 0.72 (95%CI: 0.37;1.39) and 0.60 (95%CI: 0.19;1.89), respectively. Results for FL patients remained non-significant (HR = 2.02, 95%CI: 0.88;4.66). No significant benefit was found for EFS, with estimated HRs of event of 0.67 (95%CI: 0.31;1.41) in FL and 0.64 (95%CI: 0.33;1.22) in HD.
The main objective of this paper was to report examples of treatment effect estimation from observational cohorts in the particular setting of Allogeneic Stem Cell Transplantation. Although the randomized controlled trial (RCT) is the gold standard for removing most sources of bias in treatment comparisons, such trials are difficult to conduct when evaluating Allo-SCT. In situations such as HLA-matched sibling allogeneic transplants, some authors have advocated a biological assignment trial [16]. Such trials, also known as genetic or Mendelian randomization trials, regard the inheritance of the sibling donor's and recipient's genes from their parents as a random process occurring at conception. Nevertheless, implementing such a trial requires careful consideration of ethical issues and potential biases (prognostic factor imbalance, enrollment bias) [21]. Moreover, these trials are prospective and require several years to provide estimates of survival benefit, whereas observational information about treatment effects is already available.
Indeed, observational studies have several advantages over randomized controlled trials, including lower cost, greater timeliness, and a broader range of patients [8]. Moreover, systematic reviews tend to show that, when adequately performed, observational studies give results similar to those of randomized clinical trials [52]. In the hematology field, and especially in Allo-SCT, many international cooperative groups exist and register all blood and marrow transplantations. Notably, the European Group for Blood and Marrow Transplantation (EBMT) and the Center for International Blood and Marrow Transplant Research (CIBMTR) have collected information on patients undergoing Allo-SCT since the 1970s. Such observational registries could be an important source of information for estimating the causal effect of Allo-SCT compared with autologous SCT or other standard treatments. Nevertheless, standard statistical analyses of such observational data may yield biased, associational rather than causal, estimates of treatment effect [27,28].
Since 2000, there has been growing interest in statistical methods for estimating unbiased treatment effects from observational studies, and these methods have begun to be used in hematology and oncology [53-56]. Most of them are based on the propensity score, which aims to re-create exchangeability between the two treatment groups. Two main approaches have been proposed in this setting, namely the propensity score-matched approach and the inverse probability weighting (IPW) approach [36]. Although these approaches were initially proposed for large studies, recent work by Pirracchio et al. showed that propensity score approaches (matching or IPW) are also valid and useful in small-sample studies [5]. We illustrated how these methods perform when estimating the effect of Allo-SCT on survival and event-free survival, using data from multiple myeloma, follicular lymphoma and Hodgkin’s disease observational cohorts. Given our small sample sizes, our findings should be confirmed by larger studies.
However, as recently pointed out [32], the two approaches estimate different quantities, namely the average treatment effect (ATE) and the average treatment effect on the treated (ATT). The PS-matched approach aims at estimating the ATT, i.e. the effect of treatment on those subjects who are actually treated, allowing observational studies to be designed similarly to randomized experiments [57]. By contrast, the IPW approach aims at estimating the ATE, that is, the average effect on the population of moving all subjects from being untreated to being treated. Depending on the clinical context, researchers should determine which treatment effect is most clinically meaningful. When evaluating the benefit of Allo-SCT compared with chemotherapy, the ATE (and thus the IPW approach) would answer the question of how outcomes would change if a policy were instituted under which all patients eligible for either therapy were offered Allo-SCT. By contrast, the ATT would answer the question of what the effect of treatment was for those who selected a particular modality such as Allo-SCT. This explains why the resulting hazard ratio estimates differed between the two approaches. Indeed, in contrast to the PS-matched approach, the IPW approach never showed a significant impact of Allo-SCT on overall survival or event-free survival. In other words, the benefit of Allo-SCT appeared to be restricted to treated patients, while no average benefit could be expected in the whole population eligible for Allo-SCT. This is likely because the benefit of Allo-SCT may be restricted to some subsets of patients, whereas patients excluded by matching in the PS-matched analyses were retained, and possibly heavily weighted, in the IPW analyses. This further highlights the importance of the positivity (overlap) assumption.
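The distinction between the two estimands can be made concrete with a deliberately simple toy cohort in which the effect of treatment differs between two strata. For simplicity the sketch works on the risk-difference scale rather than with hazard ratios, and all numbers are hypothetical. It compares the ATE weights used by the IPW approach with the standard ATT weights (1 for treated subjects, e(x)/(1 - e(x)) for controls), which reweight controls to resemble the treated group, the same target that PS matching approximates. Here the benefit is -0.3 in half of the population and 0 in the other half, so the true ATE is -0.15; 8 of the 10 treated subjects come from the benefiting stratum, so the true ATT is -0.24.

```python
def weighted_mean(y, w):
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


def weighted_effect(y, treated, w):
    """Difference in weighted mean outcome, treated minus untreated."""
    y1 = [yi for yi, t in zip(y, treated) if t]
    w1 = [wi for wi, t in zip(w, treated) if t]
    y0 = [yi for yi, t in zip(y, treated) if not t]
    w0 = [wi for wi, t in zip(w, treated) if not t]
    return weighted_mean(y1, w1) - weighted_mean(y0, w0)


# Hypothetical cohort of 20 subjects in two strata of 10.
# Stratum A: PS 0.8, risk of death 0.5 untreated vs 0.2 treated (benefit -0.3).
# Stratum B: PS 0.2, risk of death 0.3 whether treated or not (no benefit).
ps = [0.8] * 10 + [0.2] * 10
treated = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
y = [0.2] * 8 + [0.5] * 2 + [0.3] * 10  # observed risk given treatment received

w_ate = [1 / p if t else 1 / (1 - p) for t, p in zip(treated, ps)]
w_att = [1.0 if t else p / (1 - p) for t, p in zip(treated, ps)]

print(round(weighted_effect(y, treated, w_ate), 2))  # -0.15: averaged over everyone
print(round(weighted_effect(y, treated, w_att), 2))  # -0.24: averaged over the treated
```

Both weighted estimators recover their own target exactly here because the PS is known and the treatment counts match it; neither number is wrong, they answer different clinical questions.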
Indeed, whatever the approach, each subject is assumed to have a non-zero probability of receiving either treatment. This suggests that observational studies should be designed similarly to RCTs; that is, subjects who are ineligible for at least one of the treatments should be excluded [32]. This was exemplified in our cohorts by the percentage of control patients who could not be matched, ranging from 21% in HD up to 85% in MM. Such percentages could be related to differences in the criteria used to define controls. Moreover, it is assumed that all variables related to both outcome and treatment assignment were included in the propensity score model [35]. Rubin suggested including only variables that are strongly related to treatment allocation, while others have proposed the use of selection algorithms [37,58]. Our PS models were based on unbalanced characteristics of known clinical significance, and the number of variables was limited by the sample size. Therefore, we cannot exclude that other confounding characteristics should have been included in the PS model.
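A simple way to examine the positivity (overlap) assumption before analysis is to compare the ranges of estimated PS in the two groups. The sketch below uses the min-max definition of the common support region, which is only one of several possible rules; the function names and PS values are hypothetical. Subjects outside this region have no counterpart with a comparable PS in the other group, so no treatment comparison can be made for them without extrapolation.

```python
def common_support(ps, treated):
    """Min-max common support: the PS range covered by both groups."""
    ps1 = [p for p, t in zip(ps, treated) if t]
    ps0 = [p for p, t in zip(ps, treated) if not t]
    return max(min(ps1), min(ps0)), min(max(ps1), max(ps0))


def outside_support(ps, treated):
    """Indices of subjects whose PS lies outside the common support."""
    lo, hi = common_support(ps, treated)
    return [i for i, p in enumerate(ps) if p < lo or p > hi]


# Hypothetical PS values: 3 Allo-SCT patients (treated = 1), 3 controls
ps = [0.90, 0.60, 0.40, 0.05, 0.30, 0.70]
treated = [1, 1, 1, 0, 0, 0]
print(common_support(ps, treated))   # (0.4, 0.7)
print(outside_support(ps, treated))  # [0, 3, 4]
```

Restricting the analysis to subjects inside this region is the observational analogue of excluding patients who are ineligible for one of the treatments.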
Other methods could be used to estimate treatment effects in non-randomized studies. The most popular consists in estimating treatment effects by adjusting for covariates in a multivariable regression model [5]. The main limitation of this approach is that the estimated treatment effect is neither the ATE nor the ATT. Indeed, the measured treatment effect is conditional on the other covariates and is therefore biased if used as an estimate of the ATE or ATT. Another emerging approach is the instrumental variable (IV) approach, an econometric method used to remove the effects of hidden bias in observational studies [5]. An instrumental variable has two key characteristics: it is highly correlated with treatment, and it does not independently affect the outcome, so that it is not associated with measured or unmeasured patient health status. In our case, none of the available variables could be considered an IV. Moreover, this approach has not been validated in small samples and deserves further evaluation before being used in such clinical settings.
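To make the two IV conditions concrete, the simplest IV estimator for a binary instrument Z, the Wald ratio, divides the difference in mean outcome between instrument levels by the difference in treatment uptake. The sketch below is a toy illustration only, since no valid instrument was available in our cohorts: the data, the unmeasured confounder U and the constant treatment effect of 0.5 are all hypothetical. Because Z is independent of U and has no direct effect on the outcome, the Wald ratio recovers the true effect, whereas the naive treated-versus-untreated comparison is inflated by U.

```python
def mean(v):
    return sum(v) / len(v)


def wald_iv_estimate(z, d, y):
    """Wald IV estimator for a binary instrument z, treatment d, outcome y."""
    y1 = mean([yi for yi, zi in zip(y, z) if zi])
    y0 = mean([yi for yi, zi in zip(y, z) if not zi])
    d1 = mean([di for di, zi in zip(d, z) if zi])  # uptake when z = 1 (0.8 here)
    d0 = mean([di for di, zi in zip(d, z) if not zi])  # uptake when z = 0 (0.2 here)
    return (y1 - y0) / (d1 - d0)


# Hypothetical rows (z, u, d): U is balanced across instrument levels (5 per
# arm) but makes treatment more likely, i.e. U is an unmeasured confounder.
rows = ([(1, 1, 1)] * 5 + [(1, 0, 1)] * 3 + [(1, 0, 0)] * 2
        + [(0, 1, 1)] * 2 + [(0, 1, 0)] * 3 + [(0, 0, 0)] * 5)
z = [r[0] for r in rows]
u = [r[1] for r in rows]
d = [r[2] for r in rows]
y = [1.0 + 0.5 * di + 1.0 * ui for di, ui in zip(d, u)]  # true effect = 0.5

y_treated = mean([yi for yi, di in zip(y, d) if di])
y_untreated = mean([yi for yi, di in zip(y, d) if not di])
print(round(y_treated - y_untreated, 2))    # 0.9: inflated by U
print(round(wald_iv_estimate(z, d, y), 2))  # 0.5
```

With a weak instrument the denominator d1 - d0 approaches zero and the estimate becomes very unstable, which is one reason the approach is difficult to apply in small samples.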