Original Article
Observational studies using propensity score analysis underestimated the effect sizes in critical care medicine
Introduction
A well-designed and properly conducted randomized controlled trial (RCT) is one of the most important sources of evidence for clinical decision-making. Randomization balances, on average, both measured and unmeasured variables between treated and untreated subjects. An RCT can therefore establish a causal association between intervention and outcome, which is key for clinicians seeking to understand the mechanisms underlying a pathologic condition. However, such experimental studies are often not feasible because of economic and ethical constraints [1]. Clinical evidence is thus often shaped by observational studies, in which, however, the treatment effect is often confounded by many measured and unmeasured factors. Many techniques have been developed to control these confounding factors, including stratification, matching, and multivariable regression analysis [2].
Propensity score (PS) analysis was developed in the 1980s and has been increasingly used in the biomedical field [3], [4], [5], [6]. The PS is defined as the conditional probability of receiving a treatment or exposure given a set of predefined covariates [7]. With conventional matching or stratification, only a few covariates can be taken into account, whereas the PS technique can incorporate all measured confounding factors and assigns each subject a score based on the probability that he or she will receive treatment. The PS can be used for adjustment, matching, weighting, and stratification [8]. Critical care studies are especially subject to bias because a long list of baseline characteristics cannot be easily balanced and because a large number of interventions other than the experimental treatment are delivered in the intensive care unit (ICU). It is therefore crucial to control confounding factors in observational studies, particularly when administrative data are used for analysis [9]. PS has accordingly found its way into the field of critical care medicine, and the number of publications involving PS has increased exponentially in recent years [10]. However, the validity of PS has long been debated, and it is unknown whether results obtained with PS are comparable with those obtained from RCTs. The present study therefore aimed to compare the treatment effect of experimental interventions in PS-based observational studies vs. RCTs (or meta-analyses of RCTs) in critical care medicine.
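As a minimal illustration of the PS definition given in the Introduction, the sketch below estimates the PS in a hypothetical toy cohort with a single binary covariate and then uses it for weighting. The cohort, the variable names (`severe`, `treated`, `died`), and the helper functions are assumptions made for illustration and are not taken from the study. With one binary covariate the PS reduces to the treated fraction in each stratum, whereas real analyses typically fit a regression model on many covariates.

```python
from collections import defaultdict


def make_cohort():
    """Build a hypothetical cohort of (severe, treated, died) tuples."""
    rows = []
    # (severe, treated, n_patients, n_deaths): invented counts, not study data
    for severe, treated, n, deaths in [
        (0, 1, 4, 1), (0, 0, 16, 8),
        (1, 1, 16, 8), (1, 0, 4, 3),
    ]:
        rows += [(severe, treated, 1)] * deaths
        rows += [(severe, treated, 0)] * (n - deaths)
    return rows


def propensity_by_stratum(rows):
    """Estimate P(treated = 1 | severe) as the treated fraction per stratum."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for severe, treated, _ in rows:
        n[severe] += 1
        n_treated[severe] += treated
    return {s: n_treated[s] / n[s] for s in n}


def crude_risk_difference(rows):
    """Unadjusted mortality risk in treated minus risk in untreated."""
    risk = {}
    for arm in (0, 1):
        outcomes = [died for _, treated, died in rows if treated == arm]
        risk[arm] = sum(outcomes) / len(outcomes)
    return risk[1] - risk[0]


def ipw_risk_difference(rows, ps):
    """Risk difference using normalized inverse probability weights."""
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for severe, treated, died in rows:
        e = ps[severe]
        # Treated subjects get weight 1/e, untreated subjects 1/(1 - e).
        w = 1 / e if treated else 1 / (1 - e)
        num[treated] += w * died
        den[treated] += w
    return num[1] / den[1] - num[0] / den[0]


cohort = make_cohort()
ps = propensity_by_stratum(cohort)
crude = crude_risk_difference(cohort)
weighted = ipw_risk_difference(cohort, ps)
print(ps, round(crude, 3), round(weighted, 3))
```

In this constructed cohort the sicker stratum is treated more often, so the crude risk difference (about -0.10) is smaller in magnitude than the weighted estimate (about -0.25), which matches the difference built into each stratum. The numbers only demonstrate the mechanics of weighting and say nothing about the studies analyzed here.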
Study selection
Observational studies using PS in the field of critical care medicine were identified by searching PubMed from inception to April 2013. There was no language restriction. The search strategy combined terms related to critical care, PS, and mortality: (((((critically ill[Title/Abstract]) OR critical care[Title/Abstract]) OR intensive care[Title/Abstract]) OR ICU[Title/Abstract]) AND propensity score[Title/Abstract]) AND mortality[Title/Abstract]. Studies were potentially eligible if they (1)
Study selection and characteristics
Fig. 1 shows the flow chart of study selection. Our initial search identified 161 potential studies. Of these, 109 were excluded because they were not critical care studies, investigated risk factors, or did not involve human subjects. The remaining 52 studies using PS were screened for matching RCTs, and 32 were excluded because they could not be matched to an RCT. Finally, a total of 20 studies [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]
Discussion
Our study showed that in critical care medicine, RCTs (or meta-analyses of RCTs) are more likely than observational studies using PS to report a beneficial effect for the experimental treatment. The result remained unchanged after adjustment for potential confounding factors such as the design of PS-based studies and the number of covariates used to derive the PS. With important advances in technology and the Internet, an increasing number of clinical databases capturing patients with critical illness is
Acknowledgments
Z.Z. helped design the study, conduct the study, analyze the data, and write the manuscript and has seen the original study data, reviewed the analysis of the data, and approved the final manuscript. H.N. helped design the study and analyze the data and has seen the original study data, approved the final manuscript, and is the author responsible for archiving the study files. X.X. helped design the study and write the manuscript and has seen the original study data and approved the final
References (43)
- et al. Confounding and control of confounding in nonexperimental studies of medications in patients with CKD. Adv Chronic Kidney Dis (2012)
- et al. Within-center matching performed better when using propensity score matching to analyze multicenter survival data: empirical and Monte Carlo studies. J Clin Epidemiol (2013)
- et al. Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials (2007)
- et al. Seven-day antibiotic courses have similar efficacy to prolonged courses in severe community-acquired pneumonia–a propensity-adjusted analysis. Clin Microbiol Infect (2011)
- et al. Low-dose steroid therapy does not affect hemodynamic response in septic shock patients. J Crit Care (2007)
- et al. Use of the pulmonary artery catheter is not associated with worse outcome in the ICU. Chest (2005)
- et al. Treatments effects from randomized trials and propensity score analyses were similar in similar populations in an example from cardiac surgery. J Clin Epidemiol (2011)
- et al. Observational studies: propensity score analysis of non-randomized data. Int MS J (2009)
- et al. The central role of the propensity score in observational studies for causal effects. Biometrika (1983)
- A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med (2008)
- Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J Thorac Cardiovasc Surg
- An overview of the objectives of and the approaches to propensity score analyses. Eur Heart J
- An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res
- Primer: administrative health databases in observational studies of drug effects–advantages and disadvantages. Nat Clin Pract Rheumatol
- Propensity scores in intensive care and anaesthesiology literature: a systematic review. Intensive Care Med
- CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Trials
- Statistical methods for assessing the influence of study characteristics on treatment effects in ‘meta-epidemiological’ research. Stat Med
- Timing of renal replacement therapy initiation by AKIN classification system. Crit Care
- Effects of etomidate on complications related to intubation and on mortality in septic shock patients treated with hydrocortisone: a propensity score analysis. Crit Care
- Statin therapy as prevention against development of acute respiratory distress syndrome: an observational study. Crit Care Med
- High-frequency oscillatory ventilation and short-term outcome in neonates and infants undergoing cardiac surgery: a propensity score analysis. Crit Care
Cited by (35)
Effects of infant motor problems and treatment with physiotherapy on child outcomes at school-age
2020, Early Human Development
Citation Excerpt: Adjusting for imbalances in these baseline characteristics through PSM showed that receipt of physiotherapy neither negatively nor positively affected motor skills and cognitive function. Despite a small adverse effect on attention regulation that was below the level that is considered clinically relevant these results should be interpreted cautiously, particularly as observational studies that use PSM may still underestimate beneficial effects of treatment on outcomes (in contrast to a randomized controlled trial (RCT)) [26]. Overall, early treatment with physiotherapy was not found to improve motor skills into school-age – nor did it have a positive effect on other, related, developmental outcomes; at least not physiotherapy alone.
Propensity score: Interests, use and limitations. A practical guide for clinicians
2018, Revue de Medecine Interne
Alternative methodological approach to randomized trial for surgical procedures routinely used
2018, Contemporary Clinical Trials
Citation Excerpt: The use of propensity score is rising in medical literature, but there is an uncertainty concerning the comparability of treatment effect estimated by propensity score matching with that obtained by RCT. For example, in critical care medicine a study showed that observational study with propensity score reports less beneficial effect of experimental treatment compared with RCT [38]. This result can be partially explained by the fact that patients included in RCT are generally selected with strict inclusion criteria.
Risk of maternal mortality in women with severe anaemia during pregnancy and post partum: a multilevel analysis
2018, The Lancet Global Health
Citation Excerpt: We noted a moderately strong association between severe anaemia and maternal death in the multilevel regression analysis on the basis of published criteria, with a point estimate (ie, adjusted OR) in the range of 2–5 [29,30]. By these criteria, the association was weak in the propensity score regression analysis, but this statistical approach underestimates the strength of associations [31]. The association was temporal in that the exposure measurements (severe anaemia) were obtained before the occurrence of the outcome (death).
“Second best”: A good start
2017, Journal of Thoracic and Cardiovascular Surgery
Diverse criteria and methods are used to compare treatment effect estimates: a scoping review
2016, Journal of Clinical Epidemiology
Citation Excerpt: We included a total of 26 studies in this review (Fig. 1). Among the 26 included studies, 19 aimed to compare the effect estimates obtained using observational studies to those from randomized controlled trials [8–10,17–32]; from these, four were specific to observational studies that used propensity score methods [8,10,19,32], and one was specific to observational studies conducted using administrative data sets [27]. Five studies aimed to compare the effect estimates from systematic reviews that used indirect comparisons versus systematic reviews that used direct comparisons or network meta-analysis [33–37], one aimed to compare the effect estimates from large randomized trials versus systematic reviews that used meta-analysis of small trials [38], and one aimed to compare more than two types of study designs [39].
Conflict of interest: None.