Original Article
Observational studies using propensity score analysis underestimated the effect sizes in critical care medicine

https://doi.org/10.1016/j.jclinepi.2014.02.018

Abstract

Background and Objective

Propensity score (PS) analysis has been increasingly used in critical care medicine; however, its validity has not been systematically investigated. The present study aimed to compare effect sizes in PS-based observational studies vs. randomized controlled trials (RCTs) (or meta-analyses of RCTs).

Methods

PubMed was systematically searched from inception to April 2013 for critical care observational studies using PS. Identified PS-based studies were matched to one or more RCTs in terms of population, intervention, comparison, and outcome. The effect sizes of experimental treatments in PS-based studies were compared with those in RCTs (or meta-analyses of RCTs) using the sign test. Furthermore, the ratio of odds ratios (ROR) was calculated from the interaction term of treatment × study type in a logistic regression model. An ROR < 1 indicates greater benefit for the experimental treatment in RCTs than in PS-based studies. The RORs of the individual comparisons were pooled using a random-effects meta-analytic model.
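The ROR and pooling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it takes ROR = OR(RCT) / OR(PS-based study), which equals the exponentiated interaction coefficient of a saturated logistic model when study type is coded 1 for RCT; it uses the DerSimonian-Laird estimator for the random-effects model (the abstract does not name the estimator); and all function names and input values are hypothetical.

```python
import math

def log_ror(or_rct, se_log_or_rct, or_ps, se_log_or_ps):
    """Return (log ROR, standard error), with ROR = OR_RCT / OR_PS.

    Assumes the two odds ratios come from independent studies, so the
    variances of their logarithms add.
    """
    log_ratio = math.log(or_rct) - math.log(or_ps)
    se = math.sqrt(se_log_or_rct ** 2 + se_log_or_ps ** 2)
    return log_ratio, se

def pool_random_effects(estimates, std_errors):
    """Pool log-scale estimates with the DerSimonian-Laird method.

    Returns (pooled estimate, its standard error, tau^2).
    """
    w = [1.0 / se ** 2 for se in std_errors]          # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures between-comparison heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-comparison variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Hypothetical inputs, NOT data from the study:
# (OR in RCT, SE of log OR in RCT, OR in PS study, SE of log OR in PS study)
comparisons = [
    (0.70, 0.15, 0.95, 0.10),
    (0.85, 0.20, 1.05, 0.12),
    (1.10, 0.25, 1.00, 0.15),
]

log_rors, ses = zip(*(log_ror(*c) for c in comparisons))
pooled, pooled_se, tau2 = pool_random_effects(log_rors, ses)
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled ROR = {math.exp(pooled):.2f} "
      f"(95% CI {lower:.2f}, {upper:.2f}), tau^2 = {tau2:.3f}")
```

With an outcome such as mortality, an OR below 1 favors the experimental treatment, so a pooled ROR below 1 under this coding means the RCTs reported the larger benefit.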

Results

A total of 20 PS-based studies were identified and matched to RCTs. Twelve of the 20 comparisons showed a greater beneficial effect of the experimental treatment in RCTs than in PS-based studies (sign test P = 0.503). The difference was statistically significant in four comparisons. The ROR could be calculated for 13 comparisons, of which four showed a significantly greater beneficial effect of the experimental treatment in RCTs. The pooled ROR was 0.71 (95% CI: 0.63, 0.79; P = 0.002), suggesting that RCTs (or meta-analyses of RCTs) were more likely than PS-based studies to report a beneficial effect of the experimental treatment. The result remained unchanged in sensitivity analysis and meta-regression.
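The sign test result can be checked with a short calculation. The sketch below assumes an exact two-sided binomial sign test with no ties excluded (the abstract does not state these details); the function name is hypothetical.

```python
from math import comb

def sign_test_two_sided(n_positive, n_total):
    """Exact two-sided sign test p-value under H0: P(positive) = 0.5."""
    # Use the larger of the two counts so the tail is always the upper one.
    k = max(n_positive, n_total - n_positive)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    # Double the tail for a two-sided test and cap at 1.
    return min(1.0, 2 * upper_tail)

# 12 of 20 comparisons favored RCTs.
p_value = sign_test_two_sided(12, 20)
print(round(p_value, 3))  # prints 0.503
```

Under these assumptions the calculation gives the reported P = 0.503, so the 12-to-8 split alone does not distinguish the two study types; the pooled ROR is what carries the main finding.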

Conclusion

In the critical care literature, PS-based observational studies are likely to report a less beneficial effect of the experimental treatment than RCTs (or meta-analyses of RCTs).

Introduction

A well-designed and properly conducted randomized controlled trial (RCT) is one of the most important sources of evidence for clinical decision-making. Randomization balances both measured and unmeasured variables between treated and untreated subjects. An RCT can therefore establish a causal association between intervention and outcome, which is key for clinicians to understand the mechanisms underlying a pathologic condition. However, such experimental studies are often not feasible because of economic and ethical constraints [1]. Thus, clinical evidence is often shaped by observational studies, in which the treatment effect is frequently confounded by measured and unmeasured factors. Many techniques have been developed to control these confounding factors, including stratification, matching, and multivariable regression analysis [2].

Propensity score (PS) analysis was developed in the 1980s and has been increasingly used in the biomedical field [3], [4], [5], [6]. The PS is defined as the conditional probability of receiving a treatment or exposure given a set of predefined covariates [7]. With conventional matching or stratification, only a few covariates can be taken into account, whereas the PS technique can incorporate all measured confounding factors and assigns each subject a score based on the probability that the subject will receive treatment. The PS can be used for adjustment, matching, weighting, and stratification [8]. Critical care studies are especially subject to bias because a long list of baseline characteristics cannot be easily balanced, and a large number of interventions other than the experimental treatment are conducted in the intensive care unit (ICU). It is therefore of crucial importance to control confounding factors in observational studies, particularly when administrative data are used for analysis [9]. Accordingly, PS has found its way into the field of critical care medicine, and the number of publications involving PS has increased exponentially in recent years [10]. However, the validity of PS has long been debated, and it is unknown whether the results obtained by using PS are comparable with those obtained by RCTs. The present study therefore aimed to compare the treatment effect of the experimental intervention in PS-based observational studies vs. RCTs (or meta-analyses of RCTs) in critical care medicine.
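To make the definition concrete, the sketch below estimates a PS as the fitted probability of treatment from a logistic regression on baseline covariates. Logistic regression is a common choice but is an assumption here, as are the covariate names, the synthetic data, and the plain gradient-ascent fitting routine, which stands in for a statistics package.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity_model(covariates, treated, learning_rate=0.5, n_iter=500):
    """Fit logistic regression by gradient ascent on the log-likelihood.

    covariates: list of feature lists; treated: list of 0/1 indicators.
    Returns weights with the intercept first.
    """
    n = len(covariates)
    weights = [0.0] * (len(covariates[0]) + 1)
    for _ in range(n_iter):
        gradient = [0.0] * len(weights)
        for x, t in zip(covariates, treated):
            p = sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))
            error = t - p
            gradient[0] += error
            for j, xi in enumerate(x):
                gradient[j + 1] += error * xi
        weights = [w + learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

def propensity_score(weights, x):
    """Estimated probability of receiving treatment given covariates x."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

# Synthetic cohort (hypothetical): sicker patients are more likely to be treated.
random.seed(0)
cohort = []
for _ in range(200):
    severity = random.random()        # hypothetical illness severity in [0, 1)
    age_z = random.gauss(0.0, 1.0)    # hypothetical standardized age
    is_treated = 1 if random.random() < sigmoid(-1.0 + 2.0 * severity) else 0
    cohort.append(([severity, age_z], is_treated))

covariates = [x for x, _ in cohort]
treated = [t for _, t in cohort]
weights = fit_propensity_model(covariates, treated)
scores = [propensity_score(weights, x) for x in covariates]
print(f"PS range: {min(scores):.2f} to {max(scores):.2f}")
```

The resulting scores could then feed any of the four uses listed above; for example, treated and untreated subjects with similar scores can be matched, or each subject can be weighted by the inverse of the probability of the treatment actually received.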

Study selection

Observational studies using PS in the field of critical care medicine were identified by searching PubMed from inception to April 2013. There was no language restriction. The search strategy consisted of terms related to critical care, PS, and mortality: (((((critically ill[Title/Abstract]) OR critical care[Title/Abstract]) OR intensive care[Title/Abstract]) OR ICU[Title/Abstract]) AND propensity score[Title/Abstract]) AND mortality[Title/Abstract]. Studies were potentially eligible if they (1)

Study selection and characteristics

Fig. 1 shows the flow chart of study selection. Our initial search identified 161 potential studies. Of these, 109 were excluded because they were not critical care studies, investigated risk factors, or did not involve human subjects. The remaining 52 studies using PS were assessed for matching RCTs. A total of 32 studies were excluded because they could not be matched to an RCT. Finally, a total of 20 studies [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]

Discussion

Our study showed that in critical care medicine, RCTs (or meta-analyses of RCTs) are more likely than observational studies using PS to report a beneficial effect of the experimental treatment. The result remained unchanged after adjustment for potential confounding factors such as the design of the PS-based studies and the number of covariates used to derive the PS. With important advances in technology and the Internet, an increasing number of clinical databases capturing patients with critical illness is

Acknowledgments

Z.Z. helped design the study, conduct the study, analyze the data, and write the manuscript and has seen the original study data, reviewed the analysis of the data, and approved the final manuscript. H.N. helped design the study and analyze the data and has seen the original study data, approved the final manuscript, and is the author responsible for archiving the study files. X.X. helped design the study and write the manuscript and has seen the original study data and approved the final

References (43)

  • P.C. Austin. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J Thorac Cardiovasc Surg (2007)
  • G. Heinze et al. An overview of the objectives of and the approaches to propensity score analyses. Eur Heart J (2011)
  • P.C. Austin. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res (2011)
  • S. Suissa et al. Primer: administrative health databases in observational studies of drug effects–advantages and disadvantages. Nat Clin Pract Rheumatol (2007)
  • E. Gayat et al. Propensity scores in intensive care and anaesthesiology literature: a systematic review. Intensive Care Med (2010)
  • K.F. Schulz et al. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Trials (2010)
  • J.A. Sterne et al. Statistical methods for assessing the influence of study characteristics on treatment effects in ‘meta-epidemiological’ research. Stat Med (2002)
  • T.T. Leite et al. Timing of renal replacement therapy initiation by AKIN classification system. Crit Care (2013)
  • B. Jung et al. Effects of etomidate on complications related to intubation and on mortality in septic shock patients treated with hydrocortisone: a propensity score analysis. Crit Care (2012)
  • E.K. Bajwa et al. Statin therapy as prevention against development of acute respiratory distress syndrome: an observational study. Crit Care Med (2012)
  • M. Bojan et al. High-frequency oscillatory ventilation and short-term outcome in neonates and infants undergoing cardiac surgery: a propensity score analysis. Crit Care (2011)

    Conflict of interest: None.
