
March 19, 2020 | Original Research

Benchmarking Observational Analyses Against Randomized Trials: a Review of Studies Assessing Propensity Score Methods

Journal of General Internal Medicine
Shaun P. Forbes, AM; Issa J. Dahabreh, MD ScD
Important notes

Electronic supplementary material

The online version of this article (https://doi.org/10.1007/s11606-020-05713-5) contains supplementary material, which is available to authorized users.

Prior Presentations

Earlier versions of this work were presented in part at the International Conference on Health Policy Statistics, October 7–9, 2015, and the AcademyHealth Annual Research Meeting, June 26–28, 2016.




Observational analysis methods can be refined by benchmarking against randomized trials. We reviewed studies systematically comparing observational analyses using propensity score methods against randomized trials to explore whether intervention or outcome characteristics predict agreement between designs.


We searched PubMed (from January 1, 2000, to April 30, 2017), the AHRQ Scientific Resource Center Methods Library, reference lists, and bibliographies to identify systematic reviews that compared estimates from observational analyses using propensity scores against randomized trials across three or more clinical topics; reported extractable relative risk (RR) data; and were published in English. One reviewer extracted data from all eligible systematic reviews; a second reviewer verified the extracted data.


Six systematic reviews matching published observational studies to randomized trials, published between 2012 and 2016, met our inclusion criteria. The reviews reported on 127 comparisons overall, in cardiology (29 comparisons), surgery (49), critical care medicine and sepsis (46), nephrology (2), and oncology (1). Disagreements were large (relative RR < 0.7 or > 1.43) in 68 (54%) and statistically significant in 12 (9%) of the comparisons. The degree of agreement varied among reviews but was not strongly associated with intervention or outcome characteristics.
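The thresholds above can be illustrated with a small calculation. The abstract does not say how the relative RR or its statistical significance was computed, so the following is only a minimal sketch under stated assumptions: the relative RR is taken as the observational RR divided by the trial RR, standard errors are recovered from 95% confidence intervals on the log scale, and the two estimates are treated as independent. The function names and the input numbers are hypothetical. Note that the upper cutoff of 1.43 is consistent with the reciprocal of 0.7.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower, upper):
    """Approximate the standard error of log(RR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def compare_estimates(rr_obs, ci_obs, rr_rct, ci_rct, threshold=0.7):
    """Compare an observational RR with a trial RR on the log scale.

    Assumes independent estimates and approximately normal log(RR);
    both are assumptions of this sketch, not details from the article.
    """
    relative_rr = rr_obs / rr_rct
    # Standard error of the difference of two independent log(RR) estimates.
    se_diff = math.hypot(log_se_from_ci(*ci_obs), log_se_from_ci(*ci_rct))
    z = math.log(relative_rr) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {
        "relative_rr": relative_rr,
        "large": relative_rr < threshold or relative_rr > 1 / threshold,
        "significant": p_value < 0.05,
        "p_value": p_value,
    }


# Hypothetical numbers, not taken from the reviewed studies.
result = compare_estimates(0.60, (0.45, 0.80), 0.95, (0.80, 1.13))
print(result)
```

Working on the log scale keeps the comparison symmetric: a relative RR of 0.7 and one of 1/0.7 represent disagreements of the same size in opposite directions.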


Disagreements between observational studies using propensity score methods and randomized trials can occur for many reasons, and the available data cannot be used to discern the reasons behind specific disagreements. Better benchmarking of observational analyses using propensity scores (and other causal inference methods) is possible using observational studies that explicitly attempt to emulate target trials.
