Background
Appropriately designed and executed randomized controlled trials (RCTs) represent the current gold-standard primary study design for the determination of the efficacy and safety of medical interventions [
1]. Evidence from RCTs is used by healthcare providers to guide their clinical decisions and by payers and policy makers to support their recommendations for the adoption of new therapies in clinical practice [
2]. Explanatory RCTs are designed to determine the efficacy of an intervention under idealized and controlled circumstances and so are conducted under rigorous conditions, including strict adherence to structured protocols, the use of restrictive inclusion and exclusion criteria, and patient randomization, that maximize their internal validity (that is, they minimize the possibility of bias regarding the effect of an intervention) [
3,
4]. In order for the results of such trials to be clinically useful, they must also be relevant to a definable patient population in a specific healthcare setting, a concept that is termed external validity or generalizability (note, these terms are used interchangeably [
3] in this review and describe the applicability of the study results outside of the trial environment) [
5‐
7]. As it is challenging to simultaneously optimize internal and external validity, efficacy data from traditional explanatory RCTs are often complemented by evidence from pragmatic trials (including pragmatic RCTs) or observational studies that determine the performance of an intervention under conditions more closely resembling routine clinical practice, and include more heterogeneous patient populations and less stringent treatment and delivery protocols [
4]. Although some pragmatic trials have good internal validity and some observational studies may lack external validity, explanatory RCTs generally maximize internal validity at the expense of external validity, whereas studies conducted in a setting more closely resembling real-world practice may do the opposite. As such, evidence from all these sources can be complementary in understanding the effect of an intervention and furthering clinical research [
8].
In recent years, the need to better understand the external validity of RCT results has been identified across numerous therapeutic areas [
9‐
13]. However, a comprehensive literature review of studies that have assessed the representativeness of RCT populations has not been undertaken in recent years (note, the term representativeness has been used throughout this review to describe the similarities between RCT samples and real-world populations). To examine this issue, we conducted a literature review of studies that have attempted to evaluate external validity in one of two ways: (i) by comparing the clinical characteristics of an RCT sample with those of everyday clinical practice patients, or (ii) by assessing what proportion of a real-world population would satisfy the criteria for RCT inclusion. In the context of the current review, real-world populations are defined as those patients encountered in routine clinical practice settings (for example, patients included in observational cohorts or patients identified from medical chart review, registries, or insurance databases). The primary objective of the review was to assess the extent to which RCT samples are representative of real-world populations (which may or may not affect the external validity of the trial findings). Other objectives were to identify key issues that may impact the external validity of trial findings (with reference to included studies) and also to outline recommendations from the identified studies for improving external validity. The present review was limited to RCTs in oncology, mental health, and cardiology as, when the review was undertaken, these were identified as the main therapeutic areas in which RCT and real-world populations had been compared. It should be noted that the focus of the current review was explanatory and not pragmatic RCTs.
Discussion
The present analysis utilized a robust literature review methodology to identify studies that compared the clinical characteristics of an RCT sample and patients from a real-world source (Method A) or assessed the proportion of a real-world population that would satisfy criteria for RCT inclusion (Method B). Publications identified by this methodology indicated that RCT samples in cardiology, mental health, and oncology studies that assessed pharmaceutical interventions in adult patients were often not broadly representative of patients treated in everyday clinical practice and that caution should be exercised when extrapolating data from trials to patients treated in usual care settings. Note that, with the exception of a single study [
40], none of the RCTs described in the included studies was documented as being of a pragmatic design. In that single Method B study, the RCTs in acute coronary syndrome from which eligibility criteria were extracted were described as having pragmatic enrollment strategies; however, the analysis still suggested that there were important differences in risk profile between RCT-eligible and RCT-ineligible patients [
40]. Differences in demographics, clinical characteristics, and treatments and procedures were reported between RCT and real-world patients by studies that employed Method A in their analyses [
15,
17,
21‐
25,
27,
29‐
31,
37,
38,
42,
44,
45,
48,
49]. Similarly, when specific RCT inclusion/exclusion criteria were applied to real-world populations (Method B), important differences with respect to demographics and clinical and treatment parameters were identified between patients who would have been RCT ineligible compared with those who would have been eligible for the trial [
16,
18‐
21,
25,
26,
28,
32‐
36,
38‐
44,
46,
47,
49‐
51]. Furthermore, it was observed that large proportions of the general disease population were often excluded from trial participation. We note that some differences in generalizability were observed between the different therapeutic areas studied in the present review.
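The Method B calculation described above can be illustrated with a minimal Python sketch. It is offered under stated assumptions only: the cohort records, the field names (age, renal_impairment, prior_stroke), and the three eligibility rules are hypothetical and are not taken from any of the included studies.

```python
# Hypothetical real-world cohort (illustrative values only, not study data).
cohort = [
    {"age": 62, "renal_impairment": False, "prior_stroke": False},
    {"age": 81, "renal_impairment": False, "prior_stroke": False},
    {"age": 70, "renal_impairment": True, "prior_stroke": False},
    {"age": 79, "renal_impairment": True, "prior_stroke": False},
    {"age": 58, "renal_impairment": False, "prior_stroke": False},
]

# Hypothetical trial eligibility rules; each returns True when the patient passes.
criteria = {
    "age_75_or_under": lambda p: p["age"] <= 75,
    "no_renal_impairment": lambda p: not p["renal_impairment"],
    "no_prior_stroke": lambda p: not p["prior_stroke"],
}


def is_eligible(patient, rules):
    """A patient is trial-eligible only if every rule is satisfied."""
    return all(rule(patient) for rule in rules.values())


def eligibility_summary(patients, rules):
    """Return the eligible proportion and, per rule, the proportion failing it."""
    n = len(patients)
    proportion_eligible = sum(is_eligible(p, rules) for p in patients) / n
    failed_by_rule = {
        name: sum(not rule(p) for p in patients) / n
        for name, rule in rules.items()
    }
    return proportion_eligible, failed_by_rule


proportion_eligible, failed_by_rule = eligibility_summary(cohort, criteria)
print(f"Eligible: {proportion_eligible:.0%}")
for name, share in failed_by_rule.items():
    print(f"Fails {name}: {share:.0%}")
```

Reporting the share of patients failing each rule separately, in addition to the overall eligible proportion, indicates which criteria contribute most to exclusion; a patient may fail more than one rule, so the per-rule shares need not sum to the ineligible proportion.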
In only a minority of studies did the authors either conclude that RCT samples were broadly representative of real-world populations and that external validity was not affected, or fail to reach an explicit conclusion regarding external validity despite demonstrating some differences in baseline characteristics between groups [
52‐
66]. These findings are largely consistent with a previously published systematic sampling review that assessed the nature and extent of exclusion criteria among RCTs published between 1994 and 2006 in selected medical journals with impact factors > 2.5 [
2]. While involving the review of older studies and use of more restrictive search criteria than the present review, this earlier study also demonstrated that RCTs often exclude large proportions of the general disease population and specific patient groups from trial participation. In agreement with the present review, it was reported that the elderly, women, and patients with co-morbidities were frequently ineligible for trial inclusion [
2]. However, note that RCT findings may still be externally valid even in circumstances where the patient sample is not broadly representative of the real-world population. For example, one study included in the present review concluded that patients with unstable angina or non-ST-segment elevation myocardial infarction who would have been excluded from enoxaparin RCTs could be safely treated in clinical practice [
53].
That the external validity of RCT results is often limited is widely acknowledged by clinicians as a problem when it comes to extrapolating data to the patients seen in everyday practice [
3,
7]. Indeed, it is an often-cited reason for the frequent underuse of guideline-recommended therapies [
67]. Where there is no evidence of efficacy in specific patient groups, clinicians may well be right in withholding treatment so as to prevent unanticipated harm [
35]. This situation could, however, mean that patients at highest baseline risk who might be expected to receive the most benefit from a particular therapy are undertreated. This so-called “treatment-risk paradox” has been well described, particularly in cardiology [
6].
In the studies included in the present review, the use of restrictive inclusion/exclusion criteria in RCTs was identified as being one of the key factors that limited the external validity of trial findings. Authors reported that frequently excluded patients were the elderly, females, or those with co-morbidities in cardiology studies [
15‐
17,
19,
24,
29,
34,
35,
40,
44,
53,
55], patients with evidence of substance abuse or co-morbid psychological disorders in mental health studies [
18,
28,
32,
33,
41,
42,
47,
49,
50,
61,
64], and patients with poor disease prognosis in oncology studies [
20,
25,
31,
38,
39,
45,
46]. These RCT populations were, therefore, often highly selected and represented a patient sample at much lower risk of adverse events and complications compared with patients in clinical practice. The use of stringent selection criteria in RCTs ensures a homogeneous patient sample and optimizes the internal validity of the study by reducing variance and removing potential confounding, thereby increasing the likelihood of finding a true association between treatment exposure and outcomes (that is, it makes it easier to distinguish the “signal” [treatment effect] from the “noise” [bias and chance]) [
68,
69]. While the use of highly selected populations does not necessarily imply that a given treatment under study would fail to have equivalent efficacy and safety in under-represented patient groups, it does create uncertainty that can only be dispelled through the generation of additional evidence. However, it is pertinent to also consider how inclusion of high-risk patients may affect the outcomes of traditional trials. Patients with more co-morbidities or co-interventions may be more likely to prematurely discontinue study participation, which could lead to high attrition rates and a negative impact on trial validity and outcomes.
The studies reviewed herein made several recommendations to either improve the external validity of RCTs or compensate for limitations thereof. These included adaptation of trial designs to include a more heterogeneous patient sample that better represents different subgroups such as the elderly or patients with co-morbidities [
19,
20,
28‐
33,
46]. Some studies suggested that adoption of pragmatic trial designs may be a way forward [
48,
49]. Traditional RCTs are often described as “explanatory” trials since they aim to evaluate treatment efficacy under idealized conditions, and to explore “if and how an intervention works”. In contrast, pragmatic trials evaluate the effects of an intervention under usual conditions and their designs seek to determine “if an intervention actually works in real-life” [
70]. In recent years, the Pragmatic–Explanatory Continuum Indicator Summary (PRECIS) tool has been developed, and has now been updated with the PRECIS-2 version to allow trialists to design studies that better support the needs of the intended users of the results. PRECIS-2 consists of nine domains (including “participant eligibility criteria”) in which design decisions are made to determine the extent to which the trial is pragmatic or explanatory, and to help ensure that the design achieves the primary purpose of the trial [
71]. In addition to its application as an aid to trial design, PRECIS-2 has the potential for use in the assessment of completed trials for methodological quality and the likelihood of outcome bias in much the same way as the current Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system is used to assist guideline developers.
There is growing interest in different analytical methods that utilize data from multiple studies to extend and complement the evidence provided by a single clinical trial. Meta-analysis [
72,
73] can be used to combine evidence from multiple clinical trials to provide a more valid estimate of treatment effect, assuming the studies being combined are similar enough to permit synthesis. Cross-design synthesis is a type of meta-analysis in which evidence from studies with complementary designs is combined in an effort to leverage their respective strengths (such as internal validity of RCTs and external validity of observational studies) and minimize the weaknesses of each [
74]. Another approach that leverages real-world data to extend findings from a traditional trial involves development of propensity scores that predict, for each trial subject, membership in a corresponding real-world population [
75,
76]. Subjects over-represented in the clinical trial relative to the target real-world population receive lower weights while those under-represented receive higher weights. The resulting weights can be used to understand differences between the trial and target real-world populations, and to “project” the RCT efficacy to the target population, in effect providing an estimate of the efficacy that would be observed were the trial to be conducted in a more representative everyday practice population [
75,
76]. Finally, simple descriptive analysis of real-world data can also be employed in the trial planning stages to better understand the impact of specific design decisions (for example, potential exclusion criteria) on the anticipated generalizability of the trial results and so improve design. Adaptation of statistical analysis plans was recommended by two of the studies reviewed here as a method to facilitate analysis of important patient subgroups [
20,
37].
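The weighting approach described above can be sketched in Python as follows. This is a simplified illustration and not the method of the cited studies: it assumes a single categorical covariate (a hypothetical age stratum), for which the propensity of trial membership within a stratum is simply n_trial / (n_trial + n_target), so the inverse-odds weight for a trial subject reduces to n_target / n_trial for that subject's stratum. In practice the propensity score would usually be estimated from many covariates with a regression model. All data and names below are hypothetical.

```python
from collections import Counter


def inverse_odds_weights(trial_strata, target_strata):
    """Weight each trial subject by the odds of target-population membership.

    Strata over-represented in the trial receive lower weights and strata
    under-represented in the trial receive higher weights.
    """
    n_trial = Counter(trial_strata)
    n_target = Counter(target_strata)
    return [n_target[s] / n_trial[s] for s in trial_strata]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def projected_effect(trial, target_strata):
    """Weighted difference in mean outcome between arms.

    trial: list of (stratum, arm, outcome) tuples, arm in {"treated", "control"}.
    """
    weights = inverse_odds_weights([s for s, _, _ in trial], target_strata)
    means = {}
    for arm in ("treated", "control"):
        rows = [(y, w) for (_, a, y), w in zip(trial, weights) if a == arm]
        means[arm] = weighted_mean([y for y, _ in rows], [w for _, w in rows])
    return means["treated"] - means["control"]


# Hypothetical trial: mostly younger patients, in whom the effect is larger.
trial = (
    [("younger", "treated", 10.0)] * 3 + [("younger", "control", 4.0)] * 3
    + [("older", "treated", 6.0)] + [("older", "control", 4.0)]
)
# Hypothetical target population: equal numbers of younger and older patients.
target_strata = ["younger"] * 50 + ["older"] * 50

unweighted = projected_effect(trial, [s for s, _, _ in trial])  # all weights 1
projected = projected_effect(trial, target_strata)
print(f"Effect in trial sample: {unweighted:.2f}")
print(f"Effect projected to target population: {projected:.2f}")
```

In this constructed example the unweighted trial effect is 5.0, whereas the projected effect is 4.0, because older patients, who show a smaller effect in the hypothetical data, are under-represented in the trial and are weighted up. The sketch also assumes that every stratum present in the target population is represented in the trial; a stratum with no trial subjects cannot be reweighted.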
Several of the reviewed studies highlighted incomplete reporting as a potential issue for the external validity of RCTs [
24,
28,
38,
51,
63]. Improvements in trial reporting to provide a more detailed description of RCT samples would enable clinicians to better assess the external validity of RCTs and so more accurately extrapolate trial findings to their own patients. Following reporting guidelines such as CONSORT, which is a requirement for publication in many peer-reviewed journals [
1], may go some way to address issues of inconsistent reporting and may provide greater transparency with respect to trial eligibility.
Trials should follow the need for evidence but be part of a broader strategy for evidence generation. As such, complementary data from other appropriately designed studies conducted in Phase IV of the development lifecycle are required to address limitations in the external validity of RCTs post hoc. As recommended by some of the studies included in this review [
15,
23,
36], the use of non-randomized observational studies that utilize large healthcare databases can support RCT findings by determining treatment effectiveness in routine clinical practice [
6,
77]. Such studies include a wide range of different designs including prospective and retrospective cohort studies, case–control studies, and cross-sectional studies in which any intervention studied is determined by clinical practice and not a rigid protocol [
78]. Taken together, RCT and observational study data should provide a complementary body of evidence that optimizes both internal and external validity.
The findings presented in this review must be viewed within the limitations of the methodology employed. Firstly, the search strategy did not define the outcomes to be reported a priori and was influenced by the evidence base identified. Secondly, there are no acknowledged methods for the assessment of the quality of data for this type of analysis. Thirdly, the present review was limited to just three therapeutic areas (cardiology, mental health, and oncology), and while a large proportion of the relevant literature was focused in these areas, it is possible that findings may be different in other specialties. In addition, to manage the scope of the review, we restricted our eligibility criteria to studies that included adults and assessed pharmaceutical interventions only, and we cannot completely rule out the possibility that findings might be different in pediatric populations or other healthcare interventions. Finally, the conclusions regarding external validity, as reported in individual studies, were subjective, which limited our ability to more accurately synthesize and summarize the findings. The review strategy was, however, relevant to the objective of the present analysis, as it utilized a robust and transparent approach in order to identify key concepts and the main sources of information available on the representativeness of RCT patient samples and the external validity of RCT findings. The framework for categorizing the methods used in individual studies and for interpreting individual study conclusions was consistent and clearly detailed, adding to the methodological rigor of the review.
Competing interests
SC, DF, and JJ are employees of Eli Lilly and Company, USA. TKM is Director of, and SR is an employee of, Kennedy-Martin Health Outcomes Ltd, and received financial support from Eli Lilly and Company for their contributions to the conception and design of the study; the acquisition, analysis, and interpretation of the data; and drafting of the manuscript.
Authors’ contributions
SC, DF, and JJ conceived the project. TKM conducted the literature search. TKM and SR reviewed the search results and conducted the data extraction. All authors contributed to the content and writing of the manuscript and all authors read and approved the final manuscript.