Introduction
Real-world studies provide a line of evidence complementary to that from randomized controlled trials (RCTs). While RCTs provide evidence of efficacy, real-world studies produce evidence of therapeutic effectiveness in real-world practice settings [1]. The RCT is a well-established methodology for gathering robust evidence of the safety and efficacy of medical interventions [2]. In RCTs, investigators are able to reduce bias and confounding through randomization and strict patient inclusion and exclusion criteria. This internal validity is often achieved at the expense of external validity (generalizability), since the populations enrolled in RCTs may differ significantly from those seen in everyday practice. Real-world evidence has therefore emerged as an important means of understanding the utility of medical interventions in a broader, more representative patient population. The strict exclusion criteria of RCTs may rule out the majority of patients seen in routine care; real-world evidence can thus give vital insight into treatment effects in more diverse clinical settings, where many patients have multiple comorbidities [3, 4].
Data from real-world studies can provide evidence that informs payers, clinicians, and patients on how an intervention performs outside the narrow confines of the research setting, providing essential information on the long-term safety and effectiveness of a drug in large populations, its economic performance in a naturalistic setting, and its comparative effectiveness versus other treatments. With improvements in the rigor of the methodology applied to real-world studies, along with the increasing availability of higher-quality, larger datasets, the importance of findings from these studies is growing. The value of real-world data has been recognized by regulatory bodies such as the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) [5, 6]. These bodies acknowledge the importance of real-world data in supporting marketed products and their potential role in supporting life cycle product development/monitoring and decision-making for regulation and assessment [5, 6]. A survey of the pharmaceutical and medical devices industry in the European Union and the USA determined that 27% of real-world studies conducted by industry are performed “on request” by regulatory authorities [7]. Real-world data form a key component of the health technology assessments used by national and regional bodies, such as the National Institute for Health and Care Excellence (NICE) in the UK and Germany’s Institute for Quality and Efficiency in Health Care (IQWiG), to guide clinical decision-making [8]. Data from real-world studies are also increasingly utilized by payers. In a US survey, the majority of payers who responded reported using real-world data to guide decision-making, in particular on utilization management and formulary placement [9]. Such data usage may have profound effects: for example, the reversal of an IQWiG decision that analogue basal insulins showed no benefit over human insulin restored market access and premium pricing for insulin glargine in Germany [10]. The increase in the number of real-world studies has resulted in more clinical evidence being available to guide treatment decisions, and can allow assessment of the impacts of off-label use. In this paper, we review the impact of real-world clinical data and how their interpretation can assist clinicians in assessing clinical evidence appropriately for their own decision-making.
The Association of the British Pharmaceutical Industry defines real-world data as “data that are collected outside the controlled constraints of conventional RCTs to evaluate what is happening in normal clinical practice” [11]. Real-world studies can be either retrospective or prospective, and when they include prospective randomization, they are called “pragmatic trial design” studies (Table 1) [12]. The clearest distinction between RCTs and real-world studies is based on (a) the setting in which the research is conducted and (b) where the evidence is generated [2]. RCTs are typically conducted in precisely defined patient populations, and patient selection is often contingent on meeting extensive eligibility (i.e., inclusion and exclusion) criteria. Participants in such trials (and the data they provide) are subject to rigorous quality standards: intensive monitoring, detailed case-report forms (to capture additional information that may not be present in ordinary medical records), and carefully managed contact with research personnel (who are responsible for ensuring protocol adherence) are commonplace. Real-world evidence, in contrast, is often derived from multiple sources that lie outside the typical clinical research setting: these can include practices not generally involved in research, electronic health records (EHRs), patient registries, and administrative claims databases (sometimes obtained from integrated healthcare delivery systems). Despite these differences, real-world evidence can also be used retrospectively to form external control arms for RCTs, providing comparative efficacy data [13]. This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.
Table 1
Comparison of randomized controlled trials and real-world studies
| | RCTs | Real-world studies: observational | Real-world studies: pragmatic |
| --- | --- | --- | --- |
| Type of study | Experimental/interventional | Observational/non-interventional | Interventional/pragmatic |
| Design | Prospective | Retrospective/prospective | Prospective |
| Primary focus | Efficacy, safety, quality, cost-effectiveness | Efficacy, safety, quality, cost-effectiveness, natural history, compliance and adherence, service models, patient preferences, comparative | Efficacy, safety, quality, cost-effectiveness, natural history, compliance and adherence, service models, patient preferences, comparative |
| Patient population | Narrow, restricted, motivated | Diverse, large, and unrestricted | Diverse, large, and unrestricted |
| Monitoring | Intense (ICH-GCP compliant) | Not required (?) | Reflects usual care |
| Comparators | Gold standard/placebo | None/standard clinical practice/multiple iterations | Standard practice/placebo/multiple iterations |
| Outcomes | Clear sequence | Wide range | Wide range |
| Data collection confounders | Standardized, controlled | Routine, recruitment bias (?), recall/interviewer bias | Routine, recruitment bias (?), recall/interviewer bias |
| Randomization | Yes | No | Yes |
| Blinding | Yes | No | Sometimes (participants or outcome assessment) |
| Follow-up | Generally short | Reflects usual care | Long |
Large “pragmatic trials” are an increasingly common source of real-world data. Such trials are designed to show the real-world effectiveness of an intervention in a broad patient group [14]. They incorporate a prospective, randomized design and collect data on a wide range of health outcomes in a diverse and heterogeneous population (i.e., one consistent with clinical practice) [15–17]. Pragmatic trials are conducted in routine practice settings [1], include a population that is relevant for the intervention and a control group treated with an acceptable standard of care (or placebo), and describe outcomes that are meaningful to the population in question [14]. Aspects of care other than the intervention being studied are intentionally not controlled, with clinicians applying clinical discretion in their choice of other medications [11]. Pragmatic trials may focus on a specific type of patient or treatment, and study coordinators may select patients, clinicians, and clinical practices and settings that will maximize external validity (i.e., the applicability of the results to usual practice) [16]. As such, pragmatic trials are able to provide data on a range of clinically relevant real-world considerations, including different treatments, patient- and clinician-friendly titration and treatment algorithms, and cost-effectiveness, which in turn may help address practice- and policy-relevant issues. These studies can focus specifically on the outcomes that matter most to patients, and take into account the effect of real-world treatment adherence and compliance on the direct impact of a medication or treatment regimen for patients.
Understanding the Strengths and Weaknesses of Real-World Studies
Compared with RCT data, real-world evidence has the potential to more efficiently provide answers that inform outcomes research, quality improvement, pharmacovigilance, and patient care [2]. As they are performed in clinical settings and patient populations similar to those encountered in clinical practice, real-world studies have broader generalizability. Specifically, RCTs provide evidence of efficacy, while real-world studies give evidence of effectiveness in real-world practice settings [1]. Additionally, observational, retrospective real-world studies are generally more economical and time efficient than RCTs [18] because they use existing data sources such as registries, claims data, and EHRs to identify study outcomes [16].
Key to the utility of real-world studies is their ability to complement data from RCTs and thereby fill current gaps in clinical knowledge. Specific trial criteria may cause RCTs to exclude groups of patients commonly seen in clinical practice; for example, RCTs frequently exclude older adults. In the case of diabetes, while many RCTs focus primarily on the safety and glucose-lowering efficacy of antihyperglycemia drugs [19], it is desirable to have real-world effectiveness outcomes data in patients with type 2 diabetes (T2D) that take into account issues such as adherence [20, 21] and the frequency of side effects in less controlled settings (which may affect outcomes). Such studies suggest that the difference between glycated hemoglobin reduction in RCTs and in practice may be related to adherence, and they point to the potential value of real-world studies assessing clinical-practice effectiveness. In addition, real-world evidence can address important issues such as the impact of treatment on microvascular disease and cardiovascular (CV) events [22] and enable the examination of outcomes that are difficult to assess in RCTs, such as the utilization of healthcare resources by patients receiving different therapies. In the DELIVER-3 study, for example, insulin glargine 300 U/ml (Gla-300) was associated with reduced resource utilization compared with other basal insulins [23]. An example that demonstrates the utility of pragmatic trial design is the exploration of patient-driven insulin titration protocols, which highlights the practical needs that patients face in everyday life rather than reflecting the needs of a highly controlled, well-motivated RCT population [24–26].
Real-world studies have a number of limitations. Retrospective and non-randomized real-world studies are subject to bias and confounding factors, problems that are controlled for in randomized blinded trials [27]. Electronic data may be inconsistently collected, with missing data elements that can ultimately reduce statistical validity and the ability to answer the research question [16]. The types of bias seen in real-world trials include selection bias (e.g., therapies may be prescribed differently depending on disease severity and/or other patient characteristics), information bias (misclassification of data), recall bias (caused by selective recall of impactful events by patients/caregivers), and detection bias (where an event is more likely to be captured in one treatment group than another) [28]. While systematic reviews have found little evidence to suggest that treatment effects or adverse events in well-designed observational studies are either overestimated or qualitatively different from those obtained in RCTs, each real-world study must be examined individually for sources of bias and confounding [29–31]. Indeed, because of confounding and bias, caution should be exercised when using data from real-world studies (particularly retrospective studies) to influence change in clinical practice [18]. Techniques such as propensity score matching (PSM) can be used to reduce selection bias by matching the characteristics of patients entering different arms of studies (see below) [32].
Properly designed, prospective, interventional pragmatic trials have the potential to overcome many of the limitations of observational and retrospective real-world studies. However, the main limitation of pragmatic trials is that they often place few constraints on patients and clinicians, which may result in inconsistent or missing data in source documents such as EHRs. This, together with heterogeneity in clinical practice and the associated documentation, may reduce the capability of the study to answer the research question [16]. In addition, heterogeneity of clinical practice and patient populations reduces the translatability of pragmatic trial data to different settings and locations [33]. There are also numerous challenges inherent in pragmatic trial design, illustrated by the trade-off between blinding of results to reduce bias and the desire to create a fully pragmatic design in which the intervention is delivered as in normal practice [14]. Pragmatic trials, in producing evidence of effectiveness in real-world practice settings, may trade aspects of internal validity for higher external validity, which ultimately makes them more generalizable than RCTs [1].
Real-World Studies: Addressing Generalizability
RCT exclusion criteria may rule out a significant proportion of real-world patients. As previously mentioned, patients excluded from RCTs tend to be older, have more medical comorbidities, and have more challenging social and demographic issues than those included in these trials. Real-world studies have the potential to assess whether results seen in RCTs generalize to real-world patient populations. The EMPA-REG OUTCOME RCT selected patients with T2D and established cardiovascular disease (CVD) and, for those treated with the sodium-glucose co-transporter-2 (SGLT2) inhibitor empagliflozin vs placebo, reported a significant reduction in the primary composite endpoint of a three-point major adverse cardiac event (MACE) (CV death, non-fatal myocardial infarction, and non-fatal stroke), as well as in the individual endpoints of CV death, all-cause death, and hospitalization for heart failure [51]. The CANVAS RCT investigating the SGLT2 inhibitor canagliflozin, which included a lower percentage of patients at high CV risk than EMPA-REG, also reported a significant reduction in the primary composite three-point MACE endpoint and the individual endpoint of hospitalization for heart failure, but did not show a significant benefit for CV mortality or all-cause mortality alone [52]. Evidence from a further real-world study may support and expand upon these RCT data: the CVD-REAL study in over 300,000 patients with T2D, both with (13% of the total) and without established CVD, showed a consistent reduction in hospitalization for heart failure, suggesting a real-world benefit of the SGLT2 inhibitor drug class as a whole in patients with T2D, irrespective of existing CV risk status or the SGLT2 inhibitor used [53].
Improving Quality of Evidence Generated from Real-World Studies
Criteria for the design of observational studies have been developed and, if followed, should result in higher-quality studies (Table 2) [28]. The STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) guidelines provide a reporting standard for observational studies [54]. An extension of the CONSORT guideline for RCTs provides specific guidance for pragmatic trials, including a reporting checklist that covers background, participants, interventions, outcomes, sample size, blinding, participant flow, and generalizability of findings [55]. Adherence to such criteria should improve not only the quality but also the validity of real-world study data in clinical practice.
Table 2
Quality criteria for comparative observational database studies
| Section | Quality criteria |
| --- | --- |
| Background | Clear underlying hypotheses and specific research question(s) |
| Methods | |
| Study design | Observational comparative effectiveness database study; independent steering committee involved in a priori definition of the study methodology (including statistical analysis plan), review of analyses, and interpretation of results; registration in a public repository with a commitment to publish results |
| Database(s) | High-quality database(s) with few missing data for measures of interest; validation studies |
| Outcomes | Clearly defined primary and secondary outcomes, chosen a priori; the use of proxy and composite measures justified and explained; the validity of proxy measures checked |
| Length of observation | Sufficient duration to reliably assess outcomes of interest and long-term treatment effects |
| Patients | Well-described inclusion and exclusion criteria, reflecting target patients’ characteristics in the real world |
| Analyses | Study groups compared at baseline using univariate analyses; avoidance of biases related to baseline differences using matching and/or adjustments; sensitivity analyses performed to check the robustness of results |
| Sample size | Sample size calculations based on clear a priori hypotheses regarding the occurrence of outcomes of interest and target effect of studied treatment versus comparator |
| Results | Flow chart explaining all exclusions; detailed description of patients’ characteristics, including demographics, characteristics of the disease of interest, comorbidities, and concomitant treatments; characteristics of patients lost to follow-up compared with those of patients remaining in the analyses; extensive presentation of results obtained in unmatched and matched populations (if matching was performed) using univariate and multivariate, as well as unadjusted and adjusted, analyses; sensitivity analyses and/or analyses of several databases go in the same direction as primary analyses |
| Discussion | Summary and interpretation of findings, focusing first on whether they confirm or contradict a priori hypotheses; discussion of differences with results of efficacy RCTs; discussion of possible biases and confounding factors, especially related to the observational nature of the study; suggestions for future research to challenge, strengthen, or extend study results |
A number of methods have been developed to reduce the effects of confounding in observational studies, including PSM. This method aims to make it possible to compare the outcomes of two treatment or management options in similar patients [32]. It does this by reducing the effects of multiple covariates to a single score, the propensity score. Comparing outcomes across treatment groups of pairs or pools of propensity-score-matched patients can reduce issues such as selection bias [32]. Although PSM is a powerful and widely used tool, there are limits to the degree to which propensity score adjustments can control for bias and confounding variables. An example can be seen in RCT versus real-world data for mortality in patients with severe heart failure treated with the aldosterone antagonist spironolactone [56]. While RCT data consistently showed a reduction in mortality, in a real-world study using PSM, spironolactone appeared to be associated with a substantially increased risk of death [57]. The authors of that study argue that concluding that spironolactone is dangerous on the basis of the real-world study is not legitimate because of unknown bias and confounding by indication (i.e., confounding due to factors not in the propensity score, or not formally measured at all) [57]. This illustrates a major limitation of PSM: it can only include variables that are present in the available data [58]. A further major limitation is that the need to group or pair data in PSM narrows the patient population analyzed, limiting generalizability and thereby reducing one of the main values of real-world studies.
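To make the mechanics described above concrete, the following is a minimal sketch of 1:1 propensity score matching on simulated data. The cohort, covariates, and coefficients are entirely hypothetical (chosen only so that treatment assignment is confounded by age and HbA1c); this illustrates the general technique, not the method of any study cited here.

```python
# Minimal propensity-score-matching sketch on simulated (hypothetical) data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Simulated cohort: age and HbA1c influence treatment choice, so a naive
# comparison of treated vs untreated patients would be confounded.
n = 2000
age = rng.normal(60, 10, n)
hba1c = rng.normal(8.0, 1.0, n)
X = np.column_stack([age, hba1c])
treated = rng.random(n) < 1 / (1 + np.exp(-(0.05 * (age - 60) + 0.5 * (hba1c - 8))))

# Step 1: estimate the propensity score, P(treatment | covariates),
# reducing the covariates to a single score per patient.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: 1:1 nearest-neighbour matching on the propensity score
# (each treated patient is paired with the closest-scoring control).
nn = NearestNeighbors(n_neighbors=1).fit(ps[~treated].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
controls = np.where(~treated)[0][idx.ravel()]

# Step 3: check covariate balance -- after matching, treated and matched
# control groups should be more similar on the covariates.
balance_before = age[treated].mean() - age[~treated].mean()
balance_after = age[treated].mean() - age[controls].mean()
print(f"age imbalance before matching: {balance_before:.2f}")
print(f"age imbalance after matching:  {balance_after:.2f}")
```

Note that the score can only balance the covariates it is built from: an unmeasured confounder (the spironolactone example above) remains unadjusted no matter how well the measured covariates are matched.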
“Big data” have emerged as a cutting-edge discipline that uses the capture of data from EHRs and other high-volume data sources to efficiently generate hypotheses about the relationship between processes and outcomes. This demands an increased emphasis on the integrity of the data, with “high-quality” data defined in terms of their accuracy, availability and usability, integrity, consistency, standardization, generalizability, and timeliness [59, 60]. Missing data may represent a significant challenge in some datasets. For example, the US healthcare system (unlike many European countries) relies on a number of different laboratory companies to supply laboratory results data, which may result in inconsistencies in the recording of results in EHRs. The technical and methodological challenges presented by these new data sources are an active area of endeavor, with key stakeholders moving towards harmonization of data collected from high-volume sources, with the aim of creating a unified monitoring system and implementing methods for incorporating such data into research [2]. Artificial intelligence (AI) is the natural partner of big data, and the increased availability of these data sources is already allowing AI to improve clinical decision-making. AI techniques have used raw data gleaned from radiographical images, genetic testing, electrophysiological studies, and EHRs to improve diagnoses [6].
As a final caveat, with the increasing availability of real-world data, there may be some discrepancies in information derived from different sources. As with all data, be it from RCTs or real-world practice, consideration should be given to the limitations and generalizability of results when interpreting individual study outcomes and applying them to everyday clinical practice.
Conclusions
Real-world studies provide important information that can complement and even expand upon the information obtained in RCTs. RCTs set the standard for eliminating bias in determining the efficacy and safety of medications, but have significant limitations with regard to generalizability to the broad population of patients with diabetes receiving healthcare in diverse clinical practice settings. Because real-world studies are performed in actual clinical practice settings, they are better able to assess the effectiveness and safety of medications as they are used in real life by patients and clinicians. With improving study designs, methodological advances, and data sources with more comprehensive data elements, the potential of real-world evidence continues to expand. Moreover, the limitations of real-world studies are better understood and can be better addressed. Real-world evidence can both generate hypotheses requiring further investigation in RCTs and provide answers to research questions that may be impractical to address through RCTs.
Acknowledgements
KK acknowledges support from the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care-East Midlands (CLAHRC-EM) and the NIHR Leicester Biomedical Research Centre.
Open Access: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.