Concordance of real-world versus conventional progression-free survival from a phase 3 trial of endocrine therapy as first-line treatment for metastatic breast cancer

Cynthia Huang Bartlett; Jack Mardekian; Matthew James Cotter; Xin Huang; Zhe Zhang; Christina M. Parrinello; Ariel Bulua Bourla

doi:10.1371/journal.pone.0227256

Abstract

There is growing interest in leveraging real-world data to complement knowledge gained from randomized clinical trials and inform the design of prospective randomized studies in oncology. The present study compared clinical outcomes in women with metastatic breast cancer who received letrozole as first-line monotherapy in oncology practices across the United States versus patients in the letrozole-alone cohort of the PALOMA-2 phase 3 trial. The real-world cohort (N = 107) was derived from de-identified patient data from the Flatiron Health electronic health record database. The clinical trial cohort (N = 222) comprised postmenopausal women in the letrozole-alone arm of PALOMA-2. Patients in the real-world cohort received letrozole monotherapy per labeling and clinical judgment; patients in PALOMA-2 received letrozole 2.5 mg/d, continuous. Real-world survival and response rates were based on evidence of disease burden curated from clinician notes, radiologic reports, and pathology reports available in the electronic health record. Progression-free survival and objective response rate in PALOMA-2 were based on Response Evaluation Criteria in Solid Tumors v1.1. Concordance of survival and response rates was retrospectively assessed using inverse probability of treatment weighting-adjusted Cox regression analysis. Inverse probability of treatment weighting-adjusted Cox regression results showed similar median progression-free survival in the real-world and PALOMA-2 cohorts (18.4 and 16.6 months, respectively): the hazard ratio using real-world data as reference was 1.04 (95% CI, 0.69–1.56). No significant difference was observed in response rates: 41.8% in the real-world cohort vs 39.4% in the PALOMA-2 cohort (odds ratio using real-world data as reference: 0.91 [95% CI, 0.57–1.44]). These findings indicate that data abstracted from electronic health records with proper quality controls can yield meaningful information on clinical outcomes. These data increase confidence in the use of real-world assessments of progression and response as efficacy endpoints.

Trial registration: NCT01740427. Funding: Pfizer.

Citation: Huang Bartlett C, Mardekian J, Cotter MJ, Huang X, Zhang Z, Parrinello CM, et al. (2020) Concordance of real-world versus conventional progression-free survival from a phase 3 trial of endocrine therapy as first-line treatment for metastatic breast cancer. PLoS ONE 15(4): e0227256. https://doi.org/10.1371/journal.pone.0227256

Editor: Apar Kishor Ganti, University of Nebraska Medical Center, UNITED STATES

Received: January 31, 2019; Accepted: December 15, 2019; Published: April 21, 2020

Copyright: © 2020 Huang Bartlett et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data on the real-world cohort that support the findings of this study have been originated by Flatiron Health, Inc and were purchased by Pfizer from Flatiron Health Inc for the purpose of this research. Access to the de-identified data set is subject to a contractual agreement with Flatiron Health; for data access, interested researchers should contact DataAccess@flatiron.com. A licensing agreement is legally required prior to sharing these data in order to safeguard sensitive patient information, and to ensure proper deidentification and compliance with applicable restrictions and requirements under HIPAA.

Funding: These analyses and the studies included in these analyses (NCT01740427) were sponsored by Pfizer Inc. The real-world data used are derived from the Flatiron Health Analytic electronic health record database as reported in the manuscript and were purchased by Pfizer from Flatiron Health Inc., an independent subsidiary of the Roche Group. Editorial and medical writing support was provided by Catherine Grillo, of Complete Healthcare Communications, LLC (North Wales, PA), a CHC Group company, and was funded by Pfizer. The authors also wish to thank Amy P. Abernethy, MD, PhD, for her work on developing this manuscript while at Flatiron Health Inc.; Dr. Abernethy’s participation in the development of this manuscript occurred prior to her appointment as Principal Deputy Commissioner of the U.S. Food and Drug Administration. Cynthia Huang Bartlett is a former employee of Pfizer Inc. Jack Mardekian, Matthew James Cotter, Xin Huang and Zhe Zhang are employed by Pfizer Inc. Pfizer Inc provided support in the form of salaries for authors CHB, JM, MJC, XH and ZZ, study design, data collection, and data analysis. The interpretation of the data, the content of the manuscript, and the decision to publish were at the discretion of the authors. Christina M. Parrinello and Ariel Bulua Bourla are employed by Flatiron Health. Flatiron Health provided support in the form of study design, data collection and management, and salaries for authors CMP and ABB, but did not have any additional role in the data analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal's policy and wish to report the following conflicts: Cynthia Huang Bartlett is a former employee of and owns stock in Pfizer Inc. Jack Mardekian, Matthew James Cotter, Xin Huang, and Zhe Zhang are employees of and own stock in Pfizer Inc. Christina M. Parrinello and Ariel Bulua Bourla are employees of Flatiron Health, Inc., which is an independent subsidiary of the Roche Group, and which received funding from Pfizer for the conduct of this study; they also report equity ownership in Flatiron Health Inc. All authors affirm that these competing interests do not alter our adherence to PLOS ONE policies on sharing data and materials.

Introduction

Real-world evidence is playing an increasingly important role in regulatory decision-making, drug development, and clinical practice. [1–4] Because fewer than 5% of cancer patients participate in randomized clinical trials, [5] real-world evidence can provide valuable information on disease course and treatment outcomes of patients receiving care in front-line routine clinical settings, as well as insights on the generalizability of clinical trial findings to real-world patient populations. [3, 4, 6]

Real-world evidence is generated from real-world data documented during the course of routine clinical care. [2, 6–8] Real-world data can be derived from a range of sources, including electronic health records (EHRs), patient/disease registries, mobile devices and applications, genomic datasets, and medical/pharmacy claims databases. [2–4, 7, 8] Although these resources contain a wealth of information, they are designed to support clinical care and practice management, not clinical research. [4] Unlike randomized clinical trials (RCTs), which limit variability and ensure the quality of data collected through strict protocols and standardized methods such as case report forms, real-world datasets are typically disorganized and unstructured, requiring complex curation in order to be useful for research analyses. The quality and consistency of data in real-world sources, such as EHRs, can vary widely depending on the data curation processes used as well as on clinician-, practice-, and patient-related factors. These discrepancies can make it difficult to compare data collected in real-world settings with those from controlled clinical trials. [2, 8]

The most common outcome variables in cancer research are overall survival and assessments of tumor burden such as tumor response rate or progression-free survival (PFS). [9, 10] In traditional RCTs, clinical response or disease progression is determined based on quantitative assessments of target lesions using a predefined scale (eg, Response Evaluation Criteria in Solid Tumors [RECIST]), applied at predefined time points (eg, every 6 to 8 wk), using prespecified imaging modalities (eg, computed tomography [CT] or magnetic resonance imaging [MRI]). [9] In clinical practice, assessments of tumor response and progression are based on clinician interpretations of imaging reports and symptomatic criteria (Table 1). [9]

Table 1. Outcomes assessments and endpoints.

https://doi.org/10.1371/journal.pone.0227256.t001

To evaluate the relationship between real-world and clinical trial outcomes in oncology, it is critical to assess the comparability of the data derived in each of these settings while minimizing the effect of confounding due to differences in prognostically important variables. [11, 12] The primary objective of this study was to compare PFS and response rates generated using real-world data reflecting routine clinical care with outcomes observed in a traditional RCT. To achieve this goal, we analyzed data from a curated EHR-derived real-world dataset to compare outcomes in a cohort of women with hormone-receptor positive (HR+), human epidermal growth factor receptor 2-negative (HER2–) metastatic breast cancer (mBC) who received first-line letrozole therapy in a real-world setting with those in the control arm of the phase 3 PALOMA-2 trial. [13] An inverse probability of treatment weighting (IPTW) approach was used to account for potential baseline differences in the real-world and PALOMA-2 cohorts, which allowed retrospective evaluation of the comparability of the real-world and traditional RECIST-based clinical trial endpoints in 2 similar cohorts. [14–16]

Design and methods

Study design and patients

The real-world cohort was drawn from de-identified patient data from the Flatiron Health database, a longitudinal, demographically and geographically diverse database derived from EHR data. [17] At the time of this study, the overall database encompassed more than 2000 clinicians at approximately 775 sites of care across the United States (US), representing 1.7 million patients with active cancer. Data for this study were derived from a curated subsample of patients with confirmed mBC.

The Flatiron real-world dataset is covered under the Health Insurance Portability and Accountability Act of 1996 (HIPAA) through Business Associate Agreements with every provider in the Flatiron network. These agreements authorize Flatiron to collect and de-identify patient-level structured and unstructured data to create de-identified data sets for research purposes. Processed data are de-identified according to either the Safe Harbor or Expert Determination method as outlined in HIPAA Section 164.514(b). When using the Expert Determination method, Flatiron employs a third-party expert to design the de-identification methodology and certify that the dataset is de-identified. Only de-identified data is delivered to clients. Institutional Review Board (IRB) approval was obtained for this study; informed consent was waived by the IRB as the study was retrospective and non-interventional, using routinely collected data. Details on the IRB are available in S1 Appendix.

Data were derived from a random sampling (with attrition at each step, Fig 1) of women diagnosed with mBC between January 1, 2011, and September 30, 2015 (inclusive), regardless of menopausal status. Data provided were de-identified and provisions were in place to prevent re-identification in order to protect patients’ confidentiality. Eligibility criteria aligned with those of the PALOMA-2 trial and included documented HR+ (estrogen receptor–positive [ER+] or progesterone receptor–positive) and HER2– disease at any point before or ≤60 days following mBC diagnosis, an Eastern Cooperative Oncology Group performance status (ECOG PS) score <3 within 30 days of mBC diagnosis, and initiation of letrozole monotherapy in the first-line metastatic setting before October 1, 2015. Patients who had received previous treatment with a cyclin-dependent kinase 4/6 inhibitor or who had another primary cancer diagnosis ≤3 years before initiation of letrozole monotherapy were excluded.

Fig 1. Design and CONSORT diagram.

ECOG PS = Eastern Cooperative Oncology Group performance status; ER = estrogen receptor; HER2– = human epidermal growth factor receptor 2-negative; ICD = International Classification of Diseases.

https://doi.org/10.1371/journal.pone.0227256.g001

The RCT cohort comprised women from the control arm of the double-blind, randomized, placebo-controlled, international, multicenter, phase 3 PALOMA-2 study (NCT01740427) (Fig 1). The study was approved by an IRB or equivalent ethics committee at each participating site, and all patients provided written informed consent before enrollment. Details on participating IRBs/ethics committees are available in S1 Appendix. The study was conducted in accordance with the International Conference on Harmonisation Good Clinical Practice guidelines and the provisions of the Declaration of Helsinki.

Eligible patients were postmenopausal women aged ≥18 years with ER+/HER2– advanced breast cancer who had not received previous endocrine therapy for advanced disease. Inclusion criteria included postmenopausal status defined as previous bilateral surgical oophorectomy, spontaneous cessation of regular menses for 12 consecutive months, or follicle-stimulating hormone and estradiol blood levels in the respective postmenopausal ranges; adequate organ function; Eastern Cooperative Oncology Group performance status of 0–2; and measurable disease as defined per Response Evaluation Criteria in Solid Tumors (version 1.1). Exclusion criteria included HER2+ tumors; advanced, symptomatic, visceral spread at risk of life-threatening complications; previous neoadjuvant or adjuvant treatment with a nonsteroidal aromatase inhibitor with disease recurrence while on or within 12 months of completing treatment; and previous cyclin-dependent kinase 4/6 inhibitor treatment. [13]

Treatment

In the real-world cohort, patients were treated with letrozole monotherapy per approved labeling and treating physicians’ clinical judgment. In the RCT cohort, women received letrozole (2.5 mg once daily, administered orally) plus placebo per the PALOMA-2 study protocol. [13]

Endpoints and assessment

For the real-world cohort, tumor burden was assessed during routine clinical visits for patients with HR+/HER2– mBC. [9, 18] Tumor burden assessments were at the discretion of the treating physician, and formalized RECIST methodology was not generally employed. Structured and unstructured patient-level data were extracted from the EHR using Flatiron Health’s proprietary technology-enabled abstraction platform, an electronic interface mimicking a case report form with centralized management and quality controls. This layer of technology facilitates document classification and visual organization, text search within documents, and selective presentation of relevant documents to trained data abstractors (clinical oncology nurses and tumor registrars). Structured data such as diagnoses, lab values, and medication administrations were mapped to a common terminology, and unstructured data (eg, physician notes, lab/radiology reports) underwent manual review. [19] All abstractors received training in the use of the platform as well as indication-specific training (operating procedures, best practice guidelines) prior to beginning the abstraction process.

Curated progression events were designated “real-world progression” (rwP). The approach to rwP anchors on clinician-documented cancer progression based on an interpretation of the entire patient chart, including results of diagnostic procedures and tests (eg, radiology and pathology reports). [19] The date of cancer progression was defined as the date of the first source evidence for progression referenced by the clinician (eg, radiology report date) or the date of clinician note when no other corresponding evidence sources were documented. A parallel construct reflecting real-world progression-free survival (rwPFS) was calculated, measuring from the start of first-line letrozole therapy through the end of first-line therapy for patients receiving only first-line therapy and to the start of second-line therapy for all other patients. Patients without disease progression or death were censored at the end of first-line therapy (patients who only received first-line therapy) or at the start of second-line therapy (all others). The approach to real-world response is based on clinician-documented assessments of radiologic change in burden of disease over the course of treatment with a given therapy. Real-world response rate (rwRR) was calculated as the percentage of patients in the cohort with a maximum clinician-assessed therapeutic response of complete response (rwCR) or partial response (rwPR) (Table 2).
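The rwPFS rules described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' implementation (the study analyses were run in SAS): the `PatientRecord` fields, the `rw_pfs` function name, and the 30.4375 days-per-month conversion are all assumptions introduced here, and only events dated on or before the end of the observation window are counted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumed conversion factor, not stated in the paper


@dataclass
class PatientRecord:
    """Hypothetical per-patient fields abstracted from the chart."""
    first_line_start: date
    first_line_end: date
    second_line_start: Optional[date] = None
    progression_date: Optional[date] = None
    death_date: Optional[date] = None


def rw_pfs(rec: PatientRecord) -> Tuple[float, bool]:
    """Return (time in months, event observed) for one patient."""
    # Observation ends at the start of second-line therapy when there is one,
    # otherwise at the end of first-line therapy.
    window_end = rec.second_line_start or rec.first_line_end
    # Progression or death inside the window counts as an event.
    candidates = [
        d for d in (rec.progression_date, rec.death_date)
        if d is not None and d <= window_end
    ]
    if candidates:
        end, event = min(candidates), True
    else:
        end, event = window_end, False  # censored
    months = (end - rec.first_line_start).days / DAYS_PER_MONTH
    return months, event
```

For example, a patient who started letrozole on January 1, 2014, and progressed on July 1, 2014, contributes an event at roughly 5.9 months, whereas the same patient without documented progression is censored at the end of first-line therapy.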

Table 2. Real-world response categories vs RECIST v1.1 [9].

https://doi.org/10.1371/journal.pone.0227256.t002

For the PALOMA-2 cohort, tumor assessment (CT with contrast or MRI) was conducted every 12 weeks (±7 days) for patients with measurable disease; patients with bone-only disease received bone scans every 6 months. [13] Imaging and bone scans were performed until objective disease progression, initiation of a new anticancer therapy, or withdrawal from the study, whichever came first. [13] PFS and ORR were measured per RECIST version 1.1 (Table 2). PFS was defined as the time from the date of randomization to the date of radiologically confirmed disease progression or death due to any cause, whichever occurred first, and was calculated using an approach similar to that described for rwPFS; ORR was estimated by dividing the number of patients with confirmed CR or PR by the number of patients randomized to letrozole plus placebo with measurable disease at baseline. [13]

Of note, in PALOMA-2 all deaths that occurred through 28 days after the end of first-line therapy were included as progression events. In the real-world cohort, however, death dates were reported only by month and year. To align progression definitions, patients in the real-world cohort who died in the same month or within one month of the stop date of first-line therapy were included as progression events.
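As a rough illustration of this alignment rule, the month-level comparison might look like the sketch below. The function names are hypothetical, and reading "within one month" as "in the calendar month immediately after the stop month" is an assumption; the paper does not spell out the exact rule.

```python
from datetime import date


def month_index(year: int, month: int) -> int:
    """Map a calendar month to a running integer so months can be subtracted."""
    return year * 12 + (month - 1)


def death_counts_as_event(death_year: int, death_month: int,
                          first_line_stop: date) -> bool:
    """True if a month-precision death falls in the stop month or the next one.

    Deaths recorded in an earlier month than the stop date are outside the
    scope of this helper and return False.
    """
    gap = month_index(death_year, death_month) - month_index(
        first_line_stop.year, first_line_stop.month
    )
    return 0 <= gap <= 1
```

Under these assumptions, a patient who stopped therapy on December 10, 2014, and died in January 2015 would be counted as a progression event, while a death recorded in February 2015 would not.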

Analysis and statistical methods

Inverse probability of treatment weighting was used to adjust analyses for differences in observed potential confounders between the 2 study cohorts. [16, 20, 21] IPTW reweights each patient so that the weighted cohorts resemble one another on baseline characteristics; as a result, the weighted patient counts differ from the raw counts.

Propensity scores were generated using a multivariable logistic model executed on data from 107 real-world patients and 222 PALOMA-2 patients. Study origin (real-world or PALOMA-2) was used as an outcome and potential baseline confounders were included as covariates, having been selected based on the authors’ clinical judgment. Covariates included were age, race, disease stage at diagnosis (I–IV or unrecorded/unknown), ECOG PS score, number of disease sites at diagnosis (1, 2, ≥3), and bone-only metastases. In order to balance the 2 study cohorts for duration of follow-up, the propensity score model also included potential follow-up, a baseline measure defined as the number of months from a patient’s start of treatment date to the study cutoff: September 30, 2016 for the real-world cohort and February 26, 2016 for PALOMA-2.

Inverse probability of treatment weights were then generated for each patient by inverting the patient's propensity score. To reduce the influence of large weights (ie, small propensity scores), the weights were stabilized by multiplying the inverted propensity score by 107/329 for Flatiron patients and by 222/329 for PALOMA-2 patients. The balance in prognostically important baseline characteristics was assessed using a standardized differences approach, with values ≥0.10 indicating a non-negligible imbalance.
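The stabilization step can be written out as a short sketch. It assumes that the score being inverted is the modeled probability of belonging to the patient's own cohort (the usual stabilized-weight convention); `stabilized_weight` and `p_real_world` are hypothetical names, and the fitted logistic model itself is not reproduced here.

```python
N_RW, N_RCT = 107, 222      # cohort sizes reported in the paper
N_TOTAL = N_RW + N_RCT      # 329


def stabilized_weight(p_real_world: float, in_real_world: bool) -> float:
    """Stabilized inverse-probability weight for one patient.

    p_real_world is the modeled probability that the patient belongs to the
    real-world cohort given baseline covariates (an assumed input).
    """
    if not 0.0 < p_real_world < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if in_real_world:
        # Marginal share of real-world patients divided by the propensity score.
        return (N_RW / N_TOTAL) / p_real_world
    # PALOMA-2 patients use the complementary probability.
    return (N_RCT / N_TOTAL) / (1.0 - p_real_world)
```

A patient whose propensity score equals the marginal share of their cohort receives a weight of 1. Summing the weights within a cohort gives its weighted size, which is why weighted patient counts generally differ from the raw 107 and 222.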

The duration of first-line letrozole therapy was abstracted using Flatiron business rules applied to patient EHRs for the real-world cohort. For the PALOMA-2 cohort, the duration of treatment was obtained from information recorded in the data collection tool used in the study.

The Kaplan-Meier method was used to estimate median rwPFS and RECIST-based PFS for the real-world and PALOMA-2 cohorts, respectively. Hazard ratios and 95% confidence intervals (CIs) were computed using weighted Cox proportional hazards analysis with IPTW adjustment. A 2-sided p < 0.05 was considered significant. All statistical analyses were performed using SAS v.9.4.
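To make the weighted Kaplan-Meier estimate concrete, the following is a minimal pure-Python sketch of a weighted product-limit median. It is not the SAS procedure used in the study; the function name and the convention of reporting the first time at which the survival estimate falls to 0.5 or below are assumptions made for illustration.

```python
from typing import Optional, Sequence


def weighted_km_median(times: Sequence[float],
                       events: Sequence[bool],
                       weights: Sequence[float]) -> Optional[float]:
    """Median survival time from a weighted Kaplan-Meier curve.

    Returns None when the curve never reaches 0.5 (median not estimable).
    """
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))   # weighted number still under observation
    surv = 1.0
    i, n = 0, len(data)
    while i < n:
        t = data[i][0]
        d = 0.0        # weighted events at time t
        leaving = 0.0  # weighted subjects leaving the risk set at time t
        while i < n and data[i][0] == t:
            _, event, w = data[i]
            if event:
                d += w
            leaving += w
            i += 1
        if d > 0:
            surv *= 1.0 - d / at_risk   # product-limit step
            if surv <= 0.5:
                return t
        at_risk -= leaving
    return None
```

With all weights equal to 1 this reduces to the ordinary Kaplan-Meier estimator; supplying IPTW weights shifts each patient's contribution to the risk set and to the event count.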

Results

Patient population

Between January 1, 2011, and September 30, 2015 (data cutoff, September 30, 2016), 107 women initiated letrozole monotherapy and met the eligibility criteria for inclusion in the unadjusted real-world cohort (Fig 1). In PALOMA-2, 222 patients were randomized to treatment with letrozole plus placebo between February 2013 and July 2014 (cutoff date for final analysis, February 26, 2016) and were included in the unadjusted RCT cohort (Table 3). The number of patients in each cohort was modified by IPTW according to differences in unweighted baseline characteristics. Rounding to the nearest whole number, the IPTW-adjusted number was 116 for the real-world cohort and 207 for the RCT cohort (Table 4).

Table 3. Baseline demographic and clinical characteristics (Unweighted).

https://doi.org/10.1371/journal.pone.0227256.t003

Table 4. Demographic and clinical characteristics of interest, before and after IPTW adjustment.

https://doi.org/10.1371/journal.pone.0227256.t004

Unweighted, unadjusted demographic and clinical characteristics of the 2 cohorts were broadly comparable, although patients in the real-world cohort were older (mean age 68.6 vs 60.6 y in PALOMA-2), more racially diverse, had poorer performance status (12.1% vs 1.4% with ECOG PS 2), and were more likely to have stage IV disease (39.3% vs 32.4%) and bone-only metastases (29.9% vs 21.6%) at diagnosis (Table 3). Of note, two patients with confirmed HER2− disease prior to their metastatic diagnosis had equivocal results when tested closer to the metastatic diagnosis date. In both cases the most recent result was documented.

Data abstractors were instructed to record menopausal status only when it was explicitly stated in the patient’s chart. As a result, more than one-third of patients were classified as “unknown.” As all but 1 of these patients (a 54-year-old) were over the age of 60, these patients were retained in the real-world dataset (Table 3). Five patients classified as “premenopausal” were also retained. Because letrozole is specifically contraindicated in women of premenopausal status, it could reasonably be inferred that these patients met the criteria for medically confirmed postmenopausal status or were in medically induced menopause as a result of ovarian suppression per current treatment guidelines and standard practice.

After IPTW, standardized differences were reduced for all baseline demographic and clinical variables of interest. Standardized differences were <0.10 for prognostically important variables including age, ECOG PS, disease stage III or IV, bone-only metastases, and potential follow-up (Table 4). Standardized differences for disease stage I and II were <0.20 (Table 4).
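For a binary characteristic, the standardized difference behind the 0.10 threshold can be computed from the two cohort proportions alone. The sketch below uses the common pooled-variance formula, which is an assumption (the paper does not state which formula was used), and applies it to the unweighted bone-only proportions quoted earlier (29.9% vs 21.6%); the result need not match the value reported in Table 4.

```python
import math


def standardized_difference(p_a: float, p_b: float) -> float:
    """Absolute standardized difference between two proportions."""
    pooled_var = (p_a * (1.0 - p_a) + p_b * (1.0 - p_b)) / 2.0
    if pooled_var == 0.0:
        raise ValueError("standardized difference is undefined when both proportions are 0 or 1")
    return abs(p_a - p_b) / math.sqrt(pooled_var)


# Unweighted bone-only metastases: real-world 29.9% vs PALOMA-2 21.6%.
bone_only = standardized_difference(0.299, 0.216)
print(f"bone-only standardized difference: {bone_only:.2f}")
```

Under this formula the unweighted value is about 0.19, which is above the 0.10 threshold and is consistent with bone-only disease being one of the characteristics that required adjustment.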

PFS and treatment duration

Using unadjusted and unweighted patient data for the 2 cohorts, median rwPFS was 18.7 months (95% CI, 14.6–24.1) for real-world patients and median PFS was 14.5 months (95% CI, 12.9–17.1) for PALOMA-2 patients (hazard ratio, 1.38 [95% CI, 1.00–1.91]; Fig 2A). Median rwPFS in the real-world cohort was longer than PFS in the PALOMA-2 cohort, potentially reflecting the higher proportion of patients with bone-only disease in the real-world cohort. Following IPTW adjustment, median PFS was similar in both cohorts: 18.4 months (95% CI, 12.8–23.3) for the real-world group and 16.6 months (95% CI, 13.7–22.2) for the PALOMA-2 group (Fig 2B). The hazard ratio using real-world data as reference was 1.04 (95% CI, 0.69–1.56).

Fig 2. Unadjusted (A) and IPTW-adjusted (B) progression-free survival.

CI = confidence interval; IPTW = inverse probability of treatment weighting. IPTW-adjusted numbers of patients at risk are shown.

https://doi.org/10.1371/journal.pone.0227256.g002

The unweighted, unadjusted mean (standard deviation [SD]) duration of first-line letrozole treatment was slightly longer among patients in the real-world cohort than in the PALOMA-2 cohort: 17.1 months (13.0) and 14.0 months (8.9), respectively (standardized difference, 0.2810). After IPTW adjustment, mean (SD) duration of treatment was 13.3 months (11.1) in the real-world cohort and 14.6 months (8.9) in the PALOMA-2 group, with a reduction in standardized difference to 0.1242 (Table 5). Discontinuations due to treatment-related adverse events or toxicity were relatively infrequent, and were reported more often among patients in the real-world cohort than in the PALOMA-2 cohort (6.5% and 4.1%, respectively) (Table 6).

Table 5. Duration of first-line letrozole therapy, before and after IPTW adjustment.

https://doi.org/10.1371/journal.pone.0227256.t005

Table 6. Reasons for discontinuation (Unweighted).

https://doi.org/10.1371/journal.pone.0227256.t006

Tumor response

Using unadjusted and unweighted patient data, the rwRR in the real-world cohort (40.2% [95% CI, 30.8–50.1]) was similar to the ORR in the PALOMA-2 cohort (38.3% [95% CI, 31.9–45.0]; odds ratio: 0.92 [95% CI, 0.56–1.53]; 2-sided P = .83). No significant difference was observed between rwRR and ORR in IPTW-adjusted comparisons: 41.8% and 39.4%, respectively (odds ratio: 0.91 [95% CI, 0.57–1.44]; 2-sided P = .68; Fig 3A). Complete tumor response was more frequently reported in the unadjusted real-world cohort (11.2%) than in the unadjusted PALOMA-2 group (2.3%) (Fig 3B). Of note, 22.4% of patients in the real-world cohort had no tumor assessments recorded during a mean 5.8 months of first-line therapy.
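The unadjusted odds ratio can be checked with simple arithmetic. In the sketch below, the responder counts (43 of 107 and 85 of 222) are back-calculated from the reported percentages and are therefore assumptions, as is the use of a Wald interval on the log scale; the published interval may have been computed with a different method and need not match exactly.

```python
import math
from typing import Tuple


def odds_ratio_wald(resp_ref: int, n_ref: int,
                    resp_cmp: int, n_cmp: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio (comparison vs reference) with a Wald 95% interval."""
    a, b = resp_cmp, n_cmp - resp_cmp   # comparison: responders, non-responders
    c, d = resp_ref, n_ref - resp_ref   # reference: responders, non-responders
    estimate = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Real-world cohort as reference: 43/107 (40.2%); PALOMA-2: 85/222 (38.3%).
or_est, or_lo, or_hi = odds_ratio_wald(43, 107, 85, 222)
print(f"OR = {or_est:.2f} (Wald 95% CI {or_lo:.2f} to {or_hi:.2f})")
```

The point estimate rounds to 0.92, in line with the reported unadjusted odds ratio, and the interval spans 1, consistent with the absence of a significant difference.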

Fig 3. Summary of response rate (A) and tumor response (B) in patients receiving first-line letrozole for HR+/HER2– mBC.

CI = confidence interval; HER2– = human epidermal growth factor receptor 2-negative; HR+ = hormone-receptor positive; IPTW = inverse probability of treatment weighting; mBC = metastatic breast cancer; RWD = real-world data. ^aRWD as reference.

https://doi.org/10.1371/journal.pone.0227256.g003

Discussion

To our knowledge, this is the first study in oncology clinical research to establish concordance on time-dependent efficacy endpoints between real-world and RCT datasets. Our analysis found that after IPTW adjustment for potentially confounding demographic and clinical characteristics, tumor burden endpoints such as rwPFS and rwRR derived from curated real-world data were similar to those observed in an RCT in women treated with letrozole monotherapy as first-line treatment for HR+/HER2– mBC. Median rwPFS in the real-world cohort was 18.4 months versus a median PFS of 16.6 months in the PALOMA-2 cohort, with a rwRR of 41.8% versus an ORR of 39.4% in PALOMA-2 patients.

As the number of novel oncology therapies entering the market increases, the need to assess the efficacy of these therapies relative to one another will become increasingly important. Reliable real-world data can help efficiently address this growing demand. There is a growing interest in the use of real-world data to support modern clinical trial design. Real-world data can facilitate the study of new agents in populations that are more reflective of the diverse patients encountered in routine clinical practice, either as internal control arms or as external control arms for single-arm trials. [1–4] At the regulatory level, single-arm trials with surrogate endpoints supported by external control data could be the basis for rapid approval of novel agents with exceptional clinical activity, while high quality phase IV studies in the real-world setting could provide confirmatory evidence following accelerated approvals. [2]

If real-world data are to be integrated into clinical trials, increasing confidence in the validity of real-world endpoints is critical. Conventional RECIST-based assessment relies on quantitative measurement of target lesions with consistent imaging modality and strict assessment intervals. In real-world clinical practice, the assessment of progression or treatment response is often qualitative and based on diverse clinical factors, including imaging studies, clinical presentation, and patient-specific factors such as performance status.

This analysis demonstrated consistency between rwPFS/rwRR and RECIST-based correlates despite these fundamental differences. A key advantage of this work was that the endpoints were subjected to similar analytic conditions as would be expected for traditional clinical trial endpoints and performed similarly. Available individual patient-level data from the PALOMA-2 cohort allowed for patient-level weighting of study populations and increased confidence in results.

There were several limitations of this analysis. Differences in clinical and sociodemographic characteristics were observed between the real-world and PALOMA-2 cohorts that confirm the well-established observation that patients who enroll in RCTs tend to be younger, healthier, and less racially and ethnically diverse than the general population of cancer patients. [5] Inverse probability of treatment weighting was used to control for these imbalances, and IPTW-adjusted baseline characteristics were comparable between the 2 cohorts. However, IPTW cannot completely overcome initial selection bias and does not control for unobserved confounders; as a result, unmeasured confounding may still be present even in the weighted observations.

In addition, although the inclusion/exclusion criteria for the real-world cohort were designed to align with those of PALOMA-2 as much as possible, there were differences in selection criteria between the 2 groups. In PALOMA-2, prior adjuvant or neoadjuvant therapy with a nonsteroidal AI was allowed unless disease recurred while the patient was on therapy or within 12 months of therapy completion. While start and stop dates of other endocrine therapies were abstracted from unstructured chart data in the real-world cohort, it was not feasible to determine timing relative to disease recurrence, so patients with a history of prior AI therapy were not excluded. However, it could reasonably be inferred that treating physicians followed current treatment guidelines, which recommend that patients who received prior endocrine therapy within 1 year of recurrence be treated with a different endocrine therapy.

Similarly, postmenopausal status was an eligibility criterion in the prospective PALOMA-2 trial, but was not a requirement for inclusion in the retrospective real-world cohort. In the PALOMA-2 study, rigorous screening criteria were in place to ensure all enrolled patients were postmenopausal. [13] In routine clinical practice, however, menopausal status often goes undocumented in the EHR. In the real-world dataset, menopausal status was recorded only when explicitly stated in the patient’s chart and age was not used as a proxy. As a result, approximately one-third of patients in the real-world cohort had a menopausal status of “unknown.” All of these patients were over the age of 50, and all but 1 were over the age of 60. The differences in menopausal status were partially adjusted for by the inclusion of age as a variable in the computation of the weights in the IPTW process, as indicated by a change in the standardized difference for age from 0.7175 before IPTW adjustment to 0.0222 after (Table 4).

The real-world patients in this analysis had longer unadjusted rwPFS, possibly due to the higher proportion of bone-only metastases, a potential indicator of more indolent disease, in the real-world cohort. [22, 23] The between-groups difference was substantially reduced following IPTW adjustment (HR = 1.04; 95% CI, 0.69–1.56).

In addition, in contrast to the global PALOMA-2 study, patients in the real-world cohort were all from the US and received care in routine clinical settings, which may have contributed to the observed differences in the frequency of tumor assessments. The PALOMA-2 protocol specified that tumor assessments be conducted every 12 weeks, while in the real-world cohort scans were ordered at the discretion of the treating physician. It is noteworthy that 12 of the patients without tumor assessments had durations of treatment longer than 12 weeks.

Despite these limitations, this analysis increases confidence that data from real-world health care databases can be used to match the populations of randomized clinical trials and to assess key outcomes in clinical practice settings. Although this is, to our knowledge, the first study of its kind in oncology, a similar analysis of data from a large health care database successfully mirrored the composite endpoints of the pivotal ONTARGET trial of the angiotensin receptor blocker telmisartan. [24] The analysis of data from more than 50,000 patients took approximately 12 weeks at a fraction of the cost of the pivotal trial. [24]

Deriving endpoints in the oncology setting is admittedly a more labor-intensive task, requiring manual review of unstructured chart elements (eg, clinician notes, radiology reports) to arrive at high-quality clinical outcome data. Enhancing the interoperability of EHRs and improving the capture of outcomes data are core goals of regulatory and private sector efforts to promote meaningful use of health information technology (HIT). [25] Modifying EHRs to include structured fields that capture progression and response, and training clinicians to enter relevant data into the correct fields, may provide an easier path to capturing outcome measures in oncology clinical practice settings and facilitate both retrospective and prospective studies of real-world outcomes. Such an effort would require the coordinated efforts of multiple stakeholders to provide the necessary HIT framework, education, and support to physicians and allied health professionals.

Conclusions

This study is a preliminary but important step in showing that clinically meaningful information can be derived from the assessment of rwPFS and rwRR based on EHR data abstraction when proper quality controls and analytic methods are incorporated. Although limited to patients with mBC, the current study lays the groundwork for additional analyses that could be used to investigate treatment effects using real-world data in other malignancies. With further validation, real-world data may help to modernize the clinical trial landscape and enhance the design of prospective real-world randomized studies.

Supporting information

S1 Appendix. List of independent ethics committees or institutional review boards.

https://doi.org/10.1371/journal.pone.0227256.s001

(DOCX)

Acknowledgments

The authors wish to thank Amy P. Abernethy, MD, PhD, for her work on developing this manuscript while at Flatiron Health Inc.; Dr. Abernethy’s participation in the development of this manuscript occurred prior to her appointment as Principal Deputy Commissioner of the U.S. Food and Drug Administration.

References

1. Gill J, d'Angela D, Berger K, Dank M, Duncombe R, Fink-Wagner A, et al. RWE in Europe Paper IV: Engaging pharma in the RWE Roadmap. London, UK: London School of Economics, 2018.
2. Khozin S, Blumenthal GM, Pazdur R. Real-world data for clinical evidence generation in oncology. J Natl Cancer Inst. 2017; 109(11): [Epub].
3. Berger ML, Lipset C, Gutteridge A, Axelsen K, Subedi P, Madigan D. Optimizing the leveraging of real-world data to improve the development and use of medicines. Value Health. 2015; 18(1): 127–30. pmid:25595243
4. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, et al. Real-world evidence—what is it and what can it tell us? N Engl J Med. 2016; 375(23): 2293–7. pmid:27959688
5. Murthy VH, Krumholz HM, Gross CP. Participation in cancer clinical trials: race-, sex-, and age-based disparities. JAMA. 2004; 291(22): 2720–6. pmid:15187053
6. Gyawali B, Parsad S, Feinberg BA, Nabhan C. Real-world evidence and randomized studies in the precision oncology era: the right balance. JCO Precis Oncol. 2017: [Epub].
7. Jarow JP, LaVange L, Woodcock J. Multidimensional evidence generation and FDA regulatory decision making: defining and using "real-world" data. JAMA. 2017; 318(8): 703–4. pmid:28715550
8. Miksad RA, Abernethy AP. Harnessing the power of real-world evidence (RWE): a checklist to ensure regulatory-grade data quality. Clin Pharmacol Ther. 2018; 103(2): 202–5. pmid:29214638
9. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009; 45(2): 228–47. pmid:19097774
10. US Department of Health and Human Services Food and Drug Administration Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER). Guidance for Industry: Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics. Rockville, MD: US Department of Health and Human Services, 2007.
11. Najafzadeh M, Schneeweiss S. From trial to target populations—calibrating real-world data. N Engl J Med. 2017; 376(13): 1203–5. pmid:28355503
12. Austin PC. Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis. Stat Med. 2016; 35(30): 5642–55. pmid:27549016
13. Finn RS, Martin M, Rugo HS, Jones S, Im SA, Gelmon K, et al. Palbociclib and letrozole in advanced breast cancer. N Engl J Med. 2016; 375(20): 1925–36. pmid:27959613
14. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011; 46(3): 399–424. pmid:21818162
15. Austin PC. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med. 2013; 32(16): 2837–49. pmid:23239115
16. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015; 34(28): 3661–79. pmid:26238958
17. Flatiron Health. Life Sciences. New York, NY: Flatiron Health, Inc.; [cited 2018 September 28]. Available from: https://flatiron.com/real-world-evidence/.
18. National Comprehensive Cancer Network. NCCN Clinical Practice Guidelines in Oncology (NCCN Guidelines®). Breast Cancer. Version 1.2018. Fort Washington, PA: National Comprehensive Cancer Network, 2018.
19. Griffith SD, Miksad RA, Calkins G, You P, Lipitz NG, Bourla AB, et al. Assessing feasibility and validating real-world tumor progression endpoints and their association with overall survival in a large advanced non-small cell lung cancer dataset. JCO Clin Cancer Inform. 2019: In press.
20. Austin PC, Stuart EA. The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes. Stat Methods Med Res. 2017; 26(4): 1654–70. pmid:25934643
21. Austin PC, Schuster T. The performance of different propensity score methods for estimating absolute effects of treatments on survival outcomes: a simulation study. Stat Methods Med Res. 2016; 25(5): 2214–37. pmid:24463885
22. Lee SJ, Park S, Ahn HK, Yi JH, Cho EY, Sun JM, et al. Implications of bone-only metastases in breast cancer: favorable preference with excellent outcomes of hormone receptor positive breast cancer. Cancer Res Treat. 2011; 43(2): 89–95. pmid:21811424
23. Parkes A, Clifton K, Al-Awadhi A, Oke O, Warneke CL, Litton JK, et al. Characterization of bone only metastasis patients with respect to tumor subtypes. npj Breast Cancer. 2018; 4(1): 2.
24. Fralick M, Kesselheim AS, Avorn J, Schneeweiss S. Use of Health Care Databases to Support Supplemental Indications of Approved Medications. JAMA Intern Med. 2018; 178(1): 55–63. pmid:29159410
25. Office of the National Coordinator for Health Information Technology (ONC). 2018 Report to Congress: Annual Update on the Adoption of a Nationwide System for the Electronic Use and Exchange of Health Information. Washington, DC: Office of the National Coordinator for Health Information Technology (ONC), U.S. Department of Health and Human Services, 2018 December. Report No.: 2018–12.

