Background
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease. Although more than 50 clinical trials have been conducted over the last two decades, none have been successful except those of riluzole and edaravone [1], which at best offer modest improvements in survival or function [2]. While many studies may have failed because the drugs were ineffective, a recurring theme in ALS is trials that do not meet their primary outcome yet yield indeterminate results [3]. Two major hurdles to conducting ALS trials are the rarity of ALS (3.9 in every 100,000 people in the US [4]) and the disease’s heterogeneity [5], which is a barrier to properly powered studies.
Methodology for rare-disease clinical trials is an important area of study for ALS researchers [6]. Enriching trials with historical controls has become possible due to the creation of large pooled placebo data sets [7], and the approach has been used to select drugs for larger studies, such as in the lithium and rasagiline studies [8‐10]. Other benefits of large databases of ALS patients include constructing predictive models for screening particular subgroups of patients, which could reduce the heterogeneity of disease progression observed in a trial, and making interim decisions during a clinical trial based on predicted and observed disease progression.
The wide implementation of Electronic Medical Record (EMR) systems across the United States, using one of two commercial systems, and the development of automated abstraction and de-identification of data create opportunities to: 1) better understand ALS disease progression and determinants of survival in the clinical setting; 2) use clinical data to enrich existing placebo-arm data sets to improve the power of trials; and 3) leverage this electronic infrastructure to run clinical trials, including EMR-based screening, randomization, and data collection. For these approaches to be worthwhile, we need to demonstrate the feasibility of automatically extracting the data required for modelling ALS disease progression and survival directly from the EMR.
We consider the feasibility of constructing statistical models built with automatically captured EMR patient data from our ALS clinic at the University of Kansas Medical Center (KUMC). This is a key first step in utilizing the EMR to augment clinical trials.
Discussion
Here we demonstrate the feasibility of using an automated extraction tool (HERON) to obtain ALS patient data directly from the KUMC EMR that could be used for analysis of ALS disease progression and survival. While demographic, ALSFRS-R, and survival data were both readily obtainable and accurate, some key variables (especially disease onset time and riluzole use) were only available via manual EMR review and/or suffered from large amounts of missing data.
The main advantage of automatic tools such as HERON is that they can drastically reduce the time needed to accurately capture EMR data compared with a manual review of the EMR. This methodology is generalizable to other research sites: Epic is one of the two major EMR systems in the US, serving over 50% of patients in the US [29], and is used by a large number of academic centers with ALS clinics. The automatic extraction tool HERON is powered by i2b2, which is used by dozens of research institutions within the US and abroad [30].
Looking towards the future, as EMR data become more complete, other advantages of this approach will emerge. Complete and comprehensive ALS records in the EMR would allow clinicians to track the performance of their patients clinic-wide and compare it with that of other ALS clinics, for both research and quality-control purposes. For example, the average ALSFRS-R decline per month in the KUMC clinic of 0.64 is at the high end of reports from other clinics, which give monthly ALSFRS-R declines of between 0.36 and 0.65 [14, 31‐33]. Note that this may be because we were unable to adjust for how long patients have had the disease.
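As an illustration of how a clinic-wide monthly ALSFRS-R decline could be summarized from extracted visit data, the following is a minimal sketch. It is not the model used in this study: it fits an ordinary least-squares slope per patient and averages the slopes, and the record layout (a dictionary mapping a patient identifier to a list of visit date and ALSFRS-R total pairs) and the function names are assumptions made for the example.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length in days


def monthly_decline(visits):
    """Least-squares ALSFRS-R decline per month for one patient.

    visits: list of (visit_date, alsfrs_r_total) tuples (hypothetical layout).
    Returns a positive number when the score is falling, or None when the
    slope cannot be estimated (fewer than two distinct visit dates).
    """
    if len(visits) < 2:
        return None
    first = min(d for d, _ in visits)
    xs = [(d - first).days / DAYS_PER_MONTH for d, _ in visits]
    ys = [float(score) for _, score in visits]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all visits fall on the same date
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # negate so that decline is reported as positive


def clinic_mean_decline(patients):
    """Average the per-patient slopes, skipping patients without a slope."""
    slopes = [monthly_decline(v) for v in patients.values()]
    slopes = [s for s in slopes if s is not None]
    return sum(slopes) / len(slopes) if slopes else None
```

A summary of this kind does not adjust for disease duration, which is the same limitation noted above for the KUMC estimate.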
Other future advantages include the ability to perform retrospective studies quickly and efficiently, which could create support for new therapeutics or improvements to standards of care. This depends heavily on tracking patients’ use of therapeutics in a way that is accessible in the EMR. EMR data could also be used to augment clinical trial data, serving either as a placebo/standard-of-care arm or as historical controls [34]. This has become a vital issue for the broader ALS community. For example, approval of edaravone in the US has raised many questions about which patients will benefit from this therapy and for how long. These questions could be answered by pooling ALS clinic data. In addition, edaravone has limited how broadly existing placebo data sets like PRO-ACT can be used as historical controls in clinical trials. Contemporary controls captured through automated EMR data abstraction could be one solution to this problem [1, 35, 36].
One current criticism of ALS clinical trials is that the ALS patients who participate in these trials are not representative of the general population [37], which is likely due to the rigorous inclusion/exclusion criteria for these trials. One simple way to make ALS trials more representative is to modify the inclusion/exclusion criteria; however, the resulting increase in patient variability would require very large studies. Again we see the potential utility of EMR data: with a more general trial population, we would be free to use the EMR to augment the control population for these trials. Networks such as the Northeast or Western ALS Study Groups [38] could provide placebo or standard-of-care arms in a variety of designs and could make such large-scale studies possible.
The main disadvantage of this approach is the current incompleteness of the EMR with respect to critical ALS data, which results in incomplete statistical models. To use the EMR as we propose across multiple academic centers, the ALS community would need to agree on a set of common data elements or ALS-related forms to capture in the EMR. Such agreement would allow common data dictionaries to support automated data capture not just across academic centers, but across different EMR platforms (i.e., Epic and Cerner). Furthermore, physicians and their clinic personnel would need to adhere to these data dictionaries and rigorously enter all the required data for each patient at each visit. Many efforts have already been made toward developing these common data sets for ALS: much of the field already captures the ALSFRS-R, the FVC, and details about the diagnosis at each visit. In addition, several initiatives are underway to standardize forms across institutions, with a suite of ALS clinic forms available for download through Epic Central.
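To make the idea of a shared data dictionary concrete, the sketch below shows one way a site might audit extracted visit records for completeness against an agreed list of elements. The element names in `REQUIRED_ELEMENTS` are hypothetical placeholders chosen for this example; they are not an existing standard and not the fields used by HERON, Epic, or Cerner.

```python
# Hypothetical common data elements; a real list would be agreed by the
# ALS community and mapped to each EMR platform's own field names.
REQUIRED_ELEMENTS = (
    "alsfrs_r_total",
    "fvc_percent_predicted",
    "diagnosis_date",
    "onset_date",
    "onset_site",
    "riluzole_use",
)


def missing_fraction(records, elements=REQUIRED_ELEMENTS):
    """Return the fraction of visit records missing each required element.

    records: list of dicts, one per visit (hypothetical layout). A value
    counts as missing when the key is absent, None, or an empty string.
    """
    if not records:
        return {}
    total = len(records)
    return {
        name: sum(1 for r in records if r.get(name) in (None, "")) / total
        for name in elements
    }
```

Running such an audit routinely at each site would show which elements are being entered reliably and which, like onset date in our data, are still largely missing.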
One example of critical information that needs to be collected in a standardized way is disease onset time. Because disease duration (which is derived from disease onset time) is critical for both survival and disease progression modelling [5, 12, 24, 25, 39], ALS clinics need to dedicate a data-capture form to it, as opposed to entering it in free-text notes/comments where it is difficult to find systematically. Other critical variables include usage of approved therapeutics (such as riluzole or edaravone), time of diagnosis, and location of symptom onset.
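The derivation of disease duration from onset time is simple once the onset date is stored as a structured field, as the following sketch suggests. The function name and the choice to express duration in average-length months are assumptions made for illustration; a missing onset date is passed through as `None` rather than guessed.

```python
from datetime import date
from typing import Optional

DAYS_PER_MONTH = 365.25 / 12  # average month length in days


def disease_duration_months(onset_date: Optional[date],
                            visit_date: date) -> Optional[float]:
    """Months from symptom onset to a clinic visit.

    Returns None when the onset date was not captured, so that downstream
    models can treat the value as missing instead of silently imputing it.
    """
    if onset_date is None:
        return None
    days = (visit_date - onset_date).days
    if days < 0:
        raise ValueError("onset date is after the visit date")
    return days / DAYS_PER_MONTH
```

When the onset date exists only in free-text notes, the first branch is the one that applies, which is why a dedicated data-capture field matters for modelling.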
Conclusions
We were able to use automated extraction tools to accurately obtain the variables needed from the EMR to create simple statistical models of both ALS disease progression and survival time. Key variables that might offer large improvements to these models (such as disease onset time or riluzole use) were unavailable via automatic extraction. In the future, as automated EMR data abstraction becomes increasingly important for post-marketing surveillance of FDA-approved drugs, or for use as concurrent controls, the ALS community will need to adopt common data elements for the EMR. Optimal use of the EMR requires both that disease-specific key variables, such as disease onset time for ALS, be identifiable and obtainable by data extraction tools and that clinical staff enter data rigorously.
Acknowledgements
Not applicable