INTRODUCTION

Since its emergence in December 2019, SARS-CoV-2 has infected more than 200 million people around the world.1 The United States (US) accounts for approximately 4% of the world’s population but in the first 9 months of the pandemic recorded approximately 26% of reported global cases and 23% of the world’s COVID-19-related deaths.1, 2 At the start of the pandemic, no treatments beyond supportive care had been proven to be effective for COVID-19; therefore, there was an urgent need to test therapeutic strategies, including both repurposed medications and novel agents. As with any therapeutic intervention, randomized controlled trials are the single best tool to evaluate safety and efficacy of possible treatments against this novel pathogen.3, 4

The mechanisms that underlie the variable clinical syndrome caused by SARS-CoV-2 remain an area of active investigation.3 Many hypotheses have been suggested and have received heightened publicity in both the scientific and lay press. At the beginning of the pandemic, the healthcare and biomedical research sectors temporarily stopped much of their non-COVID-19-related research, both to comply with social distancing mandates and to divert crucial resources to hospitals’ efforts to develop SARS-CoV-2 testing and expand capacity for clinical care. The urgency of the global threat, the temporary halting of existing research programs, and the availability of COVID-19-related research funding resulted in an unprecedented rise in COVID-19-related research.5, 6 Interventional trials focused on identifying effective treatments were rapidly developed and launched, in some cases in a matter of weeks.

Several centrally coordinated national trials platforms exist across the US, many of which transitioned their entire infrastructure to running COVID-19-related clinical trials, but the need for trials was far greater than the capacity of these networks.7, 8 Therefore, individual academic medical centers and pharmaceutical companies developed and carried out site-specific clinical trials. Such an uncoordinated approach risks conducting duplicative studies of the same interventions, underpowering studies, and extending the length of time needed to determine definitive answers. Having broad networks across the country is particularly critical in a pandemic during which cases geographically surge and recede, affecting an individual trial site’s ability to consistently enroll participants.

The landscape of pathophysiologic mechanisms under study, how the distribution of trial sites across the US compared to the national COVID-19 caseload, and the likelihood that the registered studies were powered to address the hypotheses under investigation remain unknown. To address this gap in knowledge, we leveraged data collected on the clinicaltrials.gov platform in the first 9 months of the global pandemic to characterize the landscape of clinical trials for treatments of COVID-19 across the US.

METHODS

Data Source

We conducted a cross-sectional study of all randomized, interventional trials for patients diagnosed with COVID-19 in the US that were registered on www.clinicaltrials.gov and started enrolling as of August 10, 2020 (approximately 9 months after the first cases of COVID-19 in the world were identified).9 We excluded clinical trials of vaccines and other interventions attempting to prevent (rather than treat) COVID-19, as well as studies that were registered but had not yet begun enrollment. Research ethics approval was not required because www.clinicaltrials.gov is a publicly available data repository.

Study Characteristics

We used natural language processing (i.e., regular expressions) and manual data extraction to record information for each registered clinical trial. We extracted data on trial-level characteristics including study design (e.g., blinding), study location, study setting (i.e., outpatient, inpatient, or both), funding source, patient population, planned sample size, intervention, comparator (i.e., placebo-controlled or active comparator(s)), and the trial’s primary outcome. Details related to the primary outcome included whether the outcome was a surrogate outcome or clinical outcome and whether it was a single or composite outcome. We refer to studies that were registered as phase 2/3 or phase 3 as “phase 3” studies.

We categorized the available interventions based on their mechanism of action (e.g., antiviral, antibiotic, anticoagulant). For clinical trials involving more than one intervention, we collected data on each planned intervention. In terms of geographic distribution, we identified all the participating study centers indicated on clinicaltrials.gov.

Statistical Analysis

To categorize the treatments studied in the selected COVID-19 clinical trials, we performed keyword detection for drug or therapy names using regular expressions on the text descriptions given in the “Arms/Intervention” fields of the records from clinicaltrials.gov. Primary outcomes were categorized in a similar manner, using the descriptions given in the “Primary Outcome Measures” fields. Keyword detection was performed using the R programming language (R Core Team, 2020) with functions for regular expressions from the stringr package. The treatment and primary outcome labels were then checked manually for each trial to ensure accuracy.
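The keyword detection step described above can be illustrated with a minimal sketch. It is written in Python with the standard `re` module rather than the R/stringr tooling used in the study, and the category names, drug keywords, and the function name `categorize_interventions` are hypothetical examples chosen for illustration; the study's actual term lists are not reported in the text.

```python
import re

# Hypothetical keyword patterns for illustration only; the term lists
# actually used in the study are not given in the text.
CATEGORY_PATTERNS = {
    "antiviral": re.compile(r"\b(remdesivir|favipiravir)\b", re.IGNORECASE),
    "hydroxychloroquine": re.compile(r"\b(hydroxy)?chloroquine\b", re.IGNORECASE),
    "anticoagulant": re.compile(r"\b(heparin|enoxaparin|apixaban)\b", re.IGNORECASE),
    # Monoclonal antibody names conventionally end in the stem "-mab".
    "monoclonal antibody": re.compile(r"\b\w+mab\b", re.IGNORECASE),
}


def categorize_interventions(arm_text):
    """Return every category whose pattern matches the free-text field."""
    return [
        category
        for category, pattern in CATEGORY_PATTERNS.items()
        if pattern.search(arm_text)
    ]


# A trial with two interventions receives two labels.
labels = categorize_interventions("Drug: Remdesivir 200 mg IV; Drug: Tocilizumab")
print(labels)
```

Because pattern matching of this kind can miss misspellings or brand names, labels produced this way would still need the manual check described in the text.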

We used descriptive statistics to summarize the clinical trials’ characteristics. For phase 3 studies, we tabulated the sample size per study arm and calculated the statistical power of each trial as registered and again with all of the individual arms for a given treatment pooled (i.e., as if all the patients had contributed to one single trial rather than to many smaller individual trials). We assumed a 15% risk of mortality, an alpha of 0.05, and one-to-one randomization,10 and tested the robustness of our results by repeating the calculations with assumed mortality of 10% and 20%. We used the Miettinen formula to calculate power using the open-source EpiSheets software (R package episheet).11 We used population census data from 2019 to calculate the standardized proportion of clinical trials by state per 100,000 people living within each state.12 We also reported the clinical trials by the number of COVID-19 cases per 100,000 people in that state.12
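The comparison between power as registered and power after pooling can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it uses a standard normal approximation for a two-sided comparison of two proportions with one-to-one allocation, which is not necessarily the Miettinen formula as implemented in the episheet package, and the function name `power_two_proportions` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n_total, control_risk, relative_risk, alpha=0.05):
    """Approximate power of a two-sided test of two proportions.

    Assumes one-to-one randomization and a normal approximation with
    unpooled variance; the (negligible) opposite tail is ignored.
    """
    n_per_arm = n_total / 2
    treated_risk = control_risk * relative_risk
    standard_error = sqrt(
        control_risk * (1 - control_risk) / n_per_arm
        + treated_risk * (1 - treated_risk) / n_per_arm
    )
    z_effect = abs(control_risk - treated_risk) / standard_error
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z_effect - z_critical)


# Sensitivity analysis across the three assumed baseline mortality risks,
# for a single trial of 760 patients versus a pooled trial of 4800 patients.
for baseline in (0.10, 0.15, 0.20):
    single = power_two_proportions(760, baseline, 0.80)
    pooled = power_two_proportions(4800, baseline, 0.80)
    print(f"baseline {baseline:.0%}: single {single:.2f}, pooled {pooled:.2f}")
```

With a 15% baseline mortality and a relative risk of 0.80, this approximation gives a power of roughly 0.2 for 760 patients and close to 0.86 for 4800 patients, broadly in line with the anticoagulation figures reported in the paper; small differences from the published values are expected because the formula differs.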

RESULTS

We identified 200 interventional clinical trials for patients with COVID-19 (Figure 1). The median planned sample size was 150 patients (IQR 60–400), 9 (5%) had completed enrollment, 87 (44%) were single-center, 64 (32%) were unblinded, and 136 (68%) were placebo-controlled. Most studies (N=115, 58%) specified the sponsor as “Other.” The primary study sponsor was pharmaceutical companies for 80 trials (40%), with an additional 3% (N=5) of trials funded by the National Institutes of Health (NIH) or other sources of federal funding.

Figure 1. Flow diagram.

Figure 2. Phase 3 clinical trial enrollment by treatment category. ACE, angiotensin-converting enzyme; Sm. Mol, small molecule.

Nearly all studies required patients to be at least 18 years of age (N=188, 94%) and 26% (N=52) had an upper age limit for inclusion; 32 trials had upper age limits ranging between 50 and 85 years of age, with an additional 20 having upper age limits above this age. Clinical trials also commonly excluded pregnant women (73%); other populations excluded by multiple trials included people with HIV/AIDS (19%), people who were incarcerated (9%), and people with mental health diagnoses (8%) (Table 1).

Table 1 Trial Characteristics

The most common categories of treatments by mechanism of action included monoclonal antibodies (N=46), small molecule immunomodulators (N=28), antivirals (N=24), hydroxychloroquine (N=20), and polyclonal antibodies (N=15) (Table 1).

For phase 3 trials, the most common categories of treatments included monoclonal antibodies (N=13), antiviral medications (N=10), and chloroquine/hydroxychloroquine (N=9) (Table 2, Figure 2). Most trials enrolled patients hospitalized with COVID-19 (N=45, 83%), and 80% (N=36) of the inpatient trials required patients to have hypoxia to be included. Of the inpatient studies, 27% (N=12 of 45) explicitly included patients who were critically ill (e.g., ICU, mechanical ventilation). The median planned enrollment for the phase 3 trials was 465 (IQR 302–1029) (Table 2). The most common primary outcomes were symptom severity (N=23, 43%), a composite endpoint of mortality or ventilation (N=10, 19%), and mortality alone (N=6, 11%). For the three most common treatment categories (antivirals, monoclonal antibodies, chloroquine/hydroxychloroquine), the estimated power to detect a modest reduction in mortality (i.e., a relative risk of 0.80, equivalent to a 20% relative risk reduction) was less than 25% (Table 3). In contrast, had the trials in each category of treatments been pooled (or the leading candidate tested across the total population), the power to detect this same reduction in mortality would have been greater than 98% (Table 3, eTables 1 and 2).

Table 2 Characteristics of Phase 2/3 and Phase 3 Trials (N=54)
Table 3 Statistical Power to Detect a Relative Risk Reduction in Mortality for the Most Common Treatment Arms Across Different Baseline Rates of Inpatient Mortality
Figure 3. Confirmed cases of COVID-19 per 100,000 and available clinical trials by state. Gray states indicate no registered phase 3 trial centers.

In terms of geographic distribution, the number of trial centers per capita varied by region: Northeast (N=385 participating centers, 0.69 per 100,000 people), West (N=384, 0.49 per 100,000), South (N=469, 0.37 per 100,000), and Midwest (N=223, 0.33 per 100,000). The number of trial centers per 100,000 cases of COVID-19 also varied by region: Northeast (41.2 centers per 100,000 COVID-19 cases), West (36.1), Midwest (28.6), and South (20.5). Relative to the number of COVID-19 cases, the Northeast had the most trial centers while the South had the fewest (Figure 3).
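The per-100,000 standardization behind these regional figures is simple arithmetic, sketched below. The function name `rate_per_100k` is hypothetical, and the population denominator in the example is a rounded, illustrative value assumed for demonstration; it is not taken from the study's census data.

```python
def rate_per_100k(count, denominator):
    """Express a count per 100,000 units of the denominator
    (people, or confirmed COVID-19 cases)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return count / denominator * 100_000


# 385 participating centers (as reported for the Northeast) over an
# ASSUMED, rounded regional population of 56 million people.
northeast_rate = rate_per_100k(385, 56_000_000)
print(f"{northeast_rate:.2f} centers per 100,000 people")
```

The same function applies to the per-case comparison by substituting the regional count of confirmed COVID-19 cases as the denominator.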

At the state level, five states were participating in at least 40 unique trials, 10 states and Washington D.C. were participating in 20 to 39 unique trials, 22 states were participating in 5 to 19 unique trials, and 13 states were participating in fewer than 5 trials, including 4 states with no trials (AK, DE, ND, WY). Per 100,000 population, the locations with the highest availability of trial centers were Washington D.C., Massachusetts, and Louisiana. Per 100,000 cases of COVID-19, the locations with the highest availability of trials were Washington D.C., Maine, and Hawaii, while the states with the lowest availability (after those states with no trials at all) were Arkansas, New Mexico, and West Virginia.

DISCUSSION

This study leverages the data infrastructure captured by clinicaltrials.gov to characterize the COVID-19 trials landscape approximately 9 months after the world’s first documented case of SARS-CoV-2. Of the 200 randomized trials identified, nearly half were single-center and almost one-third were not placebo-controlled. For a disease that surges and recedes in any one given location and without the possibility of historical controls given that this is a novel pathogen, these factors highlight important areas in which national clinical trial infrastructure and trial design might be optimized in this pandemic and beyond.

Perhaps the area in which the absence of a centralized national trials platform is most apparent is the proliferation of multiple underpowered trials instead of fewer, coordinated, multi-site studies. For example, there were four clinical trials of anticoagulation with a median planned sample size of about 760 patients each. Presuming a baseline mortality rate of 15%, the current trials each have 22% power to detect a 20% relative reduction in mortality. However, one centralized trial for anticoagulation would have had more than 4800 patients, which could have provided 86% power to detect that 20% relative reduction in mortality (Table 3). While meta-analyses offer one strategy for pooling results across a number of smaller randomized trials in an attempt to draw broader conclusions,13 this approach can be challenging when trials use different endpoints. Such heterogeneity limits the ability of investigators to pool study results and to rigorously account for disparate study designs and methodologies. Similar to the findings of Mathioudakis and colleagues,14 our analysis demonstrates that this was quite common: the primary outcomes across the registered clinical trials varied from changes in symptom severity to mortality. Using pre-specified outcomes of interest that have been agreed upon by clinical and methodological experts and centralizing trials are two strategies to address this.15

Our results also identify substantial geographic variations that could exacerbate the well-documented disparities within populations affected by COVID-19.16 Access to clinical trials may have only been available to patients in regions where academic medical centers had the resources to launch trials quickly or previously established interactions with pharmaceutical companies that would have facilitated rapid roll-out of industry-sponsored studies. Geographic disparities would be expected to widen without intentional effort to ensure access to trials in rural areas.

Commonly identified exclusion criteria documented here are also important to consider. More than one-quarter of all studies excluded older adults—the population with the highest COVID-19-associated mortality. The rate of exclusion of older adults in these trials mirrors exclusions that have been previously described in non-COVID-19 studies.17, 18 Nearly three in four trials also excluded people who were pregnant, a common practice in non-COVID-19 trials as well.19 For example, early trials of the antiviral drug remdesivir excluded patients who were pregnant. However, the manufacturer developed a so-called compassionate use program to make the drug available to pregnant people outside the context of a clinical trial.20,21,22 Excluding people who are pregnant from the trial while then administering the treatment to that population ensures that the generated data will be limited in their ability to inform future counseling on the risks and benefits to both the person who is pregnant and the fetus.

This study and similar work using data from clinicaltrials.gov are only possible because of the universal requirement that investigators pre-register clinical trials, a requirement that has only been enforced since 2007.23 Such transparency allows accountability and provides some critical metrics that can be used to assess improvement. Our analysis also suggests areas where increased focus on transparency and information gathering could be beneficial. For example, the funding source was listed on clinicaltrials.gov as “other” for more than half of the trials in this analysis, with no further detail available. Additional details in this reporting may help promote funding transparency in biomedical research.

Our study has limitations. First, our results may not be generalizable to clinical trials outside of the US because our analysis did not include results from non-US registries. Second, our power calculations focused on planned rather than actual enrollment. Thus, our power calculations may overestimate the trials’ actual statistical power. Third, we based our power calculations on published event rates, which is an imperfect approach because these rates represent population average estimates. As a result, individual studies may have had higher or lower event rates depending on the characteristics of the study populations. In addition, some trials did not perform one-to-one randomization, and thus some of our power estimates are overestimates because other randomization schemes require larger sample sizes. Fourth, we utilized word tagging, a form of natural language processing, to identify various data points reported in our study, which could introduce misclassification bias. We performed manual checking to minimize misclassification, and while a small degree of residual misclassification could still be present, we expect it to be random.

In conclusion, in the first 9 months of the global pandemic, there was an unprecedented rise in COVID-19-related research, including hundreds of clinical trials launched in the US. This study highlights an important opportunity to improve national clinical trial infrastructure, which would allow the biomedical community to better leverage the critical investment of trial participants who are putting themselves at risk to advance new knowledge in this pandemic.