Introduction
The analysis and reporting of adverse event (AE) data gathered throughout the development, testing, and use of a drug are key to establishing the drug’s safety profile. An AE is defined by the US Food and Drug Administration (FDA) as “any untoward medical occurrence associated with the use of a drug in humans, whether or not considered drug related” [1]. Prior to marketing approval, information on AEs is derived from the drug’s preclinical studies and randomized controlled trials (RCTs) [2,3]. In the US, RCTs are subject to the FDA’s regulatory safety reporting requirements [4]. After a newly approved drug enters the marketplace, post-marketing surveillance systems can reveal AEs not detected during the pre-approval review [5]. These systems are useful for generating hypotheses about potential drug-associated AEs but do not allow risk to be quantified. AE incidence cannot be calculated because the number of events reported (numerator) is not representative of the number that actually occurred, and the number of subjects exposed to treatment, the “safety population” (denominator), is not known [6,7]. In RCTs, a clearly defined safety population and a placebo comparator allow both the calculation of AE incidence and the controlled comparison of AE rates [8].
AEs detected in RCTs are presented in drug labels, as detailed in the FDA guidance document Adverse Reactions Section of Labeling for Human Prescription Drug and Biological Products [9]. The standard US label for FDA-approved drugs lists all AEs identified in RCTs that occur with an incidence ≥ 2% in the drug treatment group and for which the rate for drug exceeds the rate for placebo [10,11]. AEs are presented in tabular summaries with counts and percentages of subjects who experienced each event, by treatment, enabling side-by-side comparison of AE incidence for drug and placebo. AEs are initially recorded by clinical investigators in their own words (verbatim terms). To provide a meaningful estimate of the proportion of individuals experiencing AEs, and to avoid diluting or obscuring their true effect, events reported under different terms but representing the same phenomenon are coded using the Medical Dictionary for Regulatory Activities (MedDRA). The coded terms are then summarized and analyzed by System Organ Class and Preferred Term [12,13]. The drug label lists AEs as Preferred Terms or groups of Preferred Terms.
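The labeling rule above can be expressed as a short filter over coded AE records. The Python sketch below is illustrative only: it assumes a hypothetical record layout of (subject_id, arm, preferred_term) tuples already coded to MedDRA Preferred Terms, with arm labels "drug" and "placebo". Incidence counts subjects rather than events, so a subject with repeated reports of the same term is counted once, and the denominator is the safety population of each arm.

```python
from collections import defaultdict


def incidence_by_term(ae_records, n_exposed):
    """Percent of subjects per (arm, preferred_term) with at least one report of the term.

    ae_records: iterable of (subject_id, arm, preferred_term) tuples (hypothetical layout).
    n_exposed: dict arm -> size of the arm's safety population (the denominator).
    """
    subjects = defaultdict(set)  # (arm, term) -> unique subject IDs
    for subject_id, arm, term in ae_records:
        subjects[(arm, term)].add(subject_id)  # repeat reports by one subject count once
    return {key: 100.0 * len(ids) / n_exposed[key[0]] for key, ids in subjects.items()}


def label_table(ae_records, n_exposed, drug="drug", placebo="placebo", threshold=2.0):
    """Rows (term, drug %, placebo %) with drug incidence >= threshold and above placebo."""
    inc = incidence_by_term(ae_records, n_exposed)
    rows = []
    for term in sorted({term for _, term in inc}):
        d = inc.get((drug, term), 0.0)
        p = inc.get((placebo, term), 0.0)
        if d >= threshold and d > p:
            rows.append((term, d, p))
    return rows


# Toy example (invented data): 50 subjects per arm.
records = [
    (1, "drug", "Akathisia"), (1, "drug", "Akathisia"), (2, "drug", "Akathisia"),
    (3, "drug", "Nausea"), (51, "placebo", "Nausea"), (52, "placebo", "Nausea"),
]
n = {"drug": 50, "placebo": 50}
table = label_table(records, n)  # [("Akathisia", 4.0, 0.0)]
```

Akathisia enters the table at 4.0% versus 0.0% (subject 1's second report is not double-counted), while nausea at 2.0% on drug is excluded because its placebo incidence (4.0%) is higher.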
There is increasing interest in enhancing the retrieval, analysis, and reporting of AEs in both pre- and post-marketing settings, and regulators, including the FDA, have recently issued new guidance. In addition to maintaining the list of Preferred Terms, MedDRA provides a large number of Standardized MedDRA Queries (SMQs). SMQs are routinely used to facilitate retrieval of MedDRA-coded data as a first step in investigating drug safety issues in pre- and post-marketing settings. In 2022, a group of FDA medical experts presented the FDA Medical Queries (FMQs), developed specifically to assess the safety of new drugs in clinical development. FMQs consolidate Preferred Terms that describe the same medical condition but are scattered across the dictionary, so that safety signals in RCT datasets can be detected more readily [14]. The same group of experts discussed standard safety tables and figures and provided new statistical considerations for the analysis of AEs. The FDA will implement the use of FMQs and these analytical methods and presentations for drug safety data [14].
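The benefit of consolidating scattered Preferred Terms can be seen in a toy calculation: a grouped query counts each subject once if they have any term in the group, which can yield a higher incidence than any single term. The sketch below assumes the same hypothetical (subject_id, arm, preferred_term) layout; the three-term group is invented for illustration and is not an actual FMQ or SMQ definition, whose term lists (and any algorithmic rules) are maintained by the FDA and by MedDRA, respectively.

```python
def query_incidence(ae_records, n_exposed, arm, query_terms):
    """Percent of subjects in `arm` with at least one AE coded to any term in query_terms.

    ae_records: iterable of (subject_id, arm, preferred_term) tuples (hypothetical layout).
    """
    hits = {sid for sid, a, term in ae_records if a == arm and term in query_terms}
    return 100.0 * len(hits) / n_exposed[arm]  # each subject counted once per query


# Toy example (invented data); the grouping below is hypothetical, not a real FMQ.
records = [
    (1, "drug", "Somnolence"), (1, "drug", "Sedation"),
    (2, "drug", "Sedation"), (3, "drug", "Hypersomnia"),
]
n = {"drug": 100}
group = {"Somnolence", "Sedation", "Hypersomnia"}

per_term = {t: query_incidence(records, n, "drug", {t}) for t in sorted(group)}
grouped = query_incidence(records, n, "drug", group)  # 3.0
```

Each term reaches at most 2% on its own, whereas the grouped query reaches 3%; simply adding the per-term incidences (4%) would double-count subject 1, who reported two of the terms.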
The above guidance to improve the statistical analysis of AEs can be considered a follow-up to a 2015 FDA initiative, the Safety Research Interest Group, which aimed to identify gaps in safety reporting and address them through targeted actions [15]. Among several areas of safety-related need, the Group recognized the necessity to (1) improve clinical trial statistical analyses for safety, including benefit-risk assessments, and (2) improve access to post-marketing data and explore the feasibility of using these data to analyze safety signals.
Here, we propose a means to improve clinical trial statistical analyses for safety. This work extends our efforts in drug safety signal detection, shifting the focus from post- to pre-marketing data. Previously, Hopkins et al. piloted a new method to evaluate the novelty of the safety profile of a drug in development belonging to a new pharmacological class against the safety profile of drugs in an established pharmacological class, using Bayesian disproportionality analysis of post-marketing FDA Adverse Event Reporting System (FAERS) data [16]. They showed that in lurasidone clinical trials in schizophrenia, half of the subjects had AEs specific to atypical antipsychotics, whereas ulotaront, a novel trace amine-associated receptor 1 (TAAR1) agonist with 5-HT1A agonist activity that does not act via blockade of D2 or 5-HT2A receptors [17–20], showed a lower cumulative rate of antipsychotic class-specific AEs [16]. In a subsequent paper [21], the authors demonstrated that the class-specific AEs in RCT data for risperidone, calculated as a cumulative function of the AEs’ disproportional reporting derived from FAERS data, were comparable to those first reported for other atypical antipsychotic drugs. These findings suggest that cumulative AE curves offer a more objective way to describe qualitative similarities, or differences, in AE profiles between drugs.
In the current paper, we note that the traditional 2% incidence tables of the drug label implicitly assume identical AE durations across treatment groups and usually ignore recurrent events and competing risks in the study population. If AE durations differ across treatment groups, comparisons based on simple incidence produce biased results [22]. Regulators have underlined the limitations of defining and measuring AEs using frequency tables and have called for caution in drawing robust conclusions from them [7]. Moreover, tabular summaries omit data that may be relevant to patients and healthcare professionals. Specifically, AE prevalence and duration can affect patients’ treatment satisfaction, adherence to medication, employment status, social activities, and, ultimately, quality of life. AE prevalence and duration are also crucial to fully understanding the safety side of a drug’s benefit-risk ratio. To illustrate this point, we analyzed pooled data from five RCTs of the dopamine D2 antagonist lurasidone [23–27] and one RCT of the novel TAAR1 agonist ulotaront [18] in acutely psychotic patients with schizophrenia. First, we calculated the incidence, absolute prevalence, and expected duration of AEs; then, we developed and tested a new metric, the drug-placebo difference in AE prevalence.
Discussion
The drug label is a publicly available document uniquely placed to be an invaluable source of information for patients, healthcare providers, researchers, and regulators. Per FDA guidance [35], the drug label must meet two criteria: (1) it must contain details and directions for healthcare providers to prescribe the drug safely and effectively, including the approved uses for the drug, contraindications, potential adverse reactions, available formulations and dosage, and how to administer the drug [21 CFR 201.56(a)(1)]; and (2) it must be informative and accurate and neither promotional in tone nor false or misleading [21 CFR 201.56(a)(2)]. These requirements align with the FDA’s historic mission to protect consumers and with the congressional mandates that the drug label “must be truthful” (the 1906 Wiley Act) [36] and provide complete information on “the risks as well as the benefits” (the 1962 Drug Amendments) [37].
Despite these efforts, current drug labels report only the incidence of a drug’s AEs (percent of subjects who reported the AE out of the total subjects in the RCT), not the prevalence (percent of subject-days spent with an AE out of the total subject-days spent in the RCT) or the duration (days required for the AE incidence to be reduced by half), both of which can be valuable to patients and physicians. A recent review of adverse drug reaction data from 24 publicly available drug labels for antidepressants and anticonvulsants marketed in the USA showed that only one of the 24 labels contained information about AE duration [38]. In this study, when we analyzed the pooled lurasidone data, we found that the AEs “akathisia” and “nausea” in the drug arm had similar incidence (14.3% and 12.86%, respectively) but dissimilar duration (18 and 5 days, respectively). Physicians may suspect that “nausea” is short-lived compared with “akathisia” but cannot be certain, because duration, although collected in RCTs, is usually neither analyzed nor reported. Considering side effect duration may help determine the reversibility of AEs in safety analyses and enhance the evaluation of drug safety signals for AEs whose incidence rates are similar between the drug and placebo arms of RCTs. We also showed that “akathisia” and “nausea” in the drug arm of the pooled lurasidone data had dissimilar prevalence (11.15% and 3.79%, respectively). Information on prevalence and duration, if reported for each RCT, would inform a patient-physician discussion not only of how likely the patient is to experience a specific side effect but also of how much and for how long the patient would suffer from it. For the FDA, information on AE prevalence and duration could support the inclusion of a side effect with low incidence (below the 2% threshold) whose prevalence and/or duration is greater for drug than for placebo.
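The three summary measures defined above can be computed from per-subject AE start and end days. The Python sketch below is one possible reading of those definitions, not the authors' implementation: it assumes integer study days, inclusive (start_day, end_day) intervals with known end dates, and a hypothetical input layout. Overlapping episodes of the same AE in one subject are merged, and duration is taken as the smallest day d at which the incidence-by-duration curve (percentage of subjects whose AE lasted more than d days) falls to half its starting value.

```python
def ae_days(intervals):
    """Count distinct study days covered by inclusive (start_day, end_day) intervals."""
    days = set()
    for start, end in intervals:
        days.update(range(start, end + 1))  # overlapping episodes are merged
    return len(days)


def summarize_ae(episodes, exposure_days):
    """Return (incidence %, prevalence %, duration in days) for one AE in one arm.

    episodes: dict subject_id -> list of (start_day, end_day) intervals for this AE.
    exposure_days: dict subject_id -> days spent in the trial, for every subject in
        the arm's safety population (hypothetical layout; end dates assumed known).
    """
    n = len(exposure_days)
    durations = [ae_days(iv) for iv in episodes.values() if iv]
    incidence = 100.0 * len(durations) / n
    prevalence = 100.0 * sum(durations) / sum(exposure_days.values())

    def curve(d):
        # Incidence-by-duration curve: % of subjects whose AE lasted more than d days.
        return 100.0 * sum(1 for x in durations if x > d) / n

    half = curve(0) / 2
    duration = 0
    while curve(duration) > half:  # assumed reading: first day the curve halves
        duration += 1
    return incidence, prevalence, duration


# Toy example (invented data): four subjects, 28 days each in the trial.
episodes = {1: [(1, 10)], 2: [(3, 4)]}
exposure = {1: 28, 2: 28, 3: 28, 4: 28}
inc, prev, dur = summarize_ae(episodes, exposure)  # 50.0, about 10.7, 2
```

Two of four subjects have the AE (50% incidence), but they spend only 12 of 112 subject-days with it (about 10.7% prevalence), and the curve halves after 2 days. Two arms with identical incidence can therefore differ markedly in prevalence and duration, which incidence-only tables cannot show.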
We then calculated the drug-placebo difference in AE prevalence (∆AUCO), defined as the difference between the areas under the curve (AUCs) for drug and placebo. Each curve plots incidence by duration: the y-axis shows the incidence (percentage) of subjects who experience one or more AEs, and the x-axis shows increasing duration (days). Quantifying ∆AUCO offers more information than the difference in incidence between drug and placebo, thus improving safety signal detection. Evaluating the contribution of individual AEs to ∆AUCO reveals a continuum of risk, from the most drug-associated to the most placebo-associated AEs, according to whether each AE increases or decreases the drug-placebo difference in AE prevalence. For example, in the pooled lurasidone data, akathisia contributes +28.44% (an increase) to the drug-placebo difference in AE prevalence, while schizophrenia contributes −3.35% (a decrease). From a clinical perspective, this means that in lurasidone trials more than a quarter of the drug-placebo difference in AE prevalence is attributable to a single AE, akathisia, an extrapyramidal symptom specific to D2 antipsychotics, while schizophrenia, an AE commonly related to the underlying disease, plays a marginal role in this difference. In the ulotaront data, akathisia contributes 0.6%, while schizophrenia contributes 38.7%. Notably, with this metric, individual AEs that do not appear in the 2% incidence tables can nevertheless contribute substantially to the drug-placebo difference in AE prevalence, whether they are associated with drug treatment or with its absence (placebo). Overall, these findings show that AE prevalence is generally greater in the drug arm than in the placebo arm, although ulotaront exhibits a lower AE prevalence than placebo.
This is attributable, in part, to a relatively lower incidence and shorter duration of AEs in the ulotaront arm, as well as the emergence of disease-related AEs in the placebo arm. These results suggest that incorporating the drug-placebo difference in AE prevalence may improve the reliability of detecting drug-associated AEs in clinical trials.
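The construction of ∆AUCO and of per-AE contributions can be sketched as follows. This is a hedged illustration under assumptions not stated in the text: per-AE AUCs are assumed to add up to the arm-level AUC (the curve described above counts subjects with one or more AEs, which need not be exactly additive), each curve is evaluated in unit-day steps, and a term's contribution is its drug-minus-placebo AUC expressed as a percentage of the total ∆AUCO, so contributions sum to 100% and placebo-associated AEs come out negative. Under these assumptions each AUC equals 100 × (total AE-days) / (number of subjects), which links it to the prevalence defined earlier. Function names and toy data are hypothetical.

```python
def auc(durations, n_subjects):
    """Area under the incidence-by-duration curve, in percent-days (unit-day steps).

    durations: per-subject AE durations (days) for one term in one arm.
    The curve at day d is the % of subjects whose AE lasted more than d days.
    """
    total, d = 0.0, 0
    while True:
        y = 100.0 * sum(1 for x in durations if x > d) / n_subjects
        if y == 0:
            return total
        total += y
        d += 1


def delta_auc(drug, placebo, n_drug, n_placebo):
    """Return (delta AUC, {term: % contribution}) under an assumed additive AUC.

    drug, placebo: dict preferred_term -> list of per-subject AE durations (days).
    """
    terms = set(drug) | set(placebo)
    diff = {t: auc(drug.get(t, []), n_drug) - auc(placebo.get(t, []), n_placebo)
            for t in terms}
    total = sum(diff.values())
    if total == 0:
        raise ValueError("delta AUC is zero; contributions are undefined")
    return total, {t: 100.0 * v / total for t, v in diff.items()}


# Toy example (invented data): 10 subjects per arm.
drug = {"Akathisia": [10, 6], "Nausea": [2]}
placebo = {"Akathisia": [2], "Nausea": [1], "Schizophrenia": [5]}
total, shares = delta_auc(drug, placebo, 10, 10)
# total == 100.0; shares: Akathisia 140.0, Nausea 10.0, Schizophrenia -50.0
```

In the toy data, akathisia accounts for 140% of ∆AUCO, offset by −50% for a disease-related term that occurs only on placebo; shares above 100% arise whenever some terms pull the difference in the opposite direction.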
AE analyses in RCTs are often less than robust, mainly because these trials are designed and statistically powered to establish the efficacy, rather than the benefit-risk relationship, of a drug [22]. RCTs are often underpowered to evaluate the harm profile of a drug, which includes multiple AEs that are not predefined [11,39]. Moreover, safety analyses are usually limited to descriptive statistics and basic statistical computations, which are not particularly informative. A recent paper reported that a Google Scholar search for “new models to demonstrate efficacy in clinical trials” generated 1.2 million results, whereas a search for “new models to analyze safety” generated only 218,000 results [40].
The analysis of AEs requires the evaluation and reporting of data on the timing, duration, and severity of AEs, among other variables, as highlighted by two recent studies on AE burden in schizophrenia trials [41,42]. A variety of methods to analyze AEs have been proposed since as early as 1989, although most were published after 2004 [39]. These methods were summarized in a recent review and include graphical methods, hypothesis testing methods under the frequentist paradigm, estimation methods that quantify distributional differences in AEs between treatments without a formal test (e.g., risk differences, risk ratios, and odds ratios with CIs), and Bayesian methods that give the posterior probability of AEs [39]. The use of these methods, however, is limited. An online survey of public sector and industry statisticians working on RCTs showed that only 38% were aware of these methods, and fewer still (approximately 13%) had used them [43]. The most frequently cited reasons for not using them were unsuitable trial sample sizes, the multitude of different AEs experienced in trials, the technical complexity of most statistical methods, the significant resources and time needed to implement them, and uncertainty about the level of regulatory acceptance of these methods.
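As an example of the estimation methods listed above, the sketch below computes the risk difference, risk ratio, and odds ratio for a single AE with Wald-type confidence intervals, using the standard large-sample formulas (on the log scale for the two ratios). It is a textbook illustration, not a method taken from the cited review; Wald intervals are only one option and behave poorly with small counts, and because no continuity correction is applied, the function refuses tables with a zero cell.

```python
import math
from statistics import NormalDist


def compare_arms(a, n1, c, n0, level=0.95):
    """Risk difference, risk ratio, and odds ratio with Wald CIs for one AE.

    a of n1 drug subjects and c of n0 placebo subjects had the AE.
    Returns {"RD"|"RR"|"OR": (estimate, lower, upper)}.
    """
    b, d = n1 - a, n0 - c
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Wald intervals undefined without a correction")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p1, p0 = a / n1, c / n0

    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    return {
        "RD": (rd, rd - z * se_rd, rd + z * se_rd),
        "RR": (rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
        "OR": (odds_ratio, odds_ratio * math.exp(-z * se_log_or),
               odds_ratio * math.exp(z * se_log_or)),
    }


# Toy example (invented counts): 20/100 on drug versus 10/100 on placebo.
result = compare_arms(20, 100, 10, 100)
```

For these counts the point estimates are RD = 0.10, RR = 2.0, and OR = 2.25.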
Our new metric adds AE duration to incidence, yielding a two-dimensional plot for both drug and placebo and providing a novel way to look at AE data. This metric more accurately reflects the impact of AEs on patients, offers a more robust understanding of safety risks for drug relative to placebo, and enables quantification of the drug-specific side effect burden, as measured by the absolute prevalence of AEs and by the drug-placebo difference in AE prevalence. After piloting this metric in lurasidone and ulotaront RCTs in schizophrenia, we replicated our results in RCTs of other drugs and indications.
It must be noted that in RCTs, some subjects drop out from treatment or study follow-up because of lack of efficacy, AEs, or loss to follow-up, among other reasons [44]. Dropout due to AEs affects the estimation of a drug’s safety profile because it terminates AE data collection and generates missing values [45–47]. The typical conduct and duration of RCTs often result in incomplete information on AEs lasting beyond the study end date and do not allow the follow-up within the RCT setting that would fully characterize drug tolerability. A recent analysis of clinical trial safety results in ClinicalTrials.gov for FDA-approved drugs revealed that one of the main challenges in using AE data from RCTs for drug safety monitoring is that approximately half of all RCTs have missing data in the published report [48]. The missing data are typically safety-related [49]. Good clinical and research practices require the collection of AE data to begin at study start (initiation of drug or placebo intervention) and to continue until resolution. Follow-up is required for AEs that cause interruption or discontinuation of the study drug and for those present at the end of study treatment (ongoing AEs). It is important for sponsors, investigators, and medical monitors to strive for improved clinical trial reporting practices that increase data quality and limit missing data, particularly for AE end dates that fall within the duration of the RCT. A limitation of our study is the high rate of AEs without an end date in the lurasidone and ulotaront trials, ranging from 17.5% to 45.9%, with the highest rates in the drug and placebo arms of the lurasidone trials. Despite this limitation, we showed that the results of our study, including our new metric, were insensitive to missing data: they did not differ substantially after excluding all AEs without an end date from the analysis. Another limitation is that we did not investigate reasons for study discontinuation and therefore could not assess whether treated subjects discontinued because of inadequate efficacy or AEs in a greater proportion than subjects in the placebo group. We also did not assess whether losses to follow-up differed between intervention arms. Differential losses, together with the cessation of AE monitoring in patients who withdraw from the study, may lead to imbalances in AE rates between arms. Last, our study did not account for long-term or tardive AEs that may not have become apparent until after the RCT end date.
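The missing-end-date sensitivity analysis described above can be sketched by computing prevalence twice. How AEs without an end date were handled in the primary analysis is not stated here; purely as an assumption, the sketch carries such an AE to the subject's last study day, and the comparison run excludes these AEs entirely. The input layout and names are hypothetical, and the toy values are illustrative only.

```python
def prevalence(episodes, exposure_days, drop_missing_end):
    """Percent of subject-days spent with the AE.

    episodes: dict subject_id -> list of (start_day, end_day_or_None) intervals.
    exposure_days: dict subject_id -> days in the trial (hypothetical layout).
    drop_missing_end: if True, exclude AEs without an end date (sensitivity run).
    """
    total_ae_days = 0
    for sid, intervals in episodes.items():
        days = set()
        for start, end in intervals:
            if end is None:
                if drop_missing_end:
                    continue  # sensitivity analysis: exclude AEs without an end date
                end = exposure_days[sid]  # assumption: ongoing until last study day
            days.update(range(start, end + 1))
        total_ae_days += len(days)
    return 100.0 * total_ae_days / sum(exposure_days.values())


# Toy example (invented data): subject 2's AE has no recorded end date.
episodes = {1: [(1, 5)], 2: [(20, None)]}
exposure = {1: 28, 2: 28}
carried_forward = prevalence(episodes, exposure, drop_missing_end=False)  # 25.0
excluded = prevalence(episodes, exposure, drop_missing_end=True)  # about 8.9
```

In this deliberately extreme toy case the two estimates differ (25.0% versus about 8.9%), which is the kind of discrepancy such a check is meant to reveal.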
In summary, our results point to areas for potential improvement in the analysis and reporting of AE data that could benefit patients, physicians, researchers, and regulators. Enhanced focus by sponsors and clinical investigators is needed to ensure the completeness of safety outcome reporting, including information on prevalence and duration. Moreover, careful consideration of harm outcomes and implementation of appropriate statistical methods when designing clinical trials can help identify safety signals and provide a more accurate evaluation of a drug’s benefits and risks.