1 Introduction
The response to a drug or vaccine includes both therapeutic effects and potential adverse drug reactions (ADRs); the magnitude of such effects can be highly heterogeneous across patient subgroups [1]. If responses are significantly associated with known subgroup characteristics, such as age, sex or underlying condition, prescribers can use this information to identify individuals who are more likely to experience ADRs, and thus optimise the benefit–risk ratio for a given patient.
Associations between patient characteristics and ADRs are well understood in some cases and can be used to inform clinical decisions. For instance, the drug rasburicase is used to prevent and treat tumour lysis syndrome, which is an oncological emergency in patients with certain solid tumours or haematological malignancies [2]. However, rasburicase is contraindicated in patients with glucose-6-phosphate dehydrogenase (G6PD) deficiency because of an increased risk of haemolysis (rupturing of red blood cells) [2]. Thus, it is recommended that clinicians screen patients at high risk for G6PD deficiency, such as those of African or Mediterranean ancestry [3]. ADRs can also be associated with the sex of the patient. For example, young adult male patients are at increased risk of myocarditis associated with coronavirus disease 2019 (COVID-19) vaccination [4–6]. Conversely, female sex has been identified as a risk factor for drug-induced QT prolongation and Torsades de pointes [7–9], as well as congenital long-QT syndrome [10]. Additionally, lower doses of the hypnotic agent zolpidem are recommended for women, who eliminate zolpidem more slowly and are more prone to impairment of daytime activities than men [11].
Early in the drug development process, preclinical pharmacodynamic and pharmacokinetic data are used to model the risk of ADRs in human subjects [12–14]. The focus shifts to empirical evidence once a drug or vaccine enters human trials [15, 16]. Sometimes, rare but serious ADRs are recognised only after a drug is approved and marketed, and appropriate steps are then taken to minimise the risk [17].
Spontaneous reporting systems for adverse events (AEs) are the mainstay of postmarketing safety surveillance [18, 19]. Because of the large volume of spontaneous reports, pharmaceutical companies and regulators use quantitative signal detection methodologies, mostly based on disproportionality analysis, to identify potential ADRs, which subsequently undergo a focused clinical review [18, 20–22]. Quantitative signal detection is usually broadly applied to AE reporting datasets and often adjusts for potential confounders based on stratification. However, this one-size-fits-all approach to stratification does not account for all the confounding in spontaneous AE reports and can therefore be misleading. It has been demonstrated that subgroup analyses can perform better than methods adjusted by stratification [21, 23–25] and can potentially address modifying effects that might underlie the AE data. Nevertheless, there is currently a lack of systematic subgroup analyses for first-pass screening and, when employed, subgroup analyses are often limited to specific demographic characteristics [26–28]. Quantitative screening for a broad range of covariates in this context has recently been proposed [29]. This approach could be burdened by limitations specific to spontaneous reporting, such as a lack of certain data needed to characterise subgroups, non-random reporting of specific elements on the spontaneous report, and a low number of AE reports for recently launched products or products with narrow indications or low exposure. However, subgroup analysis could enable safety reviewers to efficiently screen large amounts of data to identify subgroups that may be at greater risk.
In this study, we aimed to examine the extent to which subgroup analysis can serve as a first-pass quantitative signal detection method in screening spontaneous AE reports. We also aimed to examine the potential limitations of spontaneous AE data sources that might influence the ability to identify subgroup statistical signals, such as missing data elements required for subgroup differentiation, and subsequently to point to ways to improve high-risk subgroup identification. To this end, we compiled a reference set of AEs, which we defined as any AE discussed within the context of differentiated subgroup risk in European Medicines Agency (EMA) Pharmacovigilance Risk Assessment Committee (PRAC) meeting minutes from 2015 to 2019. We then applied a recently published quantitative approach for subgroup analysis [29] across a large and diverse dataset of AE reports and examined whether we could detect the reference set of a priori identified AEs.
2 Methods
2.1 US FDA Adverse Event Reporting System (FAERS) Dataset
Cumulative US FDA Adverse Event Reporting System (FAERS) data from 2004 through the second quarter of 2021 were used for the analyses. All records in which the product was reported as suspect or interacting (but not concomitant) were included. The analyses were performed on all events at the Medical Dictionary for Regulatory Activities (MedDRA®; version 24.0) Preferred Term (PT) level and on all mapped products at their active moiety level. The data were standardised and deduplicated. Active moieties were derived, in alignment with the FDA’s definition [30], by Commonwealth Informatics in the same manner as in the commercially available signal management platform Commonwealth Vigilance Workbench.
Specifically, drug names were cleaned by processing the source values through successive mappings, including uppercasing; removing excess whitespace, quotes, parentheses, trailing periods and commas, outer square brackets, braces, etc.; removing certain literals and variants thereof (e.g. ‘tablet’, ‘caplet’, ‘capsule’, ‘unknown’, ‘formulation’, ‘generic’, ‘nos’, etc.); removing units such as ‘mg’ and ‘milligrams’; and changing backslashes to forward slashes. The cleaned verbatim drug names were then mapped to product active ingredients according to known verbatim-to-active-ingredient mappings. Any remaining unmapped verbatim drug names were assigned the literal ‘UNMAPPED’ and excluded from the analysis.
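The cleaning steps described above can be sketched as follows. This is a minimal illustration only, not the Commonwealth Informatics implementation: the function names, the stop-word list (limited to the examples quoted in the text), the unit pattern and the order of the steps are all assumptions.

```python
import re

# Illustrative literals taken from the examples in the text; the full list
# used for FAERS is not given in the source (assumption).
STOP_LITERALS = ["TABLET", "CAPLET", "CAPSULE", "UNKNOWN",
                 "FORMULATION", "GENERIC", "NOS"]
STOP_PATTERN = re.compile(r"\b(" + "|".join(STOP_LITERALS) + r")S?\b")
# Optional numeric strength followed by a unit, e.g. "10 MG" or "5MILLIGRAMS".
UNIT_PATTERN = re.compile(r"\b\d*\.?\d*\s*(MG|MILLIGRAMS?)\b")


def clean_drug_name(raw: str) -> str:
    """Apply the successive cleaning steps to one verbatim drug name."""
    name = raw.upper()
    name = name.replace("\\", "/")               # backslashes -> forward slashes
    name = re.sub(r"[\"'()\[\]{}]", " ", name)   # quotes, parentheses, brackets, braces
    name = UNIT_PATTERN.sub(" ", name)           # units such as 'mg'
    name = STOP_PATTERN.sub(" ", name)           # literals such as 'tablet'
    name = re.sub(r"\s+", " ", name).strip()     # excess whitespace
    return name.rstrip(".,").strip()             # trailing periods and commas


def map_to_active_ingredient(raw: str, known_mappings: dict) -> str:
    """Look up the cleaned name; anything not found is labelled 'UNMAPPED'."""
    return known_mappings.get(clean_drug_name(raw), "UNMAPPED")
```

For example, with a hypothetical one-entry mapping `{"AMBIEN": "ZOLPIDEM"}`, the verbatim value `Ambien caplets` would map to `ZOLPIDEM`, whereas an unrecognised name would be labelled `UNMAPPED` and dropped from the analysis.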
Duplicate detection was performed after all other data transformation and standardisation was complete. A large number of candidate duplicate pairs were initially generated based on a set of simple heuristic rules. These candidate pairs were then scored by implementing a quantitative method based on the hit-miss algorithm previously described [31]. Briefly, the method generates a score correlated to the statistical likelihood that two different reports represent two versions of the same underlying case. Pairs with a score above a selected threshold are considered true duplicates. Finally, the individually identified duplicate pairs are ‘coalesced’ into duplicate groups (consisting of two or more case reports) to address multiple duplicates for a given case report.
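The final coalescing step can be illustrated with a small union-find sketch. It assumes that the pair scores have already been produced by the hit-miss scoring step (which is not reproduced here); the function name, the input format and the threshold value in the usage note are hypothetical.

```python
def coalesce_duplicates(scored_pairs, threshold):
    """Merge duplicate pairs scoring above `threshold` into duplicate groups.

    scored_pairs: iterable of (report_id_a, report_id_b, score) tuples.
    Returns a sorted list of groups, each a sorted list of report ids.
    """
    parent = {}

    def find(x):
        # Find the group representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score > threshold:            # pair is treated as a true duplicate
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = {}
    for report in list(parent):
        groups.setdefault(find(report), set()).add(report)
    return sorted(sorted(group) for group in groups.values())
```

With pairs (A, B), (B, C) and (F, G) above the threshold and (D, E) below it, the reports A, B and C end up in a single duplicate group even though A and C were never paired directly.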
2.2 Reference Set
2.2.1 Initial Reference Set
The PRAC meeting minutes from 2015 to 2019 were downloaded from the EMA website to extract the reference set of positive controls for this study. PRAC meetings aim to evaluate data from all sources, including spontaneously reported suspected ADRs and results from interventional and observational studies that offer important data for signal detection. The PRAC discusses the prioritisation of emerging safety signals and issues recommendations required for their management, such as further investigation or drug labelling changes [32]. The minutes were reviewed independently by two healthcare professionals to identify any discussion of an AE associated with the use of a drug and the potential for a differentiated risk in particular subgroups. Neither the context of the discussion (e.g. signal detection, signal validation, signal assessment or hypothesis testing) nor the trigger (e.g. case reports, clinical trials or epidemiological studies) was considered for the purpose of identifying these subgroup examples.
Only subgroups corresponding to those defined by Sandberg et al. [29] (Table 1) were considered for inclusion in the subgroup analyses. Subgroups mentioning products in development or vaccines were excluded from the reference set as they are rarely listed in FAERS. The included subgroups will be referred to as PRAC subgroup examples.
Table 1 Covariates and corresponding subgroups described by Sandberg et al., and subsets included in our analysis
2.2.2 Mapping of Drugs, Events and Subgroups
The drugs and events discussed in the PRAC subgroup examples did not fully correspond to the drug and medical ontologies used to code the FAERS data. For instance, a group of drugs rather than a specific drug may have been discussed or generic medical nomenclature rather than specific MedDRA® terms may have been used to describe events in PRAC examples. Additionally, the subgroups discussed by PRAC may not be readily identifiable in the AE reporting data set (e.g. pregnancy).
Where needed, events described in PRAC subgroup examples were independently mapped to MedDRA® PTs available in FAERS by two drug safety experts. The two mappings were then jointly reviewed by the experts and consensus was reached.
Where needed, drugs described in PRAC subgroup examples were independently mapped to the active moieties available in FAERS by two drug safety experts. As for events, the two mappings were then jointly reviewed by the experts and consensus was reached.
PRAC subgroups were mapped to subgroups defined by Sandberg et al. [29]. Because the raw narratives were not available in the FAERS data used for this study, identifying cases for the pregnancy subgroup was challenging. Therefore, a slightly adapted algorithm was needed. Upper-case MedDRA® PTs that included the substrings ‘PREGN’, ‘GESTAT’, ‘GRAVID’, ‘MATERN’ or ‘LABOUR’, excluding terms such as ‘Pregnancy test negative’, ‘Pregnancy test false positive’ and ‘Pregnancy test urine negative’, were used to identify potential pregnancy cases. Additionally, the reported case must have concerned a woman between the ages of 15 and 44 years. However, PTs that fell under the MedDRA® High Level Term ‘Unintended pregnancies’ were excluded. This algorithm was used only to identify cases for the pregnancy subgroup and not to identify pregnancy-related AEs.
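The pregnancy case identification rules above can be sketched as follows. This is an illustration under stated assumptions: the excluded PT list contains only the three examples named in the text (the source says ‘such as’, so the real list may be longer), the age bounds are treated as inclusive, sex is assumed to be coded as `"F"`, and each event is assumed to arrive as a (PT, High Level Term) pair. All names are hypothetical.

```python
PREGNANCY_SUBSTRINGS = ("PREGN", "GESTAT", "GRAVID", "MATERN", "LABOUR")
# Only the excluded terms quoted in the text; not necessarily exhaustive.
EXCLUDED_PTS = {
    "PREGNANCY TEST NEGATIVE",
    "PREGNANCY TEST FALSE POSITIVE",
    "PREGNANCY TEST URINE NEGATIVE",
}
EXCLUDED_HLT = "UNINTENDED PREGNANCIES"


def is_pregnancy_pt(pt: str, hlt: str) -> bool:
    """True if an upper-cased PT suggests pregnancy and is not excluded."""
    pt_upper = pt.upper()
    if pt_upper in EXCLUDED_PTS or hlt.upper() == EXCLUDED_HLT:
        return False
    return any(sub in pt_upper for sub in PREGNANCY_SUBSTRINGS)


def is_pregnancy_case(sex, age_years, events) -> bool:
    """events: iterable of (preferred_term, high_level_term) pairs."""
    if sex != "F" or age_years is None or not (15 <= age_years <= 44):
        return False
    return any(is_pregnancy_pt(pt, hlt) for pt, hlt in events)
```

A report for a 30-year-old woman with the PT ‘Gestational diabetes’ would be assigned to the pregnancy subgroup, whereas the same PT on a report with missing age or sex would not, which illustrates how missing structured fields reduce the number of identifiable pregnancy cases.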
2.3 Subgroup Analysis
Subgroup disproportionality scores were computed on the overall FAERS data using the method described by Sandberg et al. [29]. This method is based on the Information Component (IC), a disproportionality measure defined as the binary logarithm of a shrunk ratio of the observed (O) number of reports for a given drug–event combination (DEC) to an expected (E) number of reports estimated from the overall database; it is fully described elsewhere [33]. Briefly, subgroup disproportionality scores were obtained by restricting the O/E ratio computation to the subgroups of interest. No combinations of the subgroup covariates were considered and no further adjustment was performed within the subgroups. Bayesian credibility intervals were computed and the lower limit of 95% credibility intervals was used to set the threshold for signal detection. For subgroup analyses, broader credibility intervals were used [29] than the intervals reported by Norén et al. [33] to control for the rate of spurious associations due to multiple comparisons [34]. The requirements reported by Sandberg et al. were used to identify subgroup signals, followed by sensitivity analyses using algorithm adaptations (Table 2). The analyses were run in Azure Databricks using PySpark.
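As a rough illustration of the subgroup restriction, the sketch below computes the IC point estimate within a single subgroup. It assumes the commonly used shrinkage constant of 0.5 added to both O and E, and an expected count derived from the marginal drug and event counts within the subgroup; neither detail is restated in this paper, and the credibility interval limits are not computed here. The report structure and function names are hypothetical, and this is plain Python rather than the PySpark code used for the study.

```python
import math


def information_component(observed, n_drug, n_event, n_total, shrinkage=0.5):
    """IC point estimate: log2 of the shrunk observed-to-expected ratio."""
    expected = n_drug * n_event / n_total
    return math.log2((observed + shrinkage) / (expected + shrinkage))


def subgroup_ic(reports, drug, event, covariate, value):
    """Restrict the O/E computation to reports where covariate == value.

    reports: list of dicts with keys 'drugs' and 'events' (sets) plus
    covariate fields such as 'sex'. Returns None for an empty subgroup.
    """
    subset = [r for r in reports if r.get(covariate) == value]
    if not subset:
        return None
    n_drug = sum(drug in r["drugs"] for r in subset)
    n_event = sum(event in r["events"] for r in subset)
    observed = sum(drug in r["drugs"] and event in r["events"] for r in subset)
    return information_component(observed, n_drug, n_event, len(subset))
```

Because the shrinkage term pulls the ratio towards 1 when counts are small, subgroups with few reports yield IC values close to 0, and their credibility intervals are correspondingly wide.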
Table 2 Requirements to identify disproportionately reported subgroup DECs
Requirements reported by Sandberg et al. | Algorithm adaptations used in sensitivity analyses |
1. Subgroup IC₀₀₀₅ > 0 and unadjusted entire database IC₀₂₅ ≤ 0 or, for covariates with > 2 subgroups, IC₀₀₅ > 0 simultaneously in ≥ 2 subgroups and unadjusted entire database IC₀₂₅ ≤ 0ᵃ | 1. Unadjusted entire database IC₀₂₅ ≤ 0 ignored 2. Subgroup IC₀₀₀₅ > 0 changed to subgroup IC₀₀₅ > 0 |
2. IC value computed by dividing the subgroup O/E by the adjusted O/E for the remainder of the databaseᵇ with log adjustment and shrinkage (ICΔ) > 1 | 3. ICΔ > 1 changed to ICΔ > 0 or ICΔ not calculableᶜ |
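One possible reading of the requirements in Table 2 is sketched below as a decision function. It assumes that a subgroup is flagged only when both requirement 1 and requirement 2 hold, that the multi-subgroup alternative flags each subgroup whose lower limit is above 0 once at least two subgroups qualify, and that the three adaptations can be switched on independently. The interval limits and ICΔ values are taken as precomputed inputs; all field and parameter names are hypothetical.

```python
def flag_subgroups(subgroups, overall_ic025, ignore_overall=False,
                   relaxed_lower_limit=False, relaxed_delta=False):
    """Return the names of subgroups flagged for one DEC and one covariate.

    subgroups: dict mapping subgroup name to a dict with keys
      'ic0005' and 'ic005' (lower credibility limits) and
      'ic_delta' (float, or None when it is not calculable).
    """
    # Requirement 1, second part: DEC must not be a signal in the whole database.
    if not ignore_overall and overall_ic025 > 0:
        return []

    passing_ic005 = [n for n, s in subgroups.items() if s["ic005"] > 0]
    multi_rule = len(subgroups) > 2 and len(passing_ic005) >= 2

    flagged = []
    for name, s in subgroups.items():
        if relaxed_lower_limit:                      # adaptation 2
            req1 = s["ic005"] > 0
        else:
            req1 = s["ic0005"] > 0 or (multi_rule and s["ic005"] > 0)
        delta = s["ic_delta"]
        if relaxed_delta:                            # adaptation 3
            req2 = delta is None or delta > 0
        else:
            req2 = delta is not None and delta > 1
        if req1 and req2:
            flagged.append(name)
    return flagged
```

Setting `ignore_overall=True` corresponds to adaptation 1, which drops the condition that the DEC must not already be disproportionately reported in the entire database.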
2.4 Assessment of Concordance
Concordance was determined at two different levels: at the subgroup example level, requiring just one of the PRAC subgroup DECs to be detected in FAERS to consider the subgroup example detected; and at the subgroup DEC level, assessing for each PRAC subgroup DEC whether it was detected in FAERS or not. Because the PRAC examples included combinations of covariates (i.e. age and underlying condition, sex and underlying condition, sex and age) but no combinations were considered for the subgroup analysis in FAERS, these examples were considered independently for each covariate. For instance, an example representing age and underlying condition was tested once for age and once for underlying condition.
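The two levels of concordance can be expressed compactly. In this sketch, each PRAC subgroup example is represented by the set of subgroup DECs it maps to, and a DEC is a (drug, event, subgroup) tuple; this representation and the function name are assumptions made for illustration.

```python
def sensitivity_two_levels(examples, detected):
    """Compute sensitivity at the example level and at the DEC level.

    examples: dict mapping example id -> set of (drug, event, subgroup) DECs.
    detected: set of DECs flagged by the subgroup analysis in FAERS.
    Returns (example_level, dec_level) as fractions, or None if undefined.
    """
    n_examples = len(examples)
    n_decs = sum(len(decs) for decs in examples.values())

    # Example level: one detected DEC is enough to count the example.
    examples_hit = sum(1 for decs in examples.values() if decs & detected)
    # DEC level: every PRAC subgroup DEC is assessed individually.
    decs_hit = sum(len(decs & detected) for decs in examples.values())

    example_level = examples_hit / n_examples if n_examples else None
    dec_level = decs_hit / n_decs if n_decs else None
    return example_level, dec_level
```

An example that maps to many DECs counts as detected if any one of them is flagged, which is why sensitivity at the example level can be much higher than at the DEC level.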
4 Discussion
Subgroup analyses can be of vital importance in postmarketing safety surveillance to identify subgroups at higher risk of developing specific ADRs. Currently, both a widely accepted gold standard to assess quantitative signal detection methods [35] and systematic assessment of the extent to which quantitative data mining on spontaneous reports correlates with subgroup safety risk differences are lacking. In this study, we applied a recently published method [29] that describes first-pass screening subgroup analysis for a variety of risk factors, to a large AE dataset. To test this methodology, FAERS data were selected because they include more than 13 × 10⁶ reports, are public domain, are widely used for method testing and contain a diverse set of medications, albeit not vaccines. In the absence of any gold-standard reference set for the subgroup analyses, the PRAC subgroup examples were selected as a reference set. They were chosen because they are externally recognised, are in the public domain and are not reliant on spontaneous reporting. They constitute a valuable independent reference set of safety concerns that warrant discussion by a regulatory body, regardless of future labelling status. To our knowledge, this is the first study to evaluate the Sandberg subgroup method [29] and report on its ability to detect subgroups of potential increased risk across a large, diverse dataset. Our analysis demonstrated that the subgroup methodology detected PRAC subgroup examples in FAERS with a low sensitivity (7% at subgroup example level and 0.1% at subgroup DEC level).
Removing the Sandberg methodology’s requirement that signals not be disproportionately reported overall not only improved the sensitivity (from 7 to 52% at the subgroup example level and from 0.1 to 10% at the subgroup DEC level) but also generated more subgroup signals from FAERS data. It resulted in improved sensitivity for age and sex (detection of 45% and 83% at the subgroup example level and 7% and 25% at the subgroup DEC level, respectively). However, it should be noted that those signals would have been identified as DECs by routine disproportionality analysis and subsequently used by safety reviewers to identify subgroups that were disproportionately reported and potentially responsible for the overall disproportionality. Eighty-one percent of the DECs detected by this adapted subgroup methodology would have been detected by routine overall disproportionality analysis. Conversely, 57% of DECs detected by routine disproportionality analysis would also be detected by the adapted subgroup methodology. In a post hoc analysis, we also assessed the sensitivity after excluding PRAC examples with combinations of covariates or where only one subgroup could be exposed to the drug or experience the event. The examples retained were mainly for age, and the sensitivity at the subgroup example level was reduced to 20% for age and 18% overall. After reviewing the outputs of the post hoc analysis, the low sensitivity observed was mostly attributed to the small number of observed cases in FAERS and the resulting broad credibility intervals.
Candore et al. [20] assessed several overall disproportionality methods using various spontaneous reporting systems and showed a sensitivity ranging from 19 to 46% and a positive predictive value from 10 to 21%. In this study, the sensitivity ranged from 0.1 to 52%; sensitivities were therefore similar to those of overall disproportionality analyses. Positive predictive value could not be calculated because our reference set was not an exhaustive list of positive controls, but it is likely very low. It should be noted that the reference set of positive controls used by Candore et al. [20] and the one used in this study are very different.
The decision of how to group or split covariates into subgroups may affect the analysis. For example, age subgroups defined by Sandberg et al. [29] did not always match the age subgroups mentioned in PRAC examples, potentially diluting the disproportionality. In addition, not combining covariates, when combinations were present in 59% of PRAC examples, ignores the fact that the modifying effect of one covariate may differ by subgroups of the other covariate. A scan test (or more advanced machine learning techniques) could be used to handle these limitations by assessing all meaningful combinations while controlling for multiple testing and not having to define the subgroups a priori.
There are several limitations to our study that should be considered when interpreting concordance. First, the PRAC minutes may not use product active moieties and event MedDRA® PTs to represent drug exposure and adverse events. Translating these to standardised dictionaries with a different granularity, such as MedDRA®, may allow for variability in results. After the mapping, the majority of PRAC subgroup examples concerned a range of active moieties, subgroups and MedDRA® PTs. These multiple entries for drug, event and subgroup may reduce the detection power by resulting in more subgroup DECs with fewer data, or by diluting the effect through mixing subgroups with high exposure effect with subgroups with low exposure effect. This might explain the observed discrepancy between sensitivity at the subgroup example level versus the subgroup DEC level. Furthermore, the PRAC subgroup examples included in this study represent a sample of only 5 years; therefore, the reference set used is not comprehensive and specificity could not be assessed. Moreover, any mention of subgroups that might be at greater risk of developing a particular AE after exposure to a given drug was included in our reference set regardless of whether it was validated. Sensitivity to validated subgroup signals may differ from sensitivity to the reference set we used in our experiment. To some extent, the overall low concordance observed might also be explained by the fact that subgroups discussed by PRAC are based on various data and methods, whereas the method used in this analysis is purely quantitative and does not account for qualitative aspects that are not readily available in structured databases. Additional work would be needed to understand whether traditional methods as used by PRAC could be complemented by quantitative subgroup disproportionality analyses. Additionally, the number of our PRAC examples was small and weighted towards specific subgroups tested (age and underlying condition).
In addition, the use of FAERS data, which mainly cover the US population (whereas PRAC is a European committee that likely uses European data), may have impacted concordance. However, none of the products from the PRAC subgroup examples are exclusively marketed in the European Union, and although healthcare provision and usage might differ, this is unlikely to result in highly different subgroup categorisation in the two geographical regions.
Another limitation resides in the use of spontaneous data. Non-random reporting patterns at the case level, and also at the case attribute level, impact what data are listed or missing on a case report and the way they are recorded. This non-random recording of data in spontaneous AE reports may make it particularly challenging to conduct quantitative analysis of spontaneous data. Subgroups such as underlying condition and pregnancy are captured sporadically and unsystematically in spontaneous AE data, which imposes limitations on subgroup analyses. In our study, when the data were considered regardless of whether they were disproportionately reported in the entire database, the sensitivity for underlying condition was low (10% at subgroup example level and 6% at the subgroup DEC level). The sensitivity for pregnancy was 100% but accounted for only one subgroup example/DEC, for an event that only pregnant women can experience (gestational diabetes). Consequently, this example was excluded from the post hoc analysis. Although we attempted to minimise missing data (e.g. by using indications of drugs to determine underlying condition), the alternative information required was also frequently missing or could introduce bias into the analysis (e.g. due to certain indications of concomitant drugs being more frequently reported than others). In addition, identification of the subgroup of pregnant cases relied on an algorithm based on structured fields and coded events because we could not access free-text fields in FAERS. This limited our capacity to identify pregnant cases and resulted in a low number of such cases. Sandberg et al. [29] did not consider concomitant medication or exposure in pregnancy with risk to the offspring; we therefore excluded 4 and 12 such examples, respectively. Nevertheless, these data might also pertain to this category of covariates, i.e. sporadically and unsystematically reported, and are therefore difficult to assess in spontaneous data because of reporting biases and missing mother–child linkage. Electronic health care records data in these situations could provide additional insights. A logical next step could be to assess whether performance is improved for these covariates when enriching spontaneous data with these relevant observational data [36].
In light of these limitations, we recommend not considering subgroups that meet any of the following criteria for subgroup analyses of spontaneous report data:
- Timebound subgroups (e.g. definitive overlap of exposures for a specific period of time, such as with drug–drug interactions, exposure to a concomitant drug within 60 days of occurrence of an event following another drug exposure, exposure to a drug for at least 1 year). The dates and times are not reliable and are often missing in spontaneous reports data, rendering temporal relationships between subgroup elements difficult to establish.
- Conditional subgroups (e.g. patients with a history of a particular event or patients who take a particular concomitant drug). The rationale for exclusion is that reporting of medical or medication history and concomitant drugs in spontaneous reports data is very sporadic and heavily biased.
- Combination subgroups (e.g. a female under the age of 20 years). Although certain characteristics (e.g. age and sex) are more commonly reported in spontaneous reports data, combining these would potentiate the variability of the results based on the sporadic and non-random reporting of these data elements.
- Subgroups normally missing from spontaneous reports data (e.g. genetic risk factors). The rationale for exclusion is that if the data element is not populated in the database, then it cannot be used to determine allocation to subgroups.
- Subgroups requiring linkage to other records (e.g. in utero exposure and fetal adverse events). There are very few reports of linkage of mother–child records with robust data in spontaneous safety databases.
Alternative approaches have been proposed in the literature. Giangreco and Tatonetti proposed a subgroup method within the paediatric population [37] using a generalised additive model (GAM) approach, which is more technical than simple proportional reporting ratios. Nonetheless, their results do not convincingly suggest that the GAM approach performed significantly better than proportional reporting ratios. In another study, Chandak and Tatonetti created matched cohorts for sex, which could be used to identify differential effects in sex subgroups [38]. They generated propensity scores (PSs) for women, then used them to create PS-matched cohorts of men and women, and subsequently evaluated all drug AEs in both cohorts. While an independent PS model could be created for sex regardless of the drug/AE investigated, it may be driven by factors that are good predictors of sex but have no effect on the risk of the ADR or on the probability of being exposed to the drug, preventing a good adjustment for confounding factors.
5 Conclusions
Overall, we noted apparent low concordance between the Sandberg method applied in a large ADR database and a reference set of PRAC meeting subgroup examples, especially when used as first-pass screening. The performance was improved for variables that are better captured in spontaneous report data, namely age and sex, but covariates such as underlying condition and pregnancy likely require enrichment with alternative data sources. While we have offered some suggestions for future approaches to improve subgroup analyses, further research is needed to assess the optimal combination of data sources, individual characteristics, reference set and statistical methods and thresholds needed to screen subgroups that might be at high risk of ADRs. Ultimately, the nature of spontaneous reports and the application of quantitative approaches, rather than the specific use of subgroup analyses, seemed to limit the ability to identify issues discussed in a regulatory context. Thus, progress to an increasingly personalised view of predictive safety will require a multimodal data approach.