Background
Although early detection and assessment of drug safety signals are important [1–3], post-approval drug safety studies often face challenges such as small study size, rare adverse outcomes, and low exposure prevalence after the launch of a new drug. In addition, nonrandomized studies of treatment effects in healthcare data are vulnerable to confounding bias. Propensity score (PS) methods are increasingly used to control for measured potential confounders, especially in pharmacoepidemiologic studies of rare outcomes in the presence of many covariates from different data dimensions of administrative healthcare databases [4–7]. Methods of selecting variables for PS models based on substantive knowledge have been proposed [8–12], but substantive knowledge is often lacking, and the meaning of various medical codes is often unclear [13]. Seeger et al. proposed that health care claims may serve as proxies in hard-to-predict ways for important unmeasured covariates [14]; Stürmer et al. used PS models with over 70 variables representing medical codes present during a baseline period [5]; and Johannes et al. created a PS model that considered as candidate variables the 100 most frequently occurring diagnoses, procedures, and outpatient medications in healthcare claims [15]. A recently developed strategy for selecting variables from a large pool of baseline covariates for PS analyses is the use of computer-applied algorithms [16, 17], such as the High-Dimensional Propensity Score (hd-PS) algorithm. The hd-PS automatically defines and selects variables for inclusion in the PS estimating model to adjust treatment effect estimates in studies using automated healthcare data [16, 18].
The hd-PS algorithm prioritizes variables within each data dimension (e.g., inpatient diagnoses, inpatient procedures, outpatient diagnoses, outpatient procedures, dispensed prescription drugs) by their potential for confounding control, based on their prevalence and on bivariate associations with the treatment and with the study outcome [16, 19]. Version 1 of the hd-PS algorithm excludes variables found in fewer than 100 patients (exposed and unexposed combined) and variables with zero/undefined covariate-exposure association or zero/undefined covariate-outcome association. Once variables have been prioritized, a predefined number of variables with the highest potential for confounding per dimension is chosen for inclusion in the PS.
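The prioritization step described above can be sketched in a few lines. This is a minimal illustration, not the hd-PS software itself: it assumes the Bross bias multiplier in its commonly cited form, Bias = [P_C1 (RR_CD − 1) + 1] / [P_C0 (RR_CD − 1) + 1], with RR_CD inverted when it is below 1, and ranks covariates by |log(Bias)|. The function names, the dictionary fields, and the example covariates are hypothetical.

```python
import math

def bross_bias(p_c1, p_c0, rr_cd):
    """Bross bias multiplier for one binary covariate (assumed form).

    p_c1, p_c0: covariate prevalence among exposed / unexposed patients.
    rr_cd: covariate-outcome risk ratio (inverted when below 1).
    """
    if rr_cd < 1:
        rr_cd = 1.0 / rr_cd
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)

def prioritize(candidates, k, min_count=100):
    """Return the names of the top-k covariates within one data dimension."""
    scored = []
    for c in candidates:
        # Version 1 rule: drop covariates seen in fewer than 100 patients.
        if c["n"] < min_count:
            continue
        # Drop zero/undefined covariate-exposure associations.
        if c["p_c1"] <= 0 or c["p_c0"] <= 0:
            continue
        # Drop zero/undefined covariate-outcome associations.
        if c["rr_cd"] is None or c["rr_cd"] <= 0:
            continue
        score = abs(math.log(bross_bias(c["p_c1"], c["p_c0"], c["rr_cd"])))
        scored.append((score, c["name"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical candidates, not taken from the study data.
candidates = [
    {"name": "dx_A", "n": 400, "p_c1": 0.20, "p_c0": 0.10, "rr_cd": 2.0},
    {"name": "rx_B", "n": 300, "p_c1": 0.10, "p_c0": 0.10, "rr_cd": 3.0},
    {"name": "rx_C", "n": 50, "p_c1": 0.30, "p_c0": 0.05, "rr_cd": 4.0},
    {"name": "px_D", "n": 250, "p_c1": 0.05, "p_c0": 0.15, "rr_cd": 0.5},
]
top_two = prioritize(candidates, k=2)
```

In this toy input, `rx_C` is dropped by the prevalence rule despite its strong associations, which is the kind of loss that aggregation into a more prevalent group could avoid.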
Combining medications or medical diagnoses into higher-level groupings increases the prevalence of the aggregated covariate, which may increase the chance that the variable is selected by the algorithm. However, aggregation may also weaken covariate-exposure and/or covariate-outcome relations and reduce variable prioritization in the Bross formula [19]. In addition to the selection issue, control for a selected aggregated variable may lead to residual confounding in the adjusted risk ratios if not all of its components have the same confounding effect. No study to date has assessed how hd-PS performance is affected by aggregating medications and/or medical diagnoses, especially in cohorts with relatively few patients, rare outcome incidence, or low exposure prevalence. To investigate the impact of aggregation on hd-PS performance in cohorts with low outcome incidence or exposure prevalence, we created an empirical example based on prior research [16, 20] with an observed elevated crude risk ratio, likely due to confounding by indication, in studies of upper gastrointestinal (UGI) complications in rheumatoid arthritis (RA) or osteoarthritis (OA) patients initiating celecoxib compared to traditional non-steroidal anti-inflammatory agents (tNSAIDs). Celecoxib has been shown to decrease the risk of UGI complications by approximately 50% in several randomized controlled trials (RCTs) [21–26]. We therefore assume that a treatment effect estimate closer to 0.50 is less biased by confounding.
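To make the benchmark assumption concrete, the sketch below expresses the distance of an adjusted risk ratio from the assumed unconfounded value of 0.50 and the percentage change in that distance between two adjustment scenarios. The log-scale definition and the function names are assumptions made for illustration; this section does not state the exact formula used for the percentages reported in the tables, so rounded inputs need not reproduce those values.

```python
import math

def residual_confounding(rr_adjusted, rr_reference=0.50):
    """Distance from the assumed unconfounded RR, on the log scale (an assumption)."""
    return math.log(rr_adjusted) - math.log(rr_reference)

def percent_change(rr_base, rr_alternative, rr_reference=0.50):
    """Percentage change in residual confounding, alternative vs base scenario.

    Negative values mean the alternative estimate lies closer to the reference.
    """
    base = residual_confounding(rr_base, rr_reference)
    alt = residual_confounding(rr_alternative, rr_reference)
    return 100.0 * (alt - base) / base

# Hypothetical comparison of two adjusted estimates against two reference values.
change_ref_050 = percent_change(0.90, 0.85, rr_reference=0.50)
change_ref_040 = percent_change(0.90, 0.85, rr_reference=0.40)
```

Changing the reference value changes the size of the percentage but not its sign, so the ordering of scenarios by closeness to the reference is unaffected as long as all estimates lie on the same side of it.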
Results
In the full cohort, there were 7,197 (38%) celecoxib and 11,632 (62%) ibuprofen or diclofenac initiators with 46 and 71 UGI events, respectively. Celecoxib users were older and had more risk factors for UGI complications than did the tNSAID users (Table 1). The RR for UGI complication associated with celecoxib versus tNSAIDs was 1.05 (95% CI: 0.72-1.52) in the crude model, compared to 0.92 (95% CI: 0.62-1.37) in the model that used hd-PS automated variable selection in addition to the basic covariates (Table 2). Consistent with the sampling procedures described above, the median numbers of patients in cohorts in conditions 1 and 2 were about 3,594 and 1,441, respectively; the median outcome incidence proportions in conditions 3 and 4 were about 0.3% and 0.1%, respectively; and the median exposure prevalences in conditions 5 and 6 were about 19% and 8%, respectively.
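The crude estimate can be checked directly from the reported counts. The sketch below assumes a standard Wald interval on the log risk ratio scale; this section does not state how the interval was computed, and the function name is illustrative.

```python
import math

def crude_risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Crude risk ratio with a Wald confidence interval on the log scale."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Full cohort: 46 events among 7,197 celecoxib initiators,
# 71 events among 11,632 ibuprofen or diclofenac initiators.
rr, lower, upper = crude_risk_ratio(46, 7197, 71, 11632)
print(round(rr, 2), round(lower, 2), round(upper, 2))  # 1.05 0.72 1.52
```

Under this assumption the result agrees with the crude estimate of 1.05 (95% CI: 0.72-1.52) reported in the text.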
Table 1
Characteristics of initiators of celecoxib or NSAIDs (ibuprofen or diclofenac) in a cohort 18–65 years old between 1 July 2003 and 30 September 2004 in the MarketScan database: age at the date of the first medication use and co-morbidities/use of medications as defined during six months prior to the first medication use
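Characteristic | Celecoxib (n) | Celecoxib (%) | Ibuprofen or diclofenac (n) | Ibuprofen or diclofenac (%) |
Age (years) | | | | |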
Median | 56.0 | | 52.0 | |
Mean | 54.1 | | 50.4 | |
Standard deviation | 8.2 | | 9.7 | |
18-35 | 235 | 3.3 | 996 | 8.6 |
36-45 | 854 | 11.9 | 2,164 | 18.6 |
46-55 | 2,373 | 33.0 | 4,339 | 37.3 |
56-65 | 3,735 | 51.9 | 4,133 | 35.5 |
Female | 4,387 | 61.0 | 6,869 | 59.1 |
Hypertension | 1,748 | 24.3 | 2,191 | 18.8 |
Congestive heart failure | 36 | 0.5 | 56 | 0.5 |
Coronary artery disease | 270 | 3.8 | 297 | 2.6 |
Chronic renal disease | 44 | 0.6 | 59 | 0.5 |
Inflammatory bowel disease | 26 | 0.4 | 30 | 0.3 |
Use of gastroprotective drugs | 1,567 | 21.8 | 2,111 | 18.1 |
Use of warfarin | 220 | 3.1 | 128 | 1.1 |
Use of antiplatelet | 143 | 2.0 | 108 | 0.9 |
Use of oral steroids | 963 | 13.4 | 1,356 | 11.7 |
Table 2
Geometric mean of risk ratios and a summary analysis for different cohort size, outcome incidence and exposure prevalence of initiators of celecoxib or NSAIDs (ibuprofen or diclofenac) in a cohort 18–65 years old between 1 July 2003 and 30 September 2004 in the MarketScan database
Full cohort§
| 7197 (38) | 46 (0.64) | 11632 (62) | 71 (0.61) | | | |
Unadjusted | | | | | 1.05 | | |
Basic covariates | | | | | 0.98 | | |
Basic and extended covariates | | | | | 0.95 | | |
Basic and hd-PS covariates | | | | | 0.92 | | 100 |
Basic, extended and hd-PS covariates | | | | | 0.94 | | 100 |
Condition 1: 50% size sample
| 3594 (38) | 23 (0.64) | 5821 (62) | 36 (0.62) | | | |
Unadjusted | | | | | 1.02 | (0.89-1.20) | |
Basic covariates | | | | | 0.96 | (0.84-1.11) | |
Basic and extended covariates | | | | | 0.92 | (0.80-1.09) | |
Basic and hd-PS covariates | | | | | 0.88 | (0.74-1.07) | 65 |
Basic, extended and hd-PS covariates | | | | | 0.89 | (0.74-1.11) | 65 |
Condition 2: 20% size sample
| 1441 (38) | 0 (0.66) | 2325 (62) | 14 (0.60) | | | |
Unadjusted | | | | | 1.10 | (0.89-1.37) | |
Basic covariates | | | | | 1.03 | (0.82-1.29) | |
Basic and extended covariates | | | | | 0.99 | (0.79-1.24) | |
Basic and hd-PS covariates | | | | | 0.94 | (0.71-1.21) | 41 |
Basic, extended and hd-PS covariates | | | | | 0.95 | (0.70-1.25) | 41 |
Condition 3: 50% outcome incidence sample
| 7220 (38) | 23 (0.32) | 11667 (62) | 36 (0.31) | | | |
Unadjusted | | | | | 1.02 | (0.89-1.19) | |
Basic covariates | | | | | 0.96 | (0.84-1.13) | |
Basic and extended covariates | | | | | 0.93 | (0.81-1.09) | |
Basic and hd-PS covariates | | | | | 0.90 | (0.78-1.08) | 65 |
Basic, extended and hd-PS covariates | | | | | 0.91 | (0.78-1.08) | 65 |
Condition 4: 20% outcome incidence sample
| 7233 (38) | 10 (0.14) | 11689 (62) | 14 (0.12) | | | |
Unadjusted | | | | | 1.00 | (0.81-1.37) | |
Basic covariates | | | | | 0.94 | (0.73-1.25) | |
Basic and extended covariates | | | | | 0.91 | (0.69-1.19) | |
Basic and hd-PS covariates | | | | | 0.85 | (0.69-1.17) | 42 |
Basic, extended and hd-PS covariates | | | | | 0.86 | (0.70-1.14) | 42 |
Condition 5: 50% exposure prevalence sample
| 3599 (19) | 22 (0.61) | 15230 (81) | 95 (0.62) | | | |
Unadjusted | | | | | 1.02 | (0.93-1.13) | |
Basic covariates | | | | | 0.94 | (0.86-1.05) | |
Basic and extended covariates | | | | | 0.91 | (0.83-1.02) | |
Basic and hd-PS covariates | | | | | 0.88 | (0.79-0.98) | 81 |
Basic, extended and hd-PS covariates | | | | | 0.88 | (0.79-1.00) | 81 |
Condition 6: 20% exposure prevalence sample
| 1440 (8) | 9 (0.63) | 17389 (92) | 108 (0.62) | | | |
Unadjusted | | | | | 0.97 | (0.77-1.24) | |
Basic covariates | | | | | 0.89 | (0.72-1.15) | |
Basic and extended covariates | | | | | 0.86 | (0.70-1.08) | |
Basic and hd-PS covariates | | | | | 0.89 | (0.73-1.13) | 73 |
Basic, extended and hd-PS covariates | | | | | 0.89 | (0.72-1.14) | 73 |
In all cohort conditions except condition 2, where the total study size was only about 3,790, the geometric means of the hd-PS adjusted risk ratios were similar to the full cohort risk ratios. This similarity held even in cohort conditions 4 and 6, where the number of exposed patients with an outcome event was approximately 10. In all conditions except condition 6, where the exposure prevalence was only 8%, the geometric means of the hd-PS adjusted risk ratios were at least slightly closer to the RCT finding than the geometric means of the risk ratios adjusted for only the basic and extended covariates. A majority of the covariates that hd-PS identified in the full cohort were also selected by hd-PS in the samples in conditions 1, 3, and 5, where the number of exposed outcomes was at least 20, but also in condition 6, where there were only 10 exposed outcomes but a large total number of outcomes.
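Risk ratios from the repeated samples are summarized as geometric means. A minimal sketch of that summary is shown below, assuming the mean is taken over the log risk ratios of the samples within one cohort condition; the function name and the example values are hypothetical.

```python
import math

def geometric_mean(risk_ratios):
    """Geometric mean: exponentiated arithmetic mean of the log risk ratios."""
    logs = [math.log(rr) for rr in risk_ratios]
    return math.exp(sum(logs) / len(logs))

# Hypothetical adjusted risk ratios from four samples of one condition.
sample_rrs = [0.80, 0.95, 1.10, 0.70]
summary = geometric_mean(sample_rrs)
```

Averaging on the log scale treats a halving and a doubling of risk symmetrically, so `geometric_mean([0.5, 2.0])` is 1.0, whereas the arithmetic mean of the same two ratios is 1.25.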
A scenario with combined aggregations of medications into ATC level 4 and of diagnoses into CCS level 1 consistently performed best, reducing residual confounding by 8.9% to 19.3% compared to the base scenario (Tables 3 and 4). Aggregating medications into the chemical, pharmacological or therapeutic subgroups of ATC level 4 slightly improved adjusted estimates in all cohort conditions except condition 4, the 20% outcome incidence samples (data not shown). In contrast, aggregations of medications into groupings of the other ATC levels produced nearly the same or even worse adjusted risk ratios in all cohort conditions.
Table 3
Risk ratios for different cohort size, outcome incidence and exposure prevalence of initiators of celecoxib or NSAIDs (ibuprofen or diclofenac) in a cohort 18–65 years old between 1 July 2003 and 30 September 2004 in the MarketScan database by using the High-Dimensional Propensity Score (hd-PS) adjustment with different aggregation methods
Unadjusted | | 1.05 | | | | | | | | | | | | | | | |
Basic covariates | | 0.98 | | | | | | | | | | | | | | | |
Basic and extended covariates | | 0.95 | | | | | | | | | | | | | | | |
Basic and hd-PS covariates | | 0.92 | 0.94 | 0.93 | 0.92 | 0.92 | 0.90 | 0.91 | 0.88 | 0.90 | 0.89 | 0.92 | 0.92 | 0.94 | 0.95 | 0.94 | 0.85 |
| %§ | | 3.9 | 2.6 | 0.0 | 0.8 | −2.9 | −1.4 | −7.0 | −3.7 | −4.4 | 0.1 | 1.0 | 3.6 | 5.1 | 4.1 | −12.1 |
Basic, extended and hd-PS covariates | | 0.94 | 0.91 | 0.96 | 0.94 | 0.94 | 0.90 | 0.93 | 0.91 | 0.91 | 0.92 | 0.95 | 0.94 | 0.96 | 0.96 | 0.95 | 0.88 |
| %§ | | −5.0 | 3.7 | −0.5 | −0.7 | −6.0 | −1.3 | −5.0 | −4.4 | −2.5 | 1.0 | 0.6 | 3.6 | 4.0 | 2.1 | −10.9 |
hd-PS covariates (k = 500)║
| | | | | | | | | | | | | | | | | |
Outpatient diagnoses (n) | | 136 | 224 | 198 | 177 | 154 | 144 | 133 | 0 | 32 | 90 | 97 | 54 | 123 | 133 | 139 | 34 |
Inpatient diagnoses (n) | | 9 | 12 | 11 | 11 | 9 | 9 | 7 | 0 | 22 | 18 | 19 | 5 | 16 | 14 | 11 | 23 |
Medication (n) | | 167 | 0 | 36 | 76 | 122 | 148 | 177 | 247 | 216 | 186 | 181 | 213 | 171 | 166 | 163 | 194 |
Outpatient procedures (n) | | 152 | 220 | 211 | 194 | 174 | 161 | 148 | 210 | 188 | 166 | 163 | 187 | 153 | 151 | 151 | 206 |
Inpatient procedures (n) | | 36 | 44 | 44 | 42 | 41 | 38 | 35 | 43 | 42 | 40 | 40 | 41 | 37 | 36 | 36 | 43 |
Table 4
Geometric mean of risk ratios for different cohort size, outcome incidence and exposure prevalence of initiators of celecoxib or NSAIDs (ibuprofen or diclofenac) in a cohort 18–65 years old between 1 July 2003 and 30 September 2004 in the MarketScan database by using the High-Dimensional Propensity Score (hd-PS) adjustment with different aggregation scenarios
Condition 1: 50% size sample
Unadjusted | 1.02 | | |
Basic and hd-PS covariates | 0.88 | 0.83 | −9.9% |
Basic, extended and hd-PS covariates | 0.89 | 0.84 | −8.9% |
Condition 2: 20% size sample
Unadjusted | 1.10 | | |
Basic and hd-PS covariates | 0.94 | 0.87 | −12.0% |
Basic, extended and hd-PS covariates | 0.95 | 0.88 | −11.9% |
Condition 3: 50% outcome incidence sample
Unadjusted | 1.02 | | |
Basic and hd-PS covariates | 0.90 | 0.84 | −11.9% |
Basic, extended and hd-PS covariates | 0.91 | 0.85 | −11.3% |
Condition 4: 20% outcome incidence sample
Unadjusted | 1.00 | | |
Basic and hd-PS covariates | 0.85 | 0.81 | −10.4% |
Basic, extended and hd-PS covariates | 0.86 | 0.82 | −9.8% |
Condition 5: 50% exposure prevalence sample
Unadjusted | 1.02 | | |
Basic and hd-PS covariates | 0.88 | 0.81 | −14.4% |
Basic, extended and hd-PS covariates | 0.88 | 0.82 | −12.7% |
Condition 6: 20% exposure prevalence sample
Unadjusted | 0.97 | | |
Basic and hd-PS covariates | 0.89 | 0.79 | −19.3% |
Basic, extended and hd-PS covariates | 0.89 | 0.81 | −16.3% |
When we experimented with different aggregations for diagnoses, without any aggregation for medications, aggregating ICD-9 diagnosis codes into different CCS levels changed the adjusted risk ratios inconsistently. Note that in our empirical setting, not controlling for any measure of diagnoses resulted in the estimate closest to the RCT finding (RRs in column “No Dx” of Table 3). When we aggregated ICD-9 diagnosis codes into CCS levels 1 or 2, the adjusted risk ratios in the samples were generally closer to the RCT finding. In contrast, aggregations of ICD-9 codes into CCS universal, CCS level 3, CCS level 4, or 3- or 4-digit ICD-9 groupings did not improve the adjusted point estimates (data not shown).
Discussion
We hypothesized that aggregations of medications and medical diagnoses into certain levels of ATC or CCS would help the performance of the hd-PS, especially with smaller cohort size, rarer outcome incidence or lower exposure prevalence. To explore these hypotheses, we selected a retrospective cohort where, as has been previously observed, the hd-PS adjustment for confounding yielded an adjusted RR slightly closer to the RCT findings [21–26] than did PS adjustment using a limited number of investigator-predefined covariates [16, 18].
Of the 500 covariates identified by hd-PS in the full cohort, most were also identified by hd-PS in the random samples with fewer observations, rarer outcomes, or lower prevalence of treatments. Aggregations of medications into ATC level 4 alone or in combination with aggregation of diagnoses into CCS level 1 improved the hd-PS adjustment for confounding in the full cohort and most of the samples. The strength of our results on the effect of aggregating diagnoses is limited, however, by the fact that the overall confounding by co-morbidity was attenuated in the presence of 500 hd-PS covariates from medications, outpatient procedures and inpatient procedures in our empirical setting.
In general, aggregation of potential covariates into higher-level groupings increases the number of covariates that are present in at least 100 observations (the default requirement of hd-PS version 1) and increases the prevalence of the covariate in the exposed and unexposed groups, which increases the covariate’s prioritization from the Bross formula if it is associated with treatment [19]. But aggregation may simultaneously weaken covariate-exposure and/or covariate-outcome relations, reducing prioritization in the Bross formula [19]. The latter also has the potential to change the impact of control for the aggregated covariate on the adjusted risk ratios. The hd-PS algorithm theoretically may not favor the aggregation of confounder information. However, in particular cases (e.g., small samples, rare outcome incidence and low exposure prevalence), aggregation may help the hd-PS reduce residual bias, as it did in this study. Version 2 of the hd-PS algorithm, which removed the restriction of a minimum of 100 occurrences per potential confounder, gives important confounders a higher chance of being selected and may improve bias reduction for treatment effect estimates in small samples and at low exposure prevalence.
Grouping medications into ATC level 4 instead of the original generic drugs helped the hd-PS function robustly in the samples, except in the 20% outcome incidence samples (condition 4). The use of other ATC levels for aggregating medications did not provide benefit and even resulted in some harm. For example, ATC level 4 code B01AC (platelet aggregation inhibitors excluding heparin) includes the following level 5 codes: B01AC04 (clopidogrel), B01AC05 (ticlopidine), B01AC07 (dipyridamole), B01AC23 (cilostazol), and B01AC30 (combined drugs). The latter four codes each occurred in fewer observations than the 100-observation minimum that hd-PS requires by default and so would not be eligible for inclusion in the hd-PS adjustment. With ATC level 5 for medications, the hd-PS algorithm selected code B01AC04 (frequency 218, covariate-exposure RR = 1.5, covariate-outcome RR = 3.8; Table 5). Using ATC level 4 for medications, the hd-PS selected ATC level 4 code B01AC, which had a slightly higher frequency (253), the same covariate-exposure association (RR = 1.5) but a slightly weaker covariate-outcome association (RR = 3.3). Situations like this may account for the observed improvement in confounding control in the ATC level 4 aggregation (e.g., RR of 0.83 in the 20% exposure prevalence scenario) compared with scenarios that used ATC level 5 (e.g., RR of 0.88). Additional examples illustrating the changes in prevalence, covariate-exposure and covariate-outcome relations from aggregation of clopidogrel and warfarin from level 5 to ATC levels 4, 3, 2 and 1 are in Table 5. ATC level 4, with its pharmacological subgroups, seems the most appropriate level for aggregation of medications in this study.
Table 5
Changes of prevalence, covariate-exposure and covariate-outcome relations when we aggregated potential confounders, clopidogrel and warfarin, from level 5 to levels 4, 3, 2 and 1 of the Anatomical Therapeutic Chemical (ATC) classification
Generic drug | | Clopidogrel | 218 | once | 1.5 | 3.8 | 0.012 | |
Generic drug | | Clopidogrel | 218 | sporadic | 1.4 | 2.9 | 0.012 | |
Generic drug | | Warfarin | 319 | once | 1.6 | 2.0 | 0.017 | |
Generic drug | | Warfarin | 319 | sporadic | 1.7 | 1.3 | 0.017 | |
ATC level 5 | B01AC04 | Clopidogrel | 218 | once | 1.5 | 3.8 | 0.012 | |
ATC level 5 | B01AC04 | Clopidogrel | 218 | sporadic | 1.4 | 2.9 | 0.012 | |
ATC level 5 | B01AA03 | Warfarin | 319 | once | 1.6 | 2.0 | 0.017 | |
ATC level 5 | B01AA03 | Warfarin | 319 | sporadic | 1.7 | 1.3 | 0.017 | |
ATC level 4 | B01AC | Platelet aggregation inhibitors excluding heparin | 253 | once | 1.5 | 3.3 | 0.013 | |
ATC level 4 | B01AC | Platelet aggregation inhibitors excluding heparin | 253 | sporadic | 1.5 | 2.5 | 0.013 | |
| B01AC04 | Clopidogrel | 218 | | | | | Yes |
| B01AC05 | Ticlopidine | 1 | | | | 0.000 | No |
| B01AC07 | Dipyridamole | 6 | | | | 0.000 | No |
| B01AC23 | Cilostazol | 25 | | | | 0.000 | No |
| B01AC30 | Combinations | 11 | | | | 0.000 | No |
ATC level 4 | B01AA | Vitamin K antagonists | 319 | once | 1.6 | 2.0 | 0.017 | |
ATC level 4 | B01AA | Vitamin K antagonists | 319 | sporadic | 1.7 | 1.3 | 0.017 | |
| B01AA03 | Warfarin | 319 | | | | | Yes |
ATC level 3 | B01A | Antithrombotic agents | 637 | once | 1.5 | 1.5 | 0.034 | |
ATC level 3 | B01A | Antithrombotic agents | 637 | sporadic | 1.6 | 2.0 | 0.034 | |
ATC level 2 | B01 | Antithrombotic agents | 637 | once | 1.5 | 1.5 | 0.034 | |
ATC level 2 | B01 | Antithrombotic agents | 637 | sporadic | 1.6 | 2.0 | 0.034 | |
ATC level 1 | B | Blood and blood forming organs | 1049 | once | 1.4 | 1.4 | 0.056 | |
ATC level 1 | B | Blood and blood forming organs | 1049 | sporadic | 1.4 | 1.9 | 0.056 | |
ATC level 1 | B | Blood and blood forming organs | 1049 | frequent | 1.5 | 2.0 | 0.056 | |
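The roll-up illustrated in Table 5 can be sketched as a prefix truncation of the ATC code followed by a count of distinct patients per group. The prefix lengths follow the standard ATC code structure (1, 3, 4, 5 and 7 characters for levels 1 to 5); the function names, the input layout, and the toy dispensing records are hypothetical and not taken from the study data.

```python
from collections import defaultdict

# Number of leading characters that identify each ATC level.
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}

def roll_up(atc_code, level):
    """Truncate a level 5 ATC code to the requested level."""
    return atc_code[:ATC_PREFIX_LENGTH[level]]

def patients_per_group(dispensings, level):
    """Count distinct patients per ATC group at the requested level.

    dispensings: iterable of (patient_id, level-5 ATC code) pairs.
    """
    groups = defaultdict(set)
    for patient_id, atc_code in dispensings:
        groups[roll_up(atc_code, level)].add(patient_id)
    return {group: len(patients) for group, patients in groups.items()}

# Toy records: patient 1 received both clopidogrel and dipyridamole.
dispensings = [
    (1, "B01AC04"), (1, "B01AC07"), (2, "B01AC04"), (3, "B01AA03"),
]
level5 = patients_per_group(dispensings, 5)
level4 = patients_per_group(dispensings, 4)
level3 = patients_per_group(dispensings, 3)
```

Counting distinct patients rather than summing component counts means that a patient who received two drugs from the same group is counted once. That would be consistent with Table 5, where B01AC has a frequency of 253 although its listed components sum to 261.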
As for diagnostic codes, ICD-9 code 530.1 includes 530.11 (reflux esophagitis) and the additional codes 530.10 (esophagitis unspecified), 530.12 (acute esophagitis) and 530.19 (other esophagitis). In our study, the latter three codes each occurred in fewer observations than the 100-observation minimum that hd-PS requires by default and so would not be eligible for inclusion in the PS adjustment. With 5-digit granularity for diagnoses, the hd-PS selected ICD-9 code 530.11 (frequency 165, covariate-exposure RR = 1.3, covariate-outcome RR = 5.0; see Additional file 1: Table S6). Using 4-digit granularity for diagnoses, the hd-PS selected ICD-9 code 530.1 (esophagitis), which had a higher frequency (217) but slightly weaker covariate-exposure (RR = 1.2) and covariate-outcome (RR = 4.6) associations. Situations like this could account for the slight worsening of confounding control in the 4-digit ICD-9 aggregation compared with the base case (up to 5-digit ICD-9). Additional examples illustrating the changes in prevalence, covariate-exposure and covariate-outcome relations when we aggregated potential confounders, ICD-9 codes 530.11 (reflux esophagitis) and 530.81 (esophageal reflux), from 5-digit ICD-9 into 4- and 3-digit ICD-9 and CCS levels 4, 3, 2 and 1 are in Additional file 1: Table S6. It is worth noting that not all ICD-9 diagnosis codes have equivalent CCS codes at all 4 levels [32]. This issue was more pronounced in CCS levels 3 and 4. Using the most granular CCS code available for each ICD-9 code in the universal CCS did not improve results in most samples or in the full cohort. We also did not observe any benefit when aggregating ICD-9 codes into their first 3- or 4-digit groupings [16, 31]. Since CCS has only 18 groupings for level 1 and 134 groupings for level 2, it could be argued that the benefit from aggregation comes about by enabling more variables from the other data dimensions (medications, inpatient and outpatient procedures) to fit within the 500-variable maximum in the hd-PS default. To address this concern, we also experimented with a maximum of k = 3,000 variables and consistently observed the benefit of aggregation of ICD-9 into CCS levels 1 or 2. Similarly, ATC level 1 has 14 groups, whereas level 4 has over 800 groupings, but aggregation of medications into ATC level 4 outperformed aggregation into level 1.
Our study has several limitations. This study empirically compared estimates from different aggregations and assumed that any treatment effect estimate closer to the RCT findings was less biased by confounding. There was relatively little confounding present in the data, and the effect estimates did not change much after adjustment for the baseline covariates. The magnitude of the percentage reductions in confounding depends on the value selected as the unconfounded value; however, the precise value selected from the published RCTs [21–26] does not affect the ranking of performance across scenarios. Our comparison relies on the assumption that the codes in the original database are accurate. Also, our study is based on a single cohort in which hd-PS performed reasonably well. Fully specified simulations with true risk ratios in diversified scenarios could be used to prove the advantage of aggregation under certain conditions but would be unable to answer the important question of magnitude in real-world settings. It is also unclear whether our findings regarding the effects of aggregation of medications and diagnostic codes on the performance of the hd-PS algorithm apply to other treatment-outcome pairs that may be subject to confounding by different factors. Studies with few events or small size may suffer from small-sample bias or from overfitting of PS models and of outcome models that use PS deciles to estimate adjusted risk ratios [36, 37]. The small number of UGI complication cases produced imprecise estimates and insufficient power to confirm differences between the methods. The computing time required by the hd-PS algorithm constrained our ability to increase the number of samples beyond 100 for each cohort condition. We thus should compare results with and without aggregation within each condition, but not across conditions. However, we are interested in bias, which pertains to expected values of point estimates, and statistical significance plays no defensibly useful role in the assessment or measurement of bias. Moreover, each aggregation scenario had six cohort conditions (600 samples). Thus, consistent patterns (such as that for the combined ATC level 4 plus CCS level 1 scenario) are supported by a large number of samples. Users of the hd-PS methodology should screen for and remove instrumental variables and collider bias candidates [10–12]; this topic is outside the scope of this study.
Further studies may explore examples with no drug effect on the outcome, increased drug-outcome risk, or more common outcomes; compare the aggregation approaches with the zero-cell correction or exposure-based association selection for the hd-PS [38]; develop appropriate methods to replace missing codes in CCS levels and appropriate aggregations for procedures; examine simultaneous aggregation of diagnoses, medications and procedures; and evaluate the hd-PS in cohorts with different cohort size, outcome incidence and exposure prevalence.
Competing interests
All authors declare that they have no competing interests.
Authors’ contributions
HVL, under the supervision of his advisor TS, conceived research questions, developed study design and methods, carried out statistical analysis, interpreted results and drafted the manuscript. CP, MAB, VJS and KJB, who were dissertation committee members, advised on the study design, methods, statistical analyses and manuscript. JBL was responsible for interpretation of results and manuscript preparation. All authors commented on successive drafts, and read and approved the final manuscript.