Identification of important sources of systematic error
Table two of the publication [4] showed the rate ratios comparing cancer incidence rates for ever-users of glyphosate with never-users of glyphosate. Thirty-two cases of multiple myeloma occurred. The rate ratio comparing ever-use with never-use adjusted for only age equaled 1.1 (95% CI 0.5–2.4), whereas the rate ratio adjusted for age, demographic and lifestyle factors, and other pesticide use equaled 2.6 (95% CI 0.7–9.4). This second adjusted estimate of effect inspired a substantial amount of subsequent analysis and inference, so merits careful inspection. As with the alachlor paper, without access to primary data, only a limited bias analysis could be implemented.
The change in the rate ratio from 1.1 when adjusted only for age to 2.6 when adjusted for the additional factors suggests a substantial amount of confounding controlled by the adjustment or a substantial amount of bias introduced by the adjustment (or some combination of the two). The relative risk due to confounding is a measure of the direction and magnitude of confounding controlled by adjustment variables. In this case, the relative risk due to confounding by the additional adjustment variables would have to equal 2.36 to explain the change in the rate ratio from the age-adjusted estimate of 1.1 to the fully adjusted estimate of 2.6. The relative risk due to confounding is bounded by the following limits: (a) the odds ratio associating the exposure with the confounder [$OR_{EC}$], (b) the odds ratio associating the disease with the confounders [$OR_{DC}$], (c) the inverse of the prevalence of the confounder in the unexposed group [$1/P_{C+E-}$], (d) $OR_{DC}/\{(1-P_{C+E-}) + OR_{DC} \cdot P_{C+E-}\}$, and (e) $OR_{EC}/\{(1-P_{C+E-}) + OR_{EC} \cdot P_{C+E-}\}$ [18].
I have assessed the potential for confounding by education to illustrate the value of bounding the relative risk due to confounding. The odds ratio associating glyphosate use with level of education (item a) equaled 1.94, the inverse of the prevalence of the confounder in the unexposed group equaled 1.46 (item c), and item (e) equaled 1.18. While I cannot estimate items b or d from above, I do know that the relative risk due to confounding cannot exceed the minimum of the three bounds that I can estimate (1.18). The observed relative risk due to confounding (2.36) is far in excess of that bound. None of the variables controlled after age, individually or jointly, would be expected to yield a relative risk due to confounding of the size observed (2.36), particularly conditional on the initial adjustment for age.
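The bounding arithmetic for education can be checked with a short calculation. The sketch below is illustrative only: the function and variable names are not from the publication, the prevalence of the confounder among the unexposed is back-calculated from the reported inverse (1.46), and only the three bounds that can be estimated from the published data (items a, c, and e) are included.

```python
def rr_confounding_bounds(or_ec, inv_prev):
    """Return the estimable bounds (items a, c, e) on the relative risk due to confounding."""
    prev = 1.0 / inv_prev  # prevalence of the confounder among the unexposed
    item_e = or_ec / ((1.0 - prev) + or_ec * prev)
    return {"a": or_ec, "c": inv_prev, "e": item_e}


# Values reported in the text for education
bounds = rr_confounding_bounds(or_ec=1.94, inv_prev=1.46)
tightest = min(bounds.values())

# Relative risk due to confounding needed to move the estimate from 1.1 to 2.6
required = 2.6 / 1.1

print({name: round(value, 2) for name, value in bounds.items()})
print("tightest bound:", round(tightest, 2), "required:", round(required, 2))
```

The tightest bound comes from item e, and the required value of 2.6/1.1 sits well above it, which is the comparison the paragraph above relies on.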
Indeed, the authors acknowledged that the change in estimate from 1.1 to 2.6 derived more from restricting the sample size to those without missing data on the adjustment variables than from the adjustment itself. They wrote in the discussion section:
Table 1 shows that the 54,315 subjects were included in the age-adjusted models, whereas because of missing data for covariates, only 40,719 subjects were included in fully adjusted analyses. The association of glyphosate with myeloma differed between the two groups, even without adjustment for any covariates, with no association among the full group and a positive association among the more restricted group.
Table 1
Summary of sources of error, bias parameters, and assigned distributions used in the two bias analyses

| Source of error | Bias parameter | Assigned distribution |
|---|---|---|
| Alachlor Bias Analysis | | |
| number of incident cancer cases | positive predictive value | trapezoidal distribution with minimum = 0.95, maximum = 1.0, and modes of 0.98 and 0.99 |
| | negative predictive value | fixed at 0.99 |
| mismeasurement of cumulative exposure | cumulative exposure value assigned to each category | triangular distributions with minimum and maximum equal to reported bounds of exposure category, and mode equal to its reported midpoint |
| allocation of cases to cumulative exposure categories | number of cases in each exposure category | multinomial centered on the number of cases observed in each category |
| rate ratios assigned to each exposure category | log normal distribution | mean equal to the log of the reported hazard ratio for each category and variance imputed from its 95% confidence interval |
| Glyphosate Bias Analysis | | |
| confounding of the association by all adjustment variables | relative risk due to confounding | trapezoidal distribution with minimum of 1 (no confounding after adjustment for age), lower mode of 1.18 (bound due to confounding by education), upper mode of 1.39 (all variables but state), and maximum 1.56 (all variables, including state) |
| exposure misclassification | sensitivity and specificity of exposure classification | triangular distribution with minimum 0.79, maximum 0.87, mode 0.82 based on agreement proportions |
Unfortunately, the authors did not provide the age-adjusted rate ratio comparing ever-users with never-users restricted to subjects who had complete data, although they apparently calculated that rate ratio. Had they provided it, the reader would be able to judge the extent to which the fully adjusted rate ratio derived from restricting the sample to those without missing data. The authors went on to say in the discussion section:
The increased risk associated with glyphosate in adjusted analyses may be due to selection bias or could be due to a confounder or effect modifer that is more prevalent among this restricted subgroup and is unaccounted for in our analyses.
The prevalence of ever-use of glyphosate was equivalent in the full and restricted samples, so the strength of association between the confounder and the exposure would have to be larger in the restricted data set than in the full data set. That circumstance would describe selection bias, not confounding. The effect modification explanation is also implausible. If all applicators in the restricted sample had the effect modifier, then the rate ratio among the full sample (1.1) would have to equal an inverse-variance-weighted average of the 2.6 rate ratio in the 75% of applicators who had the effect modifier and were included in the restricted analysis and another rate ratio in the 25% of applicators who did not have the effect modifier and were excluded from the restricted analysis. That other rate ratio would have to be an extremely unlikely, strongly protective association between glyphosate use and rate of multiple myeloma among those without this hypothetical effect modifier for the weighted average effect to be null. Other scenarios would be even less plausible. The best explanation is that restricting the data set to those with complete data introduced an important selection bias.
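To give a sense of how extreme that other rate ratio would need to be, the weighted average can be solved for the unknown term. This is a rough sketch under a stated assumption: the weights on the log scale are taken to be the sample shares (0.75 and 0.25), which stand in for the inverse-variance weights that cannot be computed from the published data. The function name is hypothetical.

```python
import math


def implied_excluded_rr(rr_full, rr_restricted, share_restricted):
    """Solve ln(rr_full) = w*ln(rr_restricted) + (1 - w)*ln(rr_excluded) for rr_excluded."""
    w = share_restricted  # assumed weight: share of subjects in the restricted sample
    log_excluded = (math.log(rr_full) - w * math.log(rr_restricted)) / (1.0 - w)
    return math.exp(log_excluded)


rr_excluded = implied_excluded_rr(rr_full=1.1, rr_restricted=2.6, share_restricted=0.75)
print(round(rr_excluded, 3))
```

Under these assumed weights the implied rate ratio among the excluded applicators is roughly 0.08, a protective association of more than tenfold, which illustrates why the effect modification explanation is hard to sustain.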
Misclassification of exposure to glyphosate was a second major source of systematic error, just as it was for the alachlor publication. As for the alachlor analysis, cohort members were originally split into three groups: (a) those with missing data on application of glyphosate, (b) those who reported any application of glyphosate, and (c) those who reported no application of glyphosate. The same concerns about the quality of classification of exposure apply to the glyphosate analysis as they did to the alachlor analysis.
There were 32 cases of multiple myeloma, 75% of whom reported ever using glyphosate. Among all subjects, 79% reported ever using glyphosate. To calculate the expected number of exposed cases and persons, I used reliability data from the Agricultural Health Study reported by Blair and colleagues [14]. Although reliability data are not as informative a measure of validity as true sensitivity or specificity, they can be used as a reasonable approximation, as illustrated with an example in the Blair paper and in our earlier work [15].
Modeling the relative risk due to confounding
Since the relative risk due to confounding exceeded its plausible maximum, I adjusted for the potentially confounding variables in the bias analysis by drawing the relative risk due to confounding from a trapezoidal distribution. I parameterized the trapezoidal distribution using the data in the publication to calculate the bounds on the relative risk due to confounding for each individual confounder. For each potential confounder, the bound on the relative risk due to confounding was ultimately derived from item e. The greatest of these is item e from the education variable, so if adjustment for all variables were attained by adjusting for just this variable, then the relative risk due to confounding would equal its item e ($OR_{EC}/\{(1-P_{C+E-}) + OR_{EC} \cdot P_{C+E-}\} = 1.18$). If the confounding impacts of all of the variables are completely independent, which would require them to be unassociated, then the relative risk due to confounding would equal the product of their individual bounds on the relative risks due to confounding, which equaled 1.56. State is unlikely to be related to risk of multiple myeloma, but disease risk is not available within levels of the confounders, so the association cannot be examined. Recalculating the independent relative risk due to confounding without including the contribution from enrollment state yields an expected relative risk due to confounding of 1.39. I parameterized the trapezoidal distribution of the relative risk due to confounding with a minimum of 1 (no confounding after adjustment for age), a lower mode of 1.18 (the bound on relative risk due to confounding by education), an upper mode of 1.39 (the independent relative risk due to confounding by all variables but state), and a maximum of 1.56 (the independent relative risk due to confounding by all variables, including state).
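Drawing from a trapezoidal distribution with these four parameters can be sketched with inverse-transform sampling. The code below is an illustration, not the software used for the analysis: the function name, the seed, and the number of draws are arbitrary choices, and only the Python standard library is assumed.

```python
import math
import random


def trapezoid_ppf(u, a, b, c, d):
    """Inverse CDF of a trapezoidal distribution with minimum a, modes b <= c, maximum d."""
    height = 2.0 / (d + c - a - b)      # density on the plateau between the two modes
    area_left = height * (b - a) / 2.0  # probability mass of the rising segment
    area_mid = height * (c - b)         # probability mass of the plateau
    if u < area_left:
        return a + math.sqrt(2.0 * u * (b - a) / height)
    if u <= area_left + area_mid:
        return b + (u - area_left) / height
    return d - math.sqrt(2.0 * (1.0 - u) * (d - c) / height)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
rr_confounding_draws = [
    trapezoid_ppf(rng.random(), 1.0, 1.18, 1.39, 1.56) for _ in range(10_000)
]
mean_draw = sum(rr_confounding_draws) / len(rr_confounding_draws)
print(min(rr_confounding_draws), max(rr_confounding_draws), mean_draw)
```

Each draw lies between 1 and 1.56, with the values between the two modes (1.18 and 1.39) equally likely and the tails tapering linearly toward the minimum and maximum.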
Modeling exposure misclassification
Blair et al. reported agreement for report of exposure to eleven pesticides ranging from 0.79 to 0.87 [14], with a value of 0.82 for glyphosate. I used these agreement proportions as the minimum, maximum, and mode of a triangular distribution for both sensitivity and specificity of glyphosate exposure classification. For each iteration, one value was drawn from the distribution to represent the sensitivity of exposure classification and a second value was drawn independently from the distribution to represent the specificity of exposure classification.
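The two independent draws can be sketched with the standard library's triangular sampler. The helper name and the seed are illustrative assumptions; the minimum, maximum, and mode are the agreement proportions quoted above.

```python
import random


def draw_sens_spec(rng, low=0.79, high=0.87, mode=0.82):
    """Draw sensitivity and specificity independently from the same triangular distribution."""
    sensitivity = rng.triangular(low, high, mode)
    specificity = rng.triangular(low, high, mode)
    return sensitivity, specificity


rng = random.Random(7)  # arbitrary seed, for reproducibility only
sens, spec = draw_sens_spec(rng)
print(sens, spec)
```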
The bias analysis required that I calculate the probability that a person classified as exposed was truly exposed and the probability that a person classified as unexposed was truly unexposed. These are predictive values (positive and negative, respectively), not sensitivity and specificity. To calculate the predictive values, I used the following equations relating predictive values to sensitivity and specificity:

$$PPV_i = \frac{s\,p_i}{s\,p_i + (1-t)(1-p_i)}, \qquad NPV_i = \frac{t\,(1-p_i)}{t\,(1-p_i) + (1-s)\,p_i}$$

where PPV and NPV are the positive and negative predictive values, s is the sensitivity of exposure classification, t is the specificity of exposure classification, p is the prevalence of glyphosate exposure, and i is an index for the four combinations of exposure status (glyphosate exposed and unexposed) with disease status (cases and persons at risk).
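The standard relations between predictive values and sensitivity, specificity, and prevalence (an application of Bayes' theorem) can be written as a small function. This is a minimal sketch: the function name is hypothetical, and the inputs in the example are the modal agreement value (0.82) for both sensitivity and specificity and the 75% exposure prevalence reported among cases.

```python
def predictive_values(s, t, p):
    """Return (PPV, NPV) given sensitivity s, specificity t, and exposure prevalence p."""
    ppv = (s * p) / (s * p + (1.0 - t) * (1.0 - p))
    npv = (t * (1.0 - p)) / (t * (1.0 - p) + (1.0 - s) * p)
    return ppv, npv


ppv_cases, npv_cases = predictive_values(s=0.82, t=0.82, p=0.75)
print(round(ppv_cases, 2), round(npv_cases, 2))
```

With these inputs the positive predictive value is about 0.93 and the negative predictive value about 0.60, so a report of non-use is considerably less trustworthy than a report of use when most of the group is exposed.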
Quantitative Analysis
To conduct an iteration of the bias analysis, I drew a value of the sensitivity and a value of the specificity from the triangular distribution, then calculated the positive and negative predictive values for cases and persons at risk, conditional on the observed prevalence of glyphosate exposure. I used a random binomial distribution to model the number of expected exposed cases and persons and the number of expected unexposed cases and persons. For example, I drew the number of expected exposed cases using the random binomial distribution with 24 trials (the number of observed exposed cases) and the positive predictive value for cases as the probability of a success. This procedure would yield between 0 and 24 exposed cases, and the difference between the number of modeled exposed cases and 24 would be reclassified as unexposed cases. Similarly, I drew the number of expected unexposed cases using the random binomial distribution with 8 trials (the number of observed unexposed cases) and the negative predictive value for cases as the probability of a success. This procedure would yield between 0 and 8 unexposed cases, and the difference between the number of modeled unexposed cases and 8 would be reclassified as exposed cases. The total number of exposed cases would equal the number modeled as correctly classified as exposed (out of 24) plus the number modeled as incorrectly classified as unexposed (out of 8). I used a similar procedure to model the total number of persons at risk, but using that population size and the predictive values calculated using the prevalence of glyphosate exposure in the total population. I then calculated the modeled risk ratio and divided it by the crude risk ratio ((24/40,376)/(8/13,280) = 0.99) to obtain an estimate of the bias due to misclassification.
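One iteration of the reclassification procedure can be sketched as follows. Several details are assumptions made for illustration rather than features of the original analysis: the function names, the seed, the use of a sum of Bernoulli trials for the binomial draw (to stay within the standard library), the exposure prevalence computed from the counts passed in, and the choice to return no bias estimate when a reclassified cell is empty.

```python
import random


def predictive_values(s, t, p):
    """Return (PPV, NPV) given sensitivity s, specificity t, and exposure prevalence p."""
    ppv = (s * p) / (s * p + (1.0 - t) * (1.0 - p))
    npv = (t * (1.0 - p)) / (t * (1.0 - p) + (1.0 - s) * p)
    return ppv, npv


def binomial(rng, n, prob):
    """Binomial draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < prob for _ in range(n))


def reclassify(rng, n_exposed, n_unexposed, s, t):
    """Model the true exposed and unexposed counts from the observed counts."""
    p = n_exposed / (n_exposed + n_unexposed)  # observed exposure prevalence
    ppv, npv = predictive_values(s, t, p)
    kept_exposed = binomial(rng, n_exposed, ppv)      # classified exposed, truly exposed
    kept_unexposed = binomial(rng, n_unexposed, npv)  # classified unexposed, truly unexposed
    exposed = kept_exposed + (n_unexposed - kept_unexposed)
    unexposed = kept_unexposed + (n_exposed - kept_exposed)
    return exposed, unexposed


def misclassification_bias(rng, s, t, cases=(24, 8), persons=(40376, 13280)):
    """Return (bias, exposed cases, unexposed cases) for one iteration."""
    a1, a0 = reclassify(rng, cases[0], cases[1], s, t)
    n1, n0 = reclassify(rng, persons[0], persons[1], s, t)
    if min(a1, a0, n1, n0) == 0:
        return None, a1, a0  # assumption: no estimate when a cell is empty
    modeled_rr = (a1 / n1) / (a0 / n0)
    crude_rr = (cases[0] / persons[0]) / (cases[1] / persons[1])
    return modeled_rr / crude_rr, a1, a0


rng = random.Random(11)  # arbitrary seed, for reproducibility only
bias, exposed_cases, unexposed_cases = misclassification_bias(rng, s=0.82, t=0.82)
print(bias, exposed_cases, unexposed_cases)
```

In a full analysis the sensitivity and specificity would be redrawn for every iteration; fixing both at 0.82 here only keeps the example short.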
To combine the bias analyses accounting for confounding and misclassification, I multiplied the conventional age-adjusted hazard ratio reported in the paper (1.1) by each iteration's relative risk due to confounding and its misclassification bias to obtain the hazard ratio ($HR_s$) simultaneously accounting for both threats to validity. To account for random error, I calculated the standard error of the hazard ratio (SE) using the exposed and unexposed case counts generated by the misclassification procedure, drew a random standard normal deviate (z), and then calculated

$$\ln(HR_R) = \ln(HR_s) - z \cdot SE$$

to yield the natural logarithm of the hazard ratio incorporating random error as well as systematic error ($HR_R$).
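The combination step for a single iteration can be sketched as below. The text does not give the formula for the standard error, so the square root of the summed reciprocal case counts is used here as an assumption; the function name, the seed, and the example inputs (a relative risk due to confounding of 1.18, a misclassification bias of 1.0, and 20 exposed and 12 unexposed modeled cases) are likewise placeholders chosen for illustration.

```python
import math
import random


def combine(hr_age_adjusted, rr_confounding, misclass_bias,
            exposed_cases, unexposed_cases, z):
    """Return (HR_s, HR_R): systematic-error-only and systematic-plus-random-error estimates."""
    hr_s = hr_age_adjusted * rr_confounding * misclass_bias
    # Assumed form of the standard error, based on the modeled case counts only
    se = math.sqrt(1.0 / exposed_cases + 1.0 / unexposed_cases)
    ln_hr_r = math.log(hr_s) - z * se
    return hr_s, math.exp(ln_hr_r)


rng = random.Random(3)   # arbitrary seed, for reproducibility only
z = rng.gauss(0.0, 1.0)  # random standard normal deviate
hr_s, hr_r = combine(1.1, 1.18, 1.0, 20, 12, z)
print(hr_s, hr_r)
```

Repeating this over many iterations, each with fresh draws of the bias parameters and of z, yields a distribution of estimates from which a median and a simulation interval can be read.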