Background
Formaldehyde (FA) is an important industrial chemical. Production in the U.S. and the European Union exceeds 10 million tons per year [1]. Resins based on FA are used to produce adhesives and binders (e.g., for the manufacture of particle board, paper, and vitreous synthetic fibers) and to make plastics and coatings, and FA is also used in textile finishing [2]. FA is an intermediate in the production of many chemicals, and as formalin it is used as a disinfectant and preservative. In addition, FA is produced in combustion, e.g., in vehicle exhausts and tobacco smoke [2]. FA is also formed endogenously in humans [3].
In 2004, the International Agency for Research on Cancer (IARC) reclassified FA from a probable (Group 2A) [4] to a known human carcinogen (Group 1) [1], citing results for nasopharyngeal cancer (NPC) mortality from the follow-up through 1994 of the National Cancer Institute (NCI) formaldehyde cohort study [5]. Based on the same NCI findings, the Group 1 classification was upheld by IARC following the working group meeting for IARC Monograph Volume 100F [2]. Subsequently, the U.S. National Institute of Environmental Health Sciences National Toxicology Program changed the classification of formaldehyde from “anticipated to be carcinogenic in humans” to “known to be a human carcinogen” [6].
In contrast, in 2012, the Committee for Risk Assessment of the European Chemicals Agency disagreed with the proposal to classify FA as a known human carcinogen (Carc. 1A), proposing a lower but still protective category, namely as a substance which is presumed to have carcinogenic potential for humans (Carc. 1B). Thus, U.S. and European regulatory agencies currently disagree about the potential human carcinogenicity of FA. An overview of open issues and scientific discussions about the health effects of FA exposures is given in Bolt and Morfeld [7].
The National Cancer Institute formaldehyde cohort study
In June 2013, the NCI published the findings of its update through 2004 of mortality from solid tumors among workers in the US industry-wide FA study [8]. This study includes 10 plants and represents the largest cohort study of workers with potential exposure to FA [9]. The purpose of the Beane Freeman et al. update was to extend the mortality follow-up through 2004 and to examine the associations between different exposure characterizations and mortality from several solid tumors. This study also included corrections by Beane Freeman et al. [10] to the earlier update of mortality through 1994 published in 2004 [5]. For an evaluation of the errors that led to these corrections, see Issues 1 and 2 in Marsh et al. [11]. Beane Freeman et al. [8] claim that a persistent increased risk remains for NPC mortality within the updated cohort associated with peak, average intensity and cumulative FA exposure metrics as reported in Hauptmann et al. [5], although this NPC risk was not reported by Blair et al. [9] in the original FA cohort analysis based on follow-up through 1979. The main conclusion from Beane Freeman et al. [8] is that the update through 2004 suggests a link between FA exposure and NPC mortality that is consistent with some case–control studies [12–17]. Apart from elevated, but not statistically significant, rate ratios for salivary gland cancer mortality, the authors observed no associations with mortality from other cancer types reported in other studies, including lung, laryngeal, nasal sinus and brain [2, 4].
In 2013, two of us (GM, PM) published a commentary [11] describing why we believe NCI’s interpretation regarding the persistent NPC risk is not consistent with available epidemiological evidence including: (1) data from the most recent update of the NCI cohort study [8]; (2) other large and recently updated cohort studies of FA-exposed workers [18–21]; (3) alternative analyses of the 1994 update of the NCI cohort study [22–24]; and (4) the independent study of one of the NCI’s study plants (Plant 1) [25]. Plant 1, which historically has included the majority of the NPC deaths observed in the NCI cohort [5, 9], was also the focus of our reanalyses of the 1994 update of the NCI cohort [22, 23]. Plant 1, a plastics producing plant operating since 1943 in Wallingford, CT, includes 4261 workers, or 17 % of the total NCI cohort of 25,619 workers. Regarding potential for FA exposure, Table 1 shows that workers in the Plant 1 cohort had a median average intensity of exposure (AIE) of 1.023 ppm compared to a range of median AIEs of 0.08 to 2.799 ppm for Plants 2–10.
Table 1
Selected characteristics and findings for 10 plants in 2004 update of NCI formaldehyde cohort study
Plant | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Entry year | 1943 | 1945 | 1949 | 1958 | 1957 | 1951 | 1938 | 1934 | 1956 | 1941 |
No. Subjects | 4261 | 784 | 2375 | 1692 | 744 | 5248 | 4228 | 1679 | 1933 | 2675 |
Formaldehyde exposure | | | | | | | | | | |
% Subjects ever exposed | 87.7 | 99.9 | 92.7 | 93.3 | 64.4 | 91 | 81.6 | 99.3 | 88.2 | 95 |
% Subjects ever in highest peak category | 46.1 | 91.6 | 0 | 72.9 | 20.4 | 2 | .4 | 1.1 | 9.3 | 69.7 |
Median AIE (ppm) a | 1.023 | 2.799 | .112 | .234 | .196 | .233 | .080 | .382 | .400 | .543 |
(5–95 %-tile) | .310–1.417 | .300–3.927 | .010–.222 | .100–.596 | .029–1.132 | .033–.868 | .020–.250 | .100–2.000 | .100–1.615 | .216–1.124 |
Median Cum (ppm-years) a | .9 | 19.0 | .1 | 2.2 | 1.9 | .7 | .1 | .6 | .3 | 1.3 |
(5–95 %-tile) | .1–17.2 | .4–86.5 | .01–2.1 | .06–11.9 | .08–27.5 | .01–16.3 | .01–3.5 | .03–12.0 | .03–5.9 | .05–16.4 |
Median Dur (years) a | 1.0 | 11.3 | 1.1 | 9.7 | 16.7 | 3.6 | 1.0 | 1.0 | .8 | 2.3 |
(5–95 %-tile) | .1–24.4 | .3–30.7 | .1–20.3 | .4–29.5 | 1.0–34.4 | .1–31.3 | .1–28.0 | .1–25.0 | .09–16.5 | .1–29.2 |
Observed and expected deaths and SMRs for NPC | | | | | | | | | | |
Obs | 6 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
SMR-US (Exp) | 5.44** (1.1) | 4.32 (.2) | 3.01 (.7) | - (.4) | - (.2) | - (1.1) | .93 (1.1) | - (.4) | - (.3) | 1.21 (.8) |
(95 % CI) | 2–11.85 | .11–24.08 | .36–10.87 | 0–9.03 | 0–18.63 | 0–3.42 | .02–5.18 | 0–9.39 | 0–12.79 | .03–6.72 |
SMR-local (Exp) | 5.57** (1.1) | 4.03 (.2) | 7.60 (.3) | - (.5) | - (.2) | - (1.3) | 1.24 (.8) | - (0) | - (.4) | 1.01 (1.0) |
(95 % CI) | 2.04–12.12 | .10–22.48 | .92–27.46 | 0–7.30 | 0–21.09 | 0–2.82 | .03–6.89 | 0–90.04 | 0–10.14 | .03–5.63 |
Siew et al. [26] analyzed a study cohort of all 1.2 million economically active Finnish men born between 1906 and 1945 who participated in the national population census on December 31, 1970. The Finnish job-exposure matrix (FINJEM) was used to calculate occupational FA exposure estimates [27]. The authors analyzed 149 NPC cases and found no association with FA exposure. Although the exposure assessment is limited in this investigation, the large register-based study by Siew et al. adds to the cohort studies that showed no elevated NPC risk after FA exposure.
Checkoway et al. [28] performed a re-analysis of the NCI cohort and evaluated associations between cumulative and peak formaldehyde exposure and lympho-hematopoietic malignancies, in particular myeloid leukemia. The authors did not address NPCs. We note that the US National Institute of Environmental Health Sciences National Toxicology Program judged in their decision on FA that “the evidence for nasopharyngeal cancer is somewhat stronger than that for myeloid leukemia” [6]. Thus, it is of specific interest to examine whether the “stronger evidence” for NPC is robust and can be confirmed or refuted in a re-analysis of the updated NCI cohort study [8].
Main methodological issues
Our recent commentary also described several methodological issues in the most recent update of the NCI study that formed the basis for our reanalysis of the updated NCI cohort data on mortality from NPC [11]. In this paper, we addressed three methodological issues: (Issue 1) the inappropriateness of excluding unexposed workers from the evaluation of exposure-response relationships; (Issue 2) the use of trend tests in the NCI 2004 updates that produce misleading results and may be mis-specified; and (Issue 3) the failure to recognize the important interaction structure between plant group (i.e., Plant 1 vs. Plants 2–10) and FA exposure reported by Marsh et al. [23]. We report here our updated reanalysis of the relationship between FA exposure and mortality from NPC using data from the 2004 update of the NCI FA cohort study.
Discussion
In this paper, we challenged NCI’s claim that an increased mortality risk for nasopharyngeal cancer (NPC) in relation to formaldehyde (FA) exposure persisted in their 2004 update of the FA cohort [8]. As we demonstrated in our re-analyses of the 1994 update of the NCI FA cohort [22, 23], and again here, NCI’s claim of a persistent NPC risk stemmed from the use of inappropriate and non-robust statistical analysis methods. The foundation of our current reanalyses was three of the six methodological issues presented earlier: the inappropriateness of excluding unexposed workers from exposure-response evaluations; improper trend tests; and the failure to recognize the important interaction structure between Plant 1 and Plants 2–10 [11].
Our reanalyses included external mortality comparisons via SMRs, in which we compared NPC rates among workers with the corresponding NPC rates of the general populations of both the U.S. and the regional CT area. This enabled comparison with NCI’s SMRs, which were based only on U.S. rates, and provided new data that accounted for geographic variability in NPC rates. Our reanalyses also included comparisons of NPC mortality among subgroups of workers defined by FA exposure level. In these exposure-response evaluations, we fit relative risk regression models in which subgroups of workers with higher FA exposure were compared to workers with lower or no FA exposure.
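The external comparison described above can be illustrated with a minimal Python sketch that computes an SMR as observed over expected deaths, together with an exact Poisson 95 % confidence interval. The function names (`poisson_cdf`, `smr_exact_ci`) are hypothetical and are not taken from any software used in the study, and the exact Poisson method is an assumption about how such intervals are typically obtained. The inputs are the rounded Plant 1 counts from Table 1 (6 observed, 1.1 expected), so the result can differ slightly from the tabulated values, which were presumably based on an unrounded expected count.

```python
import math


def poisson_cdf(k, mu):
    """Return P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(X = i) computed from P(X = i - 1)
        total += term
    return total


def _root_of_decreasing(f, lo, hi, iterations=200):
    """Bisection for f(mu) = 0, where f is decreasing and f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def smr_exact_ci(observed, expected, alpha=0.05):
    """Return (SMR, lower, upper) using exact Poisson limits for the count."""
    # Generous search bound for the Poisson mean (illustrative choice).
    bound = observed + 10.0 * math.sqrt(observed + 1.0) + 20.0
    if observed == 0:
        mu_low = 0.0
    else:
        # Lower limit solves P(X >= observed | mu) = alpha / 2.
        mu_low = _root_of_decreasing(
            lambda mu: poisson_cdf(observed - 1, mu) - (1.0 - alpha / 2.0),
            0.0, bound)
    # Upper limit solves P(X <= observed | mu) = alpha / 2.
    mu_high = _root_of_decreasing(
        lambda mu: poisson_cdf(observed, mu) - alpha / 2.0,
        0.0, bound)
    return observed / expected, mu_low / expected, mu_high / expected


# Rounded Plant 1 values from Table 1 (US rates): 6 observed, 1.1 expected.
smr, low, high = smr_exact_ci(observed=6, expected=1.1)
print(f"SMR = {smr:.2f} (95% CI: {low:.2f}, {high:.2f})")
```

With zero observed deaths the lower limit is zero and only the upper limit is informative, which is why several plants in Table 1 show intervals starting at 0.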
We fit many variations of our models to address the three issues noted above. For example, we used both the lowest FA exposure category (as done by NCI) and the unexposed category (our recommended approach) as the baseline category. We also modeled the continuous forms (i.e., not categorized) of the FA exposure metrics and applied corresponding continuous variable trend tests. This enabled a comparison with NCI, where continuous variable trend tests were inappropriately applied to categorical FA exposure variables. Further, to address the dramatic difference in NPC mortality among workers in Plant 1 vs. Plants 2–10, we fit models that included terms to account for this important interaction structure. To date, NCI has not fit models that account explicitly for this interaction. Finally, because NCI relied on Poisson regression based on asymptotic estimation rather than relative risk regression to evaluate exposure-response relationships for FA and NPC, we fit our models using both asymptotic and exact estimation, the latter being better suited for the small number of observed NPC deaths.
Overall, our reanalyses of the 2004 update of the NCI FA cohort do not support an association between FA exposure and NPC as suggested by Hauptmann et al. [5] and Beane Freeman et al. [8]. Our findings and conclusion also corroborate those presented in our earlier reanalysis of the NCI 1994 FA cohort data, and are now even stronger given that the one additional NPC death observed by NCI occurred in Plant 3 among workers in the lowest exposure category of highest peak, average intensity and cumulative FA exposure and in the second exposure category of duration of exposure. This finding led to: (1) reduced SMRs and RRs in the remaining nine study plants in unaffected exposure categories, (2) attenuated exposure-response relations for FA and NPC for all the FA metrics considered and (3) strengthened and expanded evidence that the internal analyses of Hauptmann et al. [5] and Beane Freeman et al. [8] were non-robust and mis-specified as they did not account for a statistically significant interaction structure between plant group (Plant 1 vs. Plants 2–10) and FA exposure (see Models 8 in Additional file 3: Table S3 and Additional file 4: Table S4).
A specific focus of the internal mortality comparisons was to address our concern about the inappropriateness of omitting unexposed workers from the baseline category in exposure-response analyses (Issue 1). We found that analyses using the lowest FA exposure category as the baseline (NCI approach) produced evidence of an exposure-response relationship for FA and NPC for highest peak and average intensity of FA exposure (the basis of NCI’s conclusion [8]). In contrast, our corresponding analyses using unexposed workers as the more appropriate baseline category yielded lower RRs for the exposure categories and little or no evidence of an exposure-response association for any of the FA metrics considered. Again, NCI’s finding of only one additional NPC death in the lower FA exposure categories contributed to this null finding.
Our internal analyses also addressed NCI’s practice of mixing results of internal mortality comparisons based on categorical analyses with trend tests based on the continuous form of the FA metric considered. More appropriately, our internal analyses matched the results of the analysis (categorical RRs or slope estimates) with the corresponding trend tests based on categorical or continuous (or pseudo-continuous) scores, respectively. While the p-values associated with these two sets of trend tests differed, in most cases these differences were quantitative and the tests consistently rejected or failed to reject the null hypothesis of no association between FA and NPC.
To address Issue 3, we focused on two aspects of risk analysis to explore a possible mis-specification of the models as presented in Beane Freeman et al. [8]: confounder adjustment and interaction assessment. Confounding is understood as defined by Greenland and Robins [43] and as explicated graphically in Greenland et al. [44]. We explored confounding in practice by applying the change-in-estimate criterion [45, 46]. Models 1 and 2 of Additional file 3: Table S3 a, b, c, d gave results about the possible confounding effect of the plant group indicator. Although not pronounced, some confounding was indicated in the peak exposure and average intensity models because the relative risk in the highest exposure category decreased after taking the plant group indicator into account. Using the continuous peak exposure variable, the same tendency can be seen as a somewhat reduced risk estimate after adjustment for plant group in the peak exposure model, but not in the other analyses. Therefore, the statement of Hauptmann et al. [5] that the risk estimates for FA exposure did not change considerably after adjusting for plants is confirmed again in this re-analysis. We observed this in our previous analysis as well [23].
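The change-in-estimate criterion compares an exposure effect estimate before and after adjustment for a candidate confounder. A minimal sketch follows; the function names, the 10 % threshold (a commonly used convention that is assumed here and is not stated in this paper) and the RR values are hypothetical illustrations, not results from Additional file 3.

```python
def change_in_estimate(rr_unadjusted, rr_adjusted):
    """Relative change of the adjusted RR compared with the unadjusted RR."""
    return (rr_adjusted - rr_unadjusted) / rr_unadjusted


def suggests_confounding(rr_unadjusted, rr_adjusted, threshold=0.10):
    """Flag confounding if the RR changes by at least `threshold` (assumed 10 %)."""
    return abs(change_in_estimate(rr_unadjusted, rr_adjusted)) >= threshold


# Hypothetical RRs for a highest exposure category, without and with
# a plant group indicator in the model.
print(change_in_estimate(2.0, 1.7))    # relative change of about -15 %
print(suggests_confounding(2.0, 1.7))  # True under the assumed threshold
print(suggests_confounding(2.0, 1.9))  # False: change of about -5 %
```

Some authors apply the same rule on the log scale; the choice of scale and threshold should be fixed before the adjusted models are inspected.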
Beane Freeman et al. [8] did not perform a risk analysis adjusted for plant or plant group. They performed what they called an “influence analysis” by “excluding one plant at a time”. Such an analysis cannot contrast the findings of Plant 1 vs Plants 2–10 because it does not cover the important case of studying Plant 1 alone. The authors, however, studied Plants 2–10 as a group: “When Plant 1 was excluded, the number of NPC deaths was two in the highest peak exposure category (RR = 3.36, 95 % CI: 0.3, 37.27), one in the highest average intensity category (RR = 4.09, 95 % CI: 0.25, 66.0), and zero in the highest cumulative exposure category.” This can be compared with our findings in Additional file 1: Table S1a-c. The relative estimate is 2.92 (95 % CI: 0.15, 177.22) for the highest peak exposure category, 4.08 (95 % CI: 0.05, 326.39) for the highest average intensity category and 6.74 (95 % CI: 0.32, 428.37) for the highest cumulative exposure category using the low exposure group as the baseline. However, the corresponding relative estimate decreased to 0.43 (95 % CI: 0.02, 7.92) for the highest peak category, 0.42 (95 % CI: 0.01, 9.65) for the highest average intensity category and 0.44 (95 % CI: 0.04, 16.12) for the highest cumulative exposure category with unexposed group as the baseline.
Beane Freeman et al. [8] concluded from their “influence analysis” that they found “. . . no evidence of plant heterogeneity for a broad group of metrics, including peak exposure.” We judge that this statement is wrong. We base our judgement on the findings of our interaction analyses (Issue 3). We begin by stating that the full interaction models (Models 9 in Additional file 3: Table S6 a, b, c, d) showed instabilities: the coefficient for the plant group indicator was always accompanied by a lower 95 % CI limit of –infinity. Accordingly, the likelihood ratio p-values were 100 % for the plant group variable in all analyses with the exception of 42 % when analyzing peak exposures. Thus, it is of interest to reduce the models by dropping the plant group indicator from Models 9. This means forcing the baseline risk of all plants to be the same and then checking for different slopes, although it is usually recommended not to drop main effects if interactions are explored [45]. These reduced models without the main effect of plant group are presented as Models 8 in Additional file 3: Table S3 a, b, c, d. Because the reduced model uses all cases simultaneously (more power than the separate models) and avoids the problem of relying on the very imprecise baseline risk in Plant 1 (disadvantage of the full interaction model), the estimates are more stable: no median unbiased estimates were necessary and no confidence interval limit approached infinity. The interaction terms were found to be significant at the 5 %-level in all analyses (exception: average exposure analysis returned a likelihood ratio p-value of 0.063).
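The difference between the full and the reduced interaction models can be written schematically as follows. This is a sketch under assumptions: a log-linear form for the NPC mortality rate is assumed for illustration, and the symbols are hypothetical notation introduced here ($\lambda$ for the rate, $x$ for the FA exposure metric, $g$ for the plant group indicator with $g = 1$ for Plant 1 and $g = 0$ for Plants 2–10); the exact parameterization of Models 8 and 9 is given in the additional files, not in this section.

```latex
\begin{align}
\text{Full interaction model (Models 9):}\quad
  & \log \lambda = \alpha + \beta x + \gamma g + \delta\,(x \cdot g) \\
\text{Reduced model (Models 8):}\quad
  & \log \lambda = \alpha + \beta x + \delta\,(x \cdot g)
\end{align}
```

In this notation, dropping $\gamma$ forces a common baseline risk $\alpha$ across plant groups, and $\delta \neq 0$ indicates that the exposure slope differs between Plant 1 and Plants 2–10.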
It has been argued that the p-value of the interaction term should be used in the decision process when assessing interactions [47]. A conservative approach, however, was recommended, i.e., comparing the p-value of the interaction term with a cut point clearly higher than the usual significance level of 5 %: keep the interaction terms within the models if their p-values are not higher than 25 % [45]. This recommendation is in line with the statement that “in epidemiological settings, the power to detect statistical interactions is typically an order of magnitude less than the power to detect main effects” [48]. Following this advice, our re-analyses found clear evidence of an interaction effect of all three FA exposure metrics and the plant group indicator which cannot be ignored.
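The decision rule described above can be sketched in a few lines of Python. The sketch assumes a likelihood ratio test with one degree of freedom (a single interaction term), and the function names and log-likelihood values are hypothetical; only the 25 % cut point and the p-value of 0.063 come from the text.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and p-value for one added parameter."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def keep_interaction(p_value, cut_point=0.25):
    """Keep the interaction term if its p-value is not higher than the cut point."""
    return p_value <= cut_point


# Hypothetical log-likelihoods of models without and with the interaction term.
stat, p = lr_test_1df(loglik_reduced=-52.40, loglik_full=-50.65)
print(f"LR statistic = {stat:.2f}, p = {p:.3f}")
print(keep_interaction(p))

# The average exposure analysis reported p = 0.063: not significant at 5 %,
# but retained under the 25 % cut point.
print(keep_interaction(0.063))
```

The closed-form p-value via `math.erfc` only applies to one degree of freedom; testing several interaction terms jointly would need a general chi-square distribution function.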
We conclude from these analyses that there is no NPC risk identified in Plants 2–10 and all effects of formaldehyde that were described in Beane Freeman et al. [8] stem from Plant 1 only. It is curious that Beane Freeman et al. [8] did not follow the advice given in Marsh et al. [23] to perform a regular interaction analysis, but conducted an “influence analysis” (see above). This type of analysis never analyzed Plant 1 alone and was, therefore, unable to judge the degree of heterogeneity between Plant 1 and Plants 2–10. Marsh et al. [11] explained the misinterpretation by Beane Freeman et al. [8] of the previous interaction analyses performed in Marsh et al. [23] and showed that the results presented by Beane Freeman et al. [8] are entirely consistent with the interaction effect observed in Marsh et al. [23].
We emphasize that the current re-analyses strengthen the argument made in Marsh et al. [23] and Marsh et al. [11], that is, we showed a pronounced positive interaction effect (risk modification) by plant group (Plant 1 vs. Plants 2–10), not only for the continuous peak exposure metric but also for average and cumulative exposure and duration of exposure to FA. It follows that the internal modelling approaches presented by Hauptmann et al. [5] were mis-specified and that Beane Freeman et al. [8] did not correct this flaw, but repeated the misleading model set-up.
Competing interests
GM’s, SZ’s, YL’s and LB’s work on this commentary was performed under a sponsored research contract between the University of Pittsburgh and the Research Foundation Health and Environmental Effects (RFHEE), which is a not-for-profit affiliate of the American Chemistry Council. PM’s work was performed under a separate sponsored research agreement between the Institute for Occupational Epidemiology and Risk Assessment of Evonik Industries and RFHEE. Evonik Industries does not produce formaldehyde and has no economic link to the production or use of formaldehyde. The funding agencies played no role in the design, writing, interpretation and conclusions. The decision to submit this manuscript for publication is that of the authors.
Authors’ contributions
GM and PM were the co-investigators of the reanalyses of the 2004 NCI cohort data and earlier served as co-investigators on reanalyses of the 1994 NCI cohort data. They took lead roles in the drafting of the manuscript. SZ and YL were the primary biostatisticians on the project and contributed to the writing and editing of the manuscript. LB also contributed to the writing and editing of the manuscript. All authors read and approved the final manuscript.
GM is Professor of Biostatistics and Director of the Center for Occupational Biostatistics and Epidemiology at the University of Pittsburgh, Graduate School of Public Health. Since the 1980s, he has been involved in epidemiological research on the potential carcinogenicity of formaldehyde, including re-analyses of earlier updates of the NCI formaldehyde cohort and serving as principal investigator of an independent cohort study of workers from one of the NCI study plants.
PM is head of the Institute for Occupational Epidemiology and Risk Assessment of Evonik Industries AG. Evonik Industries and Cologne University have started a public-private partnership to conduct and participate in investigations, research, and analyses relating to the health, safety, and epidemiological aspects of working conditions. The contract between Evonik Industries and Cologne University guarantees freedom of publication of all research work produced by the Evonik Institute. Since his habilitation at Cologne University, PM has taught epidemiology and biostatistics there. PM performed re-analyses of NCI’s industrial cohort formaldehyde study in cooperation with GM.
SZ is Research Specialist V and Senior Biostatistician in the Center for Occupational Biostatistics and Epidemiology at the University of Pittsburgh, Graduate School of Public Health. YL and LB are PhD students in the Department of Biostatistics at the University of Pittsburgh, Graduate School of Public Health.