Background
Chronic obstructive pulmonary disease (COPD) is an incurable lung disease characterized by progressive airflow obstruction involving emphysematous destruction of lung parenchyma and mucus hypersecretion with chronic bronchitis. Over 12 million Americans are affected by COPD, which is the third leading cause of death in the US, [1] and is projected to become the third leading cause of death worldwide [2]. Recent data suggest that emphysema, chronic bronchitis, and COPD hospitalizations are becoming more prevalent among African-Americans (AA), [3–5] and that AA may develop COPD at a younger age than those who racially self-identify as white (WH) [5]. In addition, AA males have one of the highest prevalence rates of smoking (25.5 %) among racial groups in the United States, [6] leading to a predictably growing burden of lung disease in this group. AA individuals present with similar severity of airflow obstruction as WH, despite fewer pack-years of smoking [5]. Once they have developed COPD, AA have lower quality of life scores [7]. Despite these alarming trends, COPD has been understudied in African-Americans.
Race is an important contributor to genetic [8] and epigenetic variability, and recent studies have identified epigenetic association signals that differ between racial groups [9]. Similarly, the results of differential methylation association studies of complex traits in single racial-ancestry cohorts may miss epigenetic risk factors in another racial-ancestry cohort, and may not be generalizable to other racial cohorts at all [10, 11]. Recent methylation studies have shown a subset of methylation signals particular to AA smokers, [9] but to our knowledge epigenetic associations in AA with COPD have not previously been investigated. Understanding the epigenetic associations of smoking and COPD in AA current and former smokers may provide insights into features relevant to COPD-related disparities in AA that may inform treatment within these groups as well as highlight disease pathways applicable to all people with COPD.
DNA methylation patterns are determined at multiple time points in the life of an individual, [12] including in utero during imprinting, during development through tissue-specific methylation, and later through changes in the methylation of genes in response to major environmental exposures. Differential methylation impacts gene regulation, which may lead to clinically relevant changes in disease-related phenotypes. Modules of genes with correlated comethylation profiles may identify groups of genes under similar regulation that are associated with COPD risk. Prior research has identified differential methylation signals related to tobacco smoke exposure that may influence risk for development of COPD [13–18]. The majority of these studies have focused on WH subjects as the largest proportion of their cohorts. Our investigation focused on the identification of differential methylation sites associated with COPD as well as COPD-associated comethylation modules in an AA cohort (the Pennsylvania Study of Chronic Obstructive Pulmonary Exacerbations, PA-SCOPE), with comparison to a separate WH cohort (the International COPD Genetics Network, ICGN). Our hypothesis was that patterns of DNA methylation in AA would identify differentially methylated genes or comethylation networks relevant to COPD in AA that may not be significantly associated in WH cohorts. A better understanding of the epigenetic factors associated with the features of COPD in AA smokers may provide insights into new diagnostic options, drive the discovery and targeting of therapeutics, and improve primary prevention strategies in this susceptible population.
Discussion
Within the PA-SCOPE AA cohort, we identified 5 differentially methylated CpG sites significantly associated with COPD using an FDR of 5 %, and 7 additional associations that approached significance using an FDR of 10 %. We used WGCNA to identify comethylation modules associated with COPD that were enriched for genes related to lung development and immune response and contained biologically relevant genes associated with COPD and lung function. Differentially methylated CpG sites associated with COPD mapped to genes that were biologically plausible candidates for COPD pathogenesis. Notable functions among these genes included NOTCH4-dependent lung angiogenesis, alveolar macrophage response pathways, and airway defense mechanisms targeting bacteria, as detailed below. Our results included genes and CpGs implicated in previous studies of obstructive lung disease and exacerbations, including GRASP and IFNGR2 (previous genetic associations with asthma) as well as FXYD1 (differential methylation associated with response to systemic steroids and COPD) [15, 22]. Only 1 of these 12 differentially methylated CpG sites (cg27461196, mapped to LGI4/FXYD1) was statistically significantly associated with COPD at an FDR of 5 % in an independent, larger WH COPD methylation dataset cleaned and processed in a comparable way. In addition, in a qualitative comparison of the difference in differential methylation of CpG sites between AA and WH, many of our results were statistically significant for differential hypomethylation only in AA.
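The two FDR thresholds above (5 % and 10 %) can be illustrated with a small sketch. It assumes a Benjamini-Hochberg adjustment, which this section does not name as the procedure used; the function names and the p-values are hypothetical and are not study data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at(pvalues, fdr):
    """Indices of tests whose adjusted p-value is at or below the FDR level."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


# Hypothetical per-CpG p-values (illustration only, not study results).
example_p = [0.01, 0.04, 0.03, 0.5]
print(significant_at(example_p, 0.05))  # sites passing a 5 % FDR
print(significant_at(example_p, 0.10))  # sites passing a 10 % FDR
```

In this toy input, relaxing the threshold from 5 % to 10 % admits additional sites, which mirrors the distinction drawn above between significant associations and those that approached significance.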
The majority of our significantly associated (FDR less than 10 %) differentially methylated CpG sites are located in genes that are biologically plausible candidates for lung disease and that may affect the pulmonary, immune, and vascular biology of COPD based on previously published data. Additionally, many of these genes are expressed in either lung (MAML1, RBFOX2, GRASP, FAXDC2, FXYD1/LGI4, IFNGR2) or whole blood (IFNGR2) based on GTEx data showing a median reads per kilobase of transcript per million reads (RPKM) >10 in these tissues, [23] providing further support for their potential effects on lung pathology and disease.
Two of the disease-associated genes were associated with COPD in prior studies. Folate Receptor Gamma (FOLR3) was found to be 15- to 20-fold upregulated during stable COPD and acute exacerbations of COPD in previous studies, [24] although the mechanistic and functional implications of this upregulation are unclear. Differential methylation of Phospholemman (FXYD1) was shown to be associated with COPD in the ICGN cohort by Qiu et al., [15] and this gene was also previously found to be differentially methylated in response to systemic steroid use in COPD [22]. Both of these genes are notably related to acute exacerbations of COPD as well as a preferred treatment modality (systemic steroids) for acute exacerbations. The PA-SCOPE study recruited subjects who were hospitalized for acute exacerbations of COPD, and blood draws for DNA methylation analysis were performed during the inpatient hospitalization. Because of the timing of our sampling, the methylation pattern of these genes may be related to a confounder such as acute exacerbations of COPD, systemic steroid use, or recent smoking, or to a subset of subjects in our dataset with a phenotype of frequent exacerbations, although this could not be directly assessed based on our data.
Five CpG sites were annotated to genes related to pulmonary and airway physiology. Lactoperoxidase (LPO) is secreted by submucosal glands in human bronchi and plays a role in human airway host defense against bacteria [25]. Gamma-aminobutyric acid Receptor1 (GABRR1) has been shown to affect alveolar fluid homeostasis in alveolar epithelial type II cells [26]. Upregulated gene expression of Very Long Chain Fatty Acid Elongase3 (ELOVL3) has been proposed to contribute to dysregulated lipid droplet formation in pulmonary surfactant in response to particulate exposure [27]. Rare missense mutations of GRP1-Associated Scaffold Protein (GRASP) were previously associated with asthma in a Latino cohort [28]. The function of SH3 Domain and Tetratricopeptide Repeats 1 (SH3TC1) has not been adequately described in the lung; however, it is implicated in networks related to bronchial airway epithelial cells and cigarette smoking [29].
An additional three CpG sites were annotated to genes related to immune response and steroid synthesis. Cluster of Differentiation 72 (CD72) is a CD5 co-ligand involved in hypersensitivity reactions and sarcoidosis, highly expressed in pulmonary alveolar macrophages [30]. Fatty Acid Hydroxylase Domain Containing2 (FAXDC2) is implicated in “steroid biosynthesis” through KEGG pathways [31]. Interferon Gamma Receptor 2 (IFNGR2) plays a role in activation of macrophages and regulation of Th1 response to intracellular pathogens, with genetic variants previously associated with atopic asthma [32] and pulmonary tuberculosis.
The final two CpG sites were related to cardiovascular processes. Mastermind-Like1 protein (MAML1) affects angiogenesis during organ development through NOTCH-dependent signaling in murine lung [33]. RNA-binding Protein Fox-1 Homolog 2 (RBFOX2) is a splicing regulator implicated in differentiation of myofibroblasts to skeletal muscle, and its diminished expression was previously associated with pressure-overload-mediated progression of dilated cardiomyopathy/heart failure; [34] its potential impact on airway smooth muscle has not been described.
We present data showing that many of our top COPD-associated CpG sites are located in the lower tail of a histogram of the difference in test-statistic between CpG sites in the African-American PA-SCOPE dataset and the white ICGN dataset (see Additional file 1: Figure S1) using similar model parameters and adjustment for covariates. This finding may represent qualitative evidence that these sites are more differentially methylated in African-Americans compared to whites, although this conclusion must be seen as hypothesis-generating only without a separate properly controlled and matched study design that would be free of confounding by technical artifacts related to batch. Boxplots of the unadjusted absolute methylation at these sites in both PA-SCOPE and ICGN (see Additional file 1: Figure S3) reveal that the methylation difference between cases and controls is consistent with hypomethylation in COPD cases among both AA and WH; however, only among the AA subjects is the difference statistically significant. The relative differential hypomethylation of these CpGs among AA subjects compared to WH subjects could be explained by several scenarios. The most mechanistically attractive possibility is that these CpG sites represent differential methylation events in response to gene-environment interactions experienced preferentially by African-Americans. The second mechanistic possibility is that these CpG sites represent blood methylation quantitative trait loci (mQTL) that are influenced by the genetic architecture specific to the population substructure [35] of African-Americans. In both of the preceding scenarios, the differential methylation could in turn impact damage and airflow obstruction through changes in gene expression and protein production, which could present unique targets for intervention. Finally, the differential methylation may simply be a marker of a confounder between methylation state and COPD, tagging a prior or recent exposure (such as smoking) that directly contributed to both disease and CpG methylation through distinct mechanisms. Blood draws in the PA-SCOPE COPD case subjects occurred during inpatient hospitalizations for acute COPD exacerbations, while blood draws for non-COPD control subjects occurred during study-related office visits; this could lead to potential confounders of our COPD associations including COPD exacerbation, inpatient medication use including corticosteroids, exacerbation triggers such as viral or bacterial infections, or other unmeasured variables.
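The comparison of test statistics between the two cohorts can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the site identifiers, the numeric values, and the tail fraction are all hypothetical, and the comparison described above was a qualitative inspection of a histogram rather than a fixed cutoff.

```python
def lower_tail_sites(stats_a, stats_b, tail_fraction=0.05):
    """Rank shared CpG sites by (statistic in A - statistic in B).

    stats_a, stats_b: dicts mapping CpG identifier -> test statistic from
    each cohort's association model. Returns the sites in the lower tail
    of the difference distribution, plus all per-site differences.
    """
    shared = sorted(set(stats_a) & set(stats_b))
    diffs = {cpg: stats_a[cpg] - stats_b[cpg] for cpg in shared}
    ranked = sorted(shared, key=lambda cpg: diffs[cpg])
    # tail_fraction is an assumed cutoff chosen only for illustration.
    n_tail = max(1, round(len(ranked) * tail_fraction))
    return ranked[:n_tail], diffs


# Hypothetical statistics for 20 placeholder sites (not study data).
stats_wh = {f"site_{i:02d}": 0.0 for i in range(20)}
stats_aa = {f"site_{i:02d}": (i - 10) * 0.1 for i in range(20)}
stats_aa["site_03"] = -4.2  # strongly hypomethylated in cohort A only

tail, diffs = lower_tail_sites(stats_aa, stats_wh)
print(tail, diffs[tail[0]])
```

A strongly negative difference flags a site whose association is more negative in the first cohort than in the second; as noted above, such a difference can also arise from batch effects or ascertainment differences.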
WGCNA identifies modules of comethylated genes starting from the level of thousands of CpG probes and correlates these modules to phenotypic variables. The network creation and module-building processes of WGCNA are informed purely by gene methylation levels, without consideration of case-control status for COPD. Individual genes within the module can then be related to the module eigengenes by measures of module membership and gene significance to the module. This technique identifies driver genes for the module that may help identify biologically meaningful pathways. In our dataset the yellow and blue modules showed significant association with COPD. Yellow module measures of gene significance were predominantly positive (indicating positive correlation of module comethylation in association with COPD) while the blue module contained primarily negative measures of gene significance (indicating negative correlation of module comethylation in association with COPD).
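The module-level quantities described above can be sketched in a few lines. This is not the WGCNA R package itself: it is a simplified illustration that assumes the module eigengene is the first principal component of the standardized probe profiles, that module membership is the correlation of each probe with the eigengene, and that gene significance is the correlation of each probe with case status. All function names, probe names, and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]


def module_eigengene(profiles, iterations=200):
    """First principal component (per sample) of standardized probes.

    profiles: list of probes, each a list of per-sample methylation values.
    Uses power iteration on Z^T Z, where Z is the probes-by-samples matrix.
    """
    z = [standardize(p) for p in profiles]
    n = len(z[0])
    v = z[0][:]  # start from one probe's standardized profile
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in z]
        v = [sum(s * row[j] for s, row in zip(scores, z)) for j in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Orient the eigengene to agree in sign with the average profile.
    mean_profile = [sum(row[j] for row in z) / len(z) for j in range(n)]
    if pearson(v, mean_profile) < 0:
        v = [-a for a in v]
    return v


# Hypothetical module: three probes, six samples (3 controls, 3 cases).
profiles = {
    "probe_a": [0.80, 0.78, 0.82, 0.60, 0.62, 0.58],
    "probe_b": [0.70, 0.72, 0.69, 0.55, 0.50, 0.53],
    "probe_c": [0.40, 0.43, 0.41, 0.30, 0.28, 0.31],
}
copd = [0, 0, 0, 1, 1, 1]

eigengene = module_eigengene(list(profiles.values()))
membership = {k: pearson(v, eigengene) for k, v in profiles.items()}
significance = {k: pearson(v, copd) for k, v in profiles.items()}
print(membership)
print(significance)
```

In this toy module every probe is less methylated in cases, so gene significance is negative for all three probes, which corresponds to the pattern described for the blue module.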
Further investigation of the blue module showed a network with biological significance for obstructive lung disease. The module was statistically enriched for pathways related to lung development, and also contained multiple genes previously associated with COPD and lung function. SERPINA1 is the gene responsible for alpha-1-antitrypsin deficiency, [36] a known genetic cause of COPD, and this gene was found to be highly significant in the blue module. The blue module also contained multiple genes previously associated with COPD or lung function measurements through GWAS. WGCNA modules are composed of genes with similar methylation states, which could give insight into processes of coregulation between these genes. While the SERPINA1 mutations known to cause alpha-1 antitrypsin deficiency are uncommon in AA, one could hypothesize from these data that coregulation of the SERPINA1 gene through DNA methylation (and other genes related to lung development in the blue module) could contribute to COPD susceptibility in a disease module framework. However, this hypothesis would require further study with larger datasets including additional modalities such as gene expression. Many of the CpG sites found in the differential methylation analysis were also found in the blue module with high measures of module membership (indicating importance of the gene to the module) and high measures of gene significance to COPD. The recapitulation of these CpG sites in the same module as previously known COPD- and lung-function-related genes adds validation to our differential methylation results.
The yellow module, by comparison, contained genes enriched for immune response pathways. Chronic inflammation in response to airway damage from cigarette smoking and to external pathogens is recognized as an integral part of the pathogenesis of COPD and exacerbations [37–39]. Enrichment for the chemotaxis of effector cells that are known to play a role in COPD pathogenesis (neutrophils, [40] eosinophils, [41, 42] and natural killer cells [43]) was found using yellow module genes with high module membership values. The PA-SCOPE population was ascertained using subjects with disease exacerbations, so this population may have been enriched for signals associated with acute inflammation and immune response [44, 45].
The PA-SCOPE dataset was a retrospective case-control study and so no direct causation can be inferred from results, only associations of CpG sites with disease. DNA methylation in response to smoking is a dynamic process, and effects may be time-dependent; longitudinal profiling of methylomes and phenotypes is needed [16, 46]. Our data did not contain information on the duration of COPD in our subjects, so we could not assess whether this might affect our results. However, COPD is often underdiagnosed or diagnosed at more severe stages of disease, [47, 48] so duration of COPD could potentially be unreliable in statistical models comparing COPD cases and controls. While our data did contain information related to spirometric severity of COPD, we were underpowered to detect significant differential DNA methylation site associations with COPD severity due to sample size. The methylation dataset for PA-SCOPE did not contain data related to current smoking or time since quitting smoking, and we could not assess the effects that these important variables might have on our differential methylation results in association with COPD. Multiple studies have shown that smoking history affects DNA methylation, and a recent study showed that a subset of these DNA methylation effects are dynamic in a time-dependent fashion after smoking cessation, [46] but our data did not allow us to control for smoking cessation or time since quitting. Data on chronic or inpatient medication use were also not available, which limits our ability to control for these potential confounders. Longitudinal data were not available in PA-SCOPE, so further conclusions integrating clinical stability, clinical progression, or other lung function trajectories [49] associated with CpG sites cannot be made using these data. Without paired gene expression data, it is unclear what effect these differentially methylated sites have on expression of the associated gene products. While both PA-SCOPE and ICGN were studies of COPD subjects and smoking controls, differences in ascertainment of the datasets may influence the conclusions. Notably, the PA-SCOPE dataset compared AA subjects recruited during inpatient COPD exacerbations with controls without known lung disease or recent respiratory illness. The ICGN dataset compared WH subjects with COPD (with no selection criteria related to COPD exacerbations) with control family members. Because of this difference, our differential methylation site associations with COPD could be confounded by potential methylation effects related to COPD exacerbations. Similarly, the comparison of test-statistic differences between ICGN and PA-SCOPE could be influenced by factors other than racial differences in differential methylation related to COPD. Race was determined by self-report and no genetic testing of ancestry or admixture was performed; thus, individuals of mixed genetic ancestry who self-identified as African-American may be included in our analyses, and this may be a source of residual confounding. Batch effects between the PA-SCOPE and ICGN assays, differences in ascertainment and study design related to the timing of COPD exacerbations, and baseline differences in the two populations other than racial make-up could also account for the differences in statistical association among these populations, so we present these data points as qualitative and hypothesis-generating for further investigations.
The Illumina Infinium HumanMethylation27 BeadChip Array interrogates only a subset of CpG sites in the human epigenome, and additional unmeasured sites may be differentially methylated in association with COPD. Specifically, the HumanMethylation27 BeadChip’s design focused on CpG sites within transcription start sites of over 14,000 genes and additional coverage of around 200 cancer-related genes [50]. Additional information on genes not represented on the array as well as additional CpGs in promoter regions, enhancer regions, or the gene body might yield additional associations with COPD and would be an area for further investigations. This study focused on DNA methylation; however, other epigenetic changes such as histone acetylation and chromatin modification could impact gene regulation and have relevant associations with COPD, and these other modalities were not assayed in our study.
We assayed whole blood for DNA methylation signals associated with COPD, but not lung tissue samples. Prior studies have shown associations between smoking and DNA methylation in whole blood [9, 17, 46]. DNA methylation of lung tissue could potentially capture information related to additional airborne environmental exposures relevant to COPD, which might not be present in DNA methylation from peripheral blood alone. While lung tissue DNA methylation could provide additional insight into disease mechanisms, the additional risks and costs of obtaining lung tissue are not trivial, and human lung tissue itself is a heterogeneous mixture of cell types [51]. However, some whole blood CpG sites may recapitulate DNA methylation signals related to lung exposures [52] or lung disease, [53] and we examined our data in this context. While total pack-years of smoking was a covariate within our models, additional unmeasured variables related to environmental exposure may impact the findings. One could hypothesize that disease mechanisms related to organ development, systemic inflammation, immune response, and protease activity might be best represented in whole blood compared to lung tissue, and our results may reflect this. Additional studies including contemporaneous collections of whole blood and lung tissue would be needed to gain additional insight into these relationships. We present statistically significant differentially methylated CpG associations with COPD with strict multiple testing corrections; however, these results need replication in separate datasets. Future studies including large populations of both AA and WH would be needed to further validate both the differential methylation results and the race-specificity of our results. The recapitulation of many of our differentially methylated genes in network modules strongly associated with COPD provides some biological validation of the importance of these sites to COPD using a different analytic approach.
Acknowledgments
Robert Busch would like to acknowledge the mentorship and guidance of Dr. Dawn DeMeo, Dr. Edwin K Silverman, and Dr. Andrea Baccarelli.
The Authors would like to acknowledge the Pennsylvania Department of Health for supporting the Pennsylvania Study of Chronic Obstructive Pulmonary Exacerbations study.
The Authors would like to acknowledge the International COPD Genetics Network Investigators.
International COPD Genetics Network (ICGN) investigators: Edwin K. Silverman, Brigham & Women’s Hospital, Boston, MA, USA; David A. Lomas, University College London, London, UK; Barry J. Make, National Jewish Medical and Research Center, Denver, CO, USA; Alvar Agusti and Jaume Sauleda, Hospital Universitari Son Dureta, Fundación Caubet-Cimera and Ciber Enfermedades Respiratorias, Spain; Peter M.A. Calverley, University of Liverpool, UK; Claudio F. Donner, Mondo Medico di I.F.I.M. srl, Multidisciplinary and Rehabilitation Outpatient Clinic, Borgomanero (NO), Italy; Robert D. Levy, University of British Columbia, Vancouver, Canada; Peter D. Paré, University of British Columbia, Vancouver, Canada; Stephen Rennard, Section of Pulmonary & Critical Care, University of Nebraska Medical Center, Omaha, NE, USA; Jørgen Vestbo, University of Manchester, UK.
Duplicate or redundant publication statements:
This manuscript has not been previously published and is not under consideration in any other peer-reviewed media. A portion of this work was submitted as a scientific research thesis towards the completion of the requirements for a Masters of Medical Science Degree in Biomedical Informatics through Harvard Medical School. This thesis will be released for public viewing through the Harvard Library Office for Scholarly Communication in May 2018. An abstract based on a portion of this work was presented as a poster discussion session at the American Thoracic Society International Conference 2016 in San Francisco, CA. An abstract based on a portion of this work was presented as a poster at the American Society of Human Genetics Annual Meeting 2015 in Baltimore, MD.