Background
Chronic obstructive pulmonary disease (COPD) is a leading cause of death worldwide [1‐3] and may be diagnosed in adults reporting a history of childhood asthma and maternal smoke exposure [4‐8]. It is a complex disease influenced by multiple factors, including genetic variants and environmental exposures such as maternal smoking in early fetal life and personal smoking in later life. Maternal smoking during pregnancy may influence the risk of disease in adulthood, potentially through epigenetic modifications including methylation [9‐13]. Primary prevention of adult lung diseases includes identifying predisposing molecular factors [14, 15].
Recent observations indicate that the protein products of genes associated with complex traits tend to interact with each other more frequently than expected by chance [16‐22]. Therefore, a single gene does not act as the sole driver of a disease; rather, the interplay of multiple genes eventually leads to pathogenesis [22‐24, 40]. Network-based approaches can be used to identify these groups of genes. Genes associated with an exposure or disease may form connected subnetworks (exposure or disease modules, usually containing 10 to 100 genes) within the larger protein–protein interaction (PPI) network. Furthermore, genes in close proximity in the PPI annotate to similar functional pathways. Network-based approaches for studying complex diseases have identified COPD disease modules [25‐33]. Most approaches are based on seed genes: a set of 5–30 genes associated with a disease such as COPD serves as a starting set, and additional genes are added to the module iteratively based on the topology of the network [25, 27, 30, 34]. Other methods use similarity measures between transcriptomic data [26, 28, 29, 33]. Most studies highlight a single module only; however, some identify additional modules associated with respiratory diseases [25, 27, 29] and analyze the interactions and linking molecular mechanisms between the different modules. Typically, only one omic data type has been used, usually transcriptomic data.
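The seed-based strategy described above can be illustrated with a minimal Python sketch. It is a deliberately simplified greedy expansion and not the algorithm of any cited method: the function name `expand_module`, the scoring rule (the fraction of a candidate's neighbors that already lie in the module), and the toy gene labels are all assumptions made for illustration.

```python
from collections import defaultdict


def expand_module(edges, seeds, n_add):
    """Greedily grow a module from seed genes using network topology only.

    Hypothetical scoring rule (an assumption, not a published method):
    at each step, add the neighboring gene with the largest fraction of
    its own interaction partners already inside the module.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    module = set(seeds)
    added = []
    for _ in range(n_add):
        # Candidates are genes adjacent to the module but not yet in it.
        candidates = {n for g in module for n in adj[g]} - module
        if not candidates:
            break
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(
            sorted(candidates),
            key=lambda c: len(adj[c] & module) / len(adj[c]),
        )
        module.add(best)
        added.append(best)
    return added


# Toy network with made-up labels: X is fully attached to the seeds,
# Y only partly, so X is added before Y.
toy_edges = [("S1", "X"), ("S2", "X"), ("S1", "Y"), ("Y", "Z"), ("Y", "W")]
print(expand_module(toy_edges, {"S1", "S2"}, n_add=2))
```

Because module membership in such a scheme depends only on the seeds and the network topology, modules built this way can differ substantially from modules derived from gene-level association scores.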
In this work, to identify network modules related to in utero smoke (IUS) exposure and adult lung disease, we compute significantly connected components using DNA methylation and gene expression association information from lung tissue and a functional PPI [35]. Genes were selected based on at least nominal statistical thresholds for association with IUS exposure (fetal lung methylation data) or with COPD (adult lung methylation and expression data).
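As a rough illustration of this selection step, the sketch below filters genes at a nominal p-value threshold and extracts the connected components of the PPI restricted to those genes. It does not reproduce the significance assessment of the components used in this work; the 0.05 cutoff, the function names, and the toy labels are assumptions.

```python
from collections import defaultdict


def select_genes(pvalues, threshold=0.05):
    """Keep genes whose association p-value is below a nominal threshold."""
    return {gene for gene, p in pvalues.items() if p < threshold}


def connected_components(edges, selected):
    """Connected components of the PPI subgraph induced by `selected` genes."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in selected and v in selected:
            adj[u].add(v)
            adj[v].add(u)

    seen, components = set(), []
    for gene in sorted(selected):
        if gene in seen:
            continue
        # Depth-first traversal from an unvisited gene.
        stack, component = [gene], set()
        seen.add(gene)
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbor in adj[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    # Largest components first; ties keep alphabetical discovery order.
    return sorted(components, key=len, reverse=True)


# Toy data with made-up labels: gene C fails the threshold, which splits
# the chain A-B-C-D-E into two components.
toy_pvalues = {"A": 0.01, "B": 0.03, "C": 0.20, "D": 0.04, "E": 0.001}
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(connected_components(toy_edges, select_genes(toy_pvalues)))
```

The toy example also shows why a gene that narrowly misses the threshold can fragment an otherwise connected region of the network.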
We identified network modules and studied the connectivity between the fetal lung DNA methylation and COPD DNA methylation and expression modules. Leveraging these modules, we highlight biological mechanisms and common pathways, including the AGE-RAGE pathway, which may provide molecular links between lung development and COPD.
Discussion
COPD is a complex multi-factorial disease with no known cure. Understanding early life susceptibility factors, including epigenetic factors, may lead to preventative interventions [54‐56]. Many studies of COPD susceptibility have focused on genetic factors, but environmental perturbations starting in utero may contribute to fetal programming and set epigenetic trajectories of lung disease [57]. In utero exposures such as cigarette smoking and perturbed lung growth and development are associated with COPD, but there are limited insights into the molecular links between early exposures, lung growth and adult disease. It is likely that in utero exposures impact networks of genes rather than single genes. Using protein–protein interaction networks to study links between smoking-related perturbations during lung development and COPD is of clinical significance, as the identified genes and networks may provide insights into biomarkers and targets for primary prevention of adult lung disease [58]. Prior observations linking in utero tobacco smoke with COPD support fetal programming, but the mechanisms are not fully understood [59]. Here, we focus on fetal lung methylation marks associated with IUS exposure, which may link to molecular signatures of adult COPD.
Simple intersections of DNA methylation associations may not reveal links between early life exposures and lung disease [36]. Here, we applied a protein–protein interaction network-based approach using published results to generate modules for fetal and adult lung tissue to link IUS-exposure and COPD susceptibility. However, the module characteristics are highly dependent on the completeness of the PPI and the data sets used. We used available PPIs to verify our results, but future work must include functional validation of network findings.
COPD heterogeneity and cellular heterogeneity in lung tissues may impact the modules characterized using bulk genomic results. The COPD lung tissue cohort has limited information regarding COPD subtypes (emphysema vs chronic bronchitis) [38]. For this manuscript, we leverage published results for COPD based on a spirometric diagnosis. Future work needs to consider subtype-specific molecular associations and network models. Longitudinal birth cohorts are limited for addressing links between fetal exposures impacting lung tissue and adult lung disease, as molecular markers are generally studied using cord blood, not fetal lung tissue. Leveraging life-course genomic data is also an important direction for future investigation.
Only two genes are significantly differentially expressed or methylated in all three data sets: ODF3L1 (Outer Dense Fiber Of Sperm Tails 3 Like 1) and DTX1 (Deltex E3 Ubiquitin Ligase 1). ODF3L1 has not been studied extensively beyond associations with testis, but as a class ODF proteins have been implicated in cytoskeleton pathways and cilia. DTX1 has been implicated in Notch signaling [60] and is a key E3 ubiquitin ligase implicated in multiple pathways, including development [61].
The omnigenic model distinguishes between core and peripheral genes, where core genes can be strongly associated with the studied phenotypes and peripheral genes have a small effect on disease risk. Therefore, to understand complex diseases, additional information beyond genetic variation needs to be integrated into the model. To account for this, we computed COPD modules using transcriptomic and epigenetic information. Additionally, we identified a module associated with IUS exposure using fetal lung data. Using these three modules and their adjacency within the PPI, we were able to study more than just the most significant genetic associations with COPD.
In order to identify “core” genes [23] we first identified a module [42] for each data set. Interestingly, the three modules do not have any genes in common, except for BCL11A. Thus, each module captures the associated phenotype individually [23]. To evaluate a potential link between IUS-perturbed lung development and COPD, we analyzed the connection of the fetal lung methylation module to the two COPD disease modules. COPD-related genes connecting the modules are potentially functionally related through diverse aspects such as airway remodeling, immune response, and inflammation. The number of interactions between the three modules is higher than expected by chance, suggesting that the perturbation of the genes in one module potentially impacts the functionality of the genes within the other modules. Most edges connecting the modules with each other are functional rather than physical interactions between proteins. Interestingly, 16 of the 23 interactors in the COPD expression module that are connected to the COPD methylation module are down-regulated, suggesting that in most cases methylation represses transcription.
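One generic way to ask whether two modules share more interactions than expected by chance is an empirical permutation test, sketched below. This is an assumption about how such a comparison could be set up, not a description of the procedure used in this work: the null model simply draws random, disjoint gene sets of the same sizes and does not preserve node degree, which a careful analysis would likely want to control for. Function names and toy labels are hypothetical.

```python
import random


def count_cross_edges(edges, module_a, module_b):
    """Number of PPI edges with one endpoint in each module."""
    return sum(
        1
        for u, v in edges
        if (u in module_a and v in module_b) or (u in module_b and v in module_a)
    )


def cross_edge_pvalue(edges, module_a, module_b, n_perm=1000, seed=0):
    """Empirical p-value for the observed number of inter-module edges.

    Null model (a simplifying assumption): random disjoint gene sets of
    the same sizes, drawn uniformly from all genes in the network.
    """
    rng = random.Random(seed)
    nodes = sorted({g for edge in edges for g in edge} | module_a | module_b)
    observed = count_cross_edges(edges, module_a, module_b)
    size_a, size_b = len(module_a), len(module_b)

    hits = 0
    for _ in range(n_perm):
        picked = rng.sample(nodes, size_a + size_b)
        rand_a, rand_b = set(picked[:size_a]), set(picked[size_a:])
        if count_cross_edges(edges, rand_a, rand_b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy network with made-up labels.
toy_edges = [
    ("a1", "b1"), ("a2", "b2"), ("a1", "a2"), ("b1", "b2"),
    ("c1", "c2"), ("c2", "c3"), ("c3", "c4"),
]
observed, p_value = cross_edge_pvalue(toy_edges, {"a1", "a2"}, {"b1", "b2"})
print(observed, p_value)
```

The add-one correction is a common convention for empirical p-values; the number of permutations bounds the smallest p-value that can be reported.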
Pathophysiological mechanisms that may link fetal smoke exposure and adult COPD may be highlighted by the genes that connect the fetal lung methylation exposure module to the COPD modules. For example, MAPK8 (a member of the fetal lung module with connections to both COPD modules) encodes mitogen-activated protein kinase 8 (MAPK8), which can be stimulated by environmental factors. Once MAPK8 is activated, it may target transcription factors that are involved in the immediate early response [62‐64]. EGFR, found in the COPD methylation module, encodes a transmembrane protein implicated in inflammation and airway remodeling [65, 66]. When activated, it mediates signal transduction through the MAPK and JNK pathways. BCL2, a member of the COPD expression module, localizes to mitochondria [67] and regulates apoptosis through the release of cytochrome c and reactive oxygen species [68]. The BCL2 pathway can be regulated through the JNK pathway by phosphorylation and may impact immune responses [69‐72]. BCL2 protein is increased in lung lymphocytes from smokers, which may influence chronic inflammation in COPD [73], and BCL2 has been identified in COPD GWAS [74]. BCL2 has also been identified as a key functional interactor with other COPD GWAS genes [37] through regulation of apoptosis and mitochondrial pathways [73, 75, 76]. Although MAPK8 and EGFR are located in the methylation modules and BCL2 is located in the expression module, these genes are all connected to each other.
Interactor genes reveal the most robust enrichments and pathways linking fetal IUS exposure and COPD. When the whole set of genes of a module (not only the interactors) was used, the same or fewer pathways were enriched, with limited statistical significance; thus, the results of the enrichment analysis did not improve. Also, no pathways were significantly enriched for the whole set of genes of the fetal lung methylation module, while three pathways were significantly enriched using only the interactors of this module. Seven pathways were significantly enriched using the whole set of genes of the COPD expression module, while using only the interactors gave rise to 13 significantly enriched pathways, including Focal Adhesion, AGE-RAGE, VEGF signaling pathway, and Pathways in cancer (Figs. 3, 4, Additional file 6: Table S5). Most of the genes in the pathways that were significantly enriched using the whole set of genes from the modules are interactors, further supporting the robust nature of the findings.
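Pathway enrichment of a gene set is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Whether this exact test was used for the analyses above is not stated in this section, so the sketch below should be read as a generic illustration; the function name and all numbers are made up, and multiple-testing correction across pathways is omitted.

```python
from math import comb


def enrichment_pvalue(overlap, universe, pathway_size, set_size):
    """P(X >= overlap) under the hypergeometric distribution.

    universe     -- number of background genes
    pathway_size -- background genes annotated to the pathway
    set_size     -- genes in the tested set (e.g. a module's interactors)
    overlap      -- tested genes that are annotated to the pathway
    """
    upper = min(pathway_size, set_size)
    # Count favorable draws with exact integer arithmetic, divide once.
    favorable = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(universe, set_size)


# Made-up example: 3 of 4 tested genes fall in a 5-gene pathway,
# out of a 20-gene background.
print(enrichment_pvalue(overlap=3, universe=20, pathway_size=5, set_size=4))
```

Under this test, a smaller gene set can yield stronger enrichment than a larger superset when the pathway members are concentrated in the smaller set, which is consistent with interactors alone producing more significant pathways than whole modules.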
The identified pathways may link perturbed lung development associated with exposure to cigarette smoke to COPD. The pathway that was significantly enriched for the most gene sets (four out of seven) was the AGE-RAGE pathway, followed by the Focal Adhesion pathway.
The AGE-RAGE pathway may be involved in COPD through inflammation [77, 78]. From a biomarker point of view, the soluble receptor for advanced glycation end products (sRAGE) is the most compelling biomarker of adult COPD [79]. Given the role of the AGE-RAGE pathway in lung development and rodent models demonstrating links between maternal nicotine exposure and offspring perturbation of lung RAGE signaling [80, 81], we contend our method has identified biologically plausible pathways linking fetal lung perturbations and COPD. RAGE (encoded by AGER) has been implicated as a driver of cigarette smoke-related emphysema [82], and circulating sRAGE has been implicated as a biomarker for emphysema [83]. AGER is not part of any of the three modules but is directly connected to the COPD expression disease module.
The Focal Adhesion pathway members facilitate physical links between the cytoskeleton of the cell and the extracellular matrix, playing an important role in tissue organization and airway remodeling [84]. The AGE-RAGE and Focal Adhesion pathways are connected through VEGFA. The genes in the fetal lung methylation module are found upstream in the AGE-RAGE pathway, whereas downstream genes are from the COPD expression disease module. The upstream part of the Focal Adhesion pathway includes genes from the COPD methylation module, and the COPD expression module genes are represented downstream. These pathways regulate closely related processes including airway inflammation and remodeling [77, 78, 84]. These findings require functional validation; however, we can speculate that this observation may represent a temporally directed relationship between the perturbed genes identified in the fetal lung and the genes related to COPD. Given the growing interest in targeting the AGE-RAGE pathway for lung disease, our findings may suggest a future role for targeting this pathway for the primordial prevention of obstructive lung diseases.
Different approaches exist to identify network modules [85]; the focus of this work is on PPI modules related to diseases. One main difference between our approach and others is that we are able to use published findings integrated in a network framework. Some approaches exploit only the topology of the PPI and employ knowledge from omic data sets afterwards to study the enrichment of the modules [17, 86‐90]. Other methods use seed genes (5–30 genes that can be associated with a disease) and add new genes iteratively based on the topology of the network [34, 41, 91]. Another way to compute modules is to integrate omic data sets by using scores (e.g. p-values, fold change values, etc.) which are assigned to genes and indicate their differential status between patient and control groups. Modules identified using omic data sets are called active modules [92], and a variety of methods exist for computing these active disease modules, most of which still rely on a set of seed genes as starting points [93]. Methods that do not use seed genes as a starting point are rare [94]; SigMod is most similar to our current method [95]. SigMod is based on optimization and computation of module scores, using p-values from GWAS. The strategy favors high-degree genes, which are often genes that can be associated with diseases. However, even though some of the genes in our modules have a high degree in the underlying PPI, we do not explicitly favor these genes when using the ENCORe framework [42], since it computes modules consisting of genes that have small p-values and are highly connected to each other. A limitation of this approach is that potentially crucial genes (such as AGER) may be excluded from the module due to the p-value cutoff calculated by the method. However, we believe that ENCORe provides a good balance between integrating gene scores based on disease affection status and the structure of the chosen PPI (Additional file 1, section “Disease modules integrating omic data sets”; Additional file 7: Table S6, Additional file 8: Table S7, Additional file 9: Table S8).
Network-based approaches hold potential for studying fetal origins of complex lung diseases such as COPD [25‐33]. Similar to the method we present, Halu et al. [25] computed a COPD disease module using a network-based approach and analyzed its vicinity to an idiopathic pulmonary fibrosis (IPF) disease module. Their modules for COPD and IPF are, like ours, significantly close to each other in the PPI, and the biological pathways identified by Halu et al. give new potential insights into shared molecular interactions and shed light on biological processes lying at the intersection of these two incurable lung diseases. Maiorino et al. [27] introduced a method that ranks genes linking two disease modules in a given PPI. They studied genes linking a COPD disease module to an asthma disease module using the DIAMOnD approach [41]. They identified the asthma gene GSDMB and showed that by studying interconnecting genes it is possible to identify potential mediators of the interactions between different phenotypes. Both approaches [25, 27] use module detection methods based on seed genes, with the remaining module members added solely based on the topology of the underlying PPI. Thus, their methods differ profoundly from the method used in our work, and consequently their COPD modules have very different structures compared to the modules presented here.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.