Background
Chronic obstructive pulmonary disease (COPD) is characterized by progressive airflow obstruction accompanied by chronic inflammation. It is a major and growing cause of morbidity and mortality worldwide [
1]. Although environmental exposures such as cigarette smoking are risk factors, a genetic component to susceptibility has been observed [
2‐
5]. Genomic regions influencing COPD susceptibility have been identified at multiple loci through genome-wide association studies [
6‐
12]. Airway inflammation and remodeling and emphysematous destruction in the lung contribute to disease severity and progression [
13,
14], with macrophage activity having an important role [
15,
16]. The recapitulation of these gene expression signals in peripheral blood remains elusive. However, gene expression in blood has been used as a proxy in the identification of COPD subtypes [
17], and peripheral blood gene expression underlines the systemic effects of COPD inflammation [
18‐
20].
Several published COPD studies have performed microarray gene expression profiling [
21]. Specifically, studies in the airway epithelium have focused on expression changes related to smoking [
22,
23] and COPD status [
24,
25], including targeted RNA-seq profiling [
26]. Studies of gene expression in peripheral blood have also focused on COPD [
19,
27,
28] and smoking [
29,
30], including RNA-seq profiling [
31]. Given the putative role macrophages have in inflammatory lung disease [
32], gene expression profiling of these cells has been performed in the context of COPD [
33] and smoking [
34]. In addition to the airway studies, there have also been several COPD and emphysema gene expression studies involving resected lung tissue [
35‐
39], including RNA-seq profiling in a cohort of males [
40] and RNA-seq profiling of early COPD and emphysema in males [
41].
Despite the volume of this previous work, the expression signatures for alveolar macrophages, bronchial epithelium, and peripheral blood have not previously been studied within the same population at the same time. However, gene expression in nasal and bronchial brushing samples from the same subjects has been compared [
42]. Another study of nasal and bronchial gene expression was performed in independent cohorts [
43]. A study of lung tissue, small airway, and peripheral blood gene expression, with tissue samples obtained from separate cohorts, involved both emphysema and lung function phenotypes [
44]. Overlapping gene expression signatures have been studied in alveolar macrophages and peripheral monocytes isolated from separate cohorts [
45]. Gene expression signatures have been explored across many tissues in the Genotype-Tissue Expression (GTEx) project [
46], leveraging network methods to identify tissue-specific gene and transcription factor regulation [
47‐
49] and examining the overall blood-lung gene expression overlap [
50].
The foundation of this study is the integration of RNA-seq profiling across three COPD-relevant tissues from the same COPDGene (Genetic Epidemiology of COPD) study subjects, mitigating the variation typically seen when studying tissue samples from different subjects. Gene expression in airway epithelium, alveolar macrophage and whole blood samples was tested for association with measures of lung function, airway disease, emphysema severity and cigarette smoke exposure. Given data across three tissues and 11 phenotype variables, we did not consider testing a single comprehensive hypothesis to be a realistic goal. Instead, we considered highlighting private and overlapping gene signatures, where present, to be the more effective approach. Using statistical methods and a gene set enrichment framework, we sought to detect expression signatures across the tissues, highlighting systemic and tissue-specific signatures of lung disease and damage. By integrating these findings with previous COPD lung tissue studies and a recent COPD genome-wide association study (GWAS), we sought to place our results in the context of lung disease biology and shed light on the functional role of genes previously identified at genome-wide significant COPD GWAS loci. Similar integration approaches have been previously applied in COPD [
43,
44,
51]. Systems biology has the potential to reveal the molecular architecture of complex traits and disease [
52], in part by examining broad biological information rather than individual genomic determinants. We hypothesized that this systems biology study would inform blood biomarker identification, motivate hypotheses regarding the systemic functions of lung disease, and potentially identify novel genes and pathways for COPD and emphysema as targets for functional, translational and diagnostic studies.
Discussion
We integrated RNA-sequencing across three matched COPD-relevant tissues using tests of association with lung function, airway disease, emphysema severity and cigarette smoke exposure, and a gene set enrichment framework. This has revealed expression signatures across the tissues in the context of each phenotype, highlighting systemic and tissue-specific signatures and pathways of lung disease and damage. We did not observe any genes differentially expressed across all three tissues. However, we did find pathways overlapping the three tissues in emphysema and smoking. Disease relevance and biology were elucidated through integration with previous COPD lung tissue studies and a recent COPD GWAS.
Replication of airway differential gene expression
Our top two results from the differential expression analysis of smoking status in the bronchial epithelium were
CYP1A1 (cytochrome P450 family 1 subfamily A member 1) and
CYP1B1 (cytochrome P450 family 1 subfamily B member 1). These replicate previous findings in studies of smoking in the airway [
23,
58] and oral mucosa [
59], with
CYP1B1 also identified in the lung [
60]. Significant in our analysis of smoking was
AHRR (aryl-hydrocarbon receptor repressor), previously found to be differentially expressed by smoking status in lung tissue [
60] and in the oral mucosa [
59]. Both
CYP1B1 and
AHRR were also significant in our analysis of smoking status in macrophages, and Poliska et al. also found
CYP1B1 correlated with COPD status in alveolar macrophages [
45].
In our bronchial epithelium analysis of airway disease, the genes
CLDN10 (claudin 10),
TMEM2 (
CEMIP2 - cell migration inducing hyaluronidase 2) and
ALDH1A3 (aldehyde dehydrogenase 1 family member A3) were significant across the three airway-disease variables.
CLDN10 is believed to have a role in idiopathic pulmonary fibrosis (IPF) progression [
61]. A gene-by-environmental tobacco smoke interaction study on the level of FEV1 identified a locus intronic to the gene
TMEM2 [
62] and
TMEM2 was previously associated with lung function in the small airway [
44]. Last, the gene
ALDH1A3 was found to be differentially expressed by smoking status in both the bronchial and nasal epithelium [
42].
The top gene in our bronchial epithelium analysis of percent emphysema was
APOD (apolipoprotein D), a gene found differentially expressed in a study of emphysema severity and bronchiolitis [
37]. The second gene in this emphysema analysis was
CYP2A6 (cytochrome P450 family 2 subfamily A member 6) from a locus previously identified in GWAS of smoking behavior [
63] and COPD [
7]. These replications suggest a link to smoking-related lung disease and progression, with relevance throughout the respiratory tract.
Pathways overlap across tissues
We observed a mixed and complex overlap pattern of significant genes across all differential expression results. To extract more information from the overlaps, we focused on private and cross-tissue signatures. We combined the differential expression and pathway results across phenotype variables, based on our observations of clustering by phenotype categories in the correlation heatmap. In this context, we observed statistically significant enrichment primarily across the bronchial epithelium and alveolar macrophages. We did not observe genes differentially expressed in all three tissues for any of the four phenotype categories. However, for emphysema and smoking we did observe pathway overlaps across all three tissues. We also observed statistically significant pathway overlaps across pairs of tissues in each of the four phenotype categories. In emphysema, the pathways at the three-tissue intersection were related to hemostasis and immune signaling, both markers of systemic inflammation. The three-tissue overlap for smoking included amyloid- and telomere-related pathways. This is concordant with observations of amyloids as putative biomarkers of systemic inflammation and COPD [
64] and the association between lung disease, lung aging, and telomere length [
65].
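As an illustration of how a pathway overlap across tissues can be scored, the following minimal sketch computes a one-sided hypergeometric tail probability for the overlap between two sets of significant pathways and extracts a three-tissue intersection. It is not the enrichment procedure used in this study; the function names, the pathway identifiers and the universe size are hypothetical placeholders chosen for the example.

```python
from math import comb


def overlap_pvalue(universe_size, n_a, n_b, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    Models drawing n_b pathways without replacement from a universe of
    universe_size pathways, n_a of which are significant in the other tissue.
    """
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, k) * comb(universe_size - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


def shared_across_tissues(*pathway_sets):
    """Return the pathways present in every supplied tissue set."""
    return set.intersection(*(set(s) for s in pathway_sets))


# Toy inputs (placeholders, not results from this study).
epithelium = {"pathway_A", "pathway_B", "pathway_C"}
macrophage = {"pathway_A", "pathway_B", "pathway_D"}
blood = {"pathway_A", "pathway_E"}

three_tissue = shared_across_tissues(epithelium, macrophage, blood)
p_pair = overlap_pvalue(
    universe_size=100,  # assumed number of pathways tested
    n_a=len(epithelium),
    n_b=len(macrophage),
    n_overlap=len(epithelium & macrophage),
)
```

In this toy setting a small `p_pair` would indicate that the two tissues share more significant pathways than expected by chance, under the assumption that all pathways in the universe are equally likely to be selected.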
In addition to the three-tissue intersections, the robust two-tissue pathway overlap in airway disease for the bronchial epithelium and macrophages appears to be associated with signatures of oxidative stress, highlighted by enrichment of nonsense-mediated decay and metabolic pathways. The cell-cycle pathways also present in this overlap are suggestive of cellular senescence mechanisms [
66,
67], particularly given the findings in emphysema for these cells [
68]. A differentially expressed gene observed both at this intersection and at the bronchial epithelium and macrophage intersection for smoking was
SCGB1A1 (secretoglobin family 1A member 1). This gene is expressed at high levels in the lung and encodes CC16 (club cell secretory protein), a blood biomarker of COPD [
69,
70].
Another significant pathway overlap was observed between blood and bronchial epithelium in emphysema, characterized by clusters of metabolic, cancer, and immune signaling pathways, with adherens junction and focal adhesion pathways also present. These pathways highlight signals of structural damage and systemic immune response in airway disease and emphysema [
14,
71]. The significantly differentially expressed gene at this intersection was
FCN1 (ficolin 1), a gene found to be differentially expressed in peripheral blood in mild IPF [
72]. In addition, functional polymorphic sites in the promoter region of
FCN1 regulate ficolin-1 expression and influence outcomes during systemic inflammation [
73].
Airway signatures overlap in blood and recapitulate in lung
We observed clustering of the smoking signature in the bronchial epithelium with the smoking signature in blood in our differential expression correlation heatmap, brought about by the relationship between the emphysema signature in blood and the smoking signature in the bronchial epithelium. We also observed clustering of the smoking signature in blood with the airway-disease signature in blood, owing to the correlation between the smoking and emphysema signatures in blood. Taken together, these suggest a common and systemic marker of emphysema with a gene expression signature of smoke-induced damage [
18,
44,
74].
We integrated our differential expression results with findings from previous COPD studies, along with the significant bronchial epithelium results in airway disease. We observed enrichment of the bronchial epithelium airway-disease genes in macrophage results across all four phenotype categories. This was in line with the findings when we intersected the significant genes and pathways for these tissues. Lung tissue COPD and emphysema genes were enriched in our bronchial epithelium results for both the lung function and emphysema phenotype categories, demonstrating disease relevance in lung tissue. We found that the down-regulated lung tissue genes were enriched among the genes up-regulated in the bronchial epithelium by disease status. Although not an equivalent comparison, this finding is similar in nature to that of Obeidat et al. [
44], where the lung tissue and blood gene expression directionality was opposite across their two tissue cohorts for a majority of the genes of interest.
Within the emphysema results for peripheral blood, we also observed enrichment of the COPD-relevant lung tissue B cell expression module [
38] and DNA methylation [
57] gene sets. The direction of effect for these enrichments was concordant with respect to disease status. The methylation directionality relationship is more difficult to resolve given the various gene regulation mechanisms [
75]. We did not find enrichment for the lung emphysema genes in these phenotype variables. Overall, this suggests a systemic B cell signature observed previously in the lung [
38], recapitulated here in peripheral blood. The significant gene at the intersection of the B cell module with the bronchial epithelium results for COPD was
CD28 (CD28 molecule). This gene may play a role in immunologic senescence [
71] and COPD inflammation [
76], owing to its role as a co-stimulatory molecule, constitutively expressed by naïve T cells and required for full activation (and survival) of T cells.
Reversal of bronchial epithelium disease signature
Using Connectivity Map, we identified perturbing compounds that produce gene expression signatures in two lung cell lines opposing the disease gene expression signature we observed in the bronchial epithelium. The top chemical perturbagen from the A549 results was lomerizine, a calcium channel blocker, suggesting potential drug repurposing. Others on the list include glucocorticoid receptor agonists, used in the treatment of inflammatory lung diseases [
77] through their activation of specific glucocorticoid receptor mechanisms. Among the HCC515 compounds, fluticasone is a current therapeutic for treatment of respiratory disease [
78], and ephedrine, the top result for HCC515, is a known bronchodilator.
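The idea of a compound signature opposing a disease signature can be illustrated with a simplified stand-in: a Pearson correlation between disease-associated and compound-induced expression changes over their shared genes. This is not the scoring method used by Connectivity Map, and the function name, gene labels and values below are hypothetical and serve only to make the sketch runnable.

```python
from math import sqrt


def reversal_score(disease_sig, compound_sig):
    """Pearson correlation of two signatures over their shared genes.

    Both arguments map gene identifiers to signed expression changes.
    Values near -1 suggest the compound opposes the disease signature.
    """
    genes = sorted(set(disease_sig) & set(compound_sig))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [disease_sig[g] for g in genes]
    y = [compound_sig[g] for g in genes]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("signature has no variation on shared genes")
    return cov / sqrt(var_x * var_y)


# Toy signatures (placeholders, not data from this study).
disease = {"gene_1": 2.0, "gene_2": -1.0, "gene_3": 0.5}
opposing_compound = {g: -v for g, v in disease.items()}

score = reversal_score(disease, opposing_compound)
```

A strongly negative score indicates that the compound-induced changes run opposite to the disease signature on the shared genes; a score near zero indicates no clear relationship.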
Some limitations to the current study involve blood and bronchial epithelium cellular heterogeneity. We have partially addressed the heterogeneity in blood using the measured leukocyte counts. However, remaining variation (e.g. lymphocyte composition) may influence the gene expression signatures, as
GPR15 was differentially expressed in our smoking analysis in blood and was found to be expressed in a T cell dependent manner with cigarette smoking [
79]. Single-cell or single-cell-type sequencing will better resolve specific gene expression signatures. We have not addressed the polarization of the alveolar macrophages, which increases with COPD severity and cigarette smoke exposure [
80]. The study of early and intermediate phenotypes of COPD would help to link the temporal changes in the tissue gene expression overlap with disease progression, as would longitudinally repeated gene expression experiments. Last, despite the use of RNA-seq to improve the resolution of gene expression signatures and use of gene set enrichment to extract signals from all results, our sample size does limit our power to detect these signatures. Given this limited power, our focus was not on the identification of specific biomarkers. Future work will involve larger study cohorts with greater power to also resolve individual biomarkers.
Conclusions
In this integrative genomics study, we have performed RNA-seq profiling of gene expression in three matched COPD-relevant tissues. Using statistical and gene set enrichment methods, we have identified overlapping differentially expressed genes and pathways across the tissues, providing lung disease biomarker insight. We observed no common genes across all three tissues. However, we did observe shared pathways across all three. By integrating the gene expression profiles with previous COPD findings to provide additional disease context, we identified a lung disease signature in our emphysema results in the bronchial epithelium and peripheral blood, while also suggesting recapitulation of a systemic B cell lung signature in peripheral blood. Together, these findings suggest that peripheral blood has the potential to capture relevant lung pathobiology. Connectivity Map provided some translational context, identifying known and putative compounds that elicit a gene expression signature in lung cell lines that opposes the disease signature we observed in the bronchial epithelium.