Background
Chronic obstructive pulmonary disease (COPD), a disorder characterized by reduced maximum expiratory flow and slow forced emptying of the lungs, is a common, costly, and preventable disease that has implications for global health [1]. Although cigarette smoke (CS) is a well-known risk factor for the development of COPD, smoking-related damage manifestations, such as airway wall thickening, loss of small airway function, and emphysematous lung destruction, vary among individual smokers [2]. This heterogeneity of smoking-related manifestations makes it difficult to investigate the risk continuum across smoking and COPD. Moreover, various next-generation products (NGPs), including e-cigarettes and heat-not-burn tobacco products, have recently been introduced in global markets [3, 4]. These NGPs can potentially reduce the harms associated with tobacco use because of their reduced yields of toxicants, which is attributable to the generation of aerosols without combusting tobacco leaves [5, 6], but the effects of long-term use of these NGPs on human health remain controversial [7, 8] despite previous non-clinical [9–12] and clinical studies [13–15]. Epidemiological analysis could be one way to estimate the realistic risk of using such products, but several years would be needed to reach a conclusion. Furthermore, epidemiological studies on a product-by-product basis would be difficult because new products are frequently introduced and consumer choice varies. Considering these issues together, a rapid methodology for precisely predicting the potential risk of COPD is needed to estimate the realistic impact of NGPs in comparison with combustible cigarettes.
Alternatives to animal testing have been introduced recently based on the principle of the 3Rs: reduction, refinement, and replacement [16]. They are also expected to serve as rapid and precise risk assessment tools because of their high resemblance to in vivo situations [17]. For investigating the effects of airborne materials such as CS, a three-dimensional (3D) cultured airway epithelial cell model that functionally differentiates through air-liquid interface (ALI) culture is more representative, exhibiting a pseudostratified columnar epithelial structure with beating cilia as observed in the human airway [18, 19]. Our group has also applied these in vitro alternative testing approaches to investigate biological responses to CS and to predict the risks of its acute or subchronic inhalation toxicity [20–22]. In addition, the National Academy of Sciences [23] proposed a paradigm shift in toxicology from current animal-based testing toward the application of emerging technologies, including “-omics” technologies. This new paradigm would provide greater insight into the mechanisms by which many compounds affect human health [24]; accordingly, omics technologies have also improved our understanding of the complex effects of CS [25–27]. Furthermore, these large-scale datasets may be well suited to computational methods for developing risk prediction models [28]. However, developing computational methodologies that can quantitatively assess human disease risk remains a challenging issue.
The objective of the present study was to further the understanding of smoking effects and COPD pathogenesis. Among the existing omics technologies, we believe that the transcriptomic approach is one of the most powerful tools because of the high quality of the data and the availability of public databases, such as ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Therefore, we first obtained the global transcriptomic profiles induced by CS exposure and by inducers of COPD-related biological responses in ALI-cultured 3D human bronchial epithelial cells. However, the precise mechanism of action of CS exposure throughout the development of COPD remains unclear. CS-mediated oxidative stress is believed to be the most upstream biological event in respiratory tissues [29], and severe oxidative stress may lead to chronic inflammation and cellular DNA damage, as observed in the tissues of patients with COPD [30–32]. Thus, we exposed a commercially available reconstituted 3D human airway epithelial culture (MucilAir™) to the aqueous extract (AqE) of a reference cigarette and to inducers of oxidative stress, cellular DNA damage, and inflammatory response. We hypothesized that the transcriptomes of tissues exposed to CS and those exposed to test substances possess valuable information related to COPD; therefore, we identified descriptive marker genes and evaluated their potential to reflect the risk continuum across smoking and COPD pathogenesis. In this study, we developed an effective approach for identifying new potential markers and estimating disease risk using machine-learning techniques.
Discussion
In this study, we utilized a 3D cultured bronchial epithelial tissue model, which is expected to be one of the alternative models to animal testing. We conducted exposure studies using the AqE of 3R4F smoke and inducers of oxidative stress, DNA damage, and inflammatory responses because these are considered the earliest key events for chronic inflammatory lung diseases [29–32]. To identify potential descriptive marker genes, we extracted commonly up- and down-regulated genes from the transcriptomes of tissues exposed to those test substances (Fig. 2). ADM, AREG, CXCR4, CYP1B1, DUSP6, EFNA1, EGLN3, FBXO32, HILPDA, IGFBP3, PHLDA1, SLC7A11, TXNIP, WNT5A, and ZBED2 were identified as commonly perturbed genes, and 10 of these genes, as well as their encoded proteins, had not previously been identified as biomarkers for chronic inflammatory lung disease or associated with lung function (Table 2). In addition, these 15 genes were highly correlated with each other (Additional file 1: Figure S1), suggesting that they are perturbed by the same or similar mechanisms. To verify the association of these 15 genes with COPD pathology, we performed RF-based multi-classification to discriminate COPD subjects, smokers, and non-smokers using publicly available transcriptomic data (Table 1). This model with the 15 genes classified patient status with marginally higher accuracy than known COPD-associated genes [42], suggesting that the 15 genes, including newly identified potential marker genes, are closely associated with COPD status. These newly identified biomarkers are related to proliferation (DUSP6 [43], EFNA1 [44], IGFBP3 [45], and PHLDA1 [46]), hypoxia (EGLN3 [47] and HILPDA [48]), redox homeostasis (SLC7A11 [49] and TXNIP [50]), and epithelial-mesenchymal transition (FBXO32 [51]) (Table 2). Among them, the expression levels of AREG, CXCR4, and DUSP6 were significantly different between non-smokers and COPD subjects, and these genes are known to be associated with EGFR signaling, which plays a key role in the pathogenesis of COPD [52]. AREG, an EGFR ligand generated by the ADAM17-mediated shedding of pro-AREG proteins, stimulates the transcription of inflammatory mediators in bronchial epithelial cells [53]. Moreover, recent research illustrated that AREG-mediated IL-6 secretion is enhanced in differentiated bronchial cells from patients with COPD compared with cells from subjects without COPD [54, 55]. CXCR4 is associated with the recruitment of lymphocytes to disease lesions [56]. The mRNA levels of the CXCR4 ligand SDF-1 are reduced in mesenchymal stem cells (MSCs) derived from bone marrow, suggesting an impairment of the migratory capacity of MSCs. MSC migration to disease lesions plays crucial roles in anti-inflammatory effects and tissue repair [57, 58]. The publicly available transcriptomic data used in this study were obtained from lung biopsies; however, downregulation of CXCR4 in COPD subjects implies attenuation of MSC recruitment, thereby eventually accelerating inflammation and tissue destruction. Although a direct relationship between DUSP6 and COPD has not yet been reported, several studies demonstrated that activation of EGFR induces DUSP6, which regulates EGFR signaling via specific ERK1/2 inhibition [59]. Therefore, the observation of DUSP6 upregulation in COPD subjects in this study implies constitutive activation of the EGFR signaling pathway. Taken together, these three genes extracted from the transcriptome of in vitro tissues may be associated with COPD pathogenesis via the EGFR signaling pathway, and they are candidate novel markers of COPD.
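The extraction of commonly up- and down-regulated genes described in this paragraph can be illustrated with a minimal sketch. It assumes that each test substance yields a table of log2 fold changes and that a gene counts as "commonly perturbed" when it passes the same threshold in the same direction for every substance. The function name `common_genes`, the threshold of 1.0, the placeholder gene names, and all values are hypothetical and are not taken from this study.

```python
def common_genes(log2fc_by_substance, threshold=1.0):
    """Return (up, down): genes that pass the threshold in the same
    direction for every substance. The threshold is an illustrative
    assumption, not the cutoff used in the study."""
    up_sets = [
        {gene for gene, fc in fcs.items() if fc >= threshold}
        for fcs in log2fc_by_substance.values()
    ]
    down_sets = [
        {gene for gene, fc in fcs.items() if fc <= -threshold}
        for fcs in log2fc_by_substance.values()
    ]
    return set.intersection(*up_sets), set.intersection(*down_sets)


# Toy log2 fold changes for three hypothetical exposures (made-up numbers).
toy = {
    "smoke_extract":    {"GENE_A": 2.1, "GENE_B": -1.5, "GENE_C": 0.3,  "GENE_D": 1.2},
    "oxidative_stress": {"GENE_A": 1.4, "GENE_B": -2.0, "GENE_C": 1.8,  "GENE_D": 0.2},
    "inflammation":     {"GENE_A": 1.1, "GENE_B": -1.2, "GENE_C": -0.4, "GENE_D": 1.6},
}

up, down = common_genes(toy)
print("commonly up:", sorted(up))
print("commonly down:", sorted(down))
```

In this toy input only `GENE_A` is up-regulated and only `GENE_B` is down-regulated under all three exposures; genes perturbed by one or two substances are dropped, which is the intersection logic the paragraph describes.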
Although the 15 genes were able to predict non-smokers, smokers, and COPD subjects with high accuracy, the result clearly revealed that it is difficult to discriminate COPD subjects from smokers (Table 3). Therefore, we propose the PRF index model, based on a logistic regression method, to distinguish COPD subjects from smokers. This approach enabled the conversion of gene expression levels to a numerical index named the PRF index (see the formula in the Materials and Methods section). Logistic regression is used frequently in clinical trials to calculate the odds ratio when the risk ratio cannot be obtained directly [60]. The PRF index is also based on the concept of odds ratios, which indirectly estimate the risk ratio of CS exposure. Because the gene expression profiles of smokers and COPD subjects were similar, we first performed stepwise elimination of the 15 extracted genes to identify important variables. We selected 11 genes as important for distinguishing non-smokers from smokers, and 4 genes for distinguishing smokers from COPD subjects. Interestingly, 3 of the 4 genes for distinguishing smokers from COPD subjects (AREG, EFNA1, and TXNIP) were also marker genes for distinguishing non-smokers from smokers (Table 4). AREG was considered to be associated with EGFR signaling pathway activation, as described above. EFNA1 encodes a member of the ephrin family, ephrin A1. Previous studies suggest that these proteins play an important role in inflammation through NF-κB signal activation [61]. Thioredoxin-interacting protein (TXNIP) reduces the anti-oxidative function of thioredoxin by binding to its redox-active cysteine residues [62, 63]. The expression level of EFNA1 was higher in smokers than in non-smokers, and higher in COPD subjects than in smokers (Fig. 3). In contrast, the expression level of TXNIP was lower in smokers than in non-smokers, and lower in COPD subjects than in smokers. These data suggest that the expression levels of these genes could provide an important means of distinguishing between smokers and COPD subjects. The PRF index was then calculated using the normalized expression values of the selected genes, the estimated intercept, and the regression coefficient of each gene. The PRF indices of smokers and COPD subjects were significantly different from that of non-smokers (Fig. 4a). Because the ages and pack-years differed significantly between the smokers and COPD subjects (Additional file 2: Figure S2A), and were moderately correlated (Additional file 2: Figure S2B), we analyzed the correlations of the PRF indices and the expression values of the 15 identified genes with age and pack-years. AREG and TXNIP exhibited weak correlations with both pack-years (Additional file 3: Figure S3) and age (Additional file 4: Figure S4). However, the other genes exhibited little correlation, and notably, there were very weak correlations between the PRF indices and those factors. This suggests that a combination of several genes could appropriately reflect the risk continuum across smoking and COPD pathogenesis, and that each individual gene used in the PRF index model may provide further understanding of smoking effects and new insights into COPD.
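To make the idea of an odds-based index concrete, the following minimal sketch shows how an intercept, per-gene regression coefficients, and normalized expression values combine into the odds implied by a logistic model. The exact PRF formula is given in the Materials and Methods section and is not reproduced here, so the form used below (the odds exp(b0 + Σ b_i x_i), optionally expressed relative to a reference sample) is an assumption. The function names, the intercept, the coefficients, and the expression values are all hypothetical.

```python
import math


def logistic_odds(expression, intercept, coefficients):
    """Odds implied by a logistic model: exp(b0 + sum_i b_i * x_i)."""
    linear = intercept + sum(
        coef * expression[gene] for gene, coef in coefficients.items()
    )
    return math.exp(linear)


def relative_index(sample, reference, intercept, coefficients):
    """Odds of a sample divided by the odds of a reference sample.
    A value of 1.0 means the same odds as the reference (assumed form)."""
    return (logistic_odds(sample, intercept, coefficients)
            / logistic_odds(reference, intercept, coefficients))


# Hypothetical intercept and coefficients; not the fitted values of the study.
INTERCEPT = -0.5
COEFS = {"AREG": 0.8, "EFNA1": 0.6, "TXNIP": -0.7}

# Made-up normalized expression values for a reference and an exposed sample.
reference = {"AREG": 0.0, "EFNA1": 0.0, "TXNIP": 0.0}
exposed = {"AREG": 1.0, "EFNA1": 0.5, "TXNIP": -1.0}

index = relative_index(exposed, reference, INTERCEPT, COEFS)
print(f"relative index of exposed sample: {index:.2f}")
```

With these toy numbers, higher AREG and EFNA1 and lower TXNIP all push the index above 1.0, mirroring the directions of change reported for these genes; a sample identical to the reference returns exactly 1.0, because the intercept cancels in the ratio.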
Although the PRF index does not reflect future COPD risk and cannot diagnose COPD severity in individuals, the model may have the potential to compare the toxicity of various tobacco products in in vitro studies based on COPD-related biological responses. We also calculated the PRF index using MucilAir™ samples exposed to the AqE of 3R4F smoke for 4 and 24 h (Additional file 5: Figure S5). Although dose-dependent increases of the PRF index were observed, the PRF index for the lowest concentration of the AqE of 3R4F smoke was less than 1.0, indicating a lower risk than observed for the air-exposed control group. Because the pathological or morphological changes in smokers or patients with COPD could be caused by habitual cigarette smoking, the variability of the PRF index must be examined in a future repeated long-term CS exposure study to validate the index using in vitro experimental datasets for prospective risk estimation. In addition, a reasonable next step is to calculate the PRF index in a study comparing exposure to NGP vapor and conventional combustible cigarette smoke to demonstrate the usefulness of the index for assessing relative toxicity based on COPD-related biological responses.
We believe our model and PRF index are useful for the discrimination of non-smokers, smokers, and COPD subjects, but there are some limitations that must be considered further. (i) Because cigarette smoking can have acute and eventually chronic effects, the smoking status of the subjects is an important consideration with regard to the gene signature (e.g., the gene expression profiles would differ between current smokers with COPD and former smokers with COPD). However, we only found a clear description of the smoking status of the subjects in the E-MTAB-1690 study [64–66]. Therefore, it is possible that our model ignored factors related to acute-phase effects in the COPD subjects. (ii) Eight substances, focusing on three biological events, were used to identify COPD-associated biomarker genes. Because COPD is a complex disease, other important biological perturbations, such as apoptosis and autophagy, are also involved. Gene expression profiles obtained in additional exposure studies using inducers of such biological events would increase the plausibility of the potential biomarker genes. (iii) We utilized microarrays to analyze gene expression profiles in this study; however, next-generation sequencing could potentially permit a more comprehensive analysis of RNA expression profiles, including non-coding RNAs. As such, there remains room for improvement in our methodology, but our present approach suggests that mechanism-based large-scale dataset generation combined with computational analyses is useful for biomarker identification and for risk estimation using the identified biomarker genes.