Introduction
Idiopathic pulmonary fibrosis (IPF) is an interstitial lung disease of unknown origin characterized by progressive lung fibrosis [1]. The pathogenesis of IPF is complex and still unclear. Previous whole-genome transcriptomic studies have described alterations in different molecular pathways in end-stage IPF lungs, including aberrant activation of epithelial cells that promote fibroblast-to-myofibroblast differentiation [2, 3], excessive production of extracellular matrix proteins, such as matrix metalloproteases (MMPs), collagen and fibronectin [4, 5], aberrant activation of lung developmental pathways [6, 7], mitochondrial abnormalities [8, 9], oxidative stress [9, 10], and senescence of type II epithelial cells and fibroblasts [2, 11, 12]. The combination of all these pathogenic mechanisms leads to a highly heterogeneous disease, in which the identification of disease endotypes is an important unmet clinical need to move toward precision treatment [13].
In this setting, the role of the immune system is unclear. Some studies have proposed a role for immune cells such as CD3+ and CD20+ lymphocytes in the development of fibrosis [14, 15] through the promotion of epithelial-to-mesenchymal transition (EMT) [4, 7, 15–17]. Further, the progression of IPF and the occurrence of exacerbations were associated with B cell responses [18, 19] through their capacity to modify the pro- or anti-fibrotic lung microenvironment, thus influencing fibroblast activity [20]. However, other findings challenge the role of the immune response in IPF [21]. First, clinical trials with immunosuppressive agents showed increased mortality and fibrosis in treated patients [22]. Second, the expression of markers of lung T lymphocyte exhaustion (such as PD-1, ICOS and CD28) is associated with enhanced TGF-β production and poor survival in IPF [23, 24]. Finally, the proportion of NK cells with impaired activity is reduced in IPF lungs [25] and their functionality is profoundly compromised by the lung microenvironment [26].
We therefore hypothesized that there is likely to be significant immune-related molecular heterogeneity in patients with IPF. To test this hypothesis, we applied gene set variation analysis (GSVA) to lung tissue samples of patients with IPF, in contrast to previous studies that relied on conventional single-gene expression analysis. GSVA is a statistical technique that enables the discovery of inflammatory and leukocyte lineage gene signatures by comparing combined enrichment scores (ESs) of established and predefined gene sets, especially in heterogeneous samples [27, 28]. Specifically: (1) we first applied GSVA to lung transcriptomic data of 109 severe IPF patients (explanted lungs) available at the Lung Tissue Research Consortium (LTRC) to estimate the proportion of immune cells in their lungs; (2) we then used unbiased cluster analysis to identify distinct groups of IPF patients with overall distinct levels of immune signatures; and, finally, (3) we explored differential gene expression between the observed clusters, both for newly identified signatures and for previously established IPF-related pathways.
Methods
Study design, patients and ethics
Transcriptomic data of IPF explanted lungs (n = 109) were obtained from the LTRC following established procedures. Experimental validation using a cell-based (not mRNA-based) method (flow cytometry) was performed in lung tissue samples of IPF patients undergoing bilateral lung transplant at the University of Pittsburgh (USA). The Institutional Review Board and the Committee for Oversight of Research and Clinical Training Involving Decedents of the University of Pittsburgh approved the study and the sample transfer, respectively. In all cases, a signed informed consent form was collected before organ procurement.
Clinical characterization of IPF patients
Available clinical data in the LTRC include age, sex, body mass index, forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), carbon monoxide diffusing capacity (DLCO), quantitative computed tomography (CT) of the thorax by an adapted version of the CALIPER software, and daily activity and health questionnaires. All procedures were performed following LTRC protocols. The diagnosis of IPF was made by a specialist who evaluated the medical record, the CT scan report and the post-transplant pathology report.
GSVA, immune-signatures enrichment and unbiased cluster analysis
We analyzed the transcriptomic data set GSE47460 from the LTRC [29]. This data set was split in two: GPL14550 was used as the discovery data set (D#1, n = 109), whereas GPL6480 was used for validation (D#2, n = 34). For the current analysis we used the normalized matrix downloaded from GEO, selecting only patients with a diagnosis of IPF. Gene set variation analysis (GSVA) was used to determine patient-wise enrichment scores (ES) that indicate the collective expression of the genes within each signature for a given patient relative to the rest of the cohort in a given transcriptomic data set [30]. The immune signatures used were based on available gene expression publications (n = 31, Additional file 2: Table S1) [27, 31]. Unbiased clustering of the GSVA immune signatures was performed using the dendextend R package [32]. To maximize the differences in the GSVA scores, the number of clusters was set at 2, the distance metric was calculated with the Minkowski method and the hierarchical clustering method was ward.D2 [32].
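The clustering settings described above (two clusters, Minkowski distance, Ward D2 linkage) can be illustrated with a minimal Python sketch. It is not the authors' R script: the function names, the toy scores and the Minkowski exponent p = 2 (Euclidean distance, the default of R's dist function) are assumptions made for illustration only.

```python
from itertools import combinations


def minkowski(a, b, p=2.0):
    """Minkowski distance between two equal-length score vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def ward_d2_clusters(scores, k=2, p=2.0):
    """Agglomerative clustering with the Ward update on squared distances.

    scores: dict mapping patient id -> list of signature scores.
    Returns k clusters, each a sorted list of patient ids.
    """
    ids = sorted(scores)
    # Every active cluster has an integer key and a list of members.
    members = {i: [pid] for i, pid in enumerate(ids)}
    # Squared pairwise dissimilarities between the initial singletons.
    d2 = {}
    for i, j in combinations(range(len(ids)), 2):
        d2[(i, j)] = minkowski(scores[ids[i]], scores[ids[j]], p) ** 2

    def get(a, b):
        return d2[(a, b) if a < b else (b, a)]

    next_key = len(ids)
    while len(members) > k:
        # Merge the two active clusters with the smallest dissimilarity.
        i, j = min(combinations(sorted(members), 2), key=lambda ab: get(*ab))
        ni, nj = len(members[i]), len(members[j])
        dij = get(i, j)
        for m in members:
            if m in (i, j):
                continue
            nm = len(members[m])
            # Lance-Williams formula for Ward's method (squared distances).
            d2[(m, next_key)] = (
                (ni + nm) * get(m, i) + (nj + nm) * get(m, j) - nm * dij
            ) / (ni + nj + nm)
        members[next_key] = members.pop(i) + members.pop(j)
        next_key += 1
    return sorted(sorted(c) for c in members.values())


# Toy enrichment scores for four hypothetical patients and two signatures.
toy_scores = {
    "P1": [0.5, 0.4],
    "P2": [0.6, 0.5],
    "P3": [-0.5, -0.4],
    "P4": [-0.6, -0.3],
}
print(ward_d2_clusters(toy_scores, k=2))
```

With these toy values, the two patients with positive scores fall into one cluster and the two with negative scores into the other. The naive pairwise search is adequate for a cohort of about a hundred patients but would be slow for much larger data sets.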
Differential gene expression between clusters was investigated using limma [33]. To build the correlation network with the clinical parameters and to further understand the relationship between the immune and epithelial cells in these patients, the gene sets included in our GSVA analysis were extended, while preserving the already obtained immune-based unbiased clustering, to include epithelial lineage cell signatures (skipping genes already included in the immune cell signatures) (Additional file 2: Table S2).
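The idea of extending the gene sets while skipping genes already assigned to an immune signature, and of scoring each patient relative to the rest of the cohort, can be sketched as follows. This is a deliberately simplified stand-in, not GSVA itself: it averages per-gene z-scores, whereas GSVA relies on kernel-estimated expression distributions and a rank-based random-walk statistic. The function names, gene labels (gA, gB, gC) and values are hypothetical.

```python
from statistics import mean, pstdev


def drop_shared_genes(new_set, existing_sets):
    """Remove genes from new_set that already belong to any existing set."""
    used = set().union(*existing_sets)
    return [g for g in new_set if g not in used]


def signature_scores(expr, gene_set):
    """Mean per-gene z-score of a gene set for each patient.

    expr: dict gene -> dict patient -> expression value.
    Genes missing from expr, or constant across patients, are skipped.
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no signature gene is present in the data")
    patients = sorted(expr[genes[0]])
    totals = {p: 0.0 for p in patients}
    used = 0
    for g in genes:
        values = [expr[g][p] for p in patients]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no relative information
        used += 1
        for p in patients:
            totals[p] += (expr[g][p] - mu) / sd
    if used == 0:
        raise ValueError("all signature genes are constant")
    return {p: totals[p] / used for p in patients}


# Toy expression matrix for three hypothetical patients.
expr = {
    "gA": {"P1": 8.0, "P2": 6.0, "P3": 4.0},
    "gB": {"P1": 9.0, "P2": 7.0, "P3": 5.0},
    "gC": {"P1": 3.0, "P2": 5.0, "P3": 7.0},
}
immune = ["gA", "gB"]
# gB is already part of the immune signature, so it is skipped here.
epithelial = drop_shared_genes(["gC", "gB"], [immune])

immune_score = signature_scores(expr, immune)
epithelial_score = signature_scores(expr, epithelial)
print(epithelial, immune_score, epithelial_score)
```

In this toy example, patient P1 has a positive immune score and a negative epithelial score, and P3 shows the opposite pattern.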
Experimental validation of LTRC results in fresh lung tissue samples by flow cytometry
To validate the results from the GSVA immune enrichment in the LTRC, we used flow cytometry, a non-mRNA-based method. Fresh lung tissue samples of IPF patients undergoing bilateral lung transplant at the University of Pittsburgh (USA) were washed with PBS and enzymatically digested as previously described [34]. Lung homogenates included multiple areas of the same lung lobe to ensure that the sample was representative and to address patient heterogeneity. Lung tissue homogenates (10⁶ cells) were then stained for 5 min with the viability stain (Fixable viability-Alexa600, BD, USA) and for 30 min at 4 °C in the dark with the following conjugated monoclonal antibodies: CD3-PECy5.5, CD45-Alexa700, CD16-BV412, CD56-FITC, CD8-V500, CD4-APC-Cy7, CD19-BV650 (BD, USA) and CD14-PE (BioLegend, USA). A minimum of 5 × 10⁵ cells per sample were acquired on a FACS LSRII (BD Biosciences, USA), and data were analyzed using FlowJo v10 (FlowJo LLC, USA). Immune cell populations were determined using the gating strategy depicted in Additional file 1: Fig. S1.
Biologic pathway analysis
To evaluate the enrichment of biological signatures in the observed clusters, gene ontology (GO) enrichment and hypergeometric tests were used [35]. The gene signatures for the hypergeometric test were selected from previously published scRNA-seq studies, namely epithelial cell signatures [36–39] and fibroblast-related signatures [37–41], or from the Gene Ontology (GO) terms extracellular matrix (GO:0031012), oxidative stress (GO:0000302), mitochondrial transport (GO:0006839), mitochondrial respiratory chain (GO:0005746) and response to stress (GO:0006950). Additional file 2: Table S3 shows the complete list of gene signatures investigated here.
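As an illustration of the hypergeometric test mentioned above, the sketch below computes the probability of observing at least a given overlap between a gene signature and a list of differentially expressed genes when both are drawn from the same gene universe. The function names and the toy gene labels are assumptions made for illustration; the choice of gene universe used in the study is not described here.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, signature_size, selected_size, overlap):
    """Upper-tail probability P(X >= overlap) of the hypergeometric distribution.

    universe_size: number of genes measured (N)
    signature_size: number of signature genes in the universe (K)
    selected_size: number of differentially expressed genes (n)
    overlap: number of genes shared by both lists
    """
    N, K, n = universe_size, signature_size, selected_size
    total = comb(N, n)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(K, x) * comb(N - K, n - x)
        for x in range(overlap, min(K, n) + 1)
    )
    return tail / total


def overlap_test(universe, signature, selected):
    """Convenience wrapper that works on collections of gene names."""
    universe = set(universe)
    signature = set(signature) & universe
    selected = set(selected) & universe
    shared = signature & selected
    p = hypergeom_enrichment_p(
        len(universe), len(signature), len(selected), len(shared)
    )
    return len(shared), p


# Toy example: 10 measured genes, a 4-gene signature, 3 selected genes.
universe = [f"g{i}" for i in range(10)]
shared, p_value = overlap_test(universe, ["g0", "g1", "g2", "g3"], ["g0", "g1", "g9"])
print(shared, p_value)
```

For the toy input, two of the three selected genes belong to the signature, and the probability of an overlap of two or more by chance is 40/120, that is, one third.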
Statistical analysis
Quantitative and qualitative data are presented as mean, or n and proportion, respectively. Results were compared using ANOVA or Fisher tests, as appropriate. Differences in the distribution of the GSVA-calculated signatures between clusters were also assessed with ANOVA. Correlations between immune cell signatures and clinical features were assessed using the Spearman correlation test and were considered statistically significant if |r| > 0.5 and the p value was < 0.05. To explore correlations between biological and clinical features, we used network analysis, in which each node was a variable of interest, its size was proportional to its mean value in each cluster, and links (edges) represented the Spearman rho between linked variables, with results plotted using Cytoscape [42]. All statistics were computed with R 4.2.2, using custom scripts.
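The correlation network described above can be sketched in Python as a list of edges between variables whose Spearman rho exceeds 0.5 in absolute value. This is a minimal sketch under stated assumptions, not the authors' script: the variable names and values are hypothetical, and the additional p < 0.05 filter used in the study is omitted for brevity.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for a constant variable")
    return cov / (vx * vy) ** 0.5


def build_edges(variables, threshold=0.5):
    """Keep the variable pairs whose |rho| exceeds the threshold."""
    edges = []
    for a, b in combinations(sorted(variables), 2):
        rho = spearman_rho(variables[a], variables[b])
        if abs(rho) > threshold:
            edges.append((a, b, rho))
    return edges


# Toy data for five hypothetical patients.
variables = {
    "cytotoxic_T": [0.1, 0.4, 0.2, 0.8, 0.6],
    "ciliated": [0.9, 0.5, 0.7, 0.1, 0.3],
    "FVC": [60, 55, 70, 58, 65],
}
edges = build_edges(variables)
print(edges)
```

In the toy data only the pair of cell signatures passes the threshold, with a perfectly inverse rank relationship; the resulting edge list could then be exported to a network tool such as Cytoscape.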
Discussion
The main and novel observation of this study is that, by using unbiased cluster analysis of lung immune signatures in a large cohort of patients with IPF (n = 109), we identified two clusters (C#1 and C#2) of similar size with different immune-related characteristics and differentially expressed genes: C#1 (n = 55, 53%) was characterized by a higher expression of immune signatures, particularly cytotoxic and memory T cells, whereas C#2 (n = 49, 47%) was characterized by an upregulated expression of cilium-associated, epithelial and secretory cell genes (structural cell cluster). Interestingly, though, the clinical presentation of these two clusters was remarkably similar, indicating that at the end stage of the disease the identified molecular heterogeneity does not translate directly into a different clinical phenotype. However, further research is needed to understand whether these clusters are already present in earlier phases of the disease and/or are associated with disease progression.
Previous studies
A few previous studies used transcriptomic data to identify clusters of IPF patients. Using lung transcriptomics, Yang et al. identified a cilium-associated subtype and a fatty acid metabolism one [43], but the expression of immune-related genes or the associated cell types was not reported. Using blood transcriptomics, Kraven et al. described three clusters of IPF patients, one of them enriched in immune response genes [44]. Additionally, Herazo-Maya et al. identified a 52-gene signature in peripheral blood mononuclear cells (PBMCs) that stratified patients with different disease outcomes [45, 46], and an increase in peripheral blood monocytes has been associated with poor prognosis [47]. Finally, De Sadeleer et al. used transcriptomic analysis of bronchoalveolar lavage fluid to identify 6 clusters of IPF patients, one of them again enriched in immune signatures [48]. Collectively, these studies support our observation of immune heterogeneity in IPF. To our knowledge, however, no previous study has used unbiased cluster analysis of IPF lung immune signature enrichment. Importantly, our results were validated experimentally in independent lung tissue samples using a non-mRNA-based method (flow cytometry).
Interpretation of novel findings
The application of this cutting-edge methodology to IPF lung tissue allowed us to identify two clusters of IPF patients (C#1 and C#2) with marked biological differences: while C#1 was an "immune-cell" cluster, particularly enriched in cytotoxic and memory T cells, C#2 was a "structural cell" cluster, with marked upregulation of cilium, epithelial and secretory cell genes. Because in the study mentioned above Yang et al. also identified a cilium-associated IPF subtype using lung transcriptomics [11], we explored the degree of overlap between their results and our identified clusters. The hypergeometric test showed that our C#2 shared 99% and 72% of their described genes, indicating that our unbiased clustering of immune signature enrichment generates a grouping of IPF patients similar to that of the more traditional transcriptomic hierarchical clustering.
From the clinical viewpoint, it is of note that these two very different biologic clusters of patients with IPF show remarkably similar clinical characteristics (Table 1). We think that this is likely because the lungs were harvested at transplantation, that is, at the end stage of the disease. It is possible that at an earlier stage clinical differences may have been more evident, or that these two clusters represent different disease trajectories, varying in rate of progression, frequency of infections or exacerbations and/or response to treatment. All these possibilities require and deserve future research. The main limitation of the study is therefore the lack of longitudinal information to understand disease evolution and progression, together with the lack of a record of infections and exacerbations that could have a direct impact on the lung immunological state.
Conclusions
The use of unbiased clustering of the transcriptomic enrichment in immune signatures in lung tissue of patients with end-stage IPF identified two distinct clusters, an immune-cell one and a structural-cell one, with a negative correlation between the expression of immune and epithelial-related signatures. These very different biological clusters are not related to clinical characteristics, but whether they are present at an earlier stage and/or are associated with disease phenotypes or progression should be further studied.