Discussion
In this analysis of the IPF-PRO Registry, a prospective registry of patients with IPF, we used a two-step method to harmonize multi-omics datasets and conduct unsupervised clustering based on molecular features. This method identified two novel molecular subtypes of IPF associated with distinct clinical characteristics. Patients in subtype 1 had more severe disease at enrollment and, after adjusting for disease severity and use of antifibrotic treatment at baseline, a shorter time to disease progression than patients in subtype 2. The distribution of subjects into the molecular subtypes was driven by miRNA expression and protein abundance, while toRNA expression did not differ between the subtypes. Consistent with this observation, these molecular subtypes of IPF were distinct from risk groups identified using a previously described 52-gene (RNA) signature [9, 10]. A signature of 34 circulating proteins and 7 circulating miRNAs may be useful to classify patients as subtype 1 or 2. These data will be important to permit validation of the existence and clinical implications of these subtypes. A biological pathway analysis of genes encoding differentially abundant proteins or regulated by the differentially expressed miRNAs suggested a coordinated alteration of gene expression among individuals at greater risk of disease progression, including in pathways previously associated with pulmonary fibrosis.
Accurate identification of patients with IPF who are likely to experience short-term disease progression has been proposed as part of an enrichment strategy for clinical trial design [41]. Previous studies have demonstrated associations between circulating levels of protein biomarkers and IPF prognosis; most of these studies measured a limited panel of proteins (selected based on disease mechanisms), or evaluated transplant-free survival without considering disease progression [42–47]. Interestingly, two independent studies found that several neoepitopes of matrix metalloprotease-degraded extracellular matrix proteins or collagen synthesis were elevated in the blood of patients with progressive IPF relative to those with stable IPF [44, 45]. Another study used an aptamer-based platform for proteomic profiling of blood in patients with IPF and identified 9 proteins associated with IPF progression [48]. Interestingly, two of these (carbonic anhydrase XIII and NACA) were among the 232 proteins that we identified as differentially abundant in the IPF subtypes, but while we determined that lower abundance was associated with progression, this prior study found lower abundance to be protective [48]. Similarly, the 52-gene signature has been shown to predict transplant-free survival; however, its association with disease progression has not previously been tested [9, 10]. When applied in our cohort, the high-risk group based on the 52-gene signature experienced significantly shorter transplant-free survival (as expected), but did not experience shorter progression-free survival based on a composite of ≥ 10% absolute decline in FVC % predicted, lung transplant, or death. In contrast, our molecular subtype 1 experienced shortened progression-free survival after adjusting for disease severity and antifibrotic drug use at enrollment. This suggests that multi-omics data offer better resolution for predicting disease progression than gene expression (toRNA) alone. While a recent analysis suggested that longitudinal change in peripheral blood gene expression predicted a ≥ 10% decrease in FVC over follow-up [49], risk ascertainment at a single timepoint would be optimal, and the protein/miRNA classifier of IPF subtypes is a candidate for further development and validation.
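To make the composite progression endpoint described above concrete, the following minimal sketch derives a progression-free survival time for one patient as the earliest of a ≥ 10 percentage-point absolute decline in FVC % predicted, lung transplant, or death. The function name, argument names, and the use of months as the time unit are assumptions for illustration; this is not the registry's analysis code, and it does not address how declines were confirmed or how missing visits were handled.

```python
from typing import Optional, Sequence, Tuple


def progression_free_time(
    baseline_fvc_pct: float,
    followup: Sequence[Tuple[float, float]],
    last_contact_month: float,
    transplant_month: Optional[float] = None,
    death_month: Optional[float] = None,
    decline_threshold: float = 10.0,
) -> Tuple[float, bool]:
    """Return (time in months, event observed) for the composite endpoint.

    `followup` holds (month, FVC % predicted) pairs (hypothetical layout).
    """
    event_times = []

    # Earliest visit with an absolute decline of at least `decline_threshold`
    # percentage points in FVC % predicted relative to baseline.
    for month, fvc_pct in sorted(followup):
        if baseline_fvc_pct - fvc_pct >= decline_threshold:
            event_times.append(month)
            break

    # Lung transplant and death also count as events.
    for t in (transplant_month, death_month):
        if t is not None:
            event_times.append(t)

    if event_times:
        return min(event_times), True
    # No event observed: censor at the last contact.
    return last_contact_month, False
```

The resulting (time, event) pairs are the form of input a Cox model would take when adjusting for disease severity and antifibrotic use at enrollment.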
Integrating high-throughput data from multiple platforms remains a challenge. In this study, we initially considered three methods based on two general approaches. iCluster+ and iClusterBayes include a variable selection step (i.e., lasso) followed by distillation of input matrices to a smaller set of latent variables, allowing joint clustering of samples and identification of cluster-relevant features [17, 31, 32]. Our two-step scSNF constructed a sample-similarity network (where each patient is a sample) for each omics data type and integrated these networks into a fused similarity network using a non-linear combination method [13], followed by unsupervised spectral clustering [32]. Importantly, the scSNF procedure omitted the variable selection step, limiting one source of bias.
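The two steps described above (network fusion, then spectral clustering) can be illustrated with a simplified NumPy sketch. It is not the scSNF implementation used in this study: the single median-distance bandwidth, the symmetrization after each iteration, the parameter defaults, and the two-way split on the Fiedler vector are all assumptions chosen to keep the example short, and all function names are hypothetical.

```python
import numpy as np


def affinity(X):
    """Gaussian-kernel similarity between samples (rows of X)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    # Assumption: one global bandwidth, the median pairwise distance.
    sigma = np.median(np.sqrt(d2[np.triu_indices(len(X), k=1)]))
    return np.exp(-d2 / (2 * sigma**2))


def full_kernel(W):
    """Row-normalized similarity with 1/2 on the diagonal."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W, k):
    """Keep each sample's k most similar neighbours, then row-normalize."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)


def fuse(views, k=5, n_iter=20):
    """Fuse per-data-type similarity networks by iterative cross-diffusion."""
    if len(views) < 2:
        raise ValueError("need at least two data types to fuse")
    Ws = [affinity(X) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    m = len(views)
    for _ in range(n_iter):
        updated = []
        for v in range(m):
            # Diffuse the average of the other networks through this
            # data type's local neighbourhood structure.
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            updated.append(Ss[v] @ others @ Ss[v].T)
        Ps = [(P + P.T) / 2 for P in updated]
    return sum(Ps) / m


def two_way_spectral(A):
    """Two-cluster split from the sign of the Fiedler vector."""
    A = A.copy()
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (vecs[:, 1] / np.sqrt(d) > 0).astype(int)
```

Each element of `views` would be a samples-by-features matrix for one omics data type, with patients in the same row order. Because no feature is dropped before clustering, a sketch of this kind has no variable selection step; which features drive the clusters would be examined afterwards.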
The molecular subtypes that we identified based on integration of data from several constituents of the gene-to-protein expression pathway appear to reflect the pathobiology of IPF. Several of the proteins that differed between subtypes 1 and 2 have been implicated in IPF pathogenesis. For example, activation of GSK-3 beta protein, which is reduced in molecular IPF subtype 1, is enhanced by TGF-beta, contributing to myofibroblast differentiation; GSK-3 beta signaling inhibition has been proposed as a treatment strategy for IPF [50]. PKB beta protein (also known as AKT2), reduced in subtype 1, has been implicated in the pathogenesis of IPF: AKT2 knockout results in lower IL-13 and TGF-beta production by macrophages, alleviating fibrosis in animal models [51]. The MAPK/ERK pathway, of which several protein constituents were reduced in subtype 1, is activated by TGF-beta, with ERK-1/2 linked with abnormal cellular senescence [52, 53]. MAPKAPK2 (MK2) is elevated in fibroblasts and epithelial cells from patients with IPF, and its inhibition has been proposed as a treatment strategy based on pre-clinical models [54]. Interestingly, we found decreased abundance of these proteins in the peripheral blood of persons with IPF who were at increased risk for physiologic progression, while the literature suggests that reduced quantity or activity should be protective or therapeutic. It is possible that protein quantity or activity in the target tissue differs from that in blood, but these findings may have important implications for the use of blood proteins as candidate biomarkers of disease stage and/or treatment response.
Several miRNAs that have been mechanistically linked with IPF were differentially expressed in molecular IPF subtype 1 compared to 2. We identified increased expression of miR-142-5p and reduced expression of miR-130a-3p in subtype 1. Altered expression of these miRNAs in macrophages (in a similar direction as we observed) has been implicated in lung and liver fibrosis via reduced STAT6 signaling; miR-142-5p targets SOCS1 (a negative regulator of STAT6 phosphorylation), and miR-130a-3p targets the PPAR-g inhibitor [16]. We found reduced expression of miR-21-3p and increased expression of miR-21-5p in molecular subtype 1. Over-expression of miR-21 has been demonstrated in the lungs of patients with IPF and in animal models of lung fibrosis, where it may function via reduction of Smad7, a downstream inhibitor of TGF-beta signaling [15]. We also observed differential expression of miR-34a-5p, miR-126-5p, and miR-199a-5p in molecular subtype 1, although the direction of differential expression did not always match that expected in IPF based on published literature [55–58].
To gain additional insight into biologic differences between the molecular subtypes, canonical pathways over-representation analysis (IPA) was conducted separately for up- and down-regulated molecules in subtype 1 compared to 2. The intersection of these datasets comprised a number of pathways known to be altered in IPF (e.g., VEGF, PDGF, ERK/MAP signaling [52–54, 59]). Among non-intersecting (across proteins and miRNA) pathways, multiple innate or adaptive immunity-related pathways were over-represented among target genes of miRNAs that were down-regulated in progressive IPF. Pathways that were uniquely over-represented among target genes of up-regulated miRNAs in progressive IPF included a number that were related to cellular or metabolic processes. Given that miRNAs often act as post-transcriptional down-regulators of gene expression, this might suggest that IPF progression is associated with increased immune responses and decreased cellular metabolism. Because miRNAs have not been extensively studied in IPF, additional research is needed to better understand these results.
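For readers unfamiliar with over-representation analysis, the sketch below shows the general idea with a one-sided hypergeometric (Fisher-type) test: given a gene universe, a pathway, and a list of selected genes, it asks how likely an overlap at least as large as the observed one would be by chance. This is a generic illustration and not the IPA implementation; the function name and all counts in the usage line are hypothetical.

```python
from math import comb


def overrepresentation_p(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(overlap >= n_overlap).

    n_universe: genes that could have been selected
    n_pathway:  genes in the universe that belong to the pathway
    n_selected: genes in the list under test (e.g., targets of miRNAs)
    n_overlap:  selected genes that belong to the pathway
    """
    if not (0 <= n_pathway <= n_universe and 0 <= n_selected <= n_universe):
        raise ValueError("pathway and selection sizes must fit in the universe")
    upper = min(n_pathway, n_selected)
    if n_overlap > upper:
        return 0.0
    # Sum the probabilities of every overlap from the observed value upwards.
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_selected - i)
        for i in range(max(n_overlap, 0), upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Hypothetical counts: 300 selected genes, 9 of which fall in a 150-gene pathway.
p_value = overrepresentation_p(20000, 150, 300, 9)
```

Running such a test separately on the targets of up-regulated and down-regulated miRNAs, and correcting for the number of pathways tested, mirrors the separate analyses described above.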
Our study has several limitations. First, the aptamer-based proteomics platform we used contains a targeted list of biomarkers that does not cover all the proteins that may be found in the blood or potentially associated with pathobiology. Second, molecules measured in peripheral blood may not reflect the pathobiology of the target tissue [18, 19, 24]. Third, while this real-world registry followed participants to death or transplant, we cannot exclude the possibility that detection of disease progression based on only physiologic decline was impacted by informative missingness in lung function measurements (i.e., sicker patients were less able to complete testing). Finally, although we were able to internally validate our classifier of the molecular subtypes of IPF via resampling, it requires further development and validation in an independent cohort.
Acknowledgements
We thank the principal investigators and enrolling centers in the IPF-PRO Registry: Albert Baker, Lynchburg Pulmonary Associates, Lynchburg, VA; Scott Beegle, Albany Medical Center, Albany, NY; John A Belperio, University of California Los Angeles, Los Angeles, CA; Rany Condos, NYU Medical Center, New York, NY; Francis Cordova, Temple University, Philadelphia, PA; Daniel A Culver, Cleveland Clinic, Cleveland, OH; Daniel Dilling, Loyola University Health System, Maywood, IL; John Fitzgerald (formerly Leann Silhan), UT Southwestern Medical Center, Dallas, TX; Kevin R Flaherty, University of Michigan, Ann Arbor, MI; Kevin Gibson, University of Pittsburgh, Pittsburgh, PA; Mridu Gulati, Yale School of Medicine, New Haven, CT; Kalpalatha Guntupalli, Baylor College of Medicine, Houston, TX; Nishant Gupta, University of Cincinnati Medical Center, Cincinnati, OH; Amy Hajari Case, Piedmont Healthcare, Atlanta, GA; David Hotchkin, The Oregon Clinic, Portland, OR; Tristan J Huie, National Jewish Health, Denver, CO; Robert J Kaner, Weill Cornell Medical College, New York, NY; Hyun J Kim, University of Minnesota, Minneapolis, MN; Lisa H Lancaster (formerly Mark Steele), Vanderbilt University Medical Center, Nashville, TN; Joseph A Lasky, Tulane University, New Orleans, LA; Doug Lee, Wilmington Health and PMG Research, Wilmington, NC; Timothy Liesching, Lahey Clinic, Burlington, MA; Randolph Lipchik, Froedtert & The Medical College of Wisconsin Community Physicians, Milwaukee, WI; Jason Lobo, UNC Chapel Hill, Chapel Hill, NC; Tracy R Luckhardt (formerly Joao A de Andrade), University of Alabama at Birmingham, Birmingham, AL; Yolanda Mageto (formerly Howard Huang), Baylor University Medical Center at Dallas, Dallas, TX; Marta Kokoszynska (formerly Yolanda Mageto, Prema Menon), Vermont Lung Center, Colchester, VT; Lake Morrison, Duke University Medical Center, Durham, NC; Andrew Namen, Wake Forest University, Winston Salem, NC; Justin M Oldham, University of California, Davis, Sacramento, CA; Tessy Paul, University of Virginia, Charlottesville, VA; David Zhang (formerly Anna Podolanczuk, David Lederer, Nina M Patel), Columbia University Medical Center/New York Presbyterian Hospital, New York, NY; Mary Porteous (formerly Maryl Kreider), University of Pennsylvania, Philadelphia, PA; Rishi Raj (formerly Paul Mohabir), Stanford University, Stanford, CA; Murali Ramaswamy, PulmonIx LLC, Greensboro, NC; Tonya Russell, Washington University, St. Louis, MO; Paul Sachs, Pulmonary Associates of Stamford, Stamford, CT; Zeenat Safdar, Houston Methodist Lung Center, Houston, TX; Shirin Shafazand (formerly Marilyn Glassberg), University of Miami, Miami, FL; Ather Siddiqi (formerly Wael Asi), Renovatio Clinical, The Woodlands, TX; Reginald Fowler (formerly Barry Sigal), Salem Chest and Southeastern Clinical Research Center, Winston Salem, NC; Mary E Strek (formerly Imre Noth), University of Chicago, Chicago, IL; Hiram Rivas-Perez (formerly Jesse Roman, Sally Suliman), University of Louisville, Louisville, KY; Jeremy Tabak, South Miami Hospital, South Miami, FL; Rajat Walia, St. Joseph’s Hospital, Phoenix, AZ; Timothy PM Whelan, Medical University of South Carolina, Charleston, SC.
The authors thank Janine Roy, Staburo GmbH, Munich, Germany for conducting the primary analysis of the sequencing data and Naftali Kaminski and Jose D. Herazo-Maya from Yale University, New Haven, USA for confirming how the 52-gene signature should be applied. The authors meet criteria for authorship as recommended by the International Committee of Medical Journal Editors (ICMJE). The authors did not receive payment for development of this article. Editorial support was provided by Melanie Stephens and Wendy Morris of Fleishman-Hillard, London, UK, which was contracted and funded by Boehringer Ingelheim Pharmaceuticals, Inc. Boehringer Ingelheim was given the opportunity to review the article for medical and scientific accuracy as well as intellectual property considerations.