Background
The interstitial lung diseases (ILDs) are a family of non-infectious, non-malignant lung diseases characterized by alveolar injury, inflammation, and fibrosis. Idiopathic pulmonary fibrosis (IPF), the most common idiopathic ILD, has a median survival time of 3.8 years and affects nearly 1 out of 200 older adults in the United States [1]. Previous genome-wide association studies (GWAS) for IPF have demonstrated the utility of genetic approaches for discovering novel biological pathways involved in the pathogenesis of ILD in studies of up to ~1,600 cases and ~4,700 controls [2, 3]. Variation in the promoter of MUC5B was identified as a risk factor for clinically evident sporadic and familial ILD [4] and is associated with the presence of interstitial lung abnormalities (ILAs), a qualitative subclinical ILD phenotype, in the Framingham Heart Study (FHS) [5]. Other genes implicated in the development of ILD include SFTPA2, SFTPC, ELMOD2, TERT, TERC, FAM13A, DSP, OBFC1, DPP9, TOLLIP, ATP11A, and SPPL2C [2, 3].
The study of early, subclinical disease before advanced parenchymal changes have occurred could lead to the identification of novel biological pathways involved in the pathogenesis of ILD at a stage more amenable to intervention. High attenuation areas (HAA), defined as the percentage of lung voxels in the range of −600 to −250 Hounsfield units, correspond to the CT attenuation values observed in areas visually described as having ground-glass and reticular attenuation [6]. In the Multi-Ethnic Study of Atherosclerosis (MESA), we have shown that greater HAA is associated with cigarette smoking, lower forced vital capacity, reduced exercise capacity, higher serum levels of matrix metalloproteinase-7 (MMP-7) and interleukin-6, a higher prevalence of ILAs at 9.5 years of follow-up, and a higher risk of death [6, 7].
Under the hypothesis that high-throughput genetic association analysis could identify novel pathophysiologic pathways to explain the occurrence of HAA and ILD, we conducted a GWAS for percent HAA on cardiac CT in MESA. We further sought replication of the identified SNPs for percent HAA in participants of European ancestry from the FHS, sought validation by comparing ILD cases from the Columbia ILD Study with matched controls selected from among the MESA participants, and performed differential gene expression analysis comparing mRNA from lung tissue of IPF cases versus controls.
Methods
Study participants
MESA is a population-based longitudinal study of subclinical cardiovascular disease [8]. Between 2000 and 2002, MESA recruited 6,814 men and women 45–84 years of age from six US sites who were free of clinical cardiovascular disease. The MESA Family Study recruited an additional 1,595 African-American and Hispanic family members 45–84 years of age specifically for genetic analysis, and the MESA Air Pollution Study recruited an additional 257 participants [9]. Participants who did not consent to genetic analyses or who had no usable genetic material were excluded, resulting in a combined sample of 7,671 participants comprising the MESA SNP Health Association Resource (SHARe) sample, which has been previously described [10].
We sought replication of selected SNPs from the race/ethnic-specific GWAS of MESA Whites in the Framingham Heart Study (FHS), an independent population-based cohort. See the Online Supplement for details.
Percent high attenuation areas
The MESA Lung Fibrosis Study was developed to determine the percent HAA for all participants on the lung fields of cardiac CT scans, which were acquired under a standardized protocol [6, 11]. For discovery analyses in MESA as well as replication in FHS, percent HAA was defined as the percentage of imaged lung voxels with attenuation between −600 and −250 Hounsfield units (HU), with basilar percent HAA quantified as the percent HAA in the caudal one-third of the imaged lung. In MESA, the basilar peel-core ratio of HAA (henceforth termed “basilar peel-core ratio”) was computed as the percent HAA in the peel region (outer 20 mm) divided by that in the core region for the caudal one-third of the imaged lung.
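The three HAA phenotypes defined above can be illustrated with a minimal sketch. It assumes the lung has already been segmented and that voxel attenuations are available as plain lists of HU values. The function names, the inclusive treatment of the window bounds, the cranial-to-caudal slice ordering, and the definition of the caudal one-third by slice count are all assumptions made for illustration, not details taken from the study protocol.

```python
from typing import Sequence

HAA_LOWER_HU = -600  # lower bound of the HAA window (from the text)
HAA_UPPER_HU = -250  # upper bound of the HAA window (from the text)


def percent_haa(lung_voxels_hu: Sequence[float]) -> float:
    """Percent of lung voxels whose attenuation lies in the HAA window.

    Bounds are treated as inclusive here; that choice is an assumption.
    """
    if not lung_voxels_hu:
        raise ValueError("no lung voxels supplied")
    in_window = sum(1 for hu in lung_voxels_hu
                    if HAA_LOWER_HU <= hu <= HAA_UPPER_HU)
    return 100.0 * in_window / len(lung_voxels_hu)


def basilar_percent_haa(slices_hu: Sequence[Sequence[float]]) -> float:
    """Percent HAA in the caudal one-third of the imaged lung.

    Assumes slices are ordered cranial to caudal and that the caudal third
    is taken by slice count (both are illustrative assumptions).
    """
    start = (2 * len(slices_hu)) // 3
    caudal_voxels = [hu for axial_slice in slices_hu[start:] for hu in axial_slice]
    return percent_haa(caudal_voxels)


def peel_core_ratio(peel_voxels_hu: Sequence[float],
                    core_voxels_hu: Sequence[float]) -> float:
    """Percent HAA in the peel region divided by percent HAA in the core."""
    core = percent_haa(core_voxels_hu)
    if core == 0.0:
        raise ValueError("core percent HAA is zero; ratio is undefined")
    return percent_haa(peel_voxels_hu) / core
```

In this sketch the peel (outer 20 mm) and core voxel sets are passed in separately, because deriving them requires a distance computation from the lung boundary that is outside the scope of the example.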
Genotyping
Genome-wide genotyping was performed for MESA and FHS participants, followed by imputation to the 1000 Genomes reference panel [12]. Additional details are provided in the Online Supplement.
Genetic association analysis
Within the MESA Lung Fibrosis study, we completed GWAS analyses of percent HAA, basilar percent HAA, and basilar peel-core ratio. Race/ethnic-specific analyses were performed in linear regression models with adjustment for age, sex, study site, principal components (PCs) of ancestry, CT scanner, tube current, breath artifacts, height, weight, cigarettes per day (for current smokers only) and pack-years of smoking. We did not adjust for dichotomous smoking exposures (current/former/never smoking) because this information was already captured by the continuous measures of smoking exposure. Results were combined by fixed-effect meta-analyses across all four MESA race/ethnic groups.
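As an illustration of how race/ethnic-specific estimates can be combined, the following is a minimal sketch of a fixed-effect meta-analysis for a single SNP. It assumes inverse-variance weighting, which is the usual fixed-effect scheme but is not specified in the text; the function name and the example inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas: per-group regression coefficients (e.g. one per race/ethnic group)
    ses:   matching standard errors
    Returns (pooled beta, pooled SE, z statistic, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value
```

A call such as `fixed_effect_meta([0.12, 0.08, 0.10, 0.15], [0.04, 0.06, 0.05, 0.09])` would pool four hypothetical group-specific estimates; groups with smaller standard errors contribute more to the pooled effect.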
We examined genomic control values of all GWAS for evidence of residual population stratification, undetected family structure, or other sources of inflation in type I error. Single SNP association results that attained a threshold of P < 5.0 × 10⁻⁸ were considered genome-wide significant. Those SNPs demonstrating genome-wide significant association in discovery GWAS of percent HAA and basilar percent HAA in MESA were examined for evidence of replication in FHS, with adjustment for the same covariates. Additional details are provided in the Online Supplement.
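The two checks described in this paragraph can be sketched as follows. The genomic control value is computed here in its common form, the median observed 1-df chi-square statistic divided by the expected median under the null (about 0.455); whether the study used exactly this estimator is an assumption, and the function names are hypothetical.

```python
from statistics import median
from typing import Sequence

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df
GENOME_WIDE_P = 5.0e-8       # genome-wide significance threshold from the text


def genomic_control_lambda(z_scores: Sequence[float]) -> float:
    """Genomic control inflation factor from per-SNP z statistics.

    Values near 1.0 suggest little inflation; values well above 1.0 may
    indicate stratification, cryptic relatedness, or other inflation.
    """
    if not z_scores:
        raise ValueError("no test statistics supplied")
    chi_squares = [z * z for z in z_scores]
    return median(chi_squares) / CHI2_1DF_MEDIAN


def is_genome_wide_significant(p_value: float) -> bool:
    """True when a single-SNP p-value is below the genome-wide threshold."""
    return p_value < GENOME_WIDE_P
```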
Validation with ILD cases
Three hundred sixteen ILD cases (108 of whom had confirmed IPF) were recruited between 2007 and 2011 through the Columbia ILD Study, an NIH-funded prospective cohort study at Columbia University Medical Center. For selected SNPs, we performed genetic analysis with these ILD cases compared to MESA controls matched on race/ethnicity. See the Online Supplement for details.
Gene expression analyses
We measured mRNA expression of four of the genes we identified (ALCAM, FOXP4, CDKN2B, and ANRIL, along with the reference gene GAPDH) in OCT-embedded fresh frozen lung tissue obtained from 15 adults with IPF and a histologic usual interstitial pneumonia pattern (UIP) and 15 adults without lung disease stored in the Columbia University Pathology Tissue Bank. See Online Supplement for details.
Discussion
In this multi-ethnic GWAS of ~8,000 individuals, we identified associations between HAA phenotypes and the 9p21 region, as well as two other loci that also lie near RNA genes.
In race/ethnic-specific analyses, we identified a number of genes related to obesity (GNPDA2, ZNF664-FAM101A, PFKP, SAMD4A), glycosylation (GYPC, FUT10), and carbohydrate metabolism (GNPDA2, PFKP, SLC45A). We also identified novel associations with genes that code for a transcription factor (FOXP4) that validated in clinical ILD cases, a cell adhesion molecule (ALCAM) expressed in the pulmonary microvascular endothelium, a protein involved in reorganization of the actin cytoskeleton (DAAM1) that may play a role in pulmonary vascular remodeling, and a MAP kinase-inhibiting protein (STK38) that protects against acute lung injury.
Our major finding is that ANRIL expression is higher in IPF lung compared to normal lung (Additional file 1: Table S14). This discovery came as a result of follow-up on our GWAS finding that HAA, a putative measure of subclinical ILD, is associated with genetic variation at rs7852363, a SNP that sits within the long non-coding RNA FLJ35282 downstream of the CDKN2A/CDKN2B/ANRIL locus. The latter has been strongly linked to cardiovascular disease [13], diabetes [15], and cancer [16]. Evidence suggests that much of the phenotypic variation linked to 9p21 may be explained by genotypic variation in ANRIL [17], a long non-coding RNA (lncRNA) gene that might promote cis-acting epigenetic silencing of the 9p21 region by promoting recruitment of polycomb repressive complexes [18], leading to decreased expression of CDKN2A and CDKN2B. A recent study found that CDKN2B is highly methylated in IPF fibroblasts, possibly contributing to increased cyclin-dependent kinase activity and fibroblast proliferation in IPF [19].
Our findings of increased ANRIL expression and reduced CDKN2B expression in IPF lung support a possible role for disinhibition of cyclin-dependent kinases in IPF progression. The localization of ANRIL expression to the airway epithelium, however, suggests that if ANRIL plays a role in IPF pathogenesis, it might contribute to an abnormal small airway epithelial response to injury rather than excess fibroblast proliferation. We did not seek validation of rs7852363 in the Framingham Heart Study, since the phenotype demonstrating an association with this locus (basilar peel-core ratio) was not available there. Although we were unable to demonstrate an association between rs7852363 and clinical ILD case status, the small number of clinical cases, combined with our findings above, suggests that additional work in this area is needed.
We found that the FOXP4 minor allele was present in 4.8% of ILD cases compared to 2.1% of controls, and FOXP4 also demonstrated reduced expression in lung tissue from IPF cases compared to controls. FOXP4 encodes a transcription factor (forkhead box P4) expressed in proximal and distal airway epithelium as well as in subepithelial mesenchyme during lung development [20]. FOXP4 is also expressed in CD4+ and CD8+ T cells and appears to contribute to cytokine production and memory T cell responses [21], and a recent study reported FOXP4 as an important regulator of non-small cell lung cancer cells [22]. The specific role of this locus in disease pathogenesis requires further elucidation.
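To give a sense of the size of the case-control difference quoted above, the following sketch computes a crude odds ratio from the two reported percentages alone. It treats 4.8% and 2.1% as allele frequencies, which is an assumption; the result is unadjusted and purely illustrative, and it is not the study's estimate, which would come from a model with covariates and matching on race/ethnicity. The function name is hypothetical.

```python
def crude_allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Unadjusted odds ratio comparing allele odds in cases versus controls."""
    for freq in (freq_cases, freq_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Reported frequencies: 4.8% in ILD cases, 2.1% in controls.
foxp4_crude_or = crude_allelic_odds_ratio(0.048, 0.021)  # roughly 2.35
```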
In our replication analysis for percent HAA, only one SNP near GNPDA2 demonstrated nominal evidence of association in European ancestry participants from FHS. Given the number of SNPs considered in the replication analysis, this nominally significant association does not survive correction for multiple comparisons, but it may offer a suggestion of association. GNPDA2 catalyzes conversion of D-glucosamine-6-phosphate into D-fructose-6-phosphate, representing a possible relationship with catabolism of N-linked carbohydrates in glycoproteins, similar to GYPC, GFPT1 and FUT10, which were also identified in our current GWAS of HAA in MESA. Glycosylation plays a key role in quality control of glycoproteins [23]. In the lung, mucins are heavily glycosylated proteins produced in the epithelium, and the association at MUC5B is among the most widely replicated associations for ILD [2, 4, 5]. Altered glycosylation of mucin and/or extracellular matrix proteins in the lung could impair remodeling responses to injury and promote interstitial lung disease [24]. Our novel findings of associations between subclinical ILD and variation at loci responsible for glycosylation suggest that investigation of the role of protein glycosylation in early lung remodeling may provide insights into the pathogenesis of ILD.
In an analysis of 33 SNPs reported in previous GWAS of IPF, we identified association of two TOLLIP SNPs with percent HAA traits in MESA African Americans. Combined with the previous association of the same TOLLIP SNPs with IPF [3], our identification of TOLLIP SNPs associated with percent HAA suggests that these SNPs play a role in subclinical disease progressing toward clinical development of pulmonary fibrosis. TOLLIP (also known as Toll interacting protein) has been shown to regulate inflammatory cytokine production in response to interleukin-1 [25], identifying inflammation as a possible mechanism underlying the role of TOLLIP in the development of IPF.
While we were unable to validate other SNPs that we identified in our GWAS, literature evidence provides some support for a role in ILD pathogenesis for three genes located at or near SNPs identified by our GWAS. Activated leukocyte cell adhesion molecule (CD166; ALCAM) is an immunoglobulin-family receptor expressed on activated leukocytes [26] and in some cancers. We found that ALCAM is under-expressed in IPF lung. A recent exome sequence analysis of severe early-onset COPD found an increased burden of rare deleterious ALCAM variants in cases versus controls [27]. ALCAM may play a signaling role in extracellular matrix remodeling [28] and is expressed in the pulmonary microvascular endothelium [29]. DAAM1 encodes disheveled-associated activator of morphogenesis 1, a protein involved in actin assembly [30] and endothelial cell proliferation [31]. DAAM1 is overexpressed in pulmonary arteries in idiopathic pulmonary arterial hypertension [32]. STK38 encodes serine-threonine kinase 38, an inhibitor of mitogen-activated protein (MAP) kinase 1 signaling [33] that has been linked to protection from nickel-induced acute lung injury [34].
Limitations of the study include a modest sample size for GWAS, particularly among Chinese participants, such that reported associations should be interpreted with caution. However, the present study is the first and largest GWAS of subclinical ILD phenotypes to date. In addition, data available for replication were limited to European ancestry participants from the Framingham Heart Study who were assessed for percent HAA and basilar percent HAA, but not for basilar peel-core ratio. Accordingly, we could not meaningfully pursue replication of genetic loci identified in non-European ancestry groups from MESA. Further, while the MESA participants were genotyped for GWAS at ~900,000 SNPs using the Affymetrix 6.0 array, which was designed to capture common variation across Europeans, East Asians and West Africans [35], we expect that we would have had limited coverage of common variants with MAF less than 0.1 even after incorporating imputation to the 1000 Genomes reference panel. Therefore, despite the fact that the genetic loci reported from our GWAS were subject to strict and systematic filters on imputation quality and minor allele counts, we recognize that many of our reported loci reflect infrequent variants that should be viewed with caution, particularly in the absence of replication and given the lack of consistent effects across race/ethnic groups. Given that we have focused on a population-based cohort, our study may have reduced power to detect infrequent risk variants seen predominantly in ILD cases. We expect our use of quantitative measures to define subclinical ILD provided us with greater power to identify infrequent variants that confer protection against ILD. Taken together, the limitations of our study underscore the fact that the SNPs and genes implicated by our current work will require additional confirmation through replication and validation in larger samples.
Acknowledgments
The authors thank the participants, investigators and study staff of MESA, FHS and the Columbia IPF Study.
Funding
MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-001079, UL1-TR-000040, and DK-063491. MESA Family is conducted and supported by the NHLBI in collaboration with MESA investigators. Support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258, R01HL071259, by the National Center for Research Resources, Grant UL1RR033176, and the National Center for Advancing Translational Sciences, Grant UL1TR000124. MESA Air is conducted and supported by the United States Environmental Protection Agency (EPA) in collaboration with MESA Air investigators, with support provided by grant RD831697. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. Genotyping was performed at Affymetrix (Santa Clara, California, USA) and the Broad Institute of Harvard and MIT (Boston, Massachusetts, USA) using the Affymetrix Genome-Wide Human SNP Array 6.0. The MESA Lung and MESA COPD Studies are funded by NIH grants R01HL077612 and R01HL093081. The MESA Lung Fibrosis Study was funded by NIH R01HL103676, K24HL131937, and by a grant from the Pulmonary Fibrosis Foundation. The MESA Lung/SHARe Study was funded by NIH grant RC1HL100543. This study was supported by NIH K23HL086714, KL2 TR000081, UL1 TR000040 and R01HL131565.
The Framingham Heart Study was supported by the National Heart, Lung, and Blood Institute’s Framingham Heart Study (contract number N01-HC-25195) and its contract with Affymetrix, Inc., for genotyping services (contract number N02-HL-6-4278).