Background
Chronic obstructive pulmonary disease (COPD) is a major cause of morbidity and mortality worldwide and encompasses emphysema, chronic bronchitis, and small airways disease [1, 2]. The diagnosis of COPD is largely based on the presence of airflow obstruction, measured by post-bronchodilator spirometric assessment of the ratio between forced expiratory volume in one second and forced vital capacity (FEV1/FVC). The Global initiative for chronic Obstructive Lung Disease (GOLD) recommends using a fixed cut-off to define airflow obstruction, namely an FEV1/FVC ratio below 70% [3], whereas the American Thoracic Society/European Respiratory Society (ATS/ERS) guidelines recommend defining airflow obstruction as FEV1/FVC below the lower limit of normal (LLN) [4]. The LLN is a reference value based on sex, age, height and ethnicity and is calculated as the lower fifth percentile of a healthy reference population [5]. There is considerable controversy about which definition should be used in research and clinical practice, since both may lead to misclassification [5-8]. This has important implications, since misclassification may lead to inappropriate medication and therapies [9, 10].
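The two definitions can be contrasted in a short sketch. It is illustrative only: it assumes the FEV1/FVC ratio in the healthy reference population is normally distributed, so that the lower fifth percentile sits 1.645 standard deviations below the predicted value. The fields `predicted_ratio` and `residual_sd` are hypothetical inputs standing in for the output of a reference equation, and published reference equations may use other distributional methods.

```python
from dataclasses import dataclass

# z-score of the lower 5th percentile of a standard normal distribution
Z_FIFTH_PERCENTILE = -1.645
FIXED_CUTOFF = 0.70  # fixed FEV1/FVC cut-off (70%)


@dataclass
class Spirometry:
    fev1_l: float           # measured FEV1 in litres
    fvc_l: float            # measured FVC in litres
    predicted_ratio: float  # hypothetical predicted FEV1/FVC for this person
    residual_sd: float      # hypothetical SD of the ratio in the reference group


def lower_limit_of_normal(predicted_ratio: float, residual_sd: float) -> float:
    """LLN as the 5th percentile, assuming a normal reference distribution."""
    return predicted_ratio + Z_FIFTH_PERCENTILE * residual_sd


def classify(s: Spirometry) -> dict:
    """Classify one subject under both airflow obstruction definitions."""
    ratio = s.fev1_l / s.fvc_l
    lln = lower_limit_of_normal(s.predicted_ratio, s.residual_sd)
    return {
        "ratio": ratio,
        "lln": lln,
        "obstructed_fixed": ratio < FIXED_CUTOFF,
        "obstructed_lln": ratio < lln,
    }


# Two made-up subjects who are classified differently by the two definitions.
older = classify(Spirometry(fev1_l=2.04, fvc_l=3.0, predicted_ratio=0.74, residual_sd=0.06))
younger = classify(Spirometry(fev1_l=3.6, fvc_l=5.0, predicted_ratio=0.84, residual_sd=0.06))
```

In this toy input, the first subject is obstructed by the fixed ratio only and the second by the LLN only, which shows how the two definitions can disagree in either direction.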
It is generally accepted that both genetic susceptibility and environmental factors contribute to airflow obstruction. Genetic variants associated with airflow obstruction have been identified by several genome-wide association studies (GWAS), but these studies used different definitions of airflow obstruction and different populations [11-16]. As an illustration, the case-control study by Pillai et al., which included only smokers with > 2.5 pack-years, used the fixed ratio (FEV1/FVC < 70%) to define airflow obstruction, while the population-based study by Wilk et al., which included both ever- and never-smokers, used the lower limit of normal (LLN) [15, 16]. Only a few regions were identified in both studies, namely the CHRNA5/3 and HHIP regions. We therefore aimed to assess the genetic overlap between the two definitions of airflow obstruction in the same individuals. We stratified by smoking status to assess the overlap between the two airflow obstruction definitions in never- and ever-smokers separately. We used the LifeLines Cohort Study as the discovery sample and the Vlagtwedde-Vlaardingen study to replicate our observations. In addition, genetic loci associated with both airflow obstruction definitions could indicate robust genetic associations with airflow obstruction and could potentially be novel loci. As a secondary aim, we therefore validated the top overlapping single-nucleotide polymorphisms (SNPs) between the two airflow obstruction discovery analyses in an independent SNP validation sample and assessed whether they acted as expression quantitative trait loci (eQTLs) in a lung tissue sample.
Discussion
We investigated the genetic overlap between GWAS using two airflow obstruction definitions in the same population (FEV1/FVC < 70% or < LLN). We expected a reasonable overlap in associated SNPs between the two definitions, since 96% of the never-smokers and 93% of the ever-smokers were classified the same way in the discovery sample, LifeLines. Surprisingly, only a very small proportion (4% and 6%) of SNPs overlapped at p < 10⁻⁴ (see Fig. 1). Even with different significance thresholds the overlap was limited (26% and 29% at p < 0.05) (see Table 2). The same observation was made in the replication sample, the Vlagtwedde-Vlaardingen study. In this cohort, 94% and 90% of the never- and ever-smokers, respectively, were classified concordantly, but at p < 0.05 only 24% and 25% of the SNPs overlapped. In addition, the effect estimates for the two airflow obstruction definitions correlated strongly in both cohorts, but the p-values showed more variation and correlated only moderately, resulting in different top hits depending on the obstruction definition (see Fig. 2). Thus, the chosen strategy and definition of airflow obstruction had a substantial influence on the GWAS results. This implies that in a discovery-replication design with a predetermined selection p-value, different genetic variants would be followed up depending on the definition used. In addition, there was no correlation between the p-values or between the odds ratios (ORs) of never- and ever-smokers in either cohort. None of the selected SNPs overlapped between never- and ever-smokers at p < 10⁻⁴, and at p < 0.05 the overlap was only 3% in LifeLines (discovery sample) and 2% in Vlagtwedde-Vlaardingen (replication sample; see Table 2). The current study therefore also highlights the importance of stratifying the analysis according to smoking status.
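The overlap percentages discussed above can be illustrated with a minimal sketch. It makes an assumption about the denominator: overlap is taken here as the number of SNPs significant under both definitions divided by the number significant under at least one. The study's exact calculation may differ, and the SNP identifiers and p-values below are made up for illustration.

```python
def significant_snps(pvalues: dict, threshold: float) -> set:
    """Return the SNP identifiers with a p-value below the threshold."""
    return {snp for snp, p in pvalues.items() if p < threshold}


def overlap_percentage(pvalues_a: dict, pvalues_b: dict, threshold: float) -> float:
    """Percentage of SNPs significant in both analyses, out of those
    significant in at least one (assumed definition of overlap)."""
    sig_a = significant_snps(pvalues_a, threshold)
    sig_b = significant_snps(pvalues_b, threshold)
    either = sig_a | sig_b
    if not either:
        return 0.0  # nothing passes the threshold in either analysis
    return 100.0 * len(sig_a & sig_b) / len(either)


# Hypothetical per-SNP p-values from the two GWAS (fixed ratio vs LLN).
fixed_ratio_p = {"snp1": 0.00002, "snp2": 0.03, "snp3": 0.20, "snp4": 0.00008}
lln_p = {"snp1": 0.00005, "snp2": 0.0004, "snp3": 0.20, "snp4": 0.30}

overlap_strict = overlap_percentage(fixed_ratio_p, lln_p, 1e-4)
overlap_nominal = overlap_percentage(fixed_ratio_p, lln_p, 0.05)
```

Using the union as denominator means that a SNP significant under only one definition lowers the overlap, which matches the idea that different variants would be followed up depending on the definition used.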
The difference between the results for the two definitions might be explained by the fact that obstructive airway diseases are heterogeneous, with multiple phenotypes, symptoms and comorbidities. It might thus be beneficial for future GWA studies to focus on specific COPD subtypes rather than on a broad definition of airflow obstruction or COPD that can be caused by multiple underlying physiologic and genetic mechanisms. Previous GWA studies on classical COPD phenotypes such as emphysema and chronic bronchitis, performed mainly in smokers, consistently identified the well-known general COPD genes (HHIP, CHRNA and FAM13A) [27-32]. Perhaps, to identify the genetic pathways underlying specific COPD phenotypes, we should study not the classical COPD phenotypes but rather clinical COPD subtypes based on symptoms, comorbidities or pathology.
The CHRNA5/3 and HHIP regions overlapped between six previous GWA studies on airflow obstruction that used different airflow obstruction definitions and populations [11-16]. In the current study, two of the identified SNPs in ever-smokers were also located in the CHRNA5 and HHIP regions, pointing towards a robust genetic association of these regions with airflow obstruction and COPD (see Additional file 1: Table S6). Likewise, most previously identified regions associated with airflow obstruction or COPD were nominally significant (p < 0.05) in the current study (see Additional file 1: Table S12). Out of the 22 loci identified by Hobbs et al., SNPs in 18 loci were associated with at least one of the airflow obstruction definitions at nominal significance (10 SNPs in never-smokers and 12 SNPs in ever-smokers) [14]. In never-smokers, 6 of the 10 SNPs were significantly associated with both definitions, and in ever-smokers 7 of the 12 SNPs were significantly associated with both definitions. Some SNPs were significant in both never- and ever-smokers (e.g. HHIP, PID1 and THSD4), while others were significant only in never-smokers (e.g. FAM13A, DSP and RIN3) or only in ever-smokers (e.g. CHRNA5, TET2 and ADGRG6). In addition, many of the loci previously associated with lung function outcomes (FEV1, FVC, and FEV1/FVC) were also nominally significant (p < 0.05) in the current study (see Additional file 1: Table S13). Specifically, of the loci reported by Wain et al., 23 out of 28 loci for FEV1, 10 out of 17 loci for FVC and 38 out of 51 loci for FEV1/FVC were associated with at least one of the airflow obstruction definitions at nominal significance [33]. Lastly, we checked whether the top overlapping SNPs were associated with lung function outcomes in our previous GWA studies on FEV1, FEV1/FVC and FEF25–75 [34, 35]. A SNP annotated to HHIP was associated with FEV1/FVC and FEF25–75 in both never- and ever-smokers (results were replicated), and the CHRNA5/3 region was only associated with FEV1/FVC in ever-smokers. The NFYC and FABP7 regions were associated with FEV1/FVC (p = 4.40 × 10⁻⁴ and p = 1.87 × 10⁻⁴) in never-smokers, and the FABP7 SNP was also associated with FEF25–75 levels (p = 0.026). Interestingly, the NFYC region also overlapped between the current study and the study by Pillai et al. We identified multiple SNPs annotated to NFYC, whereas Pillai et al. identified a SNP (rs3767943) in the gene KCNQ4, which is located on the 3′ side of NFYC [15]. The NFYC region might therefore be an interesting region in which to further study the mechanisms underlying its association with airflow obstruction.
A SNP in an intron of NFYC and a SNP in FABP7 were the two overlapping SNPs between the airflow obstruction definitions at p < 10⁻⁴ in never-smokers and showed the same direction of effect in the five independent cohorts. The minor allele of the SNP in NFYC (rs7519348) was associated with a higher risk of airflow obstruction. This gene encodes a highly conserved transcription factor that is predicted by GeneGlobe to bind the promoter regions of 218 genes (see Additional file 1: Table S14), including genes previously associated with lung-related outcomes, such as ADORA2B, AKAP9, CD163, ELMOD2, HLA-DPB1, ITPR2, KLF10 and SERPINA6 [27, 36-42]. In more detail, HLA-DPB1 is a known COPD gene related to disease severity, SERPINA6 was associated with emphysema, a deletion in ADORA2B was shown to be associated with a decrease in lung fibrosis and pulmonary hypertension, and ELMOD2 is a candidate gene for familial idiopathic pulmonary fibrosis [27, 36, 39, 40]. The identified SNP was not associated with expression levels of NFYC in lung tissue, but was an eQTL for a probeset annotated to NFYC-AS1. The function of this specific antisense RNA is still unknown, although antisense RNAs are generally thought to have a regulatory role.
The minor allele of the SNP in FABP7 (rs6913003) was also associated with a higher risk of airflow obstruction in never-smokers. This SNP was not associated with the expression of FABP7 or other genes in lung tissue. FABP7 encodes an intracellular lipid-binding protein involved in long-chain fatty acid transport and cell proliferation [43]. It may be involved in abnormal pulmonary development, since lower expression of FABP7 was found in patients with congenital cystic adenomatoid malformation [44]. In addition, higher expression of FABP7 was seen in clear cell renal cell carcinoma, and the authors suggested that the gene activates the ERK and STAT3 signalling pathways [45]. STAT3 has been implicated in pulmonary inflammation, and thus FABP7 might be indirectly involved in airflow obstruction [46].
We were aware of the risk of spurious findings due to the low power of our study and therefore validated our top overlapping SNPs in 4 independent validation cohorts. We furthermore investigated the effect of low power on the overlap between the two definitions by increasing our dataset 2- and 4-fold. The percentage of overlap increased with sample size, but the proportion of SNPs that did not overlap remained high, i.e. 73.4% when the sample size was increased 4-fold. So even when the study power is greatly increased, different SNPs will be found depending on the airflow obstruction definition tested. We also performed a simulation study in which airflow obstruction cases were randomly allocated 10 times. Based on this simulation, we have to conclude that the differences and overlap we found could be chance findings, which is why we validated the overlapping SNPs in 4 independent validation cohorts.
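The random allocation step of such a simulation might look like the following sketch. It covers only the reallocation of case labels: the per-SNP association tests that would be rerun on each replicate are omitted. It assumes that the number of cases is kept fixed in every replicate, which the text does not specify, and the function names, sample sizes and seed are hypothetical.

```python
import random


def randomly_allocate_cases(n_subjects: int, n_cases: int, rng: random.Random) -> list:
    """Return a shuffled 0/1 case label per subject with exactly n_cases ones."""
    labels = [1] * n_cases + [0] * (n_subjects - n_cases)
    rng.shuffle(labels)  # in-place shuffle, breaks any link with genotype
    return labels


def simulate_allocations(n_subjects: int, n_cases: int,
                         n_replicates: int = 10, seed: int = 0) -> list:
    """Generate n_replicates random case allocations (10 by default)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [randomly_allocate_cases(n_subjects, n_cases, rng)
            for _ in range(n_replicates)]


# Hypothetical cohort of 100 subjects, 12 of whom have airflow obstruction.
replicates = simulate_allocations(n_subjects=100, n_cases=12)
```

Because the labels are shuffled independently of genotype, any SNPs that pass a threshold in these replicates reflect chance alone, which gives a baseline against which the observed overlap can be judged.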
We assessed only a modest number of SNPs (n = 227,981) compared to previous large GWA studies (n > 1 million SNPs), since we included only genotyped SNPs to prevent any bias from imputation. The disadvantage of this approach is that genomic coverage may be lower. Another limitation of the current study is the use of pre-bronchodilator measurements to define airflow obstruction, which should preferably be based on post-bronchodilator measurements. Subjects with asthma in particular could be misclassified as having airflow obstruction, but the results for the overlapping SNPs did not change in a sensitivity analysis excluding asthmatics or adjusting for asthma (see Additional file 1: Table S15). Moreover, only a low number of never-smoking subjects had an FEV1/FVC < LLN in the three Rotterdam Study cohorts, but the results were nevertheless replicated in these never-smokers. Finally, the “FEV1/FVC < 70%” model was adjusted for sex, age and height, but the “FEV1/FVC < LLN” model was not adjusted for these variables, since they are included in the LLN calculation. However, if we do adjust the “FEV1/FVC < LLN” model for these variables, the results do not change: the top SNPs are the same, and the correlation between p-values for the adjusted and unadjusted LLN models is 0.98. In addition, the reported correlation in never-smokers between the two definitions was 0.48 for p-values and 0.78 for ORs. With the adjusted LLN model, these correlations are 0.48 and 0.79, respectively. This supports the appropriateness of the models used to assess the genetic overlap between the two airflow obstruction definitions.
Acknowledgments
We thank Rob Bieringa, Joost Keers, René Oostergo, Rosalie Visser, and Jan Schouten for their work related to data collection and validation in the LifeLines cohort study and the Vlagtwedde-Vlaardingen study. The authors would like to thank the staff at the Respiratory Health Network Tissue Bank of the FRQS for their valuable assistance with the lung eQTL dataset at Laval University. We are grateful to the study participants and all staff involved in the LifeLines cohort study, Vlagtwedde-Vlaardingen Study, Rotterdam Study and lung eQTL database. We would like to thank Anis Abuseiris, Karol Estrada, Dr. Tobias A. Knoch, and Rob de Graaf as well as their institutions Biophysical Genomics, Erasmus MC Rotterdam, The Netherlands, and especially the national German MediGRID and Services@MediGRID part of the German D-Grid, both funded by the German Bundesministerium fuer Forschung und Technology under grants #01 AK 803 A-H and # 01 IG 07015 G for access to their grid resources.