Background
Chronic obstructive pulmonary disease (COPD), characterised by a persistent airflow obstruction [
1], is a life-threatening disease accounting for 6 % of all deaths globally in 2012 [
2]. The development of the disease is influenced by environmental determinants, most commonly cigarette smoking, genetic risk factors and possible genetic protective factors [
3]. Candidate gene association studies have suggested several potential COPD susceptibility genes, and genome-wide association studies (GWAS) have identified multiple COPD susceptibility loci [
4]. However, genetic mapping in families with high penetrance for a disease gene variant can be helpful in pinpointing new susceptibility genes even for multifactorial traits. Recently, we reported mutations in the gene for fibroblast growth factor 10 (
FGF10) involved in lung development, as a possible cause of COPD in families from Sweden [
5]. Hence, a monogenic form of COPD could result from mutations in
FGF10. To date, the only other known monogenic form of COPD is alpha 1-antitrypsin deficiency caused by disruption of the alpha-1-antiproteinase (
SERPINA1) gene [
6].
Typically in GWAS, common polymorphisms are tested for association. In this study, we take an alternative approach, performing an in-depth analysis of the exons of candidate genes for COPD by high-throughput sequencing. This allowed us to detect the full spectrum of single nucleotide variation at any frequency in selected genomic regions and also to capture variants with a potential functional effect on gene expression levels. We show here that targeted high-throughput sequencing of a well-defined population-based case–control sample can i) assess the impact of common variants in genes important for lung development, and ii) test genetic variants in a large set of candidate genes and genomic regions for association with COPD. To accomplish this, we captured and sequenced 22 genes implicated in lung development as well as 61 genes and 10 genomic regions previously associated with COPD. The sample used here comprises cases and controls from the Obstructive Lung Disease in Northern Sweden (OLIN) studies. The population of northern Sweden, an admixture of three ethnic groups (Swedes, Finns and Saami), has grown dramatically since the 18th century from a relatively small founder population [
7]. This resulted in founder effects that significantly reduced the heterogeneity of this population, making it suitable for genetic association studies of multifactorial phenotypes, such as COPD [
8].
This study sequenced Swedish COPD cases and controls and tested the variants detected in candidate genes for association with COPD. We replicated a previously described association signal in CHRNA3, which was also associated with lower CHRNA5 gene expression. The DNA capture design and targeted sequencing used here show potential to detect known single nucleotide variants associated with COPD, with the additional potential to detect low-frequency variants. The results presented here, obtained with a relatively limited sample size, could be replicated using our targeted capture design in larger samples from different populations.
Discussion and conclusion
Genetic variants influencing lung function in children and adults may ultimately lead to the development of COPD [
23]. Since limited disease-specific therapy for COPD is available, improved knowledge of the genetic variants modulating the pathogenic mechanisms underlying COPD is greatly needed. We aimed here to identify genetic variants within, or close to, the coding regions of genes and loci previously associated with COPD, or in genes involved in lung development. We opted for a qualitative rather than a quantitative approach, selecting cases with moderate or severe COPD and progressive decline in lung function. Furthermore, all controls were smokers without COPD, a design that can aid both the identification of potential protective genetic variants and the detection of genetic variants associated with severe COPD. When applying a Bonferroni correction for the total number of variants detected, no variant showed a statistically significant association. We did, however, identify several variants of likely biological significance, as indicated by large effect sizes (odds ratios), that we believe warrant further investigation in a larger sample. Furthermore, potential functional effects of variants were investigated using data from a large number of lung samples, and we describe here a COPD lung eQTL.
When comparing our association data with the lung eQTL data (discovery data set from Laval University), we could identify a variant associated with COPD that was also associated with level of gene expression (Fig.
2). This variant, a synonymous variant (rs8040868) in
CHRNA3 on chromosome 15, confers a risk for the development of COPD in both our OLIN discovery sample with moderate or severe COPD and our OLIN replication sample including all available COPD cases and controls in OLIN (OR 1.4,
p = 0.003). In the lung eQTL data, we could see a correlation of the C allele of rs8040868 with lower expression levels of
CHRNA5 (Fig.
2), and, to a lesser extent, also
CHRNA3 and
PSMA4, which are located in close proximity to
CHRNA5. The α-nicotinic receptor (
CHRNA3/5) gene locus on chromosome 15q25.1 is associated with COPD, lung cancer and peripheral arterial disease, as well as other smoking related conditions [
24,
25] and nicotine addiction [
26,
27]. Recently, the
CHRNA3/5 locus was implicated in all-cause mortality among smokers in a Finnish cohort [
2]. The rs8040868-C allele associates with both reduced pulmonary function and lung cancer [
24,
25,
28,
29] and affects DNA-methylation and transcription of
CHRNA5 [
30]. Furthermore, rs8040868 is also in LD with a nearby variant (rs16969968) previously reported to be associated with expression levels of
CHRNA5 in the lung [
21]. The direction of effect is the same for both SNPs, with the minor alleles associated with reduced expression of
CHRNA5. Also recently, rs16969968 was found to be the most significantly associated variant in an exome array analysis in a study including more than 6,100 COPD cases and 6,000 control subjects across five cohorts [
31].
Several genetic variants showed association with COPD in our population but did not correlate with gene expression levels in the lung, including previously identified variants in the genes glutathione S-transferase, C-terminal domain containing (
GSTCD), surfactant protein D (
SFTPD) and matrix metalloproteinase-12 (
MMP12) [
32‐
36]. We identified a haplotype consisting of three risk-conferring variants, rs72671840, rs72671858 and rs11728716 (G-T-A haplotype), at the
GSTCD gene locus on chromosome 4q24. The variant rs11728716 has previously been associated with lung function [
32‐
34] and is likely to affect the transcription of
GSTCD. We show here that rs11728716 was associated with severe COPD using the OLIN replication sample. Although intriguing, due to the limited number of severe COPD cases used in this study, this result needs further verification in a larger sample. The other two variants (rs72671840 and rs72671858) are of unknown function [
37].
GSTCD encodes a glutathione S-transferase C-terminal domain protein involved in detoxification by catalysing conjugation of glutathione to products of oxidative stress. We found association between COPD and rs6413520, a synonymous variant, p.(Ser45=), within
SFTPD on chromosome 10q22.3. This variant conferred a high risk (OR = 8.2) for COPD in our study and has previously been reported to be associated with COPD susceptibility [
36].
SFTPD encodes surfactant protein D, of importance for the regulation of oxidant production, inflammatory responses, and apoptotic cell clearance in the lung [
38]. We also identified rs632009, in the
MMP12 gene on chromosome 11q22.3, to confer moderate risk. Matrix metalloproteinases (MMPs) are involved in both tissue remodelling and repair and several members of the MMP family have been implicated in COPD pathology [
35,
39,
40].
In this study, we also found association (uncorrected) with novel susceptibility variants. Several variants in the G-protein-coupled receptor 126 (
GPR126) gene on chromosome 6q24.1 have previously been associated with the FEV1/FVC ratio [
32]. GPR126 belongs to a superfamily of G protein-coupled receptors and is involved in cell signalling and adhesion. Studies in mice show an induction of Gpr126 expression between embryonic day 7 and 11 with expression in the developing heart and face as well as a high expression in the adult lung [
41]. We found significant association between a synonymous variant in the
GPR126 gene (rs2143390, p.(Asp373=)) and COPD. The alternative T allele is highly overrepresented in cases compared to controls (
p = 4.5 × 10⁻³, OR = 7.9).
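To make the effect sizes quoted in this discussion concrete, the following minimal sketch shows how an allelic odds ratio and an approximate 95 % confidence interval can be derived from a 2 × 2 table of allele counts. The counts are hypothetical and chosen only for illustration; they are not the study's data. The Woolf (log-based) interval and the 0.5 continuity correction are assumptions of this sketch, not necessarily the method used in the analysis.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-based) confidence interval.

    Arguments are allele counts: alternative and reference alleles
    observed in cases and in controls.
    """
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so that the ratio and its standard error stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (NOT taken from the study).
or_value, ci_low, ci_high = odds_ratio_ci(case_alt=15, case_ref=185,
                                          ctrl_alt=2, ctrl_ref=198)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

With few alternative alleles in controls, the interval is wide even when the point estimate is large, which is one reason high odds ratios from small samples call for replication.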
We also focused our attention on the chromosome 4q31 locus upstream of
HHIP, previously shown to be associated with expression of the gene [
20,
42]. The
HHIP upstream region is one of the strongest COPD association signals reported so far [
43], but no association could be seen in our case–control groups for any upstream variants.
The sequencing approach allowed us to detect rare alleles in both cases and controls. We therefore performed gene burden tests to find evidence of overrepresented rare or common variants in individual genes or transcripts in the cases or controls, respectively. Interestingly, we found that the genes ADAM19, WNT2, CHRNA5, NOS3 and PTCH1 all contain rare variants (MAF < 5 %) uniquely found in cases of the OLIN discovery sample. These variants, and especially the coding variants with predicted functional effect, could be followed up in a larger case–control sample for verification and further genetic and functional analysis.
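As an illustration of the idea behind a gene burden test, the sketch below collapses all rare variants in a gene into a single carrier indicator per individual and compares carrier frequencies between cases and controls with a two-sided Fisher exact test. This simple collapsing scheme is an assumption made for illustration; the specific burden test used in the study is not restated here. The function names and the carrier counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(x) for x in range(x_min, x_max + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


def collapsing_burden_test(case_carriers, n_cases, control_carriers, n_controls):
    """Compare the number of rare-variant carriers in one gene between groups."""
    return fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                                  control_carriers, n_controls - control_carriers)


# Hypothetical example: 6 of 50 cases and 0 of 50 controls carry
# at least one rare variant in the gene of interest.
burden_p = collapsing_burden_test(6, 50, 0, 50)
print(f"burden test p = {burden_p:.4f}")
```

Collapsing trades resolution for power: single rare variants are seen too few times to be tested individually, but their combined carrier count per gene can be compared between groups.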
We assessed 83 genes and 10 genomic regions of 1.5 kb size for variants associated with COPD in a sample from Northern Sweden. Still, one limitation of our study is that the targeted capture design may exclude as yet unknown genomic regions that harbour genetic variation influencing COPD. Also, the two novel variants detected after sequencing were monomorphic, and an assessment of the false discovery rate using HaloPlex with subsequent Illumina sequencing would be helpful in order to evaluate our set of candidate genes as a gene panel for COPD. Furthermore, we cannot rule out that some findings are influenced by population substructure, and replication of our results in different populations is essential. It is also possible that some risk variants were not identified due to the limited number of cases and controls used for sequencing. With a conservative Bonferroni correction based on the 1588 variants detected, no variant reached significant association with COPD. However, we believe there is no definite consensus regarding the type of multiple testing procedure to use in targeted sequencing-based approaches. Furthermore, many parameters, such as variant quality checks, genotyping success rate and sequencing depth limit, influence the number of variants found and, consequently, the multiple testing adjustment. Also, in addition to including genes with variants previously associated with COPD or asthma, we explored whether a set of genes involved in lung development would harbour variants associated with COPD in the Swedish discovery sample. Therefore, as the study is exploratory with a mixed hypothesis, the p values for association testing in this study are not corrected for multiple testing.
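The effect of the Bonferroni correction discussed above can be checked with a few lines of arithmetic. The sketch below divides a family-wise significance level by the 1588 detected variants and compares the resulting threshold with the two uncorrected p values quoted in this discussion (0.003 for rs8040868 and 4.5 × 10⁻³ for rs2143390). The family-wise level of 0.05 is an assumption of this sketch, chosen as the conventional default.

```python
ALPHA = 0.05        # assumed family-wise significance level (conventional default)
N_VARIANTS = 1588   # number of variants detected, as stated in the text

# Bonferroni: each individual test must pass alpha divided by the number of tests.
bonferroni_threshold = ALPHA / N_VARIANTS

# Uncorrected p values quoted in the discussion.
reported_p_values = {
    "rs8040868 (CHRNA3)": 0.003,
    "rs2143390 (GPR126)": 4.5e-3,
}

passes_bonferroni = {variant: p < bonferroni_threshold
                     for variant, p in reported_p_values.items()}

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
for variant, passed in passes_bonferroni.items():
    print(f"{variant}: p = {reported_p_values[variant]:.1e}, significant = {passed}")
```

Under these assumptions the per-variant threshold is roughly two orders of magnitude below the quoted p values, consistent with the statement that no variant remained significant after correction.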
Despite the limited size of the discovery sample used here, we identified several high-risk genetic variants for COPD and replicated several previous GWAS results. In particular, our results support the CHRNA5 gene as a likely candidate gene for COPD, with the rs8040868-C allele conferring a risk for the disease in the Swedish population. Furthermore, our results illustrate the advantage of using less heterogeneous populations in studies of complex disorders.