Introduction
Breast cancer, one of the most common malignancies among women worldwide, is a complex polygenic disease in which genetic factors play a significant role in the disease etiology [
1,
2]. To date, genome-wide association studies (GWASs) cataloged by the National Human Genome Research Institute have reported more than 40 common low-penetrance variants at 25 loci that are associated with breast cancer risk [
3]. The most strongly and consistently associated single-nucleotide polymorphisms (SNPs) reside in intron 2 of the receptor tyrosine kinase
FGFR2 (rs2981582) at 10q26.13 and near the 5' end of the
TOX3 gene at 16q12.1 (rs3803662) [
4‐
9]. With the exception of three studies conducted among Asian women [
10‐
12], all other previously published GWASs have been conducted primarily in women of European descent. Several studies, including our own, have investigated loci previously identified in European populations in other ethnic groups and validated the initial findings [
13,
14]. However, newly discovered loci initially identified in women of European descent tend to be weakly associated with breast cancer in women of Asian descent [
10,
11] or could not be confirmed in Asians because of the difference in linkage disequilibrium (LD) patterns between ethnic populations, suggesting that additional genetic variants for Asian women remain to be discovered.
In this study, we conducted a three-stage GWAS to identify common breast cancer susceptibility loci and to validate the previously reported loci by using Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Inc., Santa Clara, CA, USA) with 2,273 patients with breast cancer from the Seoul Breast Cancer Study (SeBCS) and 2,052 healthy controls from a large urban cohort, the Korea Genome Epidemiology Study (KoGES), as stage I. By analyzing data from two replication stages that consisted of 4,049 cases and 3,845 controls, we found strong evidence for a new genetic variant that may be associated with breast cancer risk among Asian women.
Discussion
In the present study, we conducted a three-stage GWAS in Korean women (6,322 cases and 5,897 controls). We not only confirmed loci previously identified in European or Chinese populations, or both, but also found rs13393577 at 2q34/ERBB4 as a new breast cancer susceptibility variant in Korean women.
In the validation study, we used stage I data to evaluate whether 27 SNPs in the 20 GWAS-identified loci were also relevant in our population and found that 10 SNPs at seven loci were significantly associated with breast cancer risk. As anticipated, the strongest and most significant results were observed for rs2046210 at 6q25.1/
ESR1 and rs4784227 at 16q12.1/
TOX3, and these results are similar in magnitude and direction to those of previous reports conducted in a Chinese population [
10,
11]. For the SNPs rs2048671 at 7q32.3/
NR and rs10822013 at 10q21.2/
ZNF365, which were also identified in Asians, we recently reported significant associations with breast cancer risk through a multi-stage GWAS with a cumulative sample size of more than 34,000 East Asian subjects (per-allele OR = 1.10; 95% CI = 1.07 to 1.14; Ptrend = 5.87 × 10⁻⁹ and OR = 1.08; 95% CI = 1.04 to 1.11; Ptrend = 6.21 × 10⁻⁶) [
12]. However, the associations of these SNPs were not significant in this study, possibly because of its limited power.
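As a rough consistency check on summary statistics of this kind, the standard error on the log-odds scale can be recovered from a reported OR and its 95% CI, and a Wald z statistic and two-sided P value can then be derived. The sketch below assumes a symmetric Wald interval on the log scale; the function name is hypothetical, and because the published OR and CI are rounded, the output should only approximate the reported Ptrend values.

```python
import math
from statistics import NormalDist


def wald_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Return (se, z, p) on the log-odds scale from an OR and its CI.

    Assumes the CI is a symmetric Wald interval for log(OR).
    """
    # Critical value for the stated confidence level (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability; erfc stays accurate for large |z|.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    # Rounded values quoted in the text for the two East Asian loci.
    for or_, lo, hi in [(1.10, 1.07, 1.14), (1.08, 1.04, 1.11)]:
        se, z, p = wald_from_ci(or_, lo, hi)
        print(f"OR={or_:.2f}  SE={se:.4f}  z={z:.2f}  p={p:.2e}")
```

Small discrepancies from the published P values are expected, since the inputs carry only two decimal places.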
Among the remaining 23 SNPs that were initially identified in Europeans, eight SNPs - rs10736303 (10q26.13/
FGFR2, proxy of rs2981579), rs3803662 (16q12.1/
TOX3), rs7716600 (5p12/
MRPS30), rs16886165 and rs889312 (5q11.2/
MAP3K1), rs3734805 (6q25.1/
ESR1), and rs1562430 (8q24.21) - showed significant associations in the same direction except for rs1092913 (5p15.2/
ROPN1L) with the G allele as the risk allele. The effect sizes of the confirmed variants were similar to or smaller than those of the initially identified ones. This phenomenon has been frequently observed in validation studies using ethnic populations different from the population used for the initial findings [
22,
23].
In addition, we could not evaluate the SNPs (rs1219648, rs2981572, rs2981585, and rs2981579) previously identified within intron 2 of FGFR2 at 10q26.13, because they were not genotyped or successfully imputed (imputation QC r² < 0.3). Thus, we selected rs10736303 as the best tagging SNP capturing 10q26.13/FGFR2 since it is in high LD with the reported SNPs, with pairwise r² values of 0.67 for rs2981579 (r² = 0.48 in CHB+JPT; r² = 0.74 in CEU), 0.57 for rs2981575 (r² = 0.36 in CHB+JPT; r² = 0.72 in CEU), and 0.53 for rs1219648 (r² = 0.29 in CHB+JPT; r² = 0.72 in CEU) on the basis of our data. Furthermore, rs10736303 is located in intron 2 of
FGFR2 within sequences conserved across all placental mammals and has been suggested to be a functional variant that regulates
FGFR2 expression by generating a putative ER-binding site [
5]. In the present study, the rs10736303 G allele was significantly associated with increased breast cancer risk, with an effect size equal to that reported for rs2981579 in a previous study [
8]. However, the more recently reported SNP rs10510102, which is located in the 300-kb telomeric region of intron 2 of FGFR2 and did not reach genome-wide significance in the original report (P = 1.6 × 10⁻⁶), was not replicated in the present study [
24].
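The pairwise r² values used above to choose a tagging SNP follow from the standard definition of linkage disequilibrium between two biallelic loci. The sketch below is a minimal illustration of that definition only: the function name and the haplotype frequencies in the example are hypothetical and are not taken from the study data, and it assumes phased haplotype frequencies are available.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    print(round(ld_r2(0.25, 0.30, 0.35), 3))
```

Because r² depends on allele frequencies as well as on D, the same SNP pair can show different r² values in populations such as CHB+JPT and CEU, which is why a tag chosen in one population may capture a causal variant less well in another.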
Subgroup analysis revealed that some of the validated associations differed by ER or PR status. Recent studies showed stronger associations with ER+ than with ER- tumors for several loci - rs13387042 (2q35), rs4973768 (3p24), rs889312 (5q11.2/MAP3K1), rs7716600 (5p12/MRPS30), rs13281615 (8q24), rs1219648 and rs2981582 (10q26.13/FGFR2), and rs3803662 (16q12) - and with PR+ than with PR- tumors for rs2981582 (10q26.13/FGFR2) [
6,
25,
26]. Among these loci, rs13387042 (2q35) and rs7716600 and rs4415084 (5p12/MRPS30) showed significantly different associations by ER status, and rs4973768 (3p24) also showed a stronger association with ER+ tumors, although the test for heterogeneity was not significant. The association of rs2380205 (10p15.1/ANKRD16, FBXO18) with breast cancer risk was also stronger for ER+ or PR+ tumors than for receptor-negative tumors, and this heterogeneity in association remains to be evaluated in other populations.
The stronger association of rs2046210 at 6q25.1/ESR1 with ER- than with ER+ tumors has been well documented [
10]. In the present study, we also observed this heterogeneity for rs2046210 and its nearby SNP rs3734805, although the differences were not statistically significant. Direct replication at some of the loci showing significant differences in association according to ER or PR status provides further support for the hypothesis that intrinsic subtypes of breast cancer have different etiologic pathways and that the polygenic components of these subtypes should therefore differ [
27].
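One simple way to ask whether an association differs between receptor-positive and receptor-negative tumors is to compare the two subtype-specific log odds ratios with a z-test. The sketch below illustrates that idea only; it is an assumption that such a test resembles the heterogeneity test used here, since the method is not described in this section. The function name and the input values are hypothetical. Note also that when both subtypes are compared against the same controls the two estimates are correlated, so this independent-estimates formula is approximate, and a case-only comparison is a common alternative.

```python
import math


def subtype_heterogeneity(or_pos, se_pos, or_neg, se_neg):
    """z-test for a difference between two subtype-specific odds ratios.

    se_pos and se_neg are standard errors of log(OR); the two estimates
    are treated as independent (an approximation, see the lead-in).
    """
    diff = math.log(or_pos) - math.log(or_neg)
    z = diff / math.sqrt(se_pos ** 2 + se_neg ** 2)
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Hypothetical ER+ and ER- estimates, for illustration only.
    z, p = subtype_heterogeneity(1.30, 0.04, 1.00, 0.06)
    print(f"z={z:.2f}  p_heterogeneity={p:.2e}")
```

A non-significant result from such a test, as reported for several loci above, does not show that the subtype-specific effects are equal; it may simply reflect the smaller number of cases in each subgroup.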
There are several potential reasons for the failure to validate some of the loci previously identified in women of European descent. First, several risk variants could have escaped detection because of limited statistical power caused by either a low allele frequency or a very small effect size in the initial findings. For several SNPs, the allele frequencies in Koreans are substantially lower than in Europeans: rs11249433 at 1p11.2/
NOTCH2, FCGR1B (4% versus 39%), rs1011970 at 9p21.3/
CDKN2A, CDKN2B (7% versus 17%), rs865686 at 9q31.2/
KLF4, RAD23B, ACTL7A (7% versus 24%), rs10995190 at 10q21.2/
ZNF365 (2% versus 15%), and rs10483813, proxy of rs999737, at 14q24.1/
RAD51L1 (3% versus 24%). Thus, we had only 8% to 30% statistical power to detect the reported effect sizes (ORs of 1.06 to 1.16) for these SNPs with the current sample size. We also could not exclude the possibility that the effect sizes in the original reports were ORs exaggerated by the 'winner's curse'. Second, differences in underlying genomic structure between ethnicities could mean that the reported SNPs tag the causal variants poorly in Asian women, even though they work effectively in women of European descent. Another possibility is that some of the variants evaluated are not strongly associated with breast cancer risk in Asian women, as suggested by the null association of rs2180341 (6q22.33). For rs2180341, we had a statistical power of 80% to detect an OR as small as 1.15; furthermore, the lack of an association has been shown in a study conducted in a Chinese population [
13]. Moreover, the risk profiles of genetic variants could be manifested differently in different ethnic populations, assuming that the relative contribution of the risk variants to carcinogenic pathways of breast cancer varies between different populations. Finally, the interactions of environmental exposures, lifestyle, or other effect modifiers and even the difference in breast cancer prevalence could have an effect on the penetrance of these alleles.
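The effect of a low allele frequency on power can be illustrated with a normal-approximation power calculation for a two-sided allelic test. The sketch below is not the calculation used in this study: the significance level (0.05), the use of the stage I sample sizes, the pairing of allele frequencies with ORs in the example, and the function name are all assumptions made for illustration.

```python
from statistics import NormalDist


def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic (two-proportion) z-test."""
    norm = NormalDist()
    p0 = maf_controls
    # Risk-allele frequency in cases implied by the per-allele OR.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Each subject contributes two alleles to the comparison.
    se = (p1 * (1 - p1) / (2 * n_cases)
          + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    shift = abs(p1 - p0) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Stage I sample sizes from the text; MAF/OR pairs are illustrative.
    for maf, or_ in [(0.02, 1.06), (0.04, 1.16), (0.20, 1.16)]:
        power = allelic_power(maf, or_, n_cases=2273, n_controls=2052)
        print(f"MAF={maf:.2f}  OR={or_:.2f}  power={power:.2f}")
```

Under these assumptions, the same per-allele OR is much harder to detect at a minor allele frequency of a few percent than at 20%, which is the pattern described above for the SNPs that are rare in Koreans.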
ERBB4, which harbors rs13393577 in its first intron at 2q34, is a member of the epidermal growth factor (
EGF/ERBB) family of receptor tyrosine kinases, which are key activators of signaling pathways involved in cell division, migration, adhesion, differentiation, and apoptosis [
28]. It is reported that
ERBB4 is frequently overexpressed in breast cancer, and the expression of transcripts encoding the cleavable
ERBB4 isoforms was associated with ER expression and a high histological grade of differentiation [
29]. Rokavec and colleagues [
30] identified the presence of five germ-line variants in the
ERBB4 5'-untranslated region and reported that one of these variants (
ERBB4 -782T > G) was associated with breast cancer risk through allele-specific differences in promoter activity. However, rs13393577 is not in LD with
ERBB4 -782T > G; thus, the potential influence of rs13393577 is unlikely to be mediated through this previously reported variant.
We conducted an
in silico functional analysis to assess the potential biological function of rs13393577. The rs13393577 C allele had no predicted binding site, whereas several transcription factors were predicted to bind the rs13393577 T allele, which yielded six high-scoring binding sites (maximum score = 92.7 points; minimum score = 85.9 points) [
31]. In agreement with this, FASTSNP scored rs13393577 as 1-2 (intronic enhancer) [
32]. Additionally, Murabito and colleagues [
33] have shown that three SNPs in
ERBB4 (rs905883, rs7564590, and rs7558615) were associated with breast cancer risk in a family-based GWAS that included 58 breast cancer cases, although none of the associations reached genome-wide significance. Among these variants, rs7564590 is in moderate LD with rs13393577 (r² = 0.44 in CHB+JPT and r² = 0.25 in CEU), whereas the other two SNPs (rs905883 and rs7558615) are in very weak LD with rs13393577 (all r² < 0.02 in CHB+JPT and CEU). Thus, even if rs13393577 and rs7564590 are not themselves functional, they might be in high LD with the true causal variants. Additionally, we could not exclude the possibility that the strong association shown for rs13393577 is related to the function of the mir-548f-2 gene harboring SNP rs6956468, which is in tight LD with rs13393577.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
DK helped to conceive and design the experiments and to write the paper. B-GH helped to conceive and design the experiments. JHO, D-JK, MP, E-hK, and W-YP helped to perform the experiments. H-cK and J-YC helped to write the paper. Ji-YL and HS helped to write the paper, to manage the genotyping data, and to perform statistical analyses. Jong-YL coordinated the genetic study. YJK helped to manage the genotyping data and to perform statistical analyses. MJG helped to manage the genotyping data. J-YC and SKP helped to perform statistical analyses and to direct the studies that contributed data or biological collection of original studies. K-ML, YSC, HM, HMK, JP, D-YN, S-HA, K-YY, LL, MHL, S-WK, JWL, B-WP, WH, MKK, S-AL, KM, C-YS, P-EW, C-NH, J-WK, J-PL, S-YJ, and H-LK helped to direct the studies that contributed data or biological collection of original studies. All authors read and approved the final manuscript.