Introduction
Twin and family studies consistently indicate that the risk of breast cancer is influenced, in part, by hereditary factors, but high-risk mutations seem to account for only 1–2% of all breast cancer cases in the general population [1]. A polygenic model has been proposed to account for the residual familial risk [2], which anticipates small effects of several low-penetrance genetic risk variants in combination with environmental influence. Although no human data exist for the effect of germ-line polymorphisms on tumour outcome, there is strong evidence that strain background is a significant determinant of the clinical behaviour of experimental mammary carcinomas in mice [3‐5]. We thus set out to study the role of common variation in key breast cancer candidate genes in relation to breast cancer aetiology, survival and tumour characteristics.
The ataxia-telangiectasia mutated (
ATM; MIM 607585), checkpoint kinase 2 (
CHEK2; MIM 604373) and v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2 (
ERBB2; also named
HER2; MIM 164870) genes have been suggested to have an important role in breast cancer aetiology. The ATM protein is activated in response to ionizing radiation and triggers phosphorylation of CHEK2 and other proteins that promote cell cycle arrest and activation of DNA repair [
6‐
12]. The
ATM gene is mutated in the rare autosomal recessive disorder ataxia-telangiectasia (A-T) and the risk of breast cancer has been found to be increased in relatives of A-T patients [
13,
14], in addition to A-T heterozygotes [
15,
16]. Mutations in the
CHEK2 gene have been found in patients with Li-Fraumeni syndrome [
17] and one such mutation – the rare
1100delC gene mutation – has been found to increase breast cancer susceptibility at the population level [
18] and in families without
BRCA1 or
BRCA2 gene mutations [
19,
20]. ERBB2 is a transmembrane glycoprotein, with tyrosine kinase activity [
21‐
25], that has a major role in signal transduction, thereby affecting cell proliferation, differentiation, motility and survival [
26‐
28]. The
ERBB2 gene is amplified and/or overexpressed in approximately 30% of breast tumours, a phenomenon that is associated with a poor prognosis [
29‐
31].
Until now, common variation in the
ATM, CHEK2 and
ERBB2 genes has mainly been studied in relation to the overall risk of breast cancer, but the results have been inconclusive. It is worthwhile studying common variation in the
ATM,
CHEK2 and
ERBB2 genes in relation to breast cancer progression because defects in the genes could increase the risk of developing tumours with an unfavourable prognosis through their role in the regulation of cell cycle checkpoints and amplification. Variation in the genes could also affect cancer survival through increased radiosensitivity [
32‐
34]. One group found a relationship between poor breast cancer prognosis and common haplotypes in the
ERBB2 gene [
35], but to our knowledge, nothing has been reported regarding the association between common haplotypes in the
ATM and
CHEK2 genes and breast cancer survival or tumour characteristics. Hence, many questions regarding the role of these genes in breast cancer survival and the progression of breast cancers are unanswered.
We performed a haplotype analysis of the ATM, CHEK2 and ERBB2 genes by genotyping a dense set of markers in each gene in 92 randomly selected controls, thus acquiring a comprehensive coverage of the common variation in each entire gene. We genotyped selected haplotype-tagging SNPs (tagSNPs) in a well-defined, Swedish population. We then assessed the association of the tagSNPs in the ATM, CHEK2 and ERBB2 genes and their haplotypes with breast cancer survival and the risk of tumour-characteristic-defined breast cancer. We also studied the tagSNPs and haplotypes in the ATM and ERBB2 genes, in addition to two mis-sense mutations in the ATM gene (2572 T→C and 4258 C→T), in relation to the overall risk of breast cancer.
Materials and methods
Study population
The study base included all Swedish-born women between 50 and 74 years of age who were resident in Sweden between October 1993 and March 1995. During that period, we identified all breast cancer cases at diagnosis through the six regional cancer registries in Sweden, which provide virtually complete data on incident cancers in Sweden [
36]. We randomly selected controls, who matched the cases in 5-year age strata, from the Swedish registry of the total population. Of the eligible women, 3,345 (84%) breast cancer cases and 3,454 (82%) controls participated in this initial questionnaire-based study, providing detailed information on their use of menopausal hormone therapy, their reproductive history and other lifestyle factors. Results from the study have been published [
37‐
41].
From this initial study, we randomly selected 1,500 breast cancer cases and 1,500 age–frequency-matched controls among the postmenopausal participants without any previous malignancy (except carcinoma in situ of the cervix or nonmelanoma skin cancer). With the intention of increasing the statistical power in subgroup analyses, we further selected all of the remaining breast cancer cases and controls who had used menopausal hormones (oestrogen alone or any combination of oestrogen and progestin) for at least 4 years (191 cases and 108 controls) and all women with self-reported diabetes mellitus (110 cases and 104 controls). Additionally, 345 controls, who were included in both the initial breast cancer study and an endometrial cancer study with the same study base and inclusion criteria, were added to the control group. In total, we selected 1,801 breast cancer cases and 2,057 controls.
After informed consent was obtained, participants donated whole blood. For deceased breast cancer patients and women who declined to donate blood but consented to the use of tissue samples, we collected archived paraffin-embedded, noncancerous tissue samples. We acquired 70% of the requested tissue samples; the main reason for nonparticipation was unwillingness or a lack of time at the respective pathology department to provide the tissue blocks. In total, we obtained 1,321 blood samples and 275 archived tissue samples from the breast cancer patients and 1,524 blood samples from the controls. The mean time from diagnosis to the arrival of the blood and tissue samples at our department was 5 years. Reasons for nonparticipation included a lack of interest in research, a negative attitude towards genetic research, old age and severe disease or death. Population-based participation rates (taking into account the proportion of individuals who did not participate in the questionnaire-based study) for the breast cancer cases and controls were 75% and 61%, respectively.
This study was approved by the Institutional Review Boards in Sweden and at the National University of Singapore.
DNA isolation
The Swegene laboratories in Malmö (Sweden) extracted DNA from 4 ml of whole blood using the QIAamp DNA Blood Maxi Kit (Qiagen, Solna, Sweden) according to the manufacturer's instructions. From nonmalignant cells in paraffin-embedded tissue, we extracted DNA using a standard phenol/chloroform/isoamyl alcohol protocol [
42]. We successfully isolated DNA from 1,318 blood samples and 272 tissue samples from the breast cancer patients and 1,518 blood samples from the controls.
SNP markers and genotyping
The ATM gene covers 146.3 kb of genomic sequence on chromosome 11, the CHEK2 gene spans 54.1 kb on chromosome 22 and the ERBB2 gene covers 33.7 kb on chromosome 17 (build 125 of dbSNP, the Single Nucleotide Polymorphism database). We selected SNPs in the ATM, CHEK2 and ERBB2 genes and their 10 kb flanking sequences from the dbSNP (build 124) and Celera databases, aiming for an initial marker density of at least one SNP per 5 kb. SNPs were genotyped using the Sequenom primer extension-based assay (San Diego, CA, USA) and the BeadArray system (Illumina, San Diego, CA, USA) according to the manufacturers' instructions. All genotyping results were generated and checked by laboratory staff unaware of case–control status. Only SNPs for which >85% of the samples gave a genotype call were analysed further. As a quality control, we genotyped 200 randomly selected SNPs (not including SNPs in the ATM, CHEK2 or ERBB2 genes) in the 92 control samples using both the Sequenom system and the BeadArray system. The genotype concordance was >99.5%, suggesting high genotyping accuracy.
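The two quality-control rules described above (a call-rate filter and a cross-platform concordance check) can be illustrated with a minimal Python sketch. The function names, the use of `None` for a missing call and the toy data are assumptions made for illustration only; the source does not state how concordance was computed in practice.

```python
def call_rate(calls):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    return sum(c is not None for c in calls) / len(calls)


def passing_snps(genotypes, threshold=0.85):
    """Keep SNPs for which more than `threshold` of the samples gave a call."""
    return [snp for snp, calls in genotypes.items() if call_rate(calls) > threshold]


def concordance(calls_a, calls_b):
    """Share of identical calls among samples called on both platforms.

    Assumes both platforms report genotypes in the same allele order.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no samples were called on both platforms")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical toy data: ten samples, two SNPs.
genotypes = {
    "snp_a": ["AA", "AG", None, "GG", "AG", "AA", "AA", "AG", "GG", "AA"],
    "snp_b": ["CC", None, None, "CT", "CC", "TT", "CC", "CT", "CC", "CC"],
}
kept = passing_snps(genotypes)  # snp_a has 9/10 calls, snp_b has 8/10
```

With the >85% rule only `snp_a` is retained in this toy example; the same `concordance` helper could be applied per SNP or pooled over all SNPs, depending on how the comparison is defined.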
Characterization of linkage disequilibrium and haplotype-tagging SNP selection
We produced linkage disequilibrium (LD) plots of the D' values for ATM and ERBB2 genes (supplementary Figures 1 and 2, respectively) using the Haploview program [43]. We reconstructed haplotypes for all three genes using the partition-ligation-expectation-maximization (PLEM) algorithm [44] implemented in the tagSNPs program [45] and selected tagSNPs according to the R² coefficient, which quantifies how well the tagSNP haplotypes predict the SNPs or the number of copies of haplotypes that an individual carries. We chose tagSNPs so that common SNP genotypes (minor allele frequency ≥0.03) and common haplotypes (frequency ≥0.03) were predicted by R² values ≥0.8 [46]. To evaluate the performance of our tagSNPs in capturing unobserved SNPs within the genes and to assess whether a denser set of markers was needed, we performed a SNP-dropping analysis [47,48]. In brief, each of the genotyped SNPs was dropped in turn and tagSNPs were selected from the remaining SNPs so that their haplotypes predicted the remaining SNPs with an R² value of 0.85. We then estimated how well the tagSNP haplotypes of the remaining SNPs predicted the dropped SNP, an evaluation that can provide an unbiased and accurate estimate of tagSNP performance [47,48].
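The logic of the SNP-dropping analysis can be sketched in Python. This is a deliberately simplified illustration and not the procedure implemented in the tagSNPs program: it swaps the haplotype-based R² for a pairwise squared correlation between genotype dosages, and it uses a simple greedy rule to choose tags. All function names and the toy genotypes are hypothetical.

```python
def r2(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no information
    return sxy * sxy / (sxx * syy)


def greedy_tags(snps, threshold):
    """Pick tags until every SNP has pairwise r2 >= threshold with some tag."""
    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs.
        best = max(sorted(untagged),
                   key=lambda s: sum(r2(snps[s], snps[t]) >= threshold
                                     for t in untagged))
        tags.append(best)
        covered = {t for t in untagged if r2(snps[best], snps[t]) >= threshold}
        untagged -= covered | {best}
    return tags


def snp_dropping(snps, threshold=0.85):
    """Drop each SNP in turn, re-select tags from the rest, and report how
    well the best remaining tag predicts the dropped SNP."""
    result = {}
    for dropped in snps:
        rest = {name: g for name, g in snps.items() if name != dropped}
        tags = greedy_tags(rest, threshold)
        result[dropped] = max(r2(snps[dropped], rest[t]) for t in tags)
    return result


# Hypothetical minor-allele counts (0, 1 or 2) for eight individuals.
snps = {
    "s1": [0, 1, 2, 0, 1, 2, 0, 1],
    "s2": [0, 1, 2, 0, 1, 2, 0, 1],  # in perfect LD with s1
    "s3": [2, 0, 1, 1, 0, 2, 1, 0],  # nearly uncorrelated with s1 and s2
}
performance = snp_dropping(snps)
```

In this toy example `s1` and `s2` predict each other perfectly when dropped, whereas `s3` is poorly captured, which is the kind of result that would suggest a denser marker set is needed in that region.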
Tumour characteristics and follow-up
We retrieved information on the date and cause of death (until 31 December 2003) from the Swedish causes of death registry and the date of emigration from the Swedish national population registry. Information from the causes of death registry in Sweden has been found to be of high quality [
49]. The follow-up time began at the date of diagnosis and ended on 31 December 2003 or at the date of death or emigration, whichever came first. We collected information on tumour characteristics, such as tumour size, lymph-node involvement, grade (tumour differentiation), histological type and date of the first distant metastasis, from medical records. We obtained information on the oestrogen and progesterone receptor content and S-phase fraction (i.e. the proportion of tumour cells in the DNA synthesis phase of the cell cycle) of the tumours from seven laboratories around Sweden that routinely perform these tumour measurements for the whole country. At the time of the study, all seven laboratories used an enzyme immunoassay (Abbott Laboratories, Solna, Sweden) on cytosol samples for analysing the oestrogen and progesterone receptor content. This method was oestrogen receptor type α specific [
50]. The laboratories reported either quantitative measurements (fmol receptor per μg DNA or mg protein and the percentage of cells in S-phase) or categorical measurements (strongly positive, positive, weakly positive or negative for receptor status and high, intermediate or low S-phase fraction). A rather high proportion of this information was missing because these measurements were not routinely performed in the mid-1990s. We classified the tumour characteristics as follows:
1. TNM stage:
1.a. Stage 1 – tumour size ≤20 mm and no regional lymph-node metastases.
1.b. Stage 2 – tumour size ≤20 mm and lymph-node metastases, or tumour size >20 to ≤50 mm, or tumour size >50 mm and no lymph-node metastases.
1.c. Stage 3 – an inflammatory breast tumour, or tumour size >50 mm and lymph-node metastases.
1.d. Stage 4 – distant metastasis within 90 days of diagnosis.
2. Lymph-node involvement:
2.a. Yes – at least one metastasised lymph node.
2.b. No – no metastasised lymph node.
3. Grade:
3.a. High differentiation.
3.b. Intermediate differentiation.
3.c. Low differentiation.
4. Oestrogen and progesterone receptor status:
4.a. Positive – ≥0.05 fmol/μg DNA, ≥10 fmol/mg protein or categorically strongly positive, weakly positive or positive.
4.b. Negative – <0.05 fmol/μg DNA, <10 fmol/mg protein or categorically negative.
5. S-phase fraction:
5.a. High – ≥9% or categorically high.
5.b. Low – <9% or categorically low.
We combined TNM stage 3 and TNM stage 4 in all association analyses because of small numbers.
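The classification rules listed above can be written as a short Python sketch. The function and argument names are hypothetical, and the order in which the stage rules are checked (distant metastasis first, then stage 3, then stage 1) is an assumption, since the source does not state how overlapping criteria were resolved. Intermediate S-phase categories are returned as unclassified because the list above does not assign them to either group.

```python
def tnm_stage(size_mm, node_positive, distant_metastasis_within_90d=False,
              inflammatory=False):
    """Return TNM stage 1-4 from the criteria in the list above."""
    if distant_metastasis_within_90d:
        return 4
    if inflammatory or (size_mm > 50 and node_positive):
        return 3
    if size_mm <= 20 and not node_positive:
        return 1
    return 2  # all remaining size and node combinations


def analysis_stage(stage):
    """Stages 3 and 4 are combined in the association analyses."""
    return "3-4" if stage >= 3 else str(stage)


def receptor_positive(fmol_per_ug_dna=None, fmol_per_mg_protein=None,
                      category=None):
    """True/False for receptor status, or None if nothing was reported."""
    if fmol_per_ug_dna is not None:
        return fmol_per_ug_dna >= 0.05
    if fmol_per_mg_protein is not None:
        return fmol_per_mg_protein >= 10
    if category in {"strongly positive", "positive", "weakly positive"}:
        return True
    if category == "negative":
        return False
    return None


def s_phase_group(percent=None, category=None):
    """'high' or 'low', or None when missing or reported as intermediate."""
    if percent is not None:
        return "high" if percent >= 9 else "low"
    if category in {"high", "low"}:
        return category
    return None
```

For example, a 60 mm node-positive tumour without early distant metastasis falls in stage 3 and is analysed in the combined "3-4" group.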
Statistical analyses
In assessing the association with tumour-characteristic-defined breast cancer, we stratified the cases by breast cancer subtype and compared each group with the controls. Our testing strategy was to fit a single model and, within each risk-factor subgroup and for each tumour characteristic, to assess haplotype–trait associations with a global likelihood ratio test [51]. We first computed the expected haplotype dosage using the tagSNPs program [45], with haplotype frequencies estimated for the breast cancer cases and controls combined, assuming Hardy–Weinberg equilibrium (HWE) of haplotypes. We then included the haplotype dosages as explanatory variables in the regression models. To estimate the power in the risk component of the study, we used a method described by Chapman and colleagues [52], which assumes co-dominant effects at an unobserved locus. To calculate the power for log-additive effects in the survival component of the study, we used the Quanto program [53] in a similar manner to that described by Manolio and colleagues [54].
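The idea of an expected haplotype dosage can be illustrated with a small Python sketch. It is not the tagSNPs program: it assumes that haplotype frequencies are already known (in the study they were estimated from cases and controls combined), enumerates the haplotype pairs compatible with an unphased genotype, and weights each pair by its probability under HWE. The function name and the toy frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def expected_haplotype_dosage(genotype, hap_freqs):
    """Expected number of copies (0 to 2) of each haplotype given a genotype.

    genotype:  tuple of minor-allele counts (0, 1 or 2), one per tagSNP.
    hap_freqs: dict mapping haplotypes (tuples of 0/1 alleles) to frequencies.
    """
    pair_weights = {}
    for h1, h2 in combinations_with_replacement(sorted(hap_freqs), 2):
        # A pair is compatible if its alleles add up to the genotype.
        if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
            weight = hap_freqs[h1] * hap_freqs[h2]
            if h1 != h2:
                weight *= 2  # two orderings of a heterozygous pair under HWE
            pair_weights[(h1, h2)] = weight

    total = sum(pair_weights.values())
    if total == 0:
        raise ValueError("genotype is incompatible with the listed haplotypes")

    dosage = {h: 0.0 for h in hap_freqs}
    for (h1, h2), weight in pair_weights.items():
        dosage[h1] += weight / total
        dosage[h2] += weight / total
    return dosage


# Hypothetical frequencies for the four haplotypes of two tagSNPs.
freqs = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}
double_het = expected_haplotype_dosage((1, 1), freqs)
```

A double heterozygote is the ambiguous case: it may carry haplotypes (0, 0) and (1, 1) or haplotypes (0, 1) and (1, 0), so each haplotype receives a fractional dosage, and the dosages always sum to two. These dosages are the values that would enter a regression model as explanatory variables.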
We applied unconditional logistic regression models adjusted for age (in 5-year age-groups) to assess the relationship between the ATM, CHEK2 and ERBB2 tagSNP haplotypes and the overall risk of breast cancer, in addition to breast cancer subtypes. We estimated the hazard ratio of death owing to breast cancer in relation to the genes' tagSNP haplotypes using Cox proportional hazards models. The appropriateness of these approaches is supported by Stram and colleagues [45]. That is, if R² values are high, as they are here, the point and interval estimates obtained by this approach will be approximately accurate. To assess the proportional hazards assumptions of the Cox models, we examined scaled Schoenfeld residuals and found no evidence against proportionality.
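The Cox models take as input the follow-up time defined under "Tumour characteristics and follow-up": from the date of diagnosis to 31 December 2003, death or emigration, whichever came first. A minimal Python sketch of that rule follows. The function name is hypothetical, and treating deaths from other causes as censored observations is an assumption that the text does not state explicitly.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2003, 12, 31)


def follow_up(diagnosis, death=None, emigration=None, breast_cancer_death=False):
    """Return (days of follow-up, event indicator) for one patient.

    The event indicator is True only if the patient died of breast cancer
    and the death is what ended follow-up (assumption: other deaths are
    censored).
    """
    exit_dates = [END_OF_FOLLOW_UP]
    exit_dates += [d for d in (death, emigration) if d is not None]
    exit_date = min(exit_dates)
    event = bool(breast_cancer_death and death is not None and death == exit_date)
    return (exit_date - diagnosis).days, event


# A patient diagnosed in 1994 who emigrated in 1996 is censored at emigration,
# even if a later breast cancer death is recorded.
days, event = follow_up(date(1994, 1, 1), death=date(2000, 1, 1),
                        emigration=date(1996, 1, 1), breast_cancer_death=True)
```

The resulting pairs of time and event indicator are the standard input to a proportional hazards model in any statistical package.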
'Confounding' has been defined as the presence of a common cause of the exposure and outcome [
55]. We believe that lifestyle and reproductive breast cancer risk factors are unlikely to cause genetic variation in the genes, but they could be intermediates in the causal pathway between the genes and both breast cancer and tumour-characteristic-defined breast cancer. For completeness, we assessed whether the tagSNPs were associated with known breast cancer risk factors (age at menarche, age at menopause, body mass index, age at first birth, parity, menopausal hormone use and diabetes mellitus) among the randomly selected controls using Kruskal–Wallis and chi-square tests. Analyses were performed using the SAS system (release 9.1; SAS Institute Inc., Cary, NC, USA).
Discussion
Although the gene products of ATM, CHEK2 and ERBB2 are involved in various aspects of breast cancer development and progression, our results suggest that common variation in these genes does not affect survival, tumour-characteristic-defined risk or the overall risk of breast cancer. We carefully studied these associations, both overall and in subgroups of known nongenetic breast cancer risk factors, using large population-based case–control material, and conclude that at the population level, common genetic variation in these genes is not of great importance for these outcomes. This does not preclude the possibility that more crucial – and rare – variation is influential in selected patient groups.
Our study was population-based. All participants were born in Sweden between 1919 and 1944, a time at which foreign immigration to Sweden was still rare [
57], which means population stratification is of limited concern in our study. To minimize exposure misclassification, we applied genotyping methods with low error rates (the Sequenom and Illumina methods have genotyping error rates of 0.5% and 0.3%, respectively), DNA samples were randomly assigned to the genotyping plates and the genotyping personnel were blinded to case–control status. Furthermore, we replicated genotype calls of 200 randomly selected SNPs for a subset of samples using a separate genotyping method with >99.5% concordance.
The oestrogen and progesterone receptor status of tumours and S-phase fraction were assessed at seven different laboratories in Sweden, but it is doubtful that the genotype frequencies could be related to any interlaboratory differences. A large proportion of the information on receptor status, S-phase fraction and grade was missing. Assessment of receptor status and S-phase fraction was, to a large extent, dependent on the size of the tumour, but evaluation of the tumour grade was mostly dependent on the pathologist's decision. Because genotype frequencies were not related to tumour size, bias owing to the missing information on these factors seems unlikely.
Survival bias could be a concern in our study because nonparticipation was related to severe disease or death. However, we obtained the majority of the tissue samples requested for deceased patients with breast cancer or breast cancer cases who had declined donation of a blood sample. The lack of tissue accessibility is unlikely to be related to ATM, ERBB2 or CHEK2 genetic variation because it depended on the inability of the respective pathology department to retrieve the samples. The genotype frequencies of the tagSNPs in the ERBB2 and CHEK2 genes did not differ between blood and tissue samples, suggesting that the survival bias was negligible. We were not able to genotype TAG1, TAG2, TAG3, TAG6 and TAG7 in the ATM gene in the tissue samples, but the results did not differ in analyses restricted to the most severe cases among those who donated blood samples, which indicates that this was not a major problem in our study.
The loss of power from testing SNPs indirectly is thought to be related to the R² measure [46,58], but exceptions to these assumptions have been reported [59]. We assessed the capability of the tagSNPs to convey an association signal from unobserved, in addition to observed, SNPs. We captured the unobserved SNPs with average R² values of 0.92, 0.72 and 0.93 in the ATM, ERBB2 and CHEK2 genes, respectively, and thus suffered minimal loss of power owing to indirect testing. We performed standard power calculations assuming α = 0.05, thus giving an indication regarding the general power of our study. We acknowledge, however, that the power is reduced in the subgroup analyses (an effect cannot be excluded in the smallest groups) and at lower α-levels (required when multiple tests are carried out). For the ability of haplotypes to predict the allele count at a causal locus with a minor allele frequency of 0.20, we had 89% power for ATM, 73% power for ERBB2 and 87% power for CHEK2 to detect an odds ratio of 1.3 in the risk component of the study. To detect a hazard ratio of 1.4, with an α-level of 0.05, in the survival component of the study, we had 50% power for TAG1 in CHEK2, which had a minor allele frequency of 0.13, and 76% power for TAG5 in CHEK2, which had a minor allele frequency of 0.38.
Our data did not support an association between common variation in the
ATM,
CHEK2 and
ERBB2 genes and breast cancer survival or the risk of developing tumours of different characteristics. Hence, we did not confirm the finding of Han and colleagues, who reported that an
ERBB2 haplotype composed of two nonsynonymous tagSNPs – I655V and P1170A – increased the risk of breast cancer death or recurrence [
35]. We included the I655V as a tagSNP in our study and genotyped the P1170A in the 92 controls. We found no effect of the I655V on breast cancer survival or tumour-characteristic-defined breast cancer.
To our knowledge, no study has investigated
ATM or
CHEK2 common haplotypes in relation to breast cancer survival or tumour characteristics, although one study explored the effect of three common polymorphisms in
ATM and two polymorphisms in
CHEK2 on breast cancer survival [
60]. This study found no association, which is in agreement with our findings. The rare
1100delC mutation in the
CHEK2 gene has been associated with breast tumours of high grade [
61], in addition to steroid receptor-positive breast tumours, but not with overall survival [
62]. The mutation was too rare in our study population to be investigated in relation to breast cancer survival or tumour characteristics.
We previously found no association between common variation in the
CHEK2 gene and overall risk of breast cancer [
56], which was in agreement with earlier findings [
63]. In this study, we, correspondingly, found no effect of common variation in the
ATM or
ERBB2 genes on the risk of breast cancer, even when the results were stratified by known breast cancer risk factors. One study of the
ATM gene [
64] and two studies of the
ERBB2 gene [
35,
65] are in agreement with our findings. Tamimi and colleagues found no association between the haplotypes of five Hapmap tagSNPs (one of which was TAG7 in our study) in the
ATM gene and the risk of breast cancer [
64]. Benusiglio and colleagues explored
ERBB2 haplotypes composed of five tagSNPs (three of which were TAG2, TAG3 and TAG5 in our study) – including the nonsynonymous I655V and P1170A – in relation to the risk of breast cancer [
65], whereas Han and colleagues solely studied the I655V and P1170A as tagSNPs [
35]. Neither study found any effect of the haplotypes on the risk of breast cancer. Common haplotypes in the
ERBB2 gene thus do not seem to affect the risk of breast cancer, although results regarding the I655V common variant in the
ERBB2 gene have been conflicting [
35,
65‐
67]. We found no association of the I655V with the risk of breast cancer.
Three groups have found an association between specific
ATM common haplotypes and the risk of breast cancer [
68‐
70]. Lee and colleagues [
70] and Koren and colleagues [
68] reconstructed haplotypes in the
ATM gene from five and eight randomly selected common SNPs, respectively, whereas Angele and colleagues [
69] included 11 common SNPs in the
ATM gene for their haplotype estimation that had been either previously reported in the literature or detected by sequencing. SNP selection overlapped somewhat between the three studies, but none of them reported the probability of their SNPs being able to predict underlying variation in the gene. Furthermore, findings from two of the three groups [
68,
69] were derived from small sample sizes.
The 4258 C→T and 2527 T→C mutations in the
ATM gene have been detected in breast cancer patients [
69,
71‐
73], but to the best of our knowledge, these mutations have not been reported in A-T patients. In line with two [
69,
72] out of three reports [
69,
71,
72], we found an elevated – but not significant – risk of breast cancer in carriers of the rare 4258 T allele in the
ATM gene. We did not, however, find such an association in carriers of the 2527 C allele. One study found a twofold increase in the risk of breast cancer that was related to the 2527 T→C mutation in a population from the USA, but did not confirm the finding in a Polish population [
73]. Three other groups did not detect any significant effect of the 2527 T→C mutation on the risk of breast cancer [
69,
71,
72], although one of the groups found elevated point estimates [
72].
Although the 4258 C→T and 2527 T→C mis-sense variants do not seem to target residues known to be crucial for the function of the ATM protein [
72], an increasing amount of evidence suggests that mis-sense variants in the
ATM gene cause chromosomal instability and abolish the radiation-induced kinase activity of ATM [
74]. Mutant ATM protein also seems to interfere with normal ATM function in a dominant-negative manner [
74]. Previous mutation screening studies have indicated that mis-sense mutations in the
ATM gene – rather than protein-truncating mutations – are over-represented in patients with breast cancer compared with the general population [
72,
75‐
79]. A recent publication refuted this and found that
ATM gene mutations that cause A-T – that is, truncating, splicing and mis-sense mutations – are breast cancer susceptibility alleles [
80]. They found a greater than twofold increase in the risk of breast cancer related to a combination of 12 mutations, including six truncating mutations [
80]. Thus, controversy remains regarding both which types of mutation in the
ATM gene are involved in breast cancer aetiology and which mutations actually drive the association with the risk of breast cancer [
69‐
73,
81‐
85].
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
KE wrote the article and performed the statistical analyses with the assistance of KH and YL. LUR and SW organized data and sample collection. CB oversaw the genotyping procedures and KE assisted with the genotyping. KE and YL selected the polymorphisms. SW and JL co-ordinated the study. KH, JP, KSC, ETL, PH, JL and SW contributed to the conception and design of the project and critically revised the manuscript. All authors read and approved the final manuscript.