Introduction
Excluding non-melanoma skin cancer, breast cancer is the most common cancer in the UK, with 36,939 new cases diagnosed in 2004 [1]. The prognosis of breast cancer is generally good, with an overall 5-year survival rate of approximately 80% in England and Wales [2]. Clinical stage at diagnosis, including tumour size, lymph node status, and presence of metastases, along with tumour biological factors such as histological grade and type, are the most important determinants of prognosis [3].
Cyclins and their regulators, which are involved in cell cycle control, are important as potential oncogenes or tumour suppressor genes in breast cancer [4]. The cell cycle consists of a series of well-controlled events that drive DNA replication and cell division. These events are divided into specific phases: preparation for DNA synthesis (G1), DNA synthesis (S), a gap phase (G2), and mitosis (M). Transition between these phases requires tight control; the G1/S phase transition, in particular, includes many cell cycle events that are altered in breast cancer [5]. Somatic alterations in these genes have been shown to correlate with breast cancer prognosis and survival [6-13], but few studies have examined the effects of inherited genetic variation in cell cycle genes. The A870G polymorphism of the CCND1 gene (rs603965) has been shown to be associated with breast cancer survival in a large Chinese population-based study [14] and in a small population of patients with metastatic breast cancer [15]. The V109G polymorphism of the p27 gene CDKN1B (rs2066827), examined by polymerase chain reaction analysis of tumour specimens, was associated with shortened disease-free survival in a subset of patients with infiltrating metastasis-free breast cancer [16].
These previous studies, however, were only of selected single nucleotide polymorphisms (SNPs), and the genes involved in the G1 phase of cell cycle control have not been systematically evaluated. The purpose of this study was to assess whether common germline genetic variation in these genes is associated with breast cancer survival by using a comprehensive SNP tagging approach to efficiently capture the common variation. Thirteen genes involved in the G1 phase of the cell cycle have been investigated in this study, including those that encode for the cyclin family that regulate cyclin-dependent kinases (CCND1, CCND2, CCND3, and CCNE1); cyclin-dependent kinases, which are necessary for the G1/S transition (CDK2 [p33], CDK4, and CDK6); and cyclin-dependent kinase inhibitors (CDKN1A [p21, Cip1], CDKN1B [p27, Kip1], CDKN2A [p16], CDKN2B [p15], CDKN2C [p18], and CDKN2D [p19]).
Materials and methods
Study population
Cases were selected from the Studies of Epidemiology and Risk factors in Cancer Heredity (SEARCH) breast cancer study, an ongoing population study of women diagnosed with breast cancer in the region of England included in the Eastern Cancer Registration and Information Centre (ECRIC) (formerly the East Anglian Cancer Registry). Eligible participants include women diagnosed with invasive breast cancer who were either under 70 years of age since the beginning of the study on 1 July 1996 (incident cases) or age 55 or younger since 1 January 1991 and who were alive at the start of the study (prevalent cases). Due to boundary changes, some prevalent cases diagnosed before 1995 were identified by the North Thames Cancer Registry.
Of those eligible, 67% returned a comprehensive epidemiological questionnaire and 64% returned a blood sample for genotyping. All participants in the study provided informed consent, and the study was approved by the Eastern Multicentre Research Ethics Committee. DNA is available from 4,470 cases for genotyping; 27% of these participants are prevalent cases.
The samples have been split into two sets in order to save DNA and reduce genotyping costs. Cases with high genomic yield were randomly selected from the first 3,500 recruited to comprise set 1 (n = 2,270), with set 2 comprising the remainder of these plus the next 970 incident cases recruited (n = 2,200). DNA yield was not associated with genotype for those cases included in set 1 or set 2. SNPs showing a positive association with survival after a diagnosis of breast cancer in set 1 (P trend < 0.05) were then genotyped in set 2. Data from both sets were then combined (n = 4,470) to jointly analyze the SNPs with positive associations. This joint analysis approach results in increased power to detect genetic association despite more stringent significance levels with Bonferroni correction [17].
As the prevalent cases were the first recruited, the proportion of prevalent cases was somewhat higher in set 1 than set 2 (33% versus 20%). In total, 1,370 prevalent cases were included in both sets; median time from diagnosis to blood draw was 3.4 years (range: 0.8 to 9.34 years). Median age at diagnosis was similar in the two sets (50 and 53 years old, respectively). Median time from diagnosis to blood draw was slightly longer for set 2 (18 months) than for set 1 (9 months), but the number of deaths in each set was similar (359 in set 1 and 278 in set 2). There was no significant difference in the morphology, histopathological grade, or TNM (tumour, node, metastasis) stage [18] of the cases by set or by prevalent/incident status.
Participant follow-up
The ECRIC and the North Thames Cancer Registry have active follow-up at years 3 and 5 after diagnosis and then at 5-year intervals. Follow-up information and all-cause mortality are obtained by searching hospital information systems for recent visits. If a patient has not had a recent visit, the patient's general practitioner is contacted to obtain the vital status. Death certificate flagging through the Office of National Statistics also provides the registries with notification of deaths. The lag times with this process are a few weeks for cancer deaths and 2 months to a year for non-cancer deaths. Cause-specific mortality was obtained from part I of the death certificate.
Gene/single nucleotide polymorphism selection
Thirteen genes involved in the G1 phase of cell cycle regulation were selected as candidate genes for breast cancer survival. A comprehensive SNP tagging approach was used in which tagging SNPs (tagSNPs) were chosen to capture all known common genetic variation in each gene with an estimated correlation coefficient (r²) of greater than 0.8. In some cases, SNPs that were poorly correlated with other single SNPs could be efficiently tagged with a haplotype defined by multiple SNPs. The target for the correlation between these SNPs and the haplotype of tagSNPs (rₛ²) was also greater than 0.8. TagSNPs were identified with the program Tagger [19]. Data from the International HapMap Project [20] and resequencing data from the National Institute of Environmental Health Sciences (NIEHS) Environmental Genome Project [21] were used to select tagSNPs. In total, 85 tagSNPs were chosen.
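To make the r² criterion concrete, the following is a minimal sketch of how pairwise linkage disequilibrium between two biallelic SNPs can be computed from phased haplotype counts. It illustrates the standard r² definition only; it is not the Tagger algorithm, and the function names and example counts are hypothetical.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD (r^2) from counts of the four two-SNP haplotypes.

    A/a are the alleles at SNP 1 and B/b the alleles at SNP 2, so n_Ab is
    the number of haplotypes carrying A at SNP 1 and b at SNP 2.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at SNP 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    return d * d / denom


def is_tagged(r2, threshold=0.8):
    """True if a SNP is captured by a candidate tag at the given r^2 threshold."""
    return r2 > threshold


# Hypothetical counts: two SNPs in strong but imperfect LD.
example_r2 = r_squared(n_AB=45, n_Ab=5, n_aB=5, n_ab=45)
```

With these illustrative counts the pair falls below the 0.8 threshold, so the second SNP would need its own tag or a multi-SNP haplotype tag.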
Genotyping
Genotyping was carried out using the TaqMan® platform (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. Primers and FAM- and VIC-labeled probes were supplied directly by Applied Biosystems as Assays-by-Design™. All assays were carried out in 384-well plates. Each plate included negative controls (with no DNA) and positive controls duplicated on a separate quality control plate. Plates were read on the ABI Prism 7900 using Sequence Detection Software (Applied Biosystems). Failed genotypes were not repeated. Assays in which the genotypes of duplicate samples did not show greater than 95% concordance were discarded and replaced with alternative assays with the same tagging properties. Call rates for each assay were over 95%.
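The quality control rules above can be sketched as follows. This is an illustration under stated assumptions: failed calls are represented as `None`, concordance is computed only over duplicate pairs in which both calls succeeded, and the 95% call-rate cut-off is treated as a filter although the text reports it only as an observed result. All names are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of samples with a successful genotype call (None = failed)."""
    if not genotypes:
        raise ValueError("no samples")
    return sum(g is not None for g in genotypes) / len(genotypes)


def duplicate_concordance(pairs):
    """Fraction of duplicate pairs with identical calls.

    Only pairs in which both calls succeeded are compared (an assumption).
    """
    compared = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not compared:
        raise ValueError("no comparable duplicate pairs")
    return sum(a == b for a, b in compared) / len(compared)


def assay_passes(genotypes, pairs, min_call=0.95, min_concordance=0.95):
    """Keep an assay only if both metrics exceed their thresholds."""
    return (call_rate(genotypes) > min_call
            and duplicate_concordance(pairs) > min_concordance)


# Hypothetical assay: 97 of 100 calls succeed, 99 of 100 duplicates agree.
calls = ["AA"] * 97 + [None] * 3
duplicates = [("AA", "AA")] * 99 + [("AA", "AG")]
keep = assay_passes(calls, duplicates)
```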
Statistical methods
Cox regression analysis was performed to determine the effect of each tagSNP on survival. The proportional hazards assumption was evaluated by visual inspection of log-log plots as well as tested analytically using Schoenfeld residuals. TagSNPs significantly associated with survival were re-evaluated in a model adjusted for known breast cancer prognostic factors, which included age at diagnosis (<40, 40 to 49, 50 to 59, or ≥60 years), clinical stage (TNM stage 1, 2, 3, or 4), histopathological grade (well differentiated, moderately differentiated, or poorly differentiated), estrogen receptor (ER) status, and treatment.
Time at risk began on the date of blood sample receipt and ended on the date of death from any cause or, if death did not occur, on 30 November 2006. This allows for the difference in ascertainment of incident and prevalent cases and provides an unbiased estimate of the relative hazard provided that the proportional hazards assumption is correct. Follow-up was censored at 10 years after diagnosis as follow-up became less reliable for each individual after 10 years. A hazard ratio (HR) was estimated for heterozygous and rare homozygous genotypes relative to the common genotype. Primary tests used were a likelihood ratio test (2 degrees of freedom) for heterogeneity of risk among the three genotypes (common homozygote, heterozygote, and rare homozygote) and a trend test (1 degree of freedom) based on the number of rare alleles carried. All analyses were performed with Intercooled Stata, version 8.2 (StataCorp LP, College Station, TX, USA).
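The follow-up rules described above can be sketched as a small helper that turns three dates into a delayed-entry (left-truncated) record. This is a sketch under assumptions, not the original Stata code: time is measured in years since diagnosis, a year is taken as 365.25 days, and a death occurring after the 10-year limit or after the end of follow-up is treated as censored. The function names and the genotype coding helper are hypothetical.

```python
from datetime import date

STUDY_END = date(2006, 11, 30)  # end of follow-up stated in the text
MAX_YEARS = 10.0                # censoring limit after diagnosis
DAYS_PER_YEAR = 365.25          # assumed conversion factor


def follow_up(diagnosis, blood_sample, death=None):
    """Return (entry_years, exit_years, died) on a time-since-diagnosis scale.

    Entry is the date of blood sample receipt, so time between diagnosis and
    recruitment is not counted as time at risk. Returns None if the person
    contributes no time at risk.
    """
    entry = (blood_sample - diagnosis).days / DAYS_PER_YEAR
    died = death is not None and death <= STUDY_END
    end_date = death if died else STUDY_END
    exit_years = (end_date - diagnosis).days / DAYS_PER_YEAR
    if exit_years > MAX_YEARS:
        exit_years = MAX_YEARS  # censor at 10 years after diagnosis
        died = False
    if entry >= exit_years:
        return None
    return entry, exit_years, died


def genotype_codes(rare_alleles):
    """Covariate coding for the 1-df trend test and the 2-df genotype test."""
    if rare_alleles not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 rare alleles")
    return {
        "trend": rare_alleles,               # per-allele (1 df)
        "het": int(rare_alleles == 1),       # heterozygote indicator
        "rare_hom": int(rare_alleles == 2),  # rare homozygote indicator
    }
```

For example, a prevalent case diagnosed in 1992, recruited in 1997 and dying in 2004 would enter at about 5 years and be censored at 10 years, because the death falls outside the 10-year window.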
Discussion
We have evaluated the association of 85 tagSNPs in 13 cell cycle control genes with survival after a diagnosis of breast cancer. Previous work has shown that expression of these genes is associated with breast cancer prognosis; however, to our knowledge, this is the first study to systematically assess germline variation in genes involved in controlling the cell cycle and breast cancer survival.
This study used a two-stage design, with an initial set of 85 tagSNPs genotyped in 2,270 individuals. The top two SNPs, with a P value of less than 0.05, were genotyped in the second set of patients (n = 2,200). Because a combined analysis with adjustment for multiple testing has been shown to increase power over a replication study, a joint analysis of both sets of data was performed. One SNP, CCND3 rs2479717, showed a significant association with survival after a diagnosis of breast cancer (unadjusted P value = 0.0001). This finding remains significant after a conservative Bonferroni correction for multiple testing (adjusted P value = 0.0085), and the HR is not significantly attenuated after adjusting for stage, grade, and treatment. There was no evidence of association with survival for polymorphisms in CCND1, CCND2, CCNE1, CDK2, CDK4, CDK6, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, or CDKN2D.
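The Bonferroni arithmetic behind the adjusted P value can be reproduced in a few lines. This is a minimal sketch that assumes the correction was applied across all 85 tagSNPs, which is consistent with the figures quoted above; the function names are hypothetical.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


N_TAG_SNPS = 85
adjusted_p = bonferroni_adjust(0.0001, N_TAG_SNPS)    # about 0.0085
per_test_alpha = bonferroni_threshold(0.05, N_TAG_SNPS)  # about 0.00059
```

The unadjusted P value of 0.0001 lies below the per-test threshold of roughly 0.00059, which is equivalent to the adjusted value of 0.0085 lying below 0.05.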
These findings were based on the analysis of all-cause mortality. This may result in a reduction of statistical power as some deaths will be unrelated to breast cancer. Breast cancer-specific mortality was available from death certificates, and the results were consistent for the breast cancer-specific analysis, with an identical HR for CCND3 rs2479717. It is worth noting that cause of death as coded on a death certificate is also prone to misclassification and subsequent loss of statistical power.
Our analyses incorporated prevalent cases. Inclusion of prevalent cases might be expected to bias the HR. However, provided that the Cox proportional hazards assumption holds true, the HR estimate is unbiased. For example, there is no significant difference between the HRs for CCND3 rs2479717, TNM stage, or histopathological grade when comparing subjects recruited within 6 months of diagnosis with those recruited more than 6 months after diagnosis (P = 0.69, 0.90, and 0.42, respectively). Furthermore, the results of repeated analyses that included only those individuals recruited within 3 years of their diagnosis were identical to those of our full analysis.
CCND3 encodes for cyclin D3, a protein involved in the regulation of the G1/S phase transition. The SNP associated with survival, CCND3 rs2479717, is in an intron and unlikely to have a functional effect. However, a functional effect would not be expected as it was chosen as a tagSNP, not as a functional SNP. Furthermore, a functional variant tagged by this SNP may not even alter the function of CCND3; the SNP lies in a large LD block with several genes that are reasonable candidates for breast cancer survival.
PGC encodes for pepsinogen C, a proteolytic enzyme involved in digestion, which is expressed in breast tumours [30]. Higher pepsinogen C expression is associated with well-differentiated and moderately differentiated breast tumours [31] and has been associated with longer overall survival in these patients [32,33].
BYSL encodes for bystin, a crucial component protein of an adhesion molecule complex that is important for the attachment of the embryo to the uterus [34]. This protein is present in human prostatic carcinoma cells in areas of perineural invasion in an increasing gradient, suggesting a role in perineural adhesion [35].
C6orf49 encodes for overexpressed breast tumour protein, a member of the LIM domain (cysteine-rich double zinc fingers) protein family that is overexpressed in tumours and has a possible role in cancer differentiation [36,37].
FRS3 encodes for fibroblast growth factor receptor substrate 3, a negative regulator in epidermal growth factor receptor tyrosine kinase signaling pathways [38,39].
USP49 encodes for ubiquitin-specific protease 49, which is involved in the modification of cellular proteins. Ubiquitin-specific protease 49 is expressed in samples derived from tumour biopsies [40].
TRFP encodes for TATA-binding protein-related protein, which is associated with an RNA polymerase II-SRB complex; this complex may regulate class II genes [41].
To further evaluate this LD block, we examined breast tumour expression of these seven genes using expression microarray data from seven published datasets. Significant associations between increased tumour expression levels of BYSL and C6orf49 transcripts and breast cancer survival emerged. Differences between the microarray datasets, varying outcome information, and incomplete control of confounding by prognostic factors may limit interpretation of these findings; however, we attempted to control for patient and tumour heterogeneity between these studies by performing two analyses: a random-effects and a fixed-effects meta-analysis. Although the results for BYSL are unclear, both analyses showed consistent associations of elevated tumour expression of the C6orf49 transcript with survival after a diagnosis of breast cancer.
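The two pooling approaches mentioned above differ in how they weight the individual datasets. The sketch below shows inverse-variance fixed-effects pooling and a random-effects estimate using the DerSimonian-Laird estimator of between-study variance. The choice of that estimator is an assumption, since the text does not say which method was used, and the inputs (per-dataset log hazard ratios and their standard errors) are hypothetical.

```python
import math


def fixed_effect(estimates, std_errors):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects(estimates, std_errors):
    """DerSimonian-Laird pooled estimate, its standard error, and tau^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled_fixed, _ = fixed_effect(estimates, std_errors)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = total - sum(w ** 2 for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / re_total
    return pooled, math.sqrt(1.0 / re_total), tau2


# Hypothetical log hazard ratios from three expression datasets.
log_hrs = [0.30, 0.45, 0.10]
ses = [0.15, 0.20, 0.25]
fe_estimate, fe_se = fixed_effect(log_hrs, ses)
re_estimate, re_se, tau2 = random_effects(log_hrs, ses)
```

When the datasets agree closely, the estimated between-study variance is zero and both models give the same pooled estimate; when they disagree, the random-effects model widens the standard error, which is why agreement between the two analyses is a useful consistency check.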
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
EMA carried out statistical analyses and drafted the manuscript. KED carried out genotyping, ER immunohistochemistry, and data cleaning. FL carried out genotyping and selected SNPs for evaluation. MS is responsible for patient recruitment. DG is responsible for collecting patient characteristics and patient follow-up data. DFE is a co-investigator in SEARCH and is involved in study design. AET was responsible for microarray data retrieval and cleaning and advised on data analysis. CC and NEC contributed to study design and interpretation of results. PDPP is a co-investigator in SEARCH, conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.