Introduction

Breast cancer (BC) is the most common cancer and the leading cause of cancer-related deaths in women in both developed and developing countries. Worldwide, 1.67 million new BC cases are diagnosed each year, and the number of BC-related deaths is 522 000.1 In Finland, BC was diagnosed in 4694 women in 2012, and the annual number of BC diagnoses is predicted to increase (Finnish Cancer Registry). Up to 10% of all BCs are familial.2 Approximately 20–25% of familial BCs are due to germline variants in two high-risk genes, BRCA1 and BRCA2, which are tumour suppressor genes that encode large proteins that act in a common pathway of genome protection and have important roles at different stages in DNA damage response (DDR) and DNA repair.3, 4 Variants in BRCA1/2 confer a high risk of breast and ovarian cancers; the average cumulative breast and ovarian cancer risks in a population-based series of unselected breast or ovarian cancer patients have been estimated to be 65% and 39% for BRCA1 variant carriers and 45% and 11% for BRCA2 variant carriers, respectively, by the age of 70.5 In the Finnish population, the contribution of BRCA1/2 variants to the breast/ovarian cancer families and ovarian cancer-only families is ~20% and 26%, respectively.6, 7

In addition to BRCA1/2, rare germline variants in the known high-risk genes TP53, CDH1, STK11, and PTEN predispose to familial cancer syndromes, among which BC is also observed.4 Moreover, genes whose protein products interact with BRCA1/2 in the DDR pathway are strong candidates for breast and/or ovarian cancer susceptibility; these include ATM, CHEK2, BRIP1, and PALB2, which moderately increase the breast/ovarian cancer risk (by two- to fourfold).4 In addition, large-scale genome-wide association studies have identified common variants in over 70 loci associated with BC, explaining ~14% of the familial risk of the disease.8 Despite these intensive efforts, the familial predisposition remains largely unresolved, and the majority of this missing heritability likely consists of thousands of rare variants, each conferring a minor disease risk.9

In this study, a family-based exome-sequencing approach was used to identify rare variants that contribute to breast/ovarian cancer predisposition in Finnish BRCA1/2-negative hereditary breast and/or ovarian cancer (HBOC) families, many of which included very early-onset BC patients. The primary aim was to identify variants that are shared between affected family members and that target DDR pathway genes. The secondary aim was to identify novel candidate genes contributing to early-onset BC by exploring genes in any pathway, with a special focus on those that likely have a role in tumorigenesis, such as pathways related to the cell cycle, DNA replication, signalling, DNA repair, and apoptosis.

Materials and methods

Study subjects

Genomic DNA samples of 37 individuals from 13 HBOC families were used for whole-exome sequencing. Six of the families belong to a previously characterised cohort of high-risk Finnish BRCA1/2 founder variant-negative HBOC individuals collected from Tampere University Hospital in which index patients have been screened for variants in BRCA1, BRCA2, CHEK2, PALB2, BRIP1, RAD50, and CDH1 genes and for copy number variations on a genome-wide scale.10, 11 In one of these families, two CHEK2 variants, c.470T>C and c.1100delC, have been reported.10 Seven other families were selected for this study based on the previously described high-risk hereditary BC criteria.10 The families were recruited from Turku University Hospital, and the index patients had tested negative for BRCA1/2 founder variants. From each of the 13 families, one to three breast or breast and ovarian cancer-affected individuals were selected for exome sequencing. Corresponding healthy relatives of the affected family members, used as controls, were selected from each family whenever possible. Of the 37 selected individuals, 23 were breast or breast and ovarian cancer-affected females, 1 was a BC-affected male, and 13 were healthy. The clinical characteristics of the 23 affected females and 1 affected male are presented in Supplementary Information. Notably, six of the females were early-onset patients with either BC or bilateral BC diagnosed at an age ≤29 years. Genomic DNA samples of the index patients’ relatives were utilised for candidate variant segregation analyses. Moreover, candidate variants were screened from germline DNA from 129 female HBOC patients from the Tampere and Turku regions, 49 Finnish male BC patients, up to 989 healthy female population controls, and 909 healthy male population controls (more details in Supplementary Information).
In addition, variants were analysed from formalin-fixed paraffin-embedded (FFPE) DNA from tumour and normal tissue from 31 Finnish breast or breast and ovarian cancer patients obtained from the Auria Biobank (Turku, Finland; more details in Supplementary Information). All of the exome-sequenced study subjects as well as their relatives have been informed of the analyses and have provided their written consent to use their DNA samples in the study. The Ethical Committees of Tampere and Turku University Hospitals, the National Authority for Medicolegal Affairs, and the Auria Biobank have provided their permission for the research project.

Sample preparation and whole-exome sequencing

Genomic DNA was extracted from peripheral leukocytes by using a Wizard Genomic DNA Purification Kit (Promega Corporation, Madison, WI, USA) according to the manufacturer’s instructions. For one patient (id 240010), only FFPE tumour tissue was available. After a pathologist had confirmed the presence of tumour cells, DNA was extracted by using the Arrow DNA kit and NorDiag Arrow automated magnetic bead-based nucleic acid extraction system (DiaSorin, Saluggia (Vercelli), Italy) according to the manufacturer’s instructions at Fimlab Laboratories (Tampere, Finland). Exome capture and sequencing were carried out by BGI Tech Solutions Co. Ltd. (Hong Kong, China). Exome capture was performed on 3–5 μg of genomic DNA per sample using the SureSelect Human All Exon 51M kit (Agilent Technologies, Inc., Santa Clara, CA, USA), followed by 100-bp paired-end sequencing with 50X genome coverage depth per sample on the HiSeq 2000 instrument (Illumina, Inc., San Diego, CA, USA) according to the protocols by Agilent, Illumina, and BGI.

Data analysis

The sequencing data have been deposited at the European Genome-phenome Archive (EGA; www.ega-archive.org) under accession number EGAS00001001835. Exome sequencing detected a large number of variants, and several filtering steps were used to reduce the number of candidate variants for downstream analyses (described in detail in Supplementary Information). The analysis workflow is illustrated in Figure 1. The distribution of variants predicted to affect function in the DDR pathway across 24 affected breast or breast and ovarian cancer patients with phenotypic information is presented in Figure 2.
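The family-based idea behind the primary aim (variants shared between affected relatives) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual filtering steps are described in Supplementary Information and are not reproduced here, and the sample labels, variant keys, and the function name `shared_by_affected` are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical variant calls: sample label -> set of variant keys.
# These labels and variants are placeholders, not study data.
calls: Dict[str, Set[str]] = {
    "affected_1": {"GENE_A:c.100A>G", "GENE_B:c.200C>T", "GENE_C:c.300G>A"},
    "affected_2": {"GENE_A:c.100A>G", "GENE_B:c.200C>T"},
    "healthy_1": {"GENE_B:c.200C>T"},
}


def shared_by_affected(
    calls: Dict[str, Set[str]],
    affected: Iterable[str],
    healthy: Iterable[str] = (),
) -> Set[str]:
    """Return variants carried by every affected sample.

    If healthy samples are given, variants they carry are removed.
    That exclusion is an assumption of this sketch: with incomplete
    penetrance, a healthy carrier does not rule a variant out.
    """
    affected = list(affected)
    if not affected:
        raise ValueError("at least one affected sample is required")
    shared = set.intersection(*(calls[s] for s in affected))
    for sample in healthy:
        shared -= calls[sample]
    return shared


print(sorted(shared_by_affected(calls, ["affected_1", "affected_2"])))
print(sorted(shared_by_affected(calls, ["affected_1", "affected_2"], ["healthy_1"])))
```

Because healthy relatives in these families can still carry risk alleles, the exclusion step is left optional rather than applied by default.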

Figure 1 Data analysis workflow.

Figure 2 DDR pathway gene variants in 24 breast or breast and ovarian cancer patients.

Variant validation

A total of 18 variants were genotyped by Sanger sequencing or TaqMan SNP genotyping assays from germline DNA from 129 female HBOC patients and up to 989 healthy female controls as well as from DNA from FFPE breast tumour tissue samples from 31 Finnish breast or breast and ovarian cancer patients. The reference sequences of the DDR genes are shown in Figure 2. Two of the variants, RAD50 c.280A>C and ATM c.4424A>G, which were identified in one male BC patient, were further analysed from the germline DNA of an additional 49 male BC patients and 909 healthy male controls. Either Sanger sequencing or TaqMan assays were used to study segregation of the variants in families in which they were observed by utilising germline DNA samples of index patients’ relatives (Supplementary Information).

Statistical and bioinformatic analyses

Variants were tested for Hardy–Weinberg equilibrium in controls. The allele frequencies between patients and controls were compared by Fisher’s exact test using PLINK v1.07.12 P-values were two-sided. P<0.05 was considered statistically significant. ESEfinder was used to predict the effects of two ATM variants on exonic splicing enhancer (ESE) elements.13
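The two tests named above can be illustrated with a standard-library Python sketch. It is not the PLINK implementation: the Hardy–Weinberg check here is a simple chi-square goodness-of-fit test (PLINK's own routine may differ, for example by using an exact test), and the allele counts in the usage lines are hypothetical placeholders rather than study data.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be patients/controls and columns variant/reference alleles.
    The p-value sums the probabilities of all tables (with the same margins)
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x variant alleles in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))  # tolerance for float ties
    return min(1.0, total)


def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int):
    """Chi-square (1 df) test of Hardy-Weinberg proportions.

    Returns (chi2, p_value). A rough approximation for rare variants,
    where expected homozygote counts are very small.
    """
    n = n_ref_hom + n_het + n_alt_hom
    q = (n_het + 2 * n_alt_hom) / (2 * n)   # variant allele frequency
    p = 1.0 - q
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, erfc(sqrt(chi2 / 2))       # survival function of chi2 with 1 df


# Hypothetical allele counts: 129 patients (258 alleles), 989 controls (1978 alleles).
p_value = fisher_exact_two_sided(2, 256, 3, 1975)
print(f"Fisher two-sided P = {p_value:.4f}; significant at 0.05: {p_value < 0.05}")
print(hwe_chi_square(980, 9, 0))  # hypothetical control genotype counts
```

Allele counts are used in the 2x2 table because the text compares allele frequencies; a genotype-based table would use carriers and non-carriers instead.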

Variant prioritisation in exome-sequenced early-onset patients

Filtering steps based on the variant function, allele frequency, and predicted effect on function were performed as for all exome-sequenced individuals (described above). Instead of focusing on a certain pathway, gene variants in all pathways were considered. If no pathway information was available, the gene function was obtained from the GeneCards database (http://www.genecards.org/). A search of the Catalogue of Somatic Mutations in Cancer (COSMIC) database was performed for the variants.14 Four variants were further studied by segregation analysis using Sanger sequencing.
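The prioritisation logic described here (filter by variant function, allele frequency, and predicted effect, then keep variants seen only in early-onset patients) might look roughly like the following Python sketch. The field names, the `MAX_AF` cut-off, and the example records are all assumptions made for illustration; the study's actual thresholds and tools are given in Supplementary Information.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical settings; the study's real cut-offs are not stated in the main text.
MAX_AF = 0.01
FUNCTIONAL_EFFECTS = {"nonsynonymous", "stopgain", "frameshift"}


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str
    effect: str                 # e.g. "nonsynonymous", "stopgain", "frameshift"
    population_af: float        # allele frequency in a reference population
    predicted_damaging: bool    # summary of in-silico predictions
    carriers: FrozenSet[str]    # exome-sequenced samples carrying the variant


def early_onset_candidates(
    variants: Iterable[Variant],
    early_onset: FrozenSet[str],
    later_onset: FrozenSet[str],
) -> List[Variant]:
    """Keep rare, likely functional variants found only in early-onset patients."""
    kept = []
    for v in variants:
        if v.effect not in FUNCTIONAL_EFFECTS:
            continue            # variant function filter
        if v.population_af > MAX_AF:
            continue            # allele frequency filter
        if not v.predicted_damaging:
            continue            # predicted effect on function
        if not (v.carriers & early_onset):
            continue            # must be seen in an early-onset patient
        if v.carriers & later_onset:
            continue            # must be absent from later-onset patients
        kept.append(v)
    return kept


# Placeholder records, not study data.
early = frozenset({"E1", "E2"})
later = frozenset({"L1"})
example = [
    Variant("GENE_A", "c.10dupA", "frameshift", 0.0005, True, frozenset({"E1"})),
    Variant("GENE_B", "c.20A>G", "nonsynonymous", 0.0005, True, frozenset({"E1", "L1"})),
    Variant("GENE_C", "c.30C>T", "synonymous", 0.0001, False, frozenset({"E2"})),
    Variant("GENE_D", "c.40G>A", "nonsynonymous", 0.20, True, frozenset({"E2"})),
]
print([v.gene for v in early_onset_candidates(example, early, later)])
```

Each filter is a separate early exit so that individual steps can be relaxed or reordered without touching the others.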

Results

After extensive filtering, a total of 18 variants in DDR pathway genes were selected for further validation and the variants were screened in cohorts of female HBOC patients, male BC patients, and healthy population controls. Observed variant frequencies and their association with breast and/or ovarian cancer are presented in Table 1. All variants were in Hardy–Weinberg equilibrium in the controls. None of the validated variants showed a significant association with breast/ovarian cancer. Five variants, ATM c.5558A>T, MYC c.77A>G, PLAU c.43G>T, RAD1 c.341G>A, and RRM2B c.211dupC, were more commonly detected in female HBOC patients than in female controls (odds ratio (OR) 1.16–2.16), suggesting that these variants may be low-to-moderate risk alleles, but further studies are needed to confirm the findings. The rest of the variants were either absent in both patients and controls or more commonly detected in controls.
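For readers less familiar with the odds ratios quoted above, the following Python sketch shows one common way to compute an allele-based OR with an approximate 95% confidence interval (Woolf's log method). It is a generic illustration, not the study's calculation: the counts are hypothetical, and the 0.5 continuity correction for zero cells is a choice made for this sketch.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allele-based odds ratio with an approximate confidence interval.

    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction),
    so that the estimate and its standard error stay finite.
    Returns (odds_ratio, lower, upper).
    """
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (variant, reference) in patients and controls.
or_, lower, upper = odds_ratio_with_ci(10, 248, 40, 1938)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that spans 1 is consistent with the absence of a significant association, which is why an OR above 1 alone does not establish a risk allele.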

Table 1 Variant frequencies

Clinical features of the female HBOC patients carrying the validated variants are presented in Table 2, with the exception of the RRM2B variant, which was observed with high frequency in HBOC patients. Eighteen variants were also screened from tumour samples of 31 breast or breast and ovarian cancer patients. Observed genotypes are presented in Table 3. No wild-type allele loss was observed for any of these variants.

Table 2 Clinical features of the patients with the validated variants
Table 3 Observed genotypes in breast tumour samples

Based on available DNA samples, the variants were further studied in families in which they were observed. The PLAU c.43G>T variant was detected in one bilateral BC patient (onset at 59 and 61 years of age) by exome sequencing (Figure 2; patient 906001). The patient had three BC-affected sisters (diagnosed at 49, 50, and 56 years of age), but no DNA samples from these relatives were available. However, two of the patient’s three healthy daughters (current ages 41 and 34 years) were confirmed to carry the variant. In addition, the PLAU variant was detected in a BC-affected female relative of the patient with the mucinous type of BC observed in the validation experiments (Table 2). The female relative had received a diagnosis of grade 1 ductal carcinoma in situ at 51 years of age.

The MYC c.77A>G variant was detected by exome sequencing in one family in two BC-affected relatives who received BC diagnoses at 28 and 45 years, respectively (Figure 2; patients 903001 and 904001). In genotyping, the variant was observed in homozygous form in one (0.8%, 1/129) BC patient but in none of the 987 controls. The patient with the homozygous MYC variant had high-grade triple-negative ductal carcinoma diagnosed at 61 years of age and had a family history of three other BC cases (mother, sister, sister’s daughter; Table 2). Moreover, the BC-affected sister had received a colon cancer diagnosis at 66 years of age. In addition, throat and stomach cancers were reported in the index patient’s grandparents, but the cancers were not confirmed. Further analyses identified the MYC variant in its homozygous form in the index patient’s healthy sister (current age 81 years) and in its heterozygous form in the index patient’s healthy daughter as well as in three other healthy relatives (Supplementary Information). Unfortunately, no DNA samples from BC-affected relatives were available.

The BRCA1 c.3904A>C variant was identified in two BC-affected females, a mother and daughter, in a single family by exome sequencing (Figure 2; patients 907001 and 909001). In addition, the BRCA1 variant was detected in the affected daughter’s healthy daughter, who is currently 22 years of age. Moreover, three other BC cases have been reported on the maternal side of the index patient’s family, but no DNA samples were available from these individuals.

The ATM c.5558A>T variant was detected in three BC-affected females from three different families by exome sequencing (Figure 2; patients 271010, 906001, and 918001) and in 1 out of 129 (0.8%) female HBOC patients by genotyping (Table 1). Interestingly, a known variant that affects correct splicing, c.5557G>A, was unexpectedly detected in the adjacent nucleotide position in the validation experiments. Preliminary predictions using ESEfinder indicated that the c.5558A>T variant introduces a new exonic splicing enhancer site in both the presence and the absence of the c.5557G>A variant. Two of the exome-sequenced patients, 271010 and 906001, had both of the variants and received a diagnosis of lobular BC at the age of 66 years and bilateral ductal BC at the ages of 59 and 61 years, respectively. A total of 0.2% (2/989) of the healthy females were found to have both of the variants. Of the patients carrying only the c.5558A>T variant, one had ductal high-grade triple-negative BC diagnosed at 49 years of age (Figure 2; patient 918001) and the other had an early-onset BC (Table 2).

The RAD50 c.280A>C and ATM c.4424A>G variants were detected in one male BC patient by exome sequencing (Figure 2; patient 910001). The male BC patient had BC diagnosed at the age of 72 and a family history of one female BC case (daughter, diagnosed at the age of 28). Unfortunately, the daughter’s DNA sample was not available for further analyses. The RAD50 variant was also observed in 1 out of the 909 (0.1%) male controls but was absent in the cohorts of female HBOC patients and male BC patients as well as in female controls, whereas the ATM variant was absent in a cohort of female HBOC patients, male BC patients and in 909 male controls but was observed in 0.4% (1/278) of the female controls (Table 1).

The sequencing data for the six early-onset patients were re-analysed, and all rare variants predicted to affect function that were present only in early-onset patients (compared with exome-sequenced patients diagnosed at >40 years) in any possible pathway were considered good candidates for disease susceptibility. A list of 56 candidate variants (50 nonsynonymous SNVs, 3 stopgains, and 3 frameshift indels) in genes that have a role in important cellular functions such as the cell cycle, proliferation, apoptosis, adhesion, different signalling pathways, and the DDR is presented in Table 4. Based on available DNA samples, four variants were further studied in families in which they were observed. Segregation of three frameshift variants, BNIPL c.33dupA, MAGEF1 c.52dupG, and EDN3 c.560dupA, and one nonsynonymous variant, APEX1 c.190A>G, is presented in Figure 3.

Table 4 Candidate variants in early-onset breast cancer patients
Figure 3 (a) Segregation of the BNIPL c.33dupA variant in pedigree 236. (b) Segregation of the EDN3 c.560dupA, MAGEF1 c.52dupG, and APEX1 c.190A>G variants in pedigree 110. Two CHEK2 variants, c.470T>C and c.1100delC (presented in parentheses), have been described previously in family 110.10 Exome-sequenced samples are marked with an asterisk. Plus indicates heterozygous variant and minus indicates no variant. Females are marked with circles, and males are marked with squares. The index patient is marked with an arrow. Breast cancer is marked with a black circle and with the age at diagnosis. Other cancers are indicated with grey squares. The current ages of healthy females are presented. Deceased individuals are indicated with a slash. Generations are presented in Roman numerals.

Discussion

In the current study, the aim was to identify rare variants that would contribute to HBOC susceptibility in Finnish BRCA1/2-negative families. An exome sequencing approach was used to analyse a total of 13 high-risk HBOC families, 6 of which were involved in our previous analyses. After filtering, a total of 18 DDR pathway gene variants were considered the most promising candidates for validation analyses. However, none of the validated variants showed a significant association with breast/ovarian cancer upon screening the variants in cohorts of female HBOC patients, male BC patients, and healthy controls; this finding might be explained by the rarity of the variants and the limited number of analysed patients. Variants in ATM, MYC, PLAU, RAD1, and RRM2B were enriched in female HBOC patients compared with controls, suggesting that these variants may be low-to-moderate risk alleles, but additional analyses of a larger sample set are needed to further validate the findings.

ATM is a protein kinase that has a central role as an activator of the DDR cascade after DNA double-strand breaks. Heterozygous ATM variants have been shown to increase the BC risk by approximately twofold.15 Here, an interesting observation was that several ATM variants were enriched, particularly in one family (Figure 2; patients 271009, 271010, and 271310). A polygenic risk model for BC has been suggested; according to this model, several low-to-moderate risk alleles can act multiplicatively on the cancer risk.16 Moreover, an additional ATM variant, c.5557G>A, was identified in the validation experiments. The c.5557G>A is a common variant that has been associated with bilateral BC in the Finnish population.17 In line with this, one of the patients with the c.5557G>A had bilateral BC. Moreover, the c.5557G>A variant has been reported to have an effect on splicing.18 A similar finding was observed here for the c.5558A>T variant, which was predicted to create a new exonic splicing enhancer site. Splicing variants have been reported to be common in ATM;18 therefore, additional analyses are warranted.
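The multiplicative assumption of the polygenic model can be written out as a short worked equation. This is a generic illustration of the model's arithmetic, not an estimate from this study; apart from the approximately twofold risk cited for heterozygous ATM variants, the relative risks are placeholders.

```latex
% Multiplicative polygenic model: a carrier of n independent risk alleles,
% each with relative risk RR_i, has a combined relative risk
\[
  \mathrm{RR}_{\text{combined}} = \prod_{i=1}^{n} \mathrm{RR}_i ,
  \qquad
  \log \mathrm{RR}_{\text{combined}} = \sum_{i=1}^{n} \log \mathrm{RR}_i .
\]
% Illustrative example: one allele with RR = 2 (as cited for heterozygous ATM
% variants) combined with a hypothetical second allele with RR = 1.5 gives
\[
  \mathrm{RR}_{\text{combined}} = 2 \times 1.5 = 3 .
\]
```

Under this model, effects add on the logarithmic scale, so several individually modest alleles in one family could in principle combine into a substantially elevated risk.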

MYC is a well-known oncogene that has a central role in growth control, differentiation, and apoptosis, and its abnormal expression is associated with many human cancers including tumours of epithelial origin.19 An interesting finding here was that a rare homozygous form of the MYC variant was observed in one BC patient who was diagnosed with an aggressive BC at the age of 61 years and who had familial aggregation of breast and other cancers. However, further analysis identified the homozygous variant in the index patient’s healthy sister (current age 81 years), indicating a likely neutral role of the homozygous variant.

The PLAU gene encodes the urokinase-type plasminogen activator protease (uPA), which has a crucial role in the process of cancer metastasis.20 Here, the PLAU c.43G>T variant was detected in three BC patients who were diagnosed at an older age. However, an interesting observation was that one of the BC patients had been diagnosed with the mucinous type of the disease. Mucinous breast carcinoma is a rare histological subtype that accounts for ~2% of all BCs and is related to favourable outcomes.21 Here, the mucinous subtype was very rare, as it was observed in only one HBOC patient (1/129, 0.8%). Clearly, further analyses in a larger sample set are warranted to investigate whether the PLAU variant relates to the mucinous subtype; such a finding might hold clinical value. Moreover, further evidence of the enrichment of the PLAU variant in BC patients was obtained by also identifying the germline variant in its heterozygous form in 9.7% (3/31) of breast tumour samples.

RAD1 encodes a component of the heterotrimeric cell cycle checkpoint complex that participates in cell cycle checkpoint activation and DNA repair, and RRM2B encodes ribonucleotide reductase, which is directly involved in the p53 checkpoint for the repair of damaged DNA.22, 23 Both RAD1 and RRM2B variants were frequent in genotyped female HBOC patients as well as in female controls, indicating that they are either neutral or low-risk common variants. Because both of the variants are frequent, larger sample sets would be needed to show their possible contribution to breast/ovarian cancer risk.

Interestingly, one BRCA1 variant, c.3904A>T, which was detected in a single exome-sequenced family, was absent in both genotyped female HBOC patients and healthy female controls, indicating that this family-specific variant is extremely rare and likely deleterious; this variant warrants further investigation because it might have clinical relevance in genetic counselling of high-risk HBOC families. Moreover, the RAD50 c.280A>C variant, which was observed in a male BC patient by exome sequencing, was absent in both female HBOC patients and healthy female controls and was extremely rare (0.1%, 1/909) in male controls, indicating that it might contribute to BC susceptibility, particularly in males. Another variant detected in a male BC patient by exome sequencing, ATM c.4424A>G, was absent in both male BC patients and male controls and present in one female control (0.4%, 1/278); these findings suggest that it is a very rare variant that is not specific for male BC.

In addition, rare DDR pathway gene variants were observed in single individuals in the studied families (Figure 2); these variants may modify the cancer risk in the family and explain the phenotypic heterogeneity. In particular, variants in AKT1, ATRIP, BRCA1, CASP3, CASP8, ERBB2, FANCD2, MAP3K1, MAP3K4, PCK2, TGFB1, TP53, and WNT4 are of great interest. Many of these genes have been highlighted as significantly mutated genes in BC by integrated molecular analyses.24 Supported by previous findings, the variants in these genes can be considered good candidates for BC susceptibility and warrant further investigation.

Notably, although exome sequencing is a useful tool for new variant identification, the causal variants may remain undetected owing to technical challenges. Here, exome sequencing failed to detect the CHEK2 c.1100delC variant in patient 110010 (Figure 2), who is known to carry the two CHEK2 variants c.470T>C and c.1100delC based on previous analyses.10 The c.1100delC variant is located in a gene region that shows homology with other chromosomal regions; this might be the reason why it was not identified by exome sequencing.

Deeper analyses of the six early-onset patients revealed defects in interesting candidate genes in pathways related to cell cycle, proliferation, apoptosis, adhesion, and DDR, among other signalling pathways, thus providing a fruitful starting point for later studies. For instance, frameshift variants were detected in the BNIPL, EDN3, and MAGEF1 genes. BNIPL is an apoptosis-associated protein that interacts with BCL2 and promotes the invasion and metastasis of human hepatocellular carcinoma cells.25 Segregation analysis showed that the BNIPL frameshift variant (c.33dupA) was not present in the index patient’s healthy daughters, sister, and brother’s daughter (Figure 3a), which supports the view that the variant is likely deleterious. However, the variant was observed in the index patient’s brother and in the sister’s healthy daughter, who is currently 30 years of age. Because the index patient’s healthy sister did not carry the variant, her daughter’s variant must be of paternal origin.

EDN3 belongs to a family of endothelins, which are vasoactive peptides and important signalling molecules that function in fundamental cellular processes such as proliferation, migration, and differentiation. Altered endothelin signalling has a role in carcinogenesis, and the loss of EDN3 expression has been reported in human BC.26 MAGEF1 belongs to a superfamily of MAGE proteins, called cancer–testis antigens, that mediate normal cellular functions and are ubiquitously expressed in both normal and tumour tissue.27 Altered expression of MAGE proteins has been implicated in BC.28 Segregation analysis of the EDN3 (c.560dupA), the MAGEF1 (c.52dupG), and the APEX1 (c.190A>G) variants in the same family showed incomplete segregation (Figure 3b). Because the index patient’s mother is healthy, it is likely that a combination of these and other variants explains the index patient’s early onset. Interestingly, in this family, two deleterious CHEK2 variants, c.470T>C and c.1100delC, have been reported previously.10 It can be seen in Figure 3b that the truncating CHEK2 c.1100delC variant does not segregate with the disease in the family, which is consistent with its moderate-risk character. It would be interesting to study further whether the EDN3, MAGEF1, and APEX1 variants could be modifiers of the CHEK2 variants. Intriguingly, germline defects in APEX1 have been reported to be associated with BC susceptibility in another population.29 Moreover, variants that induce premature stop codons were identified in the DENND2D, EFCAB13, and TICRR genes. Of these genes, EFCAB13, for instance, is a very interesting candidate gene because it functions in calcium ion binding and has been associated with familial prostate cancer in Finland by a recent study.30

In conclusion, family-specific enrichment of multiple DDR pathway gene defects likely explains a proportion of BC predisposition in high-risk HBOC families. Moreover, interesting candidate genes targeting pathways involved in DNA replication, apoptosis, and the cell cycle, among other signalling pathways, were identified in early-onset BC patients, thus providing novel information on potential BC-related pathways and an excellent premise for future studies.