Background
Multigene panel testing is increasingly adopted for managing breast cancer susceptibility in high-risk individuals suspected of having hereditary breast cancer, but evidence-based practice guidelines remain far from comprehensive. The advent of next generation sequencing (NGS) technologies is making multigene panel testing easier and more affordable [1‐4]. In addition, multigene panel testing could identify up to 50% more individuals with cancer susceptibility gene mutations in comparison with testing only for BRCA1 and BRCA2 (BRCA) [5]. Most of these additional mutations are in moderate-risk genes, many of which could alter cancer risk estimation and clinical action [5, 6]. However, arriving at consistent and optimal clinical recommendations on the basis of multigene panel results and the associated variants of uncertain significance (VUS) can be challenging because the consequences of these genetic alterations are incompletely understood [7]. As multigene panel testing becomes widely adopted, studies are needed to develop evidence-based practice guidelines.
In addition to risk assessment of breast cancer susceptibility in germline mutation carriers, understanding prognosis after breast cancer diagnosis will also impact practice guidelines for breast cancer. With the efficacy of poly (ADP-ribose) polymerase (PARP) inhibitors in controlling
BRCA mutation positive tumors, many clinical trials are now underway evaluating their use in breast cancer [
8]; their incorporation into systemic therapy in clinical practice is highly anticipated. It has not been established whether
BRCA or any cancer susceptibility gene mutation is an independent prognostic factor after breast cancer diagnosis. Despite the suspicion for a more aggressive tumor phenotype, most studies have fallen short of showing differences in clinical outcomes in
BRCA mutation carriers [
9‐
12]. Systematic reviews with larger pooled sample size have yielded conflicting conclusions, possibly due to variability of included studies [
13‐
15]. Consequently, conventional decisions regarding systemic therapy for
BRCA mutation-associated breast cancer have been based on disease characteristics rather than
BRCA mutation status. It would therefore be informative to determine whether germline breast cancer susceptibility gene mutations are causally or statistically associated with breast cancer prognosis.
A panel of 20 known and candidate breast cancer susceptibility genes was selected for this multigene panel testing study. Among the 20 genes,
BRCA1, BRCA2, PALB2, TP53, CDH1, PTEN, ATM, CHEK2, BARD1, STK11, and NBN have been well established as breast cancer susceptibility genes [
7,
16,
17]. Some are part of rare high-penetrance cancer predisposing syndromes (e.g.
BRCA1, BRCA2, TP53, CDH1, PTEN, STK11, PALB2) while others are moderate-penetrance genes (e.g.
ATM, NBN, CHEK2, BARD1). The impacts of the mutations in
RAD50, RAD51C, and RAD51D on breast cancer susceptibility and survival are controversial. Mutations in RAD50 have been reported not to be associated with breast cancer risk [17]. Likewise, mutations in RAD51C have not been found to increase the risk of breast cancer [18‐20], and mutations in RAD51D have been associated with a high risk of ovarian cancer but not of breast cancer [21]. Nevertheless, other studies have indicated that mutations in RAD51C [22] and RAD51D [17, 23, 24] contribute to the risk of both breast and ovarian cancer, and that RAD50 is an intermediate-risk breast cancer susceptibility gene [25]. Although germline mutations in the DNA mismatch repair genes (
25]. Although germline mutations in the DNA mismatch repair genes (
MLH1, MSH2, MSH6, PMS2) have mostly been associated with Lynch syndrome, some evidence links mutations in these genes to the risk of, or survival from, breast cancer [
26‐
29]. Similarly, whether
BRIP1 is a breast cancer susceptibility gene remains controversial [
30], and perhaps is dependent on the ethnicity of the cohort studied [
22].
NF1 mutations are known to be associated with an increased risk of breast cancer in younger populations [31] and with poor breast cancer survival [32]. To help clarify these controversies, we included the potentially relevant genes above in the panel to explore how breast cancer predisposition and outcomes depend on germline mutations in our local high-risk population.
Cancer risks associated with germline mutations need to be studied separately in different ethnic populations. In Western populations, about 5% of breast cancer patients may carry heritable cancer susceptibility gene mutations [
33,
34].
BRCA1 and
BRCA2 account for the majority of these gene mutations, with
BRCA1 being the most common [
34,
35]. However, studies in Asian populations have indicated somewhat different conclusions: available results show that
BRCA mutation rates in Asians are lower than those in Whites, and that the distributions of the gene mutations are also different [
36‐
43]. It is imperative to enrich mutation databases on different ethnic populations, so as to better interpret ethnically specific germline mutations and better manage cancer risks among corresponding ethnic populations.
In this study, we analyzed germline mutations in the 20 breast cancer susceptibility genes using an NGS-based technique in a cohort of high-risk ethnic Chinese individuals. We evaluated the correlation of mutations with clinical characteristics and cancer outcomes. We aimed to clarify the prognostic value of BRCA and other breast cancer susceptibility gene mutations for breast cancer-specific outcomes after conventional cancer treatment.
Methods
Study participants and data collection
Koo Foundation Sun Yat-Sen Cancer Center (KF-SYSCC) treats over 1000 newly diagnosed breast cancer patients annually. Between July 30, 2015 and March 31, 2016, we enrolled 480 individuals who fulfilled at least one of six eligibility criteria: family history of breast or ovarian cancer at any age (2 or more individuals in the same lineage of the family), personal history of breast cancer diagnosed at age 40 or younger, bilateral breast cancer diagnosed synchronously or sequentially, triple-negative (ER/PR/HER2 negative) breast cancer, breast and ovarian cancer in the same individual, and male breast cancer. None of the participants had a known mutation status for any cancer susceptibility gene prior to enrollment. Clinical information was collected through participant surveys, electronic medical records, and the institutional breast cancer database.
Sequencing and variant analyses of cancer susceptibility genes in genomic DNA
Germline DNA sequencing of all exonic regions was performed for 20 breast cancer susceptibility genes: BRCA1, BRCA2, PTEN, TP53, CDH1, STK11, NF1, NBN, MLH1, MSH2, MSH6, PMS2, ATM, BRIP1, CHEK2, PALB2, RAD50, BARD1, RAD51C, and RAD51D. Polymerase chain reaction (PCR)-enriched amplicon sequencing on an NGS platform was used to sequence genomic DNA extracted from whole blood or frozen buffy coat samples using the Gentra Puregene Blood kit (Qiagen, Minneapolis, MN, USA). The DNA samples were first PCR amplified using the Qiagen GeneRead DNAseq custom panel primer sets for the 20 genes, covering all exons as well as at least 10 bases of exon-flanking sequence. The Qiagen primer set included 1184 amplicons and provided at least 90% coverage for all genes except STK11 (59%), PMS2 (74%), and MSH2 (89%). PCR-enriched amplicons were end-repaired, adenylated, and ligated to NEXTflex-96 DNA barcodes (Bioo Scientific, Austin, Texas, USA) using the Qiagen GeneRead DNA Library I Core Kit. Barcoded libraries were amplified using the Qiagen GeneRead DNA I Amp Kit and NEXTflex primers (Bioo Scientific). Quality control and quantification of libraries were performed using the Qubit dsDNA HS Assay kit and the Agilent DNA 1000 kit. The barcoded DNA libraries were pooled in equal amounts and underwent 2 × 150 bp paired-end sequencing on an Illumina MiSeq platform. The average base call error rates were less than 1.0%.
We constructed a pipeline based on public domain software and databases for alignment, variant calling, and annotation, using GRCh37 as the reference genome. BWA (
http://bio-bwa.sourceforge.net/) was used to map reads to the reference genome. Bam-readcount (
https://github.com/genome/bam-readcount) was used to count variants for each aligned position. Variant calling protocols were carried out either based on GATK Best Practices (
https://software.broadinstitute.org/gatk/best-practices/) or using a non-GATK based algorithm, where lower limits of 50 for read depth and 10% for proportion of raw reads with a variant were used for variant calling. Variants that were intergenic, intronic (except for the 10 bp exon-flanking regions), or synonymous (sense) were excluded. All other variants identified with the two algorithms were compared, and discrepant variants were manually inspected by viewing the BAM reads using the Integrative Genomics Viewer (IGV, Broad Institute, Inc.) to decide on the validity of the variant. The variants were searched in the dbSNP database (
http://www.ncbi.nlm.nih.gov/SNP/) and the ClinVar database (
http://www.ncbi.nlm.nih.gov/clinvar/). Variants were annotated as pathogenic, uncertain significance, or benign, using variant-dependent methods and disease-dependent methods. Nonsense, frameshift, and splice-site mutations that result in a truncated protein product were classified as pathogenic. The clinical significance interpretation on ClinVar, if available, was referenced for categorization. Novel missense mutations not found in the public databases were classified as variants of uncertain significance. We used various
in silico models (Align-GVGD [44], PolyPhen-2 [45], SIFT [46], PROVEAN [47], CADD [48]) to evaluate the deleteriousness of the variants, especially missense variants. However, we did not change classification based on the in silico models. As the interpretation of missense variants is often controversial [7], we took a more conservative approach of only including the protein-truncating variants in the clinical correlation of this study. All variants classified as pathogenic were further verified using the Sanger sequencing method, confirming they were germline mutations.
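The filtering and classification rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, the consequence labels, and the order in which the classification rules are applied are all hypothetical. Only the thresholds (read depth of at least 50, variant read fraction of at least 10%) and the excluded variant types come from the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DEPTH = 50               # lower limit for read depth (from the text)
MIN_VARIANT_FRACTION = 0.10  # lower limit for proportion of reads with the variant

# Consequence labels are hypothetical names chosen for this sketch.
EXCLUDED = {"intergenic", "intronic", "synonymous"}
TRUNCATING = {"nonsense", "frameshift", "splice_site"}


@dataclass
class Variant:
    gene: str
    consequence: str
    depth: int                     # total reads at the position
    alt_reads: int                 # reads supporting the variant
    in_flanking_10bp: bool = False # within the 10 bp exon-flanking region
    clinvar: Optional[str] = None  # ClinVar interpretation, if any


def passes_filters(v: Variant) -> bool:
    """Apply the depth, variant-fraction, and consequence filters."""
    if v.depth < MIN_DEPTH:
        return False
    if v.alt_reads / v.depth < MIN_VARIANT_FRACTION:
        return False
    # Intronic variants are kept only inside the exon-flanking region.
    if v.consequence == "intronic" and v.in_flanking_10bp:
        return True
    return v.consequence not in EXCLUDED


def classify(v: Variant) -> str:
    """Assign a class; the precedence of the rules is an assumption."""
    if v.consequence in TRUNCATING:
        return "pathogenic"
    if v.clinvar is not None:
        return v.clinvar
    # Novel missense variants are VUS per the text; treating every other
    # unannotated variant the same way is an assumption of this sketch.
    return "uncertain significance"


if __name__ == "__main__":
    example = Variant("BRCA2", "frameshift", depth=120, alt_reads=55)
    print(passes_filters(example), classify(example))
```

Note that in silico scores are deliberately absent from `classify`, mirroring the statement that classifications were not changed on the basis of those models.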
Detection of large genomic rearrangement using copy number variation (CNV) analyses
Coverage or read depth has been used to detect CNVs in genome-scale (whole genome sequencing) datasets. Multiplex PCR-based enrichment focuses sequencing efforts on a very small fraction of the genome, and the observed read depth for each of the regions can differ due to varying number of PCR amplicons, sequence variation, or PCR enrichment efficiency. For CNV detection in our PCR-enriched amplicon sequencing data of the 20 genes, we used two algorithms, Quandico [49] and ONCOCNV [50], specifically developed for CNV analysis of amplicon sequencing data. The CNVs detected with these algorithms were then verified experimentally using the multiplex ligation-dependent probe amplification (MLPA) technique. This analysis resulted in the discovery of two carriers of a BRCA1 large genomic rearrangement (LGR) in the study cohort.
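The general idea behind read-depth CNV detection in amplicon data can be illustrated with a deliberately simplified sketch. It is not the Quandico or ONCOCNV algorithm, both of which use more elaborate normalization and statistical testing. The function names, the reference depths, and the gain and loss thresholds below are hypothetical, and the sketch assumes every amplicon has a nonzero depth in both sample and reference.

```python
import math
from statistics import median


def log2_ratios(sample_depths, reference_depths):
    """Per-amplicon log2 ratio of library-size-normalized read depths.

    Assumes both lists are aligned by amplicon and contain no zeros.
    """
    s_total = sum(sample_depths)
    r_total = sum(reference_depths)
    return [
        math.log2((s / s_total) / (r / r_total))
        for s, r in zip(sample_depths, reference_depths)
    ]


def call_region(ratios, loss=-0.6, gain=0.4):
    """Call a region from the median log2 ratio of its amplicons.

    The thresholds are illustrative; a heterozygous deletion is expected
    near log2(0.5) = -1 and a single-copy gain near log2(1.5) = 0.58.
    """
    m = median(ratios)
    if m <= loss:
        return "loss"
    if m >= gain:
        return "gain"
    return "neutral"


if __name__ == "__main__":
    # Made-up depths: amplicons 3 and 4 have half the expected coverage.
    sample = [100, 100, 50, 50, 100, 100]
    reference = [100, 100, 100, 100, 100, 100]
    ratios = log2_ratios(sample, reference)
    print(call_region(ratios[2:4]), call_region(ratios[0:2]))
```

Because amplicon efficiency varies widely, any call from a heuristic like this would still need experimental confirmation, as was done here with MLPA.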
Clinical correlation and statistical analyses
For study participants who had had breast cancer, tumor characteristics and clinical outcomes were extracted from the institutional breast cancer database and participant survey. Associations between clinical characteristics and BRCA mutation or non-BRCA mutation status were tested using the chi-square test or t test.
For the correlation analyses of clinical outcomes and germline mutations, we performed survival analyses using Kaplan-Meier curves and Cox proportional hazards regression analysis. The primary end point was disease-free survival, defined as the time from breast cancer surgery to the first appearance of one of the following: invasive recurrence of breast cancer (local, regional, or distant) or death without breast-cancer recurrence. Secondary end points included the following: time interval without breast cancer recurrence, defined as the time from breast cancer surgery to the recurrence of invasive breast cancer (local, regional, or distant); time interval before a recurrence of breast cancer at a distant site, defined as the time from breast cancer surgery to the recurrence of breast cancer at a distant site; and overall survival, defined as the time from breast cancer surgery to death from any cause. For patients who did not have an end-point event, the times were censored at the date of the last follow-up visit (or for the analysis of overall survival, the date at which the patient was last known to be alive).
The above primary and secondary end points of the groups with different mutation status were compared using Kaplan-Meier curves, and the statistical significance was evaluated using the log-rank test. Cox proportional hazards regression analysis was used to evaluate univariate and multivariate hazard ratios (HR) for BRCA germline mutation for the end points. The covariates for the multivariate analyses included: tumor size > 2 cm, lymph node positivity, triple negative tumor type, young age of onset (≤ 40), mastectomy (vs breast-conserving surgery), adjuvant chemotherapy, adjuvant radiotherapy, and hormonal therapy.
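To make the end-point definitions and the Kaplan-Meier estimate concrete, the following is a minimal pure-Python sketch. The study itself used SAS; the function names and the toy follow-up data here are hypothetical and serve only to show how a disease-free survival time with censoring is derived and how the product-limit estimate is computed from such times.

```python
from datetime import date


def disease_free_survival(surgery, last_follow_up, recurrence=None, death=None):
    """Return (days, event) for the disease-free survival end point.

    event is 1 if invasive recurrence or death occurred (whichever came
    first), otherwise 0 with time censored at the last follow-up visit.
    """
    event_dates = [d for d in (recurrence, death) if d is not None]
    if event_dates:
        return (min(event_dates) - surgery).days, 1
    return (last_follow_up - surgery).days, 0


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    print(disease_free_survival(date(2010, 1, 1), date(2015, 1, 1)))
    print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1]))
```

Computing one such curve per mutation-status group gives the curves that the log-rank test then compares.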
The six eligibility criteria for study enrollment were considered clinical risk factors for having a germline mutation in the cancer susceptibility genes sequenced. The odds ratios (OR) of having a pathogenic germline mutation were calculated using multivariable logistic regression with the six dichotomous risk factors as independent variables, and having a pathogenic germline mutation as the dependent variable. The odds ratios for the number of risk factors were obtained by logistic regression. Statistical significance was reported as 95% confidence intervals and P-values. A P-value below an alpha level of 0.05 was considered statistically significant. All analyses were performed using SAS 9.4 (SAS Institute, Cary, NC, USA).
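As a simplified illustration of what an odds ratio and its 95% confidence interval express, the sketch below computes the unadjusted odds ratio for a single dichotomous risk factor from a 2 × 2 table, using the standard Wald interval on the log scale. This is not the multivariable logistic regression used in the study, which adjusts each factor for the others. The function name and the counts in the example are made up.

```python
import math


def odds_ratio(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: risk factor present, mutation present
    b: risk factor present, mutation absent
    c: risk factor absent,  mutation present
    d: risk factor absent,  mutation absent
    z: normal quantile (default gives a 95% interval)
    All four counts are assumed to be nonzero.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    or_, lower, upper = odds_ratio(20, 10, 30, 60)
    print(f"OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 corresponds to statistical significance at the 0.05 level for that single factor.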
Discussion
We showed that BRCA germline mutation carriers in this large ethnic Chinese cohort were more likely to be diagnosed with breast cancer that had already spread to regional lymph nodes, and their breast cancer-related outcomes were significantly worse. The 5-year disease-free survival rate was only 73.3% for BRCA mutation carriers, in contrast to 91.1% for non-mutation carriers. BRCA mutation status was an independent prognostic factor with an adjusted hazard ratio of 3.04 (95% CI 1.40–6.58) for cancer recurrence or death. The poor clinical outcome in BRCA mutation carriers resulted mainly from recurrence as distant metastasis and was therefore not driven by new primary cancers in the ipsilateral or contralateral breast, the risk of which is known to be elevated in BRCA mutation carriers. Our results suggest a more aggressive nature of breast tumors in
BRCA germline mutation carriers. Most previous studies on clinical outcomes of
BRCA mutation carriers have failed to show a significant prognostic effect by
BRCA mutation [
9‐
11,
13,
14]. However, a recent systematic review showed that both
BRCA1 and
BRCA2 mutation carriers had significantly worse breast cancer specific survival [
15]. The discrepancy among these studies may in part result from small sample sizes, lack of adjustment for disease characteristics, and variations in mutation assay techniques, mutation types, cancer treatment modalities, or lengths of follow-up. Our study was conducted in an all-Chinese cohort in which all study participants underwent the same NGS-based complete sequencing of the coding regions of the BRCA genes, among other genes. The majority of the mutations were in the BRCA2 gene. Follow-up was extensive in both length, with a median duration of over 5 years, and completeness. The tumor characteristics and outcome data had been collected prospectively in a breast cancer registry as well as after the participants were enrolled. The majority of the breast cancer patients in the cohort (94%) had undergone treatment in a single cancer center where treatment guidelines based on cancer characteristics were consistently followed. This homogeneity of data collection may have strengthened the validity of the prognostic analysis.
Studies have shown that tumor cells with BRCA mutations may respond differently to different chemotherapy agents: they may have enhanced sensitivity to platinum agents while being more resistant to taxanes. However, clinical studies comparing chemotherapy regimens in BRCA mutation populations are limited [12]. For the breast cancer patients in our cohort, less than 10% received cisplatin in the neoadjuvant setting and none in the adjuvant setting, while about a third received a taxane (docetaxel) in the adjuvant setting. There were no significant differences in chemotherapy choice between the groups with and without
BRCA mutation. Despite a higher rate of
BRCA mutation carriers receiving chemotherapy, they had poorer cancer outcomes. Further prospective studies are needed to determine the optimal chemotherapy for
BRCA mutation carriers. With the anticipated incorporation of PARP inhibitors into the treatment of
BRCA mutation associated breast tumors, knowledge of
BRCA mutation status prior to initial cancer treatment becomes even more crucial.
In this Chinese cohort of high risk individuals for hereditary breast cancer, we found an overall prevalence of 13.5% for carriers of germline mutations in 11 of the 20 breast cancer susceptibility genes. In contrast to western populations,
BRCA2 mutations (52.3%) were much more common than
BRCA1 mutations (9.2%) in our cohort, similar to findings in other studies in the Asian population [
34]. Non-
BRCA genes contributed to 38.5% of the mutation carriers, with
PALB2 (13.8%),
RAD51D (9.2%), and
ATM (4.6%) being the majority.
PALB2 is particularly important since lifetime risk for breast cancer can reach 58% in those with family history [
51], and NCCN guideline recommends consideration of risk-reducing mastectomy [
52]. Among the 8 cases with
RAD51C and
RAD51D mutations, 6 (75%) were triple negative breast cancer, in agreement with the recent studies suggesting that mutations in these two genes may confer higher risks of triple-negative or basal subtypes of breast cancer [
23,
24]. We also found 2 individuals with protein-truncating mutations in
TP53 and PMS2 genes, which are high-penetrance cancer predisposing genes and confer a significantly elevated risk of other cancers. These results showed that testing more than BRCA1 and BRCA2 increased the detection rate of clinically actionable high- and moderate-risk gene mutations, and may therefore be an important strategy in the Chinese population. In a study by Thompson et al., a significant excess of mutations was observed only for PALB2 and TP53 in familial breast cancer cases compared to cancer-free controls [6]. We similarly found that only a small number of genes contributed to the majority of mutation carriers. To overcome the challenge of high rates of VUS and questionable clinical actionability, we recommend limiting clinical cancer susceptibility multigene panels to a handful of genes with high clinical impact.
Among the six risk factors for hereditary breast cancer in this cohort of high-risk individuals, only bilateral breast cancer showed a statistically significant odds ratio (3.27) for having a germline mutation in multivariate analysis. In addition, having more risk factors was associated with a higher detection rate of mutations (OR 2.07 for having more than one risk factor). These results suggest that the known risk factors were helpful in identifying individuals for genetic testing, and that particular attention may be warranted for those with bilateral breast cancer, even in the absence of family history or young age of onset. Larger cohorts are needed to clarify the significance of ovarian cancer and male breast cancer for breast cancer susceptibility gene mutations in the Asian population. Results from the correlation analysis restricted to the few high-penetrance genes were similar to those obtained with all the studied genes, suggesting that the correlations were driven by these high-penetrance genes, including BRCA1, BRCA2, PALB2, and TP53. This was expected, since they accounted for the majority of the pathogenic variants.
There were some limitations in our study. First, we did not conduct experiments to detect large genomic rearrangements (LGRs) in all study participants, but used bioinformatic tools to detect copy number variations in the NGS data. This could underestimate the prevalence of LGRs in this cohort. However, LGRs have not been shown to contribute significantly to germline mutations in
BRCA genes in East Asian populations [
53,
54]. Second, we were conservative in classifying variants as pathogenic and limited those to protein-truncating variants, which were without ambiguity in assignment of pathogenicity. There were two missense variants classified as likely pathogenic in the ClinVar database, and many missense variants deemed damaging by multiple
in silico models. However, we did not assign those as pathogenic mutations in this study. We could therefore have underestimated the prevalence of pathogenic mutations. Further studies are underway to evaluate variant segregation with cancer in families, and the accuracy of
in silico models.