Introduction
Hereditary forms of cancer have been described for decades. Evidence-based screening guidelines are now applied for suspected hereditary breast and ovarian cancer (HBOC) syndrome, Lynch syndrome, and other conditions [1, 2]. Screening multiple genes simultaneously by massively parallel sequencing is cost-effective and has replaced single-gene sequencing in hereditary breast cancer (HBC). It can reveal mutations in clinically validated genes in up to 5% of cases without BRCA1 or BRCA2 mutations [3]. Its use will probably expand, as recent publications question the validity of established screening criteria, given the high number of germline mutations identified in cancer types unrelated to the initial syndrome or in patients without a family history [4, 5]. However, multigene panel testing has a major drawback: the likelihood of identifying a variant of unknown significance (VUS) far exceeds that of discovering a pathogenic mutation, especially as the number of genes tested increases [6]. Indeed, several converging lines of evidence are required to establish the pathogenicity of a variant [7, 8].
Interpreting VUS is a daily clinical challenge. Their interpretation has a major impact on preventive screening and treatment strategies; therefore, misinterpretation of a VUS can cause physical or psychological harm [9]. Functional testing helps reclassify VUS and is increasingly used in translational studies [10], but evidence of its feasibility on a clinical scale is sparse, and it has not yet been implemented in practice [11]. Large international consortia such as ENIGMA aim to reclassify variants by gathering genotypic and phenotypic data from various sources, recognizing that the rarity of individual variants is the main obstacle [12].
Current variant classification guidelines do not include the analysis of matched tumor samples. Yet the two-hit theory of inherited cancer predisposition conferred by heterozygous germline mutations in tumor suppressor genes postulates that the normal allele is locally lost, or outcompeted by the mutant allele, owing to a second, somatic alteration in the same gene. Such alterations may be copy number alterations (CNAs), pathogenic point mutations, small insertions/deletions (INDELs), or epigenetic modifications that reduce the expression or function of the normal allele or increase that of the germline mutant [13]. We therefore hypothesized that matched tumor sequencing, by revealing somatic events in the same gene, could provide an argument for or against the involvement of germline variants in the development of cancer in HBC patients.
Methods
Patients and germline DNA samples
Patients were eligible for inclusion if they had a personal history of breast and/or ovarian cancer, met the criteria for clinical genetic counseling and testing based on the guidelines of the Belgian Society of Human Genetics, and were negative for BRCA1, BRCA2, TP53, and CHEK2 pathogenic mutations. Matched tumor material had to be available. All patients signed an informed consent form approved by the hospital Ethics Committee. Demographic, familial, and clinical data were recorded to calculate each patient’s breast cancer (BC) lifetime residual risk and BRCA mutation carrier pre-test probability using the BOADICEA algorithm [14]. Ten milliliters of blood was drawn from each patient for DNA extraction using the Wizard genomic DNA purification kit (Promega).
Germline whole-exome sequencing (WES)
Briefly, 1 μg of genomic DNA was processed. Genomic DNA was captured using the Agilent in-solution enrichment methodology with its biotinylated oligonucleotide probe library (SureSelect V6 Exome, Agilent Technologies), followed by paired-end 150-base massively parallel sequencing on an Illumina HiSeq 4000 to at least 60× average coverage. Sequence capture, enrichment, and elution were performed according to the manufacturer’s instructions. Image analysis and base calling were performed using Illumina Real-Time Analysis (2.7.7) with default parameters.
Matched tumor WES
Five consecutive 10-μm sections were obtained from the most tumor-representative formalin-fixed, paraffin-embedded (FFPE) sample. A matched hematoxylin and eosin-stained section was used to guide macrodissection of the tumor area. DNA was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen) and quantified using the Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific). Tumor WES was performed by Integragen (Ivry, France) using a capture kit and sequencer similar to those used for germline WES. A minimum of 50 ng of DNA was required to prepare the libraries, which were then sequenced in paired-end 75-base mode to at least 120× average coverage.
Both germline and tumor reads were aligned to the GRCh37 human reference genome using Burrows-Wheeler Aligner 0.7.15 (Wellcome Trust Sanger Institute). Duplicate reads were marked and removed using Picard 1.107 (Broad Institute). Local realignment around indels and base quality score recalibration were performed using the Genome Analysis Toolkit (GATK) 3.3 (Broad Institute). Germline single-nucleotide variants (SNVs) and small indels were identified using GATK HaplotypeCaller 3.3, whereas somatic SNVs and small indels were identified using the Mutect2 algorithm, based on GATK HaplotypeCaller 3.7 (Broad Institute). Called variants were annotated, filtered, and visualized using Highlander (http://sites.uclouvain.be/highlander/), an in-house bioinformatics framework.
Classification and selection of germline variants identified by WES
We used a list of 735 candidate genes, including 565 genes selected for germline mutation analysis in a previous landmark study of cancer predisposition [15], supplemented by literature mining with genes implicated in DNA repair or related to BC (Table S1). Variants were retained if they passed quality criteria (Phred-scaled mapping quality > 30, no more than two different haplotypes at the position, variant called outside the 3′ end of the supporting reads, and absence of strand bias), had an allele frequency below 0.015 in the ExAC database, and were either considered pathogenic by at least two prediction tools (among SIFT, CADD, Fathmm, LRT, DEOGEN2, Mutation Assessor, Mutation Taster, and Polyphen2) or predicted to affect splicing (estimated by two ensemble learning methods [16]).
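The retention rule above reads as: all quality filters and the ExAC frequency cut-off must hold, plus either at least two damaging predictions or a predicted splicing effect. The Python sketch below encodes that reading. The `Variant` record, its field names, and the predictor label strings are hypothetical illustrations and do not reflect Highlander’s actual schema.

```python
from dataclasses import dataclass, field

# Predictors named in the text; the exact label strings are assumptions.
PREDICTORS = {"SIFT", "CADD", "Fathmm", "LRT", "DEOGEN2",
              "MutationAssessor", "MutationTaster", "Polyphen2"}


@dataclass
class Variant:
    """Hypothetical per-variant record (not Highlander's real schema)."""
    gene: str
    mapping_quality: float          # Phred-scaled mapping quality
    n_haplotypes: int               # distinct haplotypes observed at the position
    at_read_3prime_end: bool        # True if only supported at the 3' end of reads
    strand_bias: bool
    exac_af: float                  # ExAC allele frequency (0.0 if absent)
    damaging_calls: set = field(default_factory=set)  # predictors calling it damaging
    splice_affecting: bool = False  # flagged by the splicing ensemble methods


def passes_quality(v: Variant) -> bool:
    """All four sequencing-quality criteria must hold."""
    return (v.mapping_quality > 30 and v.n_haplotypes <= 2
            and not v.at_read_3prime_end and not v.strand_bias)


def retain(v: Variant, af_max: float = 0.015, min_predictors: int = 2) -> bool:
    """Quality AND rarity AND (>= 2 damaging predictions OR splicing effect)."""
    n_damaging = len(v.damaging_calls & PREDICTORS)
    return (passes_quality(v) and v.exac_af < af_max
            and (n_damaging >= min_predictors or v.splice_affecting))


# Toy example: a rare variant called damaging by SIFT and CADD is kept.
example = Variant("PALB2", 60, 2, False, False, 0.001, {"SIFT", "CADD"})
print(retain(example))  # True
```

Treating the predictor vote and the splicing flag as alternatives, rather than requiring both, is the interpretive choice here; a stricter pipeline would simply replace the `or` with `and`.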
Germline variants in well-established BC predisposing genes (BRCA1, BRCA2, TP53, PALB2, ATM, CHEK2, CDH1, PTEN, and STK11) were manually classified according to ACMG guidelines [17]. Variants classified as VUS, pathogenic, or likely pathogenic were retained, as were variants with conflicting interpretations in ClinVar [18]. Germline variants in the remaining genes (i.e., genes without a known association with the phenotype) were retained if they met the sequencing filtering criteria described above.
Assessment of tumor WES data for somatic second hits
We assessed each tumor for somatic variations (point mutations; INDELs; CNAs, i.e., amplifications and deletions; loss of heterozygosity (LOH); and copy-neutral loss of heterozygosity (CN-LOH)) in the genes carrying a germline variant. The presence (positive or negative) or absence of selection pressure on the germline variant in the tumor sample was assessed by analyzing the difference in allele balance between tumor and normal samples (DAB), using a chi-square test on the allelic depths in both samples (significance threshold, p < 0.05). DAB at the locus was further evaluated by analyzing, in the tumor, the behavior of each germline heterozygous SNV on the same chromosome. The Benjamini-Hochberg procedure was used to correct for multiple testing, and p values were displayed as Manhattan plots, as in genome-wide association studies. True DAB was retained only if the region surrounding the germline variant showed significant p values for DAB. CNAs were assessed using the FACETS algorithm [19]. At the locus of the germline variant, LOH was defined as the loss of the normal allele in the tumor; CN-LOH as a diploid status with DAB in the tumor; homozygous deletion (HZ-DEL) as the loss of both alleles; and amplification as a copy number ≥ 6, similar to the threshold used to define clinically meaningful ERBB2 amplification [20]. The validity of the allele calls was cross-checked with the DAB analysis. Somatic mutations in a gene carrying a germline variant of interest were called using Mutect2, as described above, and retained only if the germline variant did not display negative selection pressure in the tumor.
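The DAB workflow described above (a per-site chi-square test on allelic depths, Benjamini-Hochberg correction, a regional consistency check, and mapping of copy number states to second-hit labels) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors’ pipeline: the chi-square test is computed without continuity correction, and the regional rule (`window`, `min_fraction`) and the `classify_locus` helper are hypothetical, with `tcn`/`lcn` standing for FACETS-style total and minor copy numbers.

```python
import math


def dab_test(ref_n, alt_n, ref_t, alt_t):
    """Pearson chi-square (1 df, no continuity correction; an assumption)
    comparing ref/alt allelic depths at one site in normal vs tumor."""
    rows = [ref_n + alt_n, ref_t + alt_t]
    if min(rows) == 0:
        raise ValueError("both samples need read depth at the site")
    cols = [ref_n + ref_t, alt_n + alt_t]
    total = sum(rows)
    table = [[ref_n, alt_n], [ref_t, alt_t]]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = rows[r] * cols[c] / total
            if expected > 0:
                chi2 += (table[r][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    vaf_n, vaf_t = alt_n / rows[0], alt_t / rows[1]
    if vaf_t > vaf_n:
        direction = "enriched"   # variant allele over-represented in the tumor
    elif vaf_t < vaf_n:
        direction = "depleted"   # variant allele under-represented in the tumor
    else:
        direction = "balanced"
    return chi2, p_value, direction


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def region_supports_dab(positions, adjusted, locus, window=5_000_000,
                        alpha=0.05, min_fraction=0.5):
    """Hypothetical regional rule: keep DAB only if enough heterozygous SNVs
    within `window` bp of the locus are significant after correction."""
    nearby = [q for pos, q in zip(positions, adjusted) if abs(pos - locus) <= window]
    if not nearby:
        return False
    return sum(q < alpha for q in nearby) / len(nearby) >= min_fraction


def classify_locus(tcn, lcn, direction, dab_significant):
    """Hypothetical, simplified mapping of total (tcn) and minor (lcn) copy
    number at the germline locus to the event labels defined in the text."""
    if tcn == 0:
        return "HZ-DEL"
    if tcn >= 6:
        return "amplification"
    if tcn == 1 and lcn == 0:
        return "LOH" if direction == "enriched" else "loss of variant allele"
    if tcn == 2 and dab_significant:
        return "CN-LOH"
    return "no second-hit CNA"


chi2, p, direction = dab_test(ref_n=48, alt_n=52, ref_t=15, alt_t=85)
print(direction, p < 0.05)                        # enriched True
print(classify_locus(2, 0, direction, p < 0.05))  # CN-LOH
```

For a single 2 × 2 table with one degree of freedom, the chi-square survival function reduces to `erfc(sqrt(x / 2))`, which keeps the sketch free of third-party dependencies.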
Analysis of the patterns of somatic mutations
Global patterns of somatic variants were analyzed using the complete WES data. We analyzed mutational signatures and quantified the contribution of the known COSMIC signatures (http://cancer.sanger.ac.uk/cosmic/signatures) to the observed somatic mutational processes using the R package MutationalPatterns [21]. Homologous recombination deficiency (HRD) was assessed in each tumor sample: we processed the FACETS output to calculate three HRD scores (telomeric DAB, large-scale state transition, and genomic LOH), which were combined into a global mean HRD score using the R scripts kindly made available by Nathanson and Pluta et al. and described elsewhere [22]. We used MutSigCV to identify significantly mutated genes [23]. Tumor mutation burden (TMB) was defined as the number of somatic variants detected (after exclusion of germline variants) divided by the size of the capture target (60 Mb).
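The TMB and global HRD definitions above amount to simple arithmetic. The Python sketch below assumes variant calls are represented as `(chrom, pos, ref, alt)` tuples, a hypothetical representation chosen for illustration, and that the three HRD component scores have already been computed (for example from FACETS output); it does not reproduce the Nathanson and Pluta et al. scripts.

```python
def tumor_mutation_burden(somatic_calls, germline_calls, capture_mb=60.0):
    """Somatic variants per megabase after removing calls also seen in the
    germline. Calls are (chrom, pos, ref, alt) tuples (an assumption)."""
    somatic_only = set(somatic_calls) - set(germline_calls)
    return len(somatic_only) / capture_mb


def mean_hrd_score(telomeric_dab, lst, genomic_loh):
    """Global HRD score as the mean of the three component scores."""
    return (telomeric_dab + lst + genomic_loh) / 3


# Toy data: 300 distinct somatic calls, one of which is also a germline call.
somatic = [("1", 1_000_000 + i, "C", "T") for i in range(300)]
germline = [("1", 1_000_000, "C", "T")]
tmb = tumor_mutation_burden(somatic, germline)
print(round(tmb, 2))  # 4.98 somatic variants per Mb
```

Using set difference makes germline exclusion exact-match only; in practice, normalizing variant representation before comparison would be needed so that equivalent indels are not counted twice.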
Visualization of the genomic results
All analyses downstream of variant calling were performed in R (version 3.5.1, http://www.R-project.org). Data were visualized using in-house scripts, the Gviz and GenVisR packages [24, 25], or ProteinPaint [26].
Evaluation of splicing alterations
RNA was extracted from lymphocytes with TriPure (Roche) and reverse-transcribed with random hexamers using the RevertAid H-Minus First Strand cDNA Synthesis Kit (Fermentas). PCR amplification was performed using specific primers (available upon request). Amplicons were cloned into the pCRII-TOPO vector (Invitrogen). Plasmids were purified with the PureYield™ Plasmid Miniprep System (Promega) and Sanger sequenced.
In vitro kinase assay of the germline ERBB2 variant
MSCV-human Erbb2-IRES-GFP was a gift from Martine Roussel (Addgene plasmid #91888; http://n2t.net/addgene:91888; RRID:Addgene_91888) [27] and served as a template for mutagenesis to generate the germline variant under study and the positive control V777L described by Bose et al. [28]. Primers for mutagenesis (available upon request) were designed using QuikChange Primer Design (Agilent). The entire coding sequence was verified by Sanger sequencing before and after insertion into a lentiviral vector. HEK293T cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin. Because overexpression of wild-type ERBB2 is itself oncogenic and could mask the effect of the mutations, we artificially reduced the amount of ERBB2 protein expressed per cell by transfecting HEK293T cells with a plasmid mix (5% ERBB2 plasmid and 95% empty lentiviral vector) using jetPEI® (Polyplus, France) according to the manufacturer’s instructions. Protein lysates were homogenized using a 21-gauge needle and resolved on precast polyacrylamide gels (Bio-Rad). Primary antibodies against ERBB2, phospho-ERBB2 (Y1248), EGFR, phospho-EGFR (Y1068), phospholipase C gamma (PLCγ), phospho-PLCγ (Y783), and alpha-actinin were purchased from Cell Signaling Technology and used at the recommended dilutions. Separate membranes were used for total and phospho-specific antibodies. Signals were visualized with an anti-rabbit secondary antibody (BioSource) at 1:10,000 dilution and a femto-sensitive ECL detection system (Pierce).
Immunohistochemistry
Immunohistochemistry was performed on 4-μm paraffin sections. Heat-induced antigen retrieval was performed in a PT Link pre-treatment module (DAKO, Agilent Technologies). After endogenous peroxidase blocking, sections were incubated overnight at 4 °C with a rabbit polyclonal anti-PMS1 primary antibody (1:100 dilution, 10859-1-AP, Proteintech). After three washes with TBS-Tween, sections were incubated with HRP-conjugated anti-rabbit polymer (EnVision, DAKO) for 30 min at room temperature, and immunoreactivity was revealed using 3,3′-diaminobenzidine.
Discussion
Multistage acquisition of DNA abnormalities in cancer-related genes is a well-recognized oncogenic process. Hereditary retinoblastoma and hereditary renal cell carcinoma (von Hippel-Lindau disease) arise from the inheritance of a germline loss-of-function mutation in RB1 and VHL, respectively [40, 41]. The normal (functional) allele is then lost locally through a second, somatic mutation in the same gene, rendering these cells deficient. Bi-allelic inactivation of BRCA1 or BRCA2 appears to be required for the HRD characteristic of BRCA-related HBOC [22]. Two-hit inactivation has also been described, in smaller case series, for PALB2- and ATM-related HBC and for BRIP1-related hereditary ovarian cancer [42–44].
We hypothesized that matched tumor sequencing could help pinpoint the genetic basis of suspected BC predisposition in patients without pathogenic mutations in BRCA1, BRCA2, TP53, and CHEK2. Across 735 cancer-related genes, we identified a mean of 4.7 variants per patient with at least some in silico features of pathogenicity.
Of 329 germline variants, 28 from 19 patients were significantly enriched in the paired tumor by a CNA, supporting a possible role in oncogenesis. Importantly, this CNA-related enrichment could not be attributed simply to a genome-wide increase in aneuploidy in these samples: cumulative aneuploidy size did not differ significantly between samples with and without a second-hit CNA at the locus of the germline variant. In addition, two genes carried a second somatic mutation, in samples without a high TMB. Twenty-five variants from 22 patients were significantly depleted in the tumor sample by a CNA, refuting the involvement of these variants in oncogenesis.
We showed that, in addition to gene-centric analyses, data on global somatic mutation patterns (TMB, somatic indel count, mutational signatures, and HRD) are necessary to refine the interpretation of germline variants. These analyses make it possible to distinguish somatic driver from passenger events and to determine whether the biological process related to the gene of interest is dysregulated. They confirmed the involvement and enrichment of the NTHL1 variant in CABR51, consistent with a previous study [45], whereas they helped to refute the role of another NTHL1 variant in the oncogenesis of CABR90. In one case (CABR46), several arguments pointed towards a “WES-invisible” second-hit mechanism in PMS2 (e.g., gene promoter methylation) complementing the germline PMS2 variant and leading to a hypermutated tumor. Although we lack definitive proof of mismatch repair deficiency, we found no other event that could explain the very high TMB of this tumor. A large study demonstrated that bi-allelic inactivation of BRCA1, BRCA2, or PALB2 (but not ATM or CHEK2) is associated with mutational signature 3 [46]. In our study, high HRD scores could be explained in almost every case by tumor histology and molecular classification (invasive medullary carcinoma or TNBC) or by the presence of a tumor-enriched germline variant in a gene involved in homologous recombination (PALB2 in CABR95). Interestingly, the tumor of CABR61, with suspected bi-allelic MRE11A inactivation, showed high signature 3 activity. The MRE11A variant was probably not implicated in the tumorigenesis of CABR38, as this tumor showed no sign of HRD (Figure S2A). These findings add relevant data to the study by Polak et al., which did not include any case of MRE11A inactivation [46].
Sequencing of the second primary tumor was also helpful in reclassifying variants. In CABR18, discordant results between her breast and ovarian tumors weakened the case for ABCD4 and NF2 as oncogenic drivers. This analysis should be interpreted with caution for several reasons. To our knowledge, data on the consistency of second hits across multiple tumors from a single patient carrying predisposing mutations are scarce. Furthermore, sporadic tumors may arise in patients with germline predisposing mutations [47]; thus, two tumors from the same patient will not necessarily share the same founder oncogenic events. Nevertheless, analysis of an additional tumor strengthened the hypothesis that the MRE11A variant in CABR61 is pathogenic, as an enriching second hit was also detected in her mother’s tumor.
Predisposition to cancer has historically been linked to the inheritance of a heterozygous defective tumor suppressor gene, with oncogenesis triggered by inactivation of the second allele. However, recent publications have shown that germline defects in oncogenes can also follow a two-hit mechanism. In a study of more than 10,000 cases from 33 cancer types, high tumor expression of a germline variant in an oncogene (AR, MET, RET, CBL, or PTPN11) was found in 33 patients [5]. Inherited susceptibility to lung cancer has also been demonstrated in rare families with a germline EGFR mutation, with a somatic second hit found in most cases [48–50]. Somatic activating mutations of ERBB2 are a well-described oncogenic driver in several cancer types. These mutations typically cluster in the extracellular domain and the intracellular kinase domain, but transmembrane and juxtamembrane domain mutations have also been identified [28, 51]. Here, we describe a patient with a germline ERBB2 variant that underwent highly significant somatic enrichment by CN-LOH. Despite its unusual location in the C-terminal part of the protein, expression of this variant strongly increased the phosphorylation of ERBB2 and of the downstream signaling protein PLCγ. Together with its low frequency in the general population (MAF 0.76%, with only one homozygous individual), this suggests that the variant is a weakly activating mutation requiring a second hit for oncogenesis.
We believe that the clinical spectrum of the phenotype remains a critical consideration when assessing the predisposing role of a variant. Recently, several studies of mutation prevalence have questioned the ability of cancer genetic testing guidelines to detect mutation carriers [52, 53]. Nevertheless, the penetrance of disease-causing mutations may vary according to the testing indication, the family history pattern, and the presence of other risk factors, which underscores the need for cautious decision-making when reporting variants in a gene that does not fit the classical clinical syndrome [54].
The limitations of our study are those typically encountered by geneticists and oncologists in the clinical setting. First, in most families, we could not obtain germline and tumor DNA from other affected relatives (owing to cancer-related death) or from a second primary tumor. As shown in four cases, such material can be very effective in strengthening or weakening the candidacy of findings in the index patient. Second, we did not study all possible second-hit mechanisms (e.g., epigenetic modification). While read-outs such as mutational signatures, TMB, and HRD can serve as surrogate markers of defects in particular proteins or classes of proteins, they do not provide complete information on the ultimate genetic causes. Third, a large collaborative dataset would increase the probability of encountering each germline variant at least twice; consistent behavior of a variant across tumors could be seen as a strong argument for its involvement in oncogenesis. Fourth, although we argue that ignoring somatic events and patterns would be a missed opportunity to refine variant interpretation, we agree that they should not be used as a stand-alone argument, independently of existing ACMG criteria. For instance, pathogenic BRCA1/2 variants do not show LOH in all pancreatic cancers [55]. Fifth, sample purity and sequencing depth are critical determinants of the sensitivity of somatic event detection. Although all our samples had a tumor purity estimate > 30% (median 55%), higher coverage would have allowed more accurate LOH detection in the samples with lower tumor purity. Finally, as cancer is a multistage process evolving over time, predisposition due to a germline mutation implies that the second hit is an early event. Multiregional tumor sequencing or single-cell sequencing would be useful for unraveling the evolutionary history of the disease and distinguishing driver from passenger somatic mutations. Theoretical methods to infer the timing of events from single DNA samples exist, but they rely on broad assumptions about tumor clonality and apply only to gain (mutation, amplification) and not to loss (deletion, LOH) of information [56].
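To illustrate why tumor purity constrains LOH detection, the Python sketch below computes the expected variant allele fraction (VAF) of a germline heterozygous variant in a bulk tumor sample. It assumes a simple two-component mixture of clonal tumor cells (fraction = purity) and normal diploid cells, and it ignores sequencing noise, subclonality, and mapping bias; the function name is hypothetical.

```python
def expected_tumor_vaf(purity, alt_copies_tumor, total_copies_tumor):
    """Expected VAF at a germline heterozygous site in a bulk sample mixing
    clonal tumor cells (fraction = purity) with normal cells, which carry
    one variant allele and one normal allele."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    alt = purity * alt_copies_tumor + (1 - purity) * 1
    total = purity * total_copies_tumor + (1 - purity) * 2
    return alt / total


for purity in (0.3, 0.55, 0.9):
    loh = expected_tumor_vaf(purity, 1, 1)     # normal allele deleted
    cn_loh = expected_tumor_vaf(purity, 2, 2)  # variant allele duplicated
    print(f"purity {purity:.2f}: LOH VAF {loh:.3f}, CN-LOH VAF {cn_loh:.3f}")
```

Under these assumptions, LOH of the normal allele gives an expected VAF of 1 / (2 − purity) and CN-LOH gives (1 + purity) / 2. At 30% purity, the expected VAF under LOH therefore shifts only from 0.50 to about 0.59, compared with about 0.69 at the median purity of 55%. Smaller shifts require more reads to reach significance in a DAB test.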
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.