Introduction
Based on risk and frequency, three classes of breast cancer susceptibility genes or loci are currently recognized: high-risk genes such as
BRCA1 and
BRCA2 (Mendelian Inheritance in Man numbers (MIMs) 113705 and 600185) in which protein-truncating mutations and severely dysfunctional missense substitutions confer a five- to ten-fold increased risk and for which the summed allele frequency in the general population is <1%; intermediate-risk genes such as
ATM and
CHEK2 (MIMs 208900 and 604373) in which protein-truncating mutations and severely dysfunctional missense substitutions confer a two- to five-fold increased risk and for which the summed allele frequency in the general population may approach 1%; and common, modest-risk single nucleotide polymorphisms (SNPs) which individually have much higher frequency but only rarely confer risk greater than 1.25-fold [
1,
2]. Linkage analysis provided an effective genome-wide approach for locating high-risk susceptibility genes, and genome-wide association studies provided an effective approach for finding risk-associated SNPs. While exome sequencing-based strategies may eventually provide a hypothesis-free, genome-wide approach for identification of intermediate-risk genes, much of our current knowledge base has flowed from candidate gene studies [
3].
The MRN complex, formed from dimers of the proteins encoded by
MRE11A,
RAD50, and
NBN (MIMs 600814, 604040, and 602667), plays key roles in DNA double-strand break (DSB) repair, meiotic recombination, cell cycle checkpoints, and maintenance of telomeres [
4]. In mice, homozygous knockouts of these genes are lethal [
5‐
7]. Humans born with biallelic mutations in any one of the three genes share a cellular phenotype that includes sensitivity to ionizing radiation, a deficit in DNA DSB repair, and chromosomal instability (MIM 604391; MIM 251260) [
8]. Moreover, these people are at risk of severe cancer susceptibility phenotypes. For example, two brothers who were compound heterozygotes for mutations that fell in the amino-terminal region of the MRE11A protein died of pulmonary adenocarcinoma before age 20 [
9,
10], which would seem very unlikely in the absence of an underlying cancer predisposition. Susceptibility to lymphoma is a prominent feature of Nijmegen breakage syndrome, which is caused by biallelic mutation of
NBN [
11]. While too few human biallelic
RAD50 mutation carriers have been identified to reach a conclusion about their cancer susceptibility, more than 20% of mice homozygous for a hypomorphic Rad50 allele (Rad50 p.Lys22Met) that lived past age four months died with lymphoma or leukemia [
12].
Breast cancer risks for heterozygous carriers of MRN gene mutations were summarized briefly by Hollestelle
et al. [
3]. Of the three genes,
NBN has the strongest evidence in support of acting as an intermediate-risk breast cancer gene. This is largely because the truncating variant
NBN c.657del5 has a high enough frequency among individuals of Slavic origin to be evaluated by case-control analysis, and meta-analysis of nine such studies revealed a combined odds ratio (OR) of 2.63 (95% confidence interval (CI) 1.76 to 3.93) for this variant [
13]. Most evidence in favor of
RAD50 rested on a truncating variant
RAD50 c.687delT which has been observed in Finnish cases and controls [
14,
15]; while subsequent studies in the same and other populations are consistent with the hypothesis that
RAD50 is an intermediate-risk breast or pancreatic cancer susceptibility gene, they have not provided significant supporting evidence [
16,
17]. Evidence for
MRE11A rests primarily on the observation of two mutations in the gene from a series of eight non-
BRCA1/
2 breast cancer families with tumors that showed loss of all three MRN proteins [
18].
Previously, we performed case-control mutation screening studies of
ATM,
CHEK2,
XRCC2, and
RAD51 to clarify our understanding of their role in breast cancer susceptibility [
19‐
22]. A common thread across these studies has been use of bioinformatic and statistical approaches designed to detect evidence of pathogenicity from both truncating and splice junction variants (T + SJV) and/or rare missense substitutions (rMS). Here, we apply a case-control mutation screening strategy in an ethnically diverse series of subjects to evaluate
MRE11A,
RAD50, and
NBN. Given that the three proteins form an evolutionarily conserved complex involved in maintenance of genomic integrity, we decided to evaluate the three genes as a single large candidate intermediate-risk breast cancer susceptibility gene with a concatenated open reading frame of 2,774 amino acids - which nevertheless is not quite as large as the 3,056 amino acid open reading frame of
ATM. Our analysis addresses two related questions: (1) are some rare MRN variants intermediate-risk breast cancer susceptibility alleles, and if so (2) do the MRN genes follow a
BRCA1/
2 pattern wherein most susceptibility alleles are protein-truncating variants, or do they follow an
ATM/
CHEK2 pattern wherein half or more of the susceptibility alleles are missense substitutions?
Discussion
In the present work, we evaluated the contribution of rare variants in the genes
MRE11A,
RAD50, and
NBN to breast cancer risk. As the proteins encoded by these genes form an evolutionarily conserved complex that could be functionally impaired by a dysfunctional variant in any one of the genes, we evaluated them as if they constitute a single candidate susceptibility gene. Combining T + SJV and key functional domain rMS, we found that this set of rare MRN gene variants contributes to breast cancer susceptibility (OR = 2.88,
P = 0.0090). A
post hoc test for heterogeneity did not reveal evidence for between-gene differences in the case-control distributions of likely pathogenic variants: Fisher’s exact test
P values of between-gene heterogeneity for T + SJV, key domain rMS and the combination of the two classes of rare variants were 0.43, 1.00, and 0.53, respectively (Table
7). Similarly, looking at the genes individually, neither truncating variants, nor key domain missense substitutions, nor a combination of the two reached statistical significance from single gene data (Table
9). Thus evidence from this study in favor of the MRN genes as intermediate-risk breast cancer susceptibility genes emerges from the ensemble analysis of the three genes.
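The between-gene heterogeneity test can be illustrated with a small exact test on a genes-by-status table. The sketch below is not the authors' code: it assumes the test is the standard generalization of Fisher's exact test to a table with more than two rows (summing the probabilities of all tables with the same margins that are no more likely than the observed one), and it takes the per-gene carrier counts from Table 9 as input, reading the first count column as cases and the second as controls. The function name `exact_heterogeneity_p` is hypothetical.

```python
from math import comb


def exact_heterogeneity_p(rows):
    """Exact test on an r x 2 table given as [(cases, controls), ...].

    Assumption: two-sided P value = total probability of all tables with
    the observed margins whose probability does not exceed that of the
    observed table. Integer weights avoid floating-point ties.
    """
    totals = [cases + controls for cases, controls in rows]
    n_controls = sum(controls for _, controls in rows)

    def weight(control_counts):
        # Proportional to the hypergeometric probability of the table.
        w = 1
        for total, c in zip(totals, control_counts):
            w *= comb(total, c)
        return w

    def tables(i, remaining):
        # Enumerate every way to distribute the controls across the rows.
        if i == len(totals) - 1:
            if remaining <= totals[i]:
                yield [remaining]
            return
        for c in range(min(totals[i], remaining) + 1):
            for rest in tables(i + 1, remaining - c):
                yield [c] + rest

    observed = weight([controls for _, controls in rows])
    weights = [weight(t) for t in tables(0, n_controls)]
    return sum(w for w in weights if w <= observed) / sum(weights)


# Per-gene (cases, controls) carrier counts: MRE11A, RAD50, NBN.
t_sjv = [(1, 0), (4, 3), (4, 0)]
key_domain_rms = [(7, 1), (10, 2), (3, 1)]
print(round(exact_heterogeneity_p(t_sjv), 2))           # 0.43
print(round(exact_heterogeneity_p(key_domain_rms), 2))  # 1.0
```

Under these assumptions the sketch returns 0.43 for T + SJV and 1.00 for key domain rMS. The same function applied to a two-row table of carriers versus noncarriers reduces to the ordinary two-sided Fisher's exact test.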
Table 9
Individual contributions of MRE11A, RAD50, and NBN to the ensemble model
Noncarriers | 1,283 | 1,114 | | |
MRE11A |
T + SJV¥ | 1 | 0 | ∞[P = 1.00]¥ | |
rMS or IFD*¢ | 7 | 1 | 6.08 (0.75-49.5) | 3.62 (0.42-31.5) |
Combined | 8 | 1 | 6.95 (0.87-55.6) | 5.02 (0.59-42.8) |
RAD50 |
T + SJV¥$ | 4 | 3 | 1.16 (0.26-5.18) | 1.09 (0.23-5.24) |
rMS or IFD* | 10 | 2 | 4.34 (0.95-19.9) | 3.21 (0.68-15.2) |
Combined | 14 | 5 | 2.43 (0.87-6.77) | 1.98 (0.68-5.71) |
NBN |
T + SJV¥ | 4 | 0 | ∞[P = 0.13]¥ | |
rMS or IFD* | 3 | 1 | 2.60 (0.27-25.1) | 2.12 (0.20-22.6) |
Combined | 7 | 1 | 6.08 (0.75-49.5) | 5.28 (0.62-45.2) |
MRE11A, RAD50, and NBN ensemble model* |
T + SJV¥$ | 9 | 3 | 2.60 (0.70-9.65) | 2.61 (0.67-10.1) |
rMS or IFD*¢ | 20 | 4 | 4.34 (1.48-12.7) | 3.07 (1.01-9.31) |
Combined | 29 | 7 | 3.60 (1.57-8.24) | 2.88 (1.22-6.78) |
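As a reading aid for Table 9, the first OR column can be recomputed from the counts. The sketch below assumes that the first count column holds cases and the second holds controls, that each row is compared against the shared noncarrier row (1,283 and 1,114), and that the interval is a Wald 95% CI on the log odds ratio. These are assumptions, not the authors' stated method, and the function name `crude_odds_ratio` is hypothetical. The second OR column, which the text quotes (for example 2.88 for the combined class), is not reproduced here because the model behind it is not described in this section.

```python
from math import exp, log, sqrt


def crude_odds_ratio(case_carriers, control_carriers,
                     case_noncarriers=1283, control_noncarriers=1114,
                     z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Defaults are the noncarrier counts from Table 9 (assumed order:
    cases, controls). Rows with a zero cell are undefined here.
    """
    odds_ratio = (case_carriers * control_noncarriers) / (
        control_carriers * case_noncarriers)
    # Standard error of log(OR) for a 2 x 2 table.
    se = sqrt(1 / case_carriers + 1 / control_carriers
              + 1 / case_noncarriers + 1 / control_noncarriers)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Ensemble "Combined" row: 29 case carriers, 7 control carriers.
or_, lo, hi = crude_odds_ratio(29, 7)
print(f"{or_:.2f} ({lo:.2f}-{hi:.2f})")  # 3.60 (1.57-8.24)
```

With these inputs the sketch gives 3.60 (1.57-8.24), which agrees with the first OR column of the ensemble Combined row and supports the assumed column order.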
Although MRN gene T + SJV were not by themselves a significant breast cancer risk factor, we note that our OR point estimate of 2.61 is both very close to the meta-analysis point estimate of 2.63 that Zhang
et al. obtained for
NBN c.657del5 [
13], and close to the point estimate of 2.32 that we obtained in our meta-analysis of
ATM T + SJV [
19]. Thus, while we cannot exclude that our nonsignificant finding is actually indicative of little or no risk of breast cancer conferred by MRN gene protein-truncating variants, our data are more strongly in accord with the hypothesis that they confer an intermediate risk of magnitude similar to the risk conferred by truncating variants in
ATM.
Overall, there was no association between rMS and risk of breast cancer. Nevertheless, tightening the focus to key functional domain rMS resulted in a significant association with an OR of approximately 3.0. In this sense, the MRN genes behave like the homologous recombination repair genes BRCA1, BRCA2, and ATM, genes in which rare missense substitutions that are pathogenic because of missense dysfunction per se are largely confined to key functional domains.
Combining MRN T + SJV and key functional domain rMS, we observed an OR of 2.88 with a
P value of 0.0090. That
P value meets the threshold of
P <0.01 that Hollestelle
et al. suggested for establishing intermediate-risk susceptibility genes that were already strong candidates based on their biochemical function [
3]. Thus, with a mutation screening and data analysis approach that considered MRE11A, RAD50, and nibrin as a single functional entity and focused the analysis of rMS on those that fall in the key functional domains of the MRN complex, we overcame the limitation of previous suggestive studies that were based on a small number of founder mutations [
13,
15,
18], and confirmed the hypothesis that
MRE11A,
RAD50, and
NBN are intermediate-risk susceptibility genes in a general sense. Moreover, because we did not observe any of the four sequence variants most responsible for the MRN genes’ candidate gene status (
MRE11A p.Arg202Gly,
MRE11A p.Arg633Stop,
RAD50 c.687delT, and
NBN c.657del5) [
13,
14,
18], this confirmation is independent of the hypothesis-generating data.
Five of the sequence variants observed in the MRN case-control mutation screening bear further discussion.
NBN p.Arg215Trp was of interest because association studies have found evidence that it confers modest risk of several cancers (for review, see [
66]), and there is biochemical evidence, albeit somewhat conflicting, of altered function of this nibrin allele [
67,
68]. We observed six cases and six controls with the p.Arg215Trp missense substitution, resulting in an OR of 0.96 (95% CI 0.30 to 3.06,
P = 0.95). While these confidence intervals are too wide to exclude the possibility that
NBN p.Arg215Trp is actually a modest-risk variant, we also point out that position Arg215 is quite variable in our protein multiple sequence alignment and that, according to EVS data, the variant has a frequency of 0.37% in Caucasian Americans - well above the frequency threshold we found for severely dysfunctional variants in homologous recombination repair genes in which biallelic mutations cause embryonic lethality or severe childhood disease.
Second, we observed one carrier, a Caucasian control ascertained at the age of 55, of the
MRE11A in-frame deletion c.2109del9. The variant falls in the last exon of the gene, near the carboxy terminus of the second DNA binding domain (which is also the carboxy terminus of the protein). Because this domain is required for double-strand break formation during meiosis but not for repair of double-strand breaks [
46], the domain was not included in the list of ‘key functional domains’ and the indel was not included in statistical analyses of key functional domain variants.
Third, we observed one carrier, an East Asian case diagnosed at the age of 35, of
MRE11A p.Thr481Ile. This residue is a threonine in all but one species in our alignment, but is a methionine in the cephalochordate,
Branchiostoma lanceolatum. The substitution falls within the protein’s RAD50 interaction domain. While very few of the rare variants that we observed have been reported in human ataxia-telangiectasia-like disease or Nijmegen breakage syndrome patients, another substitution at this residue, p.Thr481Lys, was observed in an Italian ataxia-telangiectasia-like disease sib-pair [
69].
Fourth, we observed one carrier, an East Asian control ascertained at age 50, of the RAD50 frameshift c.3852del4. Because the frameshift falls in the last exon of the gene where it would not be expected to cause nonsense-mediated decay of the mRNA, we evaluated it as an in-frame deletion rather than as a frameshift. As such, it scrambles well-conserved sequence near the carboxy terminus of the protein’s carboxy-end ATPase domain and final MRE11A binding domain including positions such as Arg1288 and Lys1291 that are invariant in our protein multiple sequence alignment. The sequence scrambling creates nonconservative substitutions at invariant key functional domain positions, resulting in the highest possible sequence variant severity score.
Fifth, we observed one carrier, also an East Asian control ascertained at age 50, of the NBN missense substitution p.Ile35Thr. This position falls in the protein’s functionally important forkhead-associated (FHA) domain and is either isoleucine or leucine in all of the species included in our NBN protein multiple sequence alignment.
The last three variants described above illustrate two of the analytic problems encountered in this study. First, all three were evaluated as key domain C65 rMS and all three were observed in East Asian subjects. Combined with eight additional observations of key domain C65 rMS in either East Asian or Latina subjects against just one in a Caucasian of European ancestry, there was an unexpected excess of these variants in the non-Caucasian subjects mutation screened in this study. Second, the two variants observed in the controls affected positions with little or no cross-species physicochemical variability; consequently, they would be graded as severe C65 variants with either a mammals-only protein multiple sequence alignment or with our complete alignment through the deuterostome Strongylocentrotus purpuratus. In contrast, the MRE11A rMS that we observed in a breast cancer case (p.Thr481Ile), as well as the rMS observed at the same position in a pair of ataxia-telangiectasia-like disease cases (p.Thr481Lys), score as severe C65 substitutions when evaluated with the mammals-only alignment but as likely innocuous C0 substitutions when evaluated with the evolutionarily deep alignment. Since the observation of a nonconservative rMS at MRE11A position Thr481 in a pair of ataxia-telangiectasia-like disease cases increases the odds that substitutions at this position are in fact pathogenic, it appears that using the evolutionarily deeper alignments is, for the MRN genes, counterproductive.
On the other hand, the empirically determined allele frequency thresholds derived by combining older ATM, BRCA1, BRCA2, and CHEK2 case-control mutation screening data with EVS and 1000G data (found to be 0.1% for the three genes, ATM, BRCA1, and BRCA2, where inheritance of biallelic mutations is either embryonic lethal or causes a developmental phenotype that severely reduces reproductive fitness, and 0.32% for CHEK2) provide a new tool to help with evaluation of the many rare variants observed in a case-control mutation screening study of candidate cancer susceptibility genes.
For
BRCA1 and
BRCA2, it is well established that a strong majority of pathogenic variants are, ultimately, protein-truncating variants. In contrast, case-control mutation screening of
CHEK2 revealed an approximately equal contribution from T + SJVs and rMSs to the fraction of breast cancer attributable to rare variants in that gene, and a case-control mutation screening meta-analysis of
ATM revealed that rMS in that gene may actually be responsible for a larger fraction of the breast cancer attributable to rare variants than are the T + SJVs [
19,
20]. In the mutation screening data reported here, rare key functional domain missense substitutions in the MRN genes were more frequent (24 vs. 12 observations) than truncating variants and conferred a slightly higher OR (3.07 vs. 2.61) with a lower
P value (0.029 vs. 0.14). These data are more congruent with the
ATM/
CHEK2 pattern than the
BRCA1/2 pattern. Since there is not yet any efficient approach to clinically actionable classification of missense substitutions in these genes, these data point toward a clinical problem. When the MRN genes are mutation screened as part of a clinical panel-based cancer susceptibility gene sequencing test, a large fraction, if not the majority, of the genetic risk attributable to them will reside in rare missense substitutions that will initially be reported to clinical geneticists as unclassified variants.
The analytic strategy of treating the three genes as a single concatenated gene had one notable drawback: we are not able to ask whether variants in each of the three genes are best evaluated under the same analysis model. Thus an enormous amount of work, likely involving larger scale mutation screening efforts to gain more analytic precision, tests of segregation to examine penetrance and tumor spectrum, and perhaps development of functional assays to aid evaluation of rare missense substitutions, remains to be performed on MRE11A, RAD50, and NBN.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
FD led the mutation screening of RAD50, contributed to data analysis, and helped to draft the manuscript. MP led the mutation screening of NBN, contributed to data analysis, and helped to draft the manuscript. JO led the mutation screening of MRE11A, contributed to data analysis, and helped to draft the manuscript. FLCK contributed to study design, led the laboratory team, and helped to draft the manuscript. CV was responsible for data management throughput for the project, helped to refine the laboratory platform, and helped to draft the manuscript. ELY built and curated the MRN protein multiple sequence alignments, helped with scoring missense substitutions, and helped to draft the manuscript. NR contributed to the mutation screening and data analysis, helped to refine the laboratory platform, and helped to draft the manuscript. NF contributed to the mutation screening and data analysis, helped to refine the laboratory platform, and helped to draft the manuscript. GD contributed to the mutation screening and data analysis, helped to refine the laboratory platform, and helped to draft the manuscript. MPV built the algorithm for evaluating splice junction variants and helped to draft the manuscript. KT contributed to evaluation of splice junction variants and helped to draft the manuscript. TCR adapted the splice junction analysis algorithm to run on sequence variants written in genome coordinates and helped to draft the manuscript. GJW defined the coordinates of the MRN protein key functional domains and helped to draft the manuscript. JLH was the lead investigator for subjects gathered through the Australian site of the BCFR and helped to draft the manuscript. MCS contributed to study design, contributed to the management of samples obtained through the Australian site of the BCFR, and helped to draft the manuscript. ILA was the lead investigator for subjects gathered through the Ontario site of the BCFR and helped to draft the manuscript. 
EMJ was the lead investigator for subjects gathered through the Northern California site of the BCFR and helped to draft the manuscript. DEG contributed to study design, contributed to the analysis of rare variants in BRCA1 and BRCA2, gave advice on analysis of the MRN data, and helped to draft the manuscript. FL contributed to study design and data analysis, and helped to draft the manuscript. SVT was responsible for overall study design, contributed to data analysis, and helped to draft the manuscript. All authors read and approved the final manuscript.