Background
Breast cancer (BC) is the most frequent cancer among women worldwide and, in more developed regions, the second leading cause of cancer death in women after lung cancer [1]. Because early diagnosis of BC can lead to successful treatment and a good prognosis, it is important to develop efficient risk prediction algorithms that help identify high-risk individuals. Although many countries have implemented mammography screening programs, these are mostly applied to all women in certain age categories without any additional stratification by other risk factors. However, the benefits of such screening programs are often debated. Existing tools to assess BC risk [2‐4] are often not used systematically in screening because up-to-date risk factor information is insufficient. They also capture the heritable component only in the form of family history or information on rare genetic variants (BRCA1/2).
Twin studies have estimated the heritability of breast cancer at 20 to 30% [5]. However, only 5–10% of BC cases have a strong inherited component identified in the form of rare genetic variants [6], indicating that the disease liability should also have a considerable polygenic component. This is supported by the results of large genome-wide association studies (GWAS): more than 100 genomic loci have been identified as associated with BC in Europeans [7].
Based on GWAS results, several efficient polygenic risk scores (GRS) have been developed for common complex diseases, and in many cases they could be used to improve existing risk prediction algorithms [8‐11]. It is natural to expect that a similar GRS for BC may aid risk prediction in clinical practice.
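As a rough illustration of what such a score is, a GRS is commonly computed as a weighted sum of effect-allele dosages, with the weights taken from GWAS effect estimates (for example, log odds ratios). This section does not state the exact formula used here, so the sketch below assumes that usual construction; the function name, weights and dosages are hypothetical.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: effect-allele counts per SNP (0, 1 or 2, or imputed values in [0, 2]).
    weights: per-SNP effect sizes, e.g. GWAS log odds ratios (assumed here).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three SNPs with made-up effect sizes.
weights = [0.10, -0.05, 0.20]
dosages = [2, 1, 0]
score = genetic_risk_score(dosages, weights)
```

A score restricted to genome-wide significant SNPs and a score built from a much larger set of variants differ, in this view, only in which SNPs and weights enter the sum.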
So far, several studies have combined SNPs with established genome-wide significance into a GRS for BC. Sieh et al. [12] used 86 SNPs and Mavaddat et al. [13] used 77 SNPs to calculate a GRS, both showing a strong effect of the score in predicting future BC cases. A few studies have also demonstrated the incremental value of adding a GRS to proposed BC prediction algorithms [14, 15]. Although several different GRSs have been proposed for BC risk prediction, no head-to-head comparison of the scores has been found in the literature. Nor has it been assessed whether the number of SNPs in the GRS could be increased, which has been difficult because summary statistics from large-scale GWASs were unavailable.
In 2017, the large-scale GWAS by Michailidou et al. [7] released summary statistics for around 11.8 million genetic variants. At almost the same time, UK Biobank released its GWAS results for BC for ~10.8 million SNPs. As evidence from studies on other common complex diseases has indicated that the predictive ability of a GRS can be improved by adding the effects of a large number of independent SNPs to those with established genome-wide significance, we set out to explore this approach using both summary files.
Discussion
We demonstrated that a metaGRS combining a multigenic and a polygenic GRS for breast cancer, metaGRS2, performed better than either of the previously published multigenic GRSs and also better than the best polygenic GRS alone. While on average about 5% of women in the EstBB cohort (as well as in the Estonian population) have been diagnosed with BC by the age of 70, women in the highest five percentiles of the metaGRS2 distribution reach the same cumulative risk level (5%, 95% CI 2.1 to 7.8%) by the age of 49, more than 20 years earlier. It is also notable that women with a metaGRS2 level below the median reach a similar risk level (4.6%, 95% CI 3.6 to 5.6%) only by the age of 79, almost 10 years later. These findings suggest that the polygenic risk estimate based on metaGRS2 could be an efficient tool for risk stratification in clinical practice, for targeted screening and prevention purposes.
Given that the potential benefits of non-selective BC screening within certain age categories (compared with the potential harm from overdiagnosis) have been under serious discussion in the medical community [23], personalized approaches based on individual risk levels deserve further assessment. Ideally, these should integrate available information from clinical risk factors with genetic information. The latter could include both moderate- and high-penetrance germline mutation testing and polygenic risk scores. This approach is also supported by our findings, in which a considerable increase in the c-statistic was observed when polygenic risk scores and NCI estimates were combined.
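For readers less familiar with the c-statistic, the following minimal sketch shows its binary-outcome form: the proportion of case-control pairs in which the case has the higher predicted score, with ties counted as one half. This is only an illustration; it assumes a simple case/control outcome rather than the time-to-event setting a cohort analysis would typically use, and the function name and data are hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance for a binary outcome (equivalent to the ROC AUC).

    scores: predicted risk scores, one per individual.
    outcomes: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0   # case ranked above control
            elif case_score == control_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


# Hypothetical data: two cases and two controls.
outcomes = [1, 0, 1, 0]
c_single = c_statistic([0.3, 0.4, 0.7, 0.2], outcomes)    # one predictor alone
c_combined = c_statistic([0.9, 0.4, 0.7, 0.2], outcomes)  # combined predictors
```

Comparing the value obtained from one predictor with the value obtained from a combined model is how an "increase in the c-statistic" is read.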
However, when incorporating a GRS into clinical BC prediction, one should keep in mind that a GRS represents a mixture of different pathways and is still unlikely to capture the heritable component completely. Our findings indicated that a GRS and family history have independent predictive effects on BC risk: accounting simultaneously for an individual's genetic information and family history (recorded as whether the mother has had breast cancer, has not, or her status is unknown) appeared to give better risk estimation than using either predictor alone. However, more research is needed to assess the usefulness of combining our proposed metaGRS2 with full pedigree-based family history data.
Because different (and not necessarily highly correlated) GRSs can be produced depending on the GWAS used as a basis, those GRSs might be expected to emphasize the effects of different biological pathways. This hypothesis seems plausible in light of several associations found between different GRSs and BC risk factors. As expected, GRSs that include only a small number of significant SNPs (such as GRS75 and GRS70) were highly correlated. Had we been able to include all 86 original SNPs instead of 70, the correlation between GRS86 and GRS75 would likely have remained similar or decreased slightly, as the SNPs excluded from the original 86 were rather rare.
The fact that a metaGRS performed better than the alternatives suggests that, even though the multigenic GRS75 including only genome-wide significant SNPs was already a good predictor of BC, other SNPs included in the polygenic GRSONCO (but not in GRS75) have some additional predictive power. Most likely, not all SNPs included in GRSONCO are truly associated with BC. However, as they have some predictive power, possibly also through association with some BC risk factors, one should not completely ignore them when building an optimal GRS.
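One simple way to picture such a combination is as a weighted sum of standardized component scores. The sketch below assumes that form purely for illustration: this section does not describe how the metaGRS2 weights were derived, and the function names, weights and score values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of scores to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(v - m) / s for v in values]


def meta_grs(grs_a, grs_b, weight_a, weight_b):
    """Weighted sum of two standardized GRSs, one value per individual.

    The weights are assumed to come from a separate model fit,
    e.g. the estimated effect of each standardized score on BC risk.
    """
    if len(grs_a) != len(grs_b):
        raise ValueError("both score lists must cover the same individuals")
    z_a = standardize(grs_a)
    z_b = standardize(grs_b)
    return [weight_a * a + weight_b * b for a, b in zip(z_a, z_b)]


# Hypothetical scores for three individuals from a multigenic and a polygenic GRS.
multigenic = [1.0, 2.0, 3.0]
polygenic = [0.2, 0.1, 0.6]
combined = meta_grs(multigenic, polygenic, weight_a=0.6, weight_b=0.4)
```

Standardizing first puts the component scores on a common scale, so the weights reflect relative predictive contribution rather than differences in units.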
It remains an open question whether using a metaGRS instead of several different genetic risk scores is always the best practice: if one could pinpoint the biological mechanisms behind different scores, more optimal preventive strategies could be chosen. Still, as long as we are unable to convincingly link different GRSs with specific preventive measures, targeted prevention should be based on the GRS with the best possible overall predictive ability, such as the metaGRS2 proposed here.
One should also keep in mind that, besides the GRS, there are genetic mutations, such as those in BRCA1/2, known to be associated with very high familial BC risk. Therefore, in practice, any genomic risk stratification procedure should also include a search for high- and moderate-risk genetic variants, if possible. In high-risk mutation carriers, clinical management could be based on the specific genetic (Mendelian) variants or, if deemed useful in the future, on a combination of Mendelian variants and GRS levels, but this needs further study.