Discussion
We investigated the genetic basis of carcinogenic pleiotropy through whole-exome sequencing of individuals diagnosed with multiple primary cancers from two large, multi-ancestry study populations. Comparing individuals with multiple cancers to cancer-free controls uncovered 22 independent, suggestively associated variants, ten of which remained associated when comparing individuals with multiple cancers to those with a single cancer. Across our multiple cancer phenotypes, we also recapitulated previously known gene-based associations in ATM, BRCA1/2, and CHEK2 and found potentially novel associations in SAMHD1 and SLC6A2. These genes remained associated with multiple cancer diagnoses when comparing to individuals with a single cancer. These findings offer insights into germline exome variants that increase an individual’s risk of developing multiple primary cancers.
Compelling findings from our analyses of all individuals with more than one cancer diagnosis include associations with the rare variant rs146381257 in ZNF106. Carriers of the rs146381257 risk allele (C) were primarily overrepresented in individuals with at least one prostate, breast, lung, or urinary bladder cancer and in individuals with lymphoid neoplasms. Carriers also demonstrated an increased risk of developing multiple cancers compared to individuals with a single cancer. ZNF106 is an RNA binding protein involved in post-transcriptional regulation and insulin receptor signaling. Although germline variation in ZNF106 has not previously been associated with cancer risk, a recent study found it to be associated with worse urinary bladder cancer survival [35].
Additional noteworthy findings from our analyses of all multiple primary cancers combined include cancer susceptibility signals in SAMHD1 and SLC6A2, both of which were associated with a significantly higher risk of being diagnosed with multiple cancers compared to single cancers. Germline SAMHD1 mutations are implicated in Aicardi-Goutieres syndrome (AGS) [36], an autosomal recessive condition that results in autoimmune inflammatory encephalopathy. Most cancer-related studies have focused on the role of somatic alterations in SAMHD1 [37]; however, a study of chronic lymphoid leukemia (CLL) proposed an oncogenic role of germline SAMHD1 variation mediated by DNA repair mechanisms [38]. Consistent with this hypothesis, we also found increased SAMHD1 variation in individuals with lymphoid neoplasms, as well as with prostate, breast, colorectal, and lung cancers. SLC6A2, also known as NAT1, has been found to be prognostic for colon cancer [39], and both in vivo and in vitro studies have linked expression to survival in many cancer types, including prostate [40] and breast [41]. Polymorphisms in SLC6A2 may also interact with smoking exposure to modulate the risk for tobacco-related cancers [42].
Because we compared multiple primary cancers with both cancer-free controls and individuals diagnosed with a single cancer, we were well positioned to explore patterns of pleiotropy and disentangle variation likely to be driven by single cancers. For example, we identified two variants, rs7872034 (missense variant in SMC2) and rs143745791 (missense variant in NCBP1), suggestively associated with a diagnosis of at least one breast cancer (plus any other cancer) versus no cancer. These variants remained associated with a diagnosis of breast and another cancer when comparing to individuals diagnosed with a single breast cancer. While rs7872034 is in high LD (r² = 0.98) with a known breast cancer risk variant (rs4742903; SMC2 intron) [43], it may also increase the risk of developing multiple cancers. Regarding rs143745791, germline variants in NCBP1 have not been previously associated with cancer; because it is rare (MAF < 0.2%), larger sequencing efforts may be necessary to identify variation in studies of individuals with a single cancer. Expression of this gene has been found to promote lung cancer growth and poor prognosis [44], and NCBP1 is overexpressed in basal-like and triple-negative breast cancers [45]. Similarly, BRCA1/2 germline variants are prevalent among these subtypes; however, in our study populations, BRCA1/2 carriers were more common among those with an additional ovarian cancer whereas NCBP1 carriers more frequently had an additional cervical cancer.
In our prostate cancer-specific analysis comparing individuals with multiple cancers versus those with only a single cancer, we discovered a suggestive association with rs3020779, an eQTL for RNF123 (also known as KPC1), which is a gene involved in p50 mediation and downstream stimulation of multiple tumor suppressors [46]. In our analysis of head and neck cancer, we detected an association with rs12253181, located in the 3′-UTR of RTKN2. Integration of whole blood gene expression data at this locus indicated that another nearby gene, ARID5B, may be a more likely candidate. Expression of ARID5B was negatively correlated with the cancer susceptibility signal in this region. While this gene has not previously been associated with head and neck cancer risk, germline variation in ARID5B has been implicated in acute lymphoblastic leukemia (ALL) [47], as well as treatment resistance and higher rates of relapse [48]. Genetic variants in ARID5B have also been linked to autoimmune diseases [49, 50], suggesting that immune dysregulation may be a plausible pleiotropic mechanism at this locus, especially given the infectious etiology of oropharyngeal carcinoma [51, 52].
Our findings have potential implications for improving our understanding of the shared mechanisms of carcinogenesis. With further replication, they may also enable prevention (e.g., smoking cessation) and screening strategies that prioritize individuals at risk for developing additional cancers. For example, women who carry the rare missense variant in NCBP1 (rs143745791) were estimated to have an approximately sixfold higher risk of developing breast and other cancers in comparison with no cancer and an approximately threefold higher risk in comparison with women diagnosed with breast cancer alone. If replicated, such findings suggest that the pleiotropic variants reported here could have clinical significance for preventative cancer screening and early detection among individuals with a previous cancer diagnosis.
Limitations of our study included the identification of variants that were likely somatic in our analyses of hematologic cancers due to an expansion of hematopoietic clonal populations with the same acquired mutation (i.e., CHIP). Confounding of germline testing by CHIP has been reported in TP53 [53] and TET2 [54], so careful interpretation is critical to avoid unnecessary clinical intervention. An additional limitation of our study, and of others, is the difficulty of obtaining accurate effect estimates for rare variants and the reliance on available annotations for inclusion in gene-based tests. Although heterogeneity was minimal in our study, differences in effects across populations may reflect differences in population characteristics and sample size. Replication of rare findings in larger cohorts and optimization of functional impact annotations could lead to more precise results. Also, our approach did not allow for formal replication, due to the limited sample size of each cohort. In order to identify signals for our largely understudied phenotype, we combined the two cohorts in a meta-analysis rather than undertaking underpowered replication. Finally, while all individuals with multiple cancers were included in our study regardless of genetic ancestry, individuals of non-European ancestry were underrepresented; larger, more diverse cohorts will be needed to fully explore the genetic basis of multiple cancers.
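The trade-off between combining cohorts and attempting replication can be illustrated with a minimal sketch. It assumes an inverse-variance-weighted fixed-effect meta-analysis of per-cohort log odds ratios; this section does not state which meta-analysis model was used, and the function name and the effect sizes below are hypothetical values chosen only for illustration.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical example: the same log odds ratio observed in two cohorts.
betas = [0.5, 0.5]
ses = [0.2, 0.2]

single = fixed_effect_meta(betas[:1], ses[:1])
pooled = fixed_effect_meta(betas, ses)
print("single cohort: z=%.2f p=%.4f" % (single[2], single[3]))
print("meta-analysis: z=%.2f p=%.4f" % (pooled[2], pooled[3]))
```

Under these assumed inputs the pooled standard error shrinks by a factor of √2 relative to either cohort alone, which is why pooling can surface signals that a discovery-then-replication design with the same samples would miss.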
Selection bias and phenotypic misclassification may also have biased our results. We combined prevalent and incident cancer cases to maximize statistical power for detecting potential associations. The prevalent cases may include fewer individuals with worse prognosis, since these individuals may have been less likely to be included in the study. If any pleiotropic variants reflect more aggressive disease, this could lead to underestimating their potential associations, and vice versa. Also, the controls’ disease status is conditional on their being cancer free at the last follow-up. If some controls were eventually diagnosed with cancer, then any associations would be underestimated. It is also possible that recurrences arising from the first cancer were misclassified as second primaries, which may lead to overestimation of pleiotropic associations. In our study, 10.3% and 17.6% of second primaries that occurred within 1 year of the index cancer in the KPRB and UKB, respectively, may represent recurrences. However, the average time between the first and second cancer diagnoses was 8.3 years (median = 7) in the KPRB and 9.5 years (median = 6.5) in the UKB, suggesting that the majority of multiple cancer cases were most likely second primaries.
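The direction of the bias from controls who later develop cancer can be checked with a short worked calculation. This is only a sketch under stated assumptions: the carrier frequencies and the fraction of misclassified controls are hypothetical, and the function names are introduced here for illustration.

```python
def odds_ratio(p_case, p_control):
    """Odds ratio from carrier frequencies in cases and controls."""
    return (p_case / (1.0 - p_case)) / (p_control / (1.0 - p_control))


def observed_odds_ratio(p_case, p_control, misclassified_fraction):
    """Expected odds ratio when a fraction of 'controls' are future cases.

    The observed control group is modeled as a mixture of true controls
    and undiagnosed cases, so its carrier frequency drifts toward p_case.
    """
    f = misclassified_fraction
    p_mixed = (1.0 - f) * p_control + f * p_case
    return odds_ratio(p_case, p_mixed)


# Hypothetical carrier frequencies: 2% in cases, 1% in true controls.
true_or = odds_ratio(0.02, 0.01)
biased_or = observed_odds_ratio(0.02, 0.01, 0.10)
print("true OR: %.3f, observed OR with 10%% misclassified: %.3f"
      % (true_or, biased_or))
```

With any nonzero misclassified fraction the observed odds ratio moves toward 1, consistent with the statement that such associations would be underestimated.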
Strengths of this work include studying individuals of multiple ancestries who were largely unselected for specific cancer phenotypes. We also performed the first ever exome-wide study of genetic susceptibility to multiple primary cancers, using two large multi-ancestry study populations. Our study design allowed us to characterize variation across multiple primary cancers representing 36 unique sites, as well as to conduct cancer-specific analyses of 16 sites. Using this approach, we confirmed many known single-variant and gene-based findings, strengthening and supporting our novel results reported for individual cancers through our cancer-specific analyses.
In summary, by undertaking an exome-wide survey of common and rare variations in two large study populations, we identified several variant and gene-based associations that may increase the risk of developing multiple cancers within individuals. Future studies should aim to replicate our findings and undertake experiments that validate the functionality of the discovered pleiotropic variants. Combined with future research, our results have the potential to inform genetic counseling, improve risk prediction for multiple cancers, and guide novel treatment and drug development.
Acknowledgements
We are grateful to the Kaiser Permanente Northern California members who have generously agreed to participate in the Kaiser Permanente Research Program on Genes, Environment, and Health and the ProHealth Study. The authors also thank the Regeneron Genetics Center for covering the costs of whole-exome sequencing of the Kaiser Permanente Research Bank study participants.
Regeneron Genetics Center author list and contribution
RGC Management and Leadership Team
Goncalo Abecasis, D.Phil., Aris Baras, M.D., Michael Cantor, M.D., Giovanni Coppola, M.D., Andrew Deubler, Aris Economides, Ph.D., Katia Karalis, Ph.D., Luca A. Lotta, M.D., Ph.D., John D. Overton, Ph.D., Jeffrey G. Reid, Ph.D., Katherine Siminovitch, M.D., Alan Shuldiner, M.D.
Sequencing and Lab Operations
Christina Beechert, Caitlin Forsythe, M.S., Erin D. Fuller, Zhenhua Gu, M.S., Michael Lattari, Alexander Lopez, M.S., John D. Overton, Ph.D., Maria Sotiropoulos Padilla, M.S., Manasi Pradhan, M.S., Kia Manoochehri, B.S., Thomas D. Schleicher, M.S., Louis Widom, Sarah E. Wolf, M.S., Ricardo H. Ulloa, B.S.
Clinical Informatics
Amelia Averitt, Ph.D., Nilanjana Banerjee, Ph.D., Michael Cantor, M.D., Dadong Li, Ph.D., Sameer Malhotra, M.D., Deepika Sharma, MHI, Jeffrey Staples, Ph.D.
Genome Informatics
Xiaodong Bai, Ph.D., Suganthi Balasubramanian, Ph.D., Suying Bao, Ph.D., Boris Boutkov, Ph.D., Siying Chen, Ph.D., Gisu Eom, B.S., Lukas Habegger, Ph.D., Alicia Hawes, B.S., Shareef Khalid, Olga Krasheninina, M.S., Rouel Lanche, B.S., Adam J. Mansfield, B.A., Evan K. Maxwell, Ph.D., George Mitra, B.A., Mona Nafde, M.S., Sean O’Keeffe, Ph.D., Max Orelus, B.B.A., Razvan Panea, Ph.D., Tommy Polanco, B.A., Ayesha Rasool, M.S., Jeffrey G. Reid, Ph.D., William Salerno, Ph.D., Jeffrey C. Staples, Ph.D., Kathie Sun, Ph.D., Jiwen Xin, Ph.D.
Analytical Genomics and Data Science
Goncalo Abecasis, D.Phil., Joshua Backman, Ph.D., Amy Damask, Ph.D., Lee Dobbyn, Ph.D., Manuel Allen Revez Ferreira, Ph.D., Arkopravo Ghosh, M.S., Christopher Gillies, Ph.D., Lauren Gurski, B.S., Eric Jorgenson, Ph.D., Hyun Min Kang, Ph.D., Michael Kessler, Ph.D., Jack Kosmicki, Ph.D., Alexander Li, Ph.D., Nan Lin, Ph.D., Daren Liu, M.S., Adam Locke, Ph.D., Jonathan Marchini, Ph.D., Anthony Marcketta, M.S., Joelle Mbatchou, Ph.D., Arden Moscati, Ph.D., Charles Paulding, Ph.D., Carlo Sidore, Ph.D., Eli Stahl, Ph.D., Kyoko Watanabe, Ph.D., Bin Ye, Ph.D., Blair Zhang, Ph.D., Andrey Ziyatdinov, Ph.D.
Research Program Management & Strategic Initiatives
Marcus B. Jones, Ph.D., Jason Mighty, Ph.D., Lyndon J. Mitnaul, Ph.D.