Background
Cancer is a neoplastic disease in which cancer cells acquire numerous biological capabilities through the accumulation of various genetic aberrations and genomic alterations [1]. In 2018, about 18 million new cancer cases and about 9.5 million cancer deaths were recorded worldwide, with lung cancer and breast cancer (BC) being the leading causes of cancer mortality in men and women, respectively [2]. BC is the most common cancer among women in India, accounting for 27.7% of all cancers in women [3]. A study conducted in the Kashmir province of Jammu and Kashmir (J&K) found BC to be the second most common cancer in women (16.1% of all cancers), closely following colorectal cancer (16.8%) [4]. Alongside the rising rate of BC in J&K, post-menopausal women show a high susceptibility to developing BC, with numerous factors at play [5, 6]. Early menarche and late menopause prolong exposure to the hormone estrogen, thereby increasing the risk of BC [7]. Various studies have identified variants in high-penetrance genes such as BRCA1, BRCA2, PTEN, TP53, CDH1, and STK11, as well as in moderate-penetrance genes such as CHEK2, BRIP1, ATM, and PALB2, and have reported their association with BC [8, 9]. About 182 BC susceptibility loci have been identified [10], which together account for 30% of the genetic heritability of BC [11]. Given this missing heritability and the scant literature on BC in the studied population, we investigated cancer-associated variants [12, 13] to gain insight into their role in BC development in the population of Jammu and Kashmir. To bridge this gap, we conducted a case-control association study in post-menopausal women from North India.
Discussion
A replicative case-control study of about 550 samples was conducted in the population of Jammu & Kashmir, with the variants genotyped on the Agena MassARRAY platform. We identified four loci that were significantly and independently associated with BC development in the population under study. The variants rs12190287 and rs1051266, located in TCF21 and SLC19A1, respectively, confer risk of BC in our population group. Six other variants were in Hardy–Weinberg equilibrium (HWE) but showed no significant association with BC development. The allele frequencies of all the variants are shown in Figs. S2–S11.
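HWE and the allele frequencies reported above are both simple functions of genotype counts. The following Python sketch shows one common way to compute them, using a Pearson chi-square goodness-of-fit test with one degree of freedom. This test is commonly applied to control genotypes. The function names and the genotype counts in the usage note are hypothetical and are not taken from this study, whose exact HWE procedure is not described in this section.

```python
import math


def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele B, given counts of genotypes AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    return (n_ab + 2 * n_bb) / (2 * n)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test for Hardy-Weinberg equilibrium (1 degree of freedom).

    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    q = allele_frequency(n_aa, n_ab, n_bb)  # frequency of allele B
    p = 1.0 - q                              # frequency of allele A
    # Genotype counts expected under HWE: p^2, 2pq, q^2 times the sample size.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, the chi-square upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, the hypothetical counts (25, 50, 25) give an allele frequency of 0.5, a chi-square of 0 and p = 1.0, which is a perfect fit to HWE.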
The variant rs1051266 is located in the SLC19A1 gene. SLC19A1 (Solute Carrier Family 19 Member 1) has been implicated in placental carcinoma and pediatric osteosarcoma. Studies have associated SLC19A1 variants with BC risk in populations worldwide [35], including African American women [36]. Our data revealed rs1051266 to be significantly associated with BC risk in the population under study. Further, bioinformatic analysis revealed that the associated variants are conserved in primates, including humans, and are located in conserved domain regions (Fig. S13). We also examined the expression quantitative trait locus (eQTL) effects of the variants using Genotype-Tissue Expression (GTEx) data, which relate genetic variation to gene expression across human tissues. The normalized effect size (NES) values are shown in Table S3. The variant rs1051266 (SLC19A1) showed a significant eQTL effect in breast tissue, with an NES of −0.4333 and a p-value of 2.4 × 10⁻⁶.
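In GTEx, the NES of an eQTL is the slope of a linear regression of normalized expression on genotype. It is computed for the alternative allele relative to the reference allele. A negative NES (as for rs1051266) therefore means that each copy of the alternative allele is associated with lower expression. The Python sketch below illustrates only that core idea, using an ordinary least-squares slope on alternative-allele dosage (0, 1, or 2). It omits the expression normalization and covariate adjustment used in the GTEx pipeline, and the function name and data are hypothetical.

```python
from statistics import fmean


def eqtl_slope(dosages: list[int], expression: list[float]) -> float:
    """OLS slope of expression on alternative-allele dosage (0, 1 or 2 copies).

    A simplified stand-in for an eQTL effect size: no normalization, no covariates.
    """
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = fmean(dosages)
    mean_y = fmean(expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    return sxy / sxx
```

With the hypothetical dosages [0, 0, 1, 1, 2, 2] and expression values [1.0, 1.2, 0.6, 0.8, 0.2, 0.4], the slope is −0.4. This mirrors the sign convention of a negative NES: expression decreases with each copy of the alternative allele.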
TCF21 (Transcription Factor 21) is a tumor suppressor gene associated with uterine corpus carcinoma and pericoronitis. TCF21 is found mutated in several types of cancer [37]. Studies have shown lower TCF21 expression in breast tumor tissues, associated with larger tumor size and increased lymph node metastasis [38]. We analyzed the variant rs12190287 G > C of TCF21 and found it to be significantly associated with increased BC risk in the studied population. The variant rs12190287 (TCF21) showed a significant eQTL effect in breast tissue, with an NES of 0.210 and a p-value of 3.3 × 10⁻⁵. The positive NES indicates that the alternative allele is associated with up-regulated expression in breast tissue.
ERCC1 (Excision Repair Cross-Complementation group 1), which harbors the rs2298881 variant, functions in the nucleotide excision repair pathway [39]. ERCC1 has been associated with multiple cancers, and ERCC1 variants have also been linked to an increased risk of BC in women [40]. In our study, the variant rs2298881 C > A was significantly associated with BC and conferred protection against BC in the studied population. The variant rs2298881 (ERCC1) showed a significant eQTL effect in breast tissue, with an NES of −0.260 and a p-value of 3.8 × 10⁻⁹.
DCC (Deleted in Colorectal Cancer) encodes the netrin-1 receptor, a transmembrane receptor belonging to the immunoglobulin superfamily. DCC is a tumor suppressor gene that is frequently mutated in colorectal carcinomas. It is abundantly expressed by neurons, where it stimulates cell survival and axon regeneration. Beyond colorectal cancer, studies have highlighted a role for DCC in BC: the DCC variant rs2229080 has been associated with increased BC risk [13]. Our study revealed that rs2229080 G > C was significantly associated with BC, with the altered allele C conferring protection in the studied population. However, rs2229080 (DCC) showed no significant eQTL effect in breast tissue (NES = 0.054, p = 0.3). In summary, the positive NES of rs12190287 (TCF21) indicates up-regulated expression in breast tissue. In contrast, the negative NES values of rs1051266 (SLC19A1) and rs2298881 (ERCC1) point towards down-regulated expression.
RNAfold analysis revealed differences in minimum free energy (MFE) and secondary structure between the wild-type and altered alleles. For the risk variants rs12190287 and rs1051266, the wild-type allele had a lower MFE than the altered allele, giving it a more stable secondary structure. Conversely, for the protective variants rs2229080 (DCC) and rs2298881 (ERCC1), the altered allele had the lower MFE, suggesting a more stable secondary structure for the altered allele. Previous studies have described a codon selection bias favoring more negative free energy and greater folding stability of RNA secondary structure [10]. Given the many roles of RNA structure in cancer development [11], these structural differences might contribute to BC risk in the population. Further analysis of the secondary structures of the variant-bearing sequences showed substantial differences in both the MFE and the MFE of the centroid secondary structure. The MFE values of the variants are summarized in Table S4, and the secondary structures of the alleles are shown in Fig. 1.
On comparing the allele frequencies of the associated alleles with 1000 Genomes Project data, we found substantial differences between populations (Fig. S12). The variant rs1051266 showed an intermediate allele frequency of about 0.4 (on a scale from 0 to 1) in the Indian subcontinent populations, comprising PJL (Punjabi in Lahore, Pakistan), ITU (Indian Telugu in the UK), and STU (Sri Lankan Tamil in the UK). Similar allele frequencies were observed in the GIH (Gujarati Indians in Houston, Texas), BEB (Bengali in Bangladesh), GBR (British in England and Scotland), and CEU (Utah residents with Northern and Western European ancestry) populations. The frequency of rs12190287 was higher, around 0.7, in the Indian subcontinent, with similarly high frequencies in the BEB, MXL (Mexican ancestry in Los Angeles, USA), and PEL (Peruvians in Lima, Peru) populations. The variant rs2298881 had a comparatively low frequency worldwide: around 0.2 in India, with similar frequencies in the BEB and ASW (Americans of African ancestry in SW USA) populations. Higher frequencies were observed in East Asian populations, including JPT (Japanese in Tokyo, Japan) and CHB (Han Chinese in Beijing, China). The variant rs2229080 had a very high frequency of about 0.8 in the Indian population, with similarly high frequencies in the JPT and BEB populations. Such similar allele frequencies could point to a similar genetic susceptibility to BC in these regions. The marked differences in genetic background across populations make it essential to analyze genetic heterogeneity among them.
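The allele-wise stability comparison described above can be expressed as ΔMFE = MFE(altered) − MFE(wild type). The MFE values (in kcal/mol) come from a folding tool such as RNAfold, and a negative ΔMFE means the altered allele folds into a more stable structure. The Python sketch below does not fold RNA; it assumes the MFE values have already been computed. It only builds the two allele-specific input sequences around a single-nucleotide variant and classifies the difference. The helper names, flanking sequences, and MFE values are hypothetical.

```python
def to_rna(seq: str) -> str:
    """Convert a DNA sequence to upper-case RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def allele_sequences(flank5: str, ref: str, alt: str, flank3: str) -> tuple[str, str]:
    """Build wild-type and altered RNA sequences around a single-nucleotide variant."""
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("ref and alt must be single nucleotides")
    return to_rna(flank5 + ref + flank3), to_rna(flank5 + alt + flank3)


def delta_mfe(mfe_wild: float, mfe_alt: float) -> float:
    """MFE(altered) - MFE(wild type) in kcal/mol; negative means the altered allele is more stable."""
    return mfe_alt - mfe_wild


def more_stable_allele(mfe_wild: float, mfe_alt: float) -> str:
    """Return which allele has the lower (more negative) MFE."""
    d = delta_mfe(mfe_wild, mfe_alt)
    if d < 0:
        return "altered"
    if d > 0:
        return "wild"
    return "equal"
```

The two sequences returned by allele_sequences would each be folded separately. Their MFEs are then passed to more_stable_allele. In this document's results, the wild-type allele was more stable for the risk variants, and the altered allele was more stable for the protective variants.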
Conclusion
To our knowledge, this is the first study to provide preliminary data on the role of 15 variants in BC development in the J&K population. Among the studied variants, four showed significant association with BC: rs1051266, rs12190287, rs2229080, and rs2298881. Logistic regression adjusted for age and BMI showed the best-fitting genetic model to be dominant for rs1051266 (SLC19A1) and rs12190287 (TCF21), allelic for rs2229080 (DCC), and additive for rs2298881 (ERCC1). Each variant showed its maximum effect under its best-fitting model. RNAfold analysis revealed structural variations between the wild-type and altered alleles, along with differences in the free energy of their secondary structures. In silico analysis showed these variants to be evolutionarily conserved, having undergone minimal alteration over generations, which enables them to maintain the stability of their putative structure and function. Three variants, rs1051266, rs12190287, and rs2298881, showed significant eQTL effects. Network analysis gave deeper insight into the interactions of the candidate genes with other common proteins, suggesting a common functional pathway. The associated loci may affect the development of BC in the women of Jammu and Kashmir and should be verified in independent datasets. To better understand the effects of these genes, more of their variants should be examined, as other, as-yet unstudied variants may also be meaningfully associated with BC. Our results provide a basis for functional validation studies to reveal the genetic mechanisms underlying BC. Once validated, these SNPs could aid the development of a BC panel specific to the population under study. Such targeted testing could help identify women at risk. The resulting early detection could benefit treatment and support personalized medicine for patients with BC.
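The genetic models named above differ mainly in how each genotype, coded as the number of alternative alleles (0, 1, or 2), is entered into the regression. The Python sketch below shows the usual dominant, recessive, and additive codings. It also computes an allelic odds ratio with a Woolf 95% confidence interval from allele counts. This is an illustration under assumptions: the function names and counts are hypothetical, and the study's exact definition of the allelic model is not stated here. The sketch also does not reproduce the study's logistic regression, in which the coded genotype would be fitted together with age and BMI as covariates.

```python
import math


def encode_genotype(alt_count: int, model: str) -> int:
    """Code a genotype (number of alternative alleles: 0, 1 or 2) under a genetic model."""
    if alt_count not in (0, 1, 2):
        raise ValueError("alt_count must be 0, 1 or 2")
    if model == "additive":
        return alt_count            # per-allele (log-additive) effect
    if model == "dominant":
        return int(alt_count >= 1)  # carriers (1 or 2 copies) vs non-carriers
    if model == "recessive":
        return int(alt_count == 2)  # homozygous alternative vs all others
    raise ValueError(f"unknown model: {model}")


def allelic_odds_ratio(cases, controls, z=1.96):
    """Allelic odds ratio with a Woolf confidence interval.

    cases and controls are genotype counts (n0, n1, n2) for 0, 1 and 2
    alternative alleles. Returns (odds_ratio, lower, upper).
    """
    alt_case = cases[1] + 2 * cases[2]
    ref_case = 2 * cases[0] + cases[1]
    alt_ctrl = controls[1] + 2 * controls[2]
    ref_ctrl = 2 * controls[0] + controls[1]
    cells = (alt_case, ref_case, alt_ctrl, ref_ctrl)
    if min(cells) == 0:
        raise ValueError("zero allele count: a continuity correction would be needed")
    odds_ratio = (alt_case * ref_ctrl) / (ref_case * alt_ctrl)
    se_log_or = math.sqrt(sum(1 / c for c in cells))  # Woolf standard error of log(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)
```

With hypothetical genotype counts of (30, 50, 20) in cases and (40, 45, 15) in controls, allelic_odds_ratio returns an odds ratio of 15/11 ≈ 1.36. An odds ratio above 1 would indicate a risk allele; one below 1 would indicate a protective allele.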
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.