Introduction

Annually, almost one million people develop gastric cancer (GC) and ~723 000 people die of this disease worldwide.1 This makes GC the fifth most common malignancy and the third leading cause of cancer-related mortality worldwide.1 In Western Europe, the incidence of GC is 8.8 per 100 000 persons for men and 4.3 per 100 000 persons for women.1

GC is a multifactorial disease in which both genetic and environmental factors are involved. The main environmental factor is infection with Helicobacter pylori, which increases the risk of developing GC about six-fold.2 The World Health Organization (WHO) classified H. pylori as a class I carcinogen in 1994.3, 4

GC is a heterogeneous disease and can be roughly divided into three main types: diffuse-type GC, intestinal-type GC and a remaining group composed of mixed and indeterminate GC types.5 Diffuse-type GC (DGC) consists of poorly cohesive single cells without gland formation. Due to the frequent presence of signet ring cells, this type of GC is often referred to as signet ring cell carcinoma. Intestinal-type GC (IGC) is composed of glandular or tubular components with various degrees of differentiation.6

In both low and high GC incidence countries, around 8–30% of patients with GC have a familial history of GC.7, 8, 9, 10, 11 Germline CDH1 pathogenic mutations, predisposing to hereditary diffuse gastric cancer (HDGC), have been encountered in a subset of GC families.12, 13, 14, 15, 16, 17, 18, 19 The International Gastric Cancer Linkage Consortium has recently broadened the CDH1 testing criteria with the aim to identify as many CDH1 mutation carriers as possible.20

Families in whom no germline CDH1 mutation can be identified remain genetically unexplained and may carry pathogenic mutations in other, as yet unknown, GC susceptibility genes. Recently, DGC families with mutations in CTNNA1 (refs 16, 21) and MAP3K6 (ref. 22) have been described, but the exact contribution of these genes to GC predisposition will remain unclear until more families with mutations in these genes are reported. In families with IGC exhibiting an autosomal dominant inheritance pattern, susceptibility genes may also play a role, but no genes have yet been associated with this type of GC.

The aim of the current study was to identify novel candidate GC susceptibility genes using whole-exome sequencing of germline DNA isolated from the blood of patients suspected of genetic predisposition for GC, but without CDH1 mutations.

Materials and methods

Patient selection for exome sequencing

In our exome sequencing cohort, 54 patients from 53 families meeting one of the following criteria were included: one gastric cancer diagnosed below the age of 35 years, two GC cases diagnosed in first- or second-degree relatives at or below the age of 60 years (index diagnosed at or below the age of 50 years) or three cases of GC in first- or second-degree relatives diagnosed at or below 70 years of age. The majority of the patients (n=33) had previously been proven negative for CDH1 mutations. For each family a single patient was included, with the exception of one family for which two patients were tested. Patient characteristics are shown in Table 1. This study was approved by the medical ethics committee of the Radboud university medical center, reference number 2013/201 and the Institutional Review Board of the Baylor College of Medicine.
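The three inclusion criteria can be written as a simple predicate. The sketch below is illustrative only: the `Family` record and its field names are hypothetical, and it assumes that the case counts in the second and third criteria include the index patient, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Family:
    index_age: int            # age at GC diagnosis of the index patient
    relative_ages: List[int]  # ages at GC diagnosis of affected first-/second-degree relatives


def meets_inclusion(fam: Family) -> bool:
    """Return True if the family meets at least one of the three criteria."""
    ages = [fam.index_age] + fam.relative_ages  # assumption: index counts as a case
    early_onset = fam.index_age < 35
    two_cases = fam.index_age <= 50 and sum(a <= 60 for a in ages) >= 2
    three_cases = sum(a <= 70 for a in ages) >= 3
    return early_onset or two_cases or three_cases
```

Under these assumptions, a patient diagnosed at 30 without affected relatives qualifies through the first criterion, whereas an index patient diagnosed at 55 with a single relative diagnosed at 58 meets none of them.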

Table 1 Patient characteristics exome sequencing cohort

Exome sequencing, variant annotation and exclusion of normal variation

Detailed information on the sequencing statistics of individual samples can be found in Supplementary Table 1 Online. Whole-exome sequencing of genomic DNA extracted from peripheral blood cells of the patient was performed using the 5500XL SOLiD platform (Life Technologies, Bleiswijk, The Netherlands) for 26 samples and on the Illumina HiSeq (2x100bp paired end; Illumina, Inc., San Diego, CA, USA) for 13 samples (BGI, Copenhagen, Denmark). Exome enrichment was performed using either the human SureSelect All Exon 50 Mb kit (n=11) or the human SureSelect All Exon V4 kit (n=28), targeting the coding regions of ~21 000 human genes (Agilent Technologies, Santa Clara, CA, USA). Reads were mapped to the Human Genome Reference Assembly GRCh37/hg19 using LifeScope software (Life Technologies) for samples sequenced on the SOLiD instrument and variants were called with the DiBayes algorithm. Exomes that were sequenced on the Illumina instrument were mapped using BWA and variants were called with GATK. All variants were annotated using an in-house annotation pipeline, as described previously.23, 24

For 15 patients, exome sequencing was performed through the Human Genome Sequencing Center at Baylor College of Medicine, according to previously described methods.25, 26 Sequencing was performed on the Illumina HiSeq 2000 platform (Illumina, Inc.). Subsequently, reads were mapped and aligned to the Human Genome Reference Assembly GRCh37/hg19 using the BCM-HGSC Mercury pipeline.27 Variant calling was performed with the Atlas2 (ref. 28) and SAMtools (ref. 29) algorithms; variant annotation was performed with an in-house developed annotation pipeline30 based on ANNOVAR.31 Custom scripts incorporating multiple databases were used to retrieve more information on identified variants.

From our total set of variants we selected high-confidence (≥5 variant reads or ≥20% variant reads) non-synonymous variants that were absent from dbSNP or had a dbSNP (v132) frequency <1% and that occurred at most once in our in-house variant database (2096 exomes, the majority of which are of European ancestry).23
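The selection rule above can be summarized as a single filter. This is a minimal sketch, not the in-house pipeline: the `Variant` record and its field names are hypothetical, and the thresholds are taken directly from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    variant_reads: int
    total_reads: int
    is_nonsynonymous: bool
    dbsnp_freq: Optional[float]  # None means the variant is absent from dbSNP
    inhouse_count: int           # occurrences in the in-house variant database


def passes_filter(v: Variant) -> bool:
    """Apply the read-support, consequence and frequency criteria."""
    fraction = v.variant_reads / v.total_reads if v.total_reads else 0.0
    high_confidence = v.variant_reads >= 5 or fraction >= 0.20
    rare_in_dbsnp = v.dbsnp_freq is None or v.dbsnp_freq < 0.01
    rare_in_house = v.inhouse_count <= 1
    return high_confidence and v.is_nonsynonymous and rare_in_dbsnp and rare_in_house
```

A variant supported by 3 of 10 reads, for example, passes the confidence step through the 20% rule even though it has fewer than five variant reads.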

Enrichment of truncating variants compared to controls

The number of different truncating variants (nonsense variants, indels leading to a frameshift and variants in canonical splice sites) per gene was established for our data set and for an independent in-house database containing 2329 exomes. Also, the number of genes with a given number of variants was determined for the combined datasets. Fisher’s exact test (incorporated in the IBM SPSS Statistics software package version 20, IBM Corporation, Armonk, NY, USA) was used to determine whether the number of variants for a certain gene in our set was enriched compared with the control data set. To correct for multiple testing we used the modified Bonferroni procedure for discrete data32 based on the number of genes with the same number of truncating mutations as observed in the gene of interest. All genes with a P-value<0.05 after multiple testing correction were included for further analysis.
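For illustration, a per-gene comparison of this kind can be sketched in plain Python. The analysis itself was run in SPSS; the code below is only an assumed equivalent. It treats each gene as a 2×2 table of exomes with and without a truncating variant in cases and controls, which is one possible layout, and it applies an ordinary Bonferroni multiplication rather than the modified procedure of ref. 32. The counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Unnormalised hypergeometric weight of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum all tables that are at most as probable as the observed one.
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return extreme / comb(total, col1)


def bonferroni(p_value: float, n_tests: int) -> float:
    """Plain Bonferroni adjustment, capped at 1 (a simplification of ref. 32)."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: 2 of 54 cases and 3 of 2329 controls carry a truncating variant.
p = fisher_exact_two_sided(2, 52, 3, 2326)
p_adjusted = bonferroni(p, n_tests=12)
```

Comparing integer weights before normalizing avoids floating-point ties when deciding which tables are as extreme as the observed one.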

Variant prioritization based on gene function

Missense variants with a PhyloP≥3 in selected pathways (see below) were analyzed using the Alamut 2.0 software package (Interactive Biosoftware, Rouen, France), which incorporates SIFT,33 PolyPhen-2,34 Align GVGD35 and dbSNP (build 135). Missense variants that were predicted deleterious/damaging by at least two of these programs were considered possibly deleterious. Data were analyzed using Alamut software between September 2013 and December 2014.
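The two-out-of-three rule can be expressed as follows. This is a sketch under the assumption that each predictor's output has already been reduced to a yes/no call; the function name and the dictionary layout are hypothetical and do not reflect how Alamut exposes its results.

```python
from typing import Dict


def possibly_deleterious(phylop: float, calls: Dict[str, bool]) -> bool:
    """Require PhyloP >= 3 and a deleterious call from at least two predictors."""
    if phylop < 3:
        return False
    return sum(calls.values()) >= 2


# Example: conserved position, two of three predictors call the variant damaging.
example = possibly_deleterious(
    4.2, {"SIFT": True, "PolyPhen-2": True, "Align GVGD": False}
)
```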

These possibly deleterious missense variants and the truncating variants were prioritized based on gene function using the following criteria. The first criterion included variants in known hereditary (gastric) cancer predisposing genes. For this analysis we used an in-house generated list of 113 genes (Supplementary Table 2 Online). In addition, we assessed the recently described GC-predisposing genes CTNNA1 (refs 16, 21) and MAP3K6 (ref. 22) for variants. Second, we selected genes putatively involved in GC development. A gene list (Supplementary Table 2 Online) for this category was composed by combining a list of known tumor-suppressor genes36 with genes from the following KEGG pathways:37 regulation of actin cytoskeleton (entry 04810), adherens junction (entry 04520), focal adhesion (entry 04510), epithelial cell signaling in H. pylori infection (entry 05120) and pathways in cancer (entry 05200). Third, based on the detection of a homozygous putatively deleterious variant in MYD88 in one of the patients of this cohort,38 we used the Resource of Asian Primary ImmunoDeficiencies (RAPID) gene list,39 an in-house generated candidate gene list and three KEGG pathways (JAK-STAT pathway (entry 04630), NFκB pathway (entry 04064) and TLR pathway (entry 04620))37 to select variants in genes known to predispose to immunodeficiencies. As a fourth criterion, we selected genes with a high expression in the stomach (based on data from the Tissue-specific Gene Expression and Regulation (TiGER) database40). The combined gene list for the categories mentioned above can be found in Supplementary Table 2 Online.
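Combining the four categories into one gene list amounts to a set union, with the category membership kept for reporting. The sketch below uses placeholder gene symbols (`GENE_A` and so on) and hypothetical category names; the actual lists are those in Supplementary Table 2 Online.

```python
from typing import Dict, List, Set

# Placeholder content only; real gene symbols are in Supplementary Table 2 Online.
categories: Dict[str, Set[str]] = {
    "hereditary_cancer": {"GENE_A", "GENE_B"},
    "gc_development": {"GENE_A", "GENE_C"},
    "immunodeficiency": {"GENE_C"},
    "stomach_expression": set(),
}

# The combined list is the union of all category lists.
combined: Set[str] = set().union(*categories.values())


def categories_for(gene: str) -> List[str]:
    """Return the sorted names of all categories that contain the gene."""
    return sorted(name for name, genes in categories.items() if gene in genes)
```

Keeping the per-category sets alongside the union makes it straightforward to tabulate variants per pathway or database afterwards.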

The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP; 6503 exomes) database (hereafter referred to as EVS; ref. 41), which contains sequencing data of ~6500 individuals of European and African descent, was used to assess whether selected variants were present in individuals selected for diseases other than cancer. Furthermore, we used a second in-house database containing 2329 exomes with high coverage to exclude common variants. Finally, the database from the Exome Aggregation Consortium (ExAC; ref. 42) was used to obtain the frequency of specific variants in a larger control population.

For all truncating variants affecting genes not represented in the selected pathways presented above, the possible relation of the affected gene with GC tumorigenesis was evaluated based on the known function of the gene.

Variants described in this manuscript and Supplementary Tables Online are submitted to the Leiden Open Variant Database (LOVD, ID numbers 103989–104041).

Validation of variants and CDH1 exon 1 germline mutation analysis by Sanger sequencing

The DNA sequence surrounding the variant was amplified using polymerase chain reaction (PCR; primer sequences and PCR conditions are available on request) and screened for mutations using BigDye terminator sequencing (BigDye Terminators (v 1.1), Applied Biosystems, Foster City, CA, USA). Analysis was performed on an ABI 3730 DNA Analyzer (Applied Biosystems). Subsequently, the data were analyzed using Vector NTI advance v11.0 (Invitrogen Corporations, Paisley, UK) or Chromas Lite (Technelysium, Australia). For mutation analysis of CDH1 exon 1 in a subset of the patients, primers surrounding the intron-exon boundaries of this exon were used. PCR and sequencing were performed as described for variant validation.

Results

Patient cohort and characteristics

Whole-exome sequencing was performed on germline DNA from 54 patients of 53 families. In this cohort 23 patients below the age of 35 were included (two without a family history of GC), 16 patients had two cases of GC in the family at or below the age of 60 and 15 patients were from families with three or more GC cases at or below 70 years of age; in this group two patients from one family were included. The mean age at diagnosis of all patients included was 37.9 years (SD 11.9, range 22–70).

According to the original pathology reports, 27 patients had DGC, 8 patients had IGC, one GC was mixed-type and 18 tumors were ‘adenocarcinoma not otherwise specified’. For 17 cases, we were able to review and confirm the histology of the GC (12 patients with DGC and five with IGC). For the remaining cases no revision could be performed.

Exome sequencing statistics

Three different enrichment kits and two different sequencing platforms were used for exome sequencing. On average, 5.2 Gb of data aligned to targets was generated per sample (range: 2–10.2 Gb), hitting 98.9% of the targets (96–99.98%) with an average coverage of 81.7 × (36.8–132 ×). A coverage of at least 10-fold was reached for 93.3% of the targets (81.1–99.31%) and 87.7% was covered at least 20-fold (68.9–98.19%). The statistics for individual exome sequencing data can be found in Supplementary Table 1 Online. Approximately 9200 variants from 54 cases remained (average 170, range 87–379). To test our quality settings, we performed Sanger sequencing on a subset of these variants and found that we were able to confirm 93% of the variants.

Variants in previously described gastric cancer predisposition genes

None of the cases carried pathogenic mutations in CDH1. Also, no variants were found in CTNNA1, which has recently been described as a GC-predisposing gene.16, 21 Three variants were identified in MAP3K6: two missense variants (p.Y591C and p.L541P) and one amino acid deletion (p.K1125del). MAP3K6 has been associated with familial GC.22 However, based on the high number of MAP3K6 variants found in non-cancerous controls, we did not follow up on these variants.

Enrichment of truncating variants compared with controls

As the deleterious effect of truncating variants (nonsense variants, indels leading to a frameshift and variants in canonical splice sites) is most prominent, we tested whether the recurrence of truncating mutations in a given gene was different from that in a control exome sequence data set of 2329 individuals. In our data set, in 12 genes two different truncating variants occurred (Supplementary Table 3 Online). After correction for multiple testing, no enrichment of truncating variants was found in these genes compared to the control cohort.

Occurrence of homozygous or compound heterozygous variants

To explore the occurrence of pathogenic changes in GC predisposition genes that follow a recessive inheritance pattern, we assessed whether missense and/or truncating variants occurred in a homozygous or compound heterozygous form in our set of patients with an age at diagnosis below 35. Apart from a germline homozygous missense variant in MYD88 in a patient with GC at age 23 and recurrent candidiasis, which we previously published,38 no other candidate genes were found.

Variant prioritization based on gene function

Because of the large number of missense and truncating variants, prioritization was performed based on the function and recurrence of the affected genes. To select variants that may be involved in GC predisposition, we created a gene list composed of 1899 genes (for details see Materials and Methods and Supplementary Table 2 Online). Our exome data were then filtered for variants in these genes. After in silico prediction, 252 possibly deleterious variants were identified using this approach (on average 5 per patient, range 1–11). The number of variants in the individual pathways and databases can be found in Table 2; variant details are shown in Supplementary Table 4 Online (excluding variants that were not confirmed after validation using Sanger sequencing). Twenty-one variants were identified in known cancer predisposing genes (Table 2; Supplementary Table 4 Online), including four different heterozygous variants (two truncating and two missense) in the ATM gene, previously associated with a small increased GC risk (RR=3.39, 95% CI=0.86 to 13.4).43 No obvious candidate GC predisposition genes were identified from either the hereditary cancer list or the other pathways we selected. In addition, for all truncating variants affecting genes not represented in the selected pathways, the putative relation with GC tumorigenesis of the affected gene was evaluated based on the known function of the gene. This did not result in a convincing candidate gene.

Table 2 Number of potentially deleterious variant calls in different pathways

Discussion

In the current study, whole-exome sequencing was performed on germline DNA from 54 GC patients from 53 families with the aim to identify novel GC susceptibility genes. In order to increase the likelihood of finding a putative causative mutation underlying GC, these patients were selected from families at high risk of a genetic predisposition, who met strict inclusion criteria. No clear novel GC predisposition gene was identified.

Mutations in CDH1 were not detected in the 21 cases for which no CDH1 mutation analysis had been performed prior to our study. In two recent studies, germline mutations in CTNNA1 were identified in families with GC.16, 21 This gene is in the same pathway as E-cadherin, making it a plausible GC predisposition gene. In our data set of 54 patients no mutations in CTNNA1 were found, which indicates that mutations in this gene probably do not explain a large proportion of the early-onset or familial gastric cancer in patients who tested negative for germline CDH1 mutations.

Gaston et al. reported on variants in the MAP3K6 gene in familial GC.22 We have also observed variants in this gene, but we do not consider it a strong GC candidate gene for two reasons. Firstly, in the study by Gaston et al. the gene variant p.P946L was identified in a large family, but the variant does not completely segregate with the disease.22 Secondly, this variant occurs quite frequently in the ExAC database (n=640; 0.5% allele frequency). This argues against its pathogenicity, because a variant that occurs so frequently in a database containing exomes of people without suspicion of hereditary cancer is not expected to cause GC, a relatively rare form of hereditary cancer.

The observation that variants in genes reported as GC candidates occur frequently in controls stresses the importance of determining the frequency of variants in candidate genes in local and public control datasets, in addition to assessing functional relevance. In the current study, we compared our exome data with three datasets. The first is an independent in-house database containing 2329 exomes sequenced with high coverage. The second is the EVS database,41 which contains sequencing data of ~6500 individuals of European and African descent. The third is the ExAC database,42 containing exome data of 60 706 unrelated individuals. These datasets allowed for more stringent filtering of the data.
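The ExAC frequency quoted above for MAP3K6 p.P946L can be checked with simple arithmetic. The calculation assumes that n=640 is an allele count and that all 60 706 individuals were successfully genotyped at this autosomal site; the true denominator at a given position may be somewhat lower.

```python
individuals = 60_706  # unrelated individuals in ExAC, as stated in the text
allele_count = 640    # reported count for MAP3K6 p.P946L

# Assumption: two alleles per individual and no missing genotypes at this site.
chromosomes = 2 * individuals
allele_frequency = allele_count / chromosomes

print(f"{allele_frequency:.2%}")  # prints 0.53%
```

This agrees with the rounded value of 0.5% given in the text.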

Even though we identified over 9200 rare variants in these 54 patients, we were unable to show unequivocally that disease-causing variants, and therefore GC-predisposing genes, are among them. There may be several reasons why we did not find clear candidate genes in the current study. First, the large number of variants in our data set may have impaired our ability to recognize candidate genes as such. Second, a gene that, like CTNNA1, accounts for only a small proportion of GC families may well have remained undetected in our data set. Third, our strictly selected patient cohort might be too small to identify candidate genes, especially to determine enrichment of genes or pathways compared to controls. Fourth, even though we used strict criteria for the selection of families, the patient group that we included in this study is still quite heterogeneous. We included patients with both DGC and IGC, who were either diagnosed at a young age or had a family history of GC. For a number of cases the histological subtype was unknown, underscoring the importance of extensive pathological review and reporting of GC according to current guidelines.20 Whole-exome sequencing in a more homogeneous patient cohort may well allow for improved detection of candidate genes, although other studies also did not identify promising new GC predisposition genes.16, 44, 45 Also, we performed exome sequencing, whereas predisposing variants may also lie in non-protein-coding parts of the genome, which were not analyzed here. Since we collected only one family member for each family and affected family members are often deceased due to the cancer, we were not able to follow up on potential candidate genes. Finally, it is possible that some of the patients we included developed cancer because of chance and occasional familial clustering, or because of complex inheritance involving multiple genomic alterations.

Future perspectives

Taken together, we performed exome sequencing in 54 CDH1 mutation-negative patients from 53 families. In this study we did not identify obvious candidate genes for GC predisposition. Future studies should be performed in larger, homogeneous patient groups and we would suggest that data from different research groups should be combined to identify candidate genes in these families. If candidate genes are identified this way, it will enable better preventive care in carriers of these mutations.