Background
According to the 2015 annual report, 1,824,700 new lung cancer cases are diagnosed each year, accounting for 13% of all cancers, excluding non-melanoma skin cancers. Lung cancer also remains the leading cause of cancer mortality worldwide, making it a major global healthcare problem [1]. According to the 2015 yearbook of the National Cancer Registration and Statistics in Korea, lung cancer occurred in 66.0 per 100,000 males and 28.7 per 100,000 females [2]. Compared with other countries, Korea ranks 10th in the incidence of male lung cancer and 4th in the incidence of female lung cancer [3]. The overall incidence in Korea is not significantly different from that in other countries, but the incidence of lung cancer in non-smokers and women is higher in Korea, and epidermal growth factor receptor (EGFR) mutations are detected much more frequently than in Western countries.
The main causes of lung cancer are direct and indirect smoking, radon, indoor emissions from household combustion, and exhaust from diesel engines (https://monographs.iarc.fr/agents-classified-by-the-iarc/) [4]. Mutational analysis of lung cancer using publicly available data such as TCGA has shown that the smoking-related signature, with many C > A transversions, is a dominant signature in lung adenocarcinoma and lung squamous cell carcinoma [5]. Somatic mutations in cancer are caused by infidelity of the DNA replication machinery as well as by defects in DNA repair mechanisms following exposure to endogenous or exogenous mutagens [6]. The somatic mutations observed in some cancers are significantly related to exposure to a specific carcinogen, such as tobacco smoke in lung cancer and ultraviolet light in skin cancer [7].
Certain mutational processes in cancer are often accompanied by unique combinations of mutation types called signatures [8, 9]. Alexandrov et al. developed a theoretical model and computational framework that deconstructs the distinct patterns of somatic mutations in cancer sequencing data, based on the analysis of somatic substitutions obtained from whole genome sequencing of breast cancer patients [10, 11]. Among the 30 signatures identified, signature 4, which is characterized by a majority of C > A mutations along with some other base substitution classes, is found only in cancer types in which smoking is a major risk factor and in epithelial cancers that are directly exposed to cigarette smoke. This signature is similar to the mutational pattern produced by exposing cells to benzo[a]pyrene, a major carcinogen in tobacco. The pattern arises during nucleotide excision repair after a bulky DNA adduct binds to guanine [5, 8].
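As a minimal illustration of the substitution classes discussed above, the following Python sketch collapses each single-nucleotide variant onto its pyrimidine reference base, the convention under which a G > T change is counted as C > A, and tallies the six resulting classes. The function names and the (ref, alt) input format are assumptions made for this example and are not part of the study's pipeline.

```python
from collections import Counter

# Watson-Crick complements, used to express every SNV from the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six pyrimidine-centred substitution classes, split into
# transitions (Ti) and transversions (Tv).
TRANSITIONS = {"C>T", "T>C"}
TRANSVERSIONS = {"C>A", "C>G", "T>A", "T>G"}


def substitution_class(ref: str, alt: str) -> str:
    """Return the pyrimidine-centred class (e.g. 'C>A') of a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the equivalent change on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(snvs):
    """Count the six substitution classes for an iterable of (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    return {cls: counts.get(cls, 0) for cls in sorted(TRANSITIONS | TRANSVERSIONS)}


def transversion_fraction(spectrum):
    """Fraction of SNVs in a spectrum that are transversions (0.0 if empty)."""
    total = sum(spectrum.values())
    if total == 0:
        return 0.0
    return sum(spectrum[cls] for cls in TRANSVERSIONS) / total
```

Real signature analyses extend these six classes with the flanking 5' and 3' bases, giving 96 trinucleotide channels; the six-class version is kept here only because it matches the level at which C > A and C > T are discussed in the text.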
Given that lung adenocarcinoma with an EGFR-tyrosine kinase inhibitor (TKI) sensitizing mutation (mEGFR) is common in light smokers and/or non-smokers, Asians, and women, it is expected to be subject to mutational pressures other than cigarette smoking [12]. The L858R mutation and the exon 19 deletion (E19del), which includes the LREA motif, comprise up to 90% of EGFR mutations, followed by L861Q, G719X, and rare mutations [13]. The clinical outcomes of mEGFR-positive lung cancers have improved dramatically with the development of targeted drugs, but these cancers eventually acquire drug resistance and show disease progression [14, 15]. The clinical courses and responses to EGFR-TKIs differ between the E19del and L858R groups, which are the representative subtypes of mEGFR [16]. It is therefore necessary to clarify the carcinogenesis of these subtypes by estimating the significant mutagenic stressors, and to take a proactive approach to these cancers.
In this study, we investigated whether there is a distinctive mutation pattern in the subtypes of lung adenocarcinoma responsive to EGFR-TKI. Targeted sequencing of major cancer-related genes was performed on lung adenocarcinomas with mEGFR (study cohort), and the characteristics of the detected mutations were analyzed. In addition, the mutation characteristics of the L858R and E19del subtypes, which account for the majority of mEGFR, were compared. Finally, publicly available whole exome sequencing data from TCGA-LUAD cases with mEGFR (LUAD cohort) were analyzed and used to verify the mutational characteristics of the study cohort. The characteristics of the genetic variations were analyzed in the context of the mutational signatures proposed by Alexandrov et al. [17].
Discussion
mEGFR-positive lung adenocarcinoma is a distinctive subtype of lung cancer that attracts attention because it is prevalent, accounting for about half of East Asian lung adenocarcinomas, and because it does not involve the typical risk factors for lung cancer such as age, gender, and smoking [19].
C > A transversions, which are related to tobacco smoking, were not the major SNV class, whereas C > T transitions comprised the highest proportion of mutations in both cohorts. Another interesting finding is that the proportion of C > A transversions was higher in the LUAD cohort than in the study cohort, which may be explained by the higher proportion of cases with a smoking history in the LUAD cohort (LUAD cohort: 40.4% vs. study cohort: 27.8%). Further analysis of the LUAD cohort according to EGFR subtype revealed that, in the SNV subgroup, the transversion (Tv) frequency was relatively higher and mutational signature 4 was observed. Although signature 4 was not derived in the study cohort, it is presumed that mutagenic stressors such as smoking are related to the L858R mutation because the Tv frequency was also higher there, similar to the LUAD cohort. The E19del subgroup had a lower mutational burden than the SNV subgroup. Both the targeted panel data of this study and the whole exome data from TCGA-LUAD showed that the mutations of the E19del subgroup were randomly distributed throughout the genome, and no obvious cause could be detected in the demographic characteristics of the E19del subgroup. These findings might be attributable to the younger age in the study cohort and the less frequent smoking history in the LUAD cohort subgroup. Mutational signature 5 was the main variation pattern in the study cohort, whereas signatures 4, 30, and 5 were derived from the analysis of the LUAD cohort. These differences may be attributed to factors such as the higher proportion of smokers in the EGFR-L858R group, the higher age in the LUAD cohort compared with the study cohort (LUAD cohort: 66.30 ± 9.54 vs. study cohort: 60.51 ± 10.31 years), and unidentified racial differences. However, in the additional analysis by subgroup, signature 5 was predominant and commonly derived, indicating that it may be one of the key mutational signatures in this cancer type.
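Signature assignments such as those above are commonly made by comparing an observed mutation spectrum with reference signature profiles, for example by cosine similarity. The sketch below assumes that approach; the text does not state which metric was used, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-negative count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An empty spectrum carries no information about any signature.
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_signature(spectrum, references):
    """Return (name, similarity) of the reference profile closest to the spectrum.

    `references` maps a signature name to its profile; both the names and
    the profiles are supplied by the caller (nothing is bundled here).
    """
    if not references:
        raise ValueError("at least one reference profile is required")
    scores = {
        name: cosine_similarity(spectrum, profile)
        for name, profile in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

As a toy usage example with made-up two-channel profiles, `best_matching_signature([8, 2], {"sigA": [0.9, 0.1], "sigB": [0.1, 0.9]})` selects `sigA`. Real reference profiles have 96 trinucleotide channels, and a cohort is usually decomposed into several signatures at once rather than matched to a single one, so this shows only the comparison step.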
The underlying mutational mechanism of signature 5, which exhibits a transcriptional strand bias for T > C substitutions in the ApTpN context, has yet to be elucidated. This signature is common in papillary renal cell carcinoma, neuroblastoma, and clear cell renal cell carcinoma and, in some cancer types, is associated with increased age. However, a correlation between signature 5 and increased age was not observed in the analysis of whole exome sequencing of lung cancer, and the same held when we examined the demographic characteristics of our lung cancer set [20]. Among cancers arising in the kidney, this signature is characteristic of clear cell and papillary renal cell carcinoma, which arise from cells that continuously absorb metabolites, whereas it is low in chromophobe renal cell carcinoma, which arises in the cortical collecting ducts, suggesting that it may be attributable to replication errors at deaminated cytosine and adenine [21]. Sustained efforts are therefore required to identify mutagenic stressors other than smoking, such as radon, indoor emissions from household combustion, and exhaust from diesel engines, by collecting cases enriched with signature 5 and investigating them from various aspects. In contrast, mutational signature 30, which has been found in a small subset of breast cancers, was observed in the analysis of the LUAD cohort; the cause of this mutation pattern has yet to be determined.
To identify recurrent mutations in specific genes according to mEGFR subtype, concurrent mutations were detected with an oncoplot and then examined further using the oncodrive function in the maftools package, which is based on the OncodriveCLUST algorithm [22]. In this analysis, the study cohort had concurrent mutations in the following order: TP53 > IDH2 > FBXW7 in the E19del subgroup and TP53 > FBXW7 > KRAS in the SNV subgroup. On the other hand, the LUAD cohort had concurrent mutations in the following order: CDKN2A > CEP76 > KIAA2026 in the E19del subgroup and AP3D1 > EMR1 > FASTKD3 in the SNV subgroup. The mutations observed here were randomly distributed across the genes, and no recurrent driver mutations other than mEGFR were identified.
Targeted sequencing using the Foundation One panel has been reported to reflect the results of whole exome sequencing in terms of mutational burden [23]. This study was carried out on that assumption; however, the study cohort was analyzed with a customized panel containing 70 major genes covering 0.62 Mb, which limits direct comparison with the LUAD cohort, which is based on whole exome sequencing. In the future, if sequencing costs are further reduced, it may be necessary to determine the minimum sequencing area that can represent whole exome sequencing.
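Because the comparison above hinges on mutational burden measured over very different footprints, a small Python sketch of the normalization may help. It assumes burden is expressed as mutations per megabase of sequenced territory; the function name is hypothetical, and only the 0.62 Mb panel size comes from the text.

```python
# Footprint of the customized 70-gene panel, as stated in the text.
PANEL_SIZE_MB = 0.62


def mutations_per_megabase(mutation_count: int, covered_megabases: float) -> float:
    """Normalize a raw mutation count by the size of the sequenced territory.

    Assumption: burden is reported as mutations per Mb, so counts from a
    small panel and from an exome can be placed on the same scale.
    """
    if mutation_count < 0:
        raise ValueError("mutation_count must be non-negative")
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return mutation_count / covered_megabases


# Smallest non-zero burden the panel can report: one mutation over 0.62 Mb.
panel_step = mutations_per_megabase(1, PANEL_SIZE_MB)
```

Under this normalization a single mutation on the 0.62 Mb panel already corresponds to about 1.6 mutations per Mb, so panel-based estimates move in coarse steps compared with exome-based ones. The exome footprint depends on the capture kit and is therefore left as a parameter rather than assumed here.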
Taken together, lung adenocarcinoma with EGFR-TKI-sensitizing mutations does not show the characteristic mutation pattern influenced by smoking and shows a low incidence of C > A transversions, which are a common feature of lung cancer; it also has a lower mutational burden than other TCGA cancers. E19del and L858R, the representative subtypes of mEGFR-positive lung adenocarcinoma, differ in their mutational characteristics: the E19del group has a lower mutation burden and a higher ratio of transitions to transversions. Overall, mutational signatures 5 and 30 were the predominant patterns observed across the subtypes, but the main factors underlying these signatures are still unknown, so further in-depth studies of signatures 5 and 30 in this particular subtype of lung cancer are required.