Introduction
Cancer has become the leading cause of death in China, and its burden is rising rapidly due to population aging and the adoption of unhealthy lifestyle behaviors [1]. Once a malignant tumor is diagnosed, determining its tissue of origin and tumor type is critical for clinical management. In routine clinical practice, tumor diagnosis requires a comprehensive synthesis of clinical and pathological findings. Despite significant advances in imaging techniques and histopathological approaches, including morphology and immunohistochemistry (IHC), diagnosis remains challenging in patients who initially present with metastatic and poorly differentiated or undifferentiated tumors [2–5].
In the past decade, approaches based on gene expression profiling, DNA methylation, and genomic alterations have been developed to identify the tissue of origin of tumors [6–8]. Many of these assays compare the molecular profile of a test sample, measured by microarray, next-generation sequencing (NGS), or real-time PCR (RT-PCR), with the molecular profiles of tumors of confirmed type. Two commercialized assays, Tissue of Origin (TOO) (Vyant Bio, New Jersey, USA) and CancerTYPE ID (Biotheranostics, San Diego, CA, USA), are commonly performed when morphological and IHC assessment fails to establish the diagnosis [9, 10]. The clinical utility of these two assays has been evaluated in a few validation studies, with an overall sensitivity of 87% to 87.8%, which compares favorably with histopathological methods [9, 10].
In our previous study, we developed a 90-gene expression assay that identifies 21 common tumor types by RT-PCR, using total RNA isolated from formalin-fixed, paraffin-embedded (FFPE) tumor tissue [7]. The 21 tissue types are adrenal gland, brain, breast, cervix, colorectum, endometrium, gastroesophagus, germ cell, head&neck, kidney, liver, lung, melanoma, mesothelioma, neuroendocrine, ovary, pancreas, prostate, sarcoma, thyroid, and urinary system. In a retrospective cohort of 609 clinical specimens, the 90-gene expression assay demonstrated an overall agreement of 90.4% for primary tumors and 89.2% for metastatic tumors. Several studies have also demonstrated the excellent performance of the assay in the differential diagnosis of triple-negative breast cancer, metastatic brain tumors, squamous cell carcinoma, multiple primary tumors, and other settings [11–14]. Here, we conducted a large-scale, multicenter study to evaluate the performance of the 90-gene expression assay for identifying tumor tissue of origin in real clinical settings.
Materials and methods
Ethics statement
The study was conducted under protocols approved by the institutional review board of each participating institution, including Beijing Cancer Hospital (BCH, Beijing, China), Fudan University Shanghai Cancer Center (FUSCC, Shanghai, China), and Cancer Hospital of the University of Chinese Academy of Sciences, Zhejiang Cancer Hospital (ZCH, Hangzhou, China). All patients provided signed informed consent.
Case selection
In this study, we enrolled a total of 1540 patients from three institutions in China between January 2016 and January 2021. The inclusion criteria for the multisite study were as follows: (1) surgical specimens of primary or metastatic tumors; (2) histologically confirmed tumor type; (3) diagnosis within the 21 main tumor types; (4) FFPE tumor specimens processed less than three years before testing; (5) tumor cell content of at least 60% on the hematoxylin and eosin (H&E)-stained slide; and (6) less than 40% necrosis. The exclusion criteria were (1) tumor specimens obtained after chemotherapy or radiotherapy; and (2) cytology cases, biopsy cases (needle core biopsy [NCB] or fine-needle aspiration [FNA]), and decalcified cases. All samples were deidentified and assigned internal accession numbers. Technicians at each institution performed the 90-gene expression assay. Investigators who interpreted the test results were blinded to the patients' medical history, sample location, and histopathological information.
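The eligibility rules above can be expressed as a single screening function. This is a minimal illustrative sketch, not the study's software: the `Specimen` record, its field names, and the `is_eligible` helper are hypothetical, the specimen type is coded as a plain string (anything other than "surgical" covers the biopsy and cytology exclusions), and the three-year limit is approximated as 3 × 365 days.

```python
from dataclasses import dataclass
from datetime import date

# The 21 tumor types covered by the assay, labeled as in the Introduction.
TUMOR_TYPES = {
    "adrenal gland", "brain", "breast", "cervix", "colorectum", "endometrium",
    "gastroesophagus", "germ cell", "head&neck", "kidney", "liver", "lung",
    "melanoma", "mesothelioma", "neuroendocrine", "ovary", "pancreas",
    "prostate", "sarcoma", "thyroid", "urinary system",
}

@dataclass
class Specimen:
    # Hypothetical record; field names are illustrative, not from the study.
    specimen_type: str          # "surgical", "biopsy", "cytology", ...
    histology_confirmed: bool
    diagnosis: str
    processed_on: date
    tested_on: date
    tumor_content_pct: float    # estimated on the H&E-stained slide
    necrosis_pct: float
    pretreated: bool            # chemotherapy or radiotherapy before sampling
    decalcified: bool

def is_eligible(s: Specimen) -> bool:
    """Apply the inclusion and exclusion criteria from Case selection."""
    inclusion = (
        s.specimen_type == "surgical"                       # (1), also excludes biopsy/cytology
        and s.histology_confirmed                           # (2)
        and s.diagnosis in TUMOR_TYPES                      # (3)
        and (s.tested_on - s.processed_on).days < 3 * 365   # (4), approximate 3 years
        and s.tumor_content_pct >= 60                       # (5)
        and s.necrosis_pct < 40                             # (6)
    )
    exclusion = s.pretreated or s.decalcified
    return inclusion and not exclusion
```

Keeping inclusion and exclusion as two separate boolean expressions mirrors the way the criteria are listed, which makes each rule easy to audit against the text.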
For cases meeting the inclusion and exclusion criteria, five to fifteen unstained 5-μm sections were freshly cut for total RNA isolation. Tumor regions were marked on the H&E-stained slides by senior pathologists at each center (W S and Q Y at BCH, QF W at FUSCC, W W and YY L at ZCH). Tumor cells were then enriched by manual macrodissection. Total RNA was isolated using the FFPE Total RNA Isolation Kit (Canhelp Genomics Co., Ltd, Hangzhou, China) as described previously [7]. The concentration and purity of total RNA were measured by spectrophotometry. Samples were excluded for insufficient RNA (total RNA concentration < 60 ng/µl) or low purity (A260/A280 ratio > 2.1 or < 1.7).
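The RNA quality thresholds above reduce to a two-part check. In this hedged sketch, `rna_passes_qc` is a hypothetical name; because the text excludes only values strictly below 60 ng/µl and ratios strictly outside 1.7 to 2.1, the boundary values themselves are treated as passing.

```python
def rna_passes_qc(concentration_ng_per_ul: float, a260_a280: float) -> bool:
    """Return True if total RNA meets the quantity and purity thresholds.

    Excluded: concentration < 60 ng/ul, or A260/A280 > 2.1 or < 1.7.
    Boundary values (60, 1.7, 2.1) therefore pass.
    """
    sufficient = concentration_ng_per_ul >= 60   # enough RNA for the assay
    pure = 1.7 <= a260_a280 <= 2.1               # acceptable protein contamination range
    return sufficient and pure
```

For example, a sample at 85 ng/µl with an A260/A280 ratio of 1.95 passes, while the same sample with a ratio of 2.2 is excluded.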
Gene expression profiling and classification algorithm
The 90-gene expression assay (Canhelp Genomics Co., Ltd) was performed as previously described [7]. In brief, reverse transcription was performed on the isolated total RNA. RT-PCR was then run on a 7500 Real Time PCR System (Applied Biosystems) to profile the expression of the tumor-specific genes. An internal control (IC) gene was used to assess sample quality, and samples with a weak RT-PCR signal (IC cycle threshold [Ct] value greater than 38) were excluded. In addition, a no-template control (NTC) was used to detect potential contamination of the PCR reaction; the sample was excluded when the Ct of the NTC was less than 38.
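The two PCR quality gates above can be written as one validity check. This is a hedged sketch: `pcr_run_valid` is a hypothetical name, and representing an undetermined Ct (no amplification) as `None` is an assumption, not something the section specifies. A Ct of exactly 38 passes both gates, since the text excludes only IC Ct greater than 38 and NTC Ct less than 38.

```python
import math
from typing import Optional

CT_CUTOFF = 38.0  # cycle-threshold cutoff stated in the text

def pcr_run_valid(ic_ct: Optional[float], ntc_ct: Optional[float]) -> bool:
    """Apply the internal-control (IC) and no-template-control (NTC) rules.

    None means no amplification was detected (Ct undetermined); this
    encoding is an assumption of the sketch.
    """
    # Treat "no amplification" as an infinitely late Ct.
    ic = math.inf if ic_ct is None else ic_ct
    ntc = math.inf if ntc_ct is None else ntc_ct
    weak_signal = ic > CT_CUTOFF      # IC Ct greater than 38: poor sample quality
    contaminated = ntc < CT_CUTOFF    # NTC Ct less than 38: likely contamination
    return not weak_signal and not contaminated
```

Mapping an undetermined Ct to infinity lets one comparison handle both gates: a missing IC signal fails, while a silent NTC passes.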
For each case, the 90-gene classifier analyzed the expression pattern of the 90 tumor-specific genes and generated a similarity score for each of the 21 tumor types, reflecting how closely the test specimen resembled that type in the gene expression database. Each similarity score ranges from 0 (low similarity) to 100 (high similarity), and the scores across the 21 tumor types sum to 100.
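The section does not specify how the raw similarities are computed, so the following is only an illustration of scores that satisfy the stated constraints (each in 0 to 100, summing to 100), not the Canhelp algorithm. It assumes a nearest-centroid-style comparison: the specimen profile is correlated (Pearson) with a hypothetical reference centroid per tumor type, negative correlations are clipped to 0, and the values are rescaled to sum to 100. `pearson`, `similarity_scores`, `predicted_type`, and the centroids are all hypothetical, as is the rule that the highest-scoring type is reported.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def similarity_scores(profile: list[float],
                      centroids: dict[str, list[float]]) -> dict[str, float]:
    """Rescale clipped correlations so each score is in [0, 100] and all sum to 100."""
    raw = {t: max(pearson(profile, c), 0.0) for t, c in centroids.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("profile is not positively correlated with any reference")
    return {t: 100.0 * r / total for t, r in raw.items()}

def predicted_type(scores: dict[str, float]) -> str:
    """Assumed decision rule: report the tumor type with the highest score."""
    return max(scores, key=scores.get)
```

Normalizing to a fixed total makes the scores comparable across specimens and lets a reader see at a glance how decisively one type dominates, which is the property the text describes.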
Statistical analysis
After testing, the internal accession numbers of all cases were decoded, and the predictions of the 90-gene expression assay were compared with the reference diagnosis to evaluate assay performance. For each tumor type in the panel, sensitivity (or positive percent agreement) was defined as the ratio of true positive results to the total positive samples analyzed, and specificity (or negative percent agreement) as the ratio of true negative results to the total negative samples analyzed. A confusion matrix was generated for each tumor type. All statistical analyses were performed in R software (version 3.6.1). All statistical tests were two-sided, and p values less than 0.05 were considered statistically significant.
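The per-type definitions above are one-vs-rest counts. The analyses were run in R; the Python sketch below only illustrates the definitions, and the function names are hypothetical. "Overall agreement" is assumed here to be the fraction of specimens whose predicted type matches the reference diagnosis.

```python
from collections import Counter

def confusion_matrix(reference: list[str], predicted: list[str]) -> Counter:
    """Count (reference, predicted) pairs across all tumor types."""
    return Counter(zip(reference, predicted))

def per_type_agreement(reference: list[str],
                       predicted: list[str]) -> dict[str, tuple[float, float]]:
    """One-vs-rest sensitivity (PPA) and specificity (NPA) per reference type."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must be the same length")
    pairs = list(zip(reference, predicted))
    results = {}
    for t in sorted(set(reference)):
        tp = sum(r == t and p == t for r, p in pairs)
        fn = sum(r == t and p != t for r, p in pairs)
        tn = sum(r != t and p != t for r, p in pairs)
        fp = sum(r != t and p == t for r, p in pairs)
        sensitivity = tp / (tp + fn)                                # TP / all positives
        specificity = tn / (tn + fp) if tn + fp else float("nan")   # TN / all negatives
        results[t] = (sensitivity, specificity)
    return results

def overall_agreement(reference: list[str], predicted: list[str]) -> float:
    """Fraction of specimens whose predicted type matches the reference."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)
```

Each tumor type is scored against all others pooled together, so a single misclassification lowers the sensitivity of the true type and the specificity of the predicted type.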
Discussion
In the clinic, identifying the tumor type is crucial for selecting optimal treatment once a patient is diagnosed with a malignant tumor. Traditional diagnosis of tumor type requires a comprehensive analysis of clinical and pathological findings. Imaging techniques, including computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography-computed tomography (PET-CT), are typically used to detect the primary site. However, a recent meta-analysis of PET-CT in 1942 patients from 20 centers found a primary tumor detection rate of 40.9% (39.0% to 42.9%), so imaging remains limited for identifying tumor tissue of origin [15].
In routine pathological practice, morphological and IHC assessments are relatively cost-efficient, impose no additional burden on patients, and can identify the tumor type in most cases. Nevertheless, diagnosing poorly differentiated or undifferentiated tumors is not straightforward because these tumors often lack typical features [16]. Several studies have reported an agreement of 69–71% when characterizing poorly differentiated or undifferentiated carcinomas by IHC and morphology [17, 18].
Over the past decade, studies have shown that distinct tumor types have recognizable differences in gene expression patterns, and that metastatic foci maintain the gene expression profile of the primary tumor. On this basis, the tumor type of a sample can be inferred by comparing its gene expression pattern with those of tumors of known type [19, 20]. Several mRNA-based gene expression assays, such as TOO and CancerTYPE ID, have been developed and commercialized to predict the putative primary site in patients with uncertain diagnoses [9, 10]. The TOO test reported by Monzon et al. is a microarray-based test that uses 1550 genes to differentiate 15 main tumor types; in a blinded validation study of 547 frozen tumor specimens, it showed 87.8% overall agreement with the reference diagnosis [9]. For the CancerTYPE ID assay, Erlander et al. developed a 92-gene real-time PCR assay to identify the primary site of 28 common tumor types; a multisite validation study applied the assay to 790 FFPE tumor specimens and demonstrated an overall sensitivity of 87% in primary site identification [10].
More recently, with advances in NGS techniques, genomic alterations and DNA methylation have also been applied to tumor molecular classification. Alexander et al. applied machine learning to genomic alteration data (468 cancer-associated genes) to predict the tissue of origin, achieving an overall accuracy of 74.1% in an independent cohort [6]. Sebastian et al. reported a DNA methylation-based test named “EPICUP” for identifying the tissue of origin of cancer of unknown primary (CUP); in a CUP validation cohort, EPICUP correctly predicted a primary site in 87% of patients [21]. Moreover, researchers have begun to investigate classifying tumors through less invasive procedures. One promising approach was explored by M. C. et al., who analyzed methylation patterns in circulating cell-free DNA (cfDNA) to detect more than 50 cancer types [8]. In a validation cohort of 1354 cases, targeted methylation analysis demonstrated an overall sensitivity of 54.9% and a specificity of > 99%.
To our knowledge, this is the largest clinical validation study of a gene expression assay for tumor origin identification to date. Overall, the 90-gene expression assay correctly identified the tumor type in 94.4% of specimens, which compares favorably with the two other commercially available tests, TOO and CancerTYPE ID (87%–87.8% accuracy) [9, 10]. Furthermore, the present study established a large-scale prospective cohort (N = 493) to assess the use of the 90-gene expression assay in a real clinical setting. Although accuracy in the prospective cohort (92.1%) was slightly lower than in the retrospective cohort (95.7%), it still exceeded that reported in previous studies of tumor classification (87%–87.8%) [7]. We found no significant difference in performance between poorly differentiated/undifferentiated and well-moderately differentiated tumors (94.5% versus 95.5%, respectively), suggesting that the expression patterns of the 90 genes are robust and rarely affected by loss of cell differentiation.
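The comparison of 94.5% versus 95.5% above is reported as non-significant, but this section does not name the test used. Assuming a standard two-sided two-proportion z-test (equivalent to Pearson's chi-square test on a 2×2 table without continuity correction), the check could be sketched as follows; the function name is hypothetical, and the subgroup counts it needs (correct calls and specimens tested per group) are not given here.

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for equal proportions using the pooled standard error.

    x1/n1 and x2/n2 are correct calls / specimens tested in each group.
    Returns (z, p_value); z**2 equals the uncorrected chi-square statistic.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-correct or all-incorrect: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return z, p_value
```

Pass the number of correctly classified specimens and the number tested in each differentiation group; a p value at or above 0.05 would be read as no significant difference under the threshold stated in the Statistical analysis section.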
The present study has several limitations. First, suboptimal specimens were excluded, such as biopsy samples (NCB or FNA), cytology samples, and samples with excessive necrosis or low tumor content. However, these types of samples are common in clinical practice and are usually difficult to diagnose. Further studies are needed to validate the performance of the 90-gene expression assay on suboptimal specimens. Second, although the 90-gene expression assay achieved high overall classification accuracy across tumor types, its performance in identifying head&neck tumors was suboptimal. In this study, eight of 31 head&neck tumors were misidentified, and seven of these eight were classified as gastroesophageal tumors. Given the anatomical continuity of the esophagus and the head and neck, esophageal squamous cell carcinoma and head&neck squamous cell carcinoma have been shown to strongly resemble each other in mRNA expression, DNA methylation, and somatic copy-number alterations [22]. The gene expression analyses with the 90-gene expression assay also reflect this biological overlap and provide additional insight into the origin of these tumors. Additional effort is therefore needed to improve the algorithm's ability to distinguish head&neck tumors from gastroesophageal tumors. Moreover, in clinical use, when a tumor sample is predicted to be a head&neck and/or gastroesophageal tumor, the prediction should be interpreted in conjunction with the pathological diagnosis and clinical information.
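The head&neck counts reported above translate into the following per-type sensitivity and error breakdown, a simple worked calculation from those counts (rounded to one decimal place):

```latex
\[
\text{Sensitivity}_{\text{head\&neck}} = \frac{31 - 8}{31} = \frac{23}{31} \approx 74.2\%,
\qquad
\frac{7}{8} = 87.5\% \text{ of misclassified head\&neck cases were called gastroesophageal.}
\]
```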
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.