Background
Colorectal cancer is highly heterogeneous, and its pathogenesis and molecular classification have been widely investigated [
1,
2]. In fact, colon and rectal cancers not only have different clinicopathological features, but also undergo different molecular paths of tumorigenesis [
3]. Tumor heterogeneity, a notable feature of cancer, has recently been studied in breast cancer [
4], esophageal cancer [
5], renal cancer [
6,
7] and lung cancer [
8,
9] through multi-region sequencing of tumor masses. Intratumor heterogeneity (ITH) and branched evolution were commonly observed, and the composition of the tumor tissue proved more complex than expected. However, the tumor heterogeneity of colorectal cancer, especially rectal cancer, has been less investigated.
ITH can be assessed by single-cell sequencing, as recent progress in single-cell genome sequencing has allowed quantitative characterization of both single nucleotide variations (SNVs) and somatic copy number alterations (SCNAs) in individual tumor cells. For instance, single-cell sequencing of individual circulating tumor cells (CTCs) revealed reproducible SCNA patterns in CTCs from the same patient and identified pertinent cancer mutations [
10]. Single-cell sequencing of a large number of breast tumor cells [
11‐
13] revealed punctuated evolution of SCNAs during tumor development. In addition, single-cell exome sequencing analysis of a case of colon cancer revealed a biclonal tumor origin and proved low-prevalence mutations could also play a role in tumorigenesis [
14]. Nevertheless, the ITH of rectal cancer has not been well studied by single-cell sequencing.
In the current study, we performed multi-region whole-exome sequencing (WES) and single-cell whole-genome sequencing (WGS) to evaluate the ITH of two rectal tumors. The SCNAs and mutations were characterized from the multi-region level down to the single-cell level. We found that the extent of ITH differed between the two patients, and that the degree of heterogeneity increased when analyzed at the single-cell level.
Methods
Sample collection and single cell preparation
We obtained two fresh primary rectal tumors from patients who underwent primary tumor resection at the Department of Gastrointestinal Surgery IV, Peking University Cancer Hospital & Institute. Neither patient had received radiotherapy or chemotherapy before surgery. The clinicopathological characteristics of the two patients are listed in Additional file
1: Table S1. Sections were collected from different regions of tumors immediately after surgical removal. To obtain single-cell suspensions, each region was washed, minced with sterile blades into small pieces, and dissociated by incubation in DMEM containing collagenase type IA (50 μg/mL; Sigma-Aldrich Co. LLC, US), hyaluronidase (20 μg/μL; Sigma-Aldrich Co. LLC, US), and antibiotics/antimycotics for 1 h at 37 °C. After digestion, cells were filtered through a 70 μm cell strainer (BD Falcon™, US), and erythrocytes were removed by treatment with NH
4Cl/EDTA. Cells were then cryopreserved in liquid nitrogen. Peripheral blood from each patient was collected and stored at −20 °C.
Fluorescence-activated cell sorting (FACS) and single-cell isolation
To isolate single tumor cells, cryopreserved cells were thawed and stained with combinations of the following reagents: anti-EpCAM Alexa Fluor® 488 (eBioscience, US), and lineage-specific antibodies, including anti-CD45-PE (BD Pharmingen™, US), anti-CD235a-PE (BD Pharmingen™, US), anti-CD140b-PE (BD Pharmingen™, US), and anti-CD31-PE (BD Pharmingen™, US). To discriminate viable cells, 7-Amino-Actinomycin D (7-AAD, BD Pharmingen™, US) was added 5–10 min before sorting. Single tumor cells were sorted with the 7-AAD−/lineage−/EpCAMhigh gate on a BD FACS Aria III (BD Biosciences, US). Individual tumor cells were verified under a fluorescence microscope (Nikon Eclipse Ti, Japan) and separated by mouth pipetting. Isolated single cells were then lysed.
Whole-exome library preparation and sequencing
We used the QIAamp Micro DNA kit (QIAGEN, US) to extract genomic DNA from the single-cell suspension derived from sections and matched blood, and the concentrations were measured by Qubit 2.0 fluorometer (Invitrogen, US). Total gDNA (~600 ng) was sheared into fragments (~180–280 bp) by the Covaris system (Covaris, US). Libraries were generated using the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, US) following the manufacturer’s recommendations, and index codes were added to each sample. The products were sequenced with Illumina Hiseq4000 2 × 150-bp PE reads at ~100× depth.
Whole-genome library preparation and sequencing
After lysis, single cells were amplified by the multiple annealing and looping-based amplification cycles (MALBAC) method [
15]. Cells that passed the quantitative PCR (qPCR) quality control [
10] were used for next-generation sequencing (Bio-Rad, US). DNA (~600 ng) from each single cell and gDNA (~500 ng) from tumor tissue were sheared into ~300 bp fragments by the Covaris system (Covaris, US), and the indexed libraries were prepared with the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, US). The products were then sequenced with Illumina HiseqXTen 2 × 150-bp PE reads at ~0.3× depth.
Analysis of WES data
The reads were aligned to the human reference genome (hg19, UCSC) with the Burrows-Wheeler Aligner [
16]. The aligned BAM files were sorted and merged with Samtools 0.1.19 [
17]. First, we applied two software tools, the Genome Analysis Toolkit (GATK 1.6) [
18] and multiSNV [
19], to identify mutations in multi-region WES. The INDELs and SNVs were identified with GATK 1.6 [
18] based on dbSNP 135 (
www.ncbi.nlm.nih.gov/projects/SNP/), and the duplicates were removed with Picard-tools 1.76 (
http://Picard.Sourceforge.net). The functional effect of variants was annotated using SNPEFF3.0 [
20]. Then, the SNVs and INDELs (insertions and deletions) were filtered according to previously described criteria [
21] using the Catalog of Somatic Mutations in Cancer (COSMIC) database v61. We manually filtered out tumor mutations with a base quality lower than 30 and mutations located less than 15 bp from another mutation. Germline mutations were removed by comparing the tumor data to matched blood data. Next, we input the aligned BAM files into multiSNV [
19] to call the SNVs. Germline SNPs were removed by comparing the tumor data to matched blood data. After that, low-quality SNPs were filtered out and the functional effect of variants was annotated using SNPEFF3.0 [
20]. The SNVs called in each region by both tools were used for subsequent analysis. Additionally, to reduce the false negative rate, we manually assessed SNVs with low allelic frequency. Some SNVs existed in two or more samples of one patient but were detected by either tool in only one sample. We screened these SNVs manually and restored them to the SNV list for each sample in which the variant allelic frequency (VAF) was more than 0.2. Finally, we added the INDELs identified by GATK to the shared SNV list to obtain the final set of mutations for further analysis.
Phylogenetic trees were constructed with MEGA5 using the maximum likelihood method [
22], and potential driver mutations were labelled on branches with Adobe Illustrator. The purities and SCNA profiles of multiple tumor regions from one patient were estimated with the Sequenza R package 2.1.1 [
23].
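The two-caller consensus and the VAF-based rescue described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name, the dictionary layout, and the reading of the rescue rule (an SNV confirmed by both tools in at least one region is restored in other regions of the same patient where its VAF exceeds 0.2) are all hypothetical.

```python
def consensus_snvs(gatk_calls, multisnv_calls, vaf, min_rescue_vaf=0.2):
    """Intersect two callers per region, then rescue by VAF (illustrative only).

    gatk_calls, multisnv_calls: {region: set of SNV identifiers} for one patient.
    vaf: {(region, snv): variant allelic frequency}.
    """
    regions = sorted(set(gatk_calls) | set(multisnv_calls))

    # Step 1: keep SNVs reported by both tools in the same region.
    shared = {r: gatk_calls.get(r, set()) & multisnv_calls.get(r, set())
              for r in regions}

    # SNVs confirmed by both tools in at least one region of this patient.
    confirmed = set().union(*shared.values())

    # Step 2 (assumed reading of the rescue rule): restore a confirmed SNV
    # in another region when its VAF there exceeds the threshold.
    final = {r: set(calls) for r, calls in shared.items()}
    for r in regions:
        for snv in confirmed - shared[r]:
            if vaf.get((r, snv), 0.0) > min_rescue_vaf:
                final[r].add(snv)
    return final
```

In the study this rescue step was performed by manual inspection; the threshold of 0.2 is the only parameter taken from the text.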
The SCNA profiles of the tumor regions
The gDNA libraries of the tumor regions and matched blood were subjected to WGS. The clean data were aligned to the human reference genome (hg19, UCSC) with the Burrows-Wheeler Aligner [
16]. After that, we sorted and merged each sample with Samtools 0.1.19 [
17]. To visualize the SCNA profiles from WGS, we divided the whole genome into bins of 500 kb on average, and then used matched blood as a control to remove noise. Finally, the depth of each bin in the tumor regions was plotted in chromosomal order.
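The binning and blood-based normalization can be illustrated with a short sketch. It assumes fixed-width 500 kb bins (the text states 500 kb "on average", so the actual bin boundaries may vary), median scaling of each sample, and a log2 tumor-to-blood ratio; the function names and the pseudocount are hypothetical choices, not details from the study.

```python
import numpy as np

def bin_depth(read_starts, chrom_length, bin_size=500_000):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    idx = np.asarray(read_starts, dtype=np.int64) // bin_size
    return np.bincount(idx, minlength=n_bins)

def log2_ratio(tumor_counts, blood_counts, pseudocount=1.0):
    """Log2 ratio of tumor depth to matched-blood depth for each bin."""
    t = np.asarray(tumor_counts, dtype=float) + pseudocount
    b = np.asarray(blood_counts, dtype=float) + pseudocount
    # Scale each sample by its own median so that library size cancels out.
    return np.log2((t / np.median(t)) / (b / np.median(b)))
```

Plotting the returned ratios in chromosomal order gives a profile in which bins near zero are copy-neutral relative to blood.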
The single-cell SCNA profiling
The single-cell SCNA profiles were identified using previously described methods [
10,
15]. The reads were aligned to the human reference genome (hg19, UCSC) with the Burrows-Wheeler Aligner [
16] and then sorted and merged with Samtools 0.1.19 [
17]. The whole genome was divided into bins of 500 kb on average, and the depth of each bin was determined by a hidden Markov model and normalized against the method control [
10].
Single-cell WGS analyses
The median of the absolute values of all pairwise differences (MAPD) was used to assess the quality of the single-cell data [
24]. The MAPD scores of all 88 cells were less than 0.25, so all cells passed the quality control. The clustered heat map of the large-scale copy number profiles was generated using Euclidean distance and the ward.D method, and visualized with the heatmap.2 function in the gplots package. Principal component analysis (PCA) was performed with the prcomp function in the stats package. Partitioning around medoids (PAM) clustering was performed using the pamk function in the fpc package. The consensus copy number profiles of multiple regions were inferred from single tumor cells based on the median value of each bin.
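As an illustration of the quality metric and the consensus step, the sketch below computes MAPD for one cell and a per-bin median across cells. It assumes that the pairwise differences are taken between neighbouring bins of the log2 copy ratio, which is a common convention but is not stated in the text; the function names are hypothetical.

```python
import numpy as np

def mapd(copy_ratio):
    """Median absolute difference between adjacent bins (log2 copy ratio)."""
    log2r = np.log2(np.asarray(copy_ratio, dtype=float))
    return float(np.median(np.abs(np.diff(log2r))))

def consensus_profile(cells_by_bins):
    """Per-bin median across single cells (rows are cells, columns are bins)."""
    return np.median(np.asarray(cells_by_bins, dtype=float), axis=0)
```

Under this convention a cell would pass the quality control described above when `mapd(profile) < 0.25`.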
Identification of subclonal SCNAs
The subclonal SCNAs of single cells were identified by PCA using the FactoMineR package based on the depth of each bin (each patient had 6037 bins at 500 kb) and were visualized with the gplots package. We integrated the bins of single tumor cells from each patient into one matrix and filtered out the bins in which all elements were equal to zero. Each included bin had at least three elements greater than zero. Then, we retained only bins with a variance greater than 0.5 to obtain subclonal SCNAs with high disparities. There were 116 and 1637 bins containing subclonal SCNAs collected from PC1 to PC6 for patients 1 and 2, respectively. After that, we manually selected subclonal SCNAs larger than 1.5 Mb (63 and 806 bins for patients 1 and 2, respectively), and visualized the results with clustered heat maps.
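The bin pre-filtering that precedes the PCA can be sketched as below. Only the two thresholds (at least three non-zero elements, variance greater than 0.5) come from the text; the use of the sample variance (ddof=1) and the function name are assumptions, and the PCA-based selection and the manual 1.5 Mb size filter are not reproduced here.

```python
import numpy as np

def select_variable_bins(depth, min_nonzero=3, min_variance=0.5):
    """Return the column indices of bins kept for subclonal SCNA analysis.

    depth: matrix with one row per single cell and one column per bin.
    """
    depth = np.asarray(depth, dtype=float)
    # Bins need enough cells with non-zero depth; this also drops all-zero bins.
    keep = (depth > 0).sum(axis=0) >= min_nonzero
    # Keep only bins whose depth varies strongly between cells.
    keep &= depth.var(axis=0, ddof=1) > min_variance
    return np.flatnonzero(keep)
```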
Single-cell mutation validation
The mutations identified in the multi-region WES were validated in single cells by Sanger sequencing (Ruibiotech, China) using 20 ng of the MALBAC products as DNA templates. The PCR was performed with OneTaq Hot Start Quick-Load 2× Master Mix (New England Biolabs, US). The thermal profile was 94 °C for 60 s; 35 cycles of 94 °C for 25 s, 58 °C for 30 s, and 68 °C for 40 s; and 68 °C for 5 min. The primers used are listed in Additional file
1: Table S2.
We used ploidy status and ubiquitous mutations to distinguish somatic diploid cells from tumor cells. Five or six nonsynonymous ubiquitous mutations identified in multi-region WES served as candidate mutations to exclude somatic diploid cells (Additional file
1: Table S3). A single cell was considered a somatic diploid cell if the candidate mutations were validated as wildtype by Sanger sequencing, whereas tumor cells had both SCNAs and mutations. Owing to allelic dropout and imbalanced single-cell amplification, some mutations were undetectable in single cells but were validated in the gDNA of the tumor. As shown in Table S3, the candidate mutations were all validated in the gDNA of the two tumors but were only sporadically identified in single cells. In patient 1, 15 diploid cells were excluded; two of these (B1 and C8) contained more than three mutations and were excluded from later analysis because they were possibly a mixture of one diploid cell and tumor cell debris. In patient 2, there were 13 diploid cells, and none of the six candidate mutations was validated in them. In total, 26 cells (13 from patient 1 and 13 from patient 2) were confirmed to be somatic diploid cells, and two cells (B1 and C8 of patient 1) appeared to be mixtures; all of these were excluded from further analysis of tumor cells.
Considering the phylogenetic trees, putative driver mutations in the COSMIC database, disease-associated genes identified by DAVID [
25,
26] and possible driver mutations in cancer genome landscape [
27], we selected 14 nonsynonymous mutations for each patient and validated the presence of these WES-identified mutations in single tumor cells with SCNAs. Single cells with SCNAs were confirmed to be tumor cells if at least four mutations were present.
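The classification rules in this section can be summarized as a small decision function. This is a hedged sketch rather than the procedure actually applied: the thresholds (all candidate mutations wildtype for diploid cells, more than three mutations for suspected mixtures, at least four of the selected mutations for tumor cells) come from the text, while the function name, the input encoding, and the "unclassified" fallback are hypothetical.

```python
def classify_cell(has_scna, candidate_calls, selected_calls, min_selected=4):
    """Label one single cell from its ploidy status and Sanger results.

    has_scna: True if the copy number profile shows SCNAs.
    candidate_calls: booleans for the five or six ubiquitous candidate
        mutations (True means the mutation was detected).
    selected_calls: booleans for the 14 selected nonsynonymous mutations.
    """
    if not has_scna:
        n_candidates = sum(candidate_calls)
        if n_candidates == 0:
            return "diploid"           # all candidate mutations wildtype
        if n_candidates > 3:
            return "possible mixture"  # diploid cell plus tumor cell debris
        return "unclassified"          # case not covered by the text
    if sum(selected_calls) >= min_selected:
        return "tumor"
    return "unclassified"
```

Because allelic dropout can hide true mutations in single cells, the absence of a mutation in one cell is weaker evidence than its presence, which is why the tumor rule requires only four of the 14 selected mutations.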
Discussion
In this study, we combined multi-region sequencing with single-cell sequencing to explore the ITH in two rectal tumors. The large-scale copy number profiles of multiple regions and single tumor cells in each patient appeared to be similar, implying that the majority of chromosomal rearrangements were early events that were inherited clonally and stably, which is consistent with previous studies on breast cancer [
12,
13]. Besides the clonal SCNAs, some subclonal SCNAs were also observed by single-cell sequencing. Subclonal SCNAs, which are generated by later events during tumorigenesis, play an important role in increasing single-cell heterogeneity. As for mutations, the ubiquitous mutations are formed early in tumor-initiating cells and are inherited by their offspring, whereas the “private” mutations accumulate sporadically and markedly increase the ITH among individual cells. Subclonal SCNAs and sporadic mutations might impart further advantages to certain subpopulations during tumor growth and jointly promote ITH.
We analyzed 40 and 48 single cells to evaluate the ITH for patients 1 and 2, respectively. After removing the diploid somatic cells, there were 24 and 35 tumor cells with SCNAs for patients 1 and 2, respectively. A previous study on breast cancer suggested that 20-40 single cells were sufficient for detecting SCNA-based subpopulations [
13], which is compatible with our results on subclonal SCNAs. Therefore, the number of single cells studied for each patient was reasonable. The computationally derived tumor percentage of each region was determined by Sequenza (Additional file
1: Fig. S3). Pathologists assessed the separate regions of each tumor and judged their histological features to be similar. The tumor purities of the two patients estimated by the pathologists were both more than 90%, but the WES-derived estimates showed that the tumor purity of P1 was just 25-49% (Additional file
1: Fig. S3) owing to somatic cell infiltration. The lower tumor purity of P1 might lead to a lower apparent ITH to some extent, since diploid cell contamination would mask the true profiles, distorting the SCNA profiles and reducing the observed mutational heterogeneity because low-frequency mutations are missed. When obtaining the tumor mutations by WES, the germline mutations can be excluded by comparing tumor regions to peripheral blood or normal rectum samples. Here, we used peripheral blood rather than normal rectum as the control to avoid missing somatic mutations that arose early and exist in both adjacent normal tissue and the tumor, which is rare but can happen in some cases.
The heterogeneity among distinct regions of one tumor arises from the differing proportions of its subclones. Tumor tissue is a mixture of different cell populations that interact with the microenvironment, and the evolution of tumorigenicity is complex and dynamic. The predominant subclone, adapted to the surrounding microenvironment, plays a dominant role in a given region of the tumor, and its dominance changes dynamically. For instance, although a substantial number of tumor cells may be killed during therapy, rare drug-resistant subclones can survive and might lead to relapse. It is this heterogeneity that makes some tumors so hard to eradicate. At the single-cell level, SCNAs have been shown to correlate with gene expression [
43], and the SCNAs of colorectal cancer, which affected the expression of functional genes, were reported to be potential biomarkers [
35]. For instance, the large-scale copy number profiles indicated only one population in patient 1, but zooming in on focal SCNAs revealed two apparent subpopulations. Thus, although the large-scale copy number profiles (24 chromosomes) appear similar at this snapshot in time, the single tumor cells may diverge into two subpopulations in the future owing to the differences in subclonal SCNAs. Besides the clonal SCNAs that all tumor cells stably inherited, subclonal SCNAs would promote further cell-to-cell heterogeneity, which might lead to different therapeutic requirements. Among the subclonal SCNAs in patient 1,
MINA, which is located in the focal region chr3q11.2, is a c-Myc target gene that may affect cell proliferation [
44]. The tumor suppressor genes
PIK3C3 on chr18q12.3 and
SMAD2 on chr18q21.1, which affected the TGF-β pathway, were reported to be related to metastasis [
35,
45]. SCNA-induced upregulation or downregulation of these important genes would eventually give rise to growth advantages in certain populations during tumor progression.
The two patients were of the same age, neither smoked nor consumed alcohol, and both had adenocarcinoma without microsatellite instability. The protein biomarkers of the two tumors differed: CEA was highly expressed in P1, whereas CA72.4 was highly expressed in P2. Even though P2 (T3), with one lymph node metastasis and positive nerve invasion, was more advanced than P1 (T2), the postoperative therapy was quite effective. Regular follow-up showed that both patients, who received personalized treatment, were healthy with no relapse after surgery. Consistent with previous studies [
46], our study also demonstrated mutational diversification across multiple regions and branched evolution in rectal cancer. Additionally, we found that the differences in SCNA profiles between tumor regions might arise from different subpopulations (Fig.
3a and b). Single-cell sequencing further confirmed the distributions of minor subpopulations, and revealed the subclonal structure of the tumor. Minor cell populations might exist early in tumorigenesis but in limited quantities, or they might be generated later with extraordinary growth advantages [
47].
Tumors are composed of many cells, and bulk sequencing only reveals the average genomic alterations of this cell mixture; thus, clonal analysis cannot resolve the subclonal composition of a tumor beyond the resolution of the sample used for the analysis. Contamination by diploid cells and the proportions of tumor subpopulations may affect the SCNA profiles of tumor regions. Moreover, deep sequencing is required to detect rare mutations in bulk tumor, which is costly. Thus, single-cell sequencing is of significant importance in investigating tumor cell heterogeneity and in discovering subtle diversification. However, it should be noted that we did not find any correlation between the copy number variation and mutation events. In accordance with the previous report [
48], our results also suggest that a single biopsy is sufficient to determine the major copy number profiles and high-frequency mutations for targeted therapy; however, it is insufficient for precise detection of subclonal SCNAs and low-frequency mutations.
In conclusion, although the two patients are of the same molecular classification, the extent of heterogeneity differed. Colon and rectal cancers differ in their clinicopathological features and molecular paths of tumorigenesis [
3], so it is meaningful to focus just on rectal tumors. Personalized medicine, tailored to each individual based on druggable genes, is necessary. In addition, the extensive ITH might also indicate that there are many possibilities for drug resistance in each patient. This study provides a preliminary impression of ITH in rectal cancer.
Acknowledgements
We thank Mr. Zhonglin Fu and Ms. Xuefang Zhang from the National Center for Protein Sciences Beijing (Peking University) for assistance with FACS; Ms. Yu Hou from BIOPIC in Peking University for academic assistance; Dr. Shuang Geng from BIOPIC in Peking University for accurate sorting with FACS; and colleagues in Peking University Cancer Hospital & Institute for collecting specimens.