Background
Among all cancers, lung cancer has the highest annual incidence and mortality, making it a worldwide public health problem [1-3]. Patients with early-stage lung cancer have high overall survival after surgery or stereotactic body radiation therapy, with 5-year survival that can exceed 50%, but patients with advanced lung cancer might not benefit sufficiently from similar treatments [4]. Lung cancer has been shown to be a highly heterogeneous disease, and over 85% of lung cancer patients are diagnosed with non-small cell lung cancer (NSCLC) [5]. It is estimated that about 69% of advanced NSCLC patients carry at least one potentially actionable drug target, which enables targeted therapy [4]. Hence the concept of precision medicine, in which the treatment of advanced cancer is tailored to the individual based on the DNA aberration profile of the tumor, has arisen and become widely recognized.
Targeted drugs usually act on one or several DNA aberrations, so appropriate biopsies and technologies are required to identify the relevant biomarkers, including single nucleotide variations (SNVs), insertions and deletions (Indels), gene fusions, copy number variations (CNVs) and abnormal expression [4]. Detection methods fall into two classes: 1. PCR-based techniques, which detect a single DNA aberration per reaction at extremely high sensitivity, including the amplification-refractory mutation system (ARMS), droplet digital PCR (ddPCR) and BEAMing; 2. sequencing-based techniques, which detect multiple aberrations simultaneously, including whole genome sequencing (WGS), amplicon sequencing and target capture sequencing.
Though tissue biopsy is well accepted in targeted therapy, circulating tumor DNA (ctDNA), which is released from dead tumor cells into the bloodstream, has attractive advantages over tissue biopsy in precision medicine, such as convenient sampling and dynamic monitoring. However, the proportion of ctDNA in blood is extremely low, so highly sensitive methods are required to detect mutations with allelic frequencies as low as 0.1% [6]. The performance of ctDNA detection in lung cancer patients varies with method and tumor stage. In studies using targeted next generation sequencing (NGS) over the past 2 years, ctDNA from late-stage lung cancer generally showed higher sensitivity for detecting tissue-matched mutations (74% to 85%) [7-9] than ctDNA from early stages (53.8%) [10]. Recently, AM Newman proposed a digital error suppression process with a barcoding technique that further increased the sensitivity of mutation detection from ctDNA to 93% [11]. Based on this concept, we conducted a multicenter study of 131 paired tumor and ctDNA samples from late-stage (IIIB and IV) lung cancer patients to evaluate the utility of ctDNA targeted NGS in precision medicine. We systematically investigated the accuracy and specificity of mutation detection from ctDNA and identified several key factors that might significantly affect the results. Furthermore, it was reported, in a small sample set, that ctDNA molecules from tumor cells were shorter than cell-free DNA molecules from normal cells [12]. We therefore extended the analysis to the length of ctDNA fragments and its association with clinical features.
Methods
Patient selection and sample collection
Several criteria were applied to the standard group of patients included in this study: 1. the patients were diagnosed with stage IIIB or IV lung cancer; 2. the patients were treatment naive; 3. the blood samples were collected within 14 days before or after the tumor tissue was acquired; 4. the tumor tissue samples were collected by either percutaneous needle biopsy or surgery, and for surgical patients the blood samples were collected at least 1 day before surgery. For each patient, 8-10 ml of blood was drawn by venipuncture and stored in Cell-Free DNA™ BCT (BCT) tubes (Streck Inc., Omaha, NE). The paired tumor tissues were fixed in formalin. The samples were shipped to the Research Center of 3DMed at a constant room temperature. The time between sample collection and processing was less than 48 h.
To separate plasma, the blood in the Streck tubes was centrifuged at 1600 g for 20 min at room temperature. The blood separated into three layers: plasma in the upper layer, white cells in the middle buffy coat, and red blood cells in the lower layer. The plasma layer was then carefully transferred to new 1.5 ml Eppendorf tubes and centrifuged at 16000 g for 10 min at room temperature to remove residual cells and debris. The buffy coat was transferred to a new tube for genomic DNA (gDNA) extraction.
The tumor tissues were first subjected to H & E staining to determine the percentage of tumor cells. A sample was considered qualified only if its tumor cell percentage was over 20% [13]. The gDNA of FFPE tumor tissues and of white blood cells was extracted with the DNeasy Tissue or Blood Kit (Qiagen), respectively, following the standard protocols. Cell-free circulating DNA in plasma was extracted with the QIAamp Circulating Nucleic Acid Kit (Qiagen) following the standard protocol. DNA concentrations were determined with the Qubit dsDNA HS Assay Kit (Life Technologies). Genomic DNA was fragmented to sizes ranging from 200 bp to 400 bp using the Covaris S2 Sonolab (Covaris).
Library preparation, target capture and DNA sequencing
gDNA libraries were constructed with the KAPA Hyper Prep Kit (KAPA Biosystems) according to the manual. The cfDNA libraries were prepared with the Accel-NGS 2S Plus DNA Library Kit (SWIFT) with unique identifiers (UIDs, also called barcoding technology) to tag individual DNA molecules. Library concentrations were determined by Qubit, and library size distributions were analyzed by Caliper.
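To illustrate how UID tagging can support error suppression, the following minimal sketch groups reads that share a UID and start position into families and takes a per-base majority vote. It is not the pipeline used in this study: the function name `collapse_families`, the `(uid, start, sequence)` read representation, and the thresholds `min_family_size` and `min_agreement` are all hypothetical, and reads within a family are assumed to be already aligned and of equal length.

```python
from collections import Counter, defaultdict


def collapse_families(reads, min_family_size=2, min_agreement=0.8):
    """Collapse reads sharing (UID, start) into one consensus sequence.

    reads: iterable of (uid, start, sequence) tuples (hypothetical format);
    sequences in one family are assumed to have equal length.
    Returns {(uid, start): consensus}, with 'N' where agreement is too low.
    """
    families = defaultdict(list)
    for uid, start, seq in reads:
        families[(uid, start)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        # Families that are too small cannot distinguish errors from variants.
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Keep the majority base only if enough reads in the family agree.
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[key] = "".join(bases)
    return consensus


reads = [
    ("AAT", 100, "ACGT"),
    ("AAT", 100, "ACGT"),
    ("AAT", 100, "ACTT"),  # one read disagrees at the third base
    ("GGC", 100, "ACTT"),  # singleton family, dropped
]
print(collapse_families(reads))
```

An error introduced during PCR or sequencing typically appears in only part of a family, whereas a true variant on the original molecule should appear in every read of the family, which is why the vote is taken within each UID family rather than across all reads.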
One to four libraries with different sample indexes were first pooled to a total DNA amount of 1 μg. The pooled DNA was mixed with 2 μl of DNA blocker (Integrated DNA Technologies) and 5 μl of human Cot-1 DNA (Invitrogen), and then dried in a vacuum concentrator (Thermo Fisher). The dried mixture was dissolved in 15 μl of the hybridization buffer supplied with the xGen Lockdown Probes kit (Integrated DNA Technologies), and the targeted DNA was then captured with a customized set of biotinylated DNA probes following the standard protocol. The captured DNA was amplified by PCR; the final DNA concentrations were determined by Qubit and the DNA sizes were analyzed by Caliper.
Captured libraries at 1.6–1.7 pmol/L were loaded onto the NextSeq500 (Illumina) for 75 bp paired-end sequencing with Illumina version 4 sequencing kits according to the manufacturer’s instructions.
The paired-end reads were mapped with the BWA [14] MEM algorithm. SNVs were called by MuTect [15] with default parameters. Small insertions and deletions were called as the union of the calls from VarScan 2 [16] and Pindel [17], each with default parameters. Fusions were called by in-house scripts, requiring at least 5 read pairs spanning the breakpoint between the two partner genes. The CNVs of tumor tissues were calculated by BIC-seq2 [18] with default parameters, and the CNVs of ctDNA samples were called by the method reported by Jacob J. Chabon et al. [19]. All mutations were manually reviewed in IGV [20] to further eliminate false-positive results. The probability density distributions of mutant and wild-type fragment lengths were calculated by Gaussian kernel smoothing using StatsModels 0.8.0.
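The Gaussian kernel smoothing step can be sketched as follows. The study used StatsModels 0.8.0; this sketch instead implements the estimator directly with the Python standard library so that the calculation is explicit. The bandwidth rule (the normal reference rule, 1.06·σ·n^(-1/5)) is an assumption, since the bandwidth setting is not stated here, and the function names and fragment lengths are illustrative only.

```python
import math
import statistics


def normal_reference_bandwidth(values):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)


def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(values)
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]


# Illustrative fragment lengths in nt (not data from this study).
wild_type_lengths = [160, 165, 170, 170, 175, 180]
grid = list(range(100, 241))
density = gaussian_kde(wild_type_lengths, grid)
mode = max(zip(density, grid))[1]
print("density peaks at", mode, "nt")
```

Running the same function on mutant and wild-type fragment lengths and plotting both curves on a shared grid gives the kind of comparison described in the text.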
ctDNA library size fractionation
To separate the smaller and larger DNA fragments of a library by electrophoresis, the library DNA was run on a 2% agarose gel. The DNA fractions of 200-300 bp and 350-600 bp were excised and stored in separate tubes, then purified with the QIAquick Gel Extraction Kit (Qiagen). DNA concentrations were determined with the Qubit dsDNA HS Assay Kit (Life Technologies).
Droplet digital PCR
Droplet digital PCR was performed on the libraries with the droplet digital PrimePCR™ (BioRad) on the BioRad QX200 droplet digital PCR system.
Discussions
Blood ctDNA has been known for several years, and the detection of variations in ctDNA by next-generation sequencing has improved significantly. Though the sensitivity of different methods in various cancers has been reported frequently, the comparison of NGS results between ctDNA and tumor tissue has not been well documented. Such a comparison is valuable for understanding the advantages and limits of capture-based sequencing, and thus for better conducting precision medicine.
UC-Seq significantly improves the sensitivity of ctDNA detection for SNVs and Indels. In addition, the method extends the detection limit of AFs down to 0.1% with controllable false positive rates in ctDNA. However, the AFs of tumor-matched mutations in ctDNA are affected by many variables and correlate poorly with their AFs in tumor tissue. Since tumors are usually heterogeneous, mutations with low AFs may come from minor sub-clones and might therefore drop below the detection limit in ctDNA. In practice, the AF cutoff for actionable mutations in tumor tissue is usually set between 5% and 10% [21]. In addition, patients whose tumor sequencing showed the L858R mutation at an AF over 9% were more sensitive to EGFR-TKIs than those with AFs below that level [22]. Our data suggest that mutations with AFs below 5% in stage IIIB and IV lung cancer tissues are difficult to detect in ctDNA, while mutations with AFs over 5% can be identified with high sensitivity (92.9%). Hence the sensitivity of UC-Seq at a depth of 10,000× is sufficient to detect actionable mutations in late-stage lung cancer patients.
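The relation between sequencing depth and the 0.1% detection limit can be made concrete with a simple sampling model. The sketch below assumes that the number of mutant molecules observed at a site follows a binomial distribution, that the depth counts independent unique molecules, and that sequencing errors are ignored. The minimum number of supporting molecules (`k = 5`) is a hypothetical calling threshold, not one taken from this study.

```python
import math


def prob_at_least(k, depth, af):
    """P(X >= k) for X ~ Binomial(depth, af).

    depth: number of independent molecules covering the site (assumed).
    af: true mutant allele frequency in the sample.
    """
    p_below = sum(
        math.comb(depth, i) * af**i * (1 - af) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_below


depth = 10_000
for af in (0.0005, 0.001, 0.003):
    expected = depth * af  # mean number of mutant molecules at this site
    p = prob_at_least(5, depth, af)
    print(f"AF={af:.2%}: expected mutant molecules={expected:.0f}, P(>=5)={p:.3f}")
```

Under these assumptions, a 0.1% mutation yields about 10 mutant molecules on average at 10,000×, so most such mutations would pass a threshold of 5 supporting molecules, while the detection probability falls quickly at lower AFs.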
However, UC-Seq might not significantly improve the sensitivity of detecting fusions and copy number variations. Fusion detection relies on finding split reads and read pairs that span the breakpoint; because ctDNA fragments in blood are short, the proportion of such reads is much lower than the true proportion of fusion fragments. Moreover, some fusion breakpoints lie in introns, which may contain many repeats such as SINEs, making it hard to design highly specific probes that capture fusion fragments efficiently. Since the tumor content of ctDNA is much lower than that of tissue, fusions are much more difficult to detect in ctDNA than in tumor tissue with the same panel.
The detection of copy number variations depends on the copy number ratio of the gene in the tumor and on the proportion of ctDNA in cfDNA. A proper mathematical model is also essential to separate real signal from background noise. Since the ctDNA proportion in cfDNA is usually around 1% in our data, the change caused by a copy number variation is hard to distinguish from experimental fluctuation. Only copy gains with ratios larger than 3.5 can be detected in ctDNA at an acceptable sensitivity (83.3%).
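A simple dilution model shows why copy number changes are hard to see at a ctDNA proportion of about 1%. The sketch assumes that cfDNA is a mixture of tumor-derived DNA with copy ratio `tumor_ratio` (relative to normal) and normal DNA with ratio 1, and that the ratio of 3.5 quoted above refers to the tumor. Both are assumptions for illustration, and this is not the CNV calling method used in the study.

```python
def expected_cfdna_ratio(tumor_ratio, ctdna_fraction):
    """Expected copy ratio in cfDNA under a two-component mixture model.

    tumor_ratio: tumor copy number divided by normal copy number (assumed).
    ctdna_fraction: proportion of cfDNA that is tumor derived.
    """
    return 1.0 + ctdna_fraction * (tumor_ratio - 1.0)


def min_detectable_tumor_ratio(cfdna_threshold, ctdna_fraction):
    """Smallest tumor ratio whose diluted signal reaches `cfdna_threshold`."""
    return 1.0 + (cfdna_threshold - 1.0) / ctdna_fraction


# A tumor copy ratio of 3.5 diluted to a 1% ctDNA proportion.
ratio = expected_cfdna_ratio(tumor_ratio=3.5, ctdna_fraction=0.01)
print(f"expected cfDNA copy ratio: {ratio:.3f}")
```

Under this model a 3.5-fold gain in the tumor shifts the cfDNA copy ratio by only 2.5%, which is of the same order as typical experimental fluctuation in capture efficiency, in line with the difficulty described above.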
Tumor mutational burden (TMB) has been shown to correlate with the response to immunotherapy [23], so TMB detection has become a clinical need. TMB is defined as the number of non-synonymous somatic mutations found by whole exome sequencing. However, the high cost of whole exome sequencing at high coverage hinders the application of TMB to ctDNA. To extend immunotherapy to late-stage cancer patients, for whom tumor tissue might not be available, the utility of calculating TMB from ctDNA with a small panel has to be evaluated. First, the correlation of TMB between a small panel and whole exome sequencing depends on the size of the panel. The simulation data suggest that TMB is better calculated from a panel larger than 1 Mb, which ensures a Pearson correlation above 0.9 (Fig. 2a). Second, a sufficiently low detection limit for SNVs and Indels is indispensable; otherwise the correlation of TMB between tumor tissue and ctDNA might not be satisfactory. The cutoff of mutant AFs in our study was set to 0.3%. A tissue sample contains only part of the clones of a tumor, whereas ctDNA can in theory reveal mutations from all clones as long as their DNA is released into the bloodstream. Mutations below 0.3% detected in ctDNA could therefore come from clones other than those in the tissue sample, which would lower the correlation of TMB between tissue and ctDNA. Under this cutoff, the correlation of TMB between tumor tissue and ctDNA reached 0.8 across all samples. The results also show that blood TMB (bTMB) from ctDNA reflects the TMB of metastatic tumor tissue (Pearson correlation = 0.9) better than that of primary tissue (Pearson correlation = 0.8). In addition, it might not be appropriate to measure the TMB of stage IIIB patients by ctDNA (Pearson correlation = 0.66) with a small panel of 490 kb. In summary, bTMB from ctDNA can reveal the TMB of tumor tissue, and would represent it better if the panel size were increased to 1 Mb.
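The TMB comparison can be sketched as follows: count the mutations that pass the AF cutoff, normalize by panel size, and compute the Pearson correlation between paired tissue and ctDNA values. Normalizing to mutations per Mb is an assumption made here so that panels of different sizes can be compared; the function names and the example numbers are illustrative and are not data from this study.

```python
import math


def count_passing(mutant_afs, cutoff=0.003):
    """Number of mutations whose AF is at or above the cutoff (0.3% here)."""
    return sum(1 for af in mutant_afs if af >= cutoff)


def tmb_per_mb(n_mutations, panel_size_bp):
    """Mutations per megabase of panel (assumed normalization)."""
    return n_mutations / (panel_size_bp / 1e6)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Illustrative paired mutation counts on a 490 kb panel.
tissue_counts = [3, 8, 5, 12, 2]
ctdna_counts = [4, 7, 5, 10, 3]
tissue_tmb = [tmb_per_mb(n, 490_000) for n in tissue_counts]
ctdna_tmb = [tmb_per_mb(n, 490_000) for n in ctdna_counts]
print(f"Pearson r = {pearson(tissue_tmb, ctdna_tmb):.2f}")
```

On a small panel each sample contributes only a handful of mutations, so a single extra or missing call changes the per-Mb value substantially, which is one reason the correlation improves with panel size.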
Several clinical features could significantly affect the concordance between tumor tissue and ctDNA. First, the average half-life of ctDNA after complete surgical resection is reported to be 114 min, while the situation after incomplete resection may vary [24]. In our study, the ctDNA samples collected at least 1 day after surgery show the same trend. ctDNA from patients who had undergone surgery at the primary site had only 41% concordance, while ctDNA from those who had undergone surgery at metastatic sites had a higher concordance of 87%. In addition, ctDNA from patients with progressive disease who were receiving ineffective targeted therapy or chemotherapy also showed a high concordance of 91.3% with tumor tissue, though the sample size was small. These results indicate that the concordance between ctDNA and tumor tissue can be influenced by the efficacy of treatment. Second, tumors are always heterogeneous and evolve to gain new phenotypes [25]. The clonal composition is dynamic: different clones may compete for space and resources, and some sub-clones may eventually metastasize to other locations [26]. During this process, some clones may even regress until they are undetectable. In extreme cases, the primary site of some metastatic cancer patients cannot be identified during pre-treatment evaluation [27]. Among the ctDNA samples from treatment-naive patients whose tumor biopsies had been taken at least 2 weeks before ctDNA sampling, with no treatment in between, the samples from stage IV patients had significantly poorer concordance with tumor tissue (54.1%), while the stage IIIB samples had high concordance (100%), though the sample size was small. Compared with stage IIIB tumors, stage IV tumors might undergo highly active clonal evolution, which causes poor concordance between ctDNA and tumor tissue when the interval between tissue biopsy and blood biopsy exceeds 2 weeks. In both cases of low concordance, the status of the tumor may have changed dynamically. The data also reveal that UC-Seq of ctDNA has the potential to monitor the efficacy of therapy and the clonal evolution of late-stage tumors.
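For readers who wish to reproduce this kind of comparison, one common way to compute concordance is the fraction of tissue mutations that are also detected in the paired ctDNA sample. The sketch below uses that definition as an assumption; the exact definition applied in this study may differ, and the mutation identifiers are illustrative.

```python
def concordance(tissue_mutations, ctdna_mutations):
    """Fraction of tissue mutations also found in the paired ctDNA sample.

    Mutations are any hashable identifiers, e.g. (gene, change) tuples.
    """
    tissue = set(tissue_mutations)
    if not tissue:
        raise ValueError("no tissue mutations to compare against")
    return len(tissue & set(ctdna_mutations)) / len(tissue)


tissue = [("EGFR", "L858R"), ("TP53", "R273H"), ("KRAS", "G12C")]
ctdna = [("EGFR", "L858R"), ("TP53", "R273H"), ("PIK3CA", "E545K")]
print(f"concordance = {concordance(tissue, ctdna):.1%}")
```

Mutations found only in ctDNA do not lower this measure, which matters when ctDNA captures clones that were not present in the biopsied tissue.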
Interestingly, cfDNA concentrations vary among late-stage cancer patients. Though most late-stage lung cancer patients have high cfDNA concentrations, some still have low concentrations (< 9 ng/ml), close to the level of healthy people. The ctDNA samples with low concentrations show much worse concordance than those with high concentrations. Moreover, mutant AFs are generally higher in ctDNA samples with high concentrations (Fig. 3d), which indicates a higher proportion of DNA fragments from tumor cells. cfDNA concentrations are related to cancer stage and severity. Clinical analysis showed that cfDNA concentrations are higher in NSCLC patients than in patients with benign lung nodules [28]. In addition, advanced NSCLC patients with low cfDNA concentrations have better overall survival than those with high cfDNA concentrations [29, 30]. Nevertheless, treatment or exercise can significantly affect cfDNA concentrations. cfDNA concentrations are reported to rise up to 15-fold after strenuous exercise owing to acute aseptic inflammation [31, 32]. Conversely, people with chronic occupational exposure to low-dose gamma-neutron and tritium β radiation have lower cfDNA concentrations, owing to elevated levels of DNase and of antibodies to DNA in blood [33]. Hence a standard practice for cfDNA sampling has to be established, so that cfDNA concentrations can be used to assess patient status and the sensitivity of detection can be assured.
Two studies have reported that mutant fragments from tumors are generally shorter than wild-type fragments [12, 34]. With a larger set of clinical samples, our data agree with their findings and extend them. As indicated by wild-type fragments in both patients and healthy donors, the length distribution of cell-free DNA fragments has two peaks: one sharp peak at around 170 nt and one broad peak at around 320 nt. The nucleosome core carries 146 nt of DNA, plus up to 80 nt of linker DNA [35], which creates the sharp peak at around 170 nt. The broad peak at around 320 nt likely corresponds to DNA protected by a nucleosome dimer. Intriguingly, mutant fragments were shifted toward shorter lengths at both the first and the second peak. 86% of SNVs and Indels had higher mutant AFs in small fragments (< 145 nt), including important actionable mutations such as L858R and exon 19 deletion in EGFR. However, the shortening of mutant fragments from tumors is likely to be gene-specific or even position-specific. Though most mutations showed a shortening trend, a considerable number had length distributions indistinguishable from, or longer than, those of wild-type fragments. The length distribution of mutant fragments might be affected by the DNA accessibility of the locus in the tumor tissue. Loci with high DNA accessibility are usually bound by fewer nucleosomes [36] and thus are more likely to be digested by DNases in blood. Furthermore, nucleosome depletion occurs in actively transcribed regions [37]. The extent of shortening of mutant fragments might therefore reflect the transcriptional activity of genes in the tumor, especially with a panel whose genes are selected from known drug targets and cancer drivers. This finding offers a plausible way to further increase the sensitivity of mutation detection in ctDNA, and a theoretical guide for inferring gene expression in tumors.
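The size-based enrichment described above can be illustrated by computing the mutant AF at one site with and without an in silico length filter. This is a minimal sketch: the `(length_nt, is_mutant)` fragment representation and the function name are hypothetical, the example fragments are not data from this study, and the 145 nt threshold is taken from the text.

```python
def allele_fraction(fragments, max_length=None):
    """Mutant allele fraction among fragments covering one site.

    fragments: iterable of (length_nt, is_mutant) pairs (hypothetical format).
    max_length: if given, only fragments shorter than this are counted.
    Returns None when no fragment passes the filter.
    """
    selected = [
        is_mutant
        for length, is_mutant in fragments
        if max_length is None or length < max_length
    ]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Illustrative fragments covering a single mutated position.
fragments = [
    (130, True), (139, True), (140, False), (150, True),
    (166, False), (168, False), (170, False), (320, False),
]
print("AF, all fragments:", allele_fraction(fragments))
print("AF, < 145 nt only:", allele_fraction(fragments, max_length=145))
```

Because the filter also discards wild-type and longer mutant fragments, any gain in AF comes at the cost of reduced depth, and for loci that do not show the shortening trend the filter could lower the AF instead.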