Introduction
Primary liver cancer was the sixth most commonly diagnosed cancer and the third leading cause of cancer death worldwide in 2020, with approximately 906,000 new cases and 830,000 deaths. Incidence rates are highest in transitioning countries, and the disease is the most common cancer in 11 geographically diverse countries [1]. More than 60% of patients are diagnosed with late-stage disease after metastasis has occurred, resulting in an overall 5-year survival rate of < 16% [2]. Hepatocellular carcinoma (HCC) accounts for 75%-85% of primary liver cancers. Early-stage HCC is potentially responsive to curative treatment, ranging from local ablation to liver transplantation [3]. Early detection improves patients' survival rates: patients diagnosed with early-stage disease have a relatively good prognosis, with a 5-year survival rate of > 70% [4]. Specifically, in patients diagnosed with early-stage HCC, such as Barcelona Clinic Liver Cancer stage 0 and A, the 5-year survival rate with surgical intervention was > 93% [4]. Improving early detection is vital for low-resource settings, where HCC often develops early without pre-existing cirrhosis, which removes an early warning sign of elevated risk [3].
Current diagnostic tests based on serum protein biomarkers yield high false-positive rates, and despite improvements, early cancer detection continues to face challenges [5, 6]. Liquid biopsy (LB) assays based on the detection of circulating tumor DNA (ctDNA) have recently emerged as noninvasive and accessible tools for the early detection of multiple cancer types [7–10]. ctDNA accounts for a small proportion of circulating cell-free DNA (cfDNA) in the blood and can be distinguished from benign cfDNA by specific markers such as mutations in genes known to be cancer-related [7, 11, 12]. Several assays exploit such cancer-related mutations. CancerSEEK, a multi-analyte blood test, was used to survey 1,005 participants with clinically detected non-metastatic forms of one of eight common cancer types (breast, colorectal, esophageal, liver, lung, ovarian, pancreatic, and stomach). The test evaluated the levels of eight proteins and the presence of mutations at 1,933 distinct genomic positions; a positive CancerSEEK result was defined as the presence of a mutation in an assayed gene or an elevated level of any of the proteins. The test had a median sensitivity of 70% (ranging from 69 to 98%) for detecting these eight cancer types [7]. However, several characteristics of cfDNA hamper its use as a diagnostic tool. First, the extremely low concentration of tumor-derived cfDNA in a blood draw reduces the ability to detect early-stage tumors [6]. Additionally, some patients have low ctDNA levels even during late-stage disease [8]. This low proportion, coupled with the low variant allele frequency (VAF) of somatic mutations in tumor-derived cfDNA, causes problems for traditional single nucleotide variant (SNV) callers [9]. Second, because cfDNA derives from multiple sources, clonal hematopoiesis mutations arising from the clonal proliferation of blood cells can lead to false-positive findings and confound LB interpretation [13, 14]. This high contribution of blood-cell-derived cfDNA carrying a wide range of somatic mutations creates a bias [14]. Third, somatic mosaicism, in which normal cells carry benign somatic mutations, is common in healthy people across many organs and tissues. Somatically mutated DNA enters the blood-lymph system and contributes to the circulating cfDNA [15].
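To make the low-VAF problem concrete, the following minimal sketch computes the probability of observing at least a minimum number of variant-supporting reads at a single site. It assumes that reads are sampled independently (a binomial model) and ignores sequencing error. The function name, the depth of 1,000 reads, and the threshold of five supporting reads are illustrative assumptions, not values taken from this study.

```python
from math import comb


def detection_probability(vaf: float, depth: int, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf).

    Assumption: independent reads and no sequencing error.
    """
    # Probability of seeing fewer than `min_alt_reads` variant reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical depth and threshold, chosen only for illustration.
    for vaf in (0.0005, 0.001, 0.005, 0.01):
        p = detection_probability(vaf, depth=1000, min_alt_reads=5)
        print(f"VAF={vaf:.4f}  P(detect)={p:.3f}")
```

Under these assumptions, the detection probability falls steeply as the VAF decreases at a fixed depth, which is one way to see why conventional SNV callers struggle with tumor-derived cfDNA.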
In addition to mutations, features of cfDNA fragments, such as size, single-stranded jagged ends, and endpoint locations, have also been exploited to develop noninvasive screening and diagnostic assays [16]. An early study of plasma cfDNA found that fragment sizes differed between benign adnexal masses and malignant gynecological neoplasms [17]. While cfDNA from participants with hepatitis B virus (HBV) infection, cirrhosis, and HCC contained fragments averaging about 166 bp, the plasma DNA of cancer patients contained both shorter and longer fragments [18]. Compared with cfDNA from healthy people, cfDNA from cancer patients showed numerous distinct genomic differences, including longer and shorter fragments at different regions [19]. Several studies have shown that cfDNA fragments harboring mutant alleles are often shorter than those with wild-type alleles [20–22]. Size selection for shorter cfDNA fragments increases the proportion of ctDNA within a sample [21]. For example, the cfDNA of a group of lung cancer patients was more fragmented than that of healthy controls, with an average length of 134 to 144 bp. Thus, tracking the mutational landscape and fragmentation of plasma cfDNA might have promising diagnostic potential [20].
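The enrichment effect of size selection can be illustrated with a small in-silico sketch. Each fragment is represented as a hypothetical (length, carries-mutant-allele) pair, and the 90 to 150 bp selection window is an assumption made for illustration rather than a parameter reported here.

```python
from typing import Iterable, List, Tuple

# (fragment length in bp, True if the fragment carries the mutant allele)
Fragment = Tuple[int, bool]


def mutant_fraction(fragments: Iterable[Fragment]) -> float:
    """Fraction of fragments that carry the mutant allele."""
    frags = list(fragments)
    if not frags:
        return 0.0
    return sum(1 for _, is_mutant in frags if is_mutant) / len(frags)


def size_select(
    fragments: Iterable[Fragment], min_len: int = 90, max_len: int = 150
) -> List[Fragment]:
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [(n, m) for n, m in fragments if min_len <= n <= max_len]


if __name__ == "__main__":
    # Toy data: long wild-type fragments plus a few short mutant fragments.
    toy = [(167, False)] * 90 + [(145, True)] * 4 + [(145, False)] * 6
    print("before selection:", mutant_fraction(toy))
    print("after selection: ", mutant_fraction(size_select(toy)))
```

In this toy example the mutant fraction rises after selection only because the mutant fragments were constructed to be short; real data would show a weaker and noisier effect.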
This study addressed the challenges of using cancer-associated mutations to detect ctDNA in a discovery cohort of 55 patients with early-stage HCC (PwHCC) and 55 healthy individuals. To overcome these challenges, we developed an assay based on the aggregation of fragment length profiles of mutations in the 13 most frequently mutated genes associated with HCC. We evaluated the performance of our assay in both the discovery cohort and an independent validation cohort of 54 PwHCC and 53 healthy participants from a different hospital.
Discussion
As with many cancers, early detection improves the prognosis and survival rates of PwHCC. However, current detection methods rely primarily on imaging and a blood test for a non-specific tumor marker, alpha-fetoprotein, an approach that has proved ineffective at detecting tumors smaller than one centimeter. In the present study, we described the major challenges of LB assays based on cancer-specific variants and proposed a novel approach that combines variant status with fragment size to overcome these challenges and enhance specificity and sensitivity for early detection of HCC.
Consistent with previous studies, we observed that many WBC-derived mutations can confound the interpretation of TDM in plasma cfDNA [42]. To identify such mutations, we built an in-house probabilistic model using the frequency of altered alleles and the total coverage of cfDNA and WBC sequences (see Materials and methods). We observed that a large proportion of background mutations originated from WBCs (Fig. 1A). After implementing our statistical model, however, we were able to reduce the proportion of WBC-derived mutations, which had contributed to the high false-positive rate (Fig. S3B). Thus, our zero-inflated Beta distribution model, which models different characteristics of WBC-derived mutations including VAF, allele depth (AD) and total read depth (DP), enabled the removal of false-positive LB-unique mutations. Consistently, other studies using similar methodology demonstrated better performance than traditional mutation selection methods that rely on a pre-defined set of somatic mutations, e.g. COSMIC [43, 44]. The model achieved high concordance rates with the more expensive approach of sequencing both plasma cfDNA and paired WBC gDNA at high depth (Fig. 1D). In addition to a small number of driver mutations, each cancer contains several passenger mutations, and distinguishing driver from passenger mutations is a challenging task in the field [45]. A study by Salvadores et al. [46] showed that passenger mutations could serve as markers to assign a tumor to its tissue of origin, which is clinically important for a multicancer detection blood test.
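The general idea behind a zero-inflated Beta distribution for background VAFs can be sketched as follows. This is not the model described in Materials and methods, whose exact parameterization is not reproduced here: the sketch uses VAF only (not AD or DP), and the method-of-moments fit, the midpoint-rule integration, and all function names are simplifying assumptions introduced for illustration.

```python
from math import exp, lgamma, log
from statistics import fmean, pvariance


def fit_zero_inflated_beta(vafs):
    """Return (pi0, a, b): the point mass at zero and Beta shape parameters.

    Assumption: method-of-moments fit; needs at least two distinct non-zero
    VAFs whose variance is small enough for the shapes to be positive.
    """
    nonzero = [v for v in vafs if v > 0.0]
    pi0 = 1.0 - len(nonzero) / len(vafs)
    mean, var = fmean(nonzero), pvariance(nonzero)
    common = mean * (1.0 - mean) / var - 1.0
    return pi0, mean * common, (1.0 - mean) * common


def beta_pdf(x, a, b):
    """Beta(a, b) density at x, for 0 < x < 1, computed in log space."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(x) + (b - 1.0) * log(1.0 - x))


def tail_probability(vaf, pi0, a, b, steps=20000):
    """Approximate P(background VAF >= vaf) for vaf > 0 (midpoint rule)."""
    width = (1.0 - vaf) / steps
    area = width * sum(
        beta_pdf(vaf + (i + 0.5) * width, a, b) for i in range(steps)
    )
    # Only the non-zero component of the mixture contributes to the tail.
    return (1.0 - pi0) * min(area, 1.0)


if __name__ == "__main__":
    # Hypothetical background VAFs at one site across WBC samples.
    background = [0.0] * 6 + [0.002, 0.004, 0.006, 0.008]
    pi0, a, b = fit_zero_inflated_beta(background)
    for plasma_vaf in (0.001, 0.02):
        p = tail_probability(plasma_vaf, pi0, a, b)
        print(f"plasma VAF={plasma_vaf}: P(background >= VAF) ~ {p:.2e}")
```

A plasma variant whose VAF has a very small tail probability under the fitted background would be less plausibly explained by WBC-derived noise; where to place such a threshold is a design choice that this sketch does not address.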
An important drawback of employing mutations as markers for the development of ctDNA screening tests is the overlap of TDM with non-tumor benign somatic mutations. Indeed, a previous study detected TP53-mutated cfDNA fragments in 11% of 225 non-cancer controls, suggesting that circulating mutated fragments are common among individuals without any diagnosed cancer [47]. In agreement with this conclusion, we found that 10/55 healthy individuals in the discovery cohort (Fig. S3) carried mutations overlapping with those identified as TDM, resulting in a high rate of false-positive mutations. Interestingly, we showed that different sources of mutations could be differentiated by profiling their fragment length patterns (Fig. 3). We observed remarkable differences between HCC patients and healthy individuals in the fragment length patterns of tumor-derived mutations (TDM, Fig. 3C) and LB-unique mutations (Fig. 3F) (Fig. 3D and G). Specifically, the fragment length distribution of TDM or LB-unique mutations in HCC patients was nearly bimodal, with a smaller peak at approximately 145–155 bp, whereas that pattern was not observed for TDM and LB-unique mutations in healthy individuals. Thus, our data demonstrated a unique fragment length signature of mutations detected in the plasma of HCC patients, in line with previous studies reporting that tumor cfDNA fragments tend to be shorter than non-tumor cfDNA fragments [16, 19, 21, 41]. These studies showed that cfDNA fragmentation is a non-random process mediated by apoptosis-dependent caspases. The fragment size distribution of non-tumor cfDNA shows a prominent peak at 167 bp, corresponding to DNA wrapped around a histone core (~ 147 bp) plus a linker region (~ 20 bp). By contrast, ctDNA fragments have been shown to be around 145 bp [16, 19, 21, 41]. Such size differences are attributed to differences in nucleosomal organization and chromatin accessibility between non-tumor cfDNA and ctDNA [16]. In support of this notion, ctDNA has been shown to have more accessible chromatin than non-tumor DNA, which may be linked to the highly active transcriptional state of these regions [48]. A recent study by Cristiano et al. [19] reported that enrichment for fragments shorter than 150 bp improves the detection of ctDNA. Consistently, we showed that the analysis of fragment length signatures of cancer-specific mutations could be exploited to distinguish HCC patients from healthy controls.
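The kind of fragment length signature discussed above can be summarized with two simple quantities: a normalized length histogram of mutation-bearing fragments and the fraction of fragments falling in the 145–155 bp window. The sketch below is a minimal illustration with hypothetical function names, bin width, and toy data; it is not the feature definition used in our assay.

```python
from collections import Counter


def length_profile(lengths, lo=100, hi=250, bin_width=10):
    """Normalized histogram of fragment lengths over [lo, hi), keyed by bin start."""
    kept = [n for n in lengths if lo <= n < hi]
    counts = Counter(lo + ((n - lo) // bin_width) * bin_width for n in kept)
    total = len(kept)
    return {
        start: (counts.get(start, 0) / total if total else 0.0)
        for start in range(lo, hi, bin_width)
    }


def window_fraction(lengths, start=145, end=155):
    """Fraction of fragments whose length lies in [start, end] bp."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if start <= n <= end) / len(lengths)


if __name__ == "__main__":
    # Toy lengths of mutation-bearing fragments (hypothetical values).
    patient = [167] * 60 + [150] * 40   # secondary peak near 150 bp
    healthy = [167] * 95 + [150] * 5    # essentially mono-nucleosomal
    print("patient 145-155 bp fraction:", window_fraction(patient))
    print("healthy 145-155 bp fraction:", window_fraction(healthy))
```

A secondary peak such as the one described for HCC patients would appear as an elevated window fraction and as extra mass in the corresponding histogram bin.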
Our combination model interrogated three distinct length signatures of fragments bearing cancer mutations. We examined models built from a single feature (1, 2, or 3), from combinations of two features (1 + 2, 1 + 3, or 2 + 3), and from the combination of all three. Our data (Fig. 4D and Table 3) showed that the combination of all three features yielded the best performance, suggesting that these features were not redundant. Based on these observations, we further demonstrated that the analysis of three fragment length signatures of aggregated LB-unique mutations in 13 HCC-associated genes could overcome the confounding effects of mutation markers and achieved a good AUC of 0.87 for determining the presence of HCC.
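The comparison of the seven feature subsets can be sketched as an enumeration followed by an AUC calculation. The scoring rule used here, the mean of the selected features, is a deliberately simple stand-in and not the classifier used in our assay; it assumes each feature has already been scaled so that higher values are more cancer-like. Function names and the toy data are hypothetical.

```python
from itertools import combinations


def auc(case_scores, control_scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    wins = sum(
        1.0 if c > h else 0.5 if c == h else 0.0
        for c in case_scores
        for h in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


def evaluate_feature_subsets(cases, controls, n_features=3):
    """Map each non-empty feature subset (0-based indices) to its AUC."""
    results = {}
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            # Stand-in score: mean of the selected, pre-scaled features.
            case_scores = [sum(row[i] for i in subset) / k for row in cases]
            control_scores = [sum(row[i] for i in subset) / k for row in controls]
            results[subset] = auc(case_scores, control_scores)
    return results


if __name__ == "__main__":
    # Toy feature vectors (feature 1, feature 2, feature 3) per participant.
    cases = [(0.9, 0.2, 0.6), (0.4, 0.8, 0.7)]
    controls = [(0.5, 0.3, 0.1), (0.3, 0.5, 0.2)]
    for subset, value in evaluate_feature_subsets(cases, controls).items():
        print(subset, round(value, 2))
```

With three features this yields seven subsets, matching the three single-feature, three two-feature, and one three-feature models described above.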
The heterogeneity of cancer mutations poses a challenge for using these mutations as markers for the early detection of HCC [49]. Tumor heterogeneity has been reported in HCC at three distinct levels: interpatient heterogeneity, inter-tumor heterogeneity and intra-tumor heterogeneity [50]. In this study, we showed the interpatient heterogeneity of tumor-derived mutations among 55 HCC patients (Fig. 2) and that, by using fragment length signatures of reads bearing plasma mutations rather than the mutations themselves, the impact of patient-to-patient variation in mutational profiles could be minimized. However, the other two levels of heterogeneity, which represent differences in mutation profiles between tumor nodules of the same patient or between different regions within the same nodule, were not addressed in this study because we used a single-region sampling strategy. To characterize these aspects of tumor heterogeneity, a multi-region sampling approach has been suggested by several studies [51]. However, the feasibility of this approach is low due to its invasiveness and limited access to tissue samples. Instead, we speculate that our approach based on the integration of fragment length profiles could overcome the intratumor and intertumor heterogeneity of tumor mutations due to its unbiased sampling of ctDNA in the bloodstream and thus might provide a more comprehensive landscape of mutations in HCC patients.
The performance of our assay was comparable to that of previous studies that developed diagnostic models for early detection of HCC. Jiang et al. showed that quantitative assessment of cfDNA preferred end coordinates and somatic variants allowed researchers to distinguish PwHCC from healthy study participants [18]. Similar to ours, their assay achieved an area under the ROC curve of 0.88. However, they evaluated the performance of their model using a fixed cut-off value and did not report validation in an independent cohort. More recently, HCCseek, another blood-based assay, achieved 75.0% sensitivity at 98.0% specificity [52]. This assay requires shallow whole-genome sequencing of cfDNA to detect copy number variations (CNV) and short fragment lengths, plus the detection of plasma α-fetoprotein. By simultaneously analyzing 5-hydroxymethylcytosine, end motif, fragment size, and nucleosome footprint profiles of cfDNA, Chan and colleagues achieved a sensitivity of 95.79% and a specificity of 95.00% for differentiating PwHCC from healthy participants [53]. These studies showed that the performance of LB assays currently varies across studies and that combining multiple signatures of ctDNA could improve the sensitivity and specificity for early detection of HCC. We propose that combination with other ctDNA biomarkers, such as methylated DNA and altered chromosomal copy numbers, could increase the accuracy of liquid biopsies and warrants more in-depth study [54]. Thus, future studies are required to test whether this multimodal ctDNA analysis would improve our current specificity of 81%, which is an important criterion for an early cancer screening test.
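Because the assays compared above are reported as sensitivity and specificity at a chosen operating point, it may help to recall how both follow from a fixed cut-off on a model score. The sketch below uses hypothetical scores and a hypothetical cut-off of 0.5; none of these numbers come from our cohorts.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are called positive."""
    true_positives = sum(1 for s in case_scores if s >= cutoff)
    true_negatives = sum(1 for s in control_scores if s < cutoff)
    return (
        true_positives / len(case_scores),
        true_negatives / len(control_scores),
    )


if __name__ == "__main__":
    # Hypothetical model scores for patients and healthy controls.
    cases = [0.9, 0.8, 0.4, 0.7]
    controls = [0.1, 0.2, 0.6, 0.3, 0.05]
    sens, spec = sensitivity_specificity(cases, controls, cutoff=0.5)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cut-off trades sensitivity for specificity, which is why results reported at different operating points are not directly comparable across studies.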
Our study had a few limitations. The main limitation is the small sample size for each tumor stage group. We attribute this to the strict selection criteria for early-stage and non-metastatic HCC, which is when cancer detection confers significant clinical benefits. Thus, our current study might be considered exploratory, and future studies with larger cohorts are required for robust validation of our assay's performance.
Although all were confirmed to have nonmetastatic HCC, some HCC patients in the validation cohorts lacked tumor-staging and histological records because they agreed to participate in the study but later chose to undergo treatment at other hospitals. The design did not include participants without cancer but with known risk factors for HCC, such as cirrhosis or HBV infection.
Our study lacks clinical follow-up information on the health and disease status of healthy subjects. This is important since a healthy individual may carry cancer-related mutations and subsequently develop cancer. Hence, future case–control studies with larger data sets and follow-up assessments are required to validate the performance of our assay for detecting HCC at early stages and to understand the mechanism of tumorigenesis. A recent large-scale pan-cancer analysis of the evolutionary history of tumors by Gerstung et al. [55] revealed that cancer-causing mutations can occur decades before diagnosis. Thus, investigating the sequence and chronology of mutations leading to cancer will assist in understanding the mechanisms of tumorigenesis and may make it possible to identify a set of tumor-derived mutations that occur in precancerous stages for early diagnosis.
Our PwHCC were older than the healthy participants, and all our cohorts consisted predominantly of men; both could be confounding factors for our assay. However, we did not observe any significant association of age or gender with cfDNA fragment length patterns or mutation detection rates (data not shown). A recent study that performed genome-wide sequencing of cfDNA showed elevated amounts of fragments shorter than 115 bp in patients with systemic lupus erythematosus [56]. Hence, the inflammatory condition in such autoimmune diseases might introduce a confounding factor into our analysis, which focuses on cfDNA fragment length patterns. Lastly, although the performance of our model was validated in an independent cohort from a different hospital, the numbers of patients and controls in each cohort were relatively small; it would therefore be helpful to test our model in a large prospective clinical study. Future studies could include high-risk patients diagnosed with chronic liver diseases such as hepatitis and cirrhosis to evaluate the ability of our method to distinguish cancer-derived mutations from benign somatic mutations found in those high-risk patients.