Background
Hepatocellular carcinoma (HCC), the most prevalent form of liver cancer, is the third leading cause of cancer-related deaths worldwide [1]. The majority of HCC cases develop progressively from chronic liver disease, primarily due to hepatitis B virus/hepatitis C virus (HBV/HCV) infection or obesity-driven non-alcoholic fatty liver disease (NAFLD), and are usually associated with advanced fibrosis or liver cirrhosis (LC) [2, 3]. HCC is a cancer for which early detection makes a significant difference. Early-stage patients have a much better prognosis than advanced-stage patients, owing to the relative efficacy of curative treatments (surgical resection, transplantation, or radiofrequency ablation) compared with systemic therapy [4]. Currently, routine HCC screening (every 6 months) in the high-risk population relies primarily on detection of the serum protein marker alpha-fetoprotein (AFP) and on ultrasound imaging. Because it lacks adequate specificity and sensitivity, AFP has been challenged in recent studies and is no longer recommended by the European Association for the Study of the Liver (EASL) and the American Association for the Study of Liver Diseases (AASLD) [5‐7]. Ultrasound imaging is relatively inexpensive and less demanding as a screening procedure, but the sensitivity of ultrasound alone for small nodules (<2 cm) is only 21% [8]. Magnetic resonance imaging (MRI)/computed tomography (CT) can exceed a sensitivity of 50% in early-stage subjects, but this procedure is typically reserved for those at risk since it is expensive and uncomfortable [8]. Other blood-based protein biomarkers such as des-γ-carboxyprothrombin (DCP), glypican-3 (GP3), and Golgi protein 73 (GP73) are not recommended for clinical use [9‐12]. Currently, most HCC cases are detected on the basis of clinical symptoms at an advanced stage, rather than by high-quality screening techniques. The development of an earlier and more accurate screening assay remains an urgent unmet clinical need.
The utilization of cancer-linked genomic and epigenomic alterations for diagnosis, prognosis, and personalized medicine is becoming increasingly popular. Liquid biopsy, which assesses circulating tumor DNA (ctDNA) released from apoptotic or necrotic tumor cells, can be used to interrogate the genomic and epigenomic profiles of a tumor [6]. Many studies have shown promising results in ctDNA-based early cancer detection and highlighted its potential to revolutionize cancer screening and diagnosis [13, 14]. Among the various tumor types studied, screening for HCC achieved the highest sensitivity, possibly due to the abundant blood supply of the liver [15]. Of all mechanisms of epigenetic alteration, DNA methylation alteration is the most common. Compared with genomic alterations, DNA methylation offers several advantages as a screening approach: (1) aberrant DNA methylation occurs when a methyl group (CH3) is added to a cytosine base in a cytosine–phosphate–guanine (CpG) dinucleotide, controlling gene transcription and expression, which suggests that altered DNA methylation patterns could be among the first detectable neoplastic changes and thus reflect early changes in tumors [16, 17]; (2) methylation alterations are frequently found in specific genomic regions such as CpG islands, which provides an opportunity to analyze multiple altered sites within each targeted region, and a very large number of targeted regions, by targeted sequencing [17, 18]. At present, studies of methylation alterations have often been conducted in advanced HCC populations with healthy individuals as controls, limiting their widespread application as a routine screening tool in the high-risk population, which includes LC patients and hepatitis B surface antigen-seropositive (HBsAg+) individuals [19, 20]. The very few studies that included LC patients did not cover the full spectrum of cirrhosis (various causes and states). The performance of assays developed in such populations would be compromised in the high-risk population due to the co-existence of inflammation, cirrhosis, and/or precancerous lesions. Therefore, it is necessary to profile healthy individuals, LC patients, and HCC patients in parallel to precisely identify early-stage HCC cases in the high-risk population.
To overcome these problems, we developed and validated an HCC screening model based on cfDNA methylation profiles to effectively distinguish patients with HCC from the high-risk population with chronic hepatitis B (CHB) or LC, as well as from non-HCC individuals. Importantly, we compared the performance of our HCC screening model with that of AFP in distinguishing AFP-normal HCC patients and early-stage HCC patients (Barcelona Clinic Liver Cancer [BCLC] stage 0-A) from the high-risk population. We also investigated whether clinical parameters, including but not limited to aspartate transaminase (AST), alanine transaminase (ALT), and AFP values, would affect the performance of the HCC screening model. Here, we report a multi-layer HCC screening model based on cfDNA methylation profiles and demonstrate that it could be a reliable approach for the early detection of HCC in clinical practice.
Methods
Study design and participants
The aims of this study were (1) marker identification (from tissue samples) and (2) HCC screening model construction and validation (both from plasma samples). This study involved 187 HCC participants and 735 participants without HCC (203 LC patients and 532 healthy individuals). All participants were enrolled from December 2017 to June 2019 at 3 medical centers in China (The Second Xiangya Hospital of Central South University [n=502, the training cohort]; Hunan People’s Hospital and Chongqing University Cancer Hospital [n=420, the validation cohort]). Inclusion criteria were (1) ≥ 18 years of age and (2) treatment (surgery or chemotherapy)-naïve. Patients with intrahepatic cholangiocarcinoma, including combined hepatocellular-cholangiocarcinoma, or other malignancies were excluded. Healthy individuals were defined as having no clinical symptoms of liver disease and no history of cancer at the time of enrollment. HCC and LC tissue samples were obtained either at the time of segmental surgical resection or at biopsy. Normal liver tissues were obtained from liver donors who died of non-liver-related causes. Pathology review was conducted on all tissue samples. HCC patients of all stages were included, with a bias toward BCLC stage B or lower. HCC cases and healthy controls were age and sex balanced. Tissue samples were used to screen for differentially methylated DNA blocks that can be used to classify healthy, LC, and HCC plasma samples. A multi-layer HCC screening model was subsequently constructed based on tissue-derived differentially methylated markers and further validated in an independent cohort.
Power analysis
The statistical plan for the study specified group sizes of 144 HCC patients, 144 LC patients, and 144 healthy controls. This was sufficient to verify that our assay had an expected sensitivity and specificity of 75% each, with a power of 1−β = 90% and a significance level of α = 0.05. Additional HCC cases, LC patients, and healthy controls were incorporated into the study because they were available.
DNA from tumor, LC, and healthy liver tissue was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen, Valencia, CA, USA). The presence of tumor cells in HCC tissue samples and the absence of tumor cells in non-HCC tissue samples were confirmed by histopathological assessment prior to DNA extraction. Circulating cfDNA was recovered from 4 to 5 ml of plasma using the QIAamp Circulating Nucleic Acid kit (Qiagen, Valencia, CA, USA). DNA was quantified with the Qubit 2.0 fluorimeter (ThermoFisher Scientific, Waltham, MA, USA). The distribution of the input amounts is shown in Additional file 3: Fig. S1. Extracted tissue DNA and cfDNA were stored in IDTE buffer at −20°C and −80°C, respectively.
Marker discovery and validation
We identified differentially methylated CpG sites using Infinium HumanMethylation450K array data downloaded from The Cancer Genome Atlas (TCGA) database, with a Benjamini–Hochberg-corrected false discovery rate (FDR) <0.05. We used data from 656 normal WBC samples in the Gene Expression Omnibus (GEO) dataset to exclude CpG sites that are hypermethylated in the haematopoietic lineage (>0.1). CpG sites on the X or Y chromosomes were removed. In addition, we included CpG sites that were associated with common cancers in previous studies. In total, 85,250 CpG sites were identified in the marker discovery phase. The selected CpG sites were grouped into 8147 blocks (Additional file 1: Table S1) and later validated using data from tissue samples and plasma from healthy individuals [21].
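The Benjamini–Hochberg step used in the FDR filter above can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function names `benjamini_hochberg` and `select_sites` and the toy p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over the current rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_sites(p_values, fdr=0.05):
    """Return indices of sites whose adjusted p-value is below the FDR cut-off."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < fdr]


# Hypothetical per-CpG p-values, one per site.
toy_p = [0.01, 0.04, 0.03, 0.5]
kept = select_sites(toy_p, fdr=0.05)
```

In this toy input only the first site survives the 0.05 cut-off, because the adjustment scales each p-value by the number of tests divided by its rank.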
Targeted bisulfite sequencing
Fragmented tissue DNA (peak approximately 200 bp) and cfDNA were subjected to bisulfite conversion using the EZ-96 DNA Methylation-Lightning MagPrep kit (Zymo Research, CA, USA). Briefly, purified DNA was treated with sodium bisulfite. Subsequently, the converted single-stranded DNA molecules were ligated to a splinted adapter and amplified by a uracil-tolerating DNA polymerase to generate whole-genome BS-seq libraries. Custom-designed methylation profiling RNA baits were used for target enrichment; these cover the 85,250 CpG sites and span 1.16 megabases of the human genome. The target libraries were subsequently quantified by real-time PCR (Kapa Biosciences, Wilmington, MA, USA) and sequenced on a NovaSeq 6000 (Illumina, San Diego, CA, USA) with an average sequencing depth of 500X for tissue samples and 1000X for plasma samples. The total number of reads for plasma samples was 49.24 million on average, given 2×150 bp sequencing. The library preparation process includes five steps: DNA end-repair, Tail-and-Tag, single-tag DNA amplification, PCR amplification, and target enrichment.
Methylation data processing
Raw sequencing data (.fastq) were first trimmed by Trimmomatic (v.0.36) and then aligned by BWA-meth (v.0.2.0) to the C to T- and G to A-transformed hg19 reference genome [22]. PCR duplicate reads were identified and removed by Picard tools (v.1.138). Paired reads were stitched together to represent the originating DNA fragments, and those with discordant pairing or low mapping quality (MAPQ<60) were removed from further analyses.
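Two ideas from this step can be sketched in a few lines: the in-silico C to T and G to A transformation of a reference sequence, and the fragment filter on pairing and mapping quality. The helper names `c_to_t`, `g_to_a`, and `keep_fragment` are hypothetical, and the sketch works on plain strings and integers rather than on real alignment files; it is an illustration of the logic, not the tools named above.

```python
def c_to_t(sequence):
    """Forward-strand transform: every C is replaced by T."""
    return sequence.upper().replace("C", "T")


def g_to_a(sequence):
    """Reverse-strand transform: every G is replaced by A."""
    return sequence.upper().replace("G", "A")


def keep_fragment(mapq_read1, mapq_read2, properly_paired, min_mapq=60):
    """Keep a stitched fragment only if the pair is concordant and
    both mates reach the mapping-quality threshold (MAPQ >= 60 here,
    mirroring the removal of MAPQ<60 described in the text)."""
    return (
        properly_paired
        and mapq_read1 >= min_mapq
        and mapq_read2 >= min_mapq
    )


# Toy reference fragment containing two CpG dinucleotides.
reference = "ACGTCG"
forward_index = c_to_t(reference)   # "ATGTTG"
reverse_index = g_to_a(reference)   # "ACATCA"
```

Aligning bisulfite-converted reads against both transformed references lets the aligner ignore the conversion itself; the methylation state is then read back from the original, untransformed read bases.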
Model construction
A custom module was built to classify samples using two layers of models: (i) three linear kernel support vector machine (SVM) models: a malignant versus healthy model (MH model), a malignant versus benign model (MB model), and a benign versus healthy model (BH model). Each model searches for the hyperplane with the maximal margin between its two pre-defined training classes. As for all linear classifiers, the decision function is \( f\left(\mathrm{x}\right)={w}^{T}\mathrm{x}+b \), where \( w={\left[{w}_1,{w}_2,\dots, {w}_k\right]}^{T} \) is the weight vector and b is the bias term that sets the offset of the hyperplane from the origin. (ii) A multinomial logistic regression model: for each sample, the outputs from the MH, MB, and BH models were fed into a multinomial logistic regression model to obtain a cancer/benign/healthy assignment as the final prediction. Both layers were trained by the stochastic gradient descent (SGD) algorithm, and performance on the training set was assessed by iterated 5-fold cross-validation. During the independent validation phase, the model with locked parameters was applied directly to the blinded samples, and the clinical information was not released until all analyses were completed.
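The prediction path of such a two-layer design can be sketched as follows. This is a minimal illustration with made-up weights, not the trained model: the names `linear_score`, `softmax`, and `predict`, the two-feature toy input, and every coefficient are assumptions, and the second layer is written as a plain softmax regression rather than the reference-class parameterization some packages use.

```python
import math


def linear_score(weights, bias, features):
    """Linear decision function: dot product of weights and features plus bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def softmax(logits):
    """Convert logits to probabilities; the maximum is subtracted for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, svm_models, coef, intercept,
            labels=("cancer", "benign", "healthy")):
    """Layer 1: MH, MB, BH linear scores. Layer 2: softmax over those scores."""
    scores = [linear_score(w, b, features) for w, b in svm_models]
    logits = [linear_score(row, c, scores) for row, c in zip(coef, intercept)]
    probabilities = softmax(logits)
    best = max(range(len(labels)), key=lambda k: probabilities[k])
    return labels[best], probabilities


# Hypothetical parameters for a two-feature toy sample.
svm_models = [([2.0, 0.0], -1.0),   # MH model
              ([1.0, 1.0], 0.0),    # MB model
              ([0.0, 1.0], 0.0)]    # BH model
coef = [[1.0, 1.0, 0.0],            # cancer row
        [-1.0, 0.0, 1.0],           # benign row
        [-1.0, -1.0, -1.0]]         # healthy row
intercept = [0.0, 0.0, 0.0]

label, probs = predict([1.0, 0.0], svm_models, coef, intercept)
```

Feeding the three pairwise scores into one multinomial layer lets the final call weigh the malignant, benign, and healthy evidence jointly instead of relying on any single pairwise boundary.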
Statistical analysis
Means and differences of the means with 95% confidence intervals (CIs) were calculated using the Wilson score CI. A p value of <0.05 was considered statistically significant. Differences between the groups were calculated using the two-tailed Student’s t test, the Kruskal-Wallis test, or Fisher’s exact test, where appropriate. All statistical analyses were performed with R (R version 3.4.0; R: The R-Project for Statistical Computing, Vienna, Austria) using default functions and the packages “FactoMineR” (v2.4) and “factoextra” (v1.0.7). The differentially methylated regions were called using the package “limma” (v2.0), which applies linear models and empirical Bayes methods originally developed for assessing differential expression in microarray experiments; the cut-off was set at a Benjamini–Hochberg-corrected FDR <0.05. The first layer of the HCC screening model was constructed with the package “e1071” (v1.7-9) using a linear kernel with C set to 1. The second layer of the HCC screening model was trained with the package “nnet” (v7.3-16) using the single-layer model.
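For reference, the Wilson score interval can be written out in a few lines. The interval is defined for a binomial proportion, so this sketch assumes it is applied to quantities such as sensitivity or specificity; the function name `wilson_interval` and the example counts are illustrative assumptions, and the analyses described above were run in R rather than with this code.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score confidence interval for a binomial proportion.

    z defaults to the normal quantile for a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    z2 = z * z
    denominator = 1.0 + z2 / total
    centre = (p_hat + z2 / (2.0 * total)) / denominator
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / total + z2 / (4.0 * total * total))
        / denominator
    )
    return centre - half_width, centre + half_width


# Hypothetical example: 108 true positives among 144 cases (75% sensitivity).
lower, upper = wilson_interval(108, 144)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves sensibly when the observed proportion is close to either bound.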
Ethics committee approval
The study was approved by the ethics committees of The Second Xiangya Hospital of Central South University (KYLL2018072) and Chongqing University Cancer Hospital (2019167). All collection and usage of human samples and clinical data were in accordance with the principles of the Declaration of Helsinki. Written informed consent was obtained from all participants for the use of their tissue or plasma samples.
Discussion
Early detection is the most effective way to reduce HCC mortality. In this study, we sought to develop and validate a cfDNA-based multi-layer HCC screening model for the early detection of HCC among patients with liver disease and healthy controls using targeted bisulfite sequencing, a highly sensitive DNA methylation profiling technique based on NGS. A total of 2321 differentially methylated markers were identified by comparing the methylation profiles obtained from HCC, normal, and LC tissue samples. Our model yielded significantly improved performance over serum AFP testing for early-stage HCC versus non-HCC controls and, of greatest clinical importance, for early-stage HCC versus high-risk patients with non-malignant liver disease, including LC and HBV infection. Importantly, our model also showed superior performance over AFP by accurately detecting HCC cases that would not have been detected by AFP testing alone. This multi-layer model, based on two intermediate outputs (the tumor and benign scores) that primarily reflect the properties of tumors and cirrhosis, achieved differential diagnosis of HCC cases. Taken together, this study suggests that our model has the potential to become an integrated part of HCC surveillance, for early screening of HCC patients among high-risk subjects.
Identifying biomarkers for early cancer detection with minimal invasiveness is still an emerging field. Numerous studies have explored the feasibility of ctDNA-based somatic mutation profiles and concluded that such a technique may not be adequate [23‐25]. A tumor cell usually harbors only one copy of mutant DNA. A major challenge in using somatic mutations obtained from ctDNA for cancer detection is that an early-stage tumor may not be able to release enough copies of the mutant DNA. The TRACERx study revealed that only 13% of stage I lung adenocarcinoma patients had detectable mutations [26]. In a large-scale prospective study, only 27% of early-stage patients were detected by assessing mutations and protein levels in blood testing [27].
Methylation profiling of ctDNA has shown great potential to overcome the limited amount of ctDNA in circulation and the lack of recurrent mutations. DNA methylation alterations occur very early in tumorigenesis, even before the emergence of somatic mutations, and thus offer an opportunity to identify early-stage cancer before clinical symptoms emerge [16, 28]. Many promising methylation-based HCC screening models have been proposed. Liu et al. reported a targeted methylation analysis of circulating cfDNA as a screening test for over 50 cancer types across stages, supported by GRAIL, Inc. [29]; 25 patients with stage I–III hepatobiliary cancer were included, demonstrating a sensitivity of 68% at 99% specificity. Kisiel et al. proposed a panel of 6 methylated DNA markers based on differentially expressed genes derived from HCC and control tissues, which achieved a sensitivity of 95% and a specificity of 92% when stage I–IV HCC cases were detected in a high-risk population. Importantly, this panel detected 3/4 stage 0 and 39/42 stage A HCC cases [30]. In addition, Xu et al. compared differential DNA methylation profiles of HCC tissues and blood leukocytes to derive 401 candidate markers, which were further refined to a panel of 10 markers for the construction of a diagnostic model, yielding a sensitivity of 83.3% and a specificity of 90.5% in the validation cohort (stage I–IV HCC cases) [13]. Cai et al. presented a genome-wide 5-hydroxymethylcytosine (5hmC)-based screening model that distinguished early-stage HCC cases (stage 0-A) from the high-risk population, achieving an AUC of 0.884 in the external validation cohort [31]. Nevertheless, DNA methylation alterations can be influenced by many biological factors, and pre-specified case–control studies may not reflect the full spectrum of the disease owing to selection bias. Carefully designed studies in the intended-use screening population are still required to evaluate the clinical applicability of these models.
An effective screening assay needs to demonstrate sufficiently high specificity to minimize the risk of overdiagnosis (false positives) and to avoid unnecessary anxiety and follow-up examinations for non-HCC individuals [32]. Our model achieved high specificities and yielded 19 false-positive samples in the training and validation cohorts. It is possible that these signals originated from tumor lesions that were missed by CT screening. These misclassified patients were often senior adults with lower bilirubin levels. We are tracking these false-positive individuals to determine whether they have an increased risk of developing HCC.
Despite the significance of our screening model, several limitations might affect the interpretation of our results. Firstly, only cirrhotic patients were included as the benign liver disease group. Including other non-malignant liver diseases would help to improve the model and optimize its performance. Secondly, the number of HCC patients enrolled in the validation cohort was relatively small, especially the number of early-stage HCC patients. Additional stage 0-A HCC participants would help to validate the robustness of our model. Thirdly, the LC group in the training cohort has a median age of 47, which is lower than that of the healthy individuals and HCC patients. An age-matched training set might reduce the potential selection bias. Fourthly, given the socioeconomic factors affecting the cost of the test at the ideal testing frequency for the intended population (as often as every 6 months), the current version of the test is not cost-effective enough and might not meet the needs of real-world clinical diagnostics. Since methylation-based tests are capable of detecting cancer and locating the tissue of origin simultaneously, we anticipate that sensitive and cost-effective multi-cancer detection tests will benefit the general public as the technology evolves and more clinical studies are carried out.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.