Patients and methods
Patients
This cohort study was a retrospective analysis of databases maintained by the largest specialized liver disease medical center in China, The Fifth Medical Center, General Hospital of PLA. The study focused on patients who underwent RLR for HCC (RLR group) between July 1, 2016 and July 1, 2021. This cohort was compared with a control group of patients who underwent LLR for HCC (LLR group) during the same period. Prior to surgery, all patients provided informed consent, authorizing the anonymized collection of data and audiovisual recording of the surgical procedure. The study was approved by the institutional review board and ethics committee of The Fifth Medical Center, General Hospital of PLA (protocol KY-2022-12-76-1). It was registered as a retrospective cohort study at Research Registry (UIN: researchregistry9622, available at https://www.researchregistry.com) in compliance with the World Medical Association’s Declaration of Helsinki, 2013.
Definitions
The diagnosis of HCC was established using computed tomography or magnetic resonance imaging, in accordance with internationally approved radiological standards [6, 7]. Cases meeting any of the following criteria were excluded: (1) heterogeneous invasive lesions upon final pathology (e.g., adenosquamous carcinoma); (2) secondary carcinoma of the liver; (3) history of previous liver resection; (4) need for concomitant procedures, such as lymph-node dissection around the porta hepatis, cryoablation, radiofrequency ablation, or biliary tract exploration.
All robotic surgeries were performed using the da Vinci Si Surgical System (Intuitive Surgical, Sunnyvale, CA, USA). RLR and LLR were performed using the procedures previously described [8, 9].
The baseline characteristics examined in both study groups encompassed age, sex, body mass index (BMI), Child–Pugh–Turcotte (CPT) status, age-adjusted Charlson Comorbidity Index (aCCI) [10], American Society of Anesthesiologists (ASA) physical status score [11], IWATE criteria difficulty [12], Eastern Cooperative Oncology Group performance status (ECOG PS) score, albumin–bilirubin (ALBI) score [13], hemoglobin (HGB), platelet count (PLT), carbohydrate antigen 19-9 (CA19-9), carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA), and alpha-fetoprotein (AFP). The relationship between surgical approach and study outcomes, such as OS and RFS, was investigated using directed acyclic graphs (DAGs) to identify potential confounders (Appendix Fig. 3).
Perioperative outcomes comprised estimated blood loss (EBL), requirement for packed red blood cell (pRBC) transfusion, conversion rate to open laparotomy, operative time, duration of stay in the intensive care unit (ICU), and the length of postoperative hospital stay.
Short-term postoperative outcomes included common complications (such as hepatic failure, fever, abdominal effusion, pleural effusion, and abdominal infection), 30-day morbidity, 30-day reoperation rate, and 90-day mortality. The severity of complications was graded using the Clavien–Dindo classification (CDC) [14].
We adhered to a standardized surveillance protocol, conducting postoperative follow-up at 30 days after the operation and subsequent patient assessments at 3-month intervals. Tumor markers, including CA 19-9, CA 125, and carcinoembryonic antigen (CEA), were assessed every 3 months, alongside a chest–abdomen–pelvis computed tomography (CT-CAP) performed at the same intervals. Comprehensive details pertaining to tumor recurrence and progression were collected using both the institutional database and patient information obtained from collaborating local hospitals.
The long-term oncological outcomes, including patients’ overall survival (OS) and recurrence-free survival (RFS), were the primary endpoints of this study. OS was defined as the duration from diagnosis to patient death or the last date of follow-up (cut-off date: August 1, 2023), while RFS was defined as the period from surgical intervention to the known date of disease recurrence or the last follow-up date (cut-off date: August 1, 2023).
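The endpoint definitions above can be illustrated with a minimal sketch. The function name, argument layout, and the rule for censoring at the earlier of last follow-up and cut-off date are assumptions made for illustration; they are not taken from the study's analysis code, and the example dates are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

CUTOFF = date(2023, 8, 1)  # follow-up cut-off date stated in the text


def time_to_event(start: date, event: Optional[date], last_follow_up: date,
                  cutoff: date = CUTOFF) -> Tuple[int, bool]:
    """Return (days at risk, event observed) for one patient.

    start: diagnosis date for OS, surgery date for RFS.
    event: date of death (OS) or recurrence (RFS); None if not observed.
    Assumption: events after the cut-off are treated as censored at the cut-off.
    """
    if event is not None and event <= cutoff:
        return (event - start).days, True
    end = min(last_follow_up, cutoff)
    return (end - start).days, False


# Hypothetical patient with recurrence one year after surgery.
rfs_example = time_to_event(date(2020, 1, 10), date(2021, 1, 10), date(2023, 6, 1))
# Hypothetical patient without an event, followed beyond the cut-off date.
os_example = time_to_event(date(2022, 1, 1), None, date(2023, 12, 1))
```

The pair returned for each patient (duration, event indicator) is the input expected by standard survival estimators.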
Statistical analysis
Continuous data that follow a normal distribution are reported as mean (SD), while continuous data that do not follow a normal distribution are presented as median (IQR). Categorical data are reported as counts and percentages. Comparisons between the LLR and RLR groups were performed using the Student’s t test or Mann–Whitney U test for continuous variables, the Chi-squared test or Fisher exact test for categorical variables, and the Cochran–Armitage test for trend for ordinal variables. Covariate balance between treatment groups was also evaluated using the standardized mean difference (SMD); an SMD of less than 0.1 was taken to indicate good balance.
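As an illustration of the balance metric, the SMD can be sketched as follows. This uses the common definition with the pooled standard deviation taken as the square root of the average of the two group variances; the text does not state which variant was used, so this form, the function names, and the example values are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def smd_continuous(treated, control):
    """Absolute SMD for a continuous covariate (sample variances, equal weighting)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


def smd_binary(p_treated, p_control):
    """Absolute SMD for a binary covariate given the proportion in each group."""
    pooled_sd = sqrt((p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)
    return abs(p_treated - p_control) / pooled_sd


# Hypothetical ages in two small groups.
age_smd = smd_continuous([50, 60, 70], [55, 65, 75])
balanced = age_smd < 0.1  # threshold used in the text
```

In this hypothetical example the group means differ by half a pooled standard deviation, so the covariate would be flagged as imbalanced under the 0.1 threshold.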
Treatment allocation in observational studies is not randomized. In this study, robotic surgery cost more than laparoscopic surgery and fell outside the coverage of basic medical insurance. Consequently, the patient’s choice of surgical approach was a potential source of bias, and the non-randomized grouping produced imbalances in baseline features between groups. In this context, propensity score matching (PSM) can reduce confounding bias from measured covariates and yield estimates closer to those of randomized controlled studies. To achieve balance between groups while making full use of the collected data, we employed a one-to-three nearest-neighbor matching algorithm with a caliper of 0.1. During matching, the R software selects, for each patient in the RLR group, the three patients in the LLR group with the closest propensity scores. However, due to the smaller size of the RLR group, only some patients in this group could be matched with all three LLR patients, resulting in an imperfect 1:3 ratio. Although the ratio is not exact, the matched analysis remains valid provided that the two matched groups are as similar as possible on critical confounding variables, which supports the internal validity of the results. Standardized mean differences (SMDs) were estimated before and after matching to evaluate covariate balance, and an SMD of less than 0.1 was considered to indicate adequate balance. The propensity score was estimated using multivariable logistic regression, with type of surgery as the dependent variable and age, sex, BMI, CPT status, aCCI, ASA physical status score, IWATE criteria difficulty, ECOG PS score, ALBI score, HGB, platelet count, CA19-9, CA125, CEA, and AFP as covariates.
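The matching step can be sketched in a few lines. This is a simplified greedy nearest-neighbor matcher without replacement, not the R routine used in the study. It takes already-estimated propensity scores as input, and it treats the caliper as an absolute distance on the propensity-score scale, which is an assumption because the text does not state the caliper's units. All names and scores are hypothetical.

```python
def match_one_to_k(ps_treated, ps_control, k=3, caliper=0.1):
    """Greedy 1:k nearest-neighbour matching without replacement.

    Returns {treated index: [control indices]} with up to k controls each;
    treated patients with no control inside the caliper are left unmatched.
    """
    available = set(range(len(ps_control)))
    matches = {}
    for t, p_t in enumerate(ps_treated):
        # Controls still unused and within the caliper, closest first.
        candidates = sorted(
            (c for c in available if abs(ps_control[c] - p_t) <= caliper),
            key=lambda c: (abs(ps_control[c] - p_t), c),
        )
        chosen = candidates[:k]
        if chosen:
            matches[t] = chosen
            available.difference_update(chosen)
    return matches


# Hypothetical scores: two RLR patients, six LLR patients.
rlr_scores = [0.30, 0.70]
llr_scores = [0.28, 0.33, 0.35, 0.90, 0.72, 0.10]
matched = match_one_to_k(rlr_scores, llr_scores)
```

In this example the first RLR patient receives three matches and the second only one, which shows how a caliper can leave the final ratio below 1:3.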
OS and RFS were estimated using the Kaplan–Meier method and compared using the log-rank test. Factors found to be significant in the univariate analysis and potential confounders were entered into the multivariate Cox regression analysis. Hazard ratios (HRs) and their confidence intervals (CIs) were calculated using Cox proportional hazards analysis. The proportional hazards (PH) assumption of the Cox regression was evaluated using Schoenfeld residuals.
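For readers unfamiliar with the Kaplan–Meier estimator, the product-limit calculation can be sketched as below. This is an illustrative stand-alone implementation with hypothetical data, not the R code used in the study, and it omits confidence intervals and the log-rank comparison.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct event time.

    times: follow-up durations; events: True if the event occurred, False if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up in months; False marks a censored observation.
km_curve = kaplan_meier([5, 8, 8, 12, 15], [True, True, False, True, False])
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed event times.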
Cumulative sum (CUSUM) analysis is a statistical technique applied to surgical procedures for the quantitative estimation of the learning curve [15, 16]. The standard CUSUM analysis shows the cumulative differences between the observed data and the target value. To perform multidimensional CUSUM analysis, we designated operative time, estimated blood loss, postoperative complication, and postoperative hospital length of stay as the assessment indicators of surgical competence. For each case, these four assessment indicators were assigned the quantized values \(\delta_1\), \(\delta_2\), \(\delta_3\), and \(\delta_4\), respectively. The quantized value of an assessment indicator was defined as \(\delta = X_n - X_0\), where \(X_n\) is the outcome of the nth procedure, with \(X_n = 1\) if a failure occurred and \(X_n = 0\) if it did not, and \(X_0\) is the established risk or failure rate of the control to which the ongoing attempts are compared. \(X_0\) can be calculated either as an overall frequency, if this is known, or on a case-by-case basis, as in paired control trials. In this study, the mean or median of the RLR group data after PSM was taken as the target value, and the proportion of cases reaching the target value was calculated for each indicator to obtain the overall frequency. The \(X_0\) values for the four assessment indicators were 0.390, 0.537, 0.439, and 0.512, respectively. The quantized value of surgical competence for each case was therefore defined as \(S = \delta_1 + \delta_2 + \delta_3 + \delta_4\). After each case, the scores were added sequentially and plotted according to the equation \({\text{CUSUM}}_n = \sum_{i=1}^{n} S_i\). A restricted cubic spline (RCS) curve fitted to the CUSUM graph was used to depict the learning curve. A positive slope signified that the desired target remained unattained, while a negative slope indicated that it had been surpassed. The point at which the slope changed from positive to negative was taken to mark the attainment of surgical proficiency.
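The multidimensional CUSUM score described above can be sketched as follows. The target rates are those reported in the text, in the order operative time, estimated blood loss, postoperative complication, and length of stay. The case data are hypothetical, and the raw maximum of the path is used here only as a crude stand-in for the turning point of the fitted RCS curve, which this sketch does not reproduce.

```python
# Target failure rates X_0 reported in the text (order: operative time,
# estimated blood loss, postoperative complication, length of stay).
X0 = (0.390, 0.537, 0.439, 0.512)


def case_score(failures, x0=X0):
    """S for one case: sum over the four indicators of (X_n - X_0)."""
    return sum(int(f) - r for f, r in zip(failures, x0))


def cusum(cases, x0=X0):
    """Running sum of case scores; one value per consecutive case."""
    total = 0.0
    path = []
    for failures in cases:
        total += case_score(failures, x0)
        path.append(total)
    return path


# Hypothetical series: each tuple flags failure (1) or success (0) per indicator.
example_cases = [(1, 1, 1, 1), (1, 0, 1, 0), (0, 0, 0, 0), (0, 0, 0, 0)]
example_path = cusum(example_cases)
# Assumption: index of the raw maximum as a rough proxy for the slope change.
peak_case = max(range(len(example_path)), key=example_path.__getitem__) + 1
```

A case that fails on all four indicators adds 4 minus the sum of the targets to the path, while a case that meets all four subtracts the sum of the targets, so a sustained run of successes turns the curve downward.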
A two-sided p < 0.05 was considered significant in all analyses. All statistical analyses were performed using R software (version 4.3.1).
Discussion
To our knowledge, this study represents the largest single-center, retrospective, observational cohort study in China at the time of study registration investigating the long-term prognostic outcomes, short-term outcomes, and perioperative outcomes of consecutive patients with HCC treated with either RLR or LLR. Our study shows that, after a PSM analysis based on clinical, oncologic, and technical criteria, OS was comparable between RLR and LLR, whereas RFS was improved in the RLR group. The similar OS between the two groups may be attributed to cancer staging and the effectiveness of treatment after relapse. Most patients in our study had early-stage HCC, were middle-aged, and were in good overall physical condition; such patients inherently have better prognoses and longer OS. This could mask the potential survival benefits of the RLR approach in more advanced stages. In addition, in most situations, both groups received similar, aggressive, and effective treatment when they relapsed, leading to indistinguishable survival outcomes.
The higher RFS rate in the RLR group could be attributed to several key factors. A previous European multicenter comparative study by Lim et al. analyzed data from patients who underwent laparoscopic or robotic surgery for HCC [17]. The 3-year RFS rates for 3D-laparoscopic and robotic surgeries were 24% and 48% (p = 0.18), respectively. Although the difference was not statistically significant, the RFS rate in the RLR group was twice that in the LLR group, which should not be ignored. A single-center study also showed a 3-year RFS rate of 50% in the LLR group and 64% in the RLR group (p = 0.30), a difference of 14 percentage points in favor of RLR [18]. These studies began and concluded approximately 5 years earlier than our own. Given the faster pace of technological iteration for RLR compared with LLR, the number of surgical procedures performed each year has since increased substantially, and the indications for robotic liver surgery have expanded [19, 20]. Various clinical advantages of RLR have been continuously reported, and its actual clinical efficacy is increasing [19, 20]. This may partly explain why the RLR group exhibits a superior RFS rate compared with the LLR group.
The rapid development of robotic surgical systems aims to reduce human error and provide feedback during the execution of standardized surgical procedures [21]. The unique features of the robotic surgical system, such as motion scaling and enhanced three-dimensional vision [2, 22], provide superior visualization and high flexibility during surgery. They allow a more thorough exploration of the area around the tumor bed and facilitate finer cutting and suturing, which reduces the likelihood of tiny tumor residuals and thus the risk of recurrence. In our study, the rate of positive surgical margins was 1.0% in the RLR group and 4.1% in the LLR group. The RLR group’s rate was approximately one-quarter that of the LLR group, suggesting that RLR may facilitate a higher rate of complete tumor resection (R0) and thereby mitigate tumor recurrence. In addition, the robotic platform has a tremor filtering function [2, 22], which can reduce involuntary hand tremor, improve the stability of the operation, reduce intraoperative errors, and make it easier to cope with various intraoperative situations. This is reflected in our study by a conversion rate of 2.1% in the RLR group and 7.4% in the LLR group. Conversion to open surgery often leads to increased trauma, prolonged postoperative recovery, and an increased risk of complications, which may in turn increase the probability of tumor recurrence.
Regarding surgical complications, the incidence of ascites was 20.6% in the RLR group and 25.4% in the LLR group, and the incidence of pleural effusion was 28.9% in the RLR group and 34.0% in the LLR group. The incidence of intra-abdominal infection in the LLR group was more than twice that in the RLR group (8.6% vs 4.1%). Complications such as pleural effusion, ascites, or postoperative infection may weaken the patient’s immune system, making the patient more susceptible to mutation and invasion of tumor cells. In addition, surgical complications can prevent patients from completing postoperative treatments, such as interventional embolization or radiotherapy, which help to kill residual tumor cells and reduce the risk of recurrence. Compared with LLR, RLR may reduce the incidence of surgical complications and the risk of recurrence to a certain extent. Although RLR improved only RFS compared with LLR, a relatively longer RFS means a healthier physical and mental state and a higher quality of life for patients. From a different perspective, RLR is only about two decades old, compared with the decades-long record of LLR [23, 24], and has already achieved similar results in various aspects. LLR may still have room for improvement, but this is limited, whereas robot-assisted surgery is entering a phase of rapid development with continuous optimization of the surgical learning curve [16, 25]. The prospects of RLR are promising, and its long-term oncological benefits are expected to exceed those of conventional surgical approaches.
High-quality multicenter studies have indicated that RLR can lead to reduced blood loss, fewer conversions to open surgery, decreased postoperative morbidity, and shortened hospital stays compared with LLR [4, 5, 26–29]. These findings support the argument for RLR as the surgical method of choice, especially in improving surgical safety and facilitating rapid recovery. However, other studies have reached different conclusions, reporting that RLR may increase blood loss, the need for blood transfusion, major postoperative morbidity, and mortality within 30 and 90 days compared with LLR [30]. In our analysis of the entire patient cohort, we did not observe significant differences between the RLR and LLR groups in the primary perioperative outcomes, such as blood loss, transfusion rate, and length of hospital stay, despite theoretical differences in surgical approach. In addition, readmission rates within 30 days, reoperation rates within 30 days, and mortality rates within 90 days were similar between the two groups. The main cause of liver cancer in China is HBV infection, which is quite different from Western countries. The health status of patients, the experience of surgeons, and local medical resources also differ considerably, so we are more inclined to the view that robotic hepatectomy is not inferior to laparoscopic hepatectomy. We believe that both approaches can be safely used in the treatment of hepatocellular carcinoma with good results, which is consistent with the report by Zhu et al. [3]. In addition, our results showed that the operation time was longer in the RLR group than in the LLR group, which is consistent with previous studies [31, 32], and we believe that this is closely linked to the learning curve. This study commenced with the first RLR case for HCC in our center and intentionally did not start after the proficiency period of the learning curve. We aimed to show faithfully the results of a new surgical technique from its introduction to gradual application, compared with the established traditional technique. Because LLR in our center began many years earlier than RLR, the LLR procedures are considerably more established, and their results reflect the surgical proficiency phase, whereas some of the RLR results were obtained before the expected surgical maturity phase was reached. Consequently, we speculate that RLR in the surgical proficiency stage may outperform LLR in multiple aspects in future studies.
To map the learning curve of individual surgeons in this study, we employed multidimensional CUSUM analysis. Our institution has considerable experience in hepatectomy and has performed RLR since the introduction of the da Vinci Surgical System in 2016, which positions it well for this comparative analysis. We used operative time, estimated blood loss, postoperative complications, and postoperative hospital length of stay as evaluation indicators of surgical competency. The fitted curve indicated that the proficiency stage of the learning curve began after the 11th case. A prior systematic review reported a decline in the number of cases needed to achieve surgical proficiency from 48.3 in 1995 to 23.8 in 2015 [33]. Additionally, the 2023 international consensus guidelines on robotic liver resection suggest that an experienced surgeon typically requires around 25 consecutive cases to surpass the learning curve for major RLR and 15 cases for minor RLR, further emphasizing the influence of LLR experience on the RLR learning curve [34]. Given the variation in criteria used to assess surgeons’ proficiency across studies, and individual differences such as the actual number of surgeries and the learning capabilities of our surgeons, we consider the results of this study acceptable. However, because this learning curve is individual-specific, its general applicability may be limited. Nonetheless, it can serve as a valuable reference, particularly in settings with a similar epidemiology of liver cancer, such as China.
As previously mentioned, RLR demonstrates a long-term oncological advantage compared with LLR. This finding aligns with our expectation, because we believe that the robotic surgical system offers superior technical precision in operation and resection, potentially reducing the tendency for tumor recurrence. However, potential confounding by economic factors should not be overlooked. As previously mentioned, the economic circumstances of patients in the LLR group might be inferior to those of patients in the RLR group. Consequently, postoperative aspects such as dietary recovery, medication adherence, and environmental support may be more favorable in the RLR group, potentially contributing to a longer RFS compared with the LLR group.
This study has several limitations. First, its retrospective nature is a significant constraint and may be associated with information and selection biases. Although PSM analysis was employed to mitigate selection bias, residual bias due to unmeasured or unknown confounders remains inevitable in the absence of randomization. Moreover, as a single-center study in which differences in surgical experience and perioperative protocols are minimized, it may not capture variations in patient management across high-volume institutions.
To address these challenges, future endeavors should focus on conducting multinational, multicenter cohort studies to ensure sample diversity, minimize unknown selection biases, and guarantee external validity. Additionally, single-target value methodologies may also be considered for application in future research.