Introduction
Artificial intelligence (AI) is a relatively new field in medicine that is gaining major interest, and its development in clinical settings is already referred to as a digital revolution for healthcare [1].
Artificial intelligence is defined as the branch of computer science that aims to imitate several aspects of human intelligence and behavior [2]. With the use of large datasets, AI models can be trained to conduct several complicated tasks [3]. Machine learning, one of the domains of AI, refers to computer systems in which models are trained to form new predictions or decisions by analyzing large quantities of data [4]. A specific subclass of machine learning known as deep learning uses multiple layers to analyze input data. In each layer, weights are calculated for several factors from the data. After repeating this process, a final model is trained and ready to be applied to new data. Examples of both machine and deep learning techniques are presented in Table 1.
Table 1
Definitions of subclasses within AI
Machine learning (ML) | ML involves computer algorithms that are able to perform desired tasks based on input data. When provided with sufficient data, algorithms can recognize patterns in the data and train the model to perform better. After completion of the final model, the algorithm can be applied to new, unseen data [5] |
Decision tree (DT) | Within a DT model, multiple factors are classified into tree branches. Based on the algorithm, these branches are divided into nodes, forming several tree pathways. In the end, this model tends to find the smallest tree that optimally fits the data [6] |
Gradient boosting (GBM) | In GBM, weights are added to several factors after classification. Afterwards, the weights are assessed and modified based on how difficult the factors are to classify. This process is repeated until a final optimal model is generated [7] |
Random forest (RF) | RF involves the formation of multiple decision trees with specific values for predictors. This technique combines all decision trees in order to build an accurate model for predictions [8] |
Support vector machine (SVM) | SVM models use mapped input data to discover the optimal boundary to separate several classes and values [9] |
Deep learning | As a specific branch of machine learning, deep learning can recognize patterns within datasets by using multiple processing layers. Within each layer, weights are present for several factors within the model. After the training process, an optimal model is built to perform on new data [10] |
Artificial neural networks (ANNs) | Loosely modeled on the human brain, ANNs pass data through multiple processing layers. Each layer contains weights that determine the resulting output. By repeating this process, the model can improve its results and produce the most accurate model in the end [11] |
Convolutional neural networks (CNNs) | CNNs are a specific type of neural network in which the layers do not assign a separate weight to every input. Instead, multiple layers apply learned filters to register patterns or regions of images [12] |
Radiomics | A radiomics model analyzes images in order to retrieve specific texture features that are registered as a 0 or 1. By detecting these features, various pathologies could be recognized [13] |
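To make the notion of "multiple processing layers" with weights in Table 1 more concrete, the following is a minimal sketch of a forward pass through a small neural network. It is an illustration only: the layer sizes, the weights, the sigmoid activation, and the example input are hypothetical choices and are not taken from any of the included studies. In practice, the weights would be learned from training data rather than set by hand.

```python
import math


def sigmoid(x):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One processing layer: a weighted sum of the inputs per unit, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each layer in turn and return the final output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical, hand-picked weights (assumption: a trained model would learn these).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer: 3 inputs -> 2 units
    ([[1.0, -1.0]], [0.0]),                              # output layer: 2 units -> 1 score
]

# Made-up, already scaled input values for a single patient.
patient = [0.4, 0.7, 0.1]
risk_score = forward(patient, layers)[0]
print(f"Predicted score: {risk_score:.3f}")
```

The output is a single number between 0 and 1 that could be read as a predicted probability; training consists of adjusting the weights so that such scores match the observed outcomes as closely as possible.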
The potential of AI models has already been demonstrated in clinical practice [14, 15]. For example, machine learning algorithms have been applied to MRI, X-ray, and CT images to detect tumors in various organs. Additionally, input from large numbers of electronic health records enabled AI models to identify risk factors for multifactorial outcomes such as length of stay, mortality, and early hospital readmission after surgery [16]. Recently, in colorectal surgery, machine learning was used to predict outcomes such as lymph node metastasis, response to chemoradiotherapy, and postoperative complications. For these outcomes, predictions were performed with accuracies up to 96%. This underlines the potential of machine learning to support risk stratification and facilitate clinical decision-making for general surgeons [17–19].
Bariatric surgery has become a key treatment for the worldwide pandemic of morbid obesity. Optimal postoperative weight loss, including resolution of obesity-related comorbidities, leads to a decreased burden of disease and related mortality [20, 21]. Despite an increasing number of large-dataset studies in bariatric surgery, several factors such as short- and long-term complication rates and weight loss remain unpredictable. An example in which AI could benefit bariatric surgery is insufficient weight loss after surgery. Ten to thirty percent of patients show insufficient weight loss after bariatric surgery [22]. Risk factors for this are extremely diverse, ranging from socio-economic factors such as insurance policy to a specific type of microbiome [23, 24]. A complete overview of all risk factors, and ideally an algorithm to calculate the risk of insufficient weight loss for each individual patient, is still missing. An algorithm that identifies patients at major risk of insufficient weight loss as well as those at high risk of postoperative complications would help both the bariatric surgeon and the patient to reach a well-informed decision.
Despite the potential benefits of AI, the scope of machine learning applications in bariatric surgery has rarely been reported. Therefore, this systematic review aims to provide an extensive overview of (potential) machine learning applications within bariatric surgery.
Materials and Methods
Search Strategy
A systematic search was performed in accordance with the Cochrane Handbook for Systematic Reviews of Interventions version 6.0 and PRISMA guidelines. To identify all relevant publications, systematic searches were conducted in the bibliographic databases PubMed, Embase.com, Clarivate Analytics/Web of Science Core Collection, and the Wiley/Cochrane Library from inception up to the 7th of July 2021. The search included keywords and free text terms for (synonyms of) ‘machine learning’ combined with (synonyms of) ‘digestive system surgical procedures’ and ‘bariatric surgery’. The full search strategy can be found in the Supplementary information (see Appendix).
Selection Process
Two reviewers (MB and JCP) conducted the title and abstract screening independently in accordance with the inclusion and exclusion criteria. Studies were only selected for full-text assessment if both reviewers agreed on inclusion. Disagreements between reviewers were resolved by discussion until consensus was reached. Studies were included if they (i) described machine learning algorithms within bariatric surgery, (ii) were clinical studies, and (iii) included adults. Studies were excluded if they (i) did not describe bariatric surgery specifically, (ii) were not written in English, or (iii) were of certain publication types: reviews, editorials, letters, legal cases, or interviews.
Risk of Bias Evaluation
The ROBINS-I assessment tool was applied by two reviewers (MB and JCP) to evaluate the methodological quality of included non-randomized studies [25]. Additionally, the PROBAST tool was used by two reviewers (MB and JCP) to assess the quality of machine learning models [26]. Disagreements between reviewers were resolved by discussion.
Data Synthesis and Outcome Assessment
Following full-text screening, the following data were extracted from the included studies: first author, year of publication, country, number of patients included, mean age of the study population, percentage of female patients, study design, follow-up time, surgical procedure, type of machine learning, external validation, purpose of machine learning, outcome measurements, and prediction performance. Studies were categorized by the purpose of the machine learning model, and results were presented separately for each category.
Discussion
From this systematic review, it can be concluded that artificial intelligence has potential in several areas within bariatric surgery. First, various models have been created to predict severe complications, with AUCs up to 0.98. Secondly, weight loss was predicted with AUCs ranging from 0.80 to 0.83. Lastly, AUCs up to 0.81 were observed in predicting the postoperative quality of life, diagnosis, and end-organ complications of patients with morbid obesity.
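For readers less familiar with the AUC (area under the receiver operating characteristic curve), the sketch below shows one common way to interpret it: as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The function name `auc` and the example labels and scores are illustrative assumptions; they do not reproduce data from any included study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for patients with the outcome, 0 for patients without it.
    scores: predicted risk per patient (higher means higher predicted risk).
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up example: four patients, two of whom developed a complication.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(example_labels, example_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance and an AUC of 1.0 to perfect ranking, which is why values such as 0.98 are read as very good discrimination.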
Five studies have applied machine learning models to predict postoperative complications for patients undergoing bariatric surgery. Among several models, neural networks have shown the highest accuracy of 98% in predicting postoperative complications. Ideally, by using machine learning models, bariatric surgeons will be able to better predict (severe) postoperative complications for each unique patient. These predictions can, in theory, influence the decision towards a different type of bariatric operation or different timing of the operation, more specific prophylactic measures to prevent a certain type of complication, or a shared decision with complete informed consent.
In a recent study, the “low-risk bariatric patient” was defined by the absence of factors such as a medical history of thromboembolic events, diabetes mellitus, and kidney or pulmonary disease [38]. In this review, overlapping risk factors have been identified in the included studies predicting postoperative complications and weight loss (Table 3). It is no surprise that age, BMI, previous intra-abdominal surgery, diabetes, and cardiovascular disease were identified as risk factors for severe postoperative complications. However, other factors such as race, inflammatory bowel disease, laboratory results, and functional status are more controversial. Not all clinical variables were included in a similar or homogeneous manner across the included studies. This is despite the hypothesis that inclusion of previously excluded variables may improve the accuracy of machine learning models to predict postoperative complications and related risk factors. In the field of breast cancer surgery, the exclusion of variables from machine learning models was prevented by collecting a large number of pre-operative, intra-operative, and post-operative variables [39]. These findings suggest that guidelines are needed to secure a comprehensive list of clinical factors that can be used for an optimal training process of machine learning models.
Table 3
Summary of overlapping factors for postoperative complications and weight loss
Low BMI | Non-White race | Female gender | Older age |
| Diabetes mellitus* | | Diabetes mellitus* |
| Older age Previous bariatric surgery | | High BMI |
Three studies have attempted to predict postoperative weight loss. Neural networks demonstrated the highest AUC of 0.94 in predicting postoperative weight loss. For decades now, researchers in the bariatric field have attempted to identify all risk factors for insufficient weight loss after bariatric surgery. Multiple studies have shown that postoperative weight loss depends on multiple factors, both objective measures such as BMI and subjective, patient-related measures. It could therefore be particularly beneficial for bariatric surgeons to implement AI as a means of identifying risk factors for outcomes such as insufficient weight loss. However, as Nudel et al. noted [30], external validation of the machine learning model was missing due to insufficient data. Therefore, more large datasets are needed before accurate and valid models can be developed.
For predicting the risk of long-term end-organ complications, such as coronary artery events, heart failure, and nephropathy in patients suffering from type 2 diabetes and morbid obesity, a random forest model showed AUCs of 0.66, 0.73, and 0.73, respectively. According to Aminian et al. [35], this random forest model may support and accelerate the process of decision-making toward bariatric surgery. This is desirable as the duration of obesity itself and the presence of its related comorbidities have repeatedly been reported to lead to less postoperative weight loss and higher comorbidity-related mortality [40–42].
As weight loss after bariatric surgery is not always associated with health-related quality of life, an algorithm predicting the increase in quality of life after bariatric surgery would be welcome in the preoperative process of expectation management and shared decision-making [43, 44]. Neural networks have shown a mean squared error of 0.035 in predicting the postoperative health-related quality of life 1, 2, and 5 years after bariatric surgery; because this value is close to 0, it indicates an accurate estimation. This neural network model might provide the opportunity to improve postoperative care and rehabilitation for patients undergoing bariatric surgery. However, due to missing patient information, the generalizability of this model is uncertain. Missing data could be addressed by imputation, as was done in the study of Tseng et al. [45], in which machine learning models were used to predict acute kidney injury after cardiac surgery.
One study predicted the presence of hiatal hernias. The importance of a hiatal hernia (HH) present at the time of bariatric surgery remains controversial, but simultaneous correction during laparoscopic sleeve gastrectomy is increasingly recommended [46]. Nevertheless, gastroesophageal reflux symptoms may worsen or persist, and a secondary operation with conversion from sleeve gastrectomy to laparoscopic Roux-en-Y gastric bypass (LRYGB) may be necessary [47]. Foreknowledge of the presence of HH may both influence the decision of patient and surgeon towards LRYGB and predict a longer operation time. However, as the authors of this study mention, the accuracy of the models developed is not impressive and the study should be regarded as a proof of concept, exploring the possibilities of AI.
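Two of the concepts mentioned above, the mean squared error reported for the quality-of-life model and imputation of missing data, can be sketched in a few lines. The code below is a simplified illustration under stated assumptions: the function names and example values are hypothetical, and mean imputation is only one of several possible strategies, not necessarily the one used by Tseng et al.

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute when every value is missing")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


# Made-up quality-of-life scores on a 0-1 scale (assumption, for illustration only).
observed_qol = [1.0, 0.0]
predicted_qol = [0.5, 0.5]
print(mean_squared_error(observed_qol, predicted_qol))  # (0.25 + 0.25) / 2 = 0.25

# A made-up predictor column with one missing measurement.
print(impute_mean([1.0, None, 3.0]))  # the gap is filled with the mean, 2.0
```

Note that a mean squared error is expressed in squared units of the outcome, so how close a value such as 0.035 is to "perfect" depends on the scale on which quality of life was measured.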
Because external validation is missing in most studies, the first step for future studies in bariatric surgery should be the inclusion of external validation cohorts to improve the generalizability of machine learning models. Afterwards, clinical trials should be conducted to facilitate the implementation of ML models within bariatric surgery. For both steps, large amounts of data are required for the training process of these models. These data could be retrieved from available patient databases or robotic surgery, eventually facilitating the training process of machine learning [48, 49].
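The difference between internal performance and external validation can be illustrated with a deliberately simple sketch. Here a single cut-off on one predictor is chosen in a development cohort and then applied unchanged to a separate external cohort. Everything in this example is an assumption made for illustration: the cohorts are made up, the predictor is unnamed, and a one-variable threshold stands in for a real machine learning model.

```python
def accuracy(values, labels, cut):
    """Share of patients classified correctly when predicting 1 for values >= cut."""
    predictions = [1 if v >= cut else 0 for v in values]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def fit_threshold(values, labels):
    """Choose the cut-off that maximizes accuracy in the development cohort."""
    best_cut, best_accuracy = None, -1.0
    for cut in sorted(set(values)):
        current = accuracy(values, labels, cut)
        if current > best_accuracy:
            best_cut, best_accuracy = cut, current
    return best_cut


# Made-up development cohort (used to build the model).
dev_values = [30, 35, 40, 45, 50, 55]
dev_labels = [0, 0, 0, 1, 1, 1]

# Made-up external cohort from a hypothetical second center (never used for fitting).
ext_values = [32, 38, 44, 47, 52, 58]
ext_labels = [0, 0, 1, 0, 1, 1]

cut = fit_threshold(dev_values, dev_labels)
internal_accuracy = accuracy(dev_values, dev_labels, cut)
external_accuracy = accuracy(ext_values, ext_labels, cut)
print(cut, internal_accuracy, round(external_accuracy, 2))
```

In this toy example the model looks perfect on the data it was built on but performs worse on the external cohort, which is the kind of optimism that external validation is meant to expose.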
This review has revealed that machine learning models have the potential to predict postoperative complications, weight loss, end-organ complications, quality of life, and preoperative diagnosis. After the necessary steps to improve generalizability and clinical validation, machine learning models may have a significant impact on decision-making within bariatric surgery. As machine learning models are improved and validated, surgeons could be one step closer to achieving personalized decision-making for patients undergoing bariatric procedures.
To use machine learning models for the prediction of surgical outcomes in bariatric surgery, data from laparoscopic bariatric surgery should be accessible [50]. Laparoscopic videos of bariatric procedures could be collected to serve as a training database for machine learning models. By providing accurate image navigation during surgery, machine learning models could efficiently identify anatomical landmarks and unexpected intraoperative findings such as adhesions and abdominal wall hernias [51]. In addition, perioperative data recorded by anesthesiologists, such as continuous blood pressure measurements or oxygen saturation, could be collected as factors possibly predicting postoperative complications. Furthermore, as robotic surgery is often performed in bariatric surgery, machine learning models could also improve the performance of robotic surgery by providing 3D mapping during surgery and evaluating surgical skills afterward [52].
This review has several limitations. External validation cohorts are missing for most studies, which leaves the generalizability of the machine learning models uncertain. Therefore, big data from clinical settings are required to achieve appropriate generalizability and accuracy for machine learning models [53]. Additionally, due to inconsistencies in the reported accuracies and AUCs, a meta-analysis could not be conducted.