Introduction
Recent advancements in embolization techniques, combined with an increasing body of supporting data, have led to prostate artery embolization (PAE) emerging as a safe and effective alternative to transurethral resection of the prostate (TURP) for the treatment of benign prostatic hyperplasia (BPH) [1]. Although PAE appears to be effective in most patients, a subset of patients have suboptimal outcomes. These include patients with an International Prostate Symptom Score (IPSS) reduction of < 25% or no improvement in quality of life (QoL) score, clinical recurrence of symptoms (5–28% of cases) [2] or technical failure (reported as 2–5%) [3]. Given that no procedure is without risk of complications, this has spurred a growing body of research aimed at identifying the underlying predictors of PAE outcomes [4]. Recognising BPH’s heterogeneous nature, a tailored approach to treatment, emphasising the pivotal role of patient selection in both medical and surgical management, will ensure optimal care [4].
Most studies on PAE outcome predictors focus on single patient factors such as prostate volume, vascular anatomy or IPSS, rather than combinations of factors [5–9]. Furthermore, predictions are usually of binary outcomes (responder vs. non-responder) rather than of the actual IPSS score for an individual patient. Such detailed prediction could be beneficial for clinical decision-making. For instance, even a modest predicted improvement in IPSS might be considered sufficient reason to opt for PAE over more invasive procedures (e.g. TURP), particularly if the latter poses significant risks to the patient.
In the realm of health care, artificial intelligence (AI) signifies a change in thinking, utilising computer systems with data-driven, decision-making processes. Machine learning (ML), a subset of AI, harnesses structured data and algorithms to decipher patterns and predict clinical outcomes. It benefits from the ease of exploring combinations of variables in large datasets to find patterns that might otherwise be missed with traditional statistical approaches. Its success is evident in various medical domains, such as predicting cancer progression (e.g. breast, prostate and lung cancer) and treatment efficacy [10]. Merging prostate volume and clinicopathological data with AI tools holds promise in forecasting PAE outcomes, refining patient outcomes and facilitating tailored patient consultations.
This pilot study evaluates the feasibility of leveraging ML to predict PAE outcomes, relying solely on routinely collected pre-procedural data (prostate volume, clinical and urodynamic variables).
Methods
This study was a retrospective complete-case analysis under the ethical approval of IRAS 326704.
Study Cohort
A retrospective analysis of the UK-ROPE study (a prospectively collected database of patients from the UK) was conducted [11]. Briefly, this was a national observational database of patients treated with PAE or surgical alternatives, collated from 17 centres across the UK from January 2014 to July 2016. The inclusion criteria for this subanalysis were patients who underwent PAE and had a complete set of records including age, prostate volume, Qmax (maximum urinary flow rate), post-void residual volume, Abrams-Griffiths number (urodynamics score), baseline IPSS and 1-year post-PAE IPSS. In this study, predictor variables were selected based on the number of full datasets available, to maximise data points for use in predictions.
This multicentre dataset was combined with a separately collected dataset encompassing patients who underwent PAE at our single institution between 2012 and 2023. These patients had the same complete set of predictor variables.
Some of the data from our institution were randomly selected (using the ‘random’ function within Python) and kept separate from any training data (Dataset 2). This set was used to confirm findings and to assess the generalisability of our model.
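A random hold-out of this kind can be sketched as below. This is a minimal illustration only, using the standard-library ‘random’ module; the helper name and the stand-in records are hypothetical, as the study’s exact selection code is not published.

```python
import random

def split_holdout(records, n_holdout, seed=42):
    # Hypothetical helper: randomly reserve n_holdout records as an
    # independent test set (Dataset 2); the remainder form Dataset 1
    # for model development and validation. Seeding makes the split
    # reproducible.
    rng = random.Random(seed)
    holdout_idx = set(rng.sample(range(len(records)), n_holdout))
    dataset1 = [r for i, r in enumerate(records) if i not in holdout_idx]
    dataset2 = [r for i, r in enumerate(records) if i in holdout_idx]
    return dataset1, dataset2

patients = list(range(100))                       # stand-in for patient records
dataset1, dataset2 = split_holdout(patients, 16)  # Dataset 2 in this study had n = 16
```

The key design point is that the hold-out indices are drawn once, before any model development, so that Dataset 2 never influences training.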
All remaining data were used for model development and validation (Dataset 1).
Model Development
All data analyses were conducted in the Python programming environment. Established machine learning algorithms were implemented using the ‘scikit-learn’ library: linear regression, ridge regression, lasso regression, decision tree and random forest.
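The candidate models named above map directly onto scikit-learn estimators. The sketch below shows one plausible instantiation; the hyperparameter values are illustrative assumptions, not those reported by the study.

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# The five model families evaluated; hyperparameters here are
# illustrative defaults, not the study's tuned values.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "decision_tree": DecisionTreeRegressor(max_depth=3, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
```

All five share the same `fit`/`predict` interface, so they can be evaluated under an identical cross-validation loop.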
Given the small sample size of this dataset, more complex methods such as neural networks were not appropriate, as they would overfit and their performance would not generalise to other datasets.
The target variables to be predicted were:
(1)
Change in IPSS at 1 year (baseline IPSS minus 1-year IPSS)
(2)
Baseline IPSS: although all patients completed a baseline IPSS questionnaire, we also aimed to evaluate the model’s performance at predicting their baseline IPSS from objective clinical measures alone (termed the ‘model-generated baseline IPSS’). The accuracy of this prediction was assessed by calculating the difference between the model-generated baseline IPSS and the actual observed baseline IPSS. This ‘model-generated error’ was then regressed against the change in IPSS (baseline IPSS minus 1-year IPSS).
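The three-step procedure for the second target can be sketched as follows. This is a minimal sketch on synthetic data: the variable names, model choice and data-generating assumptions are hypothetical, and in practice the model-generated predictions would come from held-out (cross-validated) folds rather than in-sample fits.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 120
# Synthetic objective measures (stand-ins for age, prostate volume,
# Qmax, residual volume, Abrams-Griffiths number).
X_obj = rng.normal(size=(n, 5))
baseline_ipss = 22 + X_obj @ rng.normal(size=5) + rng.normal(scale=3, size=n)
ipss_1yr = baseline_ipss - (11 + rng.normal(scale=5, size=n))
change = baseline_ipss - ipss_1yr

# Step 1: model-generated baseline IPSS from objective measures only.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_obj, baseline_ipss)
model_generated = model.predict(X_obj)  # use held-out predictions in practice

# Step 2: model-generated error = observed minus model-generated baseline IPSS.
error = baseline_ipss - model_generated

# Step 3: regress the 1-year change in IPSS against the error.
reg = LinearRegression().fit(error.reshape(-1, 1), change)
```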
Model development and validation were performed only on Dataset 1 (see Table 1). For this analysis, models were trained using leave-one-out cross-validation (LOOCV). In this method, one data point from Dataset 1 was singled out as the validation point, whilst the remaining data served as the training set; this was repeated so that every data point in turn served as the validation point. This approach was used to maximise training data and improve performance given the small sample size. Metrics such as mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE) and, where appropriate, R² were used to account for continuous outcome variables (namely IPSS).
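The LOOCV scheme and the reported metrics can be expressed directly with scikit-learn. The sketch below uses synthetic stand-in data and an illustrative ridge model; only the cross-validation structure and metric choices mirror the description above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                               # six predictor variables
y = X @ rng.normal(size=6) + rng.normal(scale=2, size=50)  # stand-in for change in IPSS

# Each point is held out once; the model is refit on the remaining n-1 points.
preds = np.empty_like(y)
for train_idx, val_idx in LeaveOneOut().split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    preds[val_idx] = model.predict(X[val_idx])

# Metrics computed over the pooled held-out predictions.
mse = mean_squared_error(y, preds)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y, preds)
r2 = r2_score(y, preds)
```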
Table 1
Variables used in analysis for Dataset 1 (model development and validation), Dataset 2 (independent test set) and combined dataset
Variable | Dataset 1 | Dataset 2 | Combined dataset |
Age (years) | 66.8 (6.7) | 65.6 (8.4) | 66.7 (6.9) |
Prostate volume on US/TRUS/CT/MRI (cc) | 109.8 (60.7) | 91.9 (47.6) | 107.5 (59.4) |
Qmax (ml/s) | 8.5 (3.9) | 7.9 (2.6) | 8.4 (3.7) |
Residual volume (mls) | 179.0 (147.9) | 172.4 (158.6) | 178.2 (148.7) |
Abrams-Griffiths number | 75.0 (37.7) | 76.4 (35.6) | 75.1 (37.4) |
Baseline IPSS | 21.9 (5.9) | 23.2 (3.9) | 22.1 (5.7) |
IPSS at 1-year post-PAE | 10.6 (6.9) | 11.2 (6.7) | 10.7 (6.8) |
Change in IPSS | 11.3 (7.3) | 12.0 (6.6) | 11.4 (7.2) |
In addition to LOOCV, performance was further assessed on a separate, independent dataset (Dataset 2, n = 16), which was not used in any model training.
User Interface Design
‘RShiny’ Dashboard is an ‘R’-based package (available at https://www.rstudio.com/products/shiny/) that has previously been used to provide clinician-friendly access to computer-based healthcare tools [12, 13]. This package was used to create a custom user interface incorporating the final model for research purposes; the interface would be subject to regulatory approval prior to routine clinical use.
Discussion
Our findings suggest that despite using a limited dataset, ML models can be used with routinely collected pre-procedure data to predict the change in IPSS at 1 year following PAE. Interestingly, the most effective way to predict patient outcome was by using purely objective clinical measures to create a ‘model-generated baseline IPSS’. The degree of error between this and the patients’ actual observed IPSS (termed the ‘model-generated baseline IPSS error’) significantly correlated with the ‘change in IPSS’ at 1-year post-procedure and can be used to predict individual patient outcomes with reasonable accuracy.
This finding might reflect a difference between objective and subjective measures of symptoms and points towards a potential psychological element of symptom evaluation through IPSS scoring. Certainly, patient expectations prior to procedures have been shown to significantly influence outcomes. Patients’ beliefs and perceptions about the forthcoming procedure can shape their psychological response, which in turn can influence physiological outcomes and overall satisfaction with the procedure. For instance, a study by Ellingsen et al. [14] demonstrated that negative expectations could intensify the experience of pain and discomfort. Moreover, when patients hold positive expectations, they are often more compliant with pre- and post-procedure instructions, leading to improved outcomes and decreased complication rates. Conversely, patients who were adequately informed, and thus had clear expectations, had shorter recovery times and reported higher satisfaction rates [15]. This emphasises the importance of effective patient education and setting appropriate expectations to optimise both subjective and objective outcomes in interventional radiology. However, the objective nature of urodynamics is also debated; for example, there is some evidence that Qmax is effort dependent and influenced by intervention [16, 17]. This is therefore a complex area, and our findings need to be interpreted carefully within this context.
Given the therapeutic intent of PAE is to provide symptomatic relief, it is likely that a combination of psychological and biological factors would lead to symptom improvement. Thus, it remains of pertinent clinical utility to continue using both objective and subjective variables as inputs for any future developed model.
We also found that a combination of routinely collected variables, notably prostate volume and urodynamic variables, can be used for prediction; this is in line with previous studies that have identified prostate volume as a significant predictor of clinical success. Patients with larger prostatic volumes, often above 80 cc, have shown better symptomatic relief post-PAE compared with those with smaller prostates [18]. This also applied to our model, in which increasing prostate volume predicted greater IPSS improvement. (We explored this by increasing prostate volume in our tool and observing the predicted change in IPSS rise.) Critically, however, it was not this single variable alone that contributed to model performance. Instead, our study utilised a combination of factors to predict IPSS outcomes, thereby benefiting from potential performance gains from variable combinations [19]. Machine learning also provides a way for clinical decision support tools to be improved in subsequent iterations once additional data are available for training, as well as being deployable through interfaces such as ‘RShiny’ [12].
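The single-variable exploration described above (sweeping prostate volume while holding other inputs fixed) resembles a one-dimensional sensitivity analysis. A minimal sketch on synthetic data, with a hypothetical feature layout in which column 1 is prostate volume:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
X[:, 1] = rng.uniform(40, 200, size=80)            # column 1: prostate volume (cc), assumed layout
y = 0.05 * X[:, 1] + rng.normal(scale=2, size=80)  # synthetic: larger volume -> larger improvement

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Hold one representative patient's other inputs fixed and sweep the volume.
patient = X[0].copy()
volumes = np.linspace(50, 180, 14)
sweeps = np.tile(patient, (len(volumes), 1))
sweeps[:, 1] = volumes
predicted_change = model.predict(sweeps)  # trend of predicted IPSS change over volume
```

Plotting `predicted_change` against `volumes` shows how the model's output responds to prostate volume alone, all other inputs held constant.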
Notably, our model demonstrated applicability to a separate, blinded dataset (Dataset 2), enhancing the generalisability of our findings. This underscores the potential benefits of establishing a more comprehensive registry of PAE patients. Such an expanded registry could significantly improve model performance, offering deeper insights into patient outcomes and optimising treatment strategies.
Whilst these initial results are promising, it is important to acknowledge the limitations of our study. Firstly, the sample size might mean our models are not representative of the general population. However, generalisability was assessed in part by testing our models on a blinded independent dataset and in part through training on multicentre data. We were also restricted to selecting variables with complete data, as most ML algorithms require complete datasets. Whilst a single radiological parameter (namely prostate volume) was used in our models, other radiomic markers might be relevant, and imaging data have not been fully utilised. Future work would include this readily available and now routinely collected data type, especially given the advantages of performing pre-procedure CT for planning [20]. In addition, clinical measures from formal urodynamic studies were an important component of the ML model. As many centres do not routinely perform urodynamics prior to PAE, this reduces the wider utility of the model and its findings.
Furthermore, we emphasise the use of these models as a tool to support clinicians in their decision-making and not to be used as a triaging software independent of clinical oversight.