A comparison of three computational modelling methods for the prediction of virological response to combination HIV therapy
Introduction
Despite the approval of more than 20 antiretroviral drugs, HIV treatment failure due to drug resistance still occurs. HIV genotyping is recommended by a range of HIV treatment guidelines and is commonly employed to help the selection of a new regimen to re-establish viral suppression [1], [2], [3]. However, the complexity of resistance patterns and the expanding range of therapeutic options available have made the interpretation of genotype results in order to optimise virological treatment response extremely challenging [1]. A number of interpretation systems have been developed that relate HIV genotype to single antiretroviral drug susceptibility using different ‘rules’ or algorithms [for example, [4], [5], [6], [7]] and relational databases have been used to predict resistance to specific drugs by matching a test genotype with archived genotypic and phenotypic data [8], [9]. There is no recognised standard interpretation system and different systems can produce different results from the same genotype [10], [11], [12], [13].
Several groups have explored the use of bioinformatics to address the challenges of genotype interpretation and response prediction [see 14 for a review]. For example, artificial neural networks (ANN) [15], decision trees [16], support vector machines (SVM) [9] or phenotype matching in relational databases [17] have all been used to predict phenotype from genotype. Other groups have gone further to relate the predicted phenotype of individual drugs to virological response. However, the relationship between phenotype and response to combination therapy is not well characterized and attempting to infer response from genotype via the intermediate step of predicted phenotype has serious limitations [18]. Most of the groups that have attempted this have related predicted phenotype to a categorical prediction of response, with cut-offs in predicted fold-changes in phenotypic sensitivity linked to clinical response [e.g. 19]. However, in terms of potential clinical utility, a strong case can be made for predicting response to combination therapy (rather than individual drugs) as a continuous variable [20], directly from genotype. Given the complexity of the drug and genotype permutations, the main obstacle facing this approach is the size of the dataset required [21].
The HIV Resistance Response Database Initiative (RDI) is a not-for-profit organization set up to establish a large clinical database and develop bioinformatic techniques to define the relationships between HIV resistance and virological response to treatment. It is hoped that this approach might overcome some of the limitations of current interpretation systems [22]. The development of the database is an international collaboration and data from more than 50,000 HIV patients have already been provided by a variety of private and public research groups.
The ultimate aim is to develop computational models that are able to predict treatment response accurately from genotype and other clinically relevant information, which will then be made freely accessible as an aid to treatment selection.
We recently demonstrated that ANN models trained with datasets from multiple clinical sources can be accurate predictors of virological response to combination therapy [23]. Here we tested the accuracy of two alternative computational modelling methods, namely random forests (RF) and SVM, and compared their performance, individually and in combination, with that of ANN models, using the same dataset.
The principle of RF is to grow many decision trees in parallel. For a given sample, votes are carried out over all the trees in the forest. The individual trees are built using different sets of samples from the original training dataset. In each node of a tree, the splitting feature is selected from a randomly chosen sample of features. In RF modelling, the training datasets of the individual trees are built by bootstrap replication, leaving about one-third of the samples out of the bootstrap sample, which are used for validation. The injection of randomness makes RF highly resistant to over-fitting [24], [25]. A disadvantage of RF is that the model is complex and cannot be visualised like a single tree [25].
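The two sources of randomness described above can be illustrated with a minimal sketch. It is not the RDI's implementation: the sample count, tree count, feature count and subset size are hypothetical, and the helper names `bootstrap_split` and `random_feature_subset` are introduced here only for illustration. Drawing n samples with replacement leaves each sample out with probability (1 − 1/n)^n, which approaches 1/e (about 0.368), consistent with the "about one-third" figure.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample of indices (with replacement) and return it
    together with the out-of-bag indices that were never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def random_feature_subset(n_features, subset_size, rng):
    """Pick the random subset of candidate features examined at one node."""
    return sorted(rng.sample(range(n_features), subset_size))


rng = random.Random(0)           # fixed seed, used only for reproducibility
n_samples, n_trees = 1000, 200   # hypothetical sizes

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

mean_oob_fraction = sum(oob_fractions) / n_trees
print(f"mean out-of-bag fraction: {mean_oob_fraction:.3f}")

# Hypothetical: 60 input variables, 8 candidates considered per node.
candidates = random_feature_subset(n_features=60, subset_size=8, rng=rng)
print("features considered at one node:", candidates)
```

In a full RF, each tree would be grown on its own `in_bag` sample and its out-of-bag samples would serve as that tree's validation set.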
The principle of SVM is to map the data into a high-dimensional feature space and then perform linear regression. SVM searches for a global solution and does not control model complexity by keeping the number of input variables small [26], [27]. It is considered more resistant to ‘over-fitting’ based on the training dataset and, therefore, potentially more generalisable to new data [28]. The drawback of SVM is its high algorithmic complexity [29].
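The idea of mapping data into a feature space and then fitting a linear model can be shown with a toy sketch. This is not an SVM: ordinary least squares stands in for the SVM's fitting procedure, the map `feature_map` is a hypothetical one-dimensional stand-in for a high-dimensional feature space, and the data are invented. It shows only that a relationship which is non-linear in the raw input can become linear after mapping.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def feature_map(x):
    """Hypothetical non-linear map into a new feature space."""
    return x * x


# Invented data: y depends on x quadratically.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x for x in xs]

# Linear regression on the raw input cannot capture the relationship.
b0, b1 = fit_line(xs, ys)
r2_raw = r_squared(ys, [b0 + b1 * x for x in xs])

# The same linear regression in the mapped space fits it.
zs = [feature_map(x) for x in xs]
c0, c1 = fit_line(zs, ys)
r2_mapped = r_squared(ys, [c0 + c1 * z for z in zs])

print(f"r2 on raw input: {r2_raw:.3f}, r2 after mapping: {r2_mapped:.3f}")
```

Practical SVM implementations typically perform this mapping implicitly through a kernel function rather than computing the mapped features explicitly.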
Data
The basic package of information that is used for the training of the RDI's models is the treatment change episode (TCE) as illustrated in Fig. 1 [30]. This comprises key information required by the models from a patient who has had a new treatment started, in order to develop a prediction of virological response. It includes baseline genotype, viral load, CD4+ T-lymphocyte (CD4) count and other information as well as the follow-up viral load value: the response variable that the models are
Results
The correlations and absolute differences between the individual models’ predictions and the actual ΔVL values are summarised in Table 1. These results reveal marked differences between the different methods. The r² of the individual ANN models varied from 0.318 to 0.546, with a mean (SD) of 0.394 (0.068) and a coefficient of variation of 18%. The r² of the individual RF models varied from 0.590 to 0.751, with a mean (SD) of 0.674 (0.056) and a coefficient of variation of 8%. The r² of the
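The two kinds of summary measure mentioned here, a correlation and an absolute difference between predicted and actual ΔVL, can be computed as in the following sketch. It assumes that r² denotes the squared Pearson correlation coefficient, which the text does not state explicitly, and the ΔVL values are invented for illustration and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_difference(actual, predicted):
    """Average of |actual - predicted| over all cases."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented change-in-viral-load values (log10 copies/ml), illustration only.
actual_dvl = [-2.0, -1.0, 0.0, -3.0]
predicted_dvl = [-1.5, -1.0, -0.5, -2.5]

r2 = pearson_r(actual_dvl, predicted_dvl) ** 2
mad = mean_absolute_difference(actual_dvl, predicted_dvl)

print(f"r2 = {r2:.3f}, mean absolute difference = {mad:.3f}")
```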
Discussion
In terms of the main measure of the correlation between predicted and actual virological response, individual ANN models performed significantly worse in their predictions of virological response to HIV therapy than RF and SVM models, and their predictions were significantly more variable than were those of RF models.
The use of a model committee substantially improved the accuracy of the ANN predictions. For example, the r² of the ANN committee was 0.689 while the average of the r² of the
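Why a committee can outperform its members may be seen in a small simulation. This is a sketch under strong assumptions and not the study's method: the committee output is taken to be the simple average of the members' predictions, each simulated member's error is independent Gaussian noise, and all sizes and values are hypothetical. Real models trained on overlapping data have correlated errors, so the gain would generally be smaller than in this idealised case.

```python
import math
import random


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy / math.sqrt(sxx * syy)) ** 2


rng = random.Random(42)          # fixed seed, used only for reproducibility
n_cases, n_models = 200, 10      # hypothetical sizes

# Simulated "actual" responses and noisy member predictions.
actual = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
member_predictions = [
    [a + rng.gauss(0.0, 1.0) for a in actual]  # independent error per member
    for _ in range(n_models)
]

# Committee prediction: plain average across members for each case.
committee_prediction = [
    sum(member[i] for member in member_predictions) / n_models
    for i in range(n_cases)
]

member_r2 = [pearson_r_squared(actual, p) for p in member_predictions]
mean_member_r2 = sum(member_r2) / n_models
committee_r2 = pearson_r_squared(actual, committee_prediction)

print(f"mean member r2: {mean_member_r2:.3f}, committee r2: {committee_r2:.3f}")
```

Averaging cancels part of each member's independent error, which is why the committee correlates more closely with the simulated responses than the typical member does.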
Acknowledgements
This research has been funded with Federal Funds from the National Cancer Institute, National Institutes of Health, under contract No. NO1-CO-12400.
The authors would also like to acknowledge the following institutions and research groups for the provision of data to the RDI: National Institute of Allergy and Infectious Diseases, USA; BC Centre for Excellence in HIV/AIDS, Vancouver, BC Canada; USA Military HIV Research Program; ICONA; The Italian ARCA database; Hospital Clinic of Barcelona,
References (33)
- et al. Correlation between rules-based interpretation and virtual phenotype interpretation of HIV-1 genotypes for predicting drug resistance in HIV-infected individuals. J Virol Methods (2004)
- et al. Machine learning approaches for estimation of prediction interval for the model output. Neural Networks (2006)
- et al. Antiretroviral drug resistance testing in adult HIV-1 infection: 2008 recommendations of an International AIDS Society-USA panel. Clin Infect Dis (2008)
- Department of Health and Human Services Panel on Antiretroviral Guidelines for Adults and Adolescents. Guidelines for...
- et al. Updated European recommendations for the clinical use of HIV drug resistance testing. Antivir Ther (2004)
- et al. An algorithm-based genotypic resistance score is associated with clinical outcome in HIV-1-infected adults on antiretroviral therapy. HIV Med (2004)
- et al. Simple linear model provides highly accurate genotypic predictions of HIV-1 drug resistance. Antivir Ther (2004)
- et al. Variety of interpretation systems for human immunodeficiency virus type 1 genotyping: confirmatory information or additional confusion? Curr Drug Targets Infect Disord (2003)
- et al. Resistance assay interpretation systems vary widely in method and approach. Antivir Ther (2001)
- et al. Geno2pheno: estimating phenotypic drug resistance from HIV-1 genotypes. Nucleic Acids Res (2003)
- Online comparison of HIV-1 drug resistance algorithms identifies rates and causes of discordant interpretations. Antivir Ther
- Comparison between rules-based human immunodeficiency virus type 1 genotype interpretations and real or virtual phenotype: concordance analysis and correlation with clinical outcome in heavily treated patients. J Infect Dis
- Comparison of nine resistance interpretation systems for HIV-1 genotyping. Antivir Ther
- Variable prediction of antiretroviral treatment outcome by different systems for interpreting genotypic human immunodeficiency virus type 1 drug resistance. J Infect Dis
- Computational models for the design of effective therapies against drug resistant HIV strains. Bioinformatics
- Enhanced prediction of lopinavir resistance from genotype by use of artificial neural networks. J Infect Dis