Elsevier

NeuroImage

Volume 56, Issue 2, 15 May 2011, Pages 809-813

Machine learning classification with confidence: Application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression

https://doi.org/10.1016/j.neuroimage.2010.05.023

Abstract

There is rapidly accumulating evidence that the application of machine learning classification to neuroimaging measurements may be valuable for the development of diagnostic and prognostic prediction tools in psychiatry. However, current methods do not produce a measure of the reliability of the predictions. Knowing the risk of error associated with a given prediction is essential for the development of neuroimaging-based clinical tools. We propose a general probabilistic classification method to produce measures of confidence for magnetic resonance imaging (MRI) data. We describe the application of the transductive conformal predictor (TCP) to MRI images. TCP generates the most likely prediction and a valid measure of confidence, as well as the set of all possible predictions for a given confidence level. We present the theoretical motivation for TCP, and we have applied TCP to structural and functional MRI data in patients and healthy controls to investigate diagnostic and prognostic prediction in depression. We verify that TCP predictions are as accurate as those obtained with more standard machine learning methods, such as the support vector machine, while providing the additional benefit of a valid measure of confidence for each prediction.

Introduction

Using support vector machine algorithms (Vapnik, 1995), we have been investigating potential neuroimaging markers for psychiatric disorders (Fu et al., 2008a, Marquand et al., 2008, Costafreda et al., 2009a, Costafreda et al., 2009b). As a diagnostic marker of depression, we found 86% accuracy in identifying individual patients from the functional MRI pattern of brain activity to sad faces (Fu et al., 2008a), while the neural features of verbal working memory, as expected, showed reduced diagnostic accuracy (Marquand et al., 2008). To investigate predictive markers of clinical response, MRI data were acquired in patients experiencing an acute depressive episode who were not taking any medication and before they had begun therapy. In prospective studies, patients then received treatment with antidepressant medication (Fu et al., 2004, Fu et al., 2007, Walsh et al., 2007) or individual psychotherapy, i.e. cognitive behavioral therapy (Fu et al., 2008b). We found that structural MRI features were highly predictive of an individual patient's full clinical response to antidepressant medication with an accuracy of 89% (Costafreda et al., 2009a), while functional MRI responses showed the greatest predictive potential for cognitive behavioral therapy with an accuracy of 79% (Costafreda et al., 2009b). Machine learning analysis of MRI data has also demonstrated high diagnostic accuracy in other disorders, such as Alzheimer's disease (Klöppel et al., 2008) and schizophrenia (Davatzikos et al., 2005). Together, these findings point to the potential of applying machine learning methods to achieve clinically useful diagnostic and prognostic neurobiomarkers based on the pattern of brain activity and structure in psychiatric disorders.

However, a significant obstacle in advancing machine learning algorithms in clinical practice lies in the type of output that they typically provide. Most algorithms only generate a categorical classification, e.g. a “yes” or “no” diagnosis. An essential requirement for clinical applications of machine learning predictions is a measure of the quality of the predictions, also referred to as the confidence of the classification output (Klöppel et al., 2008). While classical statistical methods may produce confidence levels, they are usually applicable to low-dimensional data and include specific assumptions. In the present study, we have adapted a general probabilistic classification method to establish measures of confidence for neuroimaging data.

We propose to apply transductive conformal predictors (hereafter referred to as TCP or confidence machines) to generate confidence measures for neuroimaging-based predictions. We summarize the description of the method presented in Vovk et al. (2005) and Gammerman and Vovk (2007). TCP is a novel method which has been successfully applied to the clinical diagnosis of cancer from proteomics data (Gammerman et al., 2008). TCP may be built using various machine learning algorithms, including support vector machines, and is thus a natural extension of our work to date (Fu et al., 2008a, Marquand et al., 2008, Costafreda et al., 2009a, Costafreda et al., 2009b).

In the TCP approach, confidence measures are based solely on the randomness assumption, i.e. the assumption that the training and test examples are drawn independently from the same distribution, which is also referred to as the iid assumption. The premise of TCP is twofold: to try every possible prediction label as a candidate for a new example, for instance each of the two possible diagnostic labels, patient and healthy control, and to measure how well the resulting sequence of training and test examples conforms to the randomness assumption. When we assign the correct label to a new example, the randomness assumption is still satisfied in the resulting sequence. However, if we assign an incorrect label to a new example, the randomness assumption is no longer satisfied and the resulting sequence will appear “strange” (or “atypical” within the randomness assumption or simply “non-random”).
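The "try every label" procedure can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the strangeness score (distance to the nearest same-label example divided by distance to the nearest other-label example), the function names `nn_strangeness` and `tcp_p_values`, and the two-dimensional toy data are all assumptions made for illustration. The p-value of a candidate label is taken to be the fraction of examples that are at least as strange as the new example once it carries that label, which is the usual definition in the conformal prediction literature.

```python
import math

def nn_strangeness(objects, labels, i):
    """Hypothetical strangeness score for example i: distance to its nearest
    same-label neighbour divided by distance to its nearest other-label one.
    Assumes distinct objects and at least two labels present in the data."""
    same = min((math.dist(objects[i], objects[j])
                for j in range(len(objects))
                if j != i and labels[j] == labels[i]), default=math.inf)
    other = min((math.dist(objects[i], objects[j])
                 for j in range(len(objects))
                 if labels[j] != labels[i]), default=math.inf)
    return same / other

def tcp_p_values(train_objects, train_labels, new_object, candidate_labels):
    """Return a p-value for each candidate label of the new object."""
    objects = list(train_objects) + [new_object]
    p_values = {}
    for candidate in candidate_labels:
        # Tentatively complete the sequence with this candidate label.
        labels = list(train_labels) + [candidate]
        alphas = [nn_strangeness(objects, labels, i)
                  for i in range(len(objects))]
        # Fraction of examples at least as strange as the new one.
        at_least_as_strange = sum(a >= alphas[-1] for a in alphas)
        p_values[candidate] = at_least_as_strange / len(alphas)
    return p_values

# Toy data (invented): two well-separated groups of feature vectors.
train_objects = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                 (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
train_labels = ["control", "control", "control",
                "patient", "patient", "patient"]
new_object = (0.15, 0.1)

p_values = tcp_p_values(train_objects, train_labels, new_object,
                        ["control", "patient"])
print(p_values)
```

In this toy case the new object lies inside the "control" group, so labelling it "patient" makes it the strangest example in the sequence and that label receives the smallest possible p-value for seven examples, 1/7.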

The TCP algorithm measures the “strangeness” of the data sequence and computes a probability for each possible label which reflects how well each one conforms to the randomness assumption. If the probability for a given label is low, then the randomness assumption was wrong or a rare event has occurred, which leads to the rejection of that label. Therefore, TCP does not require assumptions other than iid, unlike statistical methods that compute probabilistic predictions based on specific parametric models which are often complemented by a prior distribution on the parameters. Specifically, TCP assigns an individual confidence to each new example that is equal to one minus the minimal significance level at which all prediction labels but one are rejected under the iid assumption, and the remaining label is the announced label. TCP yields accurate and reliable predictions that are complemented by quantitative measures of their quality. These measures can also be used to generate predictions at a desired confidence level which are well-calibrated, in the sense of controlling the probability of issuing erroneous predictions (Vovk, 2002).
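As a rough illustration of how the per-label probabilities translate into outputs, the sketch below takes a dictionary of p-values and derives the announced label, its confidence, and the set of labels retained at a chosen significance level. The function name `summarize_prediction`, the rejection rule (a label is rejected when its p-value does not exceed the significance level), and the example p-values are assumptions for illustration, not values from the study. Under that rule, the minimal significance level at which all labels but one are rejected equals the second-largest p-value, so confidence is one minus that value.

```python
def summarize_prediction(p_values, significance=0.05):
    """Derive the announced label, its confidence, and the prediction set.

    p_values: dict mapping each candidate label to its p-value
              (at least two candidate labels are assumed).
    significance: level at or below which a label is rejected.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1], reverse=True)
    announced_label = ranked[0][0]
    second_largest_p = ranked[1][1]
    return {
        "prediction": announced_label,
        # One minus the smallest level at which only one label survives.
        "confidence": 1.0 - second_largest_p,
        # All labels that are not rejected at the requested level.
        "prediction_set": [label for label, p in p_values.items()
                           if p > significance],
    }

# Invented p-values for a single new example.
example_p_values = {"patient": 0.62, "control": 0.04}

result = summarize_prediction(example_p_values, significance=0.05)
strict = summarize_prediction(example_p_values, significance=0.01)
print(result)
print(strict)
```

With these invented numbers the prediction set at the 0.05 level contains a single label, whereas at the stricter 0.01 level neither label can be rejected and the set contains both, which shows how demanding higher confidence can make a prediction less specific.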

In the present study, we present the theoretical motivation of TCP, and we have applied TCP to structural and functional MRI data in patients and healthy individuals to investigate diagnostic and prognostic prediction in depression. For diagnostic prediction, we employed a functional MRI dataset of implicit processing of sad facial expressions in patients with depression and matched healthy controls (Fu et al., 2004, Fu et al., 2008a). For prognostic prediction, we examined the structural MRI scans of these patients while they were in an acute depressive episode, prior to the initiation of any treatment, and clinical response was assessed prospectively following 8 weeks of treatment with an antidepressant medication (Fu et al., 2004, Costafreda et al., 2009a). We sought to develop an algorithm that: (1) provides classification for MRI images from a database, and (2) generates a confidence estimate for each classification output.


Materials and methods

Consider a standard machine-learning problem. We are given a training set of examples $(x_1, y_1), \ldots, (x_{n-1}, y_{n-1})$, where every example $z_i = (x_i, y_i)$ consists of an object $x_i$ and its label $y_i$. We are also given a test object $x_n$, whose actual label $y_n$ is withheld from us. Our goal is to predict the label $y_n$.

Confidence of predictions is obtained under the general independent and identically distributed (iid) assumption or randomness assumption (Vovk et al., 2005, Gammerman and Vovk, 2007). There is a stochastic mechanism that

Discussion

We have applied TCP to functional and structural MRI data in order to generate diagnostic and prognostic decisions at the individual level. We have found that TCP is as accurate as the usual “forced” predictions in our benchmark datasets. Moreover, the advantage of TCP for psychiatric classification is that it provides a measure of confidence for each diagnostic or prognostic decision, and thus the risk of an erroneous clinical decision is known for a given individual. We have

Acknowledgments

The authors would like to acknowledge the support of a Medical Research Council (UK) Discipline Hopping Grant.

