Automatic text classification to support systematic reviews in medicine

https://doi.org/10.1016/j.eswa.2013.08.047

Highlights

  • Systematic reviews support evidence-based medicine, but are expensive to produce.

  • Text classification can support the screening phase of Systematic Reviews.

  • Determining the best classification parameters is key for good results.

  • Selecting certain sections of the articles can make a significant difference.

  • Positive results support this technology as a tool for systematic reviews.

Abstract

Medical systematic reviews answer particular questions within a very specific domain of expertise by selecting and analysing the current pertinent literature. As part of this process, the screening phase usually requires a long time and significant effort, as it involves a group of domain experts evaluating thousands of articles in order to find the relevant instances. Our goal is to support this process through automatic tools. There is a recent trend of applying text classification methods to semi-automate the screening phase by providing decision support to the group of experts, hence helping to reduce the required time and effort. In this work, we contribute to this line of research by performing a comprehensive set of text classification experiments on a corpus resulting from an actual systematic review in the area of Internet-Based Randomised Controlled Trials. These experiments involved applying multiple machine learning algorithms combined with several feature selection techniques to different parts of the articles (i.e., titles, abstracts, or both). Results are generally positive in terms of overall precision and recall, reaching values of up to 84%. They also reveal that using only article titles provides virtually as good results as adding article abstracts. Based on these positive results, it is clear that text classification can support the screening stage of medical systematic reviews. However, selecting the most appropriate machine learning algorithms, related methods, and text sections of the articles is a neglected but important requirement because of its significant impact on the end results.

Introduction

Medical Systematic Reviews support the conversion of medical research into practice by bringing together the existing studies that are relevant to a specific medical question. This synthesis of current evidence benefits different stakeholders such as clinicians and policymakers.

Although Systematic Reviews started as early as the 18th century (Lind, 1753), their production exploded during the second half of the 20th century, along with a significant increase in publications in medical, nursing, and allied health care (Shojania & Bero, 2001). Unfortunately, the significant growth of clinical trials in recent decades has not been matched by a suitable number of systematic reviews (Bastian et al., 2010). An analysis of the situation at the time revealed that, because the amount of work required to produce reviews keeps increasing, the majority of systematic reviews were many years out of date (Shojania et al., 2007).

The general process for creating a systematic review is based on three main steps: (i) conducting broad searches of the relevant literature, (ii) manually screening the titles and abstracts of the retrieved citations, and (iii) reviewing the full articles of those citations identified as relevant. No matter how critical and necessary these steps are, they are very time-consuming, especially the screening of citations and the review of candidate studies.

Multiple text mining techniques have been gaining popularity over the past years as a consequence of the ever-increasing amount of available digital documents of unstructured text and, thus, the need to analyse their content in flexible ways (Hearst, 1999). Among these techniques, one of the most prominent is text classification using machine learning, which consists of automatically predicting one or more suitable categories for unstructured texts written in natural language (e.g., English, Spanish, etc.). Text classification is currently a major research area with many commercial and research applications in a large number of domains. Medicine is one of the most evident areas where text mining methods have multiple applications, such as the discovery of new literature (Swanson, 1986), concept-based search (Ide, Loane, & Demner-Fushman, 2007), or automatic bibliographic updates in clinical guidelines (Iruetaguena et al., 2013).
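To give a concrete, if simplified, picture of what such a classifier does, the following sketch trains a bag-of-words model that assigns a category to a piece of text. It assumes the scikit-learn library and uses toy documents and labels purely for illustration; it is not the pipeline used in our experiments.

    # Minimal, illustrative text classification sketch (not the exact pipeline of this work).
    # Assumes scikit-learn; the documents and labels below are toy examples.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    documents = [
        "Internet-based randomised controlled trial of a self-help intervention",
        "A qualitative study of patient attitudes towards hospital food",
    ]
    labels = ["relevant", "irrelevant"]  # categories previously assigned by domain experts

    # TF-IDF weighted bag-of-words features fed into a naive Bayes classifier.
    classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
    classifier.fit(documents, labels)

    print(classifier.predict(["Web-delivered randomised trial for smoking cessation"]))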

This work was motivated by the hypothesis that text classification could assist the production of Systematic Reviews by supporting reviewers in their process of manually screening published articles. Although this assumption is not new, as there has recently been an incipient, albeit still modest, body of research in this direction (Thomas, McNaught, & Ananiadou, 2011), our contribution focuses on: (i) studying the application of a comprehensive selection of machine learning algorithms, (ii) combining these algorithms with multiple feature selection methods and different numbers of features, (iii) selecting different parts of citations (i.e., title, abstract, or both), and (iv) applying these methods to the medical domain of Internet-Based Randomised Controlled Trials.
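To make point (ii) more concrete, the fragment below shows one conventional way of inserting a feature selection step that keeps only the highest-scoring terms before the classifier is trained. The chi-squared criterion, the default number of features, the choice of a linear SVM, and the helper name build_screening_pipeline are illustrative assumptions, not the exact configuration evaluated later.

    # Illustrative sketch of classifier + feature selection (assumed scikit-learn components).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def build_screening_pipeline(num_features=500):
        """Bag-of-words model keeping only the num_features best terms by chi-squared score."""
        return make_pipeline(
            CountVectorizer(),
            SelectKBest(chi2, k=num_features),
            LinearSVC(),
        )

Varying num_features and swapping the selection criterion or the final classifier reproduces, in spirit, the grid of combinations explored in our experiments.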

In such a way, an automatic text classification system could be trained with a set of articles from the medical domain in question once the collection of studies had already been manually screened. As these articles describing primary studies had been manually labelled as either relevant or irrelevant, they fit well with the paradigm of a two-class text classifier. Once trained, the system would be ready to automatically classify unseen articles, therefore providing input into the screening process similarly to a human expert. Consequently, this system would not aim at replacing the people involved in the decision process but at complementing and assisting them. Contrary to previous studies covered in Section 4, which directly selected either the abstract or the full article to train and test the classifiers, we were interested in investigating which sections of the articles provided the best results. We also applied a wider variety of classifiers than previous studies, in addition to multiple feature selection methods.
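The fragment below illustrates how the input text fed to such a two-class classifier can be restricted to titles only, abstracts only, or their concatenation. The dictionary-based record format, the field names title and abstract, and the helper extract_text are hypothetical and serve only to show the idea.

    # Hypothetical citation records; field names and format are assumptions for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def extract_text(citation, sections=("title", "abstract")):
        """Concatenate the selected sections of a citation record into a single string."""
        return " ".join(citation.get(field, "") for field in sections)

    citations = [
        {"title": "Internet-based RCT of cognitive behavioural therapy",
         "abstract": "Participants were randomised to a web-delivered intervention...",
         "label": "relevant"},
        {"title": "Survey of hospital staffing levels",
         "abstract": "A cross-sectional survey of nursing workloads...",
         "label": "irrelevant"},
    ]

    texts = [extract_text(c, sections=("title",)) for c in citations]  # titles only
    labels = [c["label"] for c in citations]

    screening_model = make_pipeline(TfidfVectorizer(), LinearSVC())
    screening_model.fit(texts, labels)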

This paper is organised as follows. Section 2 describes the methods used in this work to automatically classify articles. Section 3 describes the manual process for performing systematic reviews in medicine and how it can be supported by text classification. Previous efforts in this area of research are described in Section 4. The design and analysis of the experiments proposed to validate our hypothesis is provided in Section 5. The paper concludes with Section 6, which also suggests some ideas for future work.

Section snippets

Text classification

Text mining consists of discovering previously unknown information from existing text resources (Hearst, 1999). It is also called intelligent text analysis, text data mining, or knowledge discovery in text. Text mining is related to data mining, which aims to extract useful patterns from structured data usually stored in large database repositories. In contrast, text mining searches for patterns in unstructured natural language texts (e.g., books, articles, e-mail messages, Web pages,

Medical systematic reviews

A systematic review consists of synthesising the relevant published literature representing the high-quality research evidence that answers a specific research question (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). Systematic reviews provide the foundation for Evidence-based Medicine (Greenhalgh, 2010), which rests on the premise that medical knowledge derived from the accumulation of results of multiple scientific studies is more reliable than expert opinion. Although

Related work

One of the first attempts at testing a similar hypothesis was reported by Aphinyanaphongs, Tsamardinos, Statnikov, and Hardin (2005). They applied naïve Bayes and SVM text classifiers to a corpus of internal medicine articles from the ACP Journal Club and found that SVM offered the best performance in terms of sensitivity, specificity, and precision.

More recently, Wallace, Trikalinos, Lau, Brodley, and Schmid (2010) applied an SVM classifier to three different collections of article abstracts

Validation

We considered it appropriate to empirically evaluate our hypothesis through a validation exercise. It is important to differentiate verification from validation. The former consists of merely confirming that the implemented system works sensibly and fulfils expectations, while the latter requires an accurate analysis of experimental results in relation to an existing set of actual results (Mihram, 1972).
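In practice, such a validation is typically reported as precision and recall estimated by cross-validation against the experts' manual labels. The sketch below shows one conventional way of obtaining such estimates; the use of scikit-learn, macro-averaged scores, and ten folds are assumptions that may differ from the exact protocol described in Section 5.

    # Illustrative cross-validated estimation of precision and recall (assumed scikit-learn).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import cross_validate
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def estimate_precision_recall(texts, labels, folds=10):
        """Return mean macro-averaged precision and recall over k-fold cross-validation."""
        model = make_pipeline(TfidfVectorizer(), LinearSVC())
        scores = cross_validate(
            model, texts, labels, cv=folds,
            scoring={"precision": "precision_macro", "recall": "recall_macro"},
        )
        return scores["test_precision"].mean(), scores["test_recall"].mean()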

For this purpose, Section 5.1 provides details on the characteristics of the collection of

Conclusions and future work

We empirically evaluated the application of automatic text classification to the process of medical systematic reviews, in order to facilitate the manual work carried out by experts during the citation screening phase. The experiments involved multiple classification algorithms combined with several feature selection methods and numbers of features, applied to different parts of the given articles. The analysis of these experiments showed overall positive results, especially when using the

Acknowledgements

The corpus used in this work was provided by Anne Brice from the Critical Appraisal Skills Programme in Oxford, UK.

This work was supported by funding received from the Department of Education, Universities and Research of the Basque Government (Grant No. BFI-09-270), the UPV/EHU [GIU08/27, INF10/58, GIU11/28 and UFI11/19], Gipuzkoa Regional Council [OF53/2011], the Department of Industry, Commerce and Tourism — Basque Government [S-PE09UN60 and S-PE11UN115], and the Spanish Ministry of Science

References (43)

  • Dubben, H.-H., et al. (2005). Systematic review of publication bias in studies on publication bias. BMJ.
  • Fayad, M., et al. (1997). Object-oriented application frameworks. Communications of the ACM.
  • Frunza, O., Inkpen, D., Matwin, S., Klement, W., & O’Blenis, P. (2011). Exploiting the systematic review...
  • García Adeva, J. J., et al. (2006). Mining text with pimiento. IEEE Internet Computing.
  • Greenhalgh, T. (2010). How to read a paper: The basics of evidence-based medicine. Wiley. ISBN...
  • Hearst, M. A. (1999). Untangling text data mining. In Proceedings of the 37th conference on association for...
  • Joachims, T. (2002). Learning to classify text using support vector machines – Methods, theory, and algorithms.
  • Kitchenham, B., Brereton, O. P., Budgen, D., Turner, M., Bailey, J., & Linkman, S. (2009). Systematic literature...
  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI...
  • Lewis, D. D. Naive (Bayes) at forty: The independence assumption in information retrieval.
  • Lind, J. (1753). A treatise of the scurvy. In three parts. Containing an inquiry into the nature, causes and cure,...