Automatic text classification to support systematic reviews in medicine
Introduction
Medical Systematic Reviews support the translation of medical research into practice by bringing together the existing studies relevant to a specific medical question. This synthesis of current evidence benefits different stakeholders such as clinicians and policymakers.
Although Systematic Reviews started as early as the 18th century (Lind, 1753), their production exploded in the second half of the 20th century, along with a significant increase in publications in medicine, nursing, and allied health care (Shojania & Bero, 2001). Unfortunately, the significant growth of clinical trials in recent decades has not been matched by a corresponding number of systematic reviews (Bastian et al., 2010). An analysis of the situation at the time revealed that, because the amount of work required to produce reviews keeps increasing, the majority of systematic reviews were many years out of date (Shojania et al., 2007).
The general process for creating a systematic review comprises three main steps: (i) conducting broad searches of the relevant literature, (ii) manually screening the titles and abstracts of retrieved citations, and (iii) reviewing the full articles of those citations identified as relevant. Critical and necessary as these steps are, they are very time consuming, especially the screening of citations and the review of candidate studies.
Multiple text mining techniques have been gaining popularity in recent years as a consequence of the ever increasing amount of available digital documents of unstructured text and, thus, the necessity of analysing their content in flexible ways (Hearst, 1999). Among these techniques, one of the most prominent is text classification using machine learning, which consists of automatically predicting one or more suitable categories for unstructured texts written in natural language (e.g., English, Spanish, etc.). Text classification is currently a major research area with many commercial and research applications in a large number of domains. Medicine is one of the most evident areas where text mining methods have multiple applications, such as the discovery of new literature (Swanson, 1986), concept-based search (Ide, Loane, & Demner-Fushman, 2007), or automatic bibliographic update in clinical guidelines (Iruetaguena et al., 2013).
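As a concrete illustration of text classification with machine learning, the following minimal sketch trains a linear SVM on a handful of toy citation strings. The data, the choice of scikit-learn, and the TF-IDF plus SVM combination are our own illustrative assumptions, not a description of the system evaluated in this paper.

```python
# Illustrative two-class text classification (toy data; scikit-learn is
# our choice of library for the sketch, not one named in the paper).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical citations labelled relevant (1) or irrelevant (0).
texts = [
    "internet-based randomised controlled trial of web therapy",
    "online intervention trial for depression delivered via the web",
    "randomised web-based trial of smoking cessation support",
    "case report of a rare surgical complication",
    "narrative review of hospital management practices",
    "editorial commentary on medical education policy",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feed a linear SVM, a common pairing for text.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

prediction = clf.predict(["web-based randomised trial of an online intervention"])[0]
```

An unseen citation sharing vocabulary with the relevant examples would be assigned the relevant class, which is the behaviour a screening assistant relies on.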
This work was motivated by the hypothesis that text classification could assist the production of Systematic Reviews by supporting reviewers in their process of manually screening published articles. Although this assumption is not new, as a recent though still modest body of research has emerged in this direction (Thomas, McNaught, & Ananiadou, 2011), our contribution is focused on: (i) studying the application of a comprehensive selection of machine learning algorithms, (ii) combining these algorithms with multiple feature selection methods and different numbers of features, (iii) selecting different parts of citations (i.e., title, abstract, or both), and (iv) applying these methods to the medical domain of Internet-Based Randomised Controlled Trials.
In this way, an automatic text classification system could be trained with a set of articles from the medical domain in question once the collection of studies had already been manually screened. As these articles describing primary studies had been manually labelled as either relevant or irrelevant, they fit well with the paradigm of a two-class text classifier. Once trained, the system would be ready to automatically classify unseen articles, thereby providing input into the screening process similarly to a human expert. Consequently, this system would not aim to replace the people involved in the decision process but to complement and assist them. Contrary to previous studies covered in Section 4, which directly selected either the abstract or the full article to train and test the classifiers, we were interested in investigating which sections of the articles provided the best results. We also applied a wider variety of classifiers than previous studies, in addition to multiple feature selection methods.
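The set-up described above can be sketched as follows: train on manually screened citations, choose which part of each citation to use (title, abstract, or both), apply a feature selection step, and score unseen citations. The toy citations, the chi-square feature selection, and the scikit-learn pipeline are illustrative assumptions on our part, not the exact configuration evaluated in the paper.

```python
# Hypothetical sketch of classifier-assisted citation screening.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each manually screened citation: (title, abstract, label); 1 = relevant.
screened = [
    ("Web trial of online CBT", "randomised controlled trial over the internet", 1),
    ("Internet-delivered therapy", "web-based randomised trial of treatment", 1),
    ("Online smoking cessation", "randomised internet intervention study", 1),
    ("Surgical technique note", "description of an operative procedure", 0),
    ("Health policy editorial", "opinion on funding of hospitals", 0),
    ("Animal model study", "laboratory experiment in mice", 0),
]

def part(title, abstract, use="both"):
    """Select which section of the citation feeds the classifier."""
    return {"title": title, "abstract": abstract,
            "both": title + " " + abstract}[use]

X = [part(t, a, "both") for t, a, _ in screened]
y = [label for _, _, label in screened]

# TF-IDF features, chi-square selection of the 10 best terms, naive Bayes.
pipe = make_pipeline(TfidfVectorizer(), SelectKBest(chi2, k=10), MultinomialNB())
pipe.fit(X, y)

verdict = pipe.predict(["randomised controlled trial of a web-based intervention"])[0]
```

Varying the `use` argument and the `k` of the selector reproduces, in miniature, the kind of comparison across citation parts and feature counts that our experiments carry out at scale.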
This paper is organised as follows. Section 2 describes the methods used in this work to automatically classify articles. Section 3 describes the manual process for performing systematic reviews in medicine and how it can be supported by text classification. Previous efforts in this area of research are described in Section 4. The design and analysis of the experiments proposed to validate our hypothesis are provided in Section 5. The paper concludes with Section 6, which also suggests some ideas for future work.
Text classification
Text mining consists of discovering previously unknown information from existing text resources (Hearst, 1999). It is also called intelligent text analysis, text data mining, or knowledge discovery in text. Text mining is related to data mining, which intends to extract useful patterns from structured data usually stored in large database repositories. Text mining, instead, searches for patterns in unstructured natural language texts (e.g., books, articles, e-mail messages, Web pages,
Medical systematic reviews
A systematic review consists of synthesising the relevant published literature representing the high-quality research evidence that answers a specific research question (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). Systematic reviews provide the foundation for Evidence-based Medicine (Greenhalgh, 2010), which rests on the premise that medical knowledge built on the accumulation of results from multiple scientific studies is more reliable than expert opinion. Although
Related work
One of the first attempts at testing a similar hypothesis was reported by Aphinyanaphongs, Tsamardinos, Statnikov, and Hardin (2005). They applied naïve Bayes and SVM text classifiers to a corpus of internal medicine articles from the ACP Journal Club and found that SVM offered the best performance in terms of sensitivity, specificity, and precision.
More recently, Wallace, Trikalinos, Lau, Brodley, and Schmid (2010) applied an SVM classifier to three different collections of article abstracts
Validation
We considered it appropriate to empirically evaluate our hypothesis through a validation exercise. It is important to differentiate verification from validation. The former consists of merely confirming that the implemented system works sensibly and fulfils expectations, while the latter requires an accurate analysis of experimental results in relation to an existing set of actual results (Mihram, 1972).
For this purpose, Section 5.1 provides details on the characteristics of the collection of
Conclusions and future work
We empirically evaluated the application of automatic text classification to the process of medical systematic reviews, in order to facilitate the manual process carried out by experts during the citation screening phase. The experiments involved multiple classification algorithms, combined with several feature selection methods and numbers of features, applied to different parts of the given articles. The analysis of these experiments showed overall positive results, especially when using the
Acknowledgements
The corpus used in this work was provided by Anne Brice from the Critical Appraisal Skills Programme in Oxford, UK.
This work was supported by funding received from the Department of Education, Universities and Research of the Basque Government (Grant No. BFI-09-270), the UPV/EHU [GIU08/27, INF10/58, GIU11/28 and UFI11/19], Gipuzkoa Regional Council [OF53/2011], the Department of Industry, Commerce and Tourism — Basque Government [S-PE09UN60 and S-PE11UN115], and the Spanish Ministry of Science
References (43)
- Aphinyanaphongs et al. (2005). Text categorization models for high-quality retrieval in internal medicine. JAMIA.
- Bekhuis & Demner-Fushman (2012). Screening nonrandomized studies for medical systematic reviews: A comparative study of classifiers. Artificial Intelligence in Medicine.
- Ide, Loane, & Demner-Fushman (2007). Essie: A concept-based search engine for structured biomedical text. Journal of the American Medical Informatics Association.
- Iruetaguena et al. (2013). Automatic retrieval of current evidence to support update of bibliography in clinical guidelines. Expert Systems with Applications.
- Salton & Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management.
- et al. (2002). The use of bigrams to enhance text categorization. Information Processing and Management.
- Aggarwal, Charu C., & Zhai, ChengXiang (2012). A survey of text classification algorithms. In Mining text data (pp....
- Baeza-Yates & Ribeiro-Neto (1999). Modern information retrieval.
- Bastian, Hilda, Glasziou, Paul, & Chalmers, Iain (2010). Seventy-five trials and eleven systematic reviews a day: How...
- Burges (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery.
- Systematic review of publication bias in studies on publication bias. BMJ.
- Object-oriented application frameworks. Communications of the ACM.
- Mining text with pimiento. IEEE Internet Computing.
- Joachims (2002). Learning to classify text using support vector machines – Methods, theory, and algorithms.
- Lewis (1998). Naive (Bayes) at forty: The independence assumption in information retrieval.