ABSTRACT
Background: Since the introduction of the systematic review process to Software Engineering in 2004, researchers have investigated a number of ways to reduce the effort and time required to filter through large volumes of literature.
Aim: This study aims to provide a critical analysis of text mining techniques used to support the citation screening stage of the systematic review process.
Method: We critically re-reviewed the papers included in a previous systematic review of text mining methods used to support the screening of papers for inclusion in a review; that earlier review did not analyse the text mining methods themselves in detail. We focus on the information the papers make available about the text mining methods employed: the description and explanation of the methods, parameter settings, assessment of whether their application is appropriate given the size and dimensionality of the data used, performance on training, testing and validation data sets, and any further information that may support the reproducibility of the included studies.
Results: Support Vector Machines (SVM), Naïve Bayes (NB) and committees of classifiers (ensembles) are the most commonly used classification algorithms. All of the studies represented features as Bag-of-Words (BOW), using either binary features (28%) or term frequency (66%). Five studies experimented with n-grams for n between 2 and 4, but unigrams predominated. χ2, information gain and tf-idf were the most commonly used feature selection techniques. Feature extraction was rarely used, although a few studies applied LDA and topic modelling. Recall, precision, F and AUC were the most frequently reported metrics, and cross validation was widely used. More than half of the studies used a corpus of fewer than 1,000 documents, and for around 80% of the studies the corpus was 3,000 documents or fewer. The main common ground we found for comparing performance across independent replications was use of the same dataset, but a sound performance comparison could not be established because the studies had little else in common. Most of the studies reported insufficient information to enable independent replication, and generally included no discussion of the statistical appropriateness of the text mining method applied. In the case of SVM applications, none of the studies reported the number of support vectors found, which indicates the complexity of the prediction engine, making it impossible to judge the extent to which over-fitting might account for the good performance results.
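The pipeline most of the reviewed studies describe — unigram Bag-of-Words features with tf-idf weighting, an SVM classifier, and cross-validated recall, precision and F — can be sketched as follows. This is a minimal illustration using scikit-learn on an invented toy corpus, not any reviewed study's actual setup; it also prints the number of support vectors, the model-complexity indicator the review found to be unreported.

```python
# Minimal sketch (assumed setup, toy data) of the common citation-screening
# pipeline: BOW + tf-idf features, linear SVM, cross-validated metrics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

# Invented abstracts and include/exclude labels, purely for illustration.
docs = [
    "randomised trial of drug therapy in adults",
    "cohort study of dietary intake and outcomes",
    "case report of a rare adverse event",
    "systematic review of screening interventions",
    "trial protocol for a new intervention",
    "editorial commentary on publication trends",
] * 5  # repeated so 5-fold cross validation has enough samples per class
labels = [1, 1, 0, 1, 1, 0] * 5  # 1 = include, 0 = exclude

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 1)),  # unigram BOW, as in most studies
    SVC(kernel="linear"),
)
scores = cross_validate(pipeline, docs, labels,
                        scoring=("recall", "precision", "f1"), cv=5)
print("mean F over folds:", scores["test_f1"].mean())

# Reporting the complexity of the fitted model, which none of the
# reviewed SVM studies did:
pipeline.fit(docs, labels)
svm = pipeline.named_steps["svc"]
print("support vectors per class:", svm.n_support_)
```

Reporting `n_support_` (or an equivalent complexity measure) alongside the performance metrics would let readers judge whether good results might be explained by over-fitting.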
Conclusions: There is as yet no concrete evidence of the effectiveness of text mining algorithms for automating citation screening in systematic reviews. The studies indicate that options are still being explored, but better reporting, more explicit process details and access to datasets are needed to facilitate replication and strengthen the evidence. In general, the reader often gets the impression that text mining algorithms were applied in the reviewed papers as magic tools, relying on the default settings or default optimization of available machine learning toolboxes without an in-depth understanding of the statistical validity and appropriateness of such tools for text mining purposes.
A critical analysis of studies that address the use of text mining for citation screening in systematic reviews