Abstract
This research aims to develop a multistrategy classifier system for document categorization. The system automatically discovers classification patterns by applying several empirical learning methods to different representations of preclassified documents drawn from an imbalanced sample. The learners work in parallel: each learner carries out its own feature selection using evolutionary techniques and then builds a classification model. To classify a document, the system combines the learners' predictions, again using evolutionary techniques. The system relies on a modular, flexible architecture that makes no assumptions about the design or number of the available learners and guarantees independence from the thematic domain.
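As a rough illustration of the per-learner evolutionary feature selection mentioned in the abstract, the following minimal sketch evolves a binary mask over terms with a simple genetic algorithm. Everything in it is an assumption made for illustration and is not taken from the paper: the toy rule "predict positive if any selected term occurs", the use of balanced accuracy as a fitness suited to an imbalanced sample, the operators (elitism, one-point crossover, bit-flip mutation), and the names `fitness`, `evolve_mask`, `docs`, and `labels`.

```python
import random

def fitness(mask, docs, labels):
    """Balanced accuracy of a toy rule (an assumption, not the paper's learner):
    predict positive if any selected term occurs in the document."""
    if not any(mask):
        return 0.0
    tp = fn = tn = fp = 0
    for doc, y in zip(docs, labels):
        pred = any(m and x for m, x in zip(mask, doc))
        if y and pred:
            tp += 1
        elif y:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    recall_pos = tp / (tp + fn) if tp + fn else 0.0
    recall_neg = tn / (tn + fp) if tn + fp else 0.0
    # Averaging per-class recall keeps the majority class from dominating.
    return (recall_pos + recall_neg) / 2

def evolve_mask(docs, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve a boolean feature mask; hypothetical operators and parameters."""
    rng = random.Random(seed)
    n = len(docs[0])
    # Start from the all-features mask plus random masks.
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, docs, labels), reverse=True)
        elite = pop[: pop_size // 2]          # elitism: the best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, docs, labels))

# Toy imbalanced sample: 2 positive and 6 negative documents over 4 binary terms.
docs = [
    [1, 0, 1, 0], [1, 1, 0, 0],                              # positive
    [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],                # negative
    [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 1],                # negative
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

best = evolve_mask(docs, labels)
print(best, fitness(best, docs, labels))
```

Because the all-features mask is placed in the initial population and the best half always survives, the returned mask never scores below the "use every term" baseline on the training sample. In a multistrategy setting, each learner would presumably run such a search with its own representation and model in place of the toy rule.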
Index Terms
- A multistrategy approach for digital text categorization from imbalanced documents