A re-examination of text categorization methods

Authors:
Yiming Yang, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Xin Liu, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA

SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, August 1999, Pages 42–49, https://doi.org/10.1145/312624.312647

Published: 01 August 1999

Index Terms

A re-examination of text categorization methods
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Mathematics of computing
  1. Probability and statistics

Published in
SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
August 1999
339 pages
ISBN: 1581130961
DOI: 10.1145/312624
Chairmen: Fredric Gey (Univ. of California), Marti Hearst (Univ. of California, Berkeley), Richard Tong (Tarragon Consulting Corp.)
Copyright © 1999 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SIGIR '99 paper acceptance rate: 33 of 135 submissions, 24%. Overall acceptance rate: 792 of 3,983 submissions, 20%.