Editorial: special issue on learning from imbalanced data sets

Published: 01 June 2004
Published in ACM SIGKDD Explorations Newsletter, Volume 6, Issue 1
Special issue on learning from imbalanced datasets
June 2004
117 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1007730

Copyright © 2004 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
