Index Terms
- Editorial: special issue on learning from imbalanced data sets