DOI: 10.1145/502512.502540
Article

Learning and making decisions when costs and probabilities are both unknown

Authors: Bianca Zadrozny and Charles Elkan

Published: 26 August 2001

ABSTRACT

In many data mining domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both cost estimators and probability estimators must be learned. After discussing how to make optimal decisions given cost and probability estimates, we present decision tree and naive Bayesian learning methods for obtaining well-calibrated probability estimates. We then explain how to obtain unbiased estimators for example-dependent costs, taking into account the difficulty that in general, probabilities and costs are not independent random variables, and the training examples for which costs are known are not representative of all examples. The latter problem is called sample selection bias in econometrics. Our solution to it is based on Nobel prize-winning work due to the economist James Heckman. We show that the methods we propose perform better than MetaCost and all other known methods, in a comprehensive experimental comparison that uses the well-known, large, and challenging dataset from the KDD'98 data mining contest.
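As a concrete illustration of the decision rule the abstract describes, the sketch below combines a calibrated class-probability estimator with an example-dependent benefit estimator and acts on an example only when its expected benefit exceeds a fixed action cost (as in a donation-solicitation setting). This is a minimal sketch: the class name, the estimator interfaces, and the numeric values are hypothetical stand-ins, not the authors' implementation.

    # Sketch of example-dependent cost-sensitive decision making, assuming two
    # separately learned estimators (both hypothetical stand-ins):
    #   predict_proba(x)   -> calibrated P(example responds | x)
    #   predict_benefit(x) -> estimated example-dependent benefit if x responds
    # Rule: act on x only when expected benefit exceeds the fixed cost of acting.

    from dataclasses import dataclass
    from typing import Callable, Sequence

    @dataclass
    class CostSensitiveDecider:
        predict_proba: Callable[[Sequence[float]], float]    # calibrated probability estimator
        predict_benefit: Callable[[Sequence[float]], float]  # example-dependent benefit estimator
        action_cost: float                                    # fixed cost of taking the action

        def decide(self, x: Sequence[float]) -> bool:
            """Return True if the expected benefit of acting on x exceeds its cost."""
            p = self.predict_proba(x)           # estimated response probability for x
            b = self.predict_benefit(x)         # estimated benefit if x responds
            return p * b > self.action_cost     # act only when expected net benefit is positive

    # Usage with toy stand-in estimators (purely illustrative numbers):
    decider = CostSensitiveDecider(
        predict_proba=lambda x: 0.05,     # e.g. 5% estimated response probability
        predict_benefit=lambda x: 20.0,   # e.g. $20 estimated donation if the person responds
        action_cost=0.68,                 # e.g. cost of mailing one solicitation
    )
    print(decider.decide([0.0]))  # True: 0.05 * 20.0 = 1.00 > 0.68

The same rule generalizes to multiple classes and actions: for each example x, choose the action i that minimizes the expected cost, i.e. the sum over classes j of P(j | x) * C(i, j, x), where both the probability estimates and the cost entries may depend on x.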

References

  1. E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105-139, 1999.
  2. S. D. Bay. UCI KDD archive. Department of Information and Computer Sciences, University of California, Irvine, 2000. http://kdd.ics.uci.edu/.
  3. P. N. Bennett. Assessing the calibration of naive Bayes' posterior estimates. Technical Report CMU-CS-00-155, School of Computer Science, Carnegie Mellon University, 2000.
  4. J. Bradford, C. Kunz, R. Kohavi, C. Brunk, and C. Brodley. Pruning decision trees with misclassification costs. In Proceedings of the European Conference on Machine Learning, pages 131-136, 1998.
  5. L. Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.
  6. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
  7. J. Cussens. Bayes and pseudo-Bayes estimates of conditional probabilities and their reliability. In Proceedings of the European Conference on Machine Learning, pages 136-152. Springer Verlag, 1993.
  8. P. Domingos. MetaCost: A general method for making classifiers cost sensitive. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pages 155-164. ACM Press, 1999.
  9. P. Domingos and M. Pazzani. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 105-112. Morgan Kaufmann Publishers, Inc., 1996.
  10. C. Elkan. Cost-sensitive learning and decision-making when costs are unknown. In Workshop Notes, Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, 2000.
  11. C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, August 2001.
  12. F. Esposito, D. Malerba, and G. Semeraro. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):476-491, May 1997.
  13. J. Heckman. Sample selection bias as a specification error. Econometrica, 47:153-161, 1979.
  14. E. C. Malthouse. Assessing the performance of direct marketing scoring models. Journal of Interactive Marketing, 15(1):49-62, 2001.
  15. D. Margineantu. On class probability estimates and cost-sensitive evaluation of classifiers. In Workshop Notes, Workshop on Cost-Sensitive Learning, International Conference on Machine Learning, June 2000.
  16. F. Provost and P. Domingos. Well-trained PETs: Improving probability estimation trees. CeDER Working Paper #IS-00-04, Stern School of Business, New York University, New York, NY 10012, 2000.
  17. F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42:203-231, 2001.
  18. J. R. Quinlan. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
  19. P. Smyth, A. Gray, and U. Fayyad. Retrofitting decision tree classifiers using kernel density estimation. In Proceedings of the Twelfth International Conference on Machine Learning, pages 506-514. Morgan Kaufmann Publishers, Inc., 1995.
  20. J. R. Sobehart, R. M. Stein, V. Mikityanskaya, and L. Li. Moody's public firm risk model: A hybrid approach to modeling short term default risk. Technical report, Moody's Investors Service, Global Credit Research, 2000. Available at http://www.moodysqra.com/research/crm/53853.asp.
  21. K. Tumer and J. Ghosh. Theoretical foundations of linear and order statistics combiners for neural pattern classifiers. Technical Report TR-95-02-98, Computer and Vision Research Center, The University of Texas at Austin, 1995.
  22. P. Turney. Cost-sensitive learning bibliography. Institute for Information Technology, National Research Council, Ottawa, Canada, 2000. http://extractor.iit.nrc.ca/bibliographies/cost-sensitive.html.
  23. B. Zadrozny and C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning, 2001. To appear.


Published in

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001, 493 pages
ISBN: 158113391X
DOI: 10.1145/502512

                Copyright © 2001 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

KDD '01 paper acceptance rate: 31 of 237 submissions (13%). Overall KDD acceptance rate: 1,133 of 8,635 submissions (13%).
