ABSTRACT
In many data mining domains, misclassification costs are different for different examples, in the same way that class membership probabilities are example-dependent. In these domains, both costs and probabilities are unknown for test examples, so both cost estimators and probability estimators must be learned. After discussing how to make optimal decisions given cost and probability estimates, we present decision tree and naive Bayesian learning methods for obtaining well-calibrated probability estimates. We then explain how to obtain unbiased estimators for example-dependent costs, taking into account the difficulty that in general, probabilities and costs are not independent random variables, and the training examples for which costs are known are not representative of all examples. The latter problem is called sample selection bias in econometrics. Our solution to it is based on Nobel prize-winning work due to the economist James Heckman. We show that the methods we propose perform better than MetaCost and all other known methods, in a comprehensive experimental comparison that uses the well-known, large, and challenging dataset from the KDD'98 data mining contest.
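To make the expected-cost decision rule the abstract refers to concrete, the sketch below pairs a probability estimator with an example-dependent benefit estimator and acts on an example exactly when estimated probability times estimated benefit exceeds the cost of acting. This is a minimal illustration under assumed names and data, not the paper's method: `prob_model`, `cost_model`, `action_cost`, the synthetic data, and the plain scikit-learn models are placeholders, and the calibration and Heckman-style selection-bias corrections described in the abstract are omitted.

```python
# Minimal sketch of expected-cost decision-making when both class probabilities
# and example-dependent costs/benefits must be estimated from training data.
# All model choices and the synthetic data below are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic training data: features, binary labels, and a benefit that is
# observed only for positive (e.g. responding) examples.
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
benefit = np.where(y == 1, 10.0 + 3.0 * X[:, 1] + rng.normal(size=1000), 0.0)

# Step 1: estimate P(y=1 | x).  The paper uses calibrated decision-tree and
# naive Bayes probability estimates; an uncalibrated tree stands in here.
prob_model = DecisionTreeClassifier(min_samples_leaf=50, random_state=0)
prob_model.fit(X, y)

# Step 2: estimate the example-dependent benefit c(x), trained only on the
# examples whose benefit is observed.  The paper additionally corrects this
# regression for sample selection bias (Heckman's method); that step is omitted.
cost_model = LinearRegression()
cost_model.fit(X[y == 1], benefit[y == 1])

# Step 3: act on x iff estimated probability times estimated benefit exceeds
# the cost of acting.  The 0.68 default echoes the $0.68 mailing cost of the
# KDD'98 task, used here only as an illustrative threshold.
def decide(X_test, action_cost=0.68):
    p_hat = prob_model.predict_proba(X_test)[:, 1]
    c_hat = cost_model.predict(X_test)
    return (p_hat * c_hat > action_cost).astype(int)

print(decide(rng.normal(size=(5, 5))))
```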
REFERENCES
- 1. E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105-139, 1999.
- 2. S. D. Bay. UCI KDD archive. Department of Information and Computer Sciences, University of California, Irvine, 2000. http://kdd.ics.uci.edu/.
- 3. P. N. Bennett. Assessing the calibration of naive Bayes' posterior estimates. Technical Report CMU-CS-00-155, School of Computer Science, Carnegie Mellon University, 2000.
- 4. J. Bradford, C. Kunz, R. Kohavi, C. Brunk, and C. Brodley. Pruning decision trees with misclassification costs. In Proceedings of the European Conference on Machine Learning, pages 131-136, 1998.
- 5. L. Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.
- 6. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
- 7. J. Cussens. Bayes and pseudo-Bayes estimates of conditional probabilities and their reliability. In Proceedings of the European Conference on Machine Learning, pages 136-152. Springer-Verlag, 1993.
- 8. P. Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pages 155-164. ACM Press, 1999.
- 9. P. Domingos and M. Pazzani. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 105-112. Morgan Kaufmann Publishers, Inc., 1996.
- 10. C. Elkan. Cost-sensitive learning and decision-making when costs are unknown. In Workshop Notes, Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, 2000.
- 11. C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, August 2001.
- 12. F. Esposito, D. Malerba, and G. Semeraro. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):476-491, May 1997.
- 13. J. Heckman. Sample selection bias as a specification error. Econometrica, 47:153-161, 1979.
- 14. E. C. Malthouse. Assessing the performance of direct marketing scoring models. Journal of Interactive Marketing, 15(1):49-62, 2001.
- 15. D. Margineantu. On class probability estimates and cost-sensitive evaluation of classifiers. In Workshop Notes, Workshop on Cost-Sensitive Learning at the International Conference on Machine Learning, June 2000.
- 16. F. Provost and P. Domingos. Well-trained PETs: Improving probability estimation trees. CeDER Working Paper IS-00-04, Stern School of Business, New York University, New York, NY 10012, 2000.
- 17. F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42:203-231, 2001.
- 18. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
- 19. P. Smyth, A. Gray, and U. Fayyad. Retrofitting decision tree classifiers using kernel density estimation. In Proceedings of the Twelfth International Conference on Machine Learning, pages 506-514. Morgan Kaufmann Publishers, Inc., 1995.
- 20. J. R. Sobehart, R. M. Stein, V. Mikityanskaya, and L. Li. Moody's public firm risk model: A hybrid approach to modeling short term default risk. Technical report, Moody's Investors Service, Global Credit Research, 2000. Available at http://www.moodysqra.com/research/crm/53853.asp.
- 21. K. Tumer and J. Ghosh. Theoretical foundations of linear and order statistics combiners for neural pattern classifiers. Technical Report TR-95-02-98, Computer and Vision Research Center, The University of Texas at Austin, 1995.
- 22. P. Turney. Cost-sensitive learning bibliography. Institute for Information Technology, National Research Council, Ottawa, Canada, 2000. http://extractor.iit.nrc.ca/bibliographies/cost-sensitive.html.
- 23. B. Zadrozny and C. Elkan. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning, 2001. To appear.