Predicting the Relationship Between the Size of Training Sample and the Predictive Power of Classifiers

Boonyanunta, Natthaphan; Zeephongsekul, Panlop

doi:10.1007/978-3-540-30134-9_71

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3215)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

The main objective of this paper is to investigate the relationship between the size of the training sample and the predictive power of well-known classification techniques. We first display this relationship using the results of some empirical studies and then propose a general mathematical model that explains it. Next, we validate the model on several real data sets and find that it provides a good fit to the data. The model also allows a more objective determination of the optimum training sample size, in contrast to current selection approaches, which tend to be ad hoc or subjective.
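
The abstract does not state the form of the proposed model, so the following is only an illustrative sketch of the general idea: fit a curve to (training size, accuracy) pairs and read a sample size off the fitted curve. It assumes a saturating exponential P(n) = T + c*exp(-k*n), where T is a ceiling accuracy, c is negative, and k is a rate; that functional form, the function names, the grid of k values, and the data points are all assumptions made for this example and are not taken from the paper. The data are synthetic, generated from the assumed curve itself.

```python
import math


def fit_saturating_curve(sizes, accuracies, k_grid):
    """Fit P(n) = T + c*exp(-k*n) (assumed form, not the paper's stated model).

    For each candidate k, the model is linear in (T, c), so those two
    parameters are found by ordinary least squares; the k with the
    smallest squared error wins.
    """
    m = len(sizes)
    mean_y = sum(accuracies) / m
    best = None
    for k in k_grid:
        x = [math.exp(-k * n) for n in sizes]
        mean_x = sum(x) / m
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor, skip this k
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, accuracies))
        c = sxy / sxx
        T = mean_y - c * mean_x
        sse = sum((T + c * xi - yi) ** 2 for xi, yi in zip(x, accuracies))
        if best is None or sse < best[0]:
            best = (sse, T, c, k)
    if best is None:
        raise ValueError("no usable k in k_grid")
    _, T, c, k = best
    return T, c, k


def size_within(T, c, k, eps):
    """Smallest integer n whose fitted accuracy is within eps of the ceiling T.

    Assumes c < 0, i.e. the fitted curve rises toward T.
    """
    gap = -c  # distance from the ceiling at n = 0
    if gap <= eps:
        return 0
    return math.ceil(math.log(gap / eps) / k)


# Synthetic (made-up) learning curve: ceiling 0.90, start 0.60, rate 0.002.
sizes = [100, 200, 400, 800, 1600, 3200]
accuracies = [0.9 - 0.3 * math.exp(-0.002 * n) for n in sizes]

k_grid = [i * 1e-4 for i in range(1, 101)]  # hypothetical search range for k
T_hat, c_hat, k_hat = fit_saturating_curve(sizes, accuracies, k_grid)
n_star = size_within(T_hat, c_hat, k_hat, eps=0.01)
print(T_hat, c_hat, k_hat, n_star)
```

Under this assumed form, "optimum" is read as the smallest training size whose predicted accuracy is within a chosen tolerance of the fitted ceiling. Other criteria, such as a threshold on the marginal gain per additional sample, could be substituted.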

Author information

Authors and Affiliations

Department of Mathematics and Statistics, RMIT University, Melbourne, Australia
Natthaphan Boonyanunta & Panlop Zeephongsekul

Editor information

Editors and Affiliations

KES International, 2nd Floor, 145-157 St John Street, EC1V 4PY, London, United Kingdom
Mircea Gh. Negoita
Centre for SMART systems Engineering Research Centre, University of Brighton, BN2 4GJ, Moulsecoomb, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, 5095, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Boonyanunta, N., Zeephongsekul, P. (2004). Predicting the Relationship Between the Size of Training Sample and the Predictive Power of Classifiers. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30134-9_71

DOI: https://doi.org/10.1007/978-3-540-30134-9_71
Published: 14 October 2004
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23205-6
Online ISBN: 978-3-540-30134-9
eBook Packages: Springer Book Archive
