Impact of Sampling on Neural Network Classification Performance in the Context of Repeat Movie Viewing

Fitkov-Norris, Elena; Folorunso, Sakinat Oluwabukonla

doi:10.1007/978-3-642-41013-0_22


Conference paper


Part of the book series: Communications in Computer and Information Science (CCIS, volume 383)

Abstract

This paper assesses the impact of different sampling approaches on neural network classification performance in the context of repeat moviegoing. The results showed that synthetic oversampling of the minority class, either on its own or combined with under-sampling and removal of noisy examples from the majority class, offered the best overall performance. Identifying the best sampling approach for this data set is not trivial, because the choice depends heavily on the metric used: the accuracy ranks of the approaches did not agree across the different accuracy measures. In addition, the findings suggest that including examples generated by the oversampling procedure in the holdout sample leads to a significant overestimation of the neural network's accuracy. Further research is needed to understand the relationship between the degree of synthetic oversampling and the efficacy of the holdout sample as an estimator of neural network accuracy.
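The last two findings concern how sampled data are evaluated. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. It assumes SMOTE-style interpolation as the form of "synthetic oversampling" (the abstract does not name the method), binary labels with 1 marking the minority class, and the G-mean of the true positive and true negative rates as one example of an alternative accuracy measure. The function names `smote`, `split_then_oversample` and `metrics` are hypothetical, and no neural network is trained. The key design choice is ordering: the holdout sample is drawn first and only the training portion is oversampled, so no synthetic example can inflate the holdout accuracy estimate.

```python
import math
import random

# Minimal sketch (hypothetical helpers, not the authors' code): SMOTE-style
# oversampling applied only AFTER the holdout split, plus metrics that can
# rank the same classifiers differently.


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if n_new <= 0 or len(minority) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def split_then_oversample(X, y, test_frac=0.3, rng=None):
    """Hold out a test set first, then oversample the minority class (label 1)
    in the training part only, so no synthetic point reaches the holdout."""
    rng = rng or random.Random(0)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train]
    y_train = [y[i] for i in train]
    minority = [x for x, t in zip(X_train, y_train) if t == 1]
    n_majority = len(y_train) - len(minority)
    # Generate enough synthetic minority points to balance the training set.
    synth = smote(minority, n_majority - len(minority), rng=rng)
    X_train += synth
    y_train += [1] * len(synth)
    return X_train, y_train, [X[i] for i in test], [y[i] for i in test]


def metrics(y_true, y_pred):
    """Accuracy, true positive rate, true negative rate and their G-mean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "tpr": tpr, "tnr": tnr,
            "g_mean": math.sqrt(tpr * tnr)}


if __name__ == "__main__":
    # Two hypothetical classifiers on 8 negatives and 2 positives.
    y_true = [0] * 8 + [1] * 2
    always_negative = [0] * 10
    catches_positives = [0] * 5 + [1] * 3 + [1] * 2  # 3 false pos., 2 true pos.
    print(metrics(y_true, always_negative))    # higher accuracy, G-mean 0
    print(metrics(y_true, catches_positives))  # lower accuracy, higher G-mean
```

Running the script compares two hypothetical classifiers: the one that always predicts the majority class scores higher on accuracy (0.8 against 0.7) but has a G-mean of 0, so the two measures rank the pair in opposite orders.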



Authors and Affiliations

Kingston University, Kingston Hill, Kingston-upon-Thames, KT2 7LB, UK
Elena Fitkov-Norris
Mathematical Sciences Department, Olabisi Onabanjo University (OOU), Ago-Iwoye, Ogun State, Nigeria
Sakinat Oluwabukonla Folorunso


Editors and Affiliations

Department of Forestry & Management of the Environment and Natural Resources, Democritus University of Thrace, GR-68200, Orestiada, Hellas
Lazaros Iliadis
Frederick University of Cyprus, Cyprus
Harris Papadopoulos
Faculty of Engineering and Computing, Coventry University, UK
Chrisina Jayne


About this paper

Cite this paper

Fitkov-Norris, E., Folorunso, S.O. (2013). Impact of Sampling on Neural Network Classification Performance in the Context of Repeat Movie Viewing. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2013. Communications in Computer and Information Science, vol 383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41013-0_22


DOI: https://doi.org/10.1007/978-3-642-41013-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41012-3
Online ISBN: 978-3-642-41013-0
eBook Packages: Computer Science, Computer Science (R0)
