
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality


Abstract

The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x_1, ..., x_n}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like “naive” Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with and sometimes superior to more sophisticated ones for classification, and why “bagging/aggregating” classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
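To make the abstract's central claim concrete, here is a minimal Monte Carlo sketch, not taken from the paper: the two estimators, the true probability of 0.7, and all other numbers are assumed purely for illustration. It compares a hypothetical estimator A with large bias but tiny variance against a hypothetical estimator B with no bias but large variance, measuring both squared error on the probability estimate and the frequency of a wrong 0/1-loss decision (an estimate falling on the wrong side of the 1/2 threshold).

# Illustrative sketch only (not code from the paper): a single point x with
# true P(y = 1 | x) = 0.7, so the Bayes rule predicts class 1.
# Estimator A (hypothetical): heavily biased toward 1/2 but nearly deterministic.
# Estimator B (hypothetical): unbiased but noisy.
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.7
n_trials = 100_000

# A: large bias (centered at 0.52), very low variance.
est_a = rng.normal(loc=0.52, scale=0.01, size=n_trials)

# B: no bias (centered at 0.70), high variance.
est_b = rng.normal(loc=0.70, scale=0.15, size=n_trials)

for name, est in (("A: high bias, low variance", est_a),
                  ("B: no bias, high variance", est_b)):
    mse = np.mean((est - p_true) ** 2)   # squared error on the probability itself
    wrong = np.mean(est < 0.5)           # decision disagrees with the Bayes rule
    print(f"{name}: MSE = {mse:.4f}, P(wrong classification) = {wrong:.4f}")

Under these assumed numbers, A has the larger squared error on the probability (roughly 0.033 versus 0.023) yet almost never produces a wrong decision (about 2% versus 9% for B), which is the sense in which high bias can be canceled by low variance under 0/1 loss.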




Cite this article

Friedman, J.H. On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery 1, 55–77 (1997). https://doi.org/10.1023/A:1009778005914
