
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality


Abstract

The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x_1, ..., x_n}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like “naive” Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with and sometimes superior to more sophisticated ones for classification, and why “bagging/aggregating” classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
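To make the abstract's central claim concrete, here is a minimal Monte Carlo sketch, not taken from the paper: the two estimators, the true probability of 0.7, and all other numbers are assumed purely for illustration. It compares a hypothetical estimator A with large bias but tiny variance against a hypothetical estimator B with no bias but large variance, measuring both squared error on the probability estimate and the frequency of a wrong 0/1-loss decision (an estimate falling on the wrong side of the 1/2 threshold).

# Illustrative sketch only (not code from the paper): a single point x with
# true P(y = 1 | x) = 0.7, so the Bayes rule predicts class 1.
# Estimator A (hypothetical): heavily biased toward 1/2 but nearly deterministic.
# Estimator B (hypothetical): unbiased but noisy.
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.7
n_trials = 100_000

# A: large bias (centered at 0.52), very low variance.
est_a = rng.normal(loc=0.52, scale=0.01, size=n_trials)

# B: no bias (centered at 0.70), high variance.
est_b = rng.normal(loc=0.70, scale=0.15, size=n_trials)

for name, est in (("A: high bias, low variance", est_a),
                  ("B: no bias, high variance", est_b)):
    mse = np.mean((est - p_true) ** 2)   # squared error on the probability itself
    wrong = np.mean(est < 0.5)           # decision disagrees with the Bayes rule
    print(f"{name}: MSE = {mse:.4f}, P(wrong classification) = {wrong:.4f}")

Under these assumed numbers, A has the larger squared error on the probability (roughly 0.033 versus 0.023) yet almost never produces a wrong decision (about 2% versus 9% for B), which is the sense in which high bias can be canceled by low variance under 0/1 loss.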




Cite this article

Friedman, J.H. On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery 1, 55–77 (1997). https://doi.org/10.1023/A:1009778005914
