A random forest guided tour

Invited Paper

Abstract

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts with easy access to the main ideas.
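To make the mechanism sketched in the abstract concrete, here is a minimal Python illustration, not the authors' implementation: each tree is a CART-style regressor grown on a bootstrap sample, each split considers only a random subset of features, and the forest prediction is the average of the individual tree predictions. scikit-learn's DecisionTreeRegressor serves as the base learner; the names fit_forest, predict_forest, and mtry are illustrative.

```python
# Illustrative sketch of Breiman-style random forest regression,
# not the code of Biau and Scornet; assumes scikit-learn is installed.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=100, mtry=None, seed=0):
    """Grow n_trees CART trees, each on a bootstrap sample of (X, y),
    trying mtry randomly chosen features at every split."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if mtry is None:
        mtry = max(1, p // 3)  # a common default for regression forests
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: n draws with replacement
        tree = DecisionTreeRegressor(
            max_features=mtry,  # random feature subset at each split
            random_state=int(rng.integers(2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def predict_forest(forest, X):
    """Aggregate the randomized trees by averaging their predictions."""
    return np.mean([tree.predict(X) for tree in forest], axis=0)
```

Setting mtry equal to the total number of features reduces the procedure to bagging (Breiman 1996), while smaller values decorrelate the trees; the choice of this parameter is among the tuning questions examined in the review.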

References

  • Amaratunga D, Cabrera J, Lee Y-S (2008) Enriched random forests. Bioinformatics 24:2010–2014

  • Amit Y, Geman D (1997) Shape quantization and recognition with randomized trees. Neural Comput 9:1545–1588

  • Archer KJ, Kimes RV (2008) Empirical characterization of random forest variable importance measures. Comput Stat Data Anal 52:2249–2260

  • Arlot S, Genuer R (2014) Analysis of purely random forests bias. arXiv:1407.3939

  • Auret L, Aldrich C (2011) Empirical comparison of tree ensemble variable importance measures. Chemom Intell Lab Syst 105:157–170

  • Bai Z-H, Devroye L, Hwang H-K, Tsai T-H (2005) Maxima in hypercubes. Random Struct Algorithms 27:290–309

  • Banerjee M, McKeague IW (2007) Confidence sets for split points in decision trees. Ann Stat 35:543–574

  • Barndorff-Nielsen O, Sobel M (1966) On the distribution of the number of admissible points in a vector random sample. Theory Probab Appl 11:249–269

  • Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127

  • Bernard S, Heutte L, Adam S (2008) Forest-RK: a new random forest induction method. In: Huang D-S, Wunsch DC II, Levine DS, Jo K-H (eds) Advanced intelligent computing theories and applications. With aspects of artificial intelligence. Springer, Berlin, pp 430–437

  • Bernard S, Adam S, Heutte L (2012) Dynamic random forests. Pattern Recognit Lett 33:1580–1586

  • Biau G (2012) Analysis of a random forests model. J Mach Learn Res 13:1063–1095

  • Biau G, Devroye L (2010) On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J Multivar Anal 101:2499–2518

  • Biau G, Devroye L (2013) Cellular tree classifiers. Electron J Stat 7:1875–1912

  • Biau G, Devroye L, Lugosi G (2008) Consistency of random forests and other averaging classifiers. J Mach Learn Res 9:2015–2033

  • Biau G, Cérou F, Guyader A (2010) On the rate of convergence of the bagged nearest neighbor estimate. J Mach Learn Res 11:687–712

  • Boulesteix A-L, Janitza S, Kruppa J, König IR (2012) Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip Rev Data Mining Knowl Discov 2:493–507

  • Breiman L (1996) Bagging predictors. Mach Learn 24:123–140

  • Breiman L (2000a) Some infinity theory for predictor ensembles. Technical Report 577, University of California, Berkeley

  • Breiman L (2000b) Randomizing outputs to increase prediction accuracy. Mach Learn 40:229–242

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

  • Breiman L (2003a) Setting up, using, and understanding random forests V3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf

  • Breiman L (2003b) Setting up, using, and understanding random forests V4.0. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf

  • Breiman L (2004) Consistency for a simple model of random forests. Technical Report 670, University of California, Berkeley

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall/CRC, Boca Raton

  • Bühlmann P, Yu B (2002) Analyzing bagging. Ann Stat 30:927–961

  • Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data. Technical Report 666, University of California, Berkeley

  • Clémençon S, Depecker M, Vayatis N (2013) Ranking forests. J Mach Learn Res 14:39–73

  • Clémençon S, Vayatis N (2009) Tree-based ranking methods. IEEE Trans Inform Theory 55:4316–4336

  • Criminisi A, Shotton J, Konukoglu E (2011) Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends Comput Graph Vis 7:81–227

  • Crookston NL, Finley AO (2008) yaImpute: an R package for kNN imputation. J Stat Softw 23:1–16

  • Cutler A, Zhao G (2001) PERT—perfect random tree ensembles. Comput Sci Stat 33:490–497

  • Cutler DR, Edwards TC Jr, Beard KH, Cutler A, Hess KT, Gibson J, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88:2783–2792

  • Davies A, Ghahramani Z (2014) The Random Forest Kernel and creating other kernels for big data from random partitions. arXiv:1402.4293

  • Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113

  • Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 international joint conference on neural networks, pp 1–8

  • Deng H, Runger G (2013) Gene selection with guided regularized random forest. Pattern Recognit 46:3483–3489

  • Denil M, Matheson D, de Freitas N (2013) Consistency of online random forests. In: International conference on machine learning (ICML)

  • Denil M, Matheson D, de Freitas N (2014) Narrowing the gap: random forests in theory and in practice. In: International conference on machine learning (ICML)

  • Désir C, Bernard S, Petitjean C, Heutte L (2013) One class random forests. Pattern Recognit 46:3490–3506

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York

  • Díaz-Uriarte R, Alvarez de Andrés S (2006) Gene selection and classification of microarray data using random forest. BMC Bioinform 7:1–13

  • Dietterich TG (2000) Ensemble methods in machine learning. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 1–15

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26

  • Efron B (1982) The jackknife, the bootstrap and other resampling plans, vol 38. CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia

  • Fink D, Hochachka WM, Zuckerberg B, Winkler DW, Shaby B, Munson MA, Hooker G, Riedewald M, Sheldon D, Kelling S (2010) Spatiotemporal exploratory models for broad-scale survey data. Ecol Appl 20:2131–2147

  • Genuer R (2012) Variance reduction in purely random forests. J Nonparametr Stat 24:543–562

  • Genuer R, Poggi J-M, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit Lett 31:2225–2236

  • Geremia E, Menze BH, Ayache N (2013) Spatially adaptive random forests. In: IEEE international symposium on biomedical imaging: from nano to macro, pp 1332–1335

  • Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63:3–42

  • Gregorutti B, Michel B, Saint Pierre P (2016) Correlation and variable importance in random forests. Stat Comput. doi:10.1007/s11222-016-9646-1

  • Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422

  • Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of nonparametric regression. Springer, New York

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York

  • Ho T (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20:832–844

  • Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15:651–674

  • Howard J, Bowles M (2012) The two most important algorithms in predictive modeling today. In: Strata Conference: Santa Clara. http://strataconf.com/strata2012/public/schedule/detail/22658

  • Ishioka T (2013) Imputation of missing values for unsupervised data using the proximity in random forests. In: eLmL 2013, the fifth international conference on mobile, hybrid, and on-line learning, pp 30–36. International Academy, Research, and Industry Association

  • Ishwaran H (2007) Variable importance in binary regression trees and forests. Electron J Stat 1:519–537

  • Ishwaran H (2013) The effect of splitting on random forests. Mach Learn 99:75–118

  • Ishwaran H, Kogalur UB (2010) Consistency of random survival forests. Stat Probab Lett 80:1056–1064

  • Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841–860

  • Ishwaran H, Kogalur UB, Chen X, Minn AJ (2011) Random survival forests for high-dimensional data. Stat Anal Data Mining ASA Data Sci J 4:115–132

  • Joly A, Geurts P, Wehenkel L (2014) Random forests with random projections of the output space for high dimensional multi-label classification. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 607–622

  • Kim H, Loh W-Y (2001) Classification trees with unbiased multiway splits. J Am Stat Assoc 96:589–604

  • Kleiner A, Talwalkar A, Sarkar P, Jordan MI (2014) A scalable bootstrap for massive data. J Royal Stat Soc Ser B (Stat Methodol) 76:795–816

  • Konukoglu E, Ganz M (2014) Approximate false positive rate control in selection frequency for random forest. arXiv:1410.2838

  • Kruppa J, Schwarz A, Arminger G, Ziegler A (2013) Consumer credit risk: individual probability estimates using machine learning. Expert Syst Appl 40:5125–5131

  • Kruppa J, Liu Y, Biau G, Kohler M, König IR, Malley JD, Ziegler A (2014a) Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biometr J 56:534–563

  • Kruppa J, Liu Y, Diener H-C, Holste T, Weimar C, König IR, Ziegler A (2014b) Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications. Biometr J 56:564–583

  • Kuhn M, Johnson K (2013) Applied predictive modeling. Springer, New York

  • Kyrillidis A, Zouzias A (2014) Non-uniform feature sampling for decision tree ensembles. In: IEEE international conference on acoustics, speech and signal processing, pp 4548–4552

  • Lakshminarayanan B, Roy DM, Teh YW (2014) Mondrian forests: efficient online random forests. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, pp 3140–3148

  • Latinne P, Debeir O, Decaestecker C (2001) Limiting the number of trees in random forests. In: Kittler J, Roli F (eds) Multiple classifier systems. Springer, Berlin, pp 178–187

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2:18–22

  • Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101:578–590

  • Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, pp 431–439

  • Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A (2012) Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inform Med 51:74–81

  • Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999

  • Meinshausen N (2009) Forest Garrote. Electron J Stat 3:1288–1304

  • Mentch L, Hooker G (2014) A novel test for additivity in supervised ensemble learners. arXiv:1406.1845

  • Mentch L, Hooker G (2015) Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J Mach Learn Res (in press)

  • Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 453–469

  • Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142

  • Nicodemus KK, Malley JD (2009) Predictor correlation impacts machine learning algorithms: implications for genomic studies. Bioinformatics 25:1884–1890

  • Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York

  • Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199

  • Qian SS, King RS, Richardson CJ (2003) Two statistical methods for the detection of environmental thresholds. Ecol Model 166:87–97

  • Rieger A, Hothorn T, Strobl C (2010) Random forests with missing values in the covariates. Technical Report 79, University of Munich, Munich

  • Saffari A, Leistner C, Santner J, Godec M, Bischof H (2009) On-line random forests. In: IEEE 12th international conference on computer vision workshops, pp 1393–1400

  • Schwarz DF, König IR, Ziegler A (2010) On safari to random jungle: a fast implementation of random forests for high-dimensional data. Bioinformatics 26:1752–1758

  • Scornet E (2015a) On the asymptotics of random forests. J Multivar Anal 146:72–83

  • Scornet E (2015b) Random forests and kernel methods. IEEE Trans Inform Theory 62:1485–1500

  • Scornet E, Biau G, Vert J-P (2015) Consistency of random forests. Ann Stat 43:1716–1741

  • Segal MR (1988) Regression trees for censored data. Biometrics 44:35–47

  • Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, Kipman A, Blake A (2011) Real-time human pose recognition in parts from single depth images. In: IEEE conference on computer vision and pattern recognition, pp 1297–1304

  • Stone CJ (1977) Consistent nonparametric regression. Ann Stat 5:595–645

  • Stone CJ (1980) Optimal rates of convergence for nonparametric estimators. Ann Stat 8:1348–1360

  • Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053

  • Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinform 9:307

  • Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inform Comput Sci 43:1947–1958

  • Toloşi L, Lengauer T (2011) Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 27:1986–1994

  • Truong AKY (2009) Fast growing and interpretable oblique trees via logistic regression models. PhD thesis, University of Oxford, Oxford

  • Varian H (2014) Big data: new tricks for econometrics. J Econ Perspect 28:3–28

  • Wager S (2014) Asymptotic theory for random forests. arXiv:1405.0352

  • Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15:1625–1651

  • Watson GS (1964) Smooth regression analysis. Sankhyā Ser A 26:359–372

  • Welbl J (2014) Casting random forests as artificial neural networks and profiting from it. In: Jiang X, Hornegger J, Koch R (eds) Pattern recognition. Springer, Berlin, pp 765–771

  • Winham SJ, Freimuth RR, Biernacka JM (2013) A weighted random forests approach to improve predictive performance. Stat Anal Data Mining ASA Data Sci J 6:496–505

  • Yan D, Chen A, Jordan MI (2013) Cluster forests. Comput Stat Data Anal 66:178–192

  • Yang F, Wang J, Fan G (2010) Kernel induced random survival forests. arXiv:1008.3952

  • Yi Z, Soatto S, Dewan M, Zhan Y (2012) Information forests. In: 2012 information theory and applications workshop, pp 143–146

  • Zhu R, Zeng D, Kosorok MR (2015) Reinforcement learning trees. J Am Stat Assoc 110:1770–1784

  • Ziegler A, König IR (2014) Mining data with random forests: current options for real-world applications. Wiley Interdiscip Rev Data Mining Knowl Discov 4:55–63

Acknowledgments

We thank the Editors and three anonymous referees for valuable comments and insightful suggestions.

Author information

Correspondence to Gérard Biau.

Additional information

This invited paper is discussed in comments available at: doi:10.1007/s11749-016-0482-6; doi:10.1007/s11749-016-0483-5; doi:10.1007/s11749-016-0484-4; doi:10.1007/s11749-016-0485-3; doi:10.1007/s11749-016-0487-1.

Cite this article

Biau, G., Scornet, E. A random forest guided tour. TEST 25, 197–227 (2016). https://doi.org/10.1007/s11749-016-0481-7
