Abstract
The goal of combining the predictions of multiple learned models is to form an improved estimator. A combining strategy must be able to robustly handle the inherent correlation, or multicollinearity, of the learned models while identifying the unique contributions of each. A progression of existing approaches and their limitations with respect to these two issues are discussed. A new approach, PCR*, based on principal components regression is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that (1) PCR* was the most robust combining method, (2) correlation could be handled without eliminating any of the learned models, and (3) the principal components of the learned models provided a continuum of “regularized” weights from which PCR* could choose.
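The combining idea the abstract describes can be sketched in a few lines: given the (typically correlated) predictions of k learned models, take the principal components of those predictions and regress the target on only the leading components, then map the coefficients back to per-model weights. This is a minimal illustration of principal components regression applied to model combination, not the authors' exact PCR* procedure; in particular, the function name `pcr_combine` and the fixed `n_components` argument are assumptions for illustration (PCR* itself selects among the resulting weight vectors).

```python
import numpy as np

def pcr_combine(F, y, n_components):
    """Combine k model predictions (the columns of F, shape n x k)
    by regressing y on the leading principal components of F and
    mapping the coefficients back to per-model weights.
    A sketch of the PCR idea only; hypothetical interface."""
    mu = F.mean(axis=0)
    Fc = F - mu                                # center the predictions
    # SVD exposes the principal directions of the correlated models
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    V = Vt[:n_components].T                    # k x m loadings
    Z = Fc @ V                                 # scores on leading components
    beta, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    w = V @ beta                               # back to per-model weights
    b = y.mean() - mu @ w                      # intercept
    return w, b

# Toy usage: three nearly collinear "models" predicting y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
F = np.column_stack([2.1 * x,
                     1.9 * x + 0.05 * rng.normal(size=200),
                     2.0 * x])
w, b = pcr_combine(F, y, n_components=1)
yhat = F @ w + b
```

Using one component here handles the multicollinearity without discarding any model: all three columns receive nonzero weight, yet the regression is fit on a single well-conditioned direction. Varying `n_components` from 1 to k yields the continuum of increasingly less regularized weight vectors the abstract refers to.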
Cite this article
Merz, C.J., Pazzani, M.J. A Principal Components Approach to Combining Regression Estimates. Machine Learning 36, 9–32 (1999). https://doi.org/10.1023/A:1007507221352