Abstract
The goal of combining the predictions of multiple learned models is to form an improved estimator. A combining strategy must be able to robustly handle the inherent correlation, or multicollinearity, of the learned models while identifying the unique contributions of each. A progression of existing approaches and their limitations with respect to these two issues are discussed. A new approach, PCR*, based on principal components regression is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that (1) PCR* was the most robust combining method, (2) correlation could be handled without eliminating any of the learned models, and (3) the principal components of the learned models provided a continuum of “regularized” weights from which PCR* could choose.
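The combining idea the abstract describes can be sketched in a few lines: given the (typically correlated) predictions of k learned models, take the principal components of those predictions and regress the target on only the leading components, then map the coefficients back to per-model weights. This is a minimal illustration of principal components regression applied to model combination, not the authors' exact PCR* procedure; in particular, the function name `pcr_combine` and the fixed `n_components` argument are assumptions for illustration (PCR* itself selects among the resulting weight vectors).

```python
import numpy as np

def pcr_combine(F, y, n_components):
    """Combine k model predictions (the columns of F, shape n x k)
    by regressing y on the leading principal components of F and
    mapping the coefficients back to per-model weights.
    A sketch of the PCR idea only; hypothetical interface."""
    mu = F.mean(axis=0)
    Fc = F - mu                                # center the predictions
    # SVD exposes the principal directions of the correlated models
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    V = Vt[:n_components].T                    # k x m loadings
    Z = Fc @ V                                 # scores on leading components
    beta, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    w = V @ beta                               # back to per-model weights
    b = y.mean() - mu @ w                      # intercept
    return w, b

# Toy usage: three nearly collinear "models" predicting y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
F = np.column_stack([2.1 * x,
                     1.9 * x + 0.05 * rng.normal(size=200),
                     2.0 * x])
w, b = pcr_combine(F, y, n_components=1)
yhat = F @ w + b
```

Using one component here handles the multicollinearity without discarding any model: all three columns receive nonzero weight, yet the regression is fit on a single well-conditioned direction. Varying `n_components` from 1 to k yields the continuum of increasingly less regularized weight vectors the abstract refers to.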
Cite this article
Merz, C.J., Pazzani, M.J. A Principal Components Approach to Combining Regression Estimates. Machine Learning 36, 9–32 (1999). https://doi.org/10.1023/A:1007507221352