A comparison of algorithms for maximum entropy parameter estimation

ABSTRACT
Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison to the others, and for all of the test problems, a limited-memory variable metric algorithm outperformed the other choices.
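The paper's experiments were run with the PETSc/TAO toolkit; as a self-contained illustration of what is being compared, the sketch below estimates a small conditional ME (multinomial logistic) model two ways: with a Generalized Iterative Scaling update and with SciPy's L-BFGS-B as the limited-memory variable metric method. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a small dense binary feature matrix, class-conditional indicator features, and hypothetical toy data; the GIS step omits the usual slack feature for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def probs(W, X):
    """Conditional class probabilities p(y|x) under weights W (classes x features)."""
    scores = X @ W.T
    scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

def neg_log_likelihood(w, X, Y):
    """Negative conditional log-likelihood; w is the flattened weight matrix."""
    W = w.reshape(Y.shape[1], -1)
    return -np.sum(Y * np.log(probs(W, X)))

def gradient(w, X, Y):
    """Gradient = model feature expectations minus empirical expectations."""
    W = w.reshape(Y.shape[1], -1)
    return ((probs(W, X) - Y).T @ X).ravel()

def gis_step(W, X, Y):
    """One Generalized Iterative Scaling update (Darroch and Ratcliff 1972).
    Assumes non-negative features; C bounds the total feature mass per example
    (the usual slack feature is omitted here for brevity)."""
    C = X.sum(axis=1).max()
    emp = Y.T @ X                       # empirical feature expectations
    model = probs(W, X).T @ X           # model feature expectations
    return W + np.log(emp / model) / C  # log-space update, step shrinks as C grows

# Hypothetical toy data: 200 examples, 6 binary features, 3 classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6)).astype(float)
Y = np.eye(3)[rng.integers(0, 3, size=200)]  # one-hot labels

# Limited-memory variable metric estimate via SciPy's L-BFGS-B.
res = minimize(neg_log_likelihood, np.zeros(3 * 6), jac=gradient,
               args=(X, Y), method="L-BFGS-B")

# The same number of GIS iterations typically leaves a worse likelihood.
W = np.zeros((3, 6))
for _ in range(res.nit):
    W = gis_step(W, X, Y)
print(res.fun, neg_log_likelihood(W.ravel(), X, Y))
```

The contrast mirrors the paper's finding: both methods only need the gradient (the gap between empirical and model feature expectations), but GIS takes conservative steps scaled by 1/C, while a limited-memory variable metric method uses recent gradients to approximate curvature and converges in far fewer iterations.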