Design challenges and misconceptions in named entity recognition

Authors:
Lev Ratinov, University of Illinois, Urbana, IL
Dan Roth, University of Illinois, Urbana, IL

CoNLL '09: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, June 2009, Pages 147–155

Published: 4 June 2009

ABSTRACT

We analyze some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system. In particular, we address issues such as the representation of text chunks, the inference approach needed to combine local NER decisions, the sources of prior knowledge and how to use them within an NER system. In the process of comparing several solutions to these challenges we reach some surprising conclusions, as well as develop an NER system that achieves 90.8 F₁ score on the CoNLL-2003 NER shared task, the best reported result for this dataset.
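One of the design choices named in the abstract is the representation of text chunks. The abstract does not say which encodings are compared, so the following is only an illustrative sketch, assuming two commonly used schemes: BIO (Begin, Inside, Outside) and BILOU (Begin, Inside, Last, Outside, Unit-length). The function name `bio_to_bilou` and the sample tags are hypothetical and are not taken from the paper.

```python
def bio_to_bilou(tags):
    """Convert a sequence of BIO chunk tags to BILOU tags.

    Illustrative helper (hypothetical name, not from the paper).
    Assumes well-formed BIO input such as ["B-PER", "I-PER", "O"].
    """
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append("O")
            continue
        prefix, label = tag.split("-", 1)
        # Look one token ahead; past the end counts as outside any chunk.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The current chunk continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            # A chunk start is either a multi-token Begin or a single-token Unit.
            bilou.append(("B-" if continues else "U-") + label)
        else:
            # An inside token is either a true Inside or the Last token of the chunk.
            bilou.append(("I-" if continues else "L-") + label)
    return bilou


sample = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "I-ORG"]
converted = bio_to_bilou(sample)
print(converted)
```

The two encodings describe the same chunks; BILOU simply marks chunk ends and single-token chunks explicitly, which gives a classifier more label types to learn.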

Index Terms

1. Computing methodologies
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Inductive inference

Published in
CoNLL '09: Proceedings of the Thirteenth Conference on Computational Natural Language Learning
June 2009
243 pages
ISBN: 9781932432299
Conference Chairs: Suzanne Stevenson (University of Toronto), Xavier Carreras (MIT)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article

Article Metrics
Total Citations: 149
Total Downloads: 5,186
Downloads (Last 12 months): 208
Downloads (Last 6 weeks): 13
