Memory-based one-step named-entity recognition: effects of seed list features, classifier stacking, and unannotated data

Authors:
Iris Hendrickx, Tilburg University, The Netherlands
Antal van den Bosch, Tilburg University, The Netherlands

CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
May 2003
Pages 176–179
https://doi.org/10.3115/1119176.1119203

Published: 31 May 2003


ABSTRACT

We present a memory-based named-entity recognition system that chunks and labels named entities in a one-shot task. Training and testing on CoNLL-2003 shared task data, we measure the effects of three extensions. First, we incorporate features that signal the presence of wordforms in external, language-specific seed (gazetteer) lists. Second, we build a second-stage stacked classifier that corrects first-stage output errors. Third, we add selected instances from classified unannotated data to the training material. The system that incorporates all three extensions attains an overall F-rate on the final test set of 78.20 on English and 63.02 on German.
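As a rough illustration of the first extension, the sketch below combines binary seed-list features with a tiny memory-based (nearest-neighbour) tagger that assigns a combined boundary-and-type tag to each token in one step. It is a minimal sketch under stated assumptions, not the authors' system: the seed lists, the three-word window, the unweighted overlap metric, the IOB-style tag names, and the helper names `features`, `overlap`, `classify`, and `tag` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical seed (gazetteer) lists; the lists used in the paper are not
# given in the abstract.
SEED_LISTS = {
    "PER": {"john", "mary"},
    "LOC": {"tilburg", "edmonton"},
}

def features(tokens, i):
    """Previous, focus, and next wordform plus one binary flag per seed list."""
    prev_w = tokens[i - 1].lower() if i > 0 else "_"
    next_w = tokens[i + 1].lower() if i + 1 < len(tokens) else "_"
    focus = tokens[i].lower()
    # One flag per seed list, in a fixed (alphabetical) order of list names.
    flags = tuple(focus in SEED_LISTS[name] for name in sorted(SEED_LISTS))
    return (prev_w, focus, next_w) + flags

def overlap(a, b):
    """Number of feature positions on which two instances agree (unweighted)."""
    return sum(x == y for x, y in zip(a, b))

def classify(memory, feats, k=1):
    """Majority tag among the k stored instances that overlap most with feats."""
    ranked = sorted(memory, key=lambda inst: overlap(inst[0], feats), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def tag(memory, tokens):
    """Assign one combined boundary-and-type tag per token in a single step."""
    return [classify(memory, features(tokens, i)) for i in range(len(tokens))]

# "Training" is simply storing labelled instances in memory.
train_tokens = ["John", "lives", "in", "Tilburg"]
train_tags = ["B-PER", "O", "O", "B-LOC"]
memory = [(features(train_tokens, i), t) for i, t in enumerate(train_tags)]

print(tag(memory, ["Mary", "works", "in", "Edmonton"]))
```

In this toy example the unseen names "Mary" and "Edmonton" share no wordform with the stored instances, so the seed-list flags are what pull them towards the right neighbours. A real memory-based learner would typically weight features rather than count plain overlap.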


Published in
CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
May 2003, 213 pages
Conference Chairs: Walter Daelemans (University of Antwerp and Tilburg University), Miles Osborne (University of Edinburgh)
Publisher: Association for Computational Linguistics, United States