DOI: 10.3115/1119176.1119203

Memory-based one-step named-entity recognition: effects of seed list features, classifier stacking, and unannotated data

Published: 31 May 2003

ABSTRACT

We present a memory-based named-entity recognition system that chunks and labels named entities in a one-shot task. Training and testing on CoNLL-2003 shared task data, we measure the effects of three extensions. First, we incorporate features that signal the presence of wordforms in external, language-specific seed (gazetteer) lists. Second, we build a second-stage stacked classifier that corrects first-stage output errors. Third, we add selected instances from classified unannotated data to the training material. The system that incorporates all three extensions attains an overall F-rate on the final test set of 78.20 on English and 63.02 on German.
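
As an illustration of the first extension, the sketch below shows one plausible way to turn seed-list membership into per-token binary features. It is not the authors' implementation: the function name `seed_list_features`, the feature names, the lowercased matching, and the toy lists are all assumptions made for this example.

```python
from typing import Dict, List, Set


def seed_list_features(
    tokens: List[str], seed_lists: Dict[str, Set[str]]
) -> List[Dict[str, int]]:
    """Emit one binary feature per seed list for every token.

    A feature is 1 when the lowercased wordform occurs in that list,
    and 0 otherwise. (Lowercased matching is an assumption.)
    """
    names = sorted(seed_lists)
    return [
        {f"in_{name}": int(tok.lower() in seed_lists[name]) for name in names}
        for tok in tokens
    ]


# Hypothetical toy seed lists; real gazetteers would be far larger
# and language-specific.
SEED_LISTS = {
    "LOC": {"edmonton", "canada"},
    "PER": {"erik"},
}

features = seed_list_features(["Erik", "visited", "Edmonton"], SEED_LISTS)
for token, feats in zip(["Erik", "visited", "Edmonton"], features):
    print(token, feats)
```

In a setup like the one described, such features would be appended to each token's feature vector before it is passed to the classifier.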

Published in

CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
May 2003, 213 pages

Publisher: Association for Computational Linguistics, United States
