ABSTRACT
We present a memory-based named-entity recognition system that chunks and labels named entities in a one-shot task. Training and testing on the CoNLL-2003 shared task data, we measure the effects of three extensions. First, we incorporate features that signal the presence of wordforms in external, language-specific seed (gazetteer) lists. Second, we build a second-stage stacked classifier that corrects first-stage output errors. Third, we add selected instances from classified unannotated data to the training material. The system that incorporates all three extensions attains an overall F-rate on the final test set of 78.20 on English and 63.02 on German.
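As an illustration of the first extension only, the following is a minimal sketch of how seed-list membership could be turned into per-token features. It is not the authors' implementation: the `SEED_LISTS` contents, the `gazetteer_features` function, and the `in_<label>` feature names are all hypothetical, and lowercased exact matching is an assumption made for brevity.

```python
from typing import Dict, List, Set

# Hypothetical toy seed lists; the gazetteers used in the paper are not reproduced here.
SEED_LISTS: Dict[str, Set[str]] = {
    "PER": {"john", "maria"},
    "LOC": {"london", "berlin"},
    "ORG": {"reuters"},
}


def gazetteer_features(tokens: List[str]) -> List[Dict[str, bool]]:
    """Return, for each token, one binary flag per seed list.

    Each flag signals whether the (lowercased) wordform occurs in that list.
    """
    features = []
    for token in tokens:
        lowered = token.lower()
        features.append(
            {f"in_{label}": lowered in names for label, names in SEED_LISTS.items()}
        )
    return features
```

In a setup like the one described, such flags would be appended to each token's other features before the instance is passed to the classifier.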