Information Extraction: Distilling structured data from unstructured text

Published: 01 November 2005

Abstract

In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.

Published in

Queue, Volume 3, Issue 9: Social Computing
November 2005
48 pages
ISSN: 1542-7730
EISSN: 1542-7749
DOI: 10.1145/1105664

Copyright © 2005 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher: Association for Computing Machinery, New York, NY, United States
