
Cumulated gain-based evaluation of IR techniques

Published: 01 October 2002

Abstract

Modern large retrieval environments tend to overwhelm their users with their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, that is, recall and precision based on binary relevance judgments, to graded relevance judgments. Alternatively, novel measures based on graded relevance judgments may be developed. This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devaluate late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. These novel measures are defined and discussed and their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As a relevance base we used novel graded relevance judgments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, for example, from the user point of view.
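
As an informal illustration (not part of the article itself), the three measures described above can be sketched in a few lines of Python under the assumptions stated in the abstract: graded relevance scores on a four-point 0-3 scale and a discount of log_2 of the rank applied from rank 2 onward. The function names and the sample run vector below are hypothetical and chosen only for demonstration.

```python
import math

def cumulated_gain(gains):
    """CG: running sum of the graded relevance scores down the ranked list."""
    cg, total = [], 0
    for g in gains:
        total += g
        cg.append(total)
    return cg

def discounted_cumulated_gain(gains, b=2):
    """DCG: gains at ranks i >= b are divided by log_b(i), so documents
    retrieved late in the ranking contribute progressively less."""
    dcg = []
    for i, g in enumerate(gains, start=1):
        contribution = g if i < b else g / math.log(i, b)
        dcg.append((dcg[-1] if dcg else 0.0) + contribution)
    return dcg

def normalised_dcg(gains, recall_base_gains, b=2):
    """nDCG: the actual DCG divided, rank by rank, by the DCG of the ideal
    ranking (all judged documents sorted by decreasing relevance)."""
    ideal = sorted(recall_base_gains, reverse=True)
    ideal += [0] * max(0, len(gains) - len(ideal))  # pad if the run is longer
    actual_dcg = discounted_cumulated_gain(gains, b)
    ideal_dcg = discounted_cumulated_gain(ideal[:len(gains)], b)
    return [a / i for a, i in zip(actual_dcg, ideal_dcg)]

# Hypothetical run: relevance scores (0-3) of the first ten retrieved documents,
# with the same scores standing in for the recall base purely for illustration.
run = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(cumulated_gain(run))             # CG vector
print(discounted_cumulated_gain(run))  # DCG vector, b = 2
print(normalised_dcg(run, run))        # nDCG vector
```

Dividing rank by rank against the ideal ranking keeps every normalised value in the interval [0, 1], which is what makes the third measure comparable across queries with different numbers of relevant documents.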

• Published in

  ACM Transactions on Information Systems, Volume 20, Issue 4
  October 2002, 90 pages
  ISSN: 1046-8188
  EISSN: 1558-2868
  DOI: 10.1145/582415

  Copyright © 2002 ACM

  Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Published: 1 October 2002 in ACM Transactions on Information Systems, Volume 20, Issue 4
