Abstract
Modern large retrieval environments tend to overwhelm their users with the sheer volume of their output. Since not all documents are equally relevant to their users, highly relevant documents should be identified and ranked first for presentation. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, that is, recall and precision based on binary relevance judgments, to graded relevance judgments. Alternatively, novel measures based on graded relevance judgments may be developed. This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first measure accumulates the relevance scores of retrieved documents along the ranked result list. The second is similar but applies a discount factor to the relevance scores in order to devalue documents retrieved late in the ranking. The third computes the performance of IR techniques relative to the ideal ranking, based on the cumulated gain they are able to yield. These novel measures are defined and discussed, and their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As a relevance base we used novel graded relevance judgments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of the statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, for example, from the user's point of view.
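The three measures described above can be sketched in code. The following is a minimal illustration, not the article's original implementation: it assumes a base-2 logarithmic discount with no discount applied before rank b (a common formulation of discounted cumulated gain), and graded relevance scores on the four-point scale 0-3. The function names and the sample gain vector are illustrative choices.

```python
import math

def cumulated_gain(gains):
    """CG[i]: running sum of the graded relevance scores along the ranked list."""
    cg, total = [], 0
    for g in gains:
        total += g
        cg.append(total)
    return cg

def discounted_cumulated_gain(gains, b=2):
    """DCG[i]: like CG, but the gain at rank i >= b is divided by log_b(i),
    devaluing documents retrieved late in the ranking."""
    dcg, total = [], 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        dcg.append(total)
    return dcg

def normalized_dcg(gains, b=2):
    """nDCG[i]: DCG relative to the ideal DCG, obtained by re-sorting the
    same relevance scores in descending order (assumes a nonzero ideal)."""
    actual = discounted_cumulated_gain(gains, b)
    ideal = discounted_cumulated_gain(sorted(gains, reverse=True), b)
    return [a / i for a, i in zip(actual, ideal)]

# Hypothetical ranked result list with graded relevance scores 0-3.
gains = [3, 2, 3, 0, 1, 2]
print(cumulated_gain(gains))                 # [3, 5, 8, 8, 9, 11]
print(discounted_cumulated_gain(gains))
print(normalized_dcg(gains))                 # values in (0, 1]
```

At every rank the ideal ranking's DCG is an upper bound on the actual DCG, so the normalized curve stays in (0, 1], which is what makes averaging across queries of different sizes meaningful.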