
Modeling Relevance as a Function of Retrieval Rank

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9994)

Abstract

Batched evaluations in IR experiments are commonly built using relevance judgments formed over a sampled pool of documents. However, judgment coverage tends to be incomplete relative to the metrics used to compute effectiveness, since collection size often makes it financially impractical to judge every document. As a result, a considerable body of work has explored how to fairly compare systems in the face of unjudged documents. Here we consider the same problem from another perspective, and investigate the relationship between relevance likelihood and retrieval rank, seeking plausible methods for estimating document relevance and hence computing an inferred gain. A range of models is fitted against two typical TREC datasets, and evaluated both in terms of goodness of fit relative to the full set of known relevance judgments, and in terms of predictive ability when shallower initial pools are presumed and extrapolated metric scores are computed from models developed over those shallow pools.
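To make the approach concrete, the sketch below illustrates the general recipe the abstract describes, under simplifying assumptions: a single exponential-decay model of relevance likelihood as a function of rank is fitted to shallow-pool judgments, and its predictions are then used as inferred gains at unjudged ranks when scoring a run with rank-biased precision (RBP). The model form, function names, and toy data are all illustrative assumptions; the paper fits and compares a range of model families, and this is not the authors' exact formulation.

```python
# Minimal sketch, assuming an exponential-decay model of relevance
# likelihood versus rank; the paper evaluates several model families,
# and this code is illustrative rather than the authors' method.
import numpy as np
from scipy.optimize import curve_fit

def decay_model(rank, a, b):
    """P(relevant | rank), modeled as a bounded exponential decay."""
    return a * np.exp(-b * rank)

def fit_relevance_model(ranks, rel_flags):
    """Least-squares fit of the decay model to (rank, relevant?) pairs
    drawn from the judged (shallow-pool) prefix of a run."""
    params, _ = curve_fit(decay_model, ranks, rel_flags,
                          p0=[0.5, 0.1], bounds=([0.0, 0.0], [1.0, np.inf]))
    return params

def rbp_inferred(judgments, params, depth=100, p=0.8):
    """Rank-biased precision in which unjudged ranks contribute an
    inferred gain taken from the fitted model, allowing a metric score
    to be extrapolated beyond the judged pool depth."""
    score = 0.0
    for i in range(depth):
        rank = i + 1
        judged = judgments.get(rank)            # 1, 0, or None if unjudged
        gain = judged if judged is not None else decay_model(rank, *params)
        score += gain * (1.0 - p) * p ** i      # RBP weight at this rank
    return score

# Toy data: binary judgments available only for a shallow pool (ranks 1-10).
judged_ranks = np.arange(1, 11, dtype=float)
rel_flags = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=float)

params = fit_relevance_model(judged_ranks, rel_flags)
judgments = {int(r): int(f) for r, f in zip(judged_ranks, rel_flags)}
print("extrapolated RBP:", rbp_inferred(judgments, params))
```

RBP is a convenient target for this kind of extrapolation because its geometric weighting bounds the total contribution that inferred gains at deep, unjudged ranks can make to the final score.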



Acknowledgment

This work was supported by the Australian Research Council’s Discovery Projects Scheme (DP140101587). Shane Culpepper is the recipient of an Australian Research Council DECRA Research Fellowship (DE140100275).

Author information

Correspondence to Xiaolu Lu.



Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Lu, X., Moffat, A., Culpepper, J.S. (2016). Modeling Relevance as a Function of Retrieval Rank. In: Ma, S., et al. (eds.) Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_1

  • DOI: https://doi.org/10.1007/978-3-319-48051-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48050-3

  • Online ISBN: 978-3-319-48051-0

  • eBook Packages: Computer Science, Computer Science (R0)
