Abstract
Batched evaluations in IR experiments are commonly built using relevance judgments formed over a sampled pool of documents. However, judgment coverage tends to be incomplete relative to the metrics being used to compute effectiveness, since collection size often makes it financially impractical to judge every document. As a result, a considerable body of work has arisen exploring the question of how to fairly compare systems in the face of unjudged documents. Here we consider the same problem from another perspective, and investigate the relationship between relevance likelihood and retrieval rank, seeking to identify plausible methods for estimating document relevance and hence computing an inferred gain. A range of models is fitted against two typical TREC datasets, and evaluated both in terms of goodness of fit relative to the full set of known relevance judgments, and in terms of predictive ability when shallower initial pools are presumed and extrapolated metric scores are computed from models developed over those shallow pools.
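The approach the abstract describes can be illustrated with a minimal sketch. The model form, fitting method, and data below are illustrative assumptions, not the paper's actual models or datasets: we fit a simple logistic curve for P(relevant | rank) over a synthetic judged pool, then use the fitted curve to supply an inferred gain for unjudged ranks when scoring an RBP-style metric (Moffat and Zobel, 2008).

```python
import math
import random

random.seed(0)

# Hypothetical judged pool: (rank, is_relevant) pairs, generated so that
# relevance likelihood decays with rank -- the relationship being modeled.
judged = [(r, 1 if random.random() < 1.0 / (1 + 0.2 * r) else 0)
          for r in range(1, 101)]

def fit_logistic(data, lr=0.1, iters=2000):
    """Fit P(rel | rank) = sigmoid(a + b * log(rank)) by gradient ascent
    on the log-likelihood; b < 0 corresponds to relevance decaying with rank."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for rank, y in data:
            x = math.log(rank)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / len(data)
        b += lr * grad_b / len(data)
    return a, b

a, b = fit_logistic(judged)

def p_rel(rank):
    """Predicted relevance likelihood at a given retrieval rank."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(rank))))

def inferred_rbp(judgments, depth=1000, persistence=0.95):
    """RBP-style score in which unjudged ranks contribute their predicted
    relevance probability as an inferred gain, rather than zero.
    `judgments` maps rank -> known binary gain for the judged documents."""
    score = 0.0
    for rank in range(1, depth + 1):
        gain = judgments.get(rank, p_rel(rank))  # model fills judgment gaps
        score += (1 - persistence) * persistence ** (rank - 1) * gain
    return score
```

Here a run with `inferred_rbp({1: 1, 2: 0})` would treat ranks 1 and 2 as judged and extrapolate all deeper ranks from the fitted curve, mirroring the paper's setting of computing extrapolated metric scores from models built over shallow pools.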
Acknowledgment
This work was supported by the Australian Research Council’s Discovery Projects Scheme (DP140101587). Shane Culpepper is the recipient of an Australian Research Council DECRA Research Fellowship (DE140100275).
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Lu, X., Moffat, A., Culpepper, J.S. (2016). Modeling Relevance as a Function of Retrieval Rank. In: Ma, S., et al. (eds.) Information Retrieval Technology. AIRS 2016. Lecture Notes in Computer Science, vol. 9994. Springer, Cham. https://doi.org/10.1007/978-3-319-48051-0_1
DOI: https://doi.org/10.1007/978-3-319-48051-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48050-3
Online ISBN: 978-3-319-48051-0