Abstract
Evaluating the effectiveness of retrieval models has been a mainstay in the IR community since its inception. Generally speaking, the goal is to provide a rigorous framework to compare the quality of two or more models, and to determine which of them is "better". However, defining "better" or "best" in this context is not a simple task. Computing the average effectiveness over many queries is the most common approach used in Cranfield-style evaluations. But averages can hide subtle trade-offs in retrieval models: some fraction of the queries may perform worse than under a previous iteration of the model as a result of an optimization that improves some other subset. A growing body of work, referred to as risk-sensitive evaluation, seeks to incorporate these effects. We scrutinize current approaches to risk-sensitive evaluation, and consider how risk and reward might be recast to better account for human expectations of result quality on a query-by-query basis.
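To make the distinction concrete, the sketch below (an assumption-laden illustration, not code from the paper) contrasts the usual averaged gain with a URisk-style risk-sensitive score of the kind used in the TREC Web Track risk-sensitive task: per-query losses against a baseline are weighted more heavily than wins, so a system whose average improves by hurting a subset of queries can still score negatively. The function and variable names are illustrative choices.

```python
# A minimal sketch (not from the paper) of a URisk-style risk-sensitive score:
# per-query losses against a baseline are penalised more heavily than wins.

def urisk(run_scores, baseline_scores, alpha=1.0):
    """Mean per-query gain over a baseline, with losses scaled by (1 + alpha)."""
    assert len(run_scores) == len(baseline_scores) and run_scores
    total = 0.0
    for run_q, base_q in zip(run_scores, baseline_scores):
        delta = run_q - base_q
        # Wins count as-is; losses are amplified to express risk aversion.
        total += delta if delta >= 0 else (1 + alpha) * delta
    return total / len(run_scores)

# Three queries: the mean effectiveness improves (about +0.017), but one query
# regresses badly, so the risk-sensitive score is negative (-0.05).
baseline = [0.40, 0.55, 0.30]
new_run = [0.50, 0.70, 0.10]
print(sum(n - b for n, b in zip(new_run, baseline)) / 3)  # plain average delta
print(urisk(new_run, baseline, alpha=1.0))                # risk-sensitive view
```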
Acknowledgments
The first author was supported by an RMIT Vice Chancellor’s PhD Scholarship. This work was also partially supported by the Australian Research Council’s Discovery Projects Scheme (DP190101113).
Cite this paper
Benham, R., Moffat, A., Culpepper, J.S. (2020). On the Pluses and Minuses of Risk. In: Wang, F., et al. (eds.) Information Retrieval Technology. AIRS 2019. Lecture Notes in Computer Science, vol. 12004. Springer, Cham. https://doi.org/10.1007/978-3-030-42835-8_8