ABSTRACT
Online metrics measured through A/B tests have become the gold standard for many evaluation questions. But can we get the same results as A/B tests without actually fielding a new system? And can we train systems to optimize online metrics without subjecting users to an online learning algorithm? This tutorial summarizes and unifies the emerging body of methods on counterfactual evaluation and learning. These counterfactual techniques provide a well-founded way to evaluate and optimize online metrics by exploiting logs of past user interactions. In particular, the tutorial unifies the causal inference, information retrieval, and machine learning views of this problem, providing the basis for future research in this emerging area of great potential impact. Supplementary material and resources are available online at http://www.cs.cornell.edu/~adith/CfactSIGIR2016.
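The core idea behind counterfactual evaluation from logs is the inverse propensity score (IPS) estimator: reweight each logged interaction by how much more (or less) likely the new system would have been to take the logged action than the system that collected the logs. The sketch below is illustrative, not taken from the tutorial itself; the function names and the self-normalized variant (which trades a small bias for lower variance) are assumptions for exposition.

```python
def ips_estimate(rewards, logging_probs, target_probs):
    """IPS estimate of a target policy's expected reward from logged data.

    Each log entry i records the observed reward, the probability the
    logging policy assigned to the logged action (its propensity), and
    the probability the target policy would assign to that same action.
    """
    n = len(rewards)
    return sum(r * (p / q)
               for r, q, p in zip(rewards, logging_probs, target_probs)) / n


def snips_estimate(rewards, logging_probs, target_probs):
    """Self-normalized IPS: divide by the sum of importance weights
    instead of n, reducing variance at the cost of a small bias."""
    weights = [p / q for q, p in zip(logging_probs, target_probs)]
    return sum(r * w for r, w in zip(rewards, weights)) / sum(weights)


# Example: logs collected by a uniform-random policy over two actions
# (propensity 0.5 each); the target policy deterministically picks the
# action that was logged on entries 1 and 3, and never the others.
rewards = [1, 0, 1, 0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [1.0, 0.0, 1.0, 0.0]
print(ips_estimate(rewards, logging_probs, target_probs))    # 1.0
print(snips_estimate(rewards, logging_probs, target_probs))  # 1.0
```

Unbiasedness of plain IPS requires that the logging policy put nonzero probability on every action the target policy can take; in practice, propensities must also be logged at decision time, which is the key data requirement these methods impose.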