ABSTRACT
Online metrics measured through A/B tests have become the gold standard for many evaluation questions. But can we get the same results as A/B tests without actually fielding a new system? And can we train systems to optimize online metrics without subjecting users to an online learning algorithm? This tutorial summarizes and unifies the emerging body of methods on counterfactual evaluation and learning. These counterfactual techniques provide a well-founded way to evaluate and optimize online metrics by exploiting logs of past user interactions. In particular, the tutorial unifies the causal inference, information retrieval, and machine learning views of this problem, providing the basis for future research in this emerging area of great potential impact. Supplementary material and resources are available online at http://www.cs.cornell.edu/~adith/CfactSIGIR2016.
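The core idea behind counterfactual evaluation from logs is the inverse propensity score (IPS) estimator: reweight each logged interaction by how much more (or less) likely the new system would have been to take the logged action than the system that collected the logs. The sketch below is illustrative, not taken from the tutorial itself; the function names and the self-normalized variant (which trades a small bias for lower variance) are assumptions for exposition.

```python
def ips_estimate(rewards, logging_probs, target_probs):
    """IPS estimate of a target policy's expected reward from logged data.

    Each log entry i records the observed reward, the probability the
    logging policy assigned to the logged action (its propensity), and
    the probability the target policy would assign to that same action.
    """
    n = len(rewards)
    return sum(r * (p / q)
               for r, q, p in zip(rewards, logging_probs, target_probs)) / n


def snips_estimate(rewards, logging_probs, target_probs):
    """Self-normalized IPS: divide by the sum of importance weights
    instead of n, reducing variance at the cost of a small bias."""
    weights = [p / q for q, p in zip(logging_probs, target_probs)]
    return sum(r * w for r, w in zip(rewards, weights)) / sum(weights)


# Example: logs collected by a uniform-random policy over two actions
# (propensity 0.5 each); the target policy deterministically picks the
# action that was logged on entries 1 and 3, and never the others.
rewards = [1, 0, 1, 0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [1.0, 0.0, 1.0, 0.0]
print(ips_estimate(rewards, logging_probs, target_probs))    # 1.0
print(snips_estimate(rewards, logging_probs, target_probs))  # 1.0
```

Unbiasedness of plain IPS requires that the logging policy put nonzero probability on every action the target policy can take; in practice, propensities must also be logged at decision time, which is the key data requirement these methods impose.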