DOI: 10.1145/3460231.3474247
Research Article

Pessimistic Reward Models for Off-Policy Learning in Recommendation

Published: 13 September 2021

ABSTRACT

Methods for bandit learning from user interactions often require a model of the reward a certain context-action pair will yield – for example, the probability of a click on a recommendation. This common machine learning task is highly non-trivial, as the data-generating process for contexts and actions is often skewed by the recommender system itself. Indeed, when the deployed recommendation policy at data collection time does not pick its actions uniformly-at-random, this leads to a selection bias that can impede effective reward modelling. This in turn makes off-policy learning – the typical setup in industry – particularly challenging.

In this work, we propose and validate a general pessimistic reward modelling approach for off-policy learning in recommendation. Bayesian uncertainty estimates allow us to express scepticism about our own reward model, which can in turn be used to generate a conservative decision rule. We show how it alleviates a well-known decision-making phenomenon known as the Optimiser’s Curse, and draw parallels with existing work on pessimistic policy learning. Leveraging the available closed-form expressions for both the posterior mean and variance when a ridge regressor models the reward, we show how to apply pessimism effectively and efficiently to an off-policy recommendation use-case. Empirical observations in a wide range of environments show that being conservative in decision-making leads to a significant and robust increase in recommendation performance. The merits of our approach are most pronounced in realistic settings with limited logging randomisation, limited training samples, and larger action spaces.
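
To make the pessimistic decision rule concrete, the sketch below illustrates the general recipe the abstract describes: a closed-form Gaussian posterior over the weights of a ridge-regression reward model, and action selection via a lower confidence bound instead of the posterior mean. This is a minimal illustration under assumed conventions (the prior scale alpha, the noise variance sigma2, the pessimism factor k, and the per-action model structure are illustrative assumptions), not the paper's exact formulation.

```python
import numpy as np

def fit_reward_posterior(X, y, alpha=1.0, sigma2=1.0):
    """Closed-form Gaussian posterior over ridge-regression weights for one action.

    X: (n, d) logged contexts for this action; y: (n,) observed rewards (e.g. clicks).
    alpha is an assumed prior precision, sigma2 an assumed observation-noise variance.
    """
    d = X.shape[1]
    precision = X.T @ X / sigma2 + alpha * np.eye(d)  # posterior precision matrix
    cov = np.linalg.inv(precision)                    # posterior covariance
    mean = cov @ (X.T @ y) / sigma2                   # posterior mean
    return mean, cov

def pessimistic_decision(x, posteriors, k=1.0):
    """Choose the action maximising a lower confidence bound on the predicted reward.

    posteriors: one (mean, cov) pair per action; k controls the degree of pessimism.
    """
    scores = [x @ mean - k * np.sqrt(x @ cov @ x)     # penalise uncertain estimates
              for mean, cov in posteriors]
    return int(np.argmax(scores))
```

With k = 0 this collapses to the usual argmax over point estimates, which is precisely the regime in which the Optimiser’s Curse inflates the apparent reward of the selected action; increasing k trades some expected reward for robustness against the reward model’s own estimation error.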


Supplemental Material

RecSys2021_Video_PaperA_4K.mp4 (mp4, 231.1 MB)


Published in

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems
September 2021, 883 pages
ISBN: 9781450384582
DOI: 10.1145/3460231

Copyright © 2021 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 13 September 2021

