research-article

Real-Time Bidding by Reinforcement Learning in Display Advertising

Published: 02 February 2017

ABSTRACT

The majority of online display ads are served through real-time bidding (RTB): each ad display impression is auctioned off in real time, at the moment it is generated by a user visit. To place an ad automatically and optimally, advertisers need a learning algorithm that bids on each ad impression in real time. Most previous work treats the bid decision as a static optimization problem that either values each impression independently or sets a bid price for each segment of ad volume. However, bidding for a given ad campaign happens repeatedly over the campaign's life span, until the budget runs out. As such, the bids are strategically correlated through the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. It is therefore of great interest to devise an optimal sequential bidding strategy, so that the campaign budget is dynamically allocated across all available impressions on the basis of both immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, and an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem posed by the large real-world auction volume and campaign budget is handled by approximating the state value with neural networks. An empirical study on two large-scale real-world datasets and live A/B testing on a commercial platform demonstrate superior performance and high efficiency compared to state-of-the-art methods.
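To make the formulation above concrete, the following is a minimal sketch of a budget-constrained bidding MDP solved by tabular dynamic programming. It is an illustration under stated assumptions, not the paper's implementation: the state is reduced to (auctions remaining, budget remaining), the click-through rate `ctr` is a single constant, the market price follows a known distribution `market_pmf` over integer prices, and the auction is second-price (the winner pays the market price). The function name `solve_bidding_mdp` and all parameter names are hypothetical.

```python
from typing import List, Tuple


def solve_bidding_mdp(
    num_auctions: int,
    budget: int,
    market_pmf: List[float],
    ctr: float,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Tabular value iteration for a toy bidding MDP (illustrative assumptions).

    State:  (t, b) = auctions remaining, budget remaining (integer units).
    Action: bid price a in 0..min(b, max_price).
    Transition: with probability market_pmf[d] the market price is d; a bid
    a >= d wins, pays d, and earns expected reward ctr; otherwise the bidder
    loses and keeps the budget. market_pmf is assumed to sum to 1.
    """
    max_price = len(market_pmf) - 1
    value = [[0.0] * (budget + 1) for _ in range(num_auctions + 1)]
    bid = [[0] * (budget + 1) for _ in range(num_auctions + 1)]

    for t in range(1, num_auctions + 1):
        for b in range(budget + 1):
            lose_value = value[t - 1][b]  # value if this auction is lost
            best_value, best_bid = float("-inf"), 0
            gain = 0.0  # expected improvement over losing, accumulated over prices <= a
            for a in range(min(b, max_price) + 1):
                gain += market_pmf[a] * (ctr + value[t - 1][b - a] - lose_value)
                if lose_value + gain > best_value:  # strict: ties keep the lower bid
                    best_value, best_bid = lose_value + gain, a
            value[t][b] = best_value
            bid[t][b] = best_bid
    return value, bid


# Toy setting: market price is 1 or 2 with equal probability, CTR is 0.1.
value, bid = solve_bidding_mdp(num_auctions=2, budget=2,
                               market_pmf=[0.0, 0.5, 0.5], ctr=0.1)
```

In this toy setting the policy bids the full budget when one auction remains, but bids lower when two remain, because money saved now still has value in the later auction. The table grows with the number of auctions times the budget, which is the scalability problem the abstract addresses by approximating the state value with a neural network.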

          Published in

          WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
          February 2017
          868 pages
          ISBN: 9781450346757
          DOI: 10.1145/3018661

          Copyright © 2017 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          WSDM '17 Paper Acceptance Rate: 80 of 505 submissions, 16%
          Overall Acceptance Rate: 498 of 2,863 submissions, 17%