ABSTRACT
Sponsored search is one of the enabling technologies for today's Web search engines. It consists of matching ads to the user's query and showing them on the search engine results page. Users are likely to click on topically related ads, and advertisers pay only when a user clicks on their ad. Hence, it is important to predict whether an ad is likely to be clicked and to maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is most suitable for the task, and which features are useful for existing models. We perform a large-scale evaluation based on data from a commercial Web search engine. Results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences, following simple and conservative assumptions. We find that online multilayer perceptron learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and other learning models, providing a suitable framework for the sponsored search task.
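To make the idea of learning from click data encoded as pairwise preferences concrete, here is a minimal sketch in Python. It is not the model evaluated in the paper. It swaps in a plain linear ranking perceptron for the multilayer perceptron, and it assumes one specific conservative rule for reading clicks: a clicked ad is preferred over any unclicked ad shown above it, and nothing is inferred about ads below the click. The function names, the two-feature representation, and the example values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# One ranked ad block for a query: per-ad feature vectors plus click flags.
Block = Tuple[List[Vector], List[bool]]


def preference_pairs(clicked: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (preferred, rejected) index pairs for one ranked ad block.

    Assumed rule (hypothetical, for illustration): a clicked ad is preferred
    over every unclicked ad ranked above it. Ads below a click yield no pairs.
    """
    pairs: List[Tuple[int, int]] = []
    for i, was_clicked in enumerate(clicked):
        if was_clicked:
            pairs.extend((i, j) for j in range(i) if not clicked[j])
    return pairs


def score(w: Vector, x: Vector) -> float:
    """Linear score of one ad's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranking_perceptron(
    blocks: Sequence[Block], n_features: int, epochs: int = 5
) -> Vector:
    """Online pairwise perceptron: update only on misordered pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for features, clicked in blocks:
            for good, bad in preference_pairs(clicked):
                if score(w, features[good]) <= score(w, features[bad]):
                    # Move the weights toward the preferred ad.
                    for k in range(n_features):
                        w[k] += features[good][k] - features[bad][k]
    return w


# Toy block: two ads, the second (lower-ranked) one was clicked.
# Features are made up: [query-ad similarity, rank-position prior].
toy_blocks: List[Block] = [([[0.2, 1.0], [0.8, 0.5]], [False, True])]
weights = train_ranking_perceptron(toy_blocks, n_features=2)
```

Because the weights change only when a preferred ad is scored no higher than a rejected one, training proceeds one example at a time, which is the sense in which such a learner is online.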
Index Terms
- Online learning from click data for sponsored search