research-article

Online learning from click data for sponsored search

Authors:
Massimiliano Ciaramita, Yahoo! Research, Barcelona, Spain
Vanessa Murdock, Yahoo! Research, Barcelona, Spain
Vassilis Plachouras, Yahoo! Research, Barcelona, Spain

WWW '08: Proceedings of the 17th international conference on World Wide Web, April 2008, pages 227–236. https://doi.org/10.1145/1367497.1367529

Published: 21 April 2008


ABSTRACT

Sponsored search is one of the enabling technologies for today's Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads, and advertisers pay only when a user clicks on their ad. Hence, it is important to be able to predict whether an ad is likely to be clicked, and to maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is more suitable for the task, and which features are useful for existing models. We perform a large-scale evaluation based on data from a commercial Web search engine. Results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences, following simple and conservative assumptions. We find that online multilayer perceptron learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and other learning models, providing a suitable framework for the sponsored search task.
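The abstract's two central ideas, turning clicks into pairwise preferences and learning a ranker online from those pairs, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the "click > skip above" heuristic from the clickthrough literature (a clicked ad is preferred over each non-clicked ad displayed above it), which may differ from the paper's exact assumptions, and it trains a linear pairwise perceptron rather than the multilayer perceptron the abstract reports as best. The names `click_preferences` and `train_pairwise_perceptron` and the toy feature values are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def click_preferences(ranked_ads: Sequence[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs for one result page.

    Assumed heuristic ("click > skip above"): a clicked ad is preferred over
    every non-clicked ad displayed above it. Ads below a click yield no pairs,
    since the user may never have looked at them.
    """
    pairs: List[Tuple[str, str]] = []
    for i, ad in enumerate(ranked_ads):
        if ad in clicked:
            for above in ranked_ads[:i]:
                if above not in clicked:
                    pairs.append((ad, above))
    return pairs


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(
    pairs: List[Tuple[str, str]],
    features: Dict[str, Vector],
    dim: int,
    epochs: int = 10,
) -> Vector:
    """Online perceptron on feature differences: whenever the preferred ad does
    not score strictly higher than the less preferred one, add the difference
    vector to the weights."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            if dot(w, diff) <= 0.0:  # mis-ordered or tied pair: update
                w = [wi + di for wi, di in zip(w, diff)]
    return w


if __name__ == "__main__":
    page = ["ad1", "ad2", "ad3"]  # display order, top first
    pairs = click_preferences(page, {"ad3"})
    # Toy 2-dimensional similarity features per ad (hypothetical values).
    feats = {"ad1": [0.1, 0.0], "ad2": [0.2, 0.1], "ad3": [0.9, 0.5]}
    w = train_pairwise_perceptron(pairs, feats, dim=2)
    print(pairs)
    print(sorted(page, key=lambda ad: dot(w, feats[ad]), reverse=True))
```

Running the script prints the two extracted pairs, (ad3, ad1) and (ad3, ad2), and the page reordered as ad3, ad2, ad1. Only the ordering within each pair is used, so the learner never needs absolute relevance labels. An MLP variant could keep the same pairwise loop and replace `dot(w, x)` with a hidden-layer network updated by back-propagation.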

References

E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, 2006.
A. Broder, M. Fontoura, V. Josifovski, and L. Riedel. A semantic approach to contextual advertising. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 559–566. ACM Press, 2007.
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 89–96, 2005.
J. Carrasco, D. Fain, K. Lang, and L. Zhukov. Clustering of bipartite advertiser-keyword graph. In Proceedings of the Workshop on Clustering Large Datasets, IEEE Conference on Data Mining. IEEE Computer Society Press, 2003.
G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems, pages 409–415, 2000.
M. Ciaramita, V. Murdock, and V. Plachouras. Semantic associations for contextual advertising. International Journal of Electronic Commerce Research, Special Issue on Online Advertising and Sponsored Search, 9(1), 2008.
M. Collins and B. Roark. Incremental parsing with the perceptron algorithm. In Proceedings of ACL 2004, 2004.
R. Duda, P. Hart, and D. Stork. Pattern Classification (2nd ed.). Wiley-Interscience, 2000.
J. Feng, H. Bhargava, and D. Pennock. Implementing sponsored search in web search engines: Computational evaluation of alternative mechanisms. INFORMS Journal on Computing, 19(1):137–148, 2007.
Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
L. Granka, T. Joachims, and G. Gay. Eye-tracking analysis of user behavior in WWW search. In ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 478–479, 2004.
IAB. Internet advertising revenue report: 2006 full-year results, 2007. http://www.iab.net/resources/.
T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2002.
T. Joachims, L. Granka, B. Pang, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 154–161, 2005.
R. Jones, B. Rey, O. Madani, and W. Greiner. Generating query substitutions. In Proceedings of the 15th International World Wide Web Conference (WWW), 2006.
D. Kelly. Implicit feedback: Using behavior to infer relevance. In New Directions in Cognitive Information Retrieval, pages 169–186. Springer Publishing, 2005.
J. Kivinen, A. Smola, and R. Williamson. Online learning with kernels. In Advances in Neural Information Processing Systems, pages 785–792, 2001.
R. Krovetz. Viewing morphology as an inference process. In Proceedings of the 16th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1993.
A. Lacerda, M. Cristo, M. Goncalves, W. Fan, N. Ziviani, and B. Ribeiro-Neto. Learning to advertise. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 549–556. ACM Press, 2006.
Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, and J. Kandola. The perceptron algorithm with uneven margins. In Proceedings of the Nineteenth International Conference on Machine Learning (ICML), pages 379–386, San Francisco, CA, USA, 2002. Morgan Kaufmann Publishers Inc.
T. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of the SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, 2007.
M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
V. Murdock, M. Ciaramita, and V. Plachouras. A noisy channel approach to contextual advertising. In Proceedings of the 1st International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD'07), pages 21–27, 2007.
OneUpWeb. How keyword length affects conversion rates, 2005. http://www.oneupweb.com/landing/keywordstudy_landing.htm.
B. Ribeiro-Neto, M. Cristo, P. Golgher, and E. D. Moura. Impedance coupling in content-targeted advertising. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 496–503. ACM Press, 2005.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
M. Sahami and T. D. Heilman. A web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th international conference on World Wide Web, pages 377–386, New York, NY, USA, 2006. ACM.
F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proceedings of Human Language Technology and North-American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2003.
L. Shen and A. Joshi. Ranking and reranking with perceptron. Machine Learning, Special Issue on Learning in Speech and Language Technologies, 60(1–3):73–96, 2005.
M. Surdeanu and M. Ciaramita. Robust information extraction with perceptrons. In Proceedings of NIST Automatic Content Extraction Workshop (ACE), 2007.
G.-R. Xue, H.-J. Zeng, Z. Chen, Y. Yu, W.-Y. Ma, W. Xi, and W. Fan. Optimizing web search using web click-through data. In Proceedings of the thirteenth ACM international conference on Information and knowledge management (CIKM), pages 118–126, New York, NY, USA, 2004. ACM.
W. Yih, J. Goodman, and V. Carvalho. Finding advertising keywords on web pages. In Proceedings of the 15th international conference on World Wide Web, pages 213–222, 2006.
C. Y. Yoo. Preattentive Processing of Web Advertising. PhD thesis, University of Texas at Austin, 2006.
W. V. Zhang, X. He, B. Rey, and R. Jones. Query rewriting using active learning for sponsored search. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007.


Recommendations

Optimizing search engine revenue in sponsored search
SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Displaying sponsored ads alongside the search results is a key monetization strategy for search engine companies. Since users are more likely to click ads that are relevant to their query, it is crucial for search engine to deliver the right ads for the ...
Personalized click prediction in sponsored search
WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

Sponsored search is a multi-billion dollar business that generates most of the revenue for search engines. Predicting the probability that users click on ads is crucial to sponsored search because the prediction is used to influence ranking, filtering, ...
Sponsored Search: Is Money a Motivator for Providing Relevant Results?

Analysis of data from a major metasearch engine reveals that sponsored-link click-through rates appear lower than previously reported. Combining sponsored and nonsponsored links in a single listing, while providing some benefits to users, does not ...

Reviews

Reviewer: Julien Velcin

Ranking advertisements in sponsored search is a very strategic task nowadays. Using machine learning (ML) techniques based on queries and clicks is a very natural way to address this kind of task. The paper makes two contributions: using only users' logs for learning and evaluation, and a comparison of three ML methods, namely the simple perceptron, the ranking perceptron, and the multilayer perceptron (MLP). A minor contribution is the use of different features, such as cosine similarity and word overlap, to describe the instances of the learning problem. The text is well structured and the related references are very accurate. Using only click data for learning is a valid approach. However, sponsored search is closely tied to the economic model; this is clearly stated in the paper. Unfortunately, Ciaramita et al. do not go into detail on how the two models should be combined. Another weak point relates to the learning algorithms. It is well known that a binary classifier is not well suited to the ranking task; I wonder why the authors consider it in their experiments. The description of the MLP is not very clear and lacks a simple figure. In Equation 11, there seems to be a mistake in the back-propagation formula. The paper also lacks a comparison against other nonlinear classifiers, such as support vector machines (SVMs). Finally, the authors do not explain why they use two different stemming algorithms, Krovetz and Porter. Otherwise, this paper is of high quality.

Online Computing Reviews Service
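The review names cosine similarity and word overlap among the features used to describe each query-ad instance. A minimal sketch of both measures, computed between a query and an ad's text, is given below. It assumes lower-cased whitespace tokenization with no stemming and raw term frequencies without idf weighting, so it ignores the Krovetz and Porter stemmers the review mentions and should not be read as the paper's exact feature definitions; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    # Assumed preprocessing: lower-case and split on whitespace (no stemming).
    return text.lower().split()


def cosine(query: str, ad_text: str) -> float:
    """Cosine similarity between raw term-frequency vectors of the two texts."""
    q, d = Counter(tokens(query)), Counter(tokens(ad_text))
    num = sum(q[t] * d[t] for t in q.keys() & d.keys())
    den = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return num / den if den else 0.0  # empty text on either side -> 0.0


def word_overlap(query: str, ad_text: str) -> float:
    """Fraction of distinct query terms that also appear in the ad text."""
    q, d = set(tokens(query)), set(tokens(ad_text))
    return len(q & d) / len(q) if q else 0.0


if __name__ == "__main__":
    query = "cheap flights rome"
    ad = "Cheap flights to Rome and Milan"
    print(round(cosine(query, ad), 3), word_overlap(query, ad))
```

Both scores lie in [0, 1], so several such similarities can be stacked into the small, dense feature vectors that a perceptron or MLP consumes. Cosine penalizes long ads that add unrelated terms, while word overlap only asks whether the query terms are covered.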


Published in
WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497
General Chairs: Jinpeng Huai (Beihang University, China), Robin Chen (AT&T Labs, USA), Hsiao-Wuen Hon (Microsoft Research Asia, China), Yunhao Liu (HK University of Science and Technology, Hong Kong)
Program Chairs: Wei-Ying Ma (Microsoft Research Asia, China), Andrew Tomkins (Yahoo! Research, USA), Xiaodong Zhang (The Ohio State University, USA)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
online learning
perceptrons
ranking
sponsored search

Acceptance Rates
Overall acceptance rate: 1,899 of 8,196 submissions, 23%

Article Metrics
- Total citations: 66
- Total downloads: 1,087
- Downloads (last 12 months): 6
- Downloads (last 6 weeks): 1