research-article

Understanding user's query intent with wikipedia

Authors:
Jian Hu (Microsoft Research Asia, Beijing, China)
Gang Wang (Microsoft Research Asia, Beijing, China)
Fred Lochovsky (The Hong Kong University of Science and Technology, Hong Kong, China)
Jian-tao Sun (Microsoft Research Asia, Beijing, China)
Zheng Chen (Microsoft Research Asia, Beijing, China)

WWW '09: Proceedings of the 18th international conference on World wide web
April 2009
Pages 471–480
https://doi.org/10.1145/1526709.1526773

Published: 20 April 2009

ABSTRACT

Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting the user's intent mainly use machine learning techniques. However, meeting all of these challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; each intent domain is thus represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can easily be applied to various intent domains. We demonstrate the effectiveness of this method in three different applications: travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
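The idea of representing an intent domain as a set of Wikipedia concepts and mapping a query into that space can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how concepts are scored or how queries without an exact concept match are handled. The toy link graph, the hop-limited expansion from seeds, and the names `TOY_LINKS`, `expand_domain`, and `classify_query` are all assumptions made here for illustration.

```python
from collections import deque

# Hypothetical toy link graph standing in for Wikipedia:
# concept title -> titles of linked concepts. Not real Wikipedia data.
TOY_LINKS = {
    "travel": ["hotel", "airline", "tourism"],
    "hotel": ["hostel", "resort"],
    "airline": ["airport"],
    "tourism": ["travel agency"],
    "employment": ["job interview", "resume"],
}


def expand_domain(seeds, links, max_hops=2):
    """Collect all concepts reachable from the seeds within max_hops links.

    A plain breadth-first expansion; an assumed stand-in for whatever
    concept-discovery procedure the paper actually uses.
    """
    domain = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        concept, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(concept, []):
            if neighbor not in domain:
                domain.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return domain


def classify_query(query, domain):
    """Return True if any contiguous word n-gram of the query is a domain concept."""
    words = query.lower().split()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            if " ".join(words[start:end]) in domain:
                return True
    return False


# A single seed concept is expanded into a larger travel intent domain.
travel_domain = expand_domain(["travel"], TOY_LINKS)
```

In this sketch a single seed ("travel") grows into a set of related concepts, so a query such as "cheap resort deals" is matched through the concept "resort" even though it shares no words with the seed. Exact n-gram matching is the simplest possible mapping and is used here only to keep the example short.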

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
WWW '09: Proceedings of the 18th international conference on World wide web
April 2009
1280 pages
ISBN: 9781605584874
DOI: 10.1145/1526709
General Chairs: Juan Quemada (DIT-UPM), Gonzalo León (DIT-UPM)
Program Chairs: Yoelle Maarek (Google Inc., Israel), Wolfgang Nejdl (L3S and Hannover University)
Copyright © 2009 IW3C2 org
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
query classification
query intent
user intent
wikipedia
Conference
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%