ABSTRACT
Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting the user's intent mainly utilize machine learning techniques. However, it is difficult, and often requires much human effort, to meet all these challenges with statistical machine learning approaches. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
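The idea of representing an intent domain as a set of Wikipedia concepts and mapping a query into that space can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the concept names, the `PHRASE_TO_CONCEPTS` lookup table, and the overlap-based `intent_score` are hypothetical stand-ins, not the paper's actual data, mapping procedure, or scoring function.

```python
from typing import Dict, Set

# Hypothetical intent domain: a small set of Wikipedia-style article and
# category names standing in for a real, automatically discovered concept set.
TRAVEL_CONCEPTS: Set[str] = {"Hotel", "Airline", "Tourism", "Category:Travel"}

# Hypothetical lookup from query tokens to concepts. A real system would map
# queries to Wikipedia concepts with a far richer procedure.
PHRASE_TO_CONCEPTS: Dict[str, Set[str]] = {
    "hotel": {"Hotel"},
    "flight": {"Airline"},
    "paris": {"Paris", "Tourism"},
    "python": {"Python (programming language)"},
}


def map_query_to_concepts(query: str) -> Set[str]:
    """Map a query into the concept space by looking up each token."""
    concepts: Set[str] = set()
    for token in query.lower().split():
        concepts |= PHRASE_TO_CONCEPTS.get(token, set())
    return concepts


def intent_score(query: str, domain_concepts: Set[str]) -> float:
    """Fraction of the query's mapped concepts that belong to the domain.

    This overlap ratio is an illustrative choice, not the paper's measure.
    Queries that map to no concept receive a score of 0.0.
    """
    concepts = map_query_to_concepts(query)
    if not concepts:
        return 0.0
    return len(concepts & domain_concepts) / len(concepts)
```

In this toy setting, a query such as "cheap hotel paris" maps to three concepts, two of which lie in the travel set, while "python tutorial" maps to none of the travel concepts. A query could then be assigned to the domain when its score exceeds a chosen threshold, which is again an assumption of this sketch.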