DOI: 10.1145/1526709.1526773

Understanding user's query intent with wikipedia

Published: 20 April 2009

ABSTRACT

Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting the user's intent mainly rely on machine learning techniques. However, meeting all these challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications: travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
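
The core idea of representing an intent domain as a set of Wikipedia concepts, and classifying a query by mapping it into that space, can be illustrated with a toy sketch. This is not the paper's algorithm: the concept names, the `TERM_TO_CONCEPTS` lookup table, and the overlap-count scoring are all hypothetical stand-ins chosen only to make the representation concrete. The abstract does not specify how the mapping or scoring is actually done.

```python
# Toy illustration only. The concept names, the lookup table, and the
# overlap-count scoring are hypothetical; they are not taken from the paper.

# Each intent domain is a set of Wikipedia-style concepts
# (article titles and category names).
INTENT_DOMAINS = {
    "travel": {"Category:Hotels", "Category:Airlines", "Tourism"},
    "job": {"Category:Employment", "Recruitment", "Curriculum vitae"},
}

# Hypothetical mapping from query terms to concepts. A real system would
# derive this from Wikipedia itself rather than from a hand-written table.
TERM_TO_CONCEPTS = {
    "hotel": {"Category:Hotels"},
    "flight": {"Category:Airlines"},
    "vacation": {"Tourism"},
    "resume": {"Curriculum vitae"},
    "hiring": {"Recruitment", "Category:Employment"},
}


def map_query_to_concepts(query):
    """Map a query into the concept space by looking up each term."""
    concepts = set()
    for term in query.lower().split():
        concepts |= TERM_TO_CONCEPTS.get(term, set())
    return concepts


def classify_intent(query):
    """Return the domain sharing the most concepts with the query, or None."""
    concepts = map_query_to_concepts(query)
    scores = {
        domain: len(concepts & members)
        for domain, members in INTENT_DOMAINS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for q in ["cheap hotel paris", "hiring software engineer", "weather"]:
        print(q, "->", classify_intent(q))
```

In this sketch a query with no matching concepts is left unclassified rather than forced into a domain, which is one simple way to handle out-of-domain queries.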
