DOI: 10.1145/1460027.1460046
short-paper

Cross-lingual query classification: a preliminary study

Published: 30 October 2008

ABSTRACT

The non-English Web is growing at breakneck speed, but available language processing tools are mostly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either unavailable or of very limited quality. Because building taxonomies for every non-English language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable information processing tasks in other languages. Preliminary results presented in this paper indicate that the answer is affirmative for query classification. This task is essential both for understanding user intent, and thus providing better search results, and for better targeting of search-based advertising, the economic underpinning of commercial Web search engines. We propose a robust method for classifying non-English queries against an English taxonomy and classifier using widely available, off-the-shelf machine translation systems. In particular, we show that by treating the search results retrieved in the query's original language as independent sources of information, we can alleviate the impact of poor-quality or erroneous machine translations. Empirical results on Chinese queries are highly encouraging.
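The idea of treating each original-language search result as an independent source of evidence can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The score-averaging scheme, the `translate` and `classify` callables, the toy dictionary standing in for a machine translation system, and the keyword matcher standing in for an English taxonomy classifier are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Scores = Dict[str, float]  # category name -> score


def aggregate_result_scores(
    results: List[str],
    translate: Callable[[str], str],
    classify: Callable[[str], Scores],
) -> Scores:
    """Translate and classify each original-language search result on its
    own, then average the per-category scores across all results."""
    if not results:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for snippet in results:
        # Each result is translated independently, so one bad translation
        # affects only this snippet's contribution to the totals.
        for category, score in classify(translate(snippet)).items():
            totals[category] += score
    return {category: total / len(results) for category, total in totals.items()}


# --- Hypothetical toy stand-ins (not real MT or taxonomy systems) ---
TOY_MT = {
    "苹果手机价格": "apple phone price",
    "iPhone 评测": "iphone review",
    # "Apple new product launch", deliberately mistranslated for the demo
    "苹果新品发布": "apple fruit nutrition",
}

KEYWORDS = {
    "Computers/Electronics": {"phone", "iphone"},
    "Food": {"fruit", "nutrition"},
}


def toy_translate(text: str) -> str:
    # Unknown inputs translate to an empty string.
    return TOY_MT.get(text, "")


def toy_classify(text: str) -> Scores:
    """Score categories by keyword overlap, normalized to sum to 1."""
    words = set(text.split())
    hits = {cat: len(words & kws) for cat, kws in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return {}  # unclassifiable text contributes nothing to any category
    return {cat: n / total for cat, n in hits.items() if n}


snippets = ["苹果手机价格", "iPhone 评测", "苹果新品发布"]
scores = aggregate_result_scores(snippets, toy_translate, toy_classify)
best = max(scores, key=scores.get)
print(best, scores)
```

Because each result is translated and classified separately, the single erroneous translation (the third snippet) contributes only one third of the total score, and the top category remains Computers/Electronics. Plain averaging is just one possible combination rule; weighting results by rank or by translation confidence would be equally plausible choices under this sketch.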


      Published in

        iNEWS '08: Proceedings of the 2nd ACM Workshop on Improving Non-English Web Searching
        October 2008
        112 pages
        ISBN: 9781605584164
        DOI: 10.1145/1460027

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
