short-paper

Cross-lingual query classification: a preliminary study

Authors:
Xuerui Wang, University of Massachusetts, Amherst, MA, USA
Andrei Broder, Yahoo! Research, Santa Clara, CA, USA
Evgeniy Gabrilovich, Yahoo! Research, Santa Clara, CA, USA
Vanja Josifovski, Yahoo! Research, Santa Clara, CA, USA
Bo Pang, Yahoo! Research, Santa Clara, CA, USA

iNEWS '08: Proceedings of the 2nd ACM workshop on Improving non english web searching, October 2008, Pages 101–104, https://doi.org/10.1145/1460027.1460046

Published: 30 October 2008

ABSTRACT

The non-English Web is growing at breakneck speed, but available language processing tools are mostly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either unavailable or of very limited quality. Given that building taxonomies in all non-English languages is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable information processing tasks in other languages. Preliminary results presented in this paper indicate that the answer is affirmative with respect to query classification, a task that is essential both for understanding user intent, and thus providing better search results, and for better targeting of search-based advertising, the economic underpinning of commercial Web search engines. We propose a robust method for classifying non-English queries against an English taxonomy and classifier using widely available, off-the-shelf machine translation systems. In particular, we show that by viewing the search results in the query's original language as independent sources of information, we can alleviate the impact of poor-quality or erroneous machine translations. Empirical results for Chinese queries are encouraging.
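The idea of treating original-language search results as independent sources of information can be illustrated with a minimal sketch. The abstract does not specify the classifier, the translation system, or how evidence is combined, so everything here is an assumption made for illustration: `TAXONOMY`, `classify_english`, `toy_search`, `toy_translate`, and the majority-vote aggregation are hypothetical stand-ins, not the authors' implementation. The toy snippets are pre-segmented with spaces to keep the example short.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical English taxonomy: each class is described by a few keywords.
TAXONOMY: Dict[str, Set[str]] = {
    "Travel": {"flight", "hotel", "ticket", "tour"},
    "Electronics": {"phone", "camera", "laptop", "battery"},
}


def classify_english(text: str) -> Counter:
    """Toy English classifier: score each class by keyword overlap."""
    tokens = text.lower().split()
    return Counter(
        {label: sum(tok in kws for tok in tokens) for label, kws in TAXONOMY.items()}
    )


def classify_query(
    query: str,
    search: Callable[[str], List[str]],
    translate: Callable[[str], str],
    top_k: int = 1,
) -> List[str]:
    """Classify a non-English query by voting over its translated search results.

    Each result is translated and classified separately, so a single bad
    translation or off-topic result contributes at most one vote.
    """
    votes: Counter = Counter()
    for snippet in search(query):
        scores = classify_english(translate(snippet))
        best_label, best_score = scores.most_common(1)[0]
        if best_score > 0:  # ignore results with no taxonomy evidence
            votes[best_label] += 1
    return [label for label, _ in votes.most_common(top_k)]


# Hypothetical stand-ins for a search engine and a machine translation system.
FAKE_INDEX: Dict[str, List[str]] = {
    "机票": ["便宜 机票 预订", "酒店 机票 套餐", "手机 电池"],  # last result is off-topic
}
TOY_DICT: Dict[str, str] = {
    "机票": "ticket", "酒店": "hotel", "手机": "phone", "电池": "battery",
    "便宜": "cheap", "预订": "booking", "套餐": "package",
}


def toy_search(query: str) -> List[str]:
    """Return canned result snippets in the query's original language."""
    return FAKE_INDEX.get(query, [])


def toy_translate(text: str) -> str:
    """Word-by-word dictionary lookup; unknown tokens pass through unchanged."""
    return " ".join(TOY_DICT.get(tok, tok) for tok in text.split())


if __name__ == "__main__":
    print(classify_query("机票", toy_search, toy_translate))
```

In this toy run, two of the three results vote for one class and the off-topic result is outvoted, which is the kind of robustness to noisy evidence that the abstract describes. A real system would replace the stubs with an actual search API, an off-the-shelf translation service, and a trained English classifier.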

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation
2. Information systems
  1. Information retrieval

Published in
iNEWS '08: Proceedings of the 2nd ACM workshop on Improving non english web searching
October 2008, 112 pages
ISBN: 9781605584164
DOI: 10.1145/1460027
Program Chairs:
Fotis Lazarinis, Technological Educational Institute of Mesolonghi, Greece
Efthimis N. Efthimiadis, University of Washington, USA
Jesus Vilares, University of A Coruña, Spain
John I. Tait, Information Retrieval Facility, Austria
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cross language
machine translation
query classification
relevance feedback
web search