Cross-Lingual Adaptation Using Structural Correspondence Learning

Published: 01 October 2011

Abstract

Cross-lingual adaptation is a special case of domain adaptation and refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation in the context of text classification. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce a cross-lingual representation that enables the transfer of classification knowledge from the source to the target language. The main advantages of this method over existing methods are resource efficiency and task specificity.
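The induction step described above can be illustrated with a minimal sketch of the general SCL recipe. It is not the authors' implementation. Several things here are assumptions made for illustration: the function name `induce_projection`, the toy data, the use of ridge regression for the pivot predictors (a swapped-in technique chosen for brevity), and the simplification of masking all pivot words for every predictor. The idea shown is: word pairs supplied by the translation oracle act as pivots, a linear predictor is fit on unlabeled documents from both languages to predict each pivot's presence from the remaining words, and a truncated SVD of the predictor weights yields a projection into a shared low-dimensional space.

```python
import numpy as np


def induce_projection(X_unlabeled, pivot_pairs, k, ridge=1.0):
    """Sketch of an SCL-style projection (hypothetical helper, not from the article).

    X_unlabeled : (n_docs, n_features) bag-of-words counts over the joint
                  source+target vocabulary, unlabeled documents of both languages.
    pivot_pairs : list of (source_idx, target_idx) feature indices that the
                  word translation oracle links to each other.
    k           : dimensionality of the induced representation.
    Returns theta with shape (k, n_features).
    """
    n_docs, n_features = X_unlabeled.shape
    pivot_idx = np.array(sorted({i for pair in pivot_pairs for i in pair}))

    # Mask the pivot words so each predictor must rely on co-occurring words.
    # Simplification (assumption): all pivot words are masked for every predictor.
    X_masked = X_unlabeled.astype(float).copy()
    X_masked[:, pivot_idx] = 0.0

    # One binary target per pivot pair: +1 if the document contains either
    # word of the pair, -1 otherwise.
    Y = np.stack(
        [
            np.where((X_unlabeled[:, s] > 0) | (X_unlabeled[:, t] > 0), 1.0, -1.0)
            for s, t in pivot_pairs
        ],
        axis=1,
    )

    # Ridge regression for all pivot predictors at once (assumed loss, chosen
    # for brevity): W has one weight column per pivot pair.
    gram = X_masked.T @ X_masked + ridge * np.eye(n_features)
    W = np.linalg.solve(gram, X_masked.T @ Y)

    # The top-k left singular vectors of W span the shared representation.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k].T


# Toy usage with synthetic counts; indices 0-4 stand in for source-language
# words and 5-9 for target-language words (purely illustrative).
rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(40, 10))
pivot_pairs = [(0, 5), (1, 6), (2, 7)]
theta = induce_projection(X, pivot_pairs, k=2)
Z = X @ theta.T  # documents mapped into the induced representation
```

A classifier trained on projected labeled source documents could then be applied to projected target documents, since both are expressed in the same `k` dimensions.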

We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual word correspondences.
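One plausible reading of "relative error due to cross-lingual adaptation" is the accuracy lost relative to a classifier trained directly on labeled target-language data. Under that assumption, the reduction can be computed as below. The function name and all numbers are hypothetical and are not results from the article.

```python
def relative_reduction(upper_bound_acc, baseline_acc, method_acc):
    """Fraction of the baseline's adaptation loss that the method removes.

    All three arguments are accuracies in percent. The "adaptation loss" of a
    system is assumed to be upper_bound_acc minus that system's accuracy.
    """
    baseline_loss = upper_bound_acc - baseline_acc
    method_loss = upper_bound_acc - method_acc
    if baseline_loss <= 0:
        raise ValueError("baseline must fall below the upper bound")
    return (baseline_loss - method_loss) / baseline_loss


# Hypothetical accuracies: monolingual upper bound 85%, machine translation
# baseline 75%, proposed method 80%. The baseline loses 10 points and the
# method loses 5, so half of the adaptation loss is removed.
example = relative_reduction(85.0, 75.0, 80.0)
```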


        • Published in

          ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 1
          October 2011
          391 pages
          ISSN: 2157-6904
          EISSN: 2157-6912
          DOI: 10.1145/2036264

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 October 2011
          • Accepted: 1 December 2010
          • Revised: 1 October 2010
          • Received: 1 June 2010

          Qualifiers

          • research-article
          • Research
          • Refereed
