Shallow Syntactic Preprocessing for Statistical Machine Translation

  • Conference paper
Advances in Natural Language Processing (JapTAL 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Abstract

Reordering is essential for phrase-based statistical machine translation (SMT). In this paper, we present a new reordering method for phrase-based SMT. Inspired by the preprocessing approach of [1], we use shallow parsing and transformation rules to reorder the source sentence before translation. Experimental results on the English-Vietnamese language pair show that our approach achieves significant improvements over MOSES, a state-of-the-art phrase-based system.
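
To make the idea of chunk-level source reordering concrete, the following is a minimal Python sketch. It is not the paper's rule set: the single rule shown (moving adjectives after the nouns inside a noun-phrase chunk, as is typical when translating English into Vietnamese), the names `reorder_np` and `reorder_sentence`, and the hand-written chunk input are all assumptions made for illustration. A real system would take its chunks from a shallow parser and would apply a larger set of transformation rules.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # (word, Penn Treebank POS tag)
Chunk = Tuple[str, List[Token]]  # (chunk label, tokens in the chunk)


def reorder_np(tokens: List[Token]) -> List[Token]:
    """Hypothetical rule: inside an NP, emit determiners, then all other
    tokens, then adjectives. Relative order within each group is kept."""
    dets = [t for t in tokens if t[1] == "DT"]
    adjs = [t for t in tokens if t[1].startswith("JJ")]
    rest = [t for t in tokens if t[1] != "DT" and not t[1].startswith("JJ")]
    return dets + rest + adjs


def reorder_sentence(chunks: List[Chunk]) -> List[str]:
    """Apply the NP rule chunk by chunk and return the reordered words.
    Chunks with any other label are passed through unchanged."""
    words: List[str] = []
    for label, tokens in chunks:
        if label == "NP":
            tokens = reorder_np(tokens)
        words.extend(word for word, _ in tokens)
    return words


# Hand-written shallow parse of "a black cat sleeps" (assumed input format).
example: List[Chunk] = [
    ("NP", [("a", "DT"), ("black", "JJ"), ("cat", "NN")]),
    ("VP", [("sleeps", "VBZ")]),
]

print(" ".join(reorder_sentence(example)))
```

The reordered source sentence would then be passed to the phrase-based system in place of the original, for both training and decoding, so that source and target word order are closer before alignment.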

References

  1. Xia, F., McCord, M.: Improving a Statistical MT System with Automatically Learned Rewrite Patterns. In: Proceedings of Coling 2004, Geneva, Switzerland, August 23–August 27, pp. 508–514 (2004)

  2. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proceedings of HLT-NAACL 2003, Edmonton, Canada, pp. 127–133 (2003)

  3. Och, F.J., Ney, H.: The alignment template approach to statistical machine translation. Computational Linguistics 30(4), 417–449 (2004)

  4. Chiang, D.: A hierarchical phrase-based model for statistical machine translation. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), Ann Arbor, Michigan, pp. 263–270 (June 2005)

  5. Collins, M., Koehn, P., Kucerová, I.: Clause restructuring for statistical machine translation. In: Proc. ACL 2005, Ann Arbor, USA, pp. 531–540 (2005)

  6. Quirk, C., Menezes, A., Cherry, C.: Dependency treelet translation: Syntactically informed phrasal SMT. In: Proceedings of ACL 2005, Ann Arbor, Michigan, USA, pp. 271–279 (2005)

  7. Huang, L., Mi, H.: Efficient incremental decoding for tree-to-string translation. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 273–283. Association for Computational Linguistics, Cambridge (2010)

  8. Xu, P., Kang, J., Ringgaard, M., Och, F.: Using a dependency parser to improve SMT for subject-object-verb languages. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 245–253. Association for Computational Linguistics, Boulder (2009)

  9. Talbot, D., Kazawa, H., Ichikawa, H., Katz-Brown, J., Seno, M., Och, F.: A lightweight evaluation framework for machine translation reordering. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 12–21. Association for Computational Linguistics, Edinburgh (2011)

  10. Katz-Brown, J., Petrov, S., McDonald, R., Och, F., Talbot, D., Ichikawa, H., Seno, M., Kazawa, H.: Training a parser for machine translation reordering. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 183–192. Association for Computational Linguistics, Edinburgh (2011)

  11. Nguyen, T.P., Shimazu, A.: Improving phrase-based SMT with morpho-syntactic analysis and transformation. In: Proceedings of AMTA 2006 (2006)

  12. Wang, C., Collins, M., Koehn, P.: Chinese syntactic reordering for statistical machine translation. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 737–745. Association for Computational Linguistics, Prague (2007)

  13. Habash, N.: Syntactic preprocessing for statistical machine translation. In: Proceedings of the 11th MT Summit (2007)

  14. Zhang, Y., Zens, R., Ney, H.: Chunk-level reordering of source language sentences with automatically learned rules for statistical machine translation. In: Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation, pp. 1–8 (2007)

  15. Nguyen, P.T., Shimazu, A., Nguyen, L.-M., Nguyen, V.-V.: A syntactic transformation model for statistical machine translation. International Journal of Computer Processing of Oriental Languages (IJCPOL) 20(2), 1–20 (2007)

  16. Tsuruoka, Y., Tsujii, J., Ananiadou, S.: Fast full parsing by linear-chain conditional random fields. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009, pp. 790–798. Association for Computational Linguistics, Stroudsburg (2009)

  17. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open source toolkit for statistical machine translation. In: Proceedings of ACL, Demonstration Session (2007)

  18. Nguyen, T.P., Shimazu, A., Ho, T.B., Nguyen, M.L., Nguyen, V.V.: A tree-to-string phrase-based model for statistical machine translation. In: Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL 2008), Coling 2008 Organizing Committee, pp. 143–150 (August 2008)

  19. Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Proceedings of International Conference on Spoken Language Processing, vol. 29, pp. 901–904 (2002)

  20. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Computational Linguistics 29(1), 19–51 (2003)

  21. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, pp. 311–318 (July 2002)


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vuong, HT., Tu, D.N., Le Nguyen, M., Van Nguyen, V. (2012). Shallow Syntactic Preprocessing for Statistical Machine Translation. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_11

  • DOI: https://doi.org/10.1007/978-3-642-33983-7_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33982-0

  • Online ISBN: 978-3-642-33983-7

  • eBook Packages: Computer Science, Computer Science (R0)
