Robust Bilingual Word Alignment for Machine Aided Translation

Dagan, I.; Church, K.; Gale, W.

doi:10.1007/978-94-017-2390-9_13

Part of the book series: Text, Speech and Language Technology (TLTB, volume 11)

Abstract

We have developed a new program called word_align for aligning parallel text, that is, text such as the Canadian Hansards that is available in two or more languages. The program takes the output of char_align (Church, 1993), a robust alternative to sentence-based alignment programs, and applies word-level constraints using a version of Brown et al.’s Model 2 (Brown et al., 1993), modified and extended to deal with robustness issues. Word_align was tested on a subset of the Canadian Hansards supplied by Simard (Simard et al., 1992). The combination of word_align plus char_align reduces the variance (average square error) by a factor of 5 over char_align alone. More importantly, because word_align and char_align were designed to work robustly on texts that are smaller and noisier than the Hansards, it has been possible to deploy the programs successfully at AT&T Language Line Services, a commercial translation service, to help with difficult terminology.
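
The idea of applying word-level constraints on top of a rough initial alignment can be illustrated with a small sketch. The code below is not word_align itself: it is a simplified, Model 1-style EM estimate without the offset parameters of Model 2, and a linear interpolation of positions stands in for the output of char_align. The function name `align_within_window` and its parameters `window` and `iterations` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def align_within_window(source, target, window=2, iterations=10):
    """Align each target word to one source position.

    Assumption (not from the chapter): the rough initial alignment is a
    linear interpolation of positions, and only source positions within
    `window` of that estimate are considered as candidates.
    """
    if not source or not target:
        return []
    n, m = len(source), len(target)

    def candidates(j):
        # Stand-in for a rough character-level alignment: map target
        # position j linearly onto the source token stream.
        centre = round(j * (n - 1) / max(m - 1, 1))
        return range(max(0, centre - window), min(n - 1, centre + window) + 1)

    # t[(target_word, source_word)] approximates P(target_word | source_word).
    t = defaultdict(lambda: 1.0)  # flat, unnormalised starting point
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over its candidate source words.
        for j, tw in enumerate(target):
            cands = list(candidates(j))
            z = sum(t[(tw, source[i])] for i in cands)
            for i in cands:
                p = t[(tw, source[i])] / z
                count[(tw, source[i])] += p
                total[source[i]] += p
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(float)
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]

    # Pick the most probable candidate position for every target word.
    return [
        max(candidates(j), key=lambda i: t[(target[j], source[i])])
        for j in range(m)
    ]


if __name__ == "__main__":
    src = "the house is small the house is old".split()
    tgt = "la maison est petite la maison est vieille".split()
    print(align_within_window(src, tgt, window=1))
```

Restricting the candidates to a window keeps the search local, so an error in the rough alignment can be corrected by at most `window` positions. A fuller treatment along the lines of Model 2 would also estimate how likely each offset from the initial position is.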

Part of this work was accomplished at AT&T.

References

Baum, L. E. 1972. An inequality and an associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities, 3: 1–8.
Brown, P., Cocke, J., Della Pietra, S., Della Pietra, V., Jelinek, F., Mercer, R. L. and Roossin, P.S. 1990. A statistical approach to language translation. Computational Linguistics, 16 (2): 79–85.
Brown, P., Lai, J. and Mercer, R. 1991a. Aligning sentences in parallel corpora. In Proceedings of the 29th Annual Meeting of the ACL, pp. 169–176.
Brown, P., Della Pietra, S., Della Pietra, V. and Mercer, R. 1991b. Word sense disambiguation using statistical methods. In Proceedings of the 29th Annual Meeting of the ACL, pp. 264–270.
Brown, P., Della Pietra, S., Della Pietra, V. and Mercer, R. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19 (2): 263–311.
Church, K. W. 1993. Char_align: A program for aligning parallel texts at the character level. In Proceedings of the 31st Annual Meeting of the ACL, pp. 1–8.
Dempster, A. P., Laird, N. M. and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39 (B): 1–38.
Gale, W. and Church, K. 1991a. Identifying word correspondence in parallel text. In Proceedings of the DARPA Workshop on Speech and Natural Language.
Gale, W. and Church, K. 1991b. A program for aligning sentences in bilingual corpora. In Proceedings of the 29th Annual Meeting of the ACL, pp. 177–184.
Gale, W., Church, K. and Yarowsky, D. 1992. Using bilingual materials to develop word sense disambiguation methods. In Proceedings of the International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 101–112.
Isabelle, P. 1992. Bi-textual aids for translators. In Proceedings of the Annual Conference of the UW Center for the New OED and Text Research.
Kay, M. and Roscheisen, M. 1993. Text-translation alignment. Computational Linguistics, 19 (1): 121–142.
Klavans, J. and Tzoukermann, E. 1990. The BICORD system. In Proceedings of COLING 1990, Helsinki, Finland, pp. 174–178.
Kupiec, J. 1993. An algorithm for finding noun phrase correspondences in bilingual corpora. In Proceedings of the 31st Annual Meeting of the ACL, pp. 17–22.
Landauer, T. K. and Littman, M. L. 1990. Fully automatic cross-language document retrieval using latent semantic indexing. In Proceedings of the Annual Conference of the UW Center for the New OED and Text Research, pp. 31–38.
Matsumoto, Y., Ishimoto, H., Utsuro, T. and Nagao, M. 1993. Structural matching of parallel texts. In Proceedings of the 31st Annual Meeting of the ACL, pp. 23–30.
Ogden, W. and Gonzales, M. 1993. Norm — a system for translators. Demonstration at ARPA Workshop on Human Language Technology.
Sadler, V. 1989. Working with analogical semantics: Disambiguation techniques in DLT. Foris Publications.
Simard, M., Foster, G. and Isabelle, P. 1992. Using cognates to align sentences in bilingual corpora. In Proceedings of the International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 67–82.
Smadja, F. 1992. How to compile a bilingual collocational lexicon automatically. In AAAI Workshop on Statistically-based Natural Language Processing Techniques, July.
Warwick, S., Hajic, J. and Russell, G. 1990. Searching on tagged corpora: linguistically motivated concordance analysis. In Proceedings of the Annual Conference of the UW Center for the New OED and Text Research, pp. 10–18.

Editor information

Editors and Affiliations

ISSCO, University of Geneva, Switzerland
Susan Armstrong & Sandra Manzi
AT&T Labs-Research, USA
Kenneth Church
Xerox Research Centre Europe, France
Pierre Isabelle
Bell Laboratories, Lucent, USA
Evelyne Tzoukermann
Johns Hopkins University, Baltimore, Maryland, USA
David Yarowsky

About this chapter

Cite this chapter

Dagan, I., Church, K., Gale, W. (1999). Robust Bilingual Word Alignment for Machine Aided Translation. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_13

DOI: https://doi.org/10.1007/978-94-017-2390-9_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5349-7
Online ISBN: 978-94-017-2390-9
eBook Packages: Springer Book Archive
