article

Correction of errors in a verb modality corpus for machine translation with a machine-learning method

Authors:
Masaki Murata (National Institute of Information and Communications Technology)
Masao Utiyama (National Institute of Information and Communications Technology)
Kiyotaka Uchimoto (National Institute of Information and Communications Technology)
Hitoshi Isahara (National Institute of Information and Communications Technology)
Qing Ma (Ryukoku University, and National Institute of Information and Communications Technology)

ACM Transactions on Asian Language Information Processing, Volume 4, Issue 1, pp. 18–37. https://doi.org/10.1145/1066078.1066080

Published: 01 March 2005

Abstract

In recent years, various types of tagged corpora have been constructed and much research using tagged corpora has been done. However, tagged corpora contain errors, which impedes the progress of research. Therefore, the correction of errors in corpora is an important research issue. In this study we investigate the correction of such errors, which we call corpus correction. Using machine-learning methods, we applied corpus correction to a verb modality corpus for machine translation. We used the maximum-entropy and decision-list methods as machine-learning methods. We compared several kinds of methods for corpus correction in our experiments, and determined which is most effective by using a statistical test. We obtained several noteworthy findings: (1) Precision was almost the same for both detection and correction, so it is more convenient to do both correction and detection, rather than detection only. (2) In general, the maximum-entropy method worked better than the decision-list method; but the two methods had almost the same precision for the top 50 pieces of extracted data when closed data was used. (3) In terms of precision, the use of closed data was better than the use of open data; however, in terms of the total number of extracted errors, the use of open data was better than the use of closed data. Based on our analysis of these results, we developed a good method for corpus correction. We confirmed the effectiveness of our method by carrying out experiments on machine translation. As corpus-based machine translation continues to be developed, the corpus correction we discuss in this article should prove to be increasingly significant.
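The detect-and-correct idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a simplified single-feature decision list with add-alpha smoothing, a toy corpus invented for illustration, and hypothetical names (`train_decision_list`, `predict`, `suggest_corrections`). It also assumes that "closed data" means the checked instance is included in training and "open data" means it is held out, which the abstract does not spell out.

```python
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=1.0):
    """Build rules feature -> (most frequent tag, smoothed confidence)."""
    tag_set = {tag for _, tag in examples}
    counts = defaultdict(Counter)
    for features, tag in examples:
        for feature in features:
            counts[feature][tag] += 1
    rules = {}
    for feature, tag_counts in counts.items():
        total = sum(tag_counts.values())
        best_tag, best_n = tag_counts.most_common(1)[0]
        # Add-alpha smoothing so that rare features get lower confidence.
        rules[feature] = (best_tag, (best_n + alpha) / (total + alpha * len(tag_set)))
    return rules


def predict(rules, features):
    """Apply the single most confident matching rule; return (tag, confidence)."""
    matching = [rules[f] for f in features if f in rules]
    if not matching:
        return None, 0.0
    return max(matching, key=lambda rule: rule[1])


def suggest_corrections(corpus, closed=True):
    """Return (confidence, index, corpus_tag, proposed_tag), most confident first.

    closed=True : rules are trained on the whole corpus (assumed "closed data").
    closed=False: the checked instance is left out of training (assumed "open data").
    """
    closed_rules = train_decision_list(corpus) if closed else None
    suggestions = []
    for i, (features, corpus_tag) in enumerate(corpus):
        rules = closed_rules if closed else train_decision_list(corpus[:i] + corpus[i + 1:])
        proposed, confidence = predict(rules, features)
        # A disagreement between model and corpus is a candidate error (detection);
        # the model's tag doubles as the proposed fix (correction).
        if proposed is not None and proposed != corpus_tag:
            suggestions.append((confidence, i, corpus_tag, proposed))
    return sorted(suggestions, reverse=True)


# Toy data invented for illustration: one feature per verb, one modality tag.
corpus = [
    ({"ending=ta"}, "past"),
    ({"ending=ta"}, "past"),
    ({"ending=ta"}, "past"),
    ({"ending=ta"}, "present"),  # planted tagging error at index 3
    ({"ending=ru"}, "present"),
    ({"ending=ru"}, "present"),
    ({"ending=ru"}, "present"),
]

for closed in (True, False):
    print("closed" if closed else "open", suggest_corrections(corpus, closed=closed))
```

On this toy corpus both settings flag only the planted error at index 3 and propose "past" for it. Ranking the suggestions by confidence mirrors the abstract's evaluation of the top extracted items, and producing the proposed tag together with the flag reflects its point that correction costs little more than detection.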


Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Machine translation

Published in

ACM Transactions on Asian Language Information Processing, Volume 4, Issue 1
March 2005
52 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1066078

Copyright © 2005 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
corpus correction
machine learning
machine translation
modality corpus
