research-article

Deep Learning Approach for the Morphological Synthesis in Malayalam and Tamil at the Character Level

Authors:
B. Premjith

Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, India

K. P. Soman

Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, India

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 6, Article No.: 94, pp 1–17. https://doi.org/10.1145/3457976

Published: 12 August 2021

Abstract

Morphological synthesis is one of the main components of Machine Translation (MT) frameworks, especially when the source language, the target language, or both are morphologically rich. It is the process of combining two words or two morphemes according to the Sandhi rules of the language. Malayalam and Tamil are two Indian languages that are both morphologically rich and agglutinative. Morphological synthesis of a word in these two languages is challenging for four reasons: (1) the richness of the morphology; (2) complex Sandhi rules; (3) the possibility in Malayalam of forming words by combining words that belong to different syntactic categories (for example, noun and verb); and (4) the construction of a sentence by combining multiple words. We formulated the morphological generation of nouns and verbs in Malayalam and Tamil as a character-to-character sequence tagging problem. In this article, we used deep learning architectures, namely the Recurrent Neural Network (RNN), Long Short-Term Memory network (LSTM), and Gated Recurrent Unit (GRU), along with their stacked and bidirectional versions, to implement morphological synthesis at the character level. We also investigated the performance of these architectures combined with a Conditional Random Field (CRF) in the morphological synthesis of nouns and verbs in Malayalam and Tamil. We observed that adding a CRF to the bidirectional LSTM/GRU architecture achieved more than 99% accuracy in the morphological synthesis of Malayalam and Tamil nouns and verbs.
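To illustrate why a CRF layer can help a character-level tagger, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the authors' implementation: the abstract does not describe the decoding procedure, so the function name `viterbi_decode`, the two-label setup, and all scores are hypothetical. In a BiLSTM-CRF, the emission scores would come from the recurrent encoder and the transition scores would be learned; here both are hard-coded toy values.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for one input.

    emissions[t][j]   : score of label j at character position t
                        (assumed to come from a recurrent encoder).
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] = best score of any path that ends in label j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_labels):
            # Pick the previous label that maximizes the path score into label j.
            prev = max(range(num_labels),
                       key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(range(num_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Toy scores (hypothetical): 3 character positions, 2 labels.
emissions = [[2.0, 1.0], [1.0, 1.5], [0.5, 2.0]]
# Switching labels between adjacent characters is heavily penalized.
transitions = [[0.0, -3.0], [-3.0, 0.0]]

# Per-position argmax ignores the dependency between neighbouring labels.
greedy = [max(range(2), key=lambda j: row[j]) for row in emissions]
crf_path = viterbi_decode(emissions, transitions)
print(greedy)    # [0, 1, 1]
print(crf_path)  # [1, 1, 1]
```

With these toy values, choosing the best label independently at each position yields a sequence that switches labels, while the CRF decoder prefers the globally best sequence once transition scores are taken into account. This joint decoding over the whole character sequence is the general motivation for placing a CRF on top of a bidirectional recurrent encoder.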

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Phonology / morphology
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 20, Issue 6
November 2021
439 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3476127
Editor:
Imed Zitouni
Google, USA
Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 August 2021
- Accepted: 1 March 2021
- Revised: 1 August 2020
- Received: 1 April 2019

Author Tags
Morphological generation
recurrent neural networks
long short-term memory networks
gated recurrent unit
stacked RNN
bidirectional RNN
conditional random field
Qualifiers
- research-article
- Refereed

