Analyzing Code-Switching Rules for English–Hindi Code-Mixed Text

Mahata, Sainik Kumar; Makhija, Sushnat; Agnihotri, Ayushi; Das, Dipankar

doi:10.1007/978-981-13-7403-6_14

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 937)

Abstract

In this work, we propose an efficient, less resource-intensive strategy for parsing and analyzing switching points in code-mixed data. Specifically, we explore the rules of code-switching in Hindi–English code-mixed data. The work involves extracting code-mixed text, translating the extracted texts to their pure form, forming word pairs, annotating these pairs with part-of-speech tags, and recognizing the rules that govern switching in code-mixed text. We created three models, viz. a baseline model, a lexicon-based model, and a machine learning-based model, and measured the accuracy of each.
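
As a rough illustration of what a switching point and its part-of-speech pattern look like, the following minimal Python sketch finds adjacent word pairs whose language tags differ and counts the POS-tag pairs at those points. It is not the chapter's implementation: the example sentence, the language tags (`hi`, `en`), the POS labels, and the helper names `switch_points` and `pos_patterns` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

# A token is (word, language_tag, pos_tag).
# All tags here are hand-assigned for illustration, not taken from the chapter's data.
Token = Tuple[str, str, str]


def switch_points(tokens: List[Token]) -> List[Tuple[Token, Token]]:
    """Return adjacent token pairs whose language tags differ."""
    return [
        (left, right)
        for left, right in zip(tokens, tokens[1:])
        if left[1] != right[1]
    ]


def pos_patterns(pairs: List[Tuple[Token, Token]]) -> Counter:
    """Count the (left POS, right POS) pattern at each switching point."""
    return Counter((left[2], right[2]) for left, right in pairs)


# Hypothetical romanized Hindi-English sentence: "mujhe yeh movie bahut pasand hai"
sentence: List[Token] = [
    ("mujhe", "hi", "PRON"),
    ("yeh", "hi", "DET"),
    ("movie", "en", "NOUN"),
    ("bahut", "hi", "ADV"),
    ("pasand", "hi", "ADJ"),
    ("hai", "hi", "VERB"),
]

pairs = switch_points(sentence)
patterns = pos_patterns(pairs)

for (left_pos, right_pos), count in patterns.items():
    print(f"{left_pos} -> {right_pos}: {count}")
```

Aggregating such counts over a corpus is one simple way to surface candidate switching patterns; the models described in the abstract may work quite differently.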

Acknowledgements

This work is supported by Media Lab Asia, MeitY, Government of India, under the Visvesvaraya Ph.D. Scheme for Electronics & IT.

Author information

Authors and Affiliations

Jadavpur University, Kolkata, India
Sainik Kumar Mahata & Dipankar Das
Rajasthan Technical University, Kota, India
Sushnat Makhija & Ayushi Agnihotri

Corresponding author

Correspondence to Sainik Kumar Mahata.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, West Bengal, India
Debika Bhattacharya

About this paper

Cite this paper

Mahata, S.K., Makhija, S., Agnihotri, A., Das, D. (2020). Analyzing Code-Switching Rules for English–Hindi Code-Mixed Text. In: Mandal, J., Bhattacharya, D. (eds) Emerging Technology in Modelling and Graphics. Advances in Intelligent Systems and Computing, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-7403-6_14

DOI: https://doi.org/10.1007/978-981-13-7403-6_14
Published: 17 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7402-9
Online ISBN: 978-981-13-7403-6
eBook Packages: Engineering, Engineering (R0)
