Analyzing Code-Switching Rules for English–Hindi Code-Mixed Text

  • Conference paper
Emerging Technology in Modelling and Graphics

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 937)

Abstract

In this work, we propose an efficient and less resource-intensive strategy for parsing and analyzing switching points in code-mixed data. Specifically, we explore the rules of code-switching in Hindi–English code-mixed data. The work involves extracting code-mixed text, translating the extracted texts into their pure form, forming word pairs, annotating these pairs with Parts-of-Speech tags, and recognizing the rules that govern switching in code-mixed text. We created three models, viz. a baseline model, a lexicon-based model, and a machine learning-based model, and measured the accuracy of each.
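
The idea of locating switching points and summarizing them as Parts-of-Speech patterns can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence, the language tags (`HI`, `EN`), the POS tags, and the function names `switch_points` and `switch_rules` are all hypothetical and hand-assigned for illustration. It assumes each token already carries a language label and a POS tag, and it simply counts which POS pairs occur where the language changes.

```python
from collections import Counter
from typing import List, Tuple

# One token = (word, language tag, POS tag).
# All tags below are hand-assigned toy values, not output of the paper's models.
Token = Tuple[str, str, str]


def switch_points(tokens: List[Token]) -> List[Tuple[Token, Token]]:
    """Return adjacent token pairs whose language tags differ."""
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if a[1] != b[1]]


def switch_rules(sentences: List[List[Token]]) -> Counter:
    """Count (POS before, POS after, direction) patterns at switching points."""
    rules: Counter = Counter()
    for sentence in sentences:
        for before, after in switch_points(sentence):
            direction = f"{before[1]}->{after[1]}"
            rules[(before[2], after[2], direction)] += 1
    return rules


# Hypothetical romanized Hindi-English sentence: "main kal movie dekhne gaya"
sample = [
    [
        ("main", "HI", "PRON"),
        ("kal", "HI", "ADV"),
        ("movie", "EN", "NOUN"),
        ("dekhne", "HI", "VERB"),
        ("gaya", "HI", "VERB"),
    ]
]

if __name__ == "__main__":
    for pattern, count in switch_rules(sample).items():
        print(pattern, count)
```

In this toy sentence the language changes twice, once going into English before "movie" and once returning to Hindi after it, so two POS-pair patterns are counted. A real pipeline would obtain the language and POS tags from taggers or lexicons rather than by hand.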

Notes

  1. http://dasdipankar.com/ICON_NLP_Tool_Contest_2017/HIEN.json

  2. https://www.ashtangayoga.info/sanskrit/transliteration/transliteration-tool/

  3. http://www.nltk.org

  4. http://sivareddy.in/downloads

  5. http://conceptnet.io/

Acknowledgements

This work is supported by Media Lab Asia, MeitY, Government of India, under the Visvesvaraya Ph.D. Scheme for Electronics & IT.

Author information

Corresponding author

Correspondence to Sainik Kumar Mahata.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mahata, S.K., Makhija, S., Agnihotri, A., Das, D. (2020). Analyzing Code-Switching Rules for English–Hindi Code-Mixed Text. In: Mandal, J., Bhattacharya, D. (eds) Emerging Technology in Modelling and Graphics. Advances in Intelligent Systems and Computing, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-7403-6_14

  • DOI: https://doi.org/10.1007/978-981-13-7403-6_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-7402-9

  • Online ISBN: 978-981-13-7403-6

  • eBook Packages: Engineering, Engineering (R0)
