An enhanced sentiment dictionary for domain adaptation with multi-domain dataset in Tamil language (ESD-DA)

  • Methodologies and Application
  • Soft Computing

Abstract

Sentiment analysis mostly relies on dictionary-based approaches to recognize the polarity of terms in a review. However, in sentiment analysis across different domains, known as domain adaptation (DA), a sentiment lexicon often performs poorly, which leads to the feature mismatch problem. Many e-commerce sites now try to process reviews in their users' native languages. In this paper, we propose an enhanced dictionary in our native language (Tamil) that builds contextual relationships among the terms of multi-domain datasets in order to minimize the feature mismatch problem. The proposed dictionary uses both labeled and unlabeled data from the source domain and unlabeled data from the target domain. More precisely, the initial dictionary uses pointwise mutual information to calculate contextual weights; the final dictionary then estimates a rank score based on the importance of terms across all the reviews. This work aims to classify reviews of multiple target domains in Tamil using a unified dictionary with a large vocabulary. This extendible dictionary significantly improves the accuracy of DA compared with the baseline methods and handles many words across multiple domains with ease.
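
To make the pointwise mutual information (PMI) step concrete, the following is a minimal sketch that uses the standard definition PMI(t, c) = log2(P(t, c) / (P(t) P(c))) over labeled source-domain reviews. It is not the paper's exact formulation, which this preview does not show. The function names `pmi_scores` and `contextual_weight`, the labels `"pos"` and `"neg"`, the difference-of-PMI weight, and the English placeholder tokens (standing in for Tamil terms) are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(labeled_reviews):
    """Return {(term, label): PMI} from document-level co-occurrence counts.

    labeled_reviews is a list of (tokens, label) pairs. Only pairs that
    actually co-occur get a score (an unseen pair would be -infinity).
    """
    n_docs = len(labeled_reviews)
    term_counts = Counter()
    label_counts = Counter()
    joint_counts = Counter()
    for tokens, label in labeled_reviews:
        label_counts[label] += 1
        for term in set(tokens):  # count each term once per review
            term_counts[term] += 1
            joint_counts[(term, label)] += 1

    scores = {}
    for (term, label), joint in joint_counts.items():
        p_joint = joint / n_docs
        p_term = term_counts[term] / n_docs
        p_label = label_counts[label] / n_docs
        scores[(term, label)] = math.log2(p_joint / (p_term * p_label))
    return scores


def contextual_weight(scores, term):
    """Hypothetical weight: PMI with "pos" minus PMI with "neg".

    Assumption: a pair that never co-occurs contributes 0.0 instead of
    -infinity, which keeps the toy example finite.
    """
    return scores.get((term, "pos"), 0.0) - scores.get((term, "neg"), 0.0)


# Toy source-domain reviews; placeholder tokens stand in for Tamil terms.
reviews = [
    (["good", "battery"], "pos"),
    (["good", "screen"], "pos"),
    (["bad", "battery"], "neg"),
    (["bad", "screen"], "neg"),
]
scores = pmi_scores(reviews)
for term in ["good", "bad", "battery"]:
    print(term, contextual_weight(scores, term))
```

In this toy corpus, a term that appears equally often in positive and negative reviews (here "battery") receives a weight of zero, while terms tied to one polarity receive a weight of the corresponding sign. A rank score of the kind described above would need a further term-importance step that is not sketched here.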

Notes

  1. https://www.britannica.com/topic/Tamil-language.

  2. https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers.

  3. http://www.cs.jhu.edu/~mdredze/code.php.

  4. https://www.indiaglitz.com/tamil-movie-reviews.

  5. http://www.cs.jhu.edu/~mdredze/code.php.

  6. http://en.wikipedia.org/wiki/Listofemoticons.

  7. https://translate.google.com/toolkit.

  8. https://github.com/AshokR/TamilNLP/wiki/POS-Tagger.

  9. https://github.com/AshokR/~/TamilStopWords.txt.

  10. https://github.com/krishna41999/TamilSentimentDictionary.

Acknowledgements

The authors thank the anonymous reviewers and our friends for their critiques and suggestions.

Author information

Corresponding author

Correspondence to K. Krishnakumari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sivasankar, E., Krishnakumari, K. & Balasubramanian, P. An enhanced sentiment dictionary for domain adaptation with multi-domain dataset in Tamil language (ESD-DA). Soft Comput 25, 3697–3711 (2021). https://doi.org/10.1007/s00500-020-05400-x

  • DOI: https://doi.org/10.1007/s00500-020-05400-x
