Heterogeneous classifier ensemble for sentiment analysis of Bengali and Hindi tweets

Sādhanā

Abstract

Sentiment analysis is an essential step in analysing social media texts such as tweets and other posts on micro-blogging sites. Its basic step is sentiment polarity detection, which determines whether a given piece of social media text is positive, negative or neutral. In this paper, we present an approach that combines heterogeneous classifiers in an ensemble for sentiment polarity detection in Bengali and Hindi tweets. The proposed method builds an ensemble of three different base classifiers, each of which uses a different feature set. We also incorporate an external knowledge base, a sentiment lexicon, to augment tweet words with the sentiment polarity information retrieved from it. Experimental results show that the proposed heterogeneous ensemble model is effective for sentiment polarity detection in both Bengali and Hindi, and that our system outperforms the existing Bengali and Hindi sentiment classification systems with which it is compared.
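The abstract names two ingredients without spelling them out: base classifiers that each see a different feature set, and augmentation of tweet words with polarity information from a sentiment lexicon. The Python sketch below illustrates both under loudly stated assumptions. The abstract does not say which learners the paper uses or how their outputs are combined, so the three members here (multinomial Naive Bayes on word unigrams, multinomial Naive Bayes on character trigrams, and a nearest-centroid classifier on lexicon-augmented words), the majority-vote combiner, the `_POS_`/`_NEG_` tag tokens, the toy `LEXICON` and the romanized training tweets are all hypothetical stand-ins, not the paper's actual configuration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a real Bengali/Hindi sentiment lexicon.
LEXICON = {"accha": "POS", "mast": "POS", "bura": "NEG", "ganda": "NEG"}

def word_features(text):
    """Feature view 1: word unigram counts."""
    return Counter(text.split())

def char_features(text, n=3):
    """Feature view 2: character n-grams of each word, padded with '<' and '>'."""
    grams = Counter()
    for tok in text.split():
        padded = f"<{tok}>"
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def lexicon_features(text, lexicon=LEXICON):
    """Feature view 3: words augmented with polarity tags looked up in the lexicon."""
    feats = Counter()
    for tok in text.split():
        feats[tok] += 1
        if tok in lexicon:
            feats[f"_{lexicon[tok]}_"] += 1
    return feats

class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, feats, labels):
        self.labels = sorted(set(labels))
        self.prior, self.n_docs = Counter(labels), len(labels)
        self.counts = defaultdict(Counter)
        for f, y in zip(feats, labels):
            self.counts[y].update(f)
        self.vocab = set().union(*feats)
        return self

    def predict(self, f):
        def log_score(y):
            denom = sum(self.counts[y].values()) + len(self.vocab)
            s = math.log(self.prior[y] / self.n_docs)
            for tok, c in f.items():
                if tok in self.vocab:  # features never seen in training are ignored
                    s += c * math.log((self.counts[y][tok] + 1) / denom)
            return s
        return max(self.labels, key=log_score)

class NearestCentroid:
    """Rocchio-style classifier: the class whose centroid is most cosine-similar wins."""
    def fit(self, feats, labels):
        # Summed vectors point the same way as mean vectors, and cosine ignores scale.
        self.centroids = defaultdict(Counter)
        for f, y in zip(feats, labels):
            self.centroids[y].update(f)
        self.labels = sorted(self.centroids)
        return self

    def predict(self, f):
        def cosine(a, b):
            dot = sum(v * b[k] for k, v in a.items())
            norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0
        return max(self.labels, key=lambda y: cosine(f, self.centroids[y]))

def majority_vote(votes):
    """Most frequent label; ties go to the earliest voter (most_common keeps first-seen order)."""
    return Counter(votes).most_common(1)[0][0]

class HeterogeneousEnsemble:
    def __init__(self):
        # Each member pairs its own feature view with its own base learner.
        self.members = [(word_features, MultinomialNaiveBayes()),
                        (char_features, MultinomialNaiveBayes()),
                        (lexicon_features, NearestCentroid())]

    def fit(self, texts, labels):
        for view, clf in self.members:
            clf.fit([view(t) for t in texts], labels)
        return self

    def predict(self, text):
        return majority_vote([clf.predict(view(text)) for view, clf in self.members])

# Hypothetical romanized toy tweets; real data would be in Bengali or Devanagari script.
train = [("accha mast", "pos"), ("mast accha", "pos"),
         ("bura ganda", "neg"), ("ganda bura", "neg"),
         ("kal ghar", "neu"), ("ghar kal", "neu")]
model = HeterogeneousEnsemble().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("accha mast"))  # pos
```

Because each member sees a different view of the same tweet, their errors are less likely to coincide, which is the usual motivation for combining them. With three voters and three classes a three-way split is possible; this sketch simply falls back to the first member's vote in that case.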

Figures 1–11

Notes

  1. http://amitavadas.com/sentiwordnet.php

  2. http://www.cs.waikato.ac.nz/ml/weka/

  3. https://radimrehurek.com/gensim/models/word2vec.html

Acknowledgements

This research work is supported by the project titled “Indian Social Media Sensor: An Indian Social Media Text Mining System for Topic Detection, Topic Sentiment Analysis and Opinion Summarization” funded by the Department of Science and Technology, Government of India, under the SERB scheme.

Author information

Corresponding author

Correspondence to Kamal Sarkar.

About this article

Cite this article

Sarkar, K. Heterogeneous classifier ensemble for sentiment analysis of Bengali and Hindi tweets. Sādhanā 45, 196 (2020). https://doi.org/10.1007/s12046-020-01424-z

  • DOI: https://doi.org/10.1007/s12046-020-01424-z
