Heterogeneous classifier ensemble for sentiment analysis of Bengali and Hindi tweets

Sarkar, Kamal

doi:10.1007/s12046-020-01424-z

Published: 06 August 2020

Sādhanā, Volume 45, article number 196 (2020)

Kamal Sarkar (ORCID: orcid.org/0000-0002-0689-3976)

Abstract

Sentiment analysis is an essential step in analysing social media texts such as tweets and other posts on micro-blogging sites. Its basic step is sentiment polarity detection, which identifies whether an input piece of social media text is positive, negative or neutral. In this paper, we present an approach that combines heterogeneous classifiers in an ensemble for sentiment polarity detection in Bengali and Hindi tweets. Our proposed method constructs an ensemble of three different base classifiers, each using a different feature set. We also incorporate an external knowledge base, a sentiment lexicon, to augment tweet words with the sentiment polarity information retrieved from it. Experimental results show the effectiveness of the proposed heterogeneous ensemble model for sentiment polarity detection in both Bengali and Hindi. Our system outperforms the existing Bengali and Hindi sentiment classification systems with which it is compared.
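
The general idea of a heterogeneous ensemble over different feature sets can be illustrated with a minimal Python sketch. The abstract does not name the three base classifiers, their feature sets, or the rule used to combine them, so everything concrete here is an assumption made for illustration: two small Naive Bayes models (one on word unigrams, one on character trigrams), a simple lexicon-based voter, averaging of class probabilities as the combination rule, and a toy English lexicon and training set standing in for Bengali or Hindi data. It should not be read as the paper's actual system.

```python
import math
from collections import Counter

LABELS = ("negative", "neutral", "positive")

# Hypothetical toy sentiment lexicon (English stand-ins, not from the paper).
LEXICON = {"good": "positive", "great": "positive", "love": "positive",
           "bad": "negative", "awful": "negative", "hate": "negative"}

def words(text):
    return text.lower().split()

def char_trigrams(text):
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over one feature view."""
    def __init__(self, extract):
        self.extract = extract

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(self.extract(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, text):
        feats = self.extract(text)
        n = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / n)
            for f in feats:
                score += math.log((self.counts[c][f] + 1) / total)
            log_scores[c] = score
        # Normalise in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

class LexiconVoter:
    """Rule-based member: counts lexicon polarity tags attached to the words."""
    def fit(self, texts, labels):
        return self  # nothing to learn; the lexicon is external knowledge

    def predict_proba(self, text):
        tags = Counter(LEXICON.get(w) for w in words(text))
        scores = {"positive": 1 + tags["positive"],
                  "negative": 1 + tags["negative"],
                  "neutral": 1}
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

class AveragingEnsemble:
    """Combines members by averaging their class probabilities (assumed rule)."""
    def __init__(self, members):
        self.members = members

    def fit(self, texts, labels):
        for m in self.members:
            m.fit(texts, labels)
        return self

    def predict_proba(self, text):
        probs = [m.predict_proba(text) for m in self.members]
        return {c: sum(p.get(c, 0.0) for p in probs) / len(probs) for c in LABELS}

    def predict(self, text):
        p = self.predict_proba(text)
        return max(p, key=p.get)

# Toy training data (illustrative only).
train = [("good great movie", "positive"), ("love this great day", "positive"),
         ("bad awful service", "negative"), ("hate this awful day", "negative"),
         ("the train arrives today", "neutral"), ("meeting is at noon", "neutral")]
texts, labels = zip(*train)

ensemble = AveragingEnsemble(
    [NaiveBayes(words), NaiveBayes(char_trigrams), LexiconVoter()]
).fit(texts, labels)

print(ensemble.predict("great good love"))
print(ensemble.predict_proba("awful bad hate"))
```

Averaging probabilities is only one of several possible combination rules; majority voting or a learned meta-classifier would slot into `AveragingEnsemble.predict_proba` in the same way.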

Acknowledgements

This research work is supported by the project titled “Indian Social Media Sensor: An Indian Social Media Text Mining System for Topic Detection, Topic Sentiment Analysis and Opinion Summarization” funded by the Department of Science and Technology, Government of India, under the SERB scheme.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Kamal Sarkar

Corresponding author

Correspondence to Kamal Sarkar.

About this article

Cite this article

Sarkar, K. Heterogeneous classifier ensemble for sentiment analysis of Bengali and Hindi tweets. Sādhanā 45, 196 (2020). https://doi.org/10.1007/s12046-020-01424-z

Received: 24 July 2019
Revised: 12 January 2020
Accepted: 25 April 2020
Published: 06 August 2020
DOI: https://doi.org/10.1007/s12046-020-01424-z
