Abstract
Sentiment analysis is an essential step in analysing social media texts such as tweets and other posts on micro-blogging sites. Its basic step is sentiment polarity detection, which identifies whether an input piece of social media text is positive, negative or neutral. In this paper, we present an approach that combines heterogeneous classifiers in an ensemble for sentiment polarity detection in Bengali and Hindi tweets. The proposed method constructs an ensemble of three different base classifiers, each using a different feature set. We also incorporate an external knowledge base, a sentiment lexicon, to augment tweet words with the sentiment polarity information retrieved from it. Experimental results show the effectiveness of the proposed heterogeneous ensemble model for sentiment polarity detection in both Bengali and Hindi. Our system outperforms the existing Bengali and Hindi sentiment classification systems with which it is compared.
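As an illustration only, the general idea of lexicon augmentation followed by a three-member heterogeneous ensemble can be sketched as below. The abstract does not name the base classifiers, their feature sets, or the combination rule, so everything in this sketch is an assumption: the toy English lexicon `TOY_LEXICON`, the three rule-based stand-in classifiers, the `__POS__`/`__NEG__` tag format, and the use of majority voting with a neutral fallback on ties are all hypothetical and are not the method evaluated in the paper.

```python
from collections import Counter

# Hypothetical toy lexicon (English words for readability); the actual
# Bengali/Hindi sentiment lexicon used in the paper is not reproduced here.
TOY_LEXICON = {"good": "POS", "great": "POS", "bad": "NEG", "awful": "NEG"}
NEGATORS = {"not", "never", "no"}


def augment_with_lexicon(tokens, lexicon):
    """Insert a polarity tag after every token that appears in the lexicon."""
    augmented = []
    for tok in tokens:
        augmented.append(tok)
        tag = lexicon.get(tok.lower())
        if tag is not None:
            augmented.append(f"__{tag}__")
    return augmented


def _label(score):
    """Map a signed score to one of the three polarity labels."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def lexicon_count_classifier(augmented_tokens):
    """Stand-in classifier 1: counts the polarity tags added by augmentation."""
    return _label(augmented_tokens.count("__POS__")
                  - augmented_tokens.count("__NEG__"))


def emoticon_classifier(tokens):
    """Stand-in classifier 2: looks only at emoticon tokens."""
    pos = sum(t in {":)", ":D"} for t in tokens)
    neg = sum(t in {":(", ":'("} for t in tokens)
    return _label(pos - neg)


def negation_aware_classifier(tokens, lexicon):
    """Stand-in classifier 3: lexicon polarity, flipped right after a negator."""
    score, negate = 0, False
    for tok in tokens:
        low = tok.lower()
        if low in NEGATORS:
            negate = True
            continue
        tag = lexicon.get(low)
        if tag is not None:
            value = 1 if tag == "POS" else -1
            score += -value if negate else value
        negate = False
    return _label(score)


def majority_vote(labels, fallback="neutral"):
    """Return the most frequent label; a tie for first place gives the fallback."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


def classify_tweet(text, lexicon=TOY_LEXICON):
    """Combine the three stand-in classifiers, each using its own feature view."""
    tokens = text.split()
    augmented = augment_with_lexicon(tokens, lexicon)
    votes = [
        lexicon_count_classifier(augmented),
        emoticon_classifier(tokens),
        negation_aware_classifier(tokens, lexicon),
    ]
    return majority_vote(votes)
```

In a realistic setting each stand-in function would be replaced by a trained model over its own feature representation; the point of the sketch is only that the members see different views of the same tweet and that their outputs are merged into one polarity label.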
Acknowledgements
This research work is supported by the project titled “Indian Social Media Sensor: An Indian Social Media Text Mining System for Topic Detection, Topic Sentiment Analysis and Opinion Summarization” funded by the Department of Science and Technology, Government of India, under the SERB scheme.
Cite this article
Sarkar, K. Heterogeneous classifier ensemble for sentiment analysis of Bengali and Hindi tweets. Sādhanā 45, 196 (2020). https://doi.org/10.1007/s12046-020-01424-z