Abstract
Rich online consumer reviews (OCR) can be mined to gain valuable insights, beneficial for both brands and future buyers. Recently, aspect based sentiment classification have shown excellent results for fine grained sentiment analysis of OCR. However, there are only few studies so far that rely on both explicitly deriving sentiment using syntactic features, and capturing implicit contextual word relations for the task of aspect based sentiment classification. In this paper, we propose a novel method: Hybrid Attribute Based Sentiment Classification (HABSC) with the aim to derive sentiment orientation of OCR by capturing implicit word relations and incorporating domain specific knowledge. First, we detect the most frequent bigrams and trigrams in the corpus, followed by POS tagging to retain aspect descriptions and opinion words. Then, we employ TFIDF (term frequency inverse document frequency) to represent each document, followed by automatically extracting optimal number of topics in the given corpus. All the adjectives and adverbs are labelled using domain specific knowledge and pre-existing lexicons. Lastly, we find sentiment orientation of each review under the assumption that each review is a mixture of weighted and sentiment labelled attributes. We test the efficiency of our method using datasets from two different domains: hotel reviews from TripAdvisor.com and mobile phone reviews from Amazon.com. Results show that, the classification accuracy of HABSC significantly exceeds various state-of-the-art methods including aspect-based sentiment classification and supervised classification using distributed word and paragraph vectors. Our method also exhibits less computational time as compared to distributed vectorization schemes.
Similar content being viewed by others
References
Abburi H, Akkireddy ESA, Gangashetti S, Mamidi R (2016) Multimodal sentiment analysis of telugu songs. In: SAAIP@ IJCAI, pp 48–52
Abdelwahab O, Elmaghraby A (2016) Uofl at semeval-2016 task 4: multi domain word2vec for twitter sentiment classification. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 164–170
Al-Amin M, Islam MS, Uzzal SD (2017) Sentiment analysis of bengali comments with word2vec and sentiment information of words. In: International conference on electrical, computer and communication engineering (ECCE). IEEE, pp 186-190
Alam MH, Ryu WJ, Lee S (2016) Joint multi-grain topic sentiment: modeling semantic aspects for online reviews. Inform Sci 339:206–223
Appel O, Chiclana F, Carter J, Fujita H (2018) Successes and challenges in developing a hybrid approach to sentiment analysis. Appl Intell 48(5):1176–1188
Bansal B, Srivastava S (2018) Sentiment classification of online consumer reviews using word vector representations. Procedia Computer Science 132:1147–1153
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
Cerón-Guzmán JA, León-Guzmán E (2016) A sentiment analysis system of spanish tweets and its application in Colombia 2014 presidential election. In: 2016 IEEE international conferences on big data and cloud computing (BDCloud), social computing and networking (socialcom), sustainable computing and communications (sustaincom)(BDCloud-socialcom-sustaincom), pp 250–257
Chen R, Xu W (2017) The determinants of online customer ratings: a combined domain ontology and topic text analytics approach. Electron Commer Res 17(1):31–50
Chen R, Zheng Y, Xu W, Liu M, Wang J (2018) Secondhand seller reputation in online markets: a text analytics framework. Decis Support Syst 108:96–106
Dataset (2016) Amazon mobile review dataset. https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones/data. Online; Accessed Nov 2017
Giatsoglou M, Vozalis MG, Diamantaras K, Vakali A, Sarigiannidis G, Chatzisavvas KC (2017) Sentiment analysis leveraging emotions and word embeddings. Expert Syst Appl 69:214–224
Hogenboom A, Heerschop B, Frasincar F, Kaymak U, de Jong F (2014) Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decis Support Syst 62:43–53
Hu M, Liu B (2004) Mining opinion features in customer reviews. In: AAAI, vol 4. pp 755-760
Jiang S, Lewris J, Voltmer M, Wang H (2016) Integrating rich document representations for text classification. In: 2016 IEEE systems and information engineering design symposium (SIEDS). IEEE, 303-308
Jiang Y, Song X, Harrison J, Quegan S, Maynard D (2017) Comparing attitudes to climate change in the media using sentiment analysis based on latent dirichlet allocation. In: Proceedings of the 2017 EMNLP workshop natural language processing meets journalism, pp 25–30
Jo Y, Oh AH (2011) Aspect and sentiment unification model for online review analysis. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, pp 815–824
Karami A, Gangopadhyay A, Zhou B, Kharrazi H (2017) Fuzzy approach topic discovery in health and medical corpora. Int J Fuzzy Syst 20(4):1334–1345
Karami A, Dahl AA, Turner-McGrievy G, Kharrazi H, Shaw G (2018) Characterizing diabetes, diet, exercise, and obesity comments on twitter. Int J Inf Manage 38(1):1–6
Kim HK, Kim M (2016) Model-induced term-weighting schemes for text classification. Appl Intell 45 (1):30–43
Koltcov S, Koltsova O, Nikolenko S (2014) Latent dirichlet allocation: stability and applications to studies of user-generated content. In: Proceedings of the 2014 ACM conference on web science. ACM, pp 161-165
Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188–1196
Li G, Liu F (2014) Sentiment analysis based on clustering: a framework in improving accuracy and recognizing neutral opinions. Appl Intell 40(3):441–452
Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM conference on Information and knowledge management. ACM, pp 375–384
Liu B (2012) Sentiment analysis and opinion mining. Synthesis Lectures On Human Language Technologies 5(1):1–167
Liu B, Hu M, Cheng J (2005) Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th international conference on world wide web. ACM, pp 342–351
Liu H (2017) Sentiment analysis of citations using word2vec. arXiv:170400177
Liu X, Burns AC, Hou Y (2017) An investigation of brand-related user-generated content on twitter. J Advert 46(2):236–247
Liu Y, Jiang C, Zhao H (2018) Using contextual features and multi-view ensemble learning in product defect identification from online discussion forums. Decis Support Syst 105:1–12
Ma S, Zhang C, He D (2016) Document representation methods for clustering bilingual documents. Proceedings of the Association for Information Science and Technology 53(1):1–10
Mei Q, Ling X, Wondra M, Su H, Zhai C (2007) Topic sentiment mixture: modeling facets and opinions in weblogs. In: Proceedings of the 16th international conference on world wide web. ACM, pp 171–180
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:13013781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Mikolov T, Wt Yih, Zweig G (2013) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: human language technologies, pp 746-751
Nielsen FÅ (2011) A new anew: evaluation of a word list for sentiment analysis in microblogs. arXiv:11032903
Panichella A, Dit B, Oliveto R, Di Penta M, Poshynanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 522–531
Pham DH, Le AC (2017) Learning multiple layers of knowledge representation for aspect based sentiment analysis. Data Knowl Eng
Qiang J, Li Y, Yuan Y, Liu W (2018) Snapshot ensembles of non-negative matrix factorization for stability of topic modeling. Appl Intell:1–13
Qiao Z, Zhang X, Zhou M, Wang GA, Fan W (2017) A domain oriented lda model for mining product defects from online customer reviews
Rehurek R. Gensim. https://radimrehurek.com/gensim/models/phrases.html. Last accessed Nov 2017
Rehurek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks. Citeseer
Salehan M, Kim DJ (2016) Predicting the performance of online consumer reviews: a sentiment mining approach to big data analytics. Decis Support Syst 81:30–40
Sanguansat P (2016) Paragraph2vec-based sentiment analysis on social media for business in thailand. In: 2016 8th international conference on knowledge and smart technology (KST). IEEE, pp 175–178
Schwenk H (2007) Continuous space language models. Comput Speech Lang 21(3):492–518
Spacy https://spacy.io. Last accessed Nov 2017
Tripathy A, Agrawal A, Rath SK (2016) Classification of sentiment reviews using n-gram machine learning approach. Expert Syst Appl 57:117–126
Wang T, Cai Y, Hf Leung, Lau RY, Li Q, Min H (2014) Product aspect extraction supervised with online domain knowledge. Knowl-Based Syst 71:86–100
Wang W, Wang H, Song Y (2017) Ranking product aspects through sentiment analysis of online reviews. J Exp Theor Artif Intell 29(2):227–246
Wang Z, Gu S, Xu X (2018) Gslda: lda-based group spamming detection in product reviews. Appl Intell 48(9):3094–3107
Xianghua F, Guo L, Yanyan G, Zhiqiang W (2013) Multi-aspect sentiment analysis for chinese online social reviews based on topic modeling and hownet lexicon. Knowl-Based Syst 37:186–195
Xin Y, Yang J, Xie ZQ, Zhang JP (2015) An overlapping semantic community detection algorithm base on the arts multiple sampling models. Expert Syst Appl 42(7):3420–3432
Yan X, Guo J, Lan Y, Cheng X (2013) A biterm topic model for short texts. In: Proceedings of the 22nd international conference on world wide web. ACM, pp 1445–1456
Yao Y, Li X, Liu X, Liu P, Liang Z, Zhang J, Mai K (2017) Sensing spatial distribution of urban land use by integrating points-of-interest and google word2vec model. Int J Geogr Inf Sci 31(4):825–848
Yu D, Mu Y, Jin Y (2017) Rating prediction using review texts with underlying sentiments. Inf Process Lett 117:10–18
Zainuddin N, Selamat A, Ibrahim R (2017) Hybrid sentiment classification on twitter aspect-based sentiment analysis. Appl Intell 48(5):1–15
Zhang D, Xu H, Su Z, Xu Y (2015) Chinese comments sentiment classification based on word2vec and svmperf. Expert Syst Appl 42(4):1857–1863
Zirn C, Stuckenschmidt H (2014) Multidimensional topic analysis in political texts. Data Knowl Eng 90:38–53
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Authors and Affiliations
Corresponding author
Additional information
Availability of data and materials
The datasets analysed during the current study, are available in the Zenodo repository: Amazon Mobile Review Dataset [https://doi.org/10.5281/zenodo.1211639] and TripAdvisor Hotel Review Dataset [https://doi.org/10.5281/zenodo.1219899]. The datasets were originally derived from [11] and [4] respectively.
Rights and permissions
About this article
Cite this article
Bansal, B., Srivastava, S. Hybrid attribute based sentiment classification of online reviews for consumer intelligence. Appl Intell 49, 137–149 (2019). https://doi.org/10.1007/s10489-018-1299-7
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-018-1299-7