Abstract
In this paper, a corpus-based information extraction and opinion mining method is proposed. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system.
Our method is based on thorough corpus analysis and automatic selection of machine learning models and feature sets. We also pay special attention to the verification of statistical significance.
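The abstract does not specify how statistical significance is verified. The reference list includes Demšar's work on comparing classifiers and the Iman-Davenport approximation, which suggests a Friedman test over average ranks. The sketch below shows that computation under this assumption. The input layout (one row per data set or fold, one column per classifier, higher score is better), the function names, and the toy scores are hypothetical and only illustrate the arithmetic.

```python
# Sketch: Friedman test over average ranks, with the Iman-Davenport correction.
# Assumed layout: scores[d][c] is the score (e.g., weighted F1) of classifier c
# on data set or fold d; higher is better.


def average_ranks(scores):
    """Average rank of each classifier (1 = best); tied scores share their mean rank."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        # Classifier indices ordered from best to worst score
        order = sorted(range(k), key=lambda c: -row[c])
        i = 0
        while i < k:
            j = i
            # Extend j over a run of tied scores
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
            for t in range(i, j + 1):
                totals[order[t]] += mean_rank
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_iman_davenport(scores):
    """Return (average ranks, Friedman chi-square, Iman-Davenport F statistic).

    F_F follows an F distribution with (k - 1) and (k - 1)(N - 1) degrees of
    freedom; it is undefined (division by zero) when every row ranks the
    classifiers identically.
    """
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
    f_f = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return ranks, chi2, f_f


# Invented toy scores: 4 folds x 3 classifiers
ranks, chi2, f_f = friedman_iman_davenport(
    [[0.80, 0.75, 0.70], [0.82, 0.78, 0.71], [0.79, 0.80, 0.72], [0.85, 0.77, 0.74]]
)
print(ranks, chi2, f_f)  # [1.25, 1.75, 3.0] 6.5 13.0
```

A large F_F relative to the critical value of the F distribution would indicate that the classifiers do not all perform equally, which is what justifies a post-hoc comparison between them.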
Our results show that Naive Bayes models perform well at classifying sentiment with respect to a given restaurant aspect, while Logistic Regression performs well at deciding whether a user’s review is relevant.
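To make the Naive Bayes side more concrete, here is a minimal multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing in pure Python. It is a sketch, not the authors' implementation: it classifies a whole token list rather than sentiment toward a specific restaurant aspect, it assumes reviews are already tokenized (for Russian, presumably lemmatized), and the toy English reviews are invented.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        freq = Counter(labels)
        self.log_prior = {c: math.log(freq[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens, c):
        """Unnormalized log P(c | tokens); tokens unseen in training are ignored."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab
        )

    def predict(self, tokens):
        # Class with the highest unnormalized log posterior
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


# Invented toy reviews (already tokenized) with sentiment labels
model = MultinomialNB().fit(
    [["tasty", "food", "friendly", "waiter"], ["cold", "soup", "rude", "waiter"],
     ["tasty", "dessert"], ["rude", "slow", "service"]],
    ["pos", "neg", "pos", "neg"],
)
print(model.predict(["tasty", "soup"]))   # pos
print(model.predict(["rude", "waiter"]))  # neg
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps a single unseen word-class pair from zeroing out a class.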
The proposed approach can be applied to similar domains, for example, hotel reviews, where the data consist of colloquial unstructured texts (in contrast to the domain of technical products, books, etc.), and to other languages with rich morphology and free word order.
Notes
- 2.
It should be noted that the classifier (a “model + feature set” combination) with the highest rank does not necessarily achieve the highest average weighted F1 score. The classes 0, 1, 2 and 3 assigned to the classifiers in this paper are based on their ranks (according to the non-parametric Holm-Bonferroni test), not on their F1 scores.
- 3.
The Baseline feature set is considered the simplest, and Extended_All the most complex. MNB and NB are considered the simplest models, Perceptron a more complex one, and LogReg and linear SVM the most complex (in fact, both are similar to Perceptron, but their training is more computationally expensive [5]). MNB and NB are considered equally simple, as are LogReg and linear SVM. A simple model with complex features is considered simpler than a complex model with simple (e.g., baseline) features.
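Notes 2 and 3 together suggest a selection rule: rank the classifiers, test which ones differ significantly from the best-ranked one, and prefer the simplest of the rest. The sketch below assumes Demšar's z statistic for comparing average ranks and a Holm step-down correction. The complexity levels, the level assigned to intermediate feature sets, the `LinearSVM` key, and the `pick_simplest` rule are hypothetical and not taken from the paper, which assigns classes 0 to 3 rather than selecting a single winner; the average ranks in the usage lines are invented.

```python
import math

# Hypothetical complexity levels following note 3 (lower = simpler).
MODEL_LEVEL = {"MNB": 0, "NB": 0, "Perceptron": 1, "LogReg": 2, "LinearSVM": 2}
FEATURE_LEVEL = {"Baseline": 0, "Extended_All": 2}  # other sets: assumed level 1


def simplicity_key(classifier):
    """Sort key for a (model, feature set) pair. Model complexity dominates,
    so a simple model with complex features beats a complex model with simple ones."""
    model, features = classifier
    return (MODEL_LEVEL[model], FEATURE_LEVEL.get(features, 1))


def holm_vs_best(avg_ranks, n_datasets, alpha=0.05):
    """Compare the best-ranked classifier with each other one using the z statistic
    on average ranks, then apply Holm's step-down correction.
    Returns (best, set of classifiers significantly worse than best)."""
    k = len(avg_ranks)
    best = min(avg_ranks, key=avg_ranks.get)  # lowest average rank is best
    se = math.sqrt(k * (k + 1) / (6.0 * n_datasets))
    pvals = []
    for clf, rank in avg_ranks.items():
        if clf != best:
            z = (rank - avg_ranks[best]) / se
            pvals.append((math.erfc(abs(z) / math.sqrt(2)), clf))  # two-sided p-value
    pvals.sort()
    m = len(pvals)
    worse = set()
    for i, (p, clf) in enumerate(pvals):
        if p >= alpha / (m - i):
            break  # step-down: stop at the first hypothesis that is not rejected
        worse.add(clf)
    return best, worse


def pick_simplest(avg_ranks, n_datasets, alpha=0.05):
    """Simplest classifier not significantly worse than the best-ranked one."""
    _, worse = holm_vs_best(avg_ranks, n_datasets, alpha)
    return min((c for c in avg_ranks if c not in worse), key=simplicity_key)


# Invented average ranks over 10 folds
ranks = {
    ("LogReg", "Extended_All"): 1.2,
    ("MNB", "Extended_All"): 1.9,
    ("Perceptron", "Baseline"): 3.0,
    ("NB", "Baseline"): 3.9,
}
print(holm_vs_best(ranks, 10))
print(pick_simplest(ranks, 10))  # ('MNB', 'Extended_All')
```

In this toy case the MNB combination is not significantly worse than the top-ranked LogReg combination, so the simplicity rule prefers it, which mirrors the point of note 2 that the top rank alone does not decide the choice.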
References
Aston, N., Liddle, J., Hu, W.: Twitter sentiment in data streams with perceptron. J. Comput. Commun. 2, 11–16 (2014)
Bakliwal, A., Patil, A., Arora, P., Varma, V.: Towards enhanced opinion classification using NLP techniques. In: Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP), IJCNLP, pp. 101–107 (2011)
Benamara, F., Cesarano, C., Picariello, A., Reforgiato, D., Subrahmanian, V.S.: Sentiment analysis: adjectives and adverbs are better than adjectives alone. In: Proceedings of the International Conference on Weblogs and Social Media (ICWSM) (2007)
Bermingham, A., Smeaton, A.: Classifying sentiment in microblogs: is brevity an advantage? In: Proceedings of the International Conference on Information and Knowledge Management (CIKM) (2010)
Collobert, R., Bengio, S.: Links between Perceptrons, MLPs and SVMs. In: Proceedings of the 21st International Conference on Machine Learning (2004)
Das, S.R., Chen, M.Y.: Yahoo! for Amazon: sentiment parsing from small talk on the web. Manage. Sci. 53(9), 1375–1388 (2007)
Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528 (2003)
Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using Twitter hashtags and smileys. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 241–249. Association for Computational Linguistics (2010)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Devitt, A., Ahmad, K.: Is there a language of sentiment? An analysis of lexical resources for sentiment analysis. Lang. Resour. Eval. 47(2), 475–511 (2013)
Emadzadeh, E., Nikfarjam, A., Ghauth, K.I., Why, N.K.: Learning materials recommendation using a hybrid recommender system with automated keyword extraction. World Appl. Sci. J. 9(11), 1260–1271 (2010)
Gatterbauer, W., Bohunsky, P., Herzog, M., Krüpl, B., Pollak, B.: Towards domain-independent information extraction from web tables. In: Proceedings of the 16th International Conference on World Wide Web, pp. 71–80 (2007)
Iman, R.L., Davenport, J.M.: Approximations of the critical region of the Friedman statistic. Commun. Stat. 18, 571–595 (1980)
Kennedy, A., Inkpen, D.: Sentiment classification of movie reviews using contextual valence shifters. Comput. Intell. 22(2), 110–125 (2006)
Kotelnikov, M., Klekovkina, M.: The automatic sentiment text classification method based on emotional vocabulary. In: RCDL’2012 (2012)
Leksin, V.A., Nikolenko, S.I.: Semi-supervised tag extraction in a web recommender system. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 206–212. Springer, Heidelberg (2013)
Li, Y., Nie, J., Zhang, Y., Wang, B., Yan, B., Weng, F.: Contextual recommendation based on text mining. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010): Poster Volume, pp. 692–700 (2010)
Liu, J., Seneff, S.: Review sentiment scoring via a parse-and-paraphrase paradigm. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, pp. 161–169 (2009)
Marchand, M., Ginsca, A.L., Besançon, R., Mesnard, O.: [LVIC-LIMSI]: using syntactic features and multi-polarity words for sentiment analysis in Twitter. In: Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 418–424 (2013)
Narayanan, V., Arora, I., Bhatia, A.: Fast and accurate sentiment classification using an enhanced Naive Bayes model. In: Yin, H., Tang, K., Gao, Y., Klawonn, F., Lee, M., Weise, T., Li, B., Yao, X. (eds.) IDEAL 2013. LNCS, vol. 8206, pp. 194–201. Springer, Heidelberg (2013)
Naw, N., Hlaing, E.E.: Relevant words extraction method for recommendation system. Int. J. Emer. Technol. Adv. Eng. 3(1), 680–685 (2013)
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86 (2002)
Pak, A., Paroubek, P.: Language independent approach to sentiment analysis. Komp’uternaya Lingvistika i Intellektualnie Tehnologii: po materialam ezhegodnoy mezhdunarodnoy konferencii “Dialog”, vol. 11(18), RGHU, Moscow, pp. 37–50 (2012)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
Park, D.H., Kim, H.K., Kim, J.K.: A literature review and classification of recommender systems research. Soc. Sci. 5, 290–294 (2011)
Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web. LNCS, vol. 4321, pp. 325–341. Springer, Heidelberg (2007)
Pronoza, E., Yagunova, E., Lyashin, A.: Restaurant information extraction for the recommendation system. In: Proceedings of the 2nd Workshop on Social and Algorithmic Issues in Business Support: “Knowledge Hidden in Text”, LTC’2013 (2013)
Ricci, F., Rokach, L., Shapira, B., Kantor, P.: Recommender Systems Handbook. Springer, New York (2011)
Saif, H.: Sentiment analysis of microblogs. Mining the New World. Technical Report KMI-12-2 (2012)
Sarawagi, S.: Information extraction. Found. Trends Databases 1(3), 261–377 (2007)
Shah, K., Munshi, N., Reddy, P.: Sentiment analysis and opinion mining of microblogs. In: University of Illinois at Chicago, Course CS 583 - Data Mining and Text Mining (2013). http://www.cs.uic.edu/~preddy/dm1.pdf
Sharma, A., Dey, S.: An artificial neural network based approach for sentiment analysis of opinionated text. In: Proceedings of the 2012 ACM Research in Applied Computation Symposium, pp. 37–42 (2012)
Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., Chanona-Hernández, L.: Syntactic n-grams as machine learning features for natural language processing. Expert Syst. Appl. 41(3), 853–860 (2014)
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (2013)
Turmo, J., Ageno, A., Català, N.: Adaptive information extraction. ACM Comput. Surv. 38(2), 3 (2006)
Turney, P.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 417–424 (2002)
Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), vol. 2, pp. 90–94 (2012)
Acknowledgement
The authors acknowledge Saint-Petersburg State University for research grant 30.38.305.2014.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Pronoza, E., Yagunova, E., Volskaya, S. (2014). Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_21
DOI: https://doi.org/10.1007/978-3-319-11397-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)