Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System

  • Conference paper
Statistical Language and Speech Processing (SLSP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Abstract

This paper proposes a corpus-based method for information extraction and opinion mining. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system.

Our method is based on thorough corpus analysis and automatic selection of machine learning models and feature sets. We also pay special attention to the verification of statistical significance.
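
The notes to this paper name the non-parametric Holm-Bonferroni test as the basis for ranking classifiers. The following is a minimal Python sketch of the generic Holm-Bonferroni step-down procedure, not the authors' code; the function name `holm_bonferroni`, the significance level of 0.05 and the example p-values are assumptions made purely for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Generic Holm-Bonferroni step-down procedure (illustrative sketch).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        # The k-th smallest p-value (k = step + 1) is compared
        # with alpha / (m - k + 1) = alpha / (m - step).
        if p_values[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            # Once one hypothesis is retained, all remaining ones are retained.
            break
    return rejected


# Hypothetical p-values from pairwise comparisons against a control classifier.
example_p_values = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(example_p_values)
```

With these example values, the two smallest p-values pass their thresholds (0.05/4 and 0.05/3), while the third (0.03) exceeds 0.05/2, so the procedure stops there and retains the remaining hypotheses.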

According to the results of the research, Naive Bayes models perform well at classifying sentiment with respect to a restaurant aspect, while Logistic Regression is good at deciding on the relevance of a user’s review.

The proposed approach can be applied to similar domains, such as hotel reviews, where the data consist of colloquial unstructured texts (in contrast with domains such as technical products or books), and to other languages with rich morphology and free word order.

Notes

  1. http://scikit-learn.org

  2. The classifier (a “model + feature set” combination) with the highest rank does not necessarily achieve the highest average weighted F1 score. The classes 0, 1, 2 and 3 assigned to the classifiers in this paper are based on their ranks (according to the non-parametric Holm-Bonferroni test), not on their F1 scores.

  3. The Baseline feature set is considered the simplest and Extended_All the most complex. MNB and NB are considered the simplest models, Perceptron a more complex one, and LogReg and linear SVM the most complex (in fact, both are similar to Perceptron, but their training is more computationally expensive [5]). MNB and NB are considered equally simple, as are LogReg and linear SVM. A simple model with complex features is considered simpler than a complex model with simple (e.g., baseline) features.
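
The ordering described in note 3 can be sketched as a lexicographic sort key in which model complexity dominates feature-set complexity. This is a minimal illustration, not the authors' implementation: the tier numbers, the identifier `LinearSVM` and the helper names are hypothetical, and only the two feature sets named in the note (Baseline and Extended_All) are included.

```python
# Complexity tiers following note 3; a lower number means a simpler model.
# The numeric values and the "LinearSVM" label are illustrative assumptions.
MODEL_TIER = {"MNB": 0, "NB": 0, "Perceptron": 1, "LogReg": 2, "LinearSVM": 2}

# Only the two extremes are named in the note; any intermediate
# feature sets would need tiers between these two values.
FEATURE_TIER = {"Baseline": 0, "Extended_All": 1}


def simplicity_key(classifier):
    """Sort key for a (model, feature_set) pair: the model tier is compared first."""
    model, feature_set = classifier
    return (MODEL_TIER[model], FEATURE_TIER[feature_set])


def simplest(classifiers):
    """Return the simplest classifier; among equally simple ones, the first listed."""
    return min(classifiers, key=simplicity_key)


# Hypothetical candidates that are assumed to share the same rank.
candidates = [
    ("LogReg", "Baseline"),
    ("MNB", "Extended_All"),
    ("Perceptron", "Baseline"),
]
choice = simplest(candidates)
```

Because the model tier comes first in the key, `("MNB", "Extended_All")` is preferred over `("LogReg", "Baseline")`, which matches the last sentence of the note.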

References

  1. Aston, N., Liddle, J., Hu, W.: Twitter sentiment in data streams with perceptron. J. Comput. Commun. 2, 11–16 (2014)

  2. Bakliwal, A., Patil, A., Arora, P., Varma, V.: Towards enhanced opinion classification using NLP techniques. In: Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP), IJCNLP, pp. 101–107 (2011)

  3. Benamara, F., Cesarano, C., Picariello, A., Reforgiato, D., Subrahmanian, V.S.: Sentiment analysis: adjectives and adverbs are better than adjectives alone. In: Proceedings of the International Conference on Weblogs and Social Media (ICWSM) (2007)

  4. Bermingham, A., Smeaton, A.: Classifying sentiment in microblogs: is brevity an advantage? In: Proceedings of the International Conference on Information and Knowledge Management (CIKM) (2010)

  5. Collobert, R., Bengio, S.: Links between Perceptrons, MLPs and SVMs. In: Proceedings of the 21st International Conference on Machine Learning (2004)

  6. Das, S.R., Chen, M.Y.: Yahoo! for Amazon: sentiment parsing from small talk on the web. Manage. Sci. 53(9), 1375–1388 (2007)

  7. Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528 (2003)

  8. Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using twitter hashtags and smileys. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 241–249. Association for Computational Linguistics (2010)

  9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

  10. Devitt, A., Ahmad, K.: Is there a language of sentiment? An analysis of lexical resources for sentiment analysis. Lang. Resour. Eval. 47(2), 475–511 (2013)

  11. Emadzadeh, E., Nikfarjam, A., Ghauth, K.I., Why, N.K.: Learning materials recommendation using a hybrid recommender system with automated keyword extraction. World Appl. Sci. J. 9(11), 1260–1271 (2010)

  12. Gatterbauer, W., Bohunsky, P., Herzog, M., Krüpl, B., Pollak, B.: Towards domain-independent information extraction from web tables. In: Proceedings of the 16th International Conference on World Wide Web, pp. 71–80 (2007)

  13. Iman, R.L., Davenport, J.M.: Approximations of the critical region of the Friedman statistic. Commun. Stat. 18, 571–595 (1980)

  14. Kennedy, A., Inkpen, D.: Sentiment classification of movie reviews using contextual valence shifters. Comput. Intell. 22(2), 110–125 (2006)

  15. Kotelnikov, M., Klekovkina, M.: The automatic sentiment text classification method based on emotional vocabulary. In: RCDL’2012 (2012)

  16. Leksin, V.A., Nikolenko, S.I.: Semi-supervised tag extraction in a web recommender system. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 206–212. Springer, Heidelberg (2013)

  17. Li, Y., Nie, J., Zhang, Y., Wang, B., Yan, B., Weng, F.: Contextual recommendation based on text mining. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010): Poster Volume, pp. 692–700 (2010)

  18. Liu, J., Seneff, S.: Review sentiment scoring via a parse-and-paraphrase paradigm. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, pp. 161–169 (2009)

  19. Marchand, M., Ginsca, A.L., Besançon, R., Mesnard, O.: [LVIC-LIMSI]: using syntactic features and multi-polarity words for sentiment analysis in twitter. In: Proceedings of the 7th International Workshop on Semantic Evaluation, pp. 418–424 (2013)

  20. Narayanan, V., Arora, I., Bhatia, A.: Fast and accurate sentiment classification using an enhanced Naive Bayes model. In: Yin, H., Tang, K., Gao, Y., Klawonn, F., Lee, M., Weise, T., Li, B., Yao, X. (eds.) IDEAL 2013. LNCS, vol. 8206, pp. 194–201. Springer, Heidelberg (2013)

  21. Naw, N., Hlaing, E.E.: Relevant words extraction method for recommendation system. Int. J. Emer. Technol. Adv. Eng. 3(1), 680–685 (2013)

  22. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86 (2002)

  23. Pak, A., Paroubek, P.: Language independent approach to sentiment analysis. Komp’uternaya Lingvistika i Intellektualnie Tehnologii: po materialam ezhegodnoy mezhdunarodnoy konferencii “Dialog”, vol. 11(18), RGHU, Moscow, pp. 37–50 (2012)

  24. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)

  25. Park, D.H., Kim, H.K., Kim, J.K.: A literature review and classification of recommender systems research. Soc. Sci. 5, 290–294 (2011)

  26. Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web, LNCS, vol. 4321, pp. 325–341. Springer, Heidelberg (2007)

  27. Pronoza, E., Yagunova, E., Lyashin, A.: Restaurant information extraction for the recommendation system. In: Proceedings of the 2nd Workshop on Social and Algorithmic Issues in Business Support: “Knowledge Hidden in Text”, LTC’2013 (2013)

  28. Ricci, F., Rokach, L., Shapira, B., Kantor, P.: Recommender Systems Handbook. Springer, New York (2011)

  29. Saif, H.: Sentiment analysis of microblogs. Mining the New World. Technical Report KMI-12-2 (2012)

  30. Sarawagi, S.: Information extraction. Found. Trends Databases 1(3), 261–377 (2007)

  31. Shah, K., Munshi, N., Reddy, P.: Sentiment Analysis and Opinion Mining of Microblogs. In: University of Illinois at Chicago, Course CS 583 - Data Mining and Text Mining (2013). http://www.cs.uic.edu/~preddy/dm1.pdf

  32. Sharma, A., Dey, S.: An artificial neural network based approach for sentiment analysis of opinionated text. In: Proceedings of the 2012 ACM Research in Applied Computation Symposium, pp. 37–42 (2012)

  33. Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., Chanona-Hernández, L.: Syntactic n-grams as machine learning features for natural language processing. Expert Syst. Appl. 41(3), 853–860 (2014)

  34. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (2013)

  35. Turmo, J., Ageno, A., Català, N.: Adaptive information extraction. ACM Comput. Surv. 38(2), 3 (2006)

  36. Turney, P.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 417–424 (2002)

  37. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), vol. 2, pp. 90–94 (2012)

Acknowledgement

The authors acknowledge Saint-Petersburg State University for a research grant 30.38.305.2014.

Author information

Correspondence to Ekaterina Pronoza or Elena Yagunova.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pronoza, E., Yagunova, E., Volskaya, S. (2014). Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_21

  • DOI: https://doi.org/10.1007/978-3-319-11397-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11396-8

  • Online ISBN: 978-3-319-11397-5

  • eBook Packages: Computer Science, Computer Science (R0)
