
A multi-stage method for content classification and opinion mining on weblog comments

Annals of Operations Research

Abstract

In this paper, we illustrate how to combine supervised machine learning algorithms and unsupervised learning techniques for sentiment analysis and opinion mining. To this end, we describe a multi-stage method for the automatic detection of different opinion trends. We tested the proposal on real textual data: comments posted on a weblog about organizational and administrative affairs in a public educational institution. Because the tool can extract valuable knowledge from the opinion streams that commenters create, it could be straightforwardly extended to other settings, for example, to detecting opinion trends in policy decision-making or electoral campaigns.
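The abstract describes the method only at a high level. As a rough illustration of how a supervised stage and an unsupervised stage can be chained, here is a hypothetical two-stage pipeline, not the authors' implementation: stage 1 assigns each comment a topic with a multinomial Naive Bayes classifier, and stage 2 runs k-means within each topic to group its comments into candidate opinion trends. Both algorithms are stand-ins chosen for brevity. The helper names (`tokenize`, `NaiveBayes`, `vectorize`, `kmeans`, `multi_stage`), the topic labels, and the sample comments are assumptions made for this sketch.

```python
# Hypothetical two-stage sketch (supervised topic classification, then
# unsupervised clustering); not the method described in the paper.
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a comment and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Stage 1 (supervised): multinomial Naive Bayes with Laplace smoothing.

    A stand-in classifier chosen for brevity; not necessarily the paper's choice.
    """

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.n = len(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.priors[label] / self.n)
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            return score

        return max(self.priors, key=log_score)


def vectorize(docs):
    """Term-frequency vectors over the vocabulary of `docs`."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in tokenize(d):
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors


def kmeans(vectors, k, iterations=20):
    """Stage 2 (unsupervised): plain k-means with squared Euclidean distance.

    Centroids are seeded with the first k vectors, so results are deterministic
    but sensitive to input order.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Move each non-empty centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def multi_stage(train_docs, train_labels, new_docs, k=2):
    """Classify new comments by topic, then split each topic into up to k opinion groups."""
    clf = NaiveBayes().fit(train_docs, train_labels)
    by_topic = defaultdict(list)
    for doc in new_docs:
        by_topic[clf.predict(doc)].append(doc)
    return {
        topic: list(zip(docs, kmeans(vectorize(docs), min(k, len(docs)))))
        for topic, docs in by_topic.items()
    }
```

For example, training on a few comments labeled `exams` or `library` and then calling `multi_stage(train_docs, train_labels, new_docs)` returns, for each predicted topic, the new comments paired with a cluster index. A realistic pipeline would likely use TF-IDF weights instead of raw counts and a better seeding scheme such as k-means++; both are omitted here to keep the sketch short.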


Fig. 1
Fig. 2


Notes

  1. http://www.quora.com.

  2. http://stackexchange.com/.

  3. http://answers.yahoo.com/.

  4. http://www.princeton.edu/main/campuslife/media/blogs/.

  5. http://blogs.berkeley.edu/.


Acknowledgements

This research was supported by grants from the Spanish Ministry of Science and Innovation, the Ministry of Industry, Tourism and Trade, and the Government of Madrid: RIESGOS-CM (Ref. CAM s2009/esp-1594), Agora.net, e-COLABORA, Corporate Community, Democracy4All, EDUCALAB (Ref. IPT-2011-1071-430000) and MyUniversity.

Author information

Correspondence to Felipe Ortega.

About this article

Cite this article

Alfaro, C., Cano-Montero, J., Gómez, J. et al. A multi-stage method for content classification and opinion mining on weblog comments. Ann Oper Res 236, 197–213 (2016). https://doi.org/10.1007/s10479-013-1449-6

  • DOI: https://doi.org/10.1007/s10479-013-1449-6
