Abstract
In this paper, we illustrate how to combine supervised machine learning algorithms and unsupervised learning techniques for sentiment analysis and opinion mining. To this end, we describe a multi-stage method for the automatic detection of different opinion trends. The proposal has been tested on real textual data from comments posted on a weblog dealing with organizational and administrative affairs in a public educational institution. Because the tool can extract valuable knowledge from the opinion streams that commenters create, its use may be straightforwardly extended to, for example, the detection of opinion trends concerning policy decision making or electoral campaigns.
Acknowledgements
Research supported by grants from the Spanish Ministry of Science and Innovation, the Ministry of Industry, Tourism and Trade, and the Government of Madrid: RIESGOS-CM (Ref. CAM s2009/esp-1594), Agora.net, e-COLABORA, Corporate Community, Democracy4All, EDUCALAB (Ref. IPT-2011-1071-430000) and MyUniversity.
Cite this article
Alfaro, C., Cano-Montero, J., Gómez, J. et al. A multi-stage method for content classification and opinion mining on weblog comments. Ann Oper Res 236, 197–213 (2016). https://doi.org/10.1007/s10479-013-1449-6
DOI: https://doi.org/10.1007/s10479-013-1449-6