Abstract
Sentiment analysis is increasingly used to study software developers’ emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, and therefore misclassify technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers’ communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embeddings. Compared with a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassification of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
Notes
The full lab package including Senti4SD, the DSM and the gold standard is available for download at: https://github.com/collab-uniba/Senti4SD
The evaluations were performed using the SentiStrength Java API obtained from http://sentistrength.wlv.ac.uk/ in December 2016.
Source: http://stackexchange.com/sites#questions (last accessed: June 2017)
Acknowledgements
This work is partially supported by the project ‘EmoQuest - Investigating the Role of Emotions in Online Question & Answer Sites’, funded by the Italian Ministry of Education, University and Research (MIUR) under the program “Scientific Independence of young Researchers” (SIR). The computational work has been executed on the IT resources made available by two projects, ReCaS and PRISMA, funded by MIUR under the program “PON R&C 2007–2013”. We thank Pierpaolo Basile for insightful discussions and helpful comments and the annotators involved in the gold standard building.
Additional information
Communicated by: Yasutaka Kamei
Appendix: Coding Guidelines
In the following, we report the task description and the guidelines used for training the coders involved in the emotion annotation study.
Task Description and Annotation Guidelines. You are invited to take part in an annotation study of developer-contributed texts in Stack Overflow. We are interested in annotating the presence of emotions in technical documents authored by developers during their online interactions.
The data source is the official Stack Overflow dump released by Stack Exchange in May 2015. You will be required to annotate randomly selected posts, including questions, answers, and comments. The unit of annotation is the entire post.
You will use the coding schema reported in Appendix Table 12. For each post, please indicate what emotion it conveys (if any) among the basic emotions (first column in the table), namely love, joy, surprise, anger, sadness, and fear. Multiple emotion labels are allowed, but you should avoid them if possible. You can use the second and third levels in the schema as a reference for choosing the primary emotion, as shown in Appendix Table 13.
Once you have defined the emotion label, please specify the emotion polarity accordingly, choosing among positive, negative, neutral, and mixed. If the post does not contain any emotion, it should be annotated as neutral. Surprise is the only emotion that can match any of the polarity values: please evaluate each post carefully to determine whether it conveys positive, negative, or neutral polarity. If multiple emotion labels are indicated in a given text, you should define the polarity accordingly. A text annotated with one or more positive emotions only has a positive polarity. Conversely, a post annotated with one or more negative emotions only holds a negative polarity. If both positive and negative emotions are found, you should indicate both. If you wish to indicate a polarity label, you are required to specify the corresponding emotion. The absence of emotion can be annotated exclusively as neutral. The list of all combinations allowed and not allowed by our coding schema is reported in Appendix Table 14.
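The polarity rules above can be illustrated with a minimal sketch. It is not part of the lab package: the function name `derive_polarity`, the `surprise_polarity` parameter, and the grouping of love and joy as positive and of anger, sadness, and fear as negative are assumptions made for illustration. The treatment of surprise combined with other emotions is also an assumption, since the authoritative list of allowed combinations is the one in Appendix Table 14.

```python
from typing import Iterable, Optional

# Assumed grouping of the basic emotions by polarity (illustrative only).
POSITIVE_EMOTIONS = {"love", "joy"}
NEGATIVE_EMOTIONS = {"anger", "sadness", "fear"}
SURPRISE = "surprise"
VALID_POLARITIES = {"positive", "negative", "neutral"}


def derive_polarity(emotions: Iterable[str],
                    surprise_polarity: Optional[str] = None) -> str:
    """Return the polarity implied by a set of emotion labels.

    surprise_polarity is the annotator's own judgment and is required
    whenever 'surprise' is among the labels (hypothetical parameter).
    """
    labels = {e.strip().lower() for e in emotions}
    unknown = labels - POSITIVE_EMOTIONS - NEGATIVE_EMOTIONS - {SURPRISE}
    if unknown:
        raise ValueError(f"Unknown emotion labels: {sorted(unknown)}")

    # The absence of emotion is annotated exclusively as neutral.
    if not labels:
        return "neutral"

    has_positive = bool(labels & POSITIVE_EMOTIONS)
    has_negative = bool(labels & NEGATIVE_EMOTIONS)

    # Surprise can carry any polarity, so the annotator must supply it.
    if SURPRISE in labels:
        if surprise_polarity not in VALID_POLARITIES:
            raise ValueError("surprise requires an annotator-assigned polarity")
        has_positive = has_positive or surprise_polarity == "positive"
        has_negative = has_negative or surprise_polarity == "negative"

    # Both positive and negative emotions found: indicate both (mixed).
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"  # only surprise, judged neutral by the annotator
```

For example, `derive_polarity(["love", "sadness"])` yields the mixed label, while `derive_polarity(["surprise"], surprise_polarity="negative")` follows the annotator's judgment.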
Cite this article
Calefato, F., Lanubile, F., Maiorano, F. et al. Sentiment Polarity Detection for Software Development. Empir Software Eng 23, 1352–1382 (2018). https://doi.org/10.1007/s10664-017-9546-9
DOI: https://doi.org/10.1007/s10664-017-9546-9