Sentiment Polarity Detection for Software Development

Empirical Software Engineering

Abstract

Sentiment analysis is increasingly used to study software developers’ emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, which leads them to misclassify technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers’ communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. Compared with a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassification of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.

Notes

  1. The full lab package including Senti4SD, the DSM and the gold standard is available for download at: https://github.com/collab-uniba/Senti4SD

  2. https://github.com/dav/word2vec

  3. The evaluations have been performed using the SentiStrength Java API obtained from http://sentistrength.wlv.ac.uk/ in December 2016.

  4. https://help.github.com/articles/locking-conversations

  5. Source: http://stackexchange.com/sites#questions Last accessed: June 2017

References

  • Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2012) Discovering value from community activity on focused question answering sites: A case study of stack overflow. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, KDD’12, pp 850–858, https://doi.org/10.1145/2339530.2339665

  • Asaduzzaman M, Mashiyat AS, Roy CK, Schneider KA (2013) Answering questions about unanswered questions of stack overflow. In: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, Piscataway, NJ, USA, MSR ‘13, pp 97–100

  • Baroni M, Dinu G, Kruszewski G (2014) Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, pp 238–247

  • Barua A, Thomas SW, Hassan AE (2014) What are developers talking about? an analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654. https://doi.org/10.1007/s10664-012-9231-y

  • Basile P, Novielli N (2015) Uniba: Sentiment analysis of English tweets combining micro-blogging, lexicon and semantic features. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), ACL, pp 595–600

  • Bengio Y, Ducharme R, Vincent P, Janvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155

  • Blaz CCA, Becker K (2016) Sentiment analysis in tickets for IT support. In: Proceedings of the 13th International Conference on Mining Software Repositories, ACM, New York, NY, USA, MSR ‘16, pp 235–246, https://doi.org/10.1145/2901739.2901781

  • Bollegala D, Weir D, Carroll J (2013) Cross-Domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans Knowl Data Eng 25(8):1719–1731. https://doi.org/10.1109/TKDE.2012.103

  • Calefato F, Lanubile F, Marasciulo MC, Novielli N (2015) Mining successful answers in stack overflow. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, Piscataway, NJ, USA, MSR ‘15, pp 430–433

  • Carofiglio V, de Rosis F, Novielli N (2009) Cognitive Emotion Modeling in Natural Language Communication. Springer London, London, pp 23–44

  • Cohen J (1968) Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin

  • Collobert R, Weston J (2008) A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, ACM, New York, NY, USA, ICML ‘08, pp 160–167, https://doi.org/10.1145/1390156.1390177

  • Danescu-Niculescu-Mizil C, Sudhof M, Jurafsky D, Leskovec J, Potts C (2013) A computational approach to politeness with application to social factors. In: ACL (1), The Association for Computer Linguistics, pp 250–259

  • Ekman P (1999) Handbook of Cognition and Emotion. John Wiley & Sons Ltd

  • De Lucia A, Fasano F, Oliveto R, Tortora G (2007) Recovering traceability links in software artifact management systems using information retrieval methods. ACM Trans Softw Eng Methodol 16(4). https://doi.org/10.1145/1276933.1276934

  • Denning PJ (2012) Moods. Commun ACM 55(12):33–35

  • Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) Liblinear: A library for large linear classification. J Mach Learn Res 9:1871–1874 URL http://dl.acm.org/citation.cfm?id=1390681.1442794

  • Ford D, Parnin C (2015) Exploring causes of frustration for software developers. In: CHASE, IEEE Press, pp 115–116

  • Gachechiladze D, Lanubile F, Novielli N, and Serebrenik A (2017). Anger and its direction in collaborative software development. In Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track (ICSE-NIER '17). IEEE Press, Piscataway, NJ, USA, 11–14. https://doi.org/10.1109/ICSE-NIER.2017.18

  • Graziotin D, Fagerholm F, Wang X, Abrahamsson P (2017) Unhappy Developers: Bad for Themselves, Bad for Process, and Bad for Software Product. To appear as a poster paper in the Proceedings of the 39th International Conference on Software Engineering (ICSE '17)

  • Guzman E, Bruegge B (2013) Towards emotional awareness in software development teams. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ACM, New York, NY, USA, ESEC/FSE 2013, pp 671–674, https://doi.org/10.1145/2491411.2494578

  • Guzman E, Azocar D, Li Y (2014) Sentiment analysis of commit comments in Github: An empirical study. In: Proceedings of the 11th Working Conference on Mining Software Repositories, ACM, New York, NY, USA, MSR 2014, pp 352–355, https://doi.org/10.1145/2597073.2597118

  • Guzman E, Alkadhi R, Seyff N (2016) A needle in a haystack: What do twitter users say about software? In: Proceedings of the IEEE 24th International Requirements Engineering Conference (RE), pp 96–105, https://doi.org/10.1109/RE.2016.67

  • He H, Garcia EA (2009) Learning from Imbalanced Data. IEEE Trans Knowl Data Eng 21(9):1263–1284. https://doi.org/10.1109/TKDE.2008.239

  • Helleputte T (2015) LiblineaR: Linear Predictive Models Based on the LIBLINEAR C/C++ Library. R package version 1.94-2

  • Hogenboom A, Frasincar F, de Jong F, Kaymak U (2015) Using rhetorical structure in sentiment analysis. Commun ACM 58(7):69–77. https://doi.org/10.1145/2699418

  • Islam MDR and Zibran MF (2017) Leveraging automated sentiment analysis in software engineering. In Proceedings of the 14th International Conference on Mining Software Repositories (MSR '17). IEEE Press, Piscataway, NJ, USA, 203–214. https://doi.org/10.1109/MSR.2017.9

  • Joachims T (1998) Text categorization with support vector machines: Learning with many relevant features. In: Proceedings of the 10th European Conference on Machine Learning, Springer-Verlag, London, UK, ECML ‘98, pp 137–142

  • Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, KDD ‘06, pp 217–226, https://doi.org/10.1145/1150402.1150429

  • Jongeling R, Datta S, Serebrenik A (2015) Choosing your weapons: On sentiment analysis tools for software engineering research. In: Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on, pp 531–535, https://doi.org/10.1109/ICSM.2015.7332508

  • Kucuktunc O, Cambazoglu BB, Weber I, Ferhatosmanoglu H (2012) A large-scale sentiment analysis for Yahoo! answers. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, ACM, New York, NY, USA, WSDM ‘12, pp 633–642, https://doi.org/10.1145/2124295.2124371

  • Kuhn M (2016) caret: Classification and Regression Training. Contributions from Jed Wing, S. Weston, A. Williams, C. Keefer, A. Engelhardt, T. Cooper, Z. Mayer, B. Kenkel, the R Core Team, M. Benesty, R. Lescarbeau, A. Ziem, L. Scrucca, Y. Tang, and C. Candan. R package version 6.0-70. Available: https://CRAN.R-project.org/package=caret

  • Landauer TK, Dumais ST (1997) A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev 104(2):211–240

  • Lazarus R (1991) Emotion and adaptation. Oxford University Press, New York

  • Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (Eds) Advances in Neural Information Processing Systems 27, Curran Associates, Inc., pp 2177–2185, URL http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization.pdf

  • Maalej W, Kurtanovic Z, Nabil H, Stanik C (2016) On the automatic classification of app reviews. Requir Eng 21(3):311–331. https://doi.org/10.1007/s00766-016-0251-9

  • Manning CD, Surdeanu M, Bauer J, Finkel J, Bethard SJ, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp 55–60

  • Mäntylä M, Adams B, Destefanis G, Graziotin D, Ortu M (2016) Mining valence, arousal, and dominance: Possibilities for detecting burnout and productivity? In: Proceedings of the 13th International Conference on Mining Software Repositories, ACM, New York, NY, USA, MSR ‘16, pp 247–258, https://doi.org/10.1145/2901739.2901752

  • Mäntylä MV, Novielli N, Lanubile F, Claes M, and Kuutila M (2017) Bootstrapping a lexicon for emotional arousal in software engineering. In Proceedings of the 14th International Conference on Mining Software Repositories (MSR '17). IEEE Press, Piscataway, NJ, USA, 198-202. https://doi.org/10.1109/MSR.2017.47

  • Meta (2017). Meta Stack exchange is too harsh to new users. http://meta.stackexchange.com/questions/179003/stack-exchange-is-too-harsh-to-new-users-please-help-them-improve-low-quality-po, Last accessed: February 2017

  • Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. CoRR abs/1301.3781

  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (Eds) Advances in Neural Information Processing Systems 26, Curran Associates, Inc., pp 3111–3119

  • Miller GA, Charles WG (1991) Contextual Correlates of Semantic Similarity. Lang Cogn Process 6(1):1–28. https://doi.org/10.1080/01690969108406936

  • Mitchell TM (1997) Machine Learning (1 ed.). McGraw-Hill, Inc., New York, NY, USA

  • Mohammad SM (2016) Sentiment analysis: Detecting valence, emotions, and other affectual states from text. In: Meiselman H (Ed) Emotion Measurement, Elsevier

  • Mohammad SM, Kiritchenko S, Zhu X (2013) NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. CoRR abs/1308.6242, URL http://arxiv.org/abs/1308.6242

  • Müller SC and Fritz T (2015) Stuck and frustrated or in flow and happy: sensing developers' emotions and progress. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15), Vol. 1. IEEE Press, Piscataway, 688-699

  • Murgia A, Tourani P, Adams B, Ortu M (2014) Do developers feel emotions? An exploratory analysis of emotions in software artifacts. In: Proceedings of the 11th Working Conference on Mining Software Repositories, ACM, New York, MSR 2014, pp 262–271, https://doi.org/10.1145/2597073.2597086

  • Novielli N, Strapparava C (2013) The role of affect analysis in dialogue act identification. IEEE Trans Affect Comput 4(4):439–451. https://doi.org/10.1109/T-AFFC.2013.20

  • Novielli N, Calefato F, Lanubile F (2014) Towards discovering the role of emotions in Stack Overflow. In Proceedings of the 6th International Workshop on Social Software Engineering (SSE 2014). ACM, New York, 33-36. https://doi.org/10.1145/2661685.2661689

  • Novielli N, Calefato F, Lanubile F (2015) The challenges of sentiment detection in the social programmer ecosystem. In: Proceedings of the 7th International Workshop on Social Software Engineering, ACM, New York, SSE 2015, pp 33–40, https://doi.org/10.1145/2804381.2804387

  • Ortu M, Adams B, Destefanis G, Tourani P, Marchesi M, Tonelli R (2015) Are bullies more productive?: Empirical study of affectiveness vs. issue fixing time. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, Piscataway, NJ, USA, MSR ‘15, pp 303–313

  • Ortu M, Murgia A, Destefanis G, Tourani P, Tonelli R, Marchesi M, Adams B (2016) The emotional side of software developers in Jira. In: Proceedings of the 13th International Conference on Mining Software Repositories, ACM, New York, NY, USA, MSR ‘16, pp 480–483, https://doi.org/10.1145/2901739.2903505

  • Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135. https://doi.org/10.1561/1500000011

  • Panichella S, Sorbo AD, Guzman E, Visaggio A, Canfora G, Gall H (2015) How can i improve my app? classifying user reviews for software maintenance and evolution. 31st IEEE International Conference on Software Maintenance and Evolution

  • Pennebaker J, Francis M (2001) Linguistic Inquiry and Word Count: LIWC. Erlbaum Publishers

  • Pletea D, Vasilescu B, and Serebrenik A (2014) Security and emotion: sentiment analysis of security discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014). ACM, New York, NY, USA, 348-351. https://doi.org/10.1145/2597073.2597117

  • R Development Core Team (2008) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna http://www.R-project.org, ISBN 3-900051-07-0

  • Rahman MM, Roy CK, Keivanloo I (2015) Recommending insightful comments for source code using crowdsourced knowledge. In: 15th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2015, Bremen, Germany, September 27–28, 2015, pp 81–90, https://doi.org/10.1109/SCAM.2015.7335404

  • Russell J (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178

  • Saif H, Fernandez M, He Y, Alani H (2014) On stopwords, filtering and data sparsity for sentiment analysis of twitter. In: Calzolari N (Conference Chair), Choukri K, Declerck T, Loftsson H, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S (eds) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources Association (ELRA), Reykjavik, Iceland

  • Scherer K, Wranik T, Sangsue J, Tran V, Scherer U (2004) Emotions in everyday life: Probability of occurrence, risk factors, appraisal and reaction patterns. Soc Sci Inf 43(4):499–570

  • Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1–47. https://doi.org/10.1145/505282.505283

  • SEmotion (2016) Proceedings of the 1st International Workshop on Emotion Awareness in Software Engineering. ACM, New York

  • Shaver P, Schwartz J, Kirson D, O’Connor C (1987) Emotion knowledge: Further exploration of a prototype approach. J Pers Soc Psychol 52(6):1061–1086. https://doi.org/10.1037//0022-3514.52.6.1061

  • Sinha V, Lazar A, Sharif B (2016) Analyzing developer sentiment in commit logs. In: Proceedings of the 13th International Conference on Mining Software Repositories, ACM, New York, NY, USA, MSR ‘16, pp 520–523, https://doi.org/10.1145/2901739.2903501

  • Smolensky P (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif Intell 46(1–2):159–216. https://doi.org/10.1016/0004-3702(90)90007-M

  • Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, pp 1631–1642

  • Strapparava C, Valitutti A (2004) WordNet-Affect: an affective extension of WordNet. In: Proceedings of LREC, vol 4, pp 1083–1086

  • Stone PJ, Dunphy DC, Smith MS, Ogilvie DM (1966) The general inquirer: A computer approach to content analysis. The MIT Press, Cambridge, MA

  • Thelwall M, Buckley K, Paltoglou G (2012) Sentiment strength detection for the social web. J Am Soc Inf Sci Technol 63(1):163–173. https://doi.org/10.1002/asi.21662

  • Tian Y, Lo D, Lawall J (2014) Sewordsim: Software-specific word similarity database. In: Companion Proceedings of the 36th International Conference on Software Engineering, ACM, New York, NY, USA, ICSE Companion 2014, pp 568–571, https://doi.org/10.1145/2591062.2591071

  • Tromp E, Pechenizkiy M (2015) Pattern-based emotion classification on social media. In: Gaber MM, Cocea M, Wiratunga N, Goker A (eds) Advances in social media analysis. Studies in Computational Intelligence, vol 602. Springer, Cham

  • Wittgenstein L (1965) Philosophical Investigations. The Macmillan Company, New York

  • Ye X, Shen H, Ma X, Bunescu RC, Liu C (2016) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14–22, 2016, pp 404–415, https://doi.org/10.1145/2884781.2884862

Acknowledgements

This work is partially supported by the project ‘EmoQuest - Investigating the Role of Emotions in Online Question & Answer Sites’, funded by the Italian Ministry of Education, University and Research (MIUR) under the program “Scientific Independence of young Researchers” (SIR). The computational work has been executed on the IT resources made available by two projects, ReCaS and PRISMA, funded by MIUR under the program “PON R&C 2007–2013”. We thank Pierpaolo Basile for insightful discussions and helpful comments and the annotators involved in the gold standard building.

Author information

Corresponding author

Correspondence to Nicole Novielli.

Additional information

Communicated by: Yasutaka Kamei

Appendix: Coding Guidelines

In the following, we report the task description and the guidelines used for training the coders involved in the emotion annotation study.

Task Description and Annotation Guidelines. You are invited to take part in an annotation study of developer-contributed texts on Stack Overflow. We are interested in annotating the presence of emotions in technical documents authored by developers during their online interactions.

The data source is the official Stack Overflow dump released by Stack Exchange in May 2015. You will be required to annotate randomly selected posts, including questions, answers, and comments. The unit of annotation is the entire post.

You will use the coding schema reported in Appendix Table 12. For each post, please indicate which emotion it conveys (if any) among the basic emotions (first column in the table), namely love, joy, surprise, anger, sadness, and fear. Multiple emotion labels are allowed, but you should avoid them if possible. You can use the second and third levels of the schema as a reference for choosing the primary emotion, as shown in Appendix Table 13.

Once you have chosen the emotion label, please specify the emotion polarity accordingly, choosing among positive, negative, neutral, and mixed. If the post does not contain any emotion, it should be annotated as neutral. Surprise is the only emotion that can match any polarity value: please evaluate each post carefully to determine whether it conveys positive, negative, or neutral polarity. If multiple emotion labels are indicated for a given text, you should define the polarity accordingly. A post annotated with one or more positive emotions only has a positive polarity. Conversely, a post annotated with one or more negative emotions only has a negative polarity. If both positive and negative emotions are found, you should indicate both. If you wish to indicate a polarity label, you are required to specify the corresponding emotion. The absence of emotion can be annotated exclusively as neutral. The list of all combinations allowed and not allowed by our coding schema is reported in Appendix Table 14.

Table 12 Mapping the Shaver et al. emotion framework to sentiment polarity
Table 13 Examples of Annotated Posts
Table 14 Combinations of Values Allowed and Not allowed by Our Annotation Schema
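
For readers who want to check annotations automatically, the polarity rules in the guidelines above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the released lab package. It assumes that love and joy count as positive and that anger, sadness, and fear count as negative (the tables themselves are not reproduced here), and it assumes that "indicate both" corresponds to the mixed polarity label. The names `derive_polarity` and `is_allowed` are hypothetical.

```python
from typing import Iterable, Optional

# Assumed grouping of the six basic emotions; surprise is handled separately
# because the guidelines allow it to take any polarity.
POSITIVE_EMOTIONS = {"love", "joy"}
NEGATIVE_EMOTIONS = {"anger", "sadness", "fear"}
SURPRISE = "surprise"


def derive_polarity(emotions: Iterable[str],
                    surprise_polarity: Optional[str] = None) -> str:
    """Return the polarity implied by a set of emotion labels (hypothetical helper)."""
    labels = {e.strip().lower() for e in emotions}
    unknown = labels - POSITIVE_EMOTIONS - NEGATIVE_EMOTIONS - {SURPRISE}
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")

    # The absence of emotion can only be annotated as neutral.
    if not labels:
        return "neutral"

    found = set()
    if labels & POSITIVE_EMOTIONS:
        found.add("positive")
    if labels & NEGATIVE_EMOTIONS:
        found.add("negative")
    if SURPRISE in labels:
        # Surprise needs an explicit judgment from the annotator.
        if surprise_polarity not in {"positive", "negative", "neutral"}:
            raise ValueError("surprise requires an explicit polarity judgment")
        found.add(surprise_polarity)

    # Assumption: positive and negative emotions together are reported as "mixed".
    if "positive" in found and "negative" in found:
        return "mixed"
    if "positive" in found:
        return "positive"
    if "negative" in found:
        return "negative"
    return "neutral"


def is_allowed(emotions: Iterable[str], polarity: str,
               surprise_polarity: Optional[str] = None) -> bool:
    """Check that an annotated polarity matches the one implied by its emotions."""
    return polarity == derive_polarity(emotions, surprise_polarity)


if __name__ == "__main__":
    print(derive_polarity(["joy", "sadness"]))
    print(is_allowed([], "negative"))
```

A check such as `is_allowed([], "negative")` returns `False`, reflecting the rule that a polarity label cannot be given without a corresponding emotion.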

About this article

Cite this article

Calefato, F., Lanubile, F., Maiorano, F. et al. Sentiment Polarity Detection for Software Development. Empir Software Eng 23, 1352–1382 (2018). https://doi.org/10.1007/s10664-017-9546-9

  • DOI: https://doi.org/10.1007/s10664-017-9546-9
