Understanding the social evolution of the Java community in Stack Overflow: A 10-year study of developer interactions
Introduction
In recent decades, community-based question-and-answer (Q&A) sites have become very popular and have enabled knowledge sharing at unprecedented levels. Stack Overflow is the de facto Q&A website for topics in Computer Science. Current platform statistics report 10 million users, 18 million questions, and 27 million answers (71% of questions answered) [1]. Moreover, the number of programming languages in use has increased. In 2018, the Stack Overflow Developer Survey listed 38 different programming languages among the most loved, dreaded, and wanted languages.
Recently, Stack Overflow released its posts as online archives (https://archive.org/download/stackexchange), which paved the way for systematic, large-scale analysis of these conversation threads.
The present work complements current literature by introducing a new methodology of analysis that tackles the quality of service and user reputation in Q&A platforms from the user’s perspective. More specifically, this methodology aims to improve the user experience by proposing practical recommendations on how users can direct their questions effectively, i.e. reducing the number of questions with no answers, routing questions to the right answerers, and promoting the content quality of the platform by identifying low-quality content. As a meaningful case study, this paper explores the social evolution experienced by the Java developer community on Stack Overflow, i.e. an in-depth look into the topics that have motivated the most discussion over the years, the evolution of social dynamics, including user altruism and reputation, and the cross-referencing of internal content as well as external sources.
To the best of our knowledge, such an integrative analysis has not been presented before, although a number of works address similar topics. The related work section describes some of these works and pinpoints the new contributions presented here.
Related work
Understanding the dynamics of participation in Q&A platforms is essential to improve the value of crowdsourced knowledge and the quality of service as well as to promote user engagement. Many works have focused on the platform’s needs and challenges, but few works address the user’s perspective, namely the duality of being information seekers and information producers.
The increasing number of low-value, unanswered questions has prompted the need to learn how to pose well-received questions [2].
Data retrieval and preparation
Conversation threads tagged as Java-related and posted from 2008 to 2018 were downloaded from the Stack Overflow archives (https://archive.org/details/stackexchange). This amounted to a total of 3.33 million posts, of which 1.8 million were questions and, within this set, approximately 0.9 million had answers (i.e. closed questions, referred to here as Q&A pairs). These communication threads were sustained by a total of 0.3 million unique users.
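The filtering step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the public Stack Exchange dump layout, in which `Posts.xml` holds one `<row>` element per post with the attributes `Id`, `PostTypeId` (1 for questions, 2 for answers), `ParentId`, `CreationDate` and `Tags`, and it assumes that rows are ordered so that an answer appears after its question. The helper names `has_tag` and `summarize_posts` and the in-memory `SAMPLE` are hypothetical and only serve the illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny in-memory stand-in for Posts.xml (assumed dump layout, invented rows).
SAMPLE = b"""<posts>
<row Id="1" PostTypeId="1" CreationDate="2010-05-01T10:00:00.000" Tags="&lt;java&gt;&lt;spring&gt;" />
<row Id="2" PostTypeId="2" ParentId="1" CreationDate="2010-05-01T11:00:00.000" />
<row Id="3" PostTypeId="1" CreationDate="2012-01-01T00:00:00.000" Tags="&lt;javascript&gt;" />
<row Id="4" PostTypeId="1" CreationDate="2019-01-01T00:00:00.000" Tags="&lt;java&gt;" />
<row Id="5" PostTypeId="1" CreationDate="2015-03-02T00:00:00.000" Tags="&lt;java&gt;" />
</posts>"""


def has_tag(raw_tags, tag):
    """Check for an exact tag in strings such as '<java><spring>' or '|java|spring|'."""
    normalized = raw_tags.replace("><", "|").strip("<>|")
    return tag in normalized.split("|")


def summarize_posts(source, tag="java", first_year=2008, last_year=2018):
    """Stream <row> elements and count tagged questions and the answers to them."""
    question_ids = set()
    answered_ids = set()
    counts = Counter()
    for _, row in ET.iterparse(source, events=("end",)):
        if row.tag != "row":
            continue
        year = int(row.get("CreationDate", "0000")[:4])
        if first_year <= year <= last_year:
            post_type = row.get("PostTypeId")
            if post_type == "1" and has_tag(row.get("Tags", ""), tag):
                question_ids.add(row.get("Id"))
                counts["questions"] += 1
            elif post_type == "2" and row.get("ParentId") in question_ids:
                # Assumption: the parent question was already seen earlier in the file.
                answered_ids.add(row.get("ParentId"))
                counts["answers"] += 1
        row.clear()  # drop attributes of processed rows to limit memory use
    counts["answered_questions"] = len(answered_ids)
    return counts


demo = summarize_posts(io.BytesIO(SAMPLE))
```

On a real dump, the same function could be called with the path to the extracted `Posts.xml` file instead of the in-memory sample. Matching tags exactly, rather than by substring, keeps `javascript` questions out of the `java` subset.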
The textual information in Q&A pairs,
Results and discussion
The study of the social interplay of the Java community in Stack Overflow throughout the last decade enabled the evaluation of the proposed methodology in terms of correctness and robustness, as well as scalability in practical domains. The next sections first describe the community evolution in general terms and then the modelling of Q&A contents and the modelling of user intrinsic motivation.
Contents modelling aims to bring forward valuable, actionable information towards improving question
Conclusions and future work
This paper presents a methodology that combines machine learning and graph mining techniques to analyse the quality of service and user reputation of communities in Q&A platforms. Knowing how to formulate questions properly is beneficial not only for information seekers, because it increases the likelihood of receiving support, but also for the whole community, since it enhances effective knowledge-sharing behaviour and, most notably, the creation of long-lasting value pieces
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This study was supported by the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia), Spain under the scope of the strategic funding of ED431C2018/55-GRC Competitive Reference Group, and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2019 unit. SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure.
References (31)
- et al. How to ask for technical help? Evidence-based guidelines for writing questions on Stack Overflow. Inf. Softw. Technol. (2018)
- et al. Centrality in social networks: ii. experimental results. Soc. Netw. (1979)
- Centrality in social networks conceptual clarification. Soc. Netw. (1978)
- StackExchange,...
- et al. Topic shifts in StackOverflow: Ask it like socrates
- et al. ColdRoute: effective routing of cold questions in stack exchange sites. Data Min. Knowl. Discov. (2018)
- et al. Expert2Vec: Experts representation in community question answering for question routing
- et al. Determining the popularity of design patterns used by programmers based on the analysis of questions and answers on stackoverflow.com social network. Computing (2014)
- et al. Architectural knowledge for technology decisions in developer communities: An exploratory study with StackOverflow
- et al. Learning to mine parallel natural language/source code corpora from stack overflow
- Predicting the programming language of questions and snippets of StackOverflow using natural language processing
- Toxic code snippets on stack overflow. IEEE Trans. Softw. Eng.
- StackInTheFlow: Behavior-driven recommendation system for stack overflow posts
- Recommendflow: Use topic model to automatically recommend stack overflow Q & A in IDE
- Mining StackOverflow to turn the IDE into a self-confident programming prompter
Guillermo Blanco González is a Ph.D. student in Computer Science at the University of Vigo. He is currently developing advanced computational methods for modelling social dynamics, namely in biological ecosystems.
Roi Pérez López is a student in the Master in Computer Science programme at the University of Vigo. His main research interests include text mining, sentiment analysis, and topic modelling.
Florentino Fdez-Riverola is a Full Professor of the Department of Computer Science at the University of Vigo (Spain) and Coordinator of the New Generation Computer Systems group (SING, http://sing-group.org), which is dedicated to the research and development of cutting-edge computational methodologies and applications.
Anália Maria Garcia Lourenço is a faculty member of the Department of Computer Science and a researcher affiliated to the Biomedical Research Centre (CINBIO), at the University of Vigo and the Centre of Biological Engineering, at the University of Minho. Her main research interests include computational intelligence, bioinformatics and systems biology.