Abstract
Many open-source software projects depend on a few core developers, who take on the bulk of both the coordination and the programming tasks. They are supported by peripheral developers, who contribute either via discussions or programming tasks, often for a limited time. It is unclear what role these peripheral developers play in the programming and communication efforts, as well as in the temporary task-related sub-groups of the projects. We mine code-repository data and mailing-list discussions to model the relationships and contributions of developers in a social network, and we devise a method to analyze the temporal collaboration structures in communication and programming, in order to learn about the strength and stability of social sub-groups in open-source software projects. Our method uses multi-modal social networks on a series of time windows. Previous work has reduced the network structure representing developer collaboration to networks with only one type of interaction, which impedes the simultaneous analysis of more than one type of interaction. We use both communication and version-control data of open-source software projects and model different types of interaction over time. To demonstrate the practicability of our measurement and analysis method, we investigate 10 substantial and popular open-source software projects and show that, if sub-groups evolve, modeling these sub-groups helps predict the future evolution of interaction levels of programmers and groups of developers. Our method allows maintainers and other stakeholders of open-source software projects to assess instabilities and organizational changes in developer interaction, and it can be applied to different use cases in organizational analysis, such as understanding the dynamics of a specific incident or discussion.
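As a rough illustration of the kind of data structure the abstract describes (multi-modal developer networks on a series of time windows), the following sketch stacks per-window, per-modality adjacency matrices into one tensor. It is not the authors' implementation: the event format, the modality labels `"mail"` and `"code"`, the function name `build_interaction_tensor`, and the 90-day window are all assumptions made for this example.

```python
from itertools import combinations

import numpy as np

# Hypothetical modality labels: mailing-list interaction and co-editing of code.
MODALITIES = ("mail", "code")


def build_interaction_tensor(events, developers, window_days, n_windows):
    """Count pairwise developer interactions per modality and time window.

    events: iterable of (day, modality, participants), an assumed format in
            which `day` is an integer offset from the start of the analysis.
    Returns an array of shape (n_dev, n_dev, n_modalities, n_windows).
    """
    dev_index = {name: i for i, name in enumerate(developers)}
    mod_index = {name: i for i, name in enumerate(MODALITIES)}
    n_dev = len(developers)
    tensor = np.zeros((n_dev, n_dev, len(MODALITIES), n_windows))

    for day, modality, participants in events:
        window = day // window_days
        if not 0 <= window < n_windows:
            continue  # event lies outside the analyzed time range
        m = mod_index[modality]
        # Every unordered pair of participants counts as one interaction.
        for a, b in combinations(sorted(set(participants)), 2):
            i, j = dev_index[a], dev_index[b]
            tensor[i, j, m, window] += 1
            tensor[j, i, m, window] += 1  # keep each slice symmetric
    return tensor


# Toy data, invented for illustration only.
events = [
    (3, "mail", ["alice", "bob"]),
    (10, "code", ["alice", "carol"]),
    (95, "mail", ["alice", "bob", "carol"]),
    (100, "code", ["bob", "carol"]),
]
developers = ["alice", "bob", "carol"]
tensor = build_interaction_tensor(events, developers, window_days=90, n_windows=2)
```

Keeping the modalities as a separate axis, rather than summing them into a single adjacency matrix, is what lets communication and programming interactions be analyzed at the same time; a decomposition method could then be applied to such a tensor.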
Index Terms
- Measuring and Modeling Group Dynamics in Open-Source Software Development: A Tensor Decomposition Approach