Abstract
Open-source software has become ubiquitous, yet its sustainability is in question: without a constant supply of contributor effort, open-source projects are at risk. Prior work has extensively studied the motivations of open-source contributors in general, but relatively little is known about how people choose which project to contribute to, beyond personal interest. This question is especially relevant in transparent social coding environments like GitHub, where visible cues on personal profiles and repository pages, called signals, are known to affect impression formation and decision making. In this paper, we report on a mixed-methods empirical study of the signals that influence contributors' decisions to join a GitHub project. We first interviewed 15 GitHub contributors about their project evaluation processes and identified the important signals they used, including the structure of the README and the amount of recent activity. We then quantitatively tested the impact of each signal using data from 9,977 GitHub projects. We reveal that many important pieces of information lack easily observable signals, and that some signals may be both attractive and unattractive. Our findings have direct implications for open-source maintainers and for the design of social coding environments, e.g., features that could be added to support a better project search experience.
Index Terms
- The Signals that Potential Contributors Look for When Choosing Open-source Projects