How do Data Science Workers Collaborate? Roles, Workflows, and Tools

Published: 29 May 2020

Abstract

Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.
