DOI: 10.1145/2884781.2884783

The emerging role of data scientists on software development teams

Published: 14 May 2016

ABSTRACT

Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.


Published in

ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 276 of 1,856 submissions (15%)
