ABSTRACT
Creating and running software produces large amounts of raw data about the development process and customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training backgrounds, their missions in software engineering contexts, and the types of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.
Index Terms
- The emerging role of data scientists on software development teams