The Lenses of Empirical Software Engineering – ESEM 2015 Keynote

I presented the following keynote at the International Symposium on Empirical Software Engineering and Measurement (ESEM 2015).

We live in the golden age of data. In industry, data science is more popular than ever. People with data science skills are in high demand. In this talk, I will shed light on the emerging roles of data scientists. I will distill some of the lessons learned from doing empirical research at Microsoft as well as observing successful data scientists into what I call the lenses of empirical software engineering. You haven’t seen anything until you’ve seen everything.

Ramp-up Journey of New Hires: Tug of War of Aids and Impediments – ESEM 2015

The slides for the paper Ramp-up Journey of New Hires: Tug of War of Aids and Impediments, presented at ESEM 2015.

Data Ninja III: The Rise of Data Scientists in the Software Industry – SBES 2015 Keynote

I presented the following keynote at the Brazilian Software Engineering Symposium (SBES 2015).

There is a new kid in town. Data scientists now help software teams to infer actionable insights from large amounts of data about the development process and customer usage. To understand this new role, we interviewed and surveyed data scientists across several product groups at Microsoft. In this talk, I will motivate the need for data analytics, introduce questions for data scientists, and characterize how data scientists work in large software companies such as Microsoft. I will highlight opportunities for researchers, practitioners, and educators.

Belief & Evidence in Empirical Software Engineering – ICSE 2016

Empirical software engineering has produced a steady stream of evidence-based results concerning the factors that affect important outcomes such as cost, quality, and interval. However, programmers often also have strongly held a priori opinions about these issues. These opinions are important, since developers are highly trained professionals whose beliefs would doubtless affect their practice. As in evidence-based medicine, disseminating empirical findings to developers is a key step in ensuring that the findings impact practice. In this paper, we describe a case study on the prior beliefs of developers at Microsoft and the relationship of these beliefs to actual empirical data on the projects in which these developers work. Our findings are that a) programmers do indeed have very strong beliefs on certain topics; b) their beliefs are primarily formed based on personal experience, rather than on findings in empirical research; and c) beliefs can vary with each project, but do not necessarily correspond with actual evidence in that project. Our findings suggest that more effort should be taken to disseminate empirical findings to developers and that more in-depth study of the interplay of belief & evidence in software practice is needed.


The Emerging Role of Data Scientists on Software Development Teams – ICSE 2016

Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.


Ramp-up Journey of New Hires: Tug of War of Aids and Impediments – ESEM 2015

Hiring top talent is essential for any software company’s success. After joining the company, new hires often spend weeks or months before making any major contribution and attaining the same productivity level as existing employees. We use the term ramp-up journey to refer to this transition of new hires from novices to experts. Several factors, such as lack of experience or lack of familiarity with processes unique to the new company, can influence the ramp-up journey. To understand such aids and impediments in the ramp-up journey, we conducted a study by analyzing data extracted from version control systems of eight large and popular product groups at Microsoft with several thousand software developers. In particular, we studied two aspects of the ramp-up journey. First, we studied the time taken to make the first check-in into the version control system, an important milestone in the ramp-up journey indicating the first contribution. Second, we analyzed the time taken to reach the same productivity level as existing employees in terms of check-ins. We further augmented our quantitative study with qualitative results derived by surveying 411 professional developers. Our study produced promising results, including factors such as having a mentor, prior knowledge of required skill sets, and proactively asking questions, that could help shorten the ramp-up journey of new hires.


What Drives People: Creating Engagement Profiles of Players from Game Log Data – CHI PLAY 2015

A central interest of game designers and game user researchers is to understand why players enjoy their games. While a number of researchers have explored player enjoyment in general, few have talked about methods for enabling designers to understand the players of their specific game. In this paper, we explore the creation of engagement profiles of game players based on log data. These profiles take into account the different ways that players engage with the game and highlight patterns associated with active play. We demonstrate our approach by performing a descriptive analysis of the game Forza Motorsport 5 using data from a sample of 1.2 million users of the game, and we discuss the implications of our findings.
