Publications

The list below shows only recent publications. Please visit here for a complete list of publications.

Software development is a data-rich activity with many sophisticated metrics. Yet engineers often lack the tools and techniques necessary to leverage these potentially powerful information resources for decision making. In this paper, we present the data and analysis needs of professional software engineers, which we identified in a survey of 110 developers and managers. We asked about their decision-making process, their needs for artifacts and indicators, and scenarios in which they would use analytics.

The survey responses lead us to propose several guidelines for analytics tools in software development. First, engineers do not necessarily have much expertise in data analysis; thus tools should be easy to use, fast, and produce concise output. Second, engineers have diverse analysis needs and consider most indicators to be important; thus tools should support many different types of artifacts and many indicators at the same time. In addition, engineers want to drill down into data based on time, organizational structure, and system architecture.

Co-author: Raymond P.L. Buse (The University of Virginia)

{ 0 comments }

Fixing bugs is an important part of the software development process. An underlying aspect is the effectiveness of fixes: if a fair number of fixed bugs are reopened, it could indicate instability in the software system. To the best of our knowledge, there has been little prior work on understanding the dynamics of bug reopens. Towards that end, in this paper, we characterize when bug reports are reopened by using the Microsoft Windows operating system project as an empirical case study. Our analysis is based on a mixed-methods approach. First, we categorize the primary reasons for reopens based on a survey of 358 Microsoft employees. We then reinforce these results with a large-scale quantitative study of Windows bug reports, focusing on factors related to bug report edits and relationships between people involved in handling the bug. Finally, we build statistical models to describe the impact of various metrics on reopening bugs, ranging from the reputation of the opener to how the bug was found.

Co-authors: Nachiappan Nagappan (Microsoft Research), Philip J. Guo (Stanford University), and Brendan Murphy (Microsoft Research)

{ 2 comments }

Over the next ten years, collaboration in software engineering will change in a number of ways and research will need to shift its focus to enable and enhance such collaboration. Specifically, we claim that software in the small will become more popular and even large software will be built by fewer people due to better tools. For large projects, research will need to address the collaboration needs of project members other than just developers, including quality assurance engineers, build engineers, architects, and operations managers. Finally, code reuse and sharing will change as a result of a growing software remix culture, leading to more loosely coupled and indirect collaboration.


{ 0 comments }

The practices of industrial and academic data mining are very different. These differences have significant implications for (a) how we manage industrial data mining projects; (b) the direction of academic studies in data mining; and (c) training programs for engineers who seek to use data miners in an industrial setting.


{ 0 comments }

The high cost of software maintenance necessitates methods to improve the efficiency of the maintenance process. Such methods typically need a vast amount of knowledge about a system, which is often mined from software repositories. Collecting this data becomes a challenge if the system was developed using multiple code branches.

In this paper we present an integration resolution algorithm that facilitates data collection across multiple code branches. The algorithm tracks code integrations across different branches and associates code changes in the main development branch with corresponding changes in other branches. We provide evidence for the practical relevance of this algorithm during the development of the Windows Vista Service Pack 2.
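To make the idea of associating main-branch changes with their counterparts in other branches concrete, here is a minimal sketch. It is not the algorithm from the paper: the data model (changes identified by hypothetical `(branch, change_id)` pairs, and integrations given as source/target pairs) and the function name `resolve_integrations` are assumptions made purely for illustration.

```python
from collections import defaultdict


def resolve_integrations(integrations, main_branch="main"):
    """Map each main-branch change to the related changes in other branches.

    `integrations` is an iterable of (source, target) pairs. Each element is a
    hypothetical (branch, change_id) tuple, and a pair means that the source
    change was integrated into another branch as the target change.
    """
    # Treat integrations as undirected links between changes.
    neighbours = defaultdict(set)
    for source, target in integrations:
        neighbours[source].add(target)
        neighbours[target].add(source)

    resolved = {}
    for node in list(neighbours):
        if node[0] != main_branch:
            continue
        # Walk the integration links to collect every connected change.
        seen = {node}
        stack = [node]
        while stack:
            current = stack.pop()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        resolved[node] = sorted(n for n in seen if n[0] != main_branch)
    return resolved
```

Under these assumptions, a change that moved from a feature branch through a team branch into the main branch ends up linked to both of its earlier incarnations, so data recorded against any of them can be attributed to the same logical change.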


{ 1 comment }

Data miners can infer rules showing how to improve either (a) the effort estimates of a project or (b) the defect predictions of a software module. Such studies often exhibit conclusion instability regarding what is the most effective action for different projects or modules.

This instability can be explained by data heterogeneity. We show that effort and defect data contain many local regions whose properties differ markedly from those of the global space. In other words, what appears to be useful in a global context is often irrelevant for particular local contexts.

This result raises questions about the generality of conclusions from empirical SE. At the very least, SE researchers should test if their supposedly general conclusions are valid within subsets of their data. At the very most, empirical SE should become a search for local regions with similar properties (and conclusions should be constrained to just those regions).
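The suggestion to test general conclusions within subsets of the data can be illustrated with a small sketch. The data below is synthetic and invented for this example, the region labels are hypothetical, and a simple least-squares slope stands in for whatever rule learner a real study would use.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic rows, invented for illustration: (region, metric, outcome).
rows = [
    ("A", 1, 3), ("A", 2, 2), ("A", 3, 1),
    ("B", 6, 8), ("B", 7, 7), ("B", 8, 6),
]


def global_and_local_slopes(rows):
    """Return the slope over all rows and the slope within each region."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    regions = {}
    for region, x, y in rows:
        region_xs, region_ys = regions.setdefault(region, ([], []))
        region_xs.append(x)
        region_ys.append(y)
    local = {r: slope(rx, ry) for r, (rx, ry) in regions.items()}
    return slope(xs, ys), local


global_slope, local_slopes = global_and_local_slopes(rows)
print(global_slope > 0)   # the pooled trend is positive
print(local_slopes)       # each region on its own trends negative
```

In this toy data the pooled trend points upward while both regions trend downward, so a rule learned globally would recommend the wrong action inside either region.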


{ 1 comment }

Failures after the release of software products are expensive and time-consuming to fix. Each of these failures has different causes that point to different portions of the code. We conduct a retrospective analysis of bugs reported after the beta release of Eclipse versions. Our objective is to investigate what went wrong during the development process. We identify six in-process metrics that have explanatory effects on beta-release bugs. We conduct statistical analyses to check relationships between files and metrics. Our results show that files with beta-release bugs have different characteristics in terms of in-process metrics. Those bugs are specifically concentrated in Eclipse files with little activity: few edits by few committers. We suggest that in-process metrics should be investigated individually to identify beta-release bugs. Companies may benefit from such a retrospective analysis to understand characteristics of failures. Corrective actions can be taken earlier in the process to avoid similar failures in future releases.


{ 1 comment }