Branching plays a major role in the development process of large software. Branches provide isolation so that multiple pieces of the software system can be modified in parallel without affecting each other during times of instability. However, branching has its own issues. The need to move code across branches introduces additional overhead and branch use can lead to integration failures due to conflicts or unseen dependencies. Although branches are used extensively in commercial and open source development projects, the effects that different branch strategies have on software quality are not yet well understood. In this paper, we present the first empirical study that evaluates and quantifies the relationship between software quality and various aspects of the branch structure used in a software project. We examine Windows Vista and Windows 7 and compare components that have different branch characteristics to quantify differences in quality. We also examine the effectiveness of two branching strategies – branching according to the software architecture versus branching according to organizational structure. We find that, indeed, branching does have an effect on software quality and that misalignment of branching structure and organizational structure is associated with higher post-release failure rates.
Co-authors: Emad Shihab (Queen's University), Christian Bird (Microsoft Research)
by Tom on February 19, 2012
In this paper we provide a brief introduction to some of the games user research activities at Microsoft. We focus on the analysis of automatically collected game data. We will show how this data can lead to insights about game usage and player progression.
by Tom on January 28, 2012
Software development is a data rich activity with many sophisticated metrics. Yet engineers often lack the tools and techniques necessary to leverage these potentially powerful information resources toward decision making. In this paper, we present the data and analysis needs of professional software engineers, which we identified among 110 developers and managers in a survey. We asked about their decision making process, their needs for artifacts and indicators, and scenarios in which they would use analytics.
The survey responses lead us to propose several guidelines for analytics tools in software development including: Engineers do not necessarily have much expertise in data analysis; thus tools should be easy to use, fast, and produce concise output. Engineers have diverse analysis needs and consider most indicators to be important; thus tools should at the same time support many different types of artifacts and many indicators. In addition, engineers want to drill down into data based on time, organizational structure, and system architecture.
Co-author: Raymond P.L. Buse (The University of Virginia)
by Tom on January 28, 2012
Fixing bugs is an important part of the software development process. An underlying aspect is the effectiveness of fixes: if a fair number of fixed bugs are reopened, it could indicate instability in the software system. To the best of our knowledge there has been on little prior work on understanding the dynamics of bug reopens. Towards that end, in this paper, we characterize when bug reports are reopened by using the Microsoft Windows operating system project as an empirical case study. Our analysis is based on a mixed-methods approach. First, we categorize the primary reasons for reopens based on a survey of 358 Microsoft employees. We then reinforce these results with a large-scale quantitative study of Windows bug reports, focusing on factors related to bug report edits and relationships between people involved in handling the bug. Finally, we build statistical models to describe the impact of various metrics on reopening bugs ranging from the reputation of the opener to how the bug was found.
Co-authors: Nachiappan Nagappan (Microsoft Research), Philip J. Guo (Stanford University), and Brendan Murphy (Microsoft Research)
by Tom on December 10, 2011
Over the next ten years, collaboration in software engineering will change in a number of ways and research will need to shift its focus to enable and enhance such collaboration. Specifically, we claim that software in the small will become more popular and even large software will be built by fewer people due to better tools. For large projects, research will need to address the collaboration needs of project members other than just developers, including quality assurance engineers, build engineers, architects, and operations managers. Finally, code reuse and sharing will change as a result of a growing software remix culture, leading to more loosely coupled and indirect collaboration.
by Tom on November 16, 2011
The practices of industrial and academic data mining are very different. These differences have significant implications for (a) how we manage industrial data mining projects; (b) the direction of academic studies in data mining; and (c) training programs for engineers who seek to use data miners in an industrial setting.
The high cost of software maintenance necessitates methods to improve the efficiency of the maintenance process. Such methods typically need a vast amount of knowledge about a system, which is often mined from software repositories. Collecting this data becomes a challenge if the system was developed using multiple code branches.
In this paper we present an integration resolution algorithm that facilitates data collection across multiple code branches. The algorithm tracks code integrations across different branches and associates code changes in the main development branch with corresponding changes in other branches. We provide evidence for the practical relevance of this algorithm during the develop-ment of the Windows Vista Service Pack 2.