Codebook: Discovering and Exploiting Relationships in Software Repositories – ICSE 2010

by Andrew Begel, Khoo Yit Phang, Thomas Zimmermann

Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help.
 
Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (a graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework's flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show their effectiveness in addressing multiple inter-team coordination problems.
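The pairing of a people-and-artifacts graph with regular language reachability can be illustrated with a small sketch. This is not Codebook's implementation or query syntax. The node names, the edge labels (`manages`, `authored`, `changed`), the function `regular_reachability`, and the hand-built DFA are all hypothetical. The search walks the product of the graph and the automaton, so a node counts as reachable only if the label sequence along some path to it matches the pattern `manages* authored changed`, e.g. "files changed by anyone in alice's reporting chain."

```python
from collections import defaultdict, deque

# Hypothetical edge-labeled graph of people and artifacts: (source, label, target).
EDGES = [
    ("alice", "manages", "bob"),
    ("bob", "manages", "carol"),
    ("bob", "authored", "cs1"),
    ("carol", "authored", "cs2"),
    ("cs1", "changed", "foo.c"),
    ("cs2", "changed", "bar.c"),
    ("dave", "authored", "cs3"),
    ("cs3", "changed", "baz.c"),
]

# Hypothetical DFA for the pattern  manages* authored changed.
# Maps (state, edge label) -> next state; a missing key means the path is rejected.
MANAGES_STAR_AUTHORED_CHANGED = {
    (0, "manages"): 0,
    (0, "authored"): 1,
    (1, "changed"): 2,
}


def build_adjacency(edges):
    """Index outgoing (label, target) pairs by source node."""
    adj = defaultdict(list)
    for src, label, dst in edges:
        adj[src].append((label, dst))
    return adj


def regular_reachability(edges, sources, dfa, start, accepting):
    """Return the nodes reachable from `sources` along a path whose
    edge-label sequence is accepted by `dfa` (BFS over the product graph)."""
    adj = build_adjacency(edges)
    seen = set()
    queue = deque()
    for s in sources:
        seen.add((s, start))
        queue.append((s, start))
    result = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            result.add(node)
        for label, nxt in adj.get(node, ()):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # this label cannot extend a matching path
            pair = (nxt, next_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return result


if __name__ == "__main__":
    print(sorted(regular_reachability(
        EDGES, ["alice"], MANAGES_STAR_AUTHORED_CHANGED, 0, {2})))
```

Run as a script, this prints `['bar.c', 'foo.c']`: alice reaches both files through bob and carol, but not `baz.c`, which only dave touched. Tracking (node, automaton state) pairs rather than nodes alone lets the same node be revisited in a different state, and bounds the search at nodes × states pairs.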


Reference

Andrew Begel, Khoo Yit Phang, Thomas Zimmermann. Codebook: Discovering and Exploiting Relationships in Software Repositories. In Proceedings of the 32nd International Conference on Software Engineering (ICSE 2010), Cape Town, South Africa, May 2010.

BibTeX Entry

@inproceedings{begel-icse-2010,
    title = "Codebook: Discovering and Exploiting Relationships in Software Repositories",
    author = "Andrew Begel and Khoo Yit Phang and Thomas Zimmermann",
    year = "2010",
    month = "May",
    booktitle = "Proceedings of the 32nd International Conference on Software Engineering",
    location = "Cape Town, South Africa",
}