Mining Version Archives to Guide Software Changes

by Thomas Zimmermann

We apply data mining to version histories in order to guide programmers along related changes: "Programmers who changed these functions also changed..." Given a set of existing changes, such rules (a) suggest and predict likely further changes, (b) reveal coupling that program analysis cannot detect, and (c) prevent errors due to incomplete changes. Our approach consists of two phases: (1) Preprocessing mirrors a complete version history in a database and searches for fine-grained changes, that is, changes to functions rather than to complete files. (2) Mining creates the rules that are used for recommendations. We developed our own mining technique that mines only for matching rules on the fly, so we can make up-to-date recommendations very quickly. Our evaluation involving eight large open-source projects shows that after an initial change, our ROSE prototype can correctly predict 26% of further files to be changed, and 15% of the precise functions or variables. The topmost three suggestions contain a correct location with a likelihood of 64%.
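
The idea of mining only for matching rules can be illustrated with a small sketch. It is not the ROSE implementation: the function name `suggest_changes`, the thresholds, and the toy history are assumptions made for illustration. Each transaction is the set of entities changed together in one commit; given the entities a programmer has already changed, only transactions containing all of them are considered, and each co-changed entity is ranked by support (number of matching transactions) and confidence (fraction of matching transactions).

```python
from collections import Counter


def suggest_changes(transactions, changed, min_support=1, min_confidence=0.5):
    """Rank entities that were changed together with everything in `changed`.

    Hypothetical helper: computes rules `changed -> entity` on demand,
    looking only at transactions that contain the whole query set.
    """
    changed = frozenset(changed)
    # Only transactions that include every already-changed entity can match.
    matching = [t for t in transactions if changed <= t]
    if not matching:
        return []

    # Count how often each other entity co-occurs with the query set.
    counts = Counter(entity for t in matching for entity in t - changed)

    suggestions = []
    for entity, support in counts.items():
        confidence = support / len(matching)
        if support >= min_support and confidence >= min_confidence:
            suggestions.append((entity, support, confidence))

    # Highest confidence first, then highest support, then name for stability.
    suggestions.sort(key=lambda s: (-s[2], -s[1], s[0]))
    return suggestions


# Toy history (invented data): each set is one commit's changed entities.
history = [
    frozenset({"parse()", "lex()", "Token.kind"}),
    frozenset({"parse()", "lex()"}),
    frozenset({"parse()", "render()"}),
    frozenset({"render()", "style.css"}),
]

for entity, support, confidence in suggest_changes(history, {"parse()"}):
    print(f"{entity}: support={support}, confidence={confidence:.2f}")
```

Because the rules are computed per query rather than precomputed for the whole archive, a new commit only has to be appended to `history` for later suggestions to reflect it.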

Reference

Thomas Zimmermann. Mining Version Archives to Guide Software Changes. Diploma Thesis, Universität Passau, June 2004.

BibTeX Entry

@mastersthesis{zimmermann-thesis-2004,
    title = "Mining Version Archives to Guide Software Changes",
    author = "Thomas Zimmermann",
    year = "2004",
    month = "June",
    school = "Universität Passau",
    type = "Diploma Thesis",
}