Preprocessing CVS Data for Fine-grained Analysis – MSR 2004

by Thomas Zimmermann, Peter Weißgerber

All analyses of version archives have one phase in common: the preprocessing of data. Preprocessing has a direct impact on the quality of the results returned by an analysis. In this paper we discuss four essential preprocessing tasks necessary for a fine-grained analysis of CVS archives: (a) data extraction, (b) transaction recovery, (c) mapping of changes to fine-grained entities, and (d) data cleaning. We formalize the concept of sliding time windows and show how commit mails can relate revisions to transactions. We also present two approaches that map changes to the affected building blocks of a file, e.g. functions or sections.
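The sliding time window idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that revisions with the same author and log message belong to one transaction as long as each revision follows the previous one within a maximum gap. The `Revision` record, the `recover_transactions` function, and the `max_gap` parameter with its default of 200 seconds are hypothetical names and values chosen for illustration, not the paper's formal definition.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Revision:
    """Hypothetical record for one file revision taken from a CVS log."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


def recover_transactions(revisions, max_gap=200):
    """Group revisions into transactions with a sliding time window.

    Revisions share a transaction if they have the same author and log
    message and each one follows the previous one by at most max_gap
    seconds. The window "slides" because the gap is measured from the
    latest revision in the transaction, not from the first one.
    """
    def commit_key(rev):
        return (rev.author, rev.message)

    # Sort so that groupby sees each (author, message) run in time order.
    ordered = sorted(revisions, key=lambda r: (r.author, r.message, r.timestamp))

    transactions = []
    for _, group in groupby(ordered, key=commit_key):
        current = []
        for rev in group:
            # A gap larger than max_gap closes the current transaction.
            if current and rev.timestamp - current[-1].timestamp > max_gap:
                transactions.append(current)
                current = []
            current.append(rev)
        if current:
            transactions.append(current)
    return transactions
```

With `max_gap=200`, three revisions by the same author with the same message at 0, 150, and 300 seconds end up in a single transaction, even though the first and last are 300 seconds apart. A fixed window anchored at the first revision would split them.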

See also: http://www.softevo.org/

Reference

Thomas Zimmermann, Peter Weißgerber. Preprocessing CVS Data for Fine-grained Analysis. In Proceedings of the First International Workshop on Mining Software Repositories (MSR 2004), Edinburgh, United Kingdom, May 2004, pp. 2-6.

BibTeX Entry

@inproceedings{zimmermann-msr-2004,
    title = "Preprocessing CVS Data for Fine-grained Analysis",
    author = "Thomas Zimmermann and Peter Weißgerber",
    year = "2004",
    month = "May",
    booktitle = "Proceedings of the First International Workshop on Mining Software Repositories",
    location = "Edinburgh, United Kingdom",
    pages = "2--6",
}