Locating Matching Method Calls by Mining Revision History Data – Bugs 2005

by V. Benjamin Livshits, Thomas Zimmermann

Developing an appropriate fix for a software bug often requires a detailed examination of the code as well as generation of appropriate test cases. However, certain categories of bugs are usually easy to fix. In this paper we focus on bugs that can be corrected with a one-line code change. As it turns out, one-line source code changes very often represent bug fixes. Moreover, a significant fraction of previously known bug categories can be addressed with one-line fixes. Careless use of file manipulation routines, failing to call free to deallocate a data structure, failing to use strncpy instead of strcpy for safer string manipulation, and using tainted character arrays as the format argument of fprintf calls are all well-known types of bugs that can typically be corrected with a one-line change of the program source.
 
This paper proposes an analysis of software revision histories to find highly correlated pairs of method calls that naturally form application-specific useful coding patterns. Potential patterns discovered through revision history mining are passed to a runtime analysis tool that looks for pattern violations. We focus our pattern discovery efforts on matching method pairs. Matching pairs such as ⟨fopen, fclose⟩, ⟨malloc, free⟩, as well as ⟨lock, unlock⟩ function calls require exact matching: failing to call the second function in the pair or calling one of the two functions twice in a row is an error. We use common bug fixes as a heuristic that allows us to focus on patterns that caused bugs in the past. The user is presented with a choice of patterns to validate at runtime. Dynamically obtained information about which patterns were violated and which ones held at runtime is presented to the user. This combination of revision history mining and dynamic analysis techniques proves effective for both discovering new application-specific patterns and for finding errors when applied to very large programs with many man-years of development and debugging effort behind them.
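
As a rough illustration of the two stages described above, the following Python sketch first ranks pairs of calls that were added together in the same check-in, then checks a recorded call trace for violations of one chosen pair. It is a minimal sketch under stated assumptions, not the tool from the paper: the input formats, the support and confidence measures, the function names mine_pairs and find_violations, and the method names in the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(revisions, min_support=2):
    """Rank pairs of method calls that were added in the same check-in.

    `revisions` is a list of sets; each set holds the names of methods
    whose calls one check-in added (an assumed, simplified input format).
    Returns (pair, support, confidence) tuples, strongest first.
    """
    call_count = Counter()
    pair_count = Counter()
    for added in revisions:
        call_count.update(added)
        # Sort so that each unordered pair always maps to the same key.
        pair_count.update(combinations(sorted(added), 2))

    ranked = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Assumed measure: share of check-ins touching the more frequent
        # member of the pair that also touched the other member.
        confidence = support / max(call_count[a], call_count[b])
        ranked.append(((a, b), support, confidence))
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked


def find_violations(trace, opener, closer):
    """Check that `opener` and `closer` strictly alternate per resource.

    `trace` is a list of (method_name, resource_id) events in call order.
    Returns (event_index, resource_id, message) tuples.
    """
    open_resources = set()
    violations = []
    for index, (method, resource) in enumerate(trace):
        if method == opener:
            if resource in open_resources:
                violations.append(
                    (index, resource, f"{opener} called again before {closer}"))
            open_resources.add(resource)
        elif method == closer:
            if resource not in open_resources:
                violations.append(
                    (index, resource, f"{closer} without preceding {opener}"))
            open_resources.discard(resource)
    # Anything still open at the end of the trace was never closed.
    for resource in sorted(open_resources):
        violations.append((len(trace), resource, f"missing {closer}"))
    return violations


# Hypothetical sample data; the method names are illustrative only.
revisions = [
    {"addListener", "removeListener"},
    {"addListener", "removeListener", "log"},
    {"beginRule", "endRule"},
    {"beginRule", "endRule", "log"},
    {"log"},
]
trace = [
    ("beginRule", "r1"),
    ("endRule", "r1"),
    ("beginRule", "r2"),
    ("beginRule", "r2"),  # opener repeated on the same resource
    ("endRule", "r3"),    # closer with no matching opener
]

for pair, support, confidence in mine_pairs(revisions):
    print(pair, support, confidence)
for violation in find_violations(trace, "beginRule", "endRule"):
    print(violation)
```

In this sketch the mining step only proposes candidates; a person still chooses which pairs to check, which mirrors the user-driven validation step mentioned in the abstract. Tracking open resources per identifier is one simple way to express "exact matching"; a real runtime analysis would need to obtain the trace by instrumenting the program.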
 
To validate our approach, we analyzed Eclipse, a widely used, mature Java application consisting of more than 2,900,000 lines of code. By mining revision histories, we have discovered a total of 32 previously unknown highly application-specific matching method pairs. Out of these, 10 were dynamically confirmed as valid patterns and a total of 107 previously unknown bugs were found as a result of pattern violations.

See also: http://www.softevo.org/

Reference

V. Benjamin Livshits, Thomas Zimmermann. Locating Matching Method Calls by Mining Revision History Data. In PLDI Workshop on the Evaluation of Software Defect Detection Tools (Bugs 2005), Chicago, Illinois, USA, June 2005. Proceedings distributed only at the workshop.

BibTeX Entry

@inproceedings{livshits-bugs-2005,
    title = "Locating Matching Method Calls by Mining Revision History Data",
    author = "V. Benjamin Livshits and Thomas Zimmermann",
    year = "2005",
    month = "June",
    booktitle = "PLDI Workshop on the Evaluation of Software Defect Detection Tools",
    editor = "Bill Pugh and Jim Larus",
    location = "Chicago, Illinois, USA",
}