Mining Bug Databases

Meet iBUGS: a benchmark for defect localization

Researchers have proposed a number of tools for automatic bug localization. Given a program and a description of the failure, such tools pinpoint a set of statements that are most likely to contain the bug. Evaluating these tools is difficult because existing benchmarks are limited in the size of their subjects and the number of bugs they contain.

The benchmark: 401 real bugs in 124kLOC. More than 2000 test cases.

We developed an approach that semi-automatically extracts benchmarks for bug localization from the history of a project. The result is the iBUGS dataset, a benchmark with real bugs for large test subjects (AspectJ, Rhino).
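The core mining step can be sketched as follows: scan the version archive's log messages for references to entries in the bug database, and link each fix revision to the bug it resolves. The log entries, revision numbers, and bug IDs below are invented for illustration; the actual iBUGS extraction also involves additional filtering and manual validation.

```python
import re

# Hypothetical log entries as (revision, message) pairs, as they might
# come from a version archive. Revisions and bug IDs are made up.
LOG = [
    ("r1001", "Fix for Bug 61536: advice not woven correctly"),
    ("r1002", "refactoring, no functional change"),
    ("r1003", "fixed #83563, added regression test"),
]

# Pattern that recognizes references to bug reports, e.g. "Bug 61536"
# or "#83563", in the spirit of matching messages against a bug database.
BUG_ID = re.compile(r"(?:bug\s*|#)(\d+)", re.IGNORECASE)

def link_fixes(log):
    """Map each referenced bug ID to the revisions that mention it."""
    links = {}
    for rev, message in log:
        for bug_id in BUG_ID.findall(message):
            links.setdefault(bug_id, []).append(rev)
    return links

print(link_fixes(LOG))  # {'61536': ['r1001'], '83563': ['r1003']}
```

From such links, the pre-fix and post-fix versions of each subject can then be checked out and packaged together with the project's test cases.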

The iBUGS Repository – 401 bugs in AspectJ and Rhino
Extraction of Bug Localization Benchmarks from History – ASE 2007
Extraction of Bug Localization Benchmarks from History – extended TR

Identification of bug-introducing changes

Bug-fixes are widely used for predicting bugs or finding risky parts of software. However, a bug-fix does not contain information about the change that initially introduced the bug. Such bug-introducing changes can help identify important properties of software bugs, such as correlated factors or causalities. For example, they reveal which developers or what kinds of source code changes introduce more bugs.

Don’t program on Fridays!

In contrast to bug-fixes, which are relatively easy to obtain, bug-introducing changes are challenging to extract. We developed algorithms to automatically and accurately identify bug-introducing changes.
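The basic idea can be sketched as follows: for each line that a bug-fix deletes or changes, the revision that last touched that line, as reported by an annotate/blame tool, is a candidate bug-introducing change. The blame data, revisions, and line numbers below are invented for illustration; the published algorithms refine this with checks to avoid false positives (e.g. cosmetic changes).

```python
# Hypothetical annotate/blame output for the file *before* the fix:
# line number -> (revision that last changed the line, author).
BLAME_BEFORE_FIX = {
    10: ("r0815", "alice"),
    11: ("r0815", "alice"),
    12: ("r0421", "bob"),
    13: ("r0999", "alice"),
}

def bug_introducing_candidates(fixed_lines, blame):
    """Revisions that last touched the lines modified by the fix."""
    candidates = {}
    for line in fixed_lines:
        rev, _author = blame[line]
        candidates.setdefault(rev, []).append(line)
    return candidates

# Suppose the (invented) bug-fix changed lines 10, 11, and 12:
print(bug_introducing_candidates([10, 11, 12], BLAME_BEFORE_FIX))
# -> {'r0815': [10, 11], 'r0421': [12]}
```

Aggregating such candidates over many fixes is what makes analyses like "which weekday do bug-introducing changes cluster on?" possible in the first place.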

When do Changes Induce Fixes? On Fridays – MSR 2005
HATARI: Raising Risk Awareness – ESEC/SIGSOFT FSE 2005
Automatic Identification of Bug-Introducing Changes – ASE 2006