Predicting Software Quality

In software development, the resources for quality assurance (QA) are typically limited. A common practice among managers is to direct the QA effort to those parts of a system that are expected to have the most defects. Our research helps predict the most defect-prone parts of a software system and thus supports managers in allocating resources.

Relation between dependencies and defects

At Microsoft, we explored the relation between dependencies and defects. We found that the more complex the dependencies of a component are, the more defects it tends to have. In addition, the presence of cyclic dependencies increases the number of defects. The more important (central) a binary is in the dependency graph, the more defects it tends to have. We also observed a domino effect for binaries:

Depending on defect-prone binaries increases the likelihood of a defect in a binary.

We built prediction models that successfully identified the most defect-prone parts of Windows Server 2003.
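
The dependency properties named above (complexity, cycles, centrality, and dependence on defect-prone binaries) can be illustrated on a toy graph. The following is a minimal sketch, not the models or metrics used in the studies: the function name `dependency_metrics`, the binary names, and the choice of fan-in/fan-out as simple stand-ins for complexity and centrality are all assumptions made for illustration.

```python
from collections import defaultdict


def dependency_metrics(deps, known_defects):
    """Compute simple per-binary metrics on a dependency graph.

    deps maps each binary to the set of binaries it depends on.
    known_defects maps a binary to its known defect count (hypothetical data).
    """
    nodes = set(deps) | {d for targets in deps.values() for d in targets}

    # Fan-in: how many binaries depend on this one (a crude centrality proxy).
    fan_in = defaultdict(int)
    for targets in deps.values():
        for dst in targets:
            fan_in[dst] += 1

    def reachable(start):
        """All binaries reachable from start by following dependencies."""
        seen, stack = set(), list(deps.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(deps.get(node, ()))
        return seen

    metrics = {}
    for node in nodes:
        targets = deps.get(node, set())
        metrics[node] = {
            "fan_out": len(targets),
            "fan_in": fan_in[node],
            # A binary is on a cycle if it can reach itself.
            "in_cycle": node in reachable(node),
            # Domino effect input: direct dependencies with known defects.
            "defective_deps": sum(known_defects.get(t, 0) > 0 for t in targets),
        }
    return metrics


# Hypothetical binaries and defect counts, for illustration only.
deps = {
    "a.dll": {"b.dll", "c.dll"},
    "b.dll": {"c.dll"},
    "c.dll": {"a.dll"},
    "d.dll": {"c.dll"},
}
known_defects = {"c.dll": 3}
metrics = dependency_metrics(deps, known_defects)
```

Metrics of this kind could serve as input features for a prediction model; which features and which model work best is an empirical question that this sketch does not address.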

Predicting Subsystem Defects using Dependency Graph Complexities – ISSRE 2007
Predicting Defects using Network Analysis on Dependency Graphs – ICSE 2008
Program Dependencies and the Domino Effect – in submission

Defect prediction for open-source projects

For Eclipse, we discovered that the defect-proneness of a component depends on the packages and classes it uses. For example, components that use compiler packages are more defect-prone than components that use UI packages. We built defect prediction models from this information.

Of the 5% of components predicted as defect-prone, 90% turned out to be defect-prone.

Typically, usage relationships between components are defined in the design phase; thus, designers can easily explore and assess design alternatives in terms of expected quality.
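
To show how usage information might be used to compare design alternatives, here is a minimal sketch. It is a deliberately simple stand-in, not the prediction models from the study: it estimates a per-package risk as the share of past components using that package that had defects, and scores a design by the mean risk of the packages it uses. The function names `package_risk` and `score`, the package names, and the history data are hypothetical.

```python
from collections import defaultdict


def package_risk(history):
    """Estimate risk per package from past components.

    history is a list of (imported_packages, had_defect) pairs,
    one pair per component.
    """
    used = defaultdict(int)
    defective = defaultdict(int)
    for packages, had_defect in history:
        for pkg in packages:
            used[pkg] += 1
            defective[pkg] += int(had_defect)
    # Share of components using the package that had at least one defect.
    return {pkg: defective[pkg] / used[pkg] for pkg in used}


def score(packages, risk, default=0.0):
    """Mean risk of the packages a (planned) component uses."""
    if not packages:
        return default
    return sum(risk.get(p, default) for p in packages) / len(packages)


# Hypothetical history: components using "compiler" had defects, "ui" did not.
history = [
    ({"compiler", "core"}, True),
    ({"compiler"}, True),
    ({"ui", "core"}, False),
    ({"ui"}, False),
]
risk = package_risk(history)

# Two hypothetical design alternatives for a new component.
design_a = score({"compiler", "core"}, risk)
design_b = score({"ui", "core"}, risk)
```

With this toy data, `design_a` scores higher than `design_b`, which mirrors the compiler versus UI observation above. A real model would need far more history and a proper evaluation.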

For seven open-source projects, we observed that defects do not occur in isolation, but rather in bursts of several related defects. Therefore, we cache locations that are likely to have defects: starting from the location of a known (fixed) defect, we cache the location itself, any locations changed together with it, recently added locations, and recently changed locations.

The cache selects 10% of the source code files; these files account for 73%-95% of defects.

By consulting the cache at the moment a defect is fixed, a developer can detect likely defect-prone locations.
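
The caching idea described above can be sketched as a small fixed-size cache. This is an illustration under stated assumptions, not the published algorithm: the class name `DefectCache`, its method names, the file names, and in particular the eviction policy (drop the least recently loaded location when the cache is full) are assumptions, since the text above does not specify how entries are replaced.

```python
from collections import OrderedDict


class DefectCache:
    """Fixed-size set of locations considered likely to contain defects."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def _load(self, location):
        # Move the location to the most recent position.
        self._entries.pop(location, None)
        self._entries[location] = True
        # Assumed policy: evict the least recently loaded location.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def on_defect_fixed(self, location, co_changed=()):
        """Cache the fixed location and the locations changed together with it."""
        for loc in co_changed:
            self._load(loc)
        self._load(location)

    def on_added_or_changed(self, location):
        """Cache recently added or recently changed locations."""
        self._load(location)

    def __contains__(self, location):
        return location in self._entries


# Hypothetical file names and a tiny capacity, for illustration only.
cache = DefectCache(capacity=3)
cache.on_added_or_changed("Parser.java")
cache.on_defect_fixed("Scanner.java", co_changed=["Lexer.java"])
cache.on_added_or_changed("Widget.java")
```

In practice the capacity would be chosen relative to project size, for example about 10% of the source files as in the result quoted above, and a check such as `"Scanner.java" in cache` would flag a file for extra QA attention.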

Predicting Component Failures at Design Time – ISESE 2006
Predicting Faults from Cached History – ICSE 2007

Eclipse defect data!

We have mined the Eclipse bug and version databases to map defects to Eclipse components (packages and files). The resulting data set lists the defect density of all Eclipse components for releases 2.0, 2.1, and 3.0.

What is it that makes software fail?

As we demonstrate in three simple experiments, the bug data set can easily be used to relate code, process, and developers to defects and to build prediction models for software defects. The data set is publicly available for download and use. The next step is yours!

Eclipse Bug Data! Release 2.0, 2007-12-01
Predicting Defects for Eclipse – PROMISE 2007
If Your Bug Database Could Talk… – ISESE 2006 (Short Paper)