In software development, the resources for quality assurance (QA) are typically limited. A common practice among managers is to direct the QA effort to those parts of a system that are expected to have the most defects. Our research helps predict the most defect-prone parts of a software system and supports managers in allocating these resources.
Relation between dependencies and defects
At Microsoft, we explored the relation between dependencies and defects. We found that the more complex the dependencies of a component are, the more defects it has. In addition, the presence of cyclic dependencies increases the number of defects. The more important (central) a binary is in the dependency graph, the more defects it has. We also observed a domino effect for binaries.
We built prediction models that successfully identified the most defect-prone parts of Windows Server 2003.
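The dependency measures mentioned above (complexity, cycles, centrality) can be illustrated with a small sketch. This is not the model used in the study; it only shows, under simplifying assumptions, how such measures could be read off a dependency graph. The function name `dependency_metrics`, the chosen measures (fan-in, fan-out, cycle membership, degree centrality), and the binary names are all hypothetical.

```python
from collections import defaultdict


def dependency_metrics(edges):
    """Compute simple per-node measures for a directed dependency graph.

    edges: iterable of (source, target) pairs, read as "source depends on target".
    Returns a dict mapping each node to its fan-in, fan-out, cycle membership,
    and degree centrality. These measures are illustrative assumptions only.
    """
    out_deps = defaultdict(set)
    in_deps = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        out_deps[src].add(dst)
        in_deps[dst].add(src)
        nodes.update((src, dst))

    def reachable(start):
        # All nodes reachable from start by following one or more dependencies.
        seen, stack = set(), list(out_deps[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(out_deps[node])
        return seen

    n = len(nodes)
    metrics = {}
    for node in nodes:
        fan_in, fan_out = len(in_deps[node]), len(out_deps[node])
        metrics[node] = {
            "fan_in": fan_in,
            "fan_out": fan_out,
            # A node lies on a cycle if it can reach itself.
            "in_cycle": node in reachable(node),
            # Share of the other nodes that are direct neighbours.
            "degree_centrality": (fan_in + fan_out) / (n - 1) if n > 1 else 0.0,
        }
    return metrics


# Toy graph with made-up binary names: a, b, c form a cycle; d depends on a.
example_edges = [
    ("a.dll", "b.dll"),
    ("b.dll", "c.dll"),
    ("c.dll", "a.dll"),
    ("d.dll", "a.dll"),
]
example_metrics = dependency_metrics(example_edges)
```

In a prediction setting, values like these would serve as input features for a model trained on past defect counts; which measures and which model work best is an empirical question that this sketch does not answer.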
Predicting Subsystem Defects using Dependency Graph Complexities – ISSRE 2007
Predicting Defects using Network Analysis on Dependency Graphs – ICSE 2008
Program Dependencies and the Domino Effect – in submission
Defect prediction for open-source projects
For Eclipse, we discovered that the defect-proneness of a component depends on the packages and classes that it uses. For example, components that use compiler packages are more defect-prone than components that use UI packages. We built prediction models for defects from this information.
Typically, usage relationships between components are defined in the design phase; thus, designers can easily explore and assess design alternatives in terms of expected quality.
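To make the idea of predicting defects from usage relationships concrete, here is a deliberately simple sketch. It assumes a history of components, each described by the packages it imports and whether it had a defect, and scores a new design by the average historical defect rate of its imports. This averaging scheme, the function names, and the toy data are assumptions for illustration; they are not the prediction models built in the study.

```python
from collections import defaultdict


def import_defect_rates(history):
    """Estimate, per imported package, the share of importing components with defects.

    history: list of (imported_packages, had_defect) pairs.
    """
    total = defaultdict(int)
    defective = defaultdict(int)
    for imports, had_defect in history:
        for package in set(imports):
            total[package] += 1
            if had_defect:
                defective[package] += 1
    return {package: defective[package] / total[package] for package in total}


def risk_score(imports, rates, default=0.0):
    """Average historical defect rate over the packages a component imports.

    Packages never seen in the history contribute the default rate.
    """
    values = [rates.get(package, default) for package in set(imports)]
    return sum(values) / len(values) if values else default


# Toy history (made-up data, not taken from the Eclipse data set).
toy_history = [
    ({"org.eclipse.jdt.internal.compiler"}, True),
    ({"org.eclipse.jdt.internal.compiler", "org.eclipse.ui"}, True),
    ({"org.eclipse.ui"}, False),
    ({"org.eclipse.ui"}, False),
]
toy_rates = import_defect_rates(toy_history)

# Compare two hypothetical design alternatives before any code is written.
compiler_design = risk_score({"org.eclipse.jdt.internal.compiler"}, toy_rates)
ui_design = risk_score({"org.eclipse.ui"}, toy_rates)
```

Because the score needs only the planned imports, it can be computed at design time, which is what allows design alternatives to be compared in terms of expected quality.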
For seven open-source projects, we observed that defects do not occur in isolation, but rather in bursts of several related defects. Therefore, we cache locations that are likely to have defects: starting from the location of a known (fixed) defect, we cache the location itself, any locations changed together with the defect, recently added locations, and recently changed locations.
By consulting the cache at the moment a defect is fixed, a developer can detect likely defect-prone locations.
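The caching scheme described above can be sketched as follows. The text does not say how large the cache is or how entries are replaced, so the fixed capacity and the least-recently-used eviction used here are assumptions, as are the class name `DefectCache`, its method names, and the file names.

```python
from collections import OrderedDict


class DefectCache:
    """Fixed-size cache of locations considered likely to contain defects.

    Capacity and least-recently-used eviction are assumptions of this sketch.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def _touch(self, location):
        # Insert the location, or move it to the newest position if present.
        self._entries.pop(location, None)
        self._entries[location] = True
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the oldest entry

    def record_fix(self, location, changed_together=()):
        """Cache the fixed location and the locations changed with the fix."""
        for other in changed_together:
            self._touch(other)
        self._touch(location)

    def record_change(self, added=(), changed=()):
        """Cache recently added and recently changed locations."""
        for location in list(added) + list(changed):
            self._touch(location)

    def __contains__(self, location):
        return location in self._entries

    def locations(self):
        """Cached locations, most recently touched first."""
        return list(reversed(self._entries))


# Usage with made-up file names.
cache = DefectCache(capacity=3)
cache.record_fix("Parser.java", changed_together=["Lexer.java"])
cache.record_change(added=["Ast.java"], changed=["Scope.java"])
flagged = cache.locations()
```

After each fix, a developer would consult `locations()` (or test membership with `in`) to see which locations currently deserve extra review or testing.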
Eclipse defect data!
We have mined the Eclipse bug and version databases to map defects to Eclipse components (packages and files). The resulting data set lists the defect density of all Eclipse components for releases 2.0, 2.1, and 3.0.
As we demonstrate in three simple experiments, the bug data set can easily be used to relate code, process, and developers to defects and to build prediction models for software defects. The data set is publicly available for download and use. The next step is yours!