Mining Version Archives

Recommendation systems for software development

If you browse the books at Amazon or a similar shop, you may have encountered suggestions of this type: “Customers who bought this book also bought…” Such suggestions stem from Amazon’s purchase history: buying two or more books together establishes a relationship between them. We implemented a similar feature for software:

Developers who changed the field fKeys[] also changed the method initDefaults().

For this purpose, we analyze the version histories of large software systems to identify commonalities and anomalies, guiding developers in understanding and maintaining the software.
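The recommendation above can be read as an association rule mined from past commits. The following is only a minimal sketch of that idea, not the eROSE implementation: it assumes each commit is given as a set of changed entities, and the function name, thresholds, and sample history are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_co_change_rules(transactions, min_support=2, min_confidence=0.5):
    """Return rules (changed, also_changed, support, confidence).

    support    = number of commits that changed both entities
    confidence = support / number of commits that changed `changed`
    """
    entity_count = Counter()
    pair_count = Counter()
    for changed in transactions:
        entities = set(changed)
        entity_count.update(entities)
        # Ordered pairs, because the rule a -> b differs from b -> a.
        pair_count.update(permutations(sorted(entities), 2))

    rules = []
    for (a, b), support in pair_count.items():
        confidence = support / entity_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken by name for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical commit history, invented for illustration.
history = [
    {"fKeys[]", "initDefaults()"},
    {"fKeys[]", "initDefaults()", "plugin.properties"},
    {"fKeys[]", "initDefaults()"},
    {"initDefaults()", "README"},
]

rules = mine_co_change_rules(history)
for changed, also_changed, support, confidence in rules:
    print(f"{changed} => {also_changed} "
          f"(support {support}, confidence {confidence:.2f})")
```

In this toy history, every commit that changed fKeys[] also changed initDefaults(), so that rule gets the highest confidence, while the reverse rule is weaker.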

Mining Version Histories to Guide Software Changes – ICSE 2004
Mining Version Histories to Guide Software Changes (extd.) – TSE 2005
eROSE plugin for Eclipse – no longer maintained

Discovering application-specific usage patterns

When developers change code, they add new method calls. Method calls that are added together (“co-added”) are often related to each other. Our DynaMine prototype obtains this co-addition relationship from version archives and identifies usage patterns that describe how methods should be called, for instance:

A call to addWidget() should be followed by removeWidget().

Besides simple pairs, usage patterns come as state machines or grammars. They explain to developers how to use certain methods, and violations of a pattern may be reported as warnings. DynaMine scales up to the history of industrial-sized projects such as ECLIPSE.
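To make the two steps concrete (collecting co-added calls, then checking a pair pattern), here is a small sketch. It is an illustration under simplifying assumptions rather than DynaMine itself: revisions are given as lists of added call names, a pattern is a plain pair, each first call must be matched by its own later second call, and all names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_added_pairs(revisions, min_count=2):
    """Count unordered pairs of method calls added in the same revision."""
    counts = Counter()
    for added_calls in revisions:
        counts.update(combinations(sorted(set(added_calls)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


def find_violations(call_trace, first, second):
    """Return trace indices of calls to `first` with no matching later `second`.

    Matching is nested (last opened, first closed), like parentheses.
    """
    pending = []
    for index, call in enumerate(call_trace):
        if call == first:
            pending.append(index)
        elif call == second and pending:
            pending.pop()
    return pending


# Hypothetical revisions: the calls each one added.
revisions = [
    ["addWidget", "removeWidget"],
    ["addWidget", "removeWidget", "log"],
    ["log"],
]
pairs = co_added_pairs(revisions)

# Hypothetical call sequence to check against the mined pair.
trace = ["addWidget", "draw", "removeWidget", "addWidget", "draw"]
violations = find_violations(trace, "addWidget", "removeWidget")
for index in violations:
    print(f"warning: addWidget at position {index} is never followed by removeWidget")
```

Only the pair that was co-added at least twice survives the threshold, and the second addWidget call in the trace is reported because no removeWidget follows it.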

DynaMine: Finding Common Error Patterns by Mining Software Revision Histories – ESEC/SIGSOFT FSE 2005

Locating cross-cutting concerns in version histories

Our HAM prototype identifies cross-cutting changes in version histories:

A developer inserted calls to lock() and unlock() into 1284 different locations.

To identify such changes, we apply concept analysis to additions of method calls. This helps developers become aware of cross-cutting concerns in legacy systems and refactor them into aspects, which in the long term avoids serious maintenance challenges.
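As a rough illustration of concept analysis on added calls, the sketch below treats code locations as objects and the calls inserted into them as attributes, then keeps the concepts whose set of calls was added to many locations. It is a naive enumeration suitable only for small inputs, not the HAM implementation; the function names, the threshold, and the sample data are hypothetical.

```python
def formal_concepts(context):
    """Enumerate formal concepts of a location -> added-calls mapping.

    Returns (extent, intent) pairs: `intent` is a set of calls, `extent`
    is every location into which all of those calls were inserted.
    Concepts with an empty intent are skipped.
    """
    intents = set()
    for calls in context.values():
        calls = frozenset(calls)
        # Close the collected intents under intersection with this location.
        new = {calls} | {calls & other for other in intents}
        intents |= new
    intents.discard(frozenset())

    concepts = []
    for intent in intents:
        extent = frozenset(
            loc for loc, calls in context.items() if intent <= set(calls)
        )
        concepts.append((extent, intent))
    return concepts


def crosscutting_candidates(context, min_locations=3):
    """Keep concepts whose calls were added to at least `min_locations` places."""
    found = [
        (sorted(extent), sorted(intent))
        for extent, intent in formal_concepts(context)
        if len(extent) >= min_locations
    ]
    return sorted(found, key=lambda c: (-len(c[0]), c[1]))


# Hypothetical change: the calls inserted into each location.
added_calls = {
    "A.run": {"lock", "unlock"},
    "B.save": {"lock", "unlock", "flush"},
    "C.load": {"lock", "unlock"},
    "D.print": {"flush"},
}

candidates = crosscutting_candidates(added_calls)
for locations, calls in candidates:
    print(f"{calls} added to {len(locations)} locations: {locations}")
```

Here the pair lock and unlock forms a concept spanning three locations and is reported as a candidate, whereas flush appears in only two locations and falls below the threshold.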

Mining Aspects from Version History (basic technique) – ASE 2006
Mining Eclipse for Cross-Cutting Concerns (concept analysis) – MSR 2006
HAM: Cross-Cutting Concerns in Eclipse (tool) – ETX 2006