Empirical Software Engineering: Special Issue on Mining Software Repositories

Jim Whitehead and I have edited a special issue of the Empirical Software Engineering journal featuring some of the best papers from MSR 2010. The special issue was published today. The same issue also contains best papers from MSR 2009 (edited by Michael Godfrey and Jim Whitehead).

Read the special issue:
Empirical Software Engineering, Volume 1 / 1996 – Volume 17 / 2012

In the first paper, “Clones: What is that Smell?”, Rahman, Bird, and Devanbu test the conventional wisdom that cloning makes code more defect-prone by analyzing the software repositories of four open-source projects. Assessing the validity of common software engineering folklore is a frequent application of mining software repositories. The findings in the paper do not support the claim that clones are generally a “bad smell”, especially with respect to defects. The authors found that clones may even be less defect-prone than non-cloned code. They also found little evidence that clones with more copies are actually more error-prone. As the authors put it in the paper, “perhaps we can clone, and breathe easily, at the same time.”

In the paper “Evaluating Defect Prediction Approaches: A Benchmark and an Extensive Comparison”, D’Ambros, Lanza, and Robbes introduce several novel datasets for defect prediction. As they put it, “predicting software defects is one of the holy grails of software engineering”. Over the past years, researchers have devised and implemented hundreds of defect prediction approaches (see the systematic review by Hall et al. (2011) for a good summary). However, the absence of benchmarks has made it difficult to compare these approaches. In their paper, D’Ambros et al. present a benchmark and provide an extensive comparison of well-known defect prediction approaches, together with novel approaches that they devised. The benchmark is available at http://bug.inf.usi.ch/

In the paper “The Evolution of Java Build Systems”, McIntosh, Adams, and Hassan study the build systems of six open-source projects. Although build systems are essential for producing the executable files of software, especially in industry, they were largely ignored by researchers until recently. McIntosh et al. observed that the sizes of the build system and the source code are highly correlated, and that restructuring the source code often required restructuring the build system as well. Understanding build processes helps project managers better allocate personnel and resources to the build system.