List of Data Sets

With this post, I want to highlight some of the public datasets that my co-authors and I have published with our papers over the past years (in no particular order):

Paper: What Makes a Good Bug Report? – TSE 2010
Get the data: Supplemental Material to What Makes a Good Bug Report? (Computer Society Digital Library)
Bug reports with quality ratings, plus R scripts to analyze the data and a detailed tutorial on how to replicate and extend the study.

Paper: Information Needs in Bug Reports – CSCW 2010
Get the data: Appendix to Information Needs in Bug Reports (University of Calgary, Technical Report 2009-945-24)
Includes 900+ questions from 600 bug reports, the results from a card sort, and the data and analysis scripts used to analyze question time, response rate, and response time.

Paper: Predicting Defects for Eclipse – PROMISE 2007
Get the data: Eclipse Bug Data! (Saarland University)
Defect counts and complexity metrics for Eclipse releases 2.0, 2.1, and 3.0. Includes an R script for experiments on defect prediction.

Paper: Change Bursts as Defect Predictors – ISSRE 2010
Get the data: Eclipse Burst Data! (Saarland University)
Extends the Eclipse Bug Data (PROMISE 2007) with change burst data.

Paper: Security Trend Analysis with CVE Topic Models – ISSRE 2010
Get the data: Security Trend Analysis with CVE Topic Models (University of Calgary, Technical Report 2010-970-19)
Tools, scripts, and data needed to replicate the trend analysis. See the README file for details.

Paper: Do Crosscutting Concerns Cause Defects? – TSE 2008
Get the data: ConcernTagger (Marc Eaddy)
Source code for the subject programs and measurement tools, complete concern and bug lists, concern-code and bug-code mappings, and results.