TA-RE: An Exchange Language for Mining Software Repositories – MSR 2006

by Sunghun Kim, Thomas Zimmermann, Miryung Kim, Ahmed E. Hassan, Audris Mockus, Tudor Girba, Martin Pinzger, E. James Whitehead Jr., Andreas Zeller

Software repositories have been getting a lot of attention from researchers in recent years. To analyze software repositories, it is necessary to first extract raw data from the version control and problem tracking systems. This poses two challenges: (1) extraction requires a non-trivial effort, and (2) the results depend on the heuristics used during extraction. These challenges burden researchers who are new to the community and make it difficult to benchmark software repository mining, since it is almost impossible to reproduce experiments done by another team. In this paper we present the TA-RE corpus. TA-RE collects extracted data from software repositories in order to build a collection of projects that will simplify the extraction process. Additionally, the collection can be used for benchmarking. As a first step, we propose an exchange language capable of making sharing and reusing data as simple as possible.


Sunghun Kim, Thomas Zimmermann, Miryung Kim, Ahmed E. Hassan, Audris Mockus, Tudor Girba, Martin Pinzger, E. James Whitehead Jr., Andreas Zeller. TA-RE: An Exchange Language for Mining Software Repositories. In Proceedings of the Third International Workshop on Mining Software Repositories (MSR 2006), Shanghai, China, May 2006, pp. 22-25.

BibTeX Entry

    title = "TA-RE: An Exchange Language for Mining Software Repositories",
    author = "Sunghun Kim and Thomas Zimmermann and Miryung Kim and Ahmed E. Hassan and Audris Mockus and Tudor Girba and Martin Pinzger and E. James Whitehead Jr. and Andreas Zeller",
    year = "2006",
    month = "May",
    booktitle = "Proceedings of the Third International Workshop on Mining Software Repositories",
    location = "Shanghai, China",
    pages = "22--25",