by Tom on December 17, 2012
How do video game skills develop, and what sets the top players apart? We study this question using TrueSkill, a skill rating computed from repeated multiplayer matches. Using these ratings from 7 months of games played by over 3 million players, we examine how play intensity, breaks in play, skill change over time, and play in other titles affect skill. We then combine these factors to model future skill and number of games played; the results show that skill change in early games is a useful metric for modeling future skill, while play intensity explains the eventual number of games played. The best players in the 7-month period, whom we call “Master Blasters”, show varied skill patterns that often run counter to the trends we see for typical players. We supplement the data analysis with a 70-person survey that explores how players’ self-perceptions compare to the gameplay data; most survey responses align well with the data and help explain the observed behavior. Finally, we discuss hiding skill information from players and the implications for game designers.
by Tom on December 5, 2012
Christian Bird, Tim Menzies and I are organizing an ICSE workshop on Data Analysis Patterns in Software Engineering. The goal of the workshop is to collect and compile a catalog of reusable data analyses (“patterns”) that can be applied to many different types of data in software engineering. Think of them as design patterns for data science.
Please consider submitting your favorite analysis patterns. I know you have one! We solicit short 3-page papers, which can be either archival or non-archival. The deadline for archival submissions is February 7, 2013. The workshop will be held on May 21, 2013. For more details, including illustrative examples of analysis patterns, please visit the DAPSE workshop web-page.
In addition to the workshop, we plan to edit a book on “Data Science for Software Engineers” to showcase some of the data analysis patterns. We will invite selected authors from the workshop to contribute chapters to this book.
Data scientists in software engineering seek insights in data collected from software projects in order to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly, and there is already a shortage of such data scientists.
Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in the form of data analysis patterns, that is, analyses of data that lead to meaningful conclusions and can be reused for comparable data. In the workshop we will compile a catalog of such patterns that will help both experienced and emerging data scientists communicate better about data analysis. The workshop is intended for anyone interested in how to analyze data correctly and efficiently in a community-accepted way.
by Tom on December 5, 2012
Software monitoring systems have high performance overhead because they typically monitor all processes of the running program. For example, to capture and replay crashes, most current systems monitor all methods, which yields a significant performance overhead. Reducing the monitored methods to a smaller subset can dramatically lower this overhead. We present an approach that helps arrive at such a subset by reliably identifying the methods most likely to crash in a future execution of the software. Our approach learns patterns from features of methods that previously crashed and uses them to classify new methods as crash-prone or non-crash-prone. An evaluation of our approach on two large open source projects, ASPECTJ and ECLIPSE, shows that we can correctly classify crash-prone methods with an accuracy of 80-86%. Notably, we found that the classification models can also be used for cross-project prediction with virtually no loss in classification accuracy. In a further experiment, we demonstrate how a monitoring tool, RECRASH, could monitor only crash-prone methods, thereby reducing its performance overhead while maintaining its ability to perform its intended tasks.
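The abstract does not name the learner or the method features, so the following is only a minimal sketch of the general idea: train a classifier on methods with a known crash history, then instrument only the methods it flags. It swaps in a simple nearest-centroid classifier, and the two numeric features (lines of code, number of callers) and all method names are hypothetical.

```python
import math
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return tuple(mean(col) for col in zip(*vectors))


def train(methods):
    """methods: list of (features, crashed) pairs for methods with known crash history.

    Nearest-centroid stand-in for whatever learner the paper used.
    """
    crashed = [f for f, c in methods if c]
    clean = [f for f, c in methods if not c]
    return centroid(crashed), centroid(clean)


def is_crash_prone(model, features):
    """Label a method crash-prone if it lies closer to the crashed-methods centroid."""
    crash_c, clean_c = model
    return math.dist(features, crash_c) < math.dist(features, clean_c)


def methods_to_monitor(model, candidates):
    """candidates: dict of method name -> features; returns the names worth instrumenting."""
    return sorted(name for name, f in candidates.items() if is_crash_prone(model, f))


# Hypothetical training data: (lines of code, number of callers), crashed?
history = [((120, 15), True), ((95, 22), True),
           ((10, 2), False), ((8, 1), False), ((15, 3), False)]
model = train(history)

# Hypothetical methods from a new release (or, for cross-project use, another project)
candidates = {"Parser.parse": (110, 18), "Util.clamp": (6, 1), "Weaver.weave": (80, 30)}
print(methods_to_monitor(model, candidates))  # ['Parser.parse', 'Weaver.weave']
```

Because the model is just two centroids in feature space, applying a model trained on one project to another project's methods (the cross-project setting mentioned above) only requires that both projects expose the same features.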
by Tom on November 21, 2012
When software engineers fix bugs, they may have several options as to how to fix those bugs. Which fix is chosen has many implications, both for practitioners and researchers: What is the risk of introducing other bugs during the fix? Is the bug fix in the same code that caused the bug? Is the change fixing the cause or just covering a symptom? In this paper, we investigate the issue of alternative fixes to bugs and present an empirical study of how engineers make design choices about how to fix bugs. Based on qualitative interviews with 40 engineers working on a variety of products, 6 bug triage meetings, and a survey filled out by 326 engineers, we found that there are a number of factors, many of them non-technical, that influence how bugs are fixed, such as how close to release the software is. We also discuss several implications for research and practice, including ways to make bug prediction and localization more accurate.
by Tom on November 11, 2012
Existing research is unclear on how to generate lessons learned for defect prediction and effort estimation. Should we seek lessons that are global to multiple projects, or just local to particular projects? This paper aims to comparatively evaluate local vs. global lessons learned for effort estimation and defect prediction. We applied automated clustering tools to effort and defect data sets from the PROMISE repository. Rule learners generated lessons learned from all the data, from local projects, or just from each cluster. The results indicate that the lessons learned after combining small parts of different data sources (i.e., the clusters) were superior to either generalizations formed over all the data or local lessons formed from particular projects. We conclude that when researchers attempt to draw lessons from some historical data source, they should (a) ignore any existing local divisions into multiple sources; (b) cluster across all available data; then (c) restrict the learning of lessons to the clusters from other sources that are nearest to the test data.
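The three-step recommendation at the end of the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's tooling: it uses plain k-means with deterministic farthest-first seeding as the clustering step, reads step (c) as "use rows in the test data's nearest cluster that come from other sources", and replaces the rule learner with a hypothetical stand-in "lesson" (the mean target value). The project names and data are made up.

```python
import math
from statistics import mean


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed with the point farthest from every centroid chosen so far
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: math.dist(p, centroids[c]))].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty)
        centroids = [
            tuple(mean(col) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def cluster_lesson(rows, test_features, test_source, k=2):
    """rows: list of (source, features, target) tuples pooled from every project."""
    # (a) ignore existing project divisions and (b) cluster across all available data
    centroids = kmeans([feats for _, feats, _ in rows], k)

    def nearest(f):
        return min(range(k), key=lambda c: math.dist(f, centroids[c]))

    home = nearest(test_features)
    # (c) learn only from rows in the test data's nearest cluster that come from other sources
    peers = [y for src, feats, y in rows if nearest(feats) == home and src != test_source]
    # Hypothetical stand-in "lesson": mean target of those peers (the paper used rule learners)
    return mean(peers) if peers else None


# Made-up data: (source project, features, effort)
rows = [
    ("A", (1.0, 1.0), 10.0), ("B", (1.2, 0.9), 12.0), ("C", (0.9, 1.1), 11.0),
    ("A", (9.0, 9.0), 100.0), ("B", (9.1, 8.8), 110.0), ("C", (8.9, 9.2), 90.0),
]
print(cluster_lesson(rows, (1.1, 1.0), "A"))  # 11.5
```

Here the test instance from project A falls into the low-valued cluster, so the lesson is drawn only from the B and C rows in that cluster; project A's own rows and the distant cluster are ignored.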
by Tom on September 29, 2012
I am editing a book on “Recommendation Systems in Software Engineering”, together with Martin P. Robillard, McGill University (Canada), Walid Maalej, University of Hamburg (Germany), and Robert J. Walker, University of Calgary (Canada). The book will be published by Springer. Please consider contributing a chapter. The deadline for submitting an intent to contribute is October 15, 2012.
OVERVIEW
Recommendation systems support decision making by helping their users navigate through large information spaces. Many activities in software engineering require searching, understanding, and managing large amounts of highly-technical and inter-related information.
With the growth of public and private data stores and the emergence of off-the-shelf data-mining technology, recommendation systems have emerged that specifically target the unique challenges of navigating software engineering data.
This book will collect state-of-the-art knowledge on the basic techniques required to mine software engineering data to produce recommendations, on the best way to apply these techniques effectively in various application domains, and on the approaches that can be employed to assess the value of recommendations in software engineering.
FORMAT
We invite proposals for chapters that synthesize existing knowledge on relevant background topics and application areas for recommendation systems. Chapters should be accessible to senior undergraduate students and graduate students with a background in Computer Science, Software Engineering, or related disciplines. Chapters are not expected to correspond to the description of a single research project or technique. The proposed Table of Contents offers suggestions for target topics.
For more details, including the proposed Table of Contents, please visit the RSSE Book webpage. For any inquiries, please contact Martin Robillard.