by Tom on December 18, 2009
Many factors are believed to increase the vulnerability of software system; for example, the more widely deployed or popular is a software system the more likely it is to be attacked. Early identification of defects has been a widely investigated topic in software engineering research. Early identification of software vulnerabilities can help mitigate these attacks to a large degree by focusing better security verification efforts in these components. Predicting vulnerabilities is complicated by the fact that vulnerabilities are, most often, few in number and introduce significant bias by creating a sparse dataset in the population. As a result, vulnerability prediction can be thought of us preverbally “searching for a needle in a haystack.” In this paper, we present a large-scale empirical study on Windows Vista, where we empirically evaluate the efficacy of classical metrics like complexity, churn, coverage, dependency measures, and organizational structure of the company to predict vulnerabilities and assess how well these software measures correlate with vulnerabilities. We observed in our experiments that classical software measures predict vulnerabilities with a high precision but low recall values. The actual dependencies, however, predict vulnerabilities with a lower precision but substantially higher recall.
[click for more details...]
by Tom on December 16, 2009
We performed an empirical study to characterize factors that affect which bugs get fixed in Windows Vista and Windows 7, focusing on factors related to bug report edits and relationships between people involved in handling the bug. We found that bugs reported by people with better reputations were more likely to get fixed, as were bugs handled by people on the same team and working in geographical proximity. We reinforce these quantitative results with survey feedback from 358 Microsoft employees who were involved in Windows bugs. Survey respondents also mentioned additional qualitative influences on bug fixing, such as the importance of seniority and interpersonal skills of the bug reporter.
Informed by these findings, we built a statistical model to predict the probability that a new bug will be fixed (the first known one, to the best of our knowledge). We trained it on Windows Vista bugs and got a precision of 68% and recall of 64% when predicting Windows 7 bug fixes. Engineers could use such a model to prioritize bugs during triage, to estimate developer workloads, and to decide which bugs should be closed or migrated to future product versions.
[click for more details...]
by Tom on August 26, 2009
Software development can be challenging because of the large information spaces that developers must navigate. Without assistance, developers can become bogged down, and spend a disproportionate amount of their time seeking information at the expense of other value-producing tasks. Recommendation Systems for Software Engineering are software tools that can assist developers with a wide range of activities, from reusing code to writing effective bug reports. We provide an overview of recommendation systems for software engineering: what they are, what they can do for developers, and what they might do in the future.
[click for more details...]
For many software projects, bug tracking systems play a central role in supporting collaboration between the developers and the users of the software. To better understand this collaboration and how tool support can be improved, we have quantitatively and qualitatively analysed the questions asked in a sample of 600 bug reports from the MOZILLA and ECLIPSE projects. We categorised the questions and analysed response rates and times by category and project. Our results show that the role of users goes beyond simply reporting bugs: their active and ongoing participation is important for making progress on the bugs they report. Based on the results, we suggest four ways in which bug tracking systems can be improved.
[click for more details...]
Prediction of software defects works well within projects as long as there is a sufficient amount of data available to train any models. However, this is rarely the case for new software projects and for many companies. So far, only a few have studies focused on transferring prediction models from one project to another. In this paper, we study cross-project defect prediction models on a large scale. For 12 real-world applications, we ran 622 cross-project predictions. Our results indicate that cross-project prediction is a serious challenge, i.e., simply using models from projects in the same domain or with the same process does not lead to accurate predictions. To help software engineers choose models wisely, we identified factors that do influence the success of cross-project predictions. We also derived decision trees that can provide early estimates for precision, recall, and accuracy before a prediction is attempted.
[click for more details...]
A bug report is typically assigned to a single developer who is then responsible for fixing the bug. In Mozilla and Eclipse, between 37%-44% of bug reports are “tossed” (reassigned) to other developers, for example because the bug has been assigned by accident or another developer with additional expertise is needed. In any case, tossing increases the time-to-correction for a bug.
In this paper, we introduce a graph model based on Markov chains, which captures bug tossing history. This model has several desirable qualities. First, it reveals developer networks which can be used to discover team structures and to find suitable experts for a new task. Second, it helps to better assign developers to bug reports. In our experiments with 445,000 bug reports, our model reduced tossing events, by up to 72%. In addition, the model increased the prediction accuracy by up to 23 percentage points compared to traditional bug triaging approaches.
[click for more details...]
Global and distributed software development increases the need to find and connect developers with relevant expertise. Existing recommendation systems typically model expertise based on file changes (implementation expertise). While these approaches have shown success, they require a substantial recorded history of development for a project. Previously, we have proposed the concept of usage expertise, i.e., expertise manifested through the act of calling (using) a method. In this paper, we assess the viability of this concept by evaluating expert recommendations for the ASPECTJ and ECLIPSE projects. We find that both usage and implementation expertise have comparable levels of accuracy, which suggests that usage expertise may be used as a substitute measure. We also find a notable overlap of method calls across both projects, which suggests that usage expertise can be leveraged to recommend experts from different projects and thus for projects with little or no history.
[click for more details...]
We show in an empirical study of 3241 Red Hat packages that software vulnerabilities correlate with dependencies between packages. With formal concept analysis and statistical hypothesis testing, we identify dependencies that decrease the risk of vulnerabilities (beauties) or increase the risk (beasts). Using support vector machines on dependency data, our prediction models successfully and consistently catch about two thirds of vulnerable packages (median recall of 0.65). When our models predict a package as vulnerable, it is correct more than eight out of ten times (median precision of 0.83). Our findings help developers to choose new dependencies wisely and make them aware of risky dependencies.
[click for more details...]
by Tom on January 31, 2009
It is important that information provided in bug reports is relevant and complete in order to help resolve bugs quickly. However, often such information trickles to developers after several iterations of communication between developers and reporters. Poorly designed bug tracking systems are partly to blame for this exchange of information being stretched over time. Our paper addresses the concerns of bug tracking systems by proposing four broad directions for enhancements. As a proof-of-concept, we also demonstrate a prototype interactive bug tracking system that gathers relevant information from the user and identifies files that need to be fixed to resolve the bug.
[click for more details...]
by Tom on December 26, 2008
Modern programming environments automatically collect lots of data on software development, notably changes and defects. The field of mining software archives deals with the automated extraction, collection, and abstraction of information from this data. This is the introduction to a special issue of IEEE Software on Mining Software Archives presenting a selection of the exciting research that is taking place in the field.
[click for table of contents]
[click for guest editor introduction]