The Art and Science of Analyzing Software Data

The book “The Art and Science of Analyzing Software Data” that I edited with Christian Bird and Tim Menzies is now available. Get your copy now!

The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science.

The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.

Many thanks to our wonderful contributors to this book project: Alberto Bacchelli, Olga Baysal, Ayse Bener, Aditya Budi, Bora Caglayan, Gul Calikli, Joshua Charles Campbell, Jacek Czerwonka, Kostadin Damevski, Madeline Diep, Robert Dyer, Linda Esker, Davide Falessi, Xavier Franch, Thomas Fritz, Nikolas Galanis, Marco Aurélio Gerosa, Ruediger Glott, Michael W. Godfrey, Alessandra Gorla, Georgios Gousios, Florian Groß, Randy Hackbarth, Abram Hindle, Reid Holmes, Lingxiao Jiang, Ron S. Kenett, Ekrem Kocaguneli, Oleksii Kononenko, Kostas Kontogiannis, Konstantin Kuznetsov, Lucas Layman, Christian Lindig, David Lo, Fabio Mancinelli, Serge Mankovskii, Shahar Maoz, Daniel Méndez Fernández, Andrew Meneely, Audris Mockus, Murtuza Mukadam, Brendan Murphy, Emerson Murphy-Hill, John Mylopoulos, Anil R. Nair, Maleknaz Nayebi, Hoan Nguyen, Tien Nguyen, Gustavo Ansaldi Oliva, John Palframan, Hridesh Rajan, Peter C. Rigby, Guenther Ruhe, Michele Shaw, David Shepherd, Forrest Shull, Will Snipes, Diomidis Spinellis, Eleni Stroulia, Angelo Susi, Lin Tan, Ilaria Tavecchia, Ayse Tosun Misirli, Mohsen Vakilian, Stefan Wagner, Shaowei Wang, David Weiss, Laurie Williams, Hamzeh Zawawy, and Andreas Zeller.

ISEC 2016: India Software Engineering Conference – Goa, India

Please submit to the India Software Engineering Conference (ISEC 2016), which will be held February 18-20, 2016 in Goa, India. The submission deadline for the research track is September 25 (abstracts September 18; as always, please check the webpage for any updates). I’m a member of the Program Committee for the Research Track.

ISEC is the annual conference of iSOFT, the India chapter of ACM SIGSOFT, under the umbrella of ACM India. The 9th ISEC will be held at BITS Pilani, Goa, India. ISEC will bring together researchers and practitioners from across the world to share the results of their work. The goal of the conference is to provide a forum for researchers and practitioners from both academia and industry to meet and share cutting-edge advancements in the field of software engineering.

The conference invites technical papers describing original and unpublished results of conceptual, empirical, and experimental software engineering research. Submissions are invited in all aspects of software engineering.

ICSE 2016: International Conference on Software Engineering – Austin, TX, USA

Please submit to the International Conference on Software Engineering (ICSE 2016), which will be held May 14-22, 2016 in Austin, TX, USA. The submission deadline for the research track is August 28. I’m a member of the Program Committee for the Research Track.

ICSE, the premier conference in software engineering sponsored by ACM and IEEE CS, is coming to Austin for a second time on May 14-22, 2016. There, the top minds in software engineering research and practice will convene for a week of inspirational talks, demos and conversation. A quarter-century after the Texas state capital first hosted ICSE, Austin has become a hub for technology, entrepreneurship, music, outdoor recreation and nightlife. ICSE 2016 will be located in the lovely Austin Arboretum Area, known as “new Austin,” at the start of the Texas hill country and at the peak of the Texas wildflower season. What better setting could there be for flourishing ideas?

Please mark your calendars. Contribute your results and ideas. Then plan to join us to make the 38th ICSE one for the record books since, as the saying goes, “Everything is bigger in Texas!”

APSEC 2015: Asia-Pacific Software Engineering Conference – New Delhi, India

Please submit to the Industry Track of the Asia-Pacific Software Engineering Conference (APSEC 2015), which will be held December 1-4, 2015 in New Delhi, India. The submission deadline is July 31 (as always, please check the webpage for any updates). I’m a member of the Program Committee for the Industry Track.

The Asia-Pacific Software Engineering Conference (APSEC) is the leading international conference on Software Engineering and technology in the Asia-Pacific region. APSEC aims to bring together researchers and practitioners from industry, academia, and government to advance the state of the art in Software Engineering and technology and to encourage wider collaboration between academics and industries.

The APSEC Industry Track, or Software Engineering in Practice track, is primarily for researchers and practitioners in industry to present experience reports and case-study papers. Industry Track papers focus on describing tools, research prototypes, and innovative solution offerings developed and deployed within the organization. The paper should clearly describe the problem, solution approach, techniques or methodologies employed, related work, case study, experience, learnings, and best practices that emerged as a result of deploying the solution offering in the industrial setting.

Products, Developers, and Milestones: How Should I Build my N-Gram Language Model (Industry Track) – ESEC/FSE 2015

Recent work has shown that although programming languages enable source code to be rich and complex, most code tends to be repetitive and predictable. Advances in the use of NLP techniques applied to source code such as n-gram language models show great promise in areas such as code completion, aiding impaired developers, and code search. In this paper we advance the understanding of programming language models by answering three research questions related to different methods of constructing language models. Specifically, we ask: Do product-specific but smaller language models perform better than language models across projects? Are developer-specific language models effective, and do they differ depending on what parts of the codebase a developer is working in? Finally, do language models change over time (i.e., does a language model from early development model changes later on in development)? The answers to these questions enable techniques that make use of programming language models in development to choose the model training corpus more effectively. We evaluate these questions by building 28 language models across developers, time periods, and projects within Microsoft Office and present the results in this paper. We find that developer-specific and product-specific language models perform better than models from the entire codebase, but that temporality has little to no effect on language model performance.
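To make the idea of comparing training corpora concrete, here is a minimal sketch in Python. It is not the method or the data from the paper: it assumes a bigram model with add-one smoothing, scored by cross-entropy on held-out tokens, and the function names and toy token sequences are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and their left-hand contexts in a token sequence."""
    contexts = Counter(tokens[:-1])              # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent token pairs
    return contexts, bigrams


def cross_entropy(model, tokens, vocab_size):
    """Average bits per predicted token under an add-one smoothed bigram model."""
    contexts, bigrams = model
    pairs = list(zip(tokens, tokens[1:]))
    total_bits = 0.0
    for prev, cur in pairs:
        # Add-one (Laplace) smoothing keeps unseen bigrams from getting probability 0.
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(pairs)


# Toy corpora (made up for illustration): a "product-specific" corpus that
# resembles the held-out code, and an unrelated corpus standing in for other projects.
product = "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; }".split()
other = "def f ( x ) : return x + 1".split() * 3
held_out = "for ( int j = 0 ; j < n ; j ++ ) { sum += b [ j ] ; }".split()

vocab_size = len(set(product) | set(other) | set(held_out))

product_ce = cross_entropy(train_bigram(product), held_out, vocab_size)
other_ce = cross_entropy(train_bigram(other), held_out, vocab_size)

print(f"product-specific model: {product_ce:.2f} bits/token")
print(f"unrelated-corpus model: {other_ce:.2f} bits/token")
```

Lower cross-entropy means the model finds the held-out code more predictable. In this toy setup the model trained on similar code scores lower, which is the kind of comparison the research questions above are about; real experiments would use far larger corpora and better smoothing.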


How Practitioners Perceive the Relevance of Software Engineering Research – ESEC/FSE 2015

The number of software engineering research papers over the last few years has grown significantly. An important question here is: how relevant is software engineering research to practitioners in the field? To address this question, we conducted a survey at Microsoft where we invited 3,000 industry practitioners to rate the relevance of research ideas contained in 571 ICSE and FSE papers that were published over a five-year period. We received 17,913 ratings by 512 practitioners who labelled ideas as essential, worthwhile, unimportant, or unwise. The results from the survey suggest that practitioners are positive towards studies done by the software engineering research community: 71% of all ratings were essential or worthwhile. We found no correlation between the citation counts and the relevance scores of the papers. Through a qualitative analysis of free-text responses, we identify several reasons why practitioners considered certain research ideas to be unwise. The survey approach in this paper is very lightweight: participants spent only 22.5 minutes to respond to the survey. At the same time, the results can provide useful insights to conference organizers, authors, and the participating practitioners.


Quantifying Developers’ Adoption of Security Tools – ESEC/FSE 2015

Security tools could help developers find critical vulnerabilities, yet such tools remain underused. We surveyed developers from 14 companies and 5 mailing lists about their reasons for using and not using security tools. The resulting thirty-nine predictors of security tool use provide both expected and unexpected insights. As we expected, developers who perceive security to be important are more likely to use security tools than those who do not. However, that was not the strongest predictor of security tool use; the strongest was instead developers’ ability to observe their peers using security tools.
