Summarizing Bug Reports And Source Code Using Supervised Learning Techniques

Posted on: 2017-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: NAJAM NAZAR
GTID: 1318330512461473
Subject: Software engineering
Abstract/Summary:
While performing software tasks, developers need to interact with software artifacts such as bug reports and source code. This interaction may involve reading through artifacts thoroughly to find the required information. However, extracting valuable information from bug reports and source code is a time-consuming, tedious, and strenuous task. To tackle this task efficiently, researchers have advised building automatic summaries for software artifacts.

In this dissertation, we proposed the use of supervised learning techniques for summarizing bug reports and source code to help developers extract the required information efficiently. We investigated the summarization of bug reports, using duplicate bug reports, as an example of natural language text summarization. In another investigation, we summarized source code fragments as an example of source-code-to-source-code summarization.

For bug reports, we developed a bug report summarization technique based on PageRank, which we called PRST. PRST used three similarity measures, based on VSM, Jaccard, and WordNet, to calculate the similarities between master bug reports and their corresponding duplicate bug reports. Since publicly available bug report corpora lack mappings between master and duplicate bug reports, it is hard to use the information contained in duplicate bugs for summarizing bug reports. Therefore, we developed a separate bug report corpus, OSCAR, consisting of 59 bug reports extracted from the Mozilla, KDE, Gnome, and Eclipse projects. Meanwhile, the existing BRC corpus was restructured by adding duplicate bug reports, to serve as the baseline for comparison. We extrinsically evaluated the effectiveness of the proposed method using state-of-the-art statistical evaluation metrics: Precision, Recall, F-Score, and Pyramid Precision.
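
The abstract does not give the details of PRST, so the following is only a minimal sketch of the general idea of ranking sentences with PageRank over a similarity graph. It assumes plain Jaccard similarity on word sets as the only edge weight (VSM and WordNet similarity are left out), and the function names `tokenize`, `jaccard`, `rank_sentences`, and `summarize` are hypothetical, not taken from the dissertation.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a Jaccard similarity graph.

    Assumption: edges are weighted by Jaccard similarity only. Sentences
    with no similar neighbour pass on no score (their mass is dropped).
    """
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # weights[i][j] is the similarity between sentences i and j (no self-loops).
    weights = [
        [jaccard(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In a setting like the one described, the input sentences could come from a master bug report together with its duplicates, so that sentences echoed across reports reinforce each other in the graph. That usage is an assumption about how such a sketch might be applied, not a description of PRST itself.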
The results show that reasonably accurate summaries can be produced for bug reports and that our proposed methods improve the precision of existing supervised summary generation methods for bug reports.

Similarly, for summarizing source code, we developed a code fragment summarization (CFS) algorithm based on SVM and NB classifiers to automatically generate source-to-source summaries of code fragments. We introduced, for the first time in the software artifact summarization paradigm, data-driven small-scale crowdsourcing (through crowd enlistment) to extract syntactic features of source code. To test our method, we constructed a corpus of 127 code fragments retrieved from the official Eclipse and NetBeans FAQs. We verified the efficacy of the proposed method using the aforementioned statistical measures and also compared it with existing methods. The results show that our code fragment summarizer outperformed existing code fragment summarization methods with respect to precision. They also show that syntactic features have a profound effect on the accuracy of the generated summaries. The generated summaries help developers address the software tasks at hand efficiently and help improve software performance and quality.
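
As an illustration of three of the evaluation measures named above, here is a minimal sketch that computes Precision, Recall, and F-Score for an extractive summary, assuming the summary and the gold standard are both given as collections of selected sentence (or code line) identifiers. The function name `precision_recall_f` is hypothetical, and Pyramid Precision is not covered here.

```python
def precision_recall_f(selected, gold):
    """Return (precision, recall, F-score) for a set of selected items.

    selected: identifiers of sentences/lines the summarizer chose.
    gold:     identifiers of sentences/lines in the reference summary.
    """
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Four items selected, three in the gold summary, two in common.
p, r, f = precision_recall_f([1, 2, 3, 4], [2, 4, 6])
print(f"precision={p:.3f} recall={r:.3f} f={f:.3f}")
```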
Keywords/Search Tags: Mining Software Repositories, Mining Software Engineering Data, Supervised Learning, Bug Reports, Source Code, Duplicate Bugs, Code Fragments