Research Of Automated Duplicate Bug Report Detection

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: B Wang
GTID: 2308330485470927
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information technology industry, the application scenarios of software have become increasingly varied, and the complexity and scale of software projects have grown sharply. In this context, software maintenance, an important part of software development, plays a vital role in keeping a project running. To improve software quality, most large software projects use a bug tracking system to track, record, and manage the bugs found during maintenance. In such a system, a bug report, which is a structured document, serves as the carrier for tracking and recording a bug.

Open-source software projects, by their nature, usually allow different testers, maintainers, and users to submit bug reports to the bug tracking system. As a result, the same bug is sometimes reported by more than one reporter, producing duplicate bug reports. Detecting whether a new bug report is a duplicate is crucial because it reduces the maintenance effort required of developers and maintainers.

In this thesis, we study the duplicate bug reports that are common in the bug repositories of open-source projects and analyze existing duplicate bug report detection models. We then propose a new detection model, PVREP, which linearly combines three similarities: the similarity of the context information of the bug reports' text, the similarity of the surface information of the bug reports' text, and the similarity of the bug reports' metadata. In PVREP, Paragraph2Vec, one of the most popular neural network language models, is used to compute the context-information similarity, while REPext, an improved model based on information retrieval, is used to compute the surface-information similarity of the text and the metadata similarity.

We validated our technique on three large software bug repositories: Eclipse, Mozilla, and OpenOffice. The experiments show an improvement of about 1%-3% in recall rate@k and about 3% in mean average precision over the previous model, REP.
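
The linear combination and the two evaluation measures named above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the weight values, and the toy data are hypothetical, the three input similarities are assumed to be precomputed numbers, and mean average precision is written in its general textbook form, which may differ from the exact definition used in the thesis.

```python
from typing import Dict, List, Set, Tuple


def combined_similarity(
    context_sim: float,
    surface_sim: float,
    metadata_sim: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weights
) -> float:
    """Linearly combine three precomputed similarity scores."""
    w_ctx, w_surf, w_meta = weights
    return w_ctx * context_sim + w_surf * surface_sim + w_meta * metadata_sim


def recall_rate_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of query reports with at least one true duplicate in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for query, ranked in rankings.items()
        if relevant.get(query, set()) & set(ranked[:k])
    )
    return hits / len(rankings)


def mean_average_precision(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
) -> float:
    """Mean over queries of average precision (general textbook form)."""
    if not rankings:
        return 0.0
    total = 0.0
    for query, ranked in rankings.items():
        truth = relevant.get(query, set())
        if not truth:
            continue  # a query with no known duplicates contributes zero
        found = 0
        precision_sum = 0.0
        for position, candidate in enumerate(ranked, start=1):
            if candidate in truth:
                found += 1
                precision_sum += found / position
        total += precision_sum / len(truth)
    return total / len(rankings)


# Toy data (hypothetical): two query reports and their ranked candidates.
toy_rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
toy_relevant = {"q1": {"b"}, "q2": {"x"}}

score = combined_similarity(0.5, 0.2, 0.1, weights=(2.0, 1.0, 1.0))
recall_at_2 = recall_rate_at_k(toy_rankings, toy_relevant, k=2)
map_score = mean_average_precision(toy_rankings, toy_relevant)
```

When each query report has exactly one correct match, the average precision computed here reduces to the reciprocal of that match's rank.
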
Keywords/Search Tags:Software Engineering, Bug Report, Neural Network Language Model, Paragraph Vector, Information Retrieval, Duplicate Detection