Research On Automated Duplicate Bug Report Detection

Posted on:2017-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:B Wang

Full Text:PDF

GTID:2308330485470927

Subject:Computer application technology

Abstract/Summary:

In recent years, with the rapid development of the information technology industry, the application scenarios of software have become increasingly varied, and the complexity and scale of software projects have grown sharply. In this context, software maintenance, an important part of software development, plays a vital role in keeping a project running. To improve software quality, most large software projects use a bug tracking system to track, record, and manage the bugs found during maintenance. In such a system, a bug report, which is a structured document, serves as the carrier for tracking and recording a bug.

Open-source software projects, by their nature, usually allow different testers, maintainers, and users to submit bug reports to the bug tracking system. As a result, the same bug is sometimes reported by more than one reporter, producing duplicate bug reports. Detecting whether a new bug report is a duplicate is therefore crucial, because it reduces the maintenance effort required of developers and maintainers.

In this thesis, we study the duplicate bug reports that commonly exist in the bug repositories of open-source projects and analyze many existing models for duplicate bug report detection. We then propose a new detection model, PVREP, which linearly combines three similarities: the similarity of the context information of the bug reports' text, the similarity of the surface information of the bug reports' text, and the similarity of the bug reports' metadata. In PVREP, the Paragraph2Vec model, one of the most popular neural network language models, is used to compute the context-information similarity, while REPext, an improved model based on information retrieval, is used to compute the similarity of the surface information of the text and of the metadata.

We validated our technique on three large software bug repositories: Eclipse, Mozilla, and OpenOffice. The experiments show an improvement of about 1%-3% in recall rate@k and about 3% in mean average precision over the previous model REP.
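The linear combination and the recall rate@k metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the example weights, and the use of cosine similarity between paragraph vectors are assumptions made for illustration, and the abstract does not state how the weights are chosen. Recall rate@k is taken here in its usual sense in the duplicate-detection literature, namely the fraction of duplicate reports whose master report appears among the top k retrieved candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure
    for comparing paragraph vectors; not confirmed by the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_score(context_sim, surface_sim, metadata_sim,
                   weights=(0.4, 0.4, 0.2)):
    """Linear combination of the three similarities.
    The weights are hypothetical placeholders."""
    w_ctx, w_surf, w_meta = weights
    return w_ctx * context_sim + w_surf * surface_sim + w_meta * metadata_sim


def recall_rate_at_k(ranked_lists, true_masters, k):
    """Fraction of duplicate reports whose true master report appears
    in the top k entries of the ranked candidate list."""
    if not true_masters:
        return 0.0
    hits = sum(1 for ranked, master in zip(ranked_lists, true_masters)
               if master in ranked[:k])
    return hits / len(true_masters)
```

In such a setup, each incoming report would be scored against every stored report with `combined_score`, the candidates sorted by score, and `recall_rate_at_k` computed over all reports known to be duplicates.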

Keywords/Search Tags:

Software Engineering, Bug Report, Neural Network Language Model, Paragraph Vector, Information Retrieval, Duplicate Detection

Related items

1	Software Bug Report Quality Detection Research Based On Convolutional Neural Network
2	Research On The Effectiveness Of Duplicate Bug Report Detection Based On Deep Learning
3	Research On Key Issues Of Software Bug Report Management
4	A Detection Method Of Duplicate Defect Reports Based On Fusing Text And Categorization Information
5	Research, Large-scale Approximation Paragraph Fingerprint-based Web Page Detection Algorithm
6	Research On The Language Model Based Information Retrieval System
7	Research On Software Multi-feature Defect Location Method Based On Information Retrieval
8	Research On Query Optimization And Vectorization Technique In Document Retrieval
9	A Study On Language Models Based On Neural Networks
10	Using Statistical Language Modeling For Ad Hoc Information Retrieval