A Detection Method Of Duplicate Defect Reports Based On Fusing Text And Categorization Information

Posted on:2020-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Fan

Full Text:PDF

GTID:2428330578451275

Subject:Software Engineering Technology

Abstract/Summary:

Defect reports submitted by users and testers are one of the most important ways of finding defects. A defect report is the carrier that describes a defect, and resolving reported defects is a necessary means of improving software. Testers and users often submit reports for the same defect, which leaves a large number of duplicate reports in the defect report library. As software grows in scale and complexity, manual triage can no longer keep pace. Duplicate defect report detection filters duplicate reports out of defect report libraries and effectively improves the efficiency of software maintenance activities, and it is an active research topic in the field of software maintenance. At present, the prediction accuracy of duplicate defect report detection still has much room for improvement and does not meet the expectations of industry. The difficulty in improving prediction accuracy lies in finding a suitable and comprehensive way to measure the similarity between defect reports. Drawing on the idea of data fusion, this thesis proposes a new method, CBLO (Combination of BM25F, LSI and One-Hot), that detects duplicate defect reports using both text information and categorization information. The method consists of four steps:

1. Preprocess the data and extract the text information and categorization information of each defect report.
2. Use the BM25F and LSI algorithms to process the text information numerically, and derive a similarity measure for the text information.
3. Use the One-Hot algorithm to process the categorization information numerically, and derive a similarity measure for the categorization information.
4. Based on a similarity fusion method, fuse the similarity measures of the text information and the categorization information, and generate a recommendation list of duplicate defect reports for each defect report.

To verify the effectiveness of the method, it was compared with the baseline method DBTM on OpenOffice defect reports. The experimental results show that accuracy improves by an average of 4.7%.
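
The flow from one-hot encoding through similarity fusion to a ranked recommendation list can be illustrated with a minimal sketch. The abstract does not give the fusion formula, the weights, or the categorical fields used, so everything below is an assumption made for illustration: the field names `component` and `priority`, the weight `alpha`, and the weighted linear sum are hypothetical, and a plain term-count cosine stands in for the BM25F and LSI text scores, which are not implemented here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def one_hot(report, vocab):
    """Concatenate one-hot vectors for each categorical field.

    vocab maps a field name to the ordered list of its possible values.
    """
    vec = []
    for field, values in vocab.items():
        vec.extend(1.0 if report[field] == v else 0.0 for v in values)
    return vec


def text_similarity(a, b):
    """Stand-in for the BM25F/LSI score: cosine over raw term counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    terms = sorted(set(ca) | set(cb))
    return cosine([ca[t] for t in terms], [cb[t] for t in terms])


def fused_similarity(query, cand, vocab, alpha=0.7):
    """Weighted linear fusion (assumed form) of text and category scores."""
    text = text_similarity(query["text"], cand["text"])
    cat = cosine(one_hot(query, vocab), one_hot(cand, vocab))
    return alpha * text + (1 - alpha) * cat


def recommend(query, reports, vocab, k=3, alpha=0.7):
    """Return the ids of the k reports most similar to the query."""
    scored = [
        (fused_similarity(query, r, vocab, alpha), r["id"])
        for r in reports
        if r["id"] != query["id"]
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [rid for _, rid in scored[:k]]


# Hypothetical categorical fields and toy reports, for illustration only.
VOCAB = {"component": ["writer", "calc"], "priority": ["P1", "P2", "P3"]}
REPORTS = [
    {"id": 1, "text": "writer crashes when saving document",
     "component": "writer", "priority": "P2"},
    {"id": 2, "text": "crash on saving a document in writer",
     "component": "writer", "priority": "P2"},
    {"id": 3, "text": "calc formula returns wrong sum",
     "component": "calc", "priority": "P3"},
]

if __name__ == "__main__":
    print(recommend(REPORTS[0], REPORTS, VOCAB, k=2))
```

In this toy data, report 2 shares both its wording and its categorical fields with report 1, so it ranks ahead of report 3. A faithful reproduction of CBLO would replace `text_similarity` with actual BM25F and LSI scores and use whatever fusion rule the thesis defines.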

Keywords/Search Tags:

Duplicate defect report, Information retrieval method, Latent semantic indexing, One-Hot, Similarity fusion

Related items

1	Research On Chinese Concept Retrieval Based On Latent Semantic Analysis
2	The Semantic Query Expansion Model Based On Latent Semantics Index Model
3	Research Of Automated Duplicate Bug Report Detection
4	Design And Implementation Of Multilingual Information Retrieval System Based On Latent Semantic Analysis
5	Research On Web Image Retrieval Based On The Fusion Of Textual And Visual Information
6	Domain Ontology-based Semantic Information Retrieval And Related Technologies
7	The Research Of Key Technology In Personalized Information Retrieval Based On Internet
8	Research On Near-Duplicate Image Detection And Its Application
9	Based On Latent Semantic Indexing, Text Classification And Research In Science And Technology Information Retrieval
10	Research Of Medical Records Semantic Retrieval Method Based On LDA And LSA