A Detection Method of Duplicate Defect Reports Based on Fusing Text and Categorization Information

Posted on: 2020-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Fan
GTID: 2428330578451275
Subject: Software Engineering Technology
Abstract/Summary:
Defect reports submitted by users and testers are one of the main ways to discover defects. A defect report is the carrier that describes a defect, and fixing the reported defects is a necessary means of improving software. Because testers and users repeatedly submit reports for the same defect, defect report repositories contain a large number of duplicate reports. As software grows in scale and complexity, manual triage can no longer keep up with increasingly complex software systems. Duplicate defect report detection filters duplicate reports out of the repository and thereby makes software maintenance activities more efficient; it is an active research topic in the field of software maintenance. At present, the prediction accuracy of duplicate defect report detection still leaves considerable room for improvement and does not yet meet industry expectations. The main difficulty in improving accuracy is finding a suitable and comprehensive way to measure the similarity between defect reports.

Drawing on the idea of data fusion, this thesis proposes CBLO (Combination of BM25F, LSI and One-Hot), a new method for detecting duplicate defect reports that uses both the text information and the categorization information of the reports. The method consists of four steps:
1. Preprocess the data and extract the text information and categorization information of each defect report.
2. Use the BM25F and LSI algorithms to represent the text information numerically, and define a similarity measure for text information.
3. Use One-Hot encoding to represent the categorization information numerically, and define a similarity measure for categorization information.
4. Using a similarity fusion method, combine the text and categorization similarity measures and generate a recommendation list of candidate duplicate reports for each defect report.

To verify the effectiveness of the method, it was compared with the baseline method DBTM on the OpenOffice dataset. The experimental results show that accuracy improved by an average of 4.7%.
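As an illustration of the BM25F half of step 2, here is a minimal sketch assuming each report's text information is split into a `summary` field and a `description` field. The field names, the weights in `FIELD_WEIGHTS`, the normalization values in `FIELD_B`, `K1`, and the whitespace tokenizer are hypothetical choices, not parameters taken from the thesis. The scoring follows the standard BM25F form: per-field term frequencies are length-normalized, weighted, and summed, then saturated and multiplied by IDF, with one report's text acting as the query against another.

```python
import math
from collections import Counter

# Hypothetical field configuration: weight and length-normalization b per field.
FIELD_WEIGHTS = {"summary": 3.0, "description": 1.0}
FIELD_B = {"summary": 0.5, "description": 0.75}
K1 = 1.2  # term-frequency saturation parameter (assumed value)


def tokenize(text):
    # Placeholder tokenizer; real preprocessing (step 1) would also stem and drop stop words.
    return text.lower().split()


class BM25F:
    def __init__(self, reports):
        # Per-report, per-field term counts and field lengths
        self.docs = [{f: Counter(tokenize(r.get(f, ""))) for f in FIELD_WEIGHTS} for r in reports]
        self.lens = [{f: sum(d[f].values()) for f in FIELD_WEIGHTS} for d in self.docs]
        n = len(self.docs)
        self.avglen = {f: (sum(ln[f] for ln in self.lens) / n) or 1.0 for f in FIELD_WEIGHTS}
        # Document frequency counts a term once per report, whatever field it appears in
        df = Counter()
        for d in self.docs:
            df.update(set().union(*d.values()))
        # BM25 IDF with +1 inside the log so it never goes negative
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}

    def score(self, query_text, i):
        """BM25F score of report i, treating query_text as the query."""
        doc, lens = self.docs[i], self.lens[i]
        total = 0.0
        for t in set(tokenize(query_text)):
            if t not in self.idf:
                continue
            # Combine field-weighted, length-normalized term frequencies
            tf = sum(
                w * doc[f][t] / (1.0 - FIELD_B[f] + FIELD_B[f] * lens[f] / self.avglen[f])
                for f, w in FIELD_WEIGHTS.items()
            )
            total += self.idf[t] * tf / (K1 + tf)
        return total


# Usage on toy reports (illustrative data only)
reports = [
    {"summary": "crash when saving file", "description": "writer crashes on save"},
    {"summary": "application crash on save", "description": "crash after clicking save"},
    {"summary": "toolbar icon misaligned", "description": "icon offset in toolbar"},
]
index = BM25F(reports)
scores = [index.score("crash on save", i) for i in range(len(reports))]
print(scores)
```

In the toy data, the second report matches all three query terms in its more heavily weighted summary, so it should score above the first, while the unrelated toolbar report scores zero. Giving the summary a larger weight encodes an assumption that it is more informative than the free-text description.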
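For the LSI half of step 2, a dependency-free sketch of latent semantic indexing: build a term-document count matrix A, find the top-k eigenvectors of AᵀA (which are A's right singular vectors) by power iteration, and represent each report by its coordinates σᵢ·vᵢ in the k-dimensional latent space, with report similarity taken as the cosine between those vectors. Using raw counts instead of TF-IDF weights, power iteration instead of a library SVD routine, and the particular value of k are simplifying assumptions for illustration, not details from the thesis.

```python
import math
from collections import Counter


def term_doc_matrix(texts):
    """Rows are vocabulary terms, columns are documents (raw term counts)."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*counts))
    return [[c[term] for c in counts] for term in vocab]


def _orthogonalize(w, basis):
    """Remove from w its projection onto each unit vector in basis."""
    for u in basis:
        p = sum(x * y for x, y in zip(w, u))
        w = [x - p * y for x, y in zip(w, u)]
    return w


def _top_eigenvector(g, found, iters):
    """Power iteration on symmetric g, restricted to the complement of `found`."""
    n = len(g)
    v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start vector
    for _ in range(iters):
        w = _orthogonalize([sum(g[i][j] * v[j] for j in range(n)) for i in range(n)], found)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract: rank(A) < k
            return None
        v = [x / norm for x in w]
    return v


def lsi_doc_vectors(a, k, iters=200):
    """Map each document (column of a) to k latent coordinates sigma_i * v_i[doc]."""
    n = len(a[0])
    # G = A^T A: its eigenvectors are A's right singular vectors, its eigenvalues sigma^2
    g = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    vs, sigmas = [], []
    for _ in range(k):
        v = _top_eigenvector(g, vs, iters)
        if v is None:
            break
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        vs.append(v)
        sigmas.append(math.sqrt(max(lam, 0.0)))
    return [[s * v[d] for v, s in zip(vs, sigmas)] for d in range(n)]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx, ny = math.sqrt(sum(p * p for p in x)), math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Usage on toy report texts (illustrative data only)
texts = ["crash save file", "crash save document", "toolbar icon color"]
vecs = lsi_doc_vectors(term_doc_matrix(texts), k=2)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two texts share "crash" and "save", so they collapse onto the same latent dimension and their cosine is close to 1, while the toolbar text lands on an orthogonal dimension. On real data a numerical library's SVD would replace the hand-written power iteration.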
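Steps 3 and 4 can be sketched together: One-Hot encode each categorization field, compare reports by cosine similarity, and fuse that with a text similarity to rank candidate duplicates. The field list in `CATEGORY_FIELDS`, the linear fusion rule with weight `alpha`, and the precomputed `text_sims` (assumed to be already scaled to [0, 1], for example from step 2) are hypothetical; the thesis's exact fields, fusion formula, and weights are not stated in this abstract. Because every report sets exactly one slot per field, the cosine of two such vectors reduces to the fraction of categorization fields on which they agree.

```python
import math

# Hypothetical categorization fields; the thesis's actual field set may differ.
CATEGORY_FIELDS = ["product", "component", "priority"]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx, ny = math.sqrt(sum(p * p for p in x)), math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


def build_one_hot(reports):
    """Return an encoder mapping a report to a concatenated one-hot vector."""
    slots = [(f, v) for f in CATEGORY_FIELDS for v in sorted({r[f] for r in reports})]
    index = {slot: i for i, slot in enumerate(slots)}

    def encode(report):
        vec = [0] * len(slots)
        for f in CATEGORY_FIELDS:
            slot = (f, report[f])
            if slot in index:  # values unseen when the encoder was built are skipped
                vec[index[slot]] = 1
        return vec

    return encode


def rank_duplicates(query, candidates, text_sims, alpha=0.7, top_n=5):
    """Fuse text and categorization similarity linearly; return the top-N candidate ids."""
    encode = build_one_hot([query] + candidates)
    q_vec = encode(query)
    scored = []
    for cand, t_sim in zip(candidates, text_sims):
        c_sim = cosine(q_vec, encode(cand))
        scored.append((alpha * t_sim + (1 - alpha) * c_sim, cand["id"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [cid for _, cid in scored[:top_n]]


# Usage with toy reports and made-up text similarities (illustrative only)
query = {"id": 100, "product": "Writer", "component": "save", "priority": "P2"}
candidates = [
    {"id": 1, "product": "Writer", "component": "save", "priority": "P3"},
    {"id": 2, "product": "Calc", "component": "ui", "priority": "P2"},
    {"id": 3, "product": "Writer", "component": "save", "priority": "P2"},
]
text_sims = [0.8, 0.85, 0.6]
ranking = rank_duplicates(query, candidates, text_sims)
print(ranking)  # → [1, 3, 2]
```

Ranking by text similarity alone would give 2, 1, 3; adding the categorization term promotes reports 1 and 3, which share the query's product and component. A linear weighted sum is only one possible fusion rule; rank-based fusion is a common alternative.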
Keywords/Search Tags: Duplicate defect report, Information retrieval method, Latent semantic indexing, One-Hot, Similarity fusion