
Research On Software Defect Analysis Method Based On Clustering And Keyword Extraction

Posted on: 2021-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: J T Gao
GTID: 2518306464998669
Subject: Software engineering
Abstract/Summary:
A software defect can be defined as an error or an unacceptable condition in software documents, software programs, or software data. With the constant need to upgrade software and modern technology, the software industry has developed rapidly. Because most of the software development process is carried out by people, defects inevitably arise during development. As the number of software products keeps increasing, the management of software defects becomes a crucial step in the development process. Good defect management improves repair efficiency, reduces repair time, saves repair costs, and improves software quality. It also provides data and experience for later software updates and development, thereby reducing the recurrence of defects. Automatic defect analysis therefore has important theoretical significance and application value for software repair.

A defect report is a text written by a tester after testing a defect. Because defect reports contain a large amount of data and much redundant information, it is difficult for the repairer to obtain the repair-relevant information quickly, so manual analysis of defect reports is inefficient. Automatically classifying defects, analyzing their underlying meaning, and providing the result to the repairer will improve repair efficiency. Clustering, an unsupervised machine learning method, classifies in a data-driven way; it requires no manual annotation and so saves time and effort. Keyword extraction can quickly extract the keywords of each type of defect and provide them to the repairer. Therefore, a software defect classification method based on clustering is proposed. On this basis, a keyword extraction method is used to obtain the keywords of each type of defect and provide them to software repairers. The specific work is as follows:

1. Obtain the complete reports for a subset of defects from defect management software. Defect management mainly refers to the effective recording, collection, and statistics of defects. This thesis uses Bugzilla as the defect management software, but this tool only records defect attributes, transactions, statistical information, and so on, whereas more in-depth research on the defect report data is needed. The full reports of some defects are therefore first downloaded from Bugzilla for subsequent analysis.

2. Cluster the software defects and classify all defects in the defect reports. After the defect reports are obtained, a clustering method is used to divide the defects into different categories. The clustering process is divided into four main steps: preprocessing, effective feature extraction, text representation, and clustering. First, some information is extracted from each defect report. The defect information is then preprocessed, and the effective features related to the defect are extracted to make the clustering more accurate. Next, a text representation is used to transform the natural language into a computer-recognizable form. Finally, the K-Means method is used to cluster the defects for subsequent analysis. Here, the defects are divided into six categories.

3. Extract keywords for each type of defect. After defect clustering, the defect categories alone are not enough for repairers. This information only tells the repairer which defects are similar; it does not let the repairer quickly understand a given type of defect. Based on the defect category information, the LDA method is used to extract the keywords of each type of defect, so that the repairer can quickly understand the type of defect and find a corresponding solution.

4. After the extraction result is obtained, return it to the defect report in the form of a label. The results of clustering and keyword extraction are presented in the form of folders, so the repairer has to compare defect IDs to find the defect report during the repair process, which increases the repair time. Therefore, the extraction result is inserted into the defect report in the form of a label and provided to the repairer, so that the repairer can quickly understand the defect type and related information when viewing the defect report for the first time, which reduces the repair time.
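
The pipeline described above (preprocessing, text representation, K-Means clustering, per-category keywords, and labelling) can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation. The stop-word list, the function names, and the label format are hypothetical, and the number of clusters is left as a parameter rather than fixed at six. The keyword step here simply takes the highest-weighted TF-IDF terms of each cluster centroid as a stand-in; it does not implement LDA.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "when"}


def tokenize(text):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Text representation: L2-normalised TF-IDF vectors over a shared vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            idf = math.log((1 + len(docs)) / (1 + doc_freq[term])) + 1.0  # smoothed IDF
            vec[index[term]] = count * idf
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain K-Means (Lloyd's algorithm); requires k <= len(vectors)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """Keyword stand-in (not LDA): highest-weighted centroid terms."""
    ranked = sorted(range(len(vocab)), key=lambda i: (-centroid[i], vocab[i]))
    return [vocab[i] for i in ranked[:n] if centroid[i] > 0]


def label_report(report, cluster_id, keywords):
    """Prefix a report with its category and keywords (hypothetical label format)."""
    return f"[category {cluster_id}: {', '.join(keywords)}] {report}"


if __name__ == "__main__":
    # Made-up report summaries, for illustration only.
    reports = [
        "Crash on startup when profile is missing",
        "Startup crash after update",
        "Font rendering is blurry in the toolbar",
        "Blurry font in toolbar menu",
    ]
    vocab, vectors = tfidf_vectors(reports)
    labels, centroids = kmeans(vectors, k=2)
    for report, lab in zip(reports, labels):
        print(label_report(report, lab, top_terms(vocab, centroids[lab])))
```

With random initialisation, K-Means can converge to different partitions for different seeds, so in practice the clustering is usually run several times and the run with the lowest within-cluster distance is kept.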
Keywords/Search Tags:Software defect, Automatic defect analysis, Defect report, Clustering, Topic models