Research On The Data Set Reduction Method For Bug Triaging

Posted on:2021-01-05

Degree:Master

Type:Thesis

Country:China

Candidate:M M Wei

GTID:2428330602489123

Subject:Computer Science and Technology

Abstract/Summary:

As software functionality and development processes grow more complex, the number of bugs reported to bug repositories has increased dramatically. Manual inspection and analysis can no longer keep up with bug data at this scale, and a growing body of research on automatic bug triaging based on text classification has emerged. Most studies use only the short description of a bug report as textual input and then concentrate on optimizing the triage model. They ignore the long description, which provides more information but also introduces more noise. If the quality of the data set is low, optimizing the triage model alone will not yield good results. In addition, the number of bugs is huge, while the time and number of developers are limited, so giving priority to bug reports with greater impact minimizes the damage those bugs cause. However, bug report data sets are often imbalanced. To improve the maintenance and management of bug reports in the repository and to reduce labor costs, further research is needed on the noise contained in textual descriptions and on how to identify high-impact bug reports effectively. Because different bugs pose different potential threats to the system, the higher the severity of a bug report, the higher its priority. Addressing the large scale, low quality, and imbalance of bug data, this thesis studies bug triaging methods. Its contributions are as follows:

(1) An optimized bug triage technique is proposed that builds a high-quality bug data set by removing noisy and non-informative bug reports while maximizing the accuracy of bug triaging under weight and binary constraints. The technique is built on three feature selection algorithms and four instance selection algorithms, with the aim of recommending and automatically assigning bugs more accurately even when bug descriptions are noisy. Several experiments are conducted, and the results show that the training sets reduced by the proposed approach achieve better accuracy than the original ones in several cases, by about 4% on average.

(2) We propose a high-impact bug report identification approach that combines data reduction with imbalanced learning strategies. In the data reduction phase, we combine feature selection with instance selection to build a small-scale, high-quality set of bug reports by removing redundant or non-informative bug reports and words. In the imbalanced learning phase, we handle the imbalanced distribution of bug reports through four imbalanced learning strategies. Experiments verify that combining data reduction with imbalanced learning strategies identifies high-impact bug reports effectively.
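
To make the data reduction idea concrete, the following is a minimal sketch rather than the thesis's actual method. The abstract does not name its three feature selection or four instance selection algorithms, so this sketch assumes chi-square scoring for word selection and Hart's condensed nearest neighbour rule (with Jaccard distance over word sets) for report selection. It also assumes that feature selection runs first. The function names and the toy reports are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a bug description and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def chi_square_scores(docs, labels):
    """Score each word by its largest chi-square statistic over all classes."""
    n = len(docs)
    class_count = Counter(labels)
    word_count = Counter()   # number of reports containing the word
    word_class = Counter()   # number of reports of a class containing the word
    for tokens, label in zip(docs, labels):
        for w in set(tokens):
            word_count[w] += 1
            word_class[(w, label)] += 1
    scores = {}
    for w, df in word_count.items():
        best = 0.0
        for cls, n_cls in class_count.items():
            a = word_class[(w, cls)]   # in class, contains word
            b = df - a                 # other classes, contains word
            c = n_cls - a              # in class, lacks word
            d = n - df - c             # other classes, lacks word
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[w] = best
    return scores


def jaccard_distance(a, b):
    """Distance between two word sets; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def condensed_nn(vectors, labels):
    """Hart's rule: keep a report only if the kept set misclassifies it."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(vectors)):
            if i in keep:
                continue
            nearest = min(keep, key=lambda j: jaccard_distance(vectors[i], vectors[j]))
            if labels[nearest] != labels[i]:
                keep.append(i)
                changed = True
    return sorted(keep)


def reduce_data_set(reports, k):
    """Assumed order: select the top-k words, then select reports."""
    docs = [tokenize(text) for text, _ in reports]
    labels = [developer for _, developer in reports]
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    vocab = set(ranked[:k])
    vectors = [set(tokens) & vocab for tokens in docs]
    return vocab, condensed_nn(vectors, labels)


# Hypothetical toy data: (bug description, developer who fixed it).
reports = [
    ("crash when opening editor window", "alice"),
    ("editor crash on opening large file", "alice"),
    ("editor window crash after opening", "alice"),
    ("network timeout during sync", "bob"),
    ("sync fails with network timeout", "bob"),
    ("timeout in network sync", "bob"),
]

vocab, kept = reduce_data_set(reports, k=5)
print(sorted(vocab))
print([reports[i] for i in kept])
```

A classifier would then be trained only on the kept reports, restricted to the selected words. Handling class imbalance, as in contribution (2), would be a separate step applied to this reduced set.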

Keywords/Search Tags:

Bug Triaging, Imbalance, Feature Selection, Instance Selection, Text Classification

Related items

1	Cost-Sensitive Feature And Instance Selection For Imbalanced Network Abnormal Datasets
2	Research On KNN Text Classification
3	Research On Text Classification Model And Algorithm For Small Dataset
4	Classification Research On News Text Classification Based On Feature Selection Method
5	A Study Of Text Classification Algorithms Based On Feature Selection
6	Research On Text Classification Method Based On Improved Feature Selection Algorithm
7	Research On Instance Selection Method For K Neighborhood Classification
8	Feature Selection Methods For Text Categorization
9	Research On Software Defect Prediction Based On Feature Selection And Instance Transfer
10	The Design And Application Of SSVM's Text Classification Based On Feature Selection Optimization