Research On High-dimensional Data Processing In Software Defect Prediction

Posted on:2021-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:R Li

GTID:2428330611988267

Subject:Computer Science and Technology

Abstract/Summary:

The scale and complexity of software are growing steadily, so software reliability is of great concern. In software engineering, identifying the modules of a system that are likely to contain defects, and how those modules are distributed, helps developers allocate resources rationally and improve software quality. Software defect prediction (SDP) uses historical data, such as previously discovered defects, together with software metrics to predict which modules are defect-prone. Reasonable defect prediction helps testers locate and fix defects quickly, which can significantly reduce software development costs and improve software credibility. Current research usually formalizes defect prediction as a machine learning problem, and many machine learning techniques have been applied to it. However, existing defect prediction methods still have many problems in practical applications. For example, their performance is not stable enough. On high-dimensional data (for example, data with a large number of redundant and irrelevant metrics), prediction accuracy is low, and high-dimensional data is very common in practice. In addition, defective modules (the "positive class") are usually far fewer than non-defective modules (the "negative class"). That is, historical defect data is class-imbalanced, which tends to bias the prediction model toward the negative class and reduces prediction accuracy on the positive class. Because a single classifier has limited classification ability, it cannot deal with imbalanced data effectively, so many researchers use ensemble learning methods for defect prediction.

This thesis systematically studies the problems of high dimensionality and class imbalance in software defect prediction. First, to deal with high-dimensional and imbalanced data in defect prediction, we conduct a comparative study of how existing oversampling methods and feature selection methods perform in defect prediction. Second, we introduce the concepts of rough set theory and knowledge granularity into feature selection, propose a new information entropy model called harmonic granularity decision entropy, and build a feature selection algorithm, FSHGE, on top of it. Third, to address the limited classification ability and poor defect prediction performance of a single classifier, we propose a multi-modal selective ensemble learning algorithm, SE_RSFS, and apply it to defect prediction. SE_RSFS uses the FSHGE feature selection algorithm together with resampling to perturb the attribute space and the sample space of the training set simultaneously, thereby achieving an efficient multi-modal perturbation.
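The idea of perturbing the attribute space and the sample space at the same time can be illustrated with a minimal Python sketch. It is not the thesis's SE_RSFS algorithm: the abstract does not define FSHGE or the selective-ensemble step, so a random feature subset stands in for FSHGE and plain random oversampling stands in for the resampling technique. The function names `balance_by_oversampling` and `perturbed_training_sets` and all parameters are hypothetical, chosen only for illustration.

```python
import random


def balance_by_oversampling(rows, labels, rng):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (random oversampling)."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def perturbed_training_sets(rows, labels, n_members, n_features, seed=0):
    """Build one perturbed training view per ensemble member.

    Assumption: a random feature subset is used as a placeholder for a
    real feature selection algorithm such as FSHGE.
    """
    rng = random.Random(seed)
    n_total = len(rows[0])
    views = []
    for _ in range(n_members):
        # Attribute-space perturbation: keep only a subset of the metrics.
        features = sorted(rng.sample(range(n_total), n_features))
        projected = [[row[j] for j in features] for row in rows]
        # Sample-space perturbation: rebalance the classes by oversampling.
        bal_rows, bal_labels = balance_by_oversampling(projected, labels, rng)
        views.append((features, bal_rows, bal_labels))
    return views
```

Each returned view holds the selected feature indices and a class-balanced training set restricted to those features. A base classifier would then be trained on each view, and a selective ensemble would keep only some of the trained members. Both of those steps are outside this sketch.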

Keywords/Search Tags:

software defect prediction, SDP, feature selection, ensemble learning, rough sets, class imbalance, harmonic granularity decision entropy

Related items

1	Research On Decision Tree Algorithm Based On Rough Sets And Ensemble Learning
2	Research On Imbalanced Data Processing In Software Defect Prediction
3	Research On Software Defect Prediction Method Based On Fusion Feature Selection And Ensemble Learning
4	Researches And Applies On Software Defect Prediction Method Based On Ensemble Learning
5	Research On Software Defect Prediction Model Based On Active Ensemble Learning
6	Research On Software Defect Prediction Based On Ensemble Learning
7	Research On Intrusion Detection Approach Based On Incremental Learning And Ensemble Learning
8	Research On Software Defect Prediction Method Based On Feature Selection
9	Based On Ensemble Sampling And Data Imbalance Self-adaptive Processing Method In Defect Prediction Context
10	Research On Software Defect Prediction Method Based On Semi-supervised Integration