Research On Software Defect Prediction Method Based On Feature Selection

Posted on: 2022-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liu
GTID: 2518306329990629
Subject: Software engineering
Abstract/Summary:
Software testing is the process of checking, manually or automatically, whether software meets the expectations set out in requirements documents and similar artifacts. Testing is essential to software development, and its goal is to discover as many defects in the software as possible. The earlier a defect is found, the smaller its impact and the resulting loss, and the lower the cost of repair. However, manual testing can no longer find as many defects as possible while also keeping investment costs down and development on schedule. Software defect prediction identifies in advance the modules that are likely to contain defects, so that more testing resources can be given to high-risk modules, which helps ensure both software quality and testing efficiency.

As the functionality and scale of software grow rapidly, defect data sets become high-dimensional and prone to the curse of dimensionality. In addition, some software features are useless: they contribute nothing, or even have a negative effect, when the defect prediction model is built, and they degrade its performance. Feature selection reduces the dimension of a data set by selecting a small number of the most effective features from the original ones. Applied to defect prediction, it removes irrelevant and redundant features from the defect data set and thereby improves the classification ability of the prediction model. Building on feature selection, this thesis studies ensemble feature selection and clustering-based feature selection, targeting the irrelevant and redundant features present in the data set. The work covers the following two aspects:

(1) To address irrelevant features in software defect data sets, an ensemble-based feature ranking and selection method is proposed. The first stage is preprocessing, which improves data quality in preparation for building the classification model in the next stage. The features are then sorted in descending order of relevance by an ensemble ranking algorithm built on three feature ranking algorithms (GR, CS and GI), and a feature subset is selected from the ranked features by percentage to build the model. The experimental results show that using a small number of features works better than using all features, and that the ensemble ranking algorithm is more stable than a single feature ranking algorithm.

(2) To address redundant features in defect data sets, which existing methods do not handle effectively, a feature selection method based on a clustering algorithm is proposed. First, the features are clustered with the K-Medoids algorithm, so that features that are redundant with one another fall into the same cluster. Then, to build the software defect prediction models, the centroid of each cluster and the features that have the lowest correlation with that centroid are selected; finally, the first few features with high correlation among the remaining features in the cluster are selected. The experimental results show that a feature selection algorithm that considers both redundancy and correlation has advantages over one that considers only correlation. The prediction performance of the classification algorithms on data sets of different scales is also compared.

Several experiments on the NASA MDP data set were conducted to verify the two proposed methods. The experimental results show that the methods proposed in this thesis are effective in improving defect prediction performance.
Keywords/Search Tags: software defect prediction, feature selection, logistic regression, support vector machine