
Research on Software Defect Prediction Method Based on Feature Selection and Oversampling

Posted on: 2023-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Xu
GTID: 2558306848967509 | Subject: Engineering
Abstract/Summary:
With the development of the information age, software plays an increasingly important role in people's lives, and users place ever higher demands on software security. Because the experience of developers and managers is limited, software defects are inevitable. Accurate and efficient software defect prediction helps teams plan their testing work and safeguard the security of software. This thesis studies two properties of software defect prediction data, high dimensionality and class imbalance, and proposes solutions to both. The main research contents are as follows.

First, to address high dimensionality, a feature selection algorithm based on the chi-square statistic and a genetic algorithm is proposed. Features are ranked by their chi-square values, and different numbers of features are taken in descending order of relevance to form multiple candidate feature subsets, which initialize the genetic algorithm's population. An appropriate fitness function is then chosen, the candidate subsets are optimized iteratively, and the best-performing feature subset is output to end the feature selection process. Compared with the traditional genetic algorithm, the proposed feature selection algorithm converges faster and achieves better performance.

Second, to address class imbalance, a multi-centroid oversampling algorithm for minority-class samples is proposed. The positive (defective) samples are separated out and clustered to obtain a better picture of their distribution; the samples within each cluster are then sorted, and boundary points are set. Each boundary point is obtained by linear interpolation between a positive sample and its cluster centroid, and its main purpose is to limit the range within which new samples are generated. New samples are then generated evenly between each positive sample and its boundary point. If too many samples are generated, some are deleted according to the sample ordering within the cluster, so that the positive and negative classes are balanced. The proposed algorithm greatly reduces the sample overlap rate and the risk of introducing noisy data, and generates more valuable new samples.

Finally, building on the proposed feature selection and oversampling algorithms, a software defect prediction model based on feature selection and oversampling is proposed. The model uses a multi-layer perceptron as the classifier to predict defects in software entities. Several public MDP software defect prediction datasets are used for the empirical study. Comparative experiments verify the effectiveness of the proposed feature selection algorithm, oversampling algorithm, and defect prediction model, and the experimental results are analyzed.
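The chi-square ranking and genetic-algorithm initialization described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the chi-square score follows the scikit-learn-style formulation (non-negative feature values treated as counts), the schedule for how many top-ranked features each initial candidate keeps is hypothetical, the size-2 tournament selection, one-point crossover, bit-flip mutation and elitism are illustrative choices, and `fitness` is a hypothetical callback standing in for the fitness function the abstract leaves unspecified (for example, a classifier's validation score on the selected features).

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square score of each non-negative feature with respect to the labels.

    Assumption: scikit-learn-style formulation, where a feature's values are
    summed per class (observed) and compared with the class-proportional share
    of the feature's total (expected).
    """
    n = len(y)
    classes = sorted(set(y))
    class_prob = {c: sum(1 for v in y if v == c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def init_population(scores: Sequence[float], pop_size: int) -> List[Mask]:
    """Candidate i keeps the top-k features ranked by chi-square score.

    Hypothetical schedule: k grows evenly from about n / pop_size up to n.
    """
    n = len(scores)
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    population = []
    for i in range(pop_size):
        k = max(1, round((i + 1) * n / pop_size))
        mask = [0] * n
        for j in ranked[:k]:
            mask[j] = 1
        population.append(mask)
    return population


def genetic_feature_selection(
    scores: Sequence[float],
    fitness: Callable[[Mask], float],  # hypothetical: e.g. classifier score on kept features
    pop_size: int = 10,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    rng = random.Random(seed)
    n = len(scores)
    population = init_population(scores, pop_size)
    best = max(population, key=fitness)

    def tournament() -> Mask:
        # Size-2 tournament: the fitter of two random candidates becomes a parent
        return max(rng.sample(population, 2), key=fitness)

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best mask found so far always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            if not any(child):  # never produce an empty feature subset
                child[rng.randrange(n)] = 1
            next_pop.append(child)
        population = next_pop
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best
```

For instance, `genetic_feature_selection(chi2_scores(X, y), fitness)` returns a 0/1 mask over the feature columns. Seeding the population with chi-square top-k subsets, rather than random masks, is what the abstract credits for faster convergence; elitism here guarantees the returned mask is at least as fit as the best chi-square-initialized candidate.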
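The multi-centroid oversampling steps described above can likewise be sketched in plain Python. This is a minimal sketch under assumptions the abstract does not fix: k-means is assumed as the clustering step, `alpha` is a hypothetical parameter setting how far toward the centroid each boundary point lies, and surplus synthetic samples are dropped starting from those whose originating sample lies farthest from its own cluster centroid, which is only one possible reading of "deleted according to the sample sorting in the cluster".

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means; returns (centroids, cluster label of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def multi_centroid_oversample(
    pos: Sequence[Point], n_neg: int, k: int = 2, alpha: float = 0.5, seed: int = 0
) -> List[Point]:
    """Generate synthetic positive samples so that len(pos) + len(result) == n_neg."""
    needed = n_neg - len(pos)
    if needed <= 0 or not pos:
        return []
    centroids, labels = kmeans(pos, min(k, len(pos)), seed=seed)
    per_sample = math.ceil(needed / len(pos))
    candidates = []  # (distance of the originating sample to its centroid, new point)
    for p, lab in zip(pos, labels):
        c = centroids[lab]
        # Boundary point: linear interpolation from the sample toward its centroid;
        # alpha (hypothetical) limits how far toward the centroid new points may go.
        boundary = [pi + alpha * (ci - pi) for pi, ci in zip(p, c)]
        d = math.dist(p, c)
        for j in range(1, per_sample + 1):
            t = j / (per_sample + 1)  # evenly spaced, excluding both endpoints
            candidates.append((d, [pi + t * (bi - pi) for pi, bi in zip(p, boundary)]))
    # Assumed trimming rule: keep points from samples closest to their centroid,
    # dropping the surplus generated from the most outlying (noise-prone) samples.
    candidates.sort(key=lambda item: item[0])
    return [point for _, point in candidates[:needed]]
```

Each synthetic point equals its positive sample moved a fraction `t * alpha` of the way toward the cluster centroid, with `0 < t * alpha < alpha`, so for `alpha <= 1` it never passes the boundary point and stays inside the cluster's convex hull. This containment is what the boundary points are meant to provide: new samples are kept away from regions occupied by other clusters or by the majority class.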
Keywords: software defect prediction, class imbalance learning, feature selection, data preprocessing