
Research On Software Defect Prediction Method Based On Random Forest And SVM

Posted on: 2021-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: J H Hu
Full Text: PDF
GTID: 2428330623967322
Subject: Electronic and communication engineering
Abstract/Summary:
As software systems grow in size and complexity, correctly locating faulty software modules requires substantial manpower and material resources. Software defect prediction improves development efficiency and reduces development costs by mining and analyzing historical development data from software projects to detect potentially defective modules. At present, researchers in this field mainly use machine learning to construct defect prediction models. In practical applications, software defect data sets contain a large number of redundant features and have an imbalanced class distribution, and both factors strongly affect the predictive performance of the model. In this thesis, the random forest algorithm is used for feature selection to remove redundant features from the data set, and SMOTE oversampling combined with random undersampling is used to reduce the class imbalance of the data set. SVM is selected as the base classifier according to the characteristics of the data set, and a software defect prediction model based on random forest and SVM is designed. The main research contents of the thesis are as follows:

(1) The classification accuracy of a random forest classifier is used as the measure of feature separability, and feature selection is performed by choosing the feature subset that yields the highest classification accuracy.

(2) SMOTE oversampling and random undersampling are combined to balance the number of majority-class and minority-class samples. The total number of samples is reduced to half of the original, which improves classification speed.

(3) Four training scenarios are proposed. They differ in the order in which feature selection and data sampling are performed, and in whether the feature selection result is applied to the sampled data set or to the original data set. Experiments are conducted on the public software defect data sets provided by NASA, and the optimal prediction model is selected by comparing the performance of the defect prediction models built under the four scenarios. The experimental results show that the best overall performance is achieved when feature selection is performed on the sampled data, the feature selection result is applied to the sampled data, and the resulting data are combined with SVM.

(4) In the model optimization process, the PSO algorithm is used to select the parameters of the prediction model in the optimal scenario, yielding a software defect prediction model with optimized parameters. A series of comparative experiments shows that the final prediction model reaches the best overall level in terms of comprehensive performance.

The software defect prediction model proposed in this thesis achieves good prediction results on software defect data sets, and thus offers software developers guidance for improving software quality and development efficiency.
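The sampling step described in item (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `smote` and `balance_to_half`, the choice of `k`, the binary 0/1 labels, and the rule "one quarter of the original sample count per class" are all assumptions made here to match the stated goal of balanced classes at half the original size. It uses only the Python standard library and assumes at least two minority samples.

```python
import random

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def balance_to_half(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical helper: oversample the minority class with SMOTE and
    randomly undersample the majority class so that each class holds about
    a quarter of the original sample count (assumed binary 0/1 labels)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [x for x, label in zip(X, y) if label != minority_label]
    target = len(X) // 4  # per class, so the total is about half the original
    if len(minority) < target:
        minority = minority + smote(minority, target - len(minority), k, rng)
    else:
        minority = rng.sample(minority, target)
    majority = rng.sample(majority, min(target, len(majority)))
    X_new = majority + minority
    y_new = [1 - minority_label] * len(majority) + [minority_label] * len(minority)
    return X_new, y_new

# Toy data: 90 non-defective (label 0) and 10 defective (label 1) modules.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = balance_to_half(X, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))  # 50 25 25
```

In practice a maintained library implementation of SMOTE would likely be preferable. The sketch only shows how oversampling and undersampling can meet at a common per-class target.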
Keywords/Search Tags: software defect prediction, random forest, SVM, PSO