Research On Data Preprocessing Technology In Cross Project Software Defect Prediction

Posted on: 2022-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
GTID: 2518306749458174
Subject: Art
Abstract/Summary:
With the continued development of computers and the Internet, demand for software grows day by day. Although software brings great convenience to daily life, software defects impose huge costs. Software defect prediction is one of the important means of addressing this problem. With the application of machine learning, defect prediction within a single project has achieved good results. However, cross-project software defect prediction has more practical significance than within-project prediction. Research on cross-project prediction has found that directly using a large amount of data often produces poor predictions, because cross-project data suffers from class imbalance and feature differences. Data preprocessing can alleviate both problems, so it is very important in cross-project software defect prediction. The main work of this thesis is as follows:

(1) To address feature differences, this thesis proposes a filter-based feature selection method, cpfrfs (cross project of feature selection and feature redundancy). The method screens out a feature set with low feature redundancy and high feature similarity. The transferred feature set built from it is smaller than the original feature set, which improves cross-project software defect prediction.

(2) To address class imbalance, this thesis proposes a hybrid sampling method, msksmote (K-means mixed SMOTE method). The method deletes noise points, removes ambiguous majority-class samples at the class boundary, and adds minority-class samples at the boundary, which makes the boundary clearer and balances the data.

(3) To further improve cross-project software defect prediction, this thesis combines the msksmote and cpfrfs algorithms into a cross-project software defect prediction model, MSK + CP. The msksmote hybrid sampling method is first applied to the data set, and the cpfrfs algorithm is then used to screen the optimal feature set. Experimental results show that the model achieves better F1 values than classical cross-project software defect prediction algorithms.
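The abstract gives the order of the two preprocessing steps (sampling first, then feature selection) but not their details, so the following is only a rough illustration of that order, not the msksmote or cpfrfs algorithms themselves. Everything in it is an assumption made for the example: the function names, the toy data, the 0.9 redundancy threshold, the use of a source/target mean gap as the "similarity" score, and the random-pair interpolation that stands in for SMOTE. The K-means clustering, noise removal, and majority-class boundary cleaning mentioned in the abstract are left out.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def oversample_minority(rows, labels, rng):
    """Balance classes by interpolating between random pairs of minority rows.

    Simplified stand-in: real SMOTE interpolates toward k-nearest neighbours.
    Assumes label 1 (defective) is the minority and has at least two rows.
    """
    minority = [r for r, y in zip(rows, labels) if y == 1]
    n_needed = max(0, labels.count(0) - labels.count(1))
    synthetic = []
    for _ in range(n_needed):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return rows + synthetic, labels + [1] * len(synthetic)


def select_features(source, target, max_redundancy=0.9, top_k=3):
    """Filter-style selection (hypothetical scoring, not the thesis's).

    Features are visited from most to least similar across projects
    (smallest gap between source and target means). A feature is kept only
    if it is not highly correlated with any feature already kept.
    """
    n_features = len(source[0])
    src = [[row[j] for row in source] for j in range(n_features)]
    tgt = [[row[j] for row in target] for j in range(n_features)]
    kept = []
    for j in sorted(range(n_features), key=lambda j: abs(mean(src[j]) - mean(tgt[j]))):
        if all(abs(pearson(src[j], src[k])) < max_redundancy for k in kept):
            kept.append(j)
        if len(kept) == top_k:
            break
    return kept


# Toy source project: feature 1 is exactly twice feature 0 (redundant).
source = [[1, 2, 5], [2, 4, 3], [3, 6, 8], [4, 8, 1], [5, 10, 4], [6, 12, 7]]
labels = [0, 0, 0, 0, 1, 1]  # 1 = defective (minority class)
target = [[1, 2, 5], [3, 6, 4], [5, 10, 6]]  # unlabeled target project

# Step 1: sampling on the source data. Step 2: feature selection.
balanced_rows, balanced_labels = oversample_minority(source, labels, random.Random(0))
kept = select_features(balanced_rows, target)
print(balanced_labels.count(0), balanced_labels.count(1), kept)
```

In this toy data the redundancy check drops feature 1 because it is an exact multiple of feature 0, while the sampling step brings the two classes to equal size before selection runs.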
Keywords/Search Tags: software defect prediction, feature transfer, feature selection, class imbalance