Research On Software Defect Prediction Method Based On Semi-supervised Integration

Posted on: 2022-09-16 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Zhou | Full Text: PDF
GTID: 2518306479471904 | Subject: Computer technology
Abstract/Summary:
In recent years, software defect prediction has received widespread attention, largely because people have gradually begun to pay attention to software quality, and software defects affect and threaten that quality. How to deal with software defects during the software development process has therefore become an important research topic. At present, an important factor limiting the performance of software defect prediction models is the inability to obtain enough labeled defect samples to fully train the model, so how to solve the problem of insufficient labeled samples has become an important research branch in this field. Research has also found that software defect data sets are imbalanced: the proportion of defective samples is very small. At the same time, as software measurement technology develops, more characteristic information about software modules can be obtained, but this also produces many irrelevant and redundant features, which not only increase the computational cost of the model but also degrade the performance of the software defect prediction model. Although there is some related research on class imbalance and feature selection in software defect prediction, few studies comprehensively consider the influence of class imbalance, feature redundancy, and feature irrelevance on semi-supervised software defect prediction models. This thesis therefore experimentally analyzes various methods for addressing class imbalance, feature redundancy, and feature irrelevance in semi-supervised defect prediction models, and proposes the best-performing solution. The main research work and contributions of this thesis are as follows:

(1) To address the class imbalance of the data sets, sampling methods were studied. The research found that in actual development only 10% to 40% of a complete project's sample data can be obtained with labels, and labeled defective samples are even scarcer. If the under-sampling methods commonly used in previous studies are applied, a great deal of valuable labeled sample information is lost. This thesis compares and analyzes various sampling methods on AEEEM and NASA, two data sets commonly used in the field of software defect prediction. The experimental results show that oversampling methods achieve better and more stable prediction results, and that among them the adaptive synthetic oversampling method ADASYN obtains the best prediction results.

(2) To address the irrelevance between features and the label column and the redundancy among features in software defect prediction research, a variety of feature selection algorithms were compared and analyzed on the AEEEM and NASA data sets. The experimental results show that the minimum-redundancy maximum-relevance (mRMR) algorithm selects the best feature subset, making the model's prediction results more stable and superior. In addition, to make full use of the information in the initial labeled and unlabeled samples and to prevent over-fitting, this thesis proposes a feature selection framework suited to semi-supervised software defect prediction. Experiments show that this framework effectively improves the performance of feature selection algorithms under semi-supervised learning.

(3) To maximize the performance of semi-supervised software defect prediction, this thesis proposes FeSSTri, a semi-supervised software defect prediction model based on the semi-supervised ensemble learning algorithm Tri-training and the research results above. The experimental results show that, compared with classic software defect prediction methods, FeSSTri achieves better and more stable prediction results.
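
To make the oversampling idea in contribution (1) concrete, here is a minimal Python sketch of ADASYN-style adaptive oversampling. It is an illustration under stated assumptions, not the thesis's implementation. The function names `k_nearest` and `adasyn`, the parameters `k`, `beta`, and `seed`, and the label convention (1 = defective, 0 = clean) are all hypothetical. As a simplification, interpolation partners are drawn from each sample's nearest minority-class neighbours.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return up to k candidates closest to `point` (Euclidean), skipping `point` itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point[0], c[0]))[:k]


def adasyn(samples, k=5, beta=1.0, seed=0):
    """Simplified ADASYN sketch for binary data.

    samples: list of (features, label) pairs; label 1 = defective (minority),
             label 0 = clean (majority). This convention is an assumption.
    beta:    1.0 aims for a fully balanced set, 0.0 generates nothing.
    Returns a list of synthetic minority samples.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]

    # Total number of synthetic samples needed to reach the requested balance.
    total = int((len(majority) - len(minority)) * beta)
    if total <= 0 or len(minority) < 2:
        return []

    # r_i: share of majority samples among the k nearest neighbours
    # of each minority sample (a measure of how hard it is to learn).
    ratios = []
    for s in minority:
        neighbours = k_nearest(s, samples, k)
        ratios.append(sum(1 for n in neighbours if n[1] == 0) / len(neighbours))

    # Normalise to weights; fall back to uniform weights if every ratio is zero.
    norm = sum(ratios)
    if norm > 0:
        weights = [r / norm for r in ratios]
    else:
        weights = [1 / len(minority)] * len(minority)

    synthetic = []
    for s, w in zip(minority, weights):
        minority_neighbours = k_nearest(s, minority, k)
        # Harder samples (larger weight) receive more synthetic points.
        for _ in range(round(w * total)):
            other = rng.choice(minority_neighbours)
            lam = rng.random()
            # Interpolate between the sample and a minority neighbour.
            features = tuple(a + lam * (b - a) for a, b in zip(s[0], other[0]))
            synthetic.append((features, 1))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    clean = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
    defective = [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(8)]
    new = adasyn(clean + defective, k=5)
    print(len(defective), "defective samples before,", len(defective) + len(new), "after")
```

Because per-sample counts are rounded, the number of generated samples can differ slightly from the target. In practice a maintained library implementation would usually be preferable to a hand-written one.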
Keywords/Search Tags:Semi-supervised learning, software defect prediction, class imbalance, feature selection