
A Cost-sensitive Hybrid Software Defect Prediction Model

Posted on: 2019-10-29 | Degree: Master | Type: Thesis
Country: China | Candidate: W Xiang
GTID: 2438330548973574 | Subject: Software engineering
Abstract/Summary:
In an era of rapid Internet growth, software keeps getting larger and more complex. If defects in software are not discovered in time, they can have serious consequences for the fields that depend on it, so finding the defects hidden in software before release is an urgent problem. Machine learning algorithms play an important role in software defect prediction and have achieved good results, but in real software projects defect prediction still faces the following problems:
(1) Testing software modules is expensive, so only a few modules can be tested; that is, only a few data samples are labeled.
(2) Different misclassifications incur different costs: judging a defective sample to be defect-free is clearly more costly than judging a defect-free sample to be defective.
(3) Software defect prediction datasets are class-imbalanced: defective samples are only a minority.
(4) Existing defect prediction models ignore the semantic information implicit in the source code.

To address these problems, this thesis proposes a cost-sensitive hybrid software defect prediction model. The main contributions are as follows:
1. A modified semi-supervised support vector machine is proposed. It builds the defect prediction model from a small number of labeled samples together with a large number of unlabeled samples. Training and test sets are drawn with 10-fold cross-validation, and the experiments are repeated to rule out chance results.
2. A cost-sensitive support vector machine is proposed. It assigns a higher cost to predicting a defective sample as defect-free and a lower cost to predicting a defect-free sample as defective, which addresses class imbalance in the dataset and minimizes the total misclassification cost.
3. Feature location is incorporated into software defect prediction. Topics are modeled for both the defect features and the software source code to extract defect information and topic information from the code, and the two are matched by similarity. The classification result and the feature location result are then linearly combined to produce the final hybrid defect prediction.

In summary, this thesis combines feature location with semi-supervised learning to propose a low-cost software defect prediction model. The model is validated on open-source software including Lucene and Eclipse JDT Core, and the improvement is most significant on data with few labeled samples and a high cost ratio.
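The cost-weighting idea in contribution 2 can be illustrated with a minimal sketch. This is not the thesis's CS4VM+L model (which is also semi-supervised); it is a plain supervised linear SVM trained by stochastic subgradient descent, where the hinge loss of each sample is scaled by a class-dependent cost. The label convention (+1 defective, -1 clean), the cost values `c_fn=5.0` and `c_fp=1.0`, and all function names are hypothetical choices for illustration.

```python
import random


def train_cost_sensitive_svm(X, y, c_fn=5.0, c_fp=1.0, lam=0.01,
                             epochs=200, lr=0.01, seed=0):
    """Linear SVM with class-dependent hinge-loss weights (illustrative sketch).

    Assumed labels: +1 = defective, -1 = clean. Errors on defective samples
    (potential false negatives) are weighted by c_fn, errors on clean samples
    (potential false positives) by c_fp.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            cost = c_fn if yi == 1 else c_fp
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Subgradient step on lam/2 * ||w||^2 + cost * max(0, 1 - margin)
            if margin < 1:
                w = [wj - lr * (lam * wj - cost * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * cost * yi
            else:
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (defective) or -1 (clean) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def misclassification_cost(y_true, y_pred, c_fn=5.0, c_fp=1.0):
    """Total cost: c_fn per missed defect plus c_fp per false alarm."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return c_fn * fn + c_fp * fp
```

For example, `misclassification_cost([1, 1, -1, -1], [-1, 1, 1, -1])` counts one missed defect and one false alarm and returns `6.0`. Raising `c_fn` relative to `c_fp` pushes the decision boundary toward the clean class, which is how the cost ratio counteracts class imbalance.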
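Contribution 3 linearly combines a classifier's output with a similarity score from feature location. The sketch below shows only the shape of that combination under stated assumptions: to stay self-contained it uses cosine similarity over raw term counts as a stand-in for the topic-model representations the thesis uses, and the weight `alpha=0.7` and the names `hybrid_score` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(p_defect, defect_text, source_text, alpha=0.7):
    """Hypothetical hybrid score: alpha * classifier probability
    + (1 - alpha) * textual similarity between a defect description and a module.

    Bag-of-words vectors are an assumed stand-in for topic distributions.
    """
    sim = cosine_similarity(Counter(defect_text.lower().split()),
                            Counter(source_text.lower().split()))
    return alpha * p_defect + (1 - alpha) * sim
```

With `alpha=1.0` the score reduces to the classifier output alone and with `alpha=0.0` to pure feature location; in practice the weight would be tuned on held-out data.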
Keywords: Software Defect, Cost-Sensitive, CS4VM+L, Topic Modeling