A Cost-sensitive Hybrid Software Defect Prediction Model

Posted on:2019-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:W Xiang

GTID:2438330548973574

Subject:Software engineering

Abstract/Summary:

As the Internet develops rapidly, software keeps growing in scale and complexity. Defects that are not discovered in time can seriously affect the fields that depend on the software, so finding hidden defects before release is an urgent problem. Machine learning algorithms play an important role in software defect prediction and perform well, but defect prediction in real software projects still faces the following problems: (1) testing software modules is expensive, so only a few modules can be tested, which means only a few data samples are labeled; (2) different misclassifications carry different costs: predicting a defective sample as non-defective is considerably more costly than predicting a non-defective sample as defective; (3) software defect prediction datasets are class-imbalanced, and defective samples are only a minority; (4) existing defect prediction models do not consider the semantic information implicit in the source code. To address these problems, this thesis proposes a cost-sensitive hybrid software defect prediction model. The main contributions are as follows:

1. An improved semi-supervised support vector machine is proposed. It builds the defect prediction model from a small number of labeled samples together with a large number of unlabeled samples. Training and test sets are extracted by 10-fold cross-validation, and the experiments are repeated to reduce the influence of chance.
2. A cost-sensitive support vector machine is proposed. It assigns a higher cost to predicting a defective sample as non-defective and a lower cost to predicting a non-defective sample as defective, which addresses the class imbalance of the dataset and minimizes the misclassification cost.
3. Feature location is incorporated into software defect prediction. Topic modeling is applied to the defect features and to the source code to obtain defect information and topic information from the code, and similarity matching is performed between them. The classification result and the feature location result are then linearly combined to give the final result of the hybrid software defect prediction.

In summary, this thesis combines feature location with semi-supervised learning and proposes a low-cost software defect prediction model. The model is validated on open-source software such as Lucene and Eclipse JDT Core, and the results show that the improvement is most significant on data with few labeled samples and a high cost ratio.
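Contribution 1 mentions extracting training and test sets by 10-fold cross-validation and repeating the experiments. Because defective modules are a small minority, a stratified split keeps roughly the same share of defective modules in every fold. The sketch below is a minimal pure-Python illustration under that assumption; the abstract does not say whether the thesis stratifies its folds, and the function name `repeated_stratified_kfold` and its seed handling are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def repeated_stratified_kfold(
    labels: Sequence[int], k: int = 10, repeats: int = 10, seed: int = 0
) -> List[List[Split]]:
    """Return `repeats` independent k-fold partitions of range(len(labels)).

    Hypothetical helper: within each repeat, the indices of every class are
    shuffled and dealt round-robin into the k folds, so each fold receives an
    almost equal share of the rare defective class.
    """
    rng = random.Random(seed)
    n = len(labels)
    result: List[List[Split]] = []
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        pos = 0  # keep dealing where the previous class stopped
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for i in idx:
                folds[pos % k].append(i)
                pos += 1
        splits: List[Split] = []
        for test in folds:
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            splits.append((train, sorted(test)))
        result.append(splits)
    return result
```

For a dataset with 20 defective and 180 clean modules, each of the ten test folds in every repeat holds 2 defective and 18 clean modules. A model would be trained and evaluated once per split, and the evaluation metric averaged over all k × repeats runs.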
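Contribution 2 assigns a higher cost to missing a defective module than to raising a false alarm on a clean one. One common way to express this in an SVM is to weight each sample's hinge loss by a class-dependent cost, minimizing (λ/2)·‖w‖² + (1/n)·Σᵢ cᵢ·max(0, 1 − yᵢ(w·xᵢ + b)). The sketch below is a minimal linear version trained by full-batch subgradient descent, assuming labels +1 (defective) and −1 (clean). It illustrates the cost-weighting idea only; it is not CS4VM or the thesis's semi-supervised model, and the cost values, learning rate, and function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def train_cost_sensitive_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],        # +1 = defective, -1 = clean
    cost_fn: float = 5.0,    # assumed weight for defective samples (missed defect)
    cost_fp: float = 1.0,    # assumed weight for clean samples (false alarm)
    lam: float = 0.01,       # L2 regularization strength
    lr: float = 0.1,         # step size
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize (lam/2)*||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i*(w.x_i + b))
    by full-batch subgradient descent, where c_i depends on the true class."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this sample
                c = cost_fn if yi == 1 else cost_fp
                for j in range(d):
                    grad_w[j] -= c * yi * xi[j] / n
                grad_b -= c * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (defective) or -1 (clean)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, imbalanced data: two defective modules, four clean ones.
X = [[2, 2], [3, 3], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
y = [1, 1, -1, -1, -1, -1]
w, b = train_cost_sensitive_svm(X, y)
```

Setting `cost_fn` above `cost_fp` makes every missed defective module count more in the loss, which tends to shift the decision boundary toward the clean class. In practice, scikit-learn's `SVC` exposes the same per-class weighting through its `class_weight` parameter.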
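Contribution 3 matches topic information from the source code against the defect features and then linearly combines the classification result with the feature-location result. The abstract does not give the similarity measure or the combination weight, so the sketch below assumes cosine similarity between topic-distribution vectors (as a topic model such as LDA would produce) and a hypothetical weight `alpha`. It also assumes the classifier output is a defect probability in [0, 1], so both terms share a scale. The module names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def hybrid_scores(
    classifier_scores: Dict[str, float],        # assumed defect probability per module
    module_topics: Dict[str, Sequence[float]],  # topic distribution of each module's source
    defect_topics: Sequence[float],             # topic distribution of the defect features
    alpha: float = 0.5,                         # hypothetical weight of the classifier score
) -> Dict[str, float]:
    """Linearly combine the classifier output with the topic-similarity score."""
    return {
        module: alpha * score
        + (1.0 - alpha) * cosine(module_topics[module], defect_topics)
        for module, score in classifier_scores.items()
    }


scores = hybrid_scores(
    {"Parser.java": 0.4, "Index.java": 0.6},
    {"Parser.java": [0.7, 0.2, 0.1], "Index.java": [0.1, 0.1, 0.8]},
    [0.8, 0.1, 0.1],
    alpha=0.5,
)
```

With `alpha = 0.5`, the hypothetical `Parser.java` ends up ranked above `Index.java` even though the classifier alone scored it lower, because its topics closely match those of the defect features.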

Keywords/Search Tags:

Software Defect, Cost Sensitive, CS4VM+L, Topic Modeling

Related items

1	Research On Software Defect Prediction Algorithm Based On Cost-sensitive Learning
2	Research On Software Defect Prediction Method Based On Cost Sensitive Learning Adacost
3	Research On Software Defect Prediction Method Based On Cost Sensitive Learning
4	Research On Fusion Cost Sensitive Sampling And Integration Algorithms In Software Defect Prediction
5	Cost-Sensitive Feature Selection Algorithms With Application In Software Defect Prediction
6	Feature Selection Based On Cost Sensitive Learning For Software Defect Prediction
7	Software Defect Prediction Based On Cost-Sensitive Bayesian Network
8	Research On Software Defect Prediction Method Based On Feature Dimensionality Reduction And Cost Sensitive Learning
9	Research On Software Defect Prediction Methods
10	Research On Software Module Defect Prediction Method In Fire Maintenance System