Software Defect Prediction Based On An Ensemble Model

Posted on: 2020-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Hu
GTID: 2428330572961754
Subject: Computational Mathematics
Abstract/Summary:
The computer software industry is developing rapidly, and software now touches every corner of daily life. Software defect prediction is an important part of software engineering: if potential defects can be found and corrected in time during development, software quality improves. The purpose of software defect prediction is to identify defective samples effectively. Many data mining algorithms have been proposed for this task, such as support vector machines and Bayesian classifiers. However, real data in software defect prediction is often imbalanced, and these traditional methods cannot handle such data effectively. In response, researchers have proposed resampling, cost-sensitive learning and ensemble learning. But these methods either alter the original data set or address class imbalance in only one of the training and decision stages. To deal with class imbalance in software defect data, this thesis proposes a bagging ensemble model based on improved class weight self-adaptation, soft voting and threshold moving. The model accounts for class imbalance in both the training stage and the decision stage without changing the original data sets. To validate the proposed method, experiments are carried out on the NASA and Eclipse standard software defect data sets. Compared with software defect prediction methods proposed in recent years, the experimental results show that the proposed method has better overall performance and better prediction results when handling class imbalance. The main research work of this thesis is as follows:

(1) To avoid the high cost of misclassification, different classes are given different weights so that the classifier pays different amounts of attention to each class during training. The optimal class weights are obtained through class weight adaptive learning without changing the class proportions of the original data set (the class imbalance ratio of the training set equals that of the test set).

(2) In the training stage, the base classifiers should be "good but different", that is, each should have a certain accuracy while the set as a whole remains diverse. This thesis therefore chooses three classical classifiers: Decision Tree (DT), Support Vector Machine (SVM) and Logistic Regression (LR). The three base classifiers are trained with the optimal weights obtained in the first step and combined by a soft ensemble method to obtain three class-weighted base classifiers, and the confidence of each class-weighted base classifier is calculated.

(3) In the decision stage, the classification probabilities of the three class-weighted base classifiers are calculated on the test set, and soft voting is applied to the classifiers' predictions. Finally, the decision is made according to the threshold-moving model to obtain the final predicted class.
Keywords/Search Tags:Software defect prediction, Class weighted self-adaptation, Soft voting, Ensemble learning, Soft ensemble, Threshold-moving
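
The decision stage summarized in item (3) can be illustrated with a small sketch. It is not the thesis's implementation: the function names `soft_vote` and `threshold_move`, the probability values, the confidence weights and both thresholds are hypothetical, chosen only to show how confidence-weighted soft voting followed by threshold moving might change the predicted class of a borderline sample.

```python
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]],
              confidences: Sequence[float]) -> List[float]:
    """Confidence-weighted average of each classifier's P(defective) per sample."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    n_samples = len(probas[0])
    return [
        sum(c * p[i] for c, p in zip(confidences, probas)) / total
        for i in range(n_samples)
    ]


def threshold_move(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a sample defective (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical P(defective) from three base classifiers on four test modules.
dt_proba = [0.9, 0.2, 0.4, 0.1]
svm_proba = [0.8, 0.3, 0.3, 0.2]
lr_proba = [0.7, 0.1, 0.5, 0.0]
confidences = [0.5, 0.3, 0.2]  # assumed per-classifier confidence values

scores = soft_vote([dt_proba, svm_proba, lr_proba], confidences)
default_labels = threshold_move(scores, 0.5)   # conventional cut-off
moved_labels = threshold_move(scores, 0.35)    # assumed lowered cut-off

print(default_labels)
print(moved_labels)
```

With these made-up numbers, the third module has a combined score of about 0.39, so it is labelled non-defective at the conventional 0.5 cut-off but defective once the threshold is lowered. Lowering the cut-off for the minority class in this way is the usual motivation for threshold moving on imbalanced data.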