
A Method Of Feature Selection Based On Extended Bayesian Information Criteria In Software Defect Prediction

Posted on: 2020-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: J P Tu
Full Text: PDF
GTID: 2428330578951275
Subject: Software engineering theory and methods
Abstract/Summary:
Software defect prediction uses metrics and historical data related to software defects to build a prediction model that can predict whether a module under test is defect-prone. Feature selection is an important step in defect prediction research. This step selects a subset of the metrics to build the prediction model, with the aim of compressing the feature dimension, improving the accuracy of the model, reducing its complexity, and saving computing resources. Building a software defect prediction model from a large number of metrics may degrade its performance because of irrelevant metrics, so feature selection in defect prediction is worth studying. Existing feature selection methods either are not effective enough to yield a well-performing prediction model, or are resource-consuming and select a relatively high number of features; in addition, they are prone to the curse of dimensionality when the number of features is large. To address these problems, this thesis proposes a feature selection method named EBIC-FS, based on the Extended Bayesian Information Criterion (EBIC), which selects the feature subset with the lowest sum of residuals using fewer features. The method works in two rounds. The first round is feature ranking: it calculates the EBIC value of each feature and produces a ranked list. The second round adds features one at a time in ranked order (the best feature, then the best feature plus the second-best feature, and so on), so that each dimension yields one feature subset; the EBIC value of each subset is calculated, and the subset with the lowest EBIC value is selected as the best feature subset. Once the best subset is determined, the prediction model is built using Logistic Regression, Naive Bayes, Decision Tree, k-Nearest Neighbor, and Random Forest, respectively. Experiments were conducted on two public benchmark datasets, which contain defect data from three open-source software projects. The results show that the proposed method effectively compresses the feature dimension, and that prediction models built with EBIC-FS perform better than models built from the original dataset, from three feature ranking methods, and from one feature subset selection method. This verifies the validity of the EBIC-FS method.
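The two-round procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the exact EBIC formula or the model used to score features, so the sketch assumes an ordinary least-squares fit with an intercept and the Gaussian-model form EBIC = n ln(RSS/n) + k ln(n) + 2γ ln C(p, k), with γ = 1. The function names `ebic` and `ebic_fs` and the parameter `gamma` are hypothetical.

```python
import math

import numpy as np


def ebic(X, y, p_total, gamma=1.0):
    """EBIC of a least-squares fit of y on the columns of X (plus intercept).

    Assumed form: n*ln(RSS/n) + k*ln(n) + 2*gamma*ln(C(p_total, k)),
    where k is the number of features in X and p_total is the size of
    the full candidate feature pool.
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
    rss = float(np.sum((y - A @ beta) ** 2))
    rss = max(rss, 1e-12)                         # guard against log(0)
    penalty = k * math.log(n) + 2.0 * gamma * math.log(math.comb(p_total, k))
    return n * math.log(rss / n) + penalty


def ebic_fs(X, y, gamma=1.0):
    """Two-round selection: rank single features, then score nested subsets."""
    _, p = X.shape

    # Round 1: EBIC of each feature on its own; lower is better.
    single = [ebic(X[:, [j]], y, p, gamma) for j in range(p)]
    ranking = sorted(range(p), key=lambda j: single[j])

    # Round 2: grow the subset in ranked order and keep the lowest EBIC.
    best_subset, best_score = [], math.inf
    for k in range(1, p + 1):
        subset = ranking[:k]
        score = ebic(X[:, subset], y, p, gamma)
        if score < best_score:
            best_subset, best_score = subset, score
    return ranking, best_subset, best_score
```

Calling `ebic_fs(X, y)` on a metric matrix `X` (rows are modules, columns are metrics) and a label vector `y` returns the ranked feature indices, the selected subset, and its EBIC value. The selected columns could then be passed to any of the classifiers listed in the abstract. Because the second round only scores the p nested subsets along the ranking, it needs p model fits rather than an exhaustive search over all 2^p subsets.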
Keywords/Search Tags:Software defect prediction, Feature selection, Extended Bayesian Information Criterion, Best feature subset