
Research And Application Of Feature Selection For Software Defect Data

Posted on: 2018-08-25; Degree: Master; Type: Thesis
Country: China; Candidate: J J Zou
GTID: 2428330596968730; Subject: Computer Science and Technology
Abstract/Summary:
Static software defect prediction has been one of the main research techniques in recent years: using static software features, it classifies software modules into two classes, defective or non-defective. However, not all features contribute to the best classification, so selecting among software product features is a prerequisite for classifying software modules accurately. According to their evaluation criteria, feature selection methods fall into three types: filter, wrapper, and hybrid. A filter method evaluates features by their mathematical properties without involving a classifier, so it is fast and computationally cheap. A wrapper method evaluates features by classification results, so it is more accurate but requires more computation. A hybrid method combines the two, preserving accuracy while reducing dimensionality quickly. General feature selection methods assume a balanced data distribution, but software defect data is usually imbalanced, i.e., defective samples are far fewer than non-defective ones, so features selected under that assumption tend to hurt classification accuracy on the minority class. In addition, an efficient classifier for software defect data is needed to improve classification speed and precision. Based on these goals, the two main research contributions of this thesis are as follows:

1. A feature selection method oriented to software defect data is proposed. It works in two stages. In the first stage, information entropy and mutual information are used to measure the relationship between each feature and the class, and among the features themselves. Following the principle of maximum relevance between features and the class and minimum redundancy among features, a candidate feature set is selected from the original feature set. In the second stage, the candidate features are further optimized. A new weighting policy in the learning process takes the data imbalance fully into account: more attention is given to the minority class, and a majority-class sample whose weight becomes too high is deleted from the data set. In this way, the classification accuracy of the minority class is improved.

2. A cascade classifier for software defect classification is presented, which takes the selected software features as its input. Several AdaBoost classifiers are connected in series to form a strong cascade: some non-defective samples are eliminated at the front of the cascade, while defective samples and the remaining non-defective samples pass on to the back end. This addresses two problems of a single AdaBoost classifier: the majority class consumes many system resources, and redundant weak classifiers are prone to be generated.

Simulation experiments demonstrate the good performance of the proposed methods; the results show that they outperform traditional methods. Moreover, thanks to the features selected by the proposed method, the classifier's runtime is reduced while its accuracy is higher.
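The first stage of the feature selection method (entropy, mutual information, maximum relevance and minimum redundancy) can be illustrated with a greedy mRMR sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the software metrics have already been discretized, and it uses the "difference" form of mRMR (relevance minus mean redundancy); the thesis may combine the two terms differently. The function names and the toy metric names `loc`, `loc_copy`, `fan_out` are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr_select(features, labels, k):
    """Greedy mRMR, difference form (assumed criterion, see lead-in).

    features: dict mapping a (hypothetical) feature name -> list of discrete values
    labels:   list of class labels, e.g. 1 = defective, 0 = non-defective
    Returns up to k feature names in the order they were selected.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]  # first pick: maximum relevance only
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy  # max relevance, min redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 0, 1, 1, 1]
    features = {
        "loc": [0, 0, 0, 0, 1, 1, 1, 1],
        "loc_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of "loc"
        "fan_out": [0, 0, 1, 1, 0, 0, 1, 1],   # weakly relevant, independent of "loc"
    }
    print(mrmr_select(features, labels, 2))  # prints ['loc', 'fan_out']
```

In this toy data the duplicate `loc_copy` is exactly as relevant as `loc`, but once `loc` is chosen its redundancy term cancels that relevance, so the weaker but independent `fan_out` is picked second.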
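The abstract does not give the exact weighting policy of the second stage, so the following is only a hypothetical sketch of the idea it describes: a boosting-style weight update in which misclassified minority (defective) samples receive an extra boost, and majority samples whose normalized weight grows too large are removed. The function `reweight_and_prune` and its parameters `minority_boost` and `max_majority_weight` are assumptions made for illustration, not values from the thesis.

```python
import math


def reweight_and_prune(weights, labels, predictions, alpha,
                       minority_boost=2.0, max_majority_weight=0.2):
    """One hypothetical imbalance-aware boosting weight update.

    labels and predictions are in {-1, +1}; +1 is the minority (defective) class.
    alpha is the current weak learner's vote weight, as in AdaBoost.
    Returns (indices of kept samples, their renormalized weights).
    """
    updated = []
    for w, y, p in zip(weights, labels, predictions):
        factor = math.exp(-alpha * y * p)  # standard AdaBoost update
        if y == 1 and p != y:
            factor *= minority_boost       # assumed extra focus on missed defects
        updated.append(w * factor)
    total = sum(updated)
    updated = [w / total for w in updated]

    # Drop majority-class samples whose weight exceeds the (assumed) cap.
    keep = [i for i, (w, y) in enumerate(zip(updated, labels))
            if not (y == -1 and w > max_majority_weight)]
    if not keep:  # guard: never prune every sample
        return list(range(len(updated))), updated
    kept_total = sum(updated[i] for i in keep)
    return keep, [updated[i] / kept_total for i in keep]
```

For example, with four equally weighted samples, labels `[1, -1, -1, -1]`, predictions `[-1, -1, 1, -1]` and `alpha = math.log(2)`, the missed defective sample ends up with weight 0.8 and the misclassified majority sample (index 2) is pruned because its normalized weight exceeds 0.2.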
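The cascade classifier can be sketched in the spirit of the Viola-Jones cascade: each stage is a small AdaBoost model (here with decision stumps), and samples scoring below a stage threshold are rejected early as non-defective, so later stages only see the harder samples. The threshold rule below (lower it until every defective training sample that reaches the stage is kept), the stopping rules, and all function names are assumptions for illustration; the thesis's actual stage design may differ.

```python
import math


def train_stump(X, y, w):
    """Weighted-error-minimizing decision stump; y in {-1, +1}.
    Returns (feature index, threshold, polarity, weighted error)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != yi)
                if best is None or err < best[3]:
                    best = (j, t, polarity, err)
    return best


def stump_predict(stump, row):
    j, t, polarity, _ = stump
    return polarity if row[j] >= t else -polarity


def adaboost(X, y, rounds):
    """Discrete AdaBoost; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[3], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, row):
    return sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)


def train_cascade(X, y, n_stages, rounds):
    """Train AdaBoost stages in series; each stage rejects samples it scores
    below a threshold that keeps all defective (+1) training samples."""
    stages = []
    for _ in range(n_stages):
        if -1 not in y or 1 not in y:
            break  # nothing left to separate
        ensemble = adaboost(X, y, rounds)
        threshold = min(score(ensemble, row) for row, yi in zip(X, y) if yi == 1)
        stages.append((ensemble, threshold))
        keep = [i for i, row in enumerate(X) if score(ensemble, row) >= threshold]
        if len(keep) == len(X):
            break  # this stage rejected nothing; stop growing the cascade
        X = [X[i] for i in keep]
        y = [y[i] for i in keep]
    return stages


def cascade_predict(stages, row):
    for ensemble, threshold in stages:
        if score(ensemble, row) < threshold:
            return -1  # rejected early: predicted non-defective
    return 1  # passed every stage: predicted defective
```

As a usage example, training on `X = [[1, 5], [2, 3], [3, 4], [4, 1], [5, 2], [6, 6], [8, 7], [9, 8]]` with only the last two rows labeled `+1` yields a single stage that rejects all six non-defective rows; on real, overlapping defect data, several stages would normally be needed.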
Keywords: Software defect prediction, feature selection, imbalance, mutual information, cascade classifier