
The Research Of Software Defect Prediction Model Based On Rule Learning And Naïve Bayes Algorithm

Posted on: 2020-10-12 | Degree: Master | Type: Thesis
Country: China | Candidate: B Ma
GTID: 2428330575480526 | Subject: Computer software and theory
Abstract/Summary:
Software defects are deviations in computer systems, programs, and code segments that exist statically in the software. They are hidden problems and errors that may be triggered at runtime and impair the normal functioning of the software. Software defects accompany the entire software development life cycle. As software grows in scale, the number of defects keeps rising and detecting them becomes increasingly difficult. Software defects can seriously affect every field that depends on software and cause incalculable losses, yet manual code review can no longer keep pace with software development in speed or efficiency. More efficient software defect prediction has therefore become particularly important.

Software defect prediction analyzes the historical data of software. Program modules are extracted from the historical data, and each module's complexity data, development process data, and defect data are collected as software metrics. Metrics strongly correlated with defects are selected as features, and a machine learning algorithm is used to build a defect prediction model on those features. The goal of defect prediction is to predict potential defects, locate likely defective code segments in advance, and prevent defects from arising. Defect prediction is generally divided into static and dynamic prediction: static prediction mainly predicts whether defects exist, how many there are, and how they are distributed, while dynamic prediction mainly predicts when defects occur over the software life cycle and how they are distributed over time. This thesis addresses static prediction, applying machine learning algorithms to predict whether a software module contains defects. Algorithms applied in this field mainly include random forest, support vector machines, Bayesian methods, and dictionary learning. The naïve Bayes algorithm estimates the posterior probability from the prior probability, and its computational complexity is lower than that of typical algorithms such as decision trees, support vector machines, and neural networks.

The data set affects prediction performance far more than the choice of classification algorithm. To handle data that goes missing during data collection, this thesis uses the K-means algorithm to fill the missing values. First, the samples without missing values are clustered. Then each sample with missing values is assigned to a cluster according to its Euclidean distance to the clusters. Finally, the missing entries of that sample are filled with the mean values of its cluster, yielding a complete data set with no missing values.

Traditional static software metrics are not directly suitable as features for a machine-learning-based defect prediction model, so selecting metrics is an important part of defect prediction. This thesis selects metrics with the information gain method: according to the amount of information each metric carries, the metrics that contribute the most information to defect prediction are chosen as the training features of the model.

The conditional independence assumption of the naïve Bayes model discards correlations between features. Given the practical requirements of defect prediction and this weakness of naïve Bayes, this thesis improves the naïve Bayes model with rule learning, combining the two through the weighted-average combination strategy of ensemble learning. Rule learning accounts for the effect of feature combinations on the prediction, which effectively compensates for the weakness of naïve Bayes. The data set was also optimized with respect to missing-value handling, feature selection, class imbalance, and other aspects, which effectively improved the predictive performance of the model.

The NASA MDP data set, widely used in defect prediction research, serves as the experimental data for repeated 10-fold cross-validation. The experimental results show that, compared with the plain naïve Bayes defect prediction model, the model based on rule learning and naïve Bayes improves precision, recall, F-measure, AUC, and other evaluation metrics. Compared with improved algorithms proposed by other researchers, the proposed model is also more effective for software defect prediction.
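The K-means imputation procedure described above (cluster the complete samples, assign each incomplete sample to the nearest cluster by Euclidean distance, fill its gaps with that cluster's means) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: missing values are assumed to be encoded as `None`, the distance from an incomplete sample to a centroid is assumed to use only its observed features, and `k`, the iteration cap, and the deterministic initialization (the first `k` complete rows) are illustrative choices.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's K-means on complete rows; returns the final centroids (cluster means)."""
    # Deterministic initialization for reproducibility (an assumption): the first k points.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Each new centroid is the per-feature mean of its members;
        # an empty cluster keeps its previous centroid.
        new = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids

def observed_dist(row, centroid):
    """Euclidean distance using only the features that are present (not None)."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(row, centroid) if x is not None))

def kmeans_impute(rows, k=2):
    """Fill None entries with the mean of the nearest cluster of complete rows."""
    complete = [r for r in rows if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k complete rows to initialize K-means")
    centroids = kmeans(complete, k)
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        nearest = min(centroids, key=lambda c: observed_dist(r, c))
        filled.append([nearest[i] if x is None else x for i, x in enumerate(r)])
    return filled

# Toy data (hypothetical): two well-separated groups, two rows with a gap each.
rows = [
    [1.0, 2.0, 10.0], [1.2, 1.8, 11.0],
    [8.0, 9.0, 50.0], [8.2, 9.2, 52.0],
    [1.1, None, 10.5], [None, 9.1, 51.0],
]
for row in kmeans_impute(rows, k=2):
    print(row)
```

On this toy data the two clusters of complete rows converge to means of about [1.1, 1.9, 10.5] and [8.1, 9.1, 51.0], so the gap in [1.1, None, 10.5] is filled with roughly 1.9 and the gap in [None, 9.1, 51.0] with roughly 8.1.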
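Information-gain feature selection ranks each metric X by IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v), where H is the Shannon entropy of the defect label Y. The sketch below is a minimal Python illustration, not the thesis's code. Because NASA MDP metrics are continuous, it assumes a simple equal-width discretization into `n_bins` bins before computing the gain; the binning scheme, the metric names (`loc`, `cyclomatic`, `comment_ratio`), and the toy values are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def discretize(values, n_bins=3):
    """Equal-width binning (an assumed scheme); returns a bin index per value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) puts the maximum value into the last bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def information_gain(values, labels, n_bins=3):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) on the binned metric."""
    bins = discretize(values, n_bins)
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [lab for bin_id, lab in zip(bins, labels) if bin_id == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def select_features(X, y, names, top_k, n_bins=3):
    """Rank metrics by information gain and keep the top_k names."""
    scores = {name: information_gain(list(col), y, n_bins) for name, col in zip(names, zip(*X))}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Hypothetical metric names and toy values (label 1 = defective module).
names = ["loc", "cyclomatic", "comment_ratio"]
X = [[10, 1, 0.5], [12, 2, 0.1], [200, 15, 0.4], [220, 18, 0.2], [15, 1, 0.3], [210, 16, 0.45]]
y = [0, 0, 1, 1, 0, 1]
selected, scores = select_features(X, y, names, top_k=2)
print(selected, scores)
```

Here `loc` and `cyclomatic` separate the toy labels perfectly after binning, so each carries the full 1 bit of label entropy, while `comment_ratio` carries much less and is dropped.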
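For reference, the standard naïve Bayes posterior under the conditional-independence assumption, and a generic two-model weighted average of the kind the abstract names, can be written as follows. Here x = (x_1, ..., x_n) are a module's selected metrics and c ranges over the classes {defective, clean}; the weight w and the rule model's probability P_rule are notation introduced for illustration, since the abstract does not state how the thesis sets the weights.

```latex
P_{\mathrm{NB}}(c \mid x) = \frac{P(c)\prod_{i=1}^{n} P(x_i \mid c)}{\sum_{c'} P(c')\prod_{i=1}^{n} P(x_i \mid c')}

P_{\mathrm{ens}}(c \mid x) = w\,P_{\mathrm{NB}}(c \mid x) + (1 - w)\,P_{\mathrm{rule}}(c \mid x), \qquad 0 \le w \le 1

\hat{c} = \arg\max_{c} P_{\mathrm{ens}}(c \mid x)
```

The product over i is exactly where feature correlations are lost: each metric contributes independently given the class, which is the weakness that a rule model over feature combinations is meant to offset.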
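The combination of rule learning and naïve Bayes can be illustrated with the minimal Python sketch below. It is not the thesis's algorithm: the abstract does not describe its rule learner, so this sketch substitutes a deliberately simple, hypothetical one (`learn_pair_rules`) that keeps two-feature conjunctions (feature i = a AND feature j = b) with enough support and confidence. It then averages the rule probability with a categorical naïve Bayes posterior using an assumed weight `w`. Features are assumed to be already discretized, and a module that matches no rule falls back to the naïve Bayes probability.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

class CategoricalNB:
    """Naive Bayes over discrete features with Laplace (add-one) smoothing."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.values = [set(row[i] for row in X) for i in range(len(X[0]))]
        self.class_n = Counter(y)
        self.counts = defaultdict(int)  # (class, feature index, value) -> count
        for row, c in zip(X, y):
            for i, v in enumerate(row):
                self.counts[(c, i, v)] += 1
        return self

    def predict_proba(self, row):
        """Posterior P(c | row), computed in log space and normalized over classes."""
        logs = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for i, v in enumerate(row):
                lp += math.log((self.counts[(c, i, v)] + 1) / (self.class_n[c] + len(self.values[i])))
            logs[c] = lp
        m = max(logs.values())
        exp = {c: math.exp(l - m) for c, l in logs.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

def learn_pair_rules(X, y, min_support=2, min_conf=0.8):
    """Hypothetical rule learner: keep conjunctions (x_i = a AND x_j = b) whose
    majority label reaches min_conf on at least min_support training rows."""
    rules = []
    for i, j in combinations(range(len(X[0])), 2):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[(row[i], row[j])].append(label)
        for (a, b), labels in groups.items():
            label, hits = Counter(labels).most_common(1)[0]
            conf = hits / len(labels)
            if len(labels) >= min_support and conf >= min_conf:
                rules.append((i, a, j, b, label, conf))
    return rules

def rule_proba(rules, row, target, default):
    """P(target | row) from the most confident matching rule, else `default`."""
    matched = [(conf, label) for i, a, j, b, label, conf in rules if row[i] == a and row[j] == b]
    if not matched:
        return default
    conf, label = max(matched)
    return conf if label == target else 1 - conf

def ensemble_proba(nb, rules, row, target=1, w=0.5):
    """Weighted average of the naive Bayes and rule probabilities (w is assumed)."""
    p_nb = nb.predict_proba(row).get(target, 0.0)
    p_rule = rule_proba(rules, row, target, default=p_nb)
    return w * p_nb + (1 - w) * p_rule

# Toy XOR-like data: a module is defective (1) exactly when the two features differ.
X = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 1], [0, 1], [1, 0], [1, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
nb = CategoricalNB().fit(X, y)
rules = learn_pair_rules(X, y)
print(nb.predict_proba([0, 1])[1], ensemble_proba(nb, rules, [0, 1]))  # 0.5 0.75
```

On this toy data each feature is individually uninformative, so naïve Bayes assigns 0.5 to every module, while the pair rules capture the interaction; with w = 0.5 the combined score for [0, 1] rises to 0.75 and for [0, 0] falls to 0.25. This mirrors, in miniature, the loss of feature correlations under conditional independence that the abstract describes.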
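The evaluation protocol (repeated 10-fold cross-validation scored with precision, recall, F-measure, and AUC) rests on standard definitions, sketched here in plain Python. This is an illustration, not the thesis's experimental code: stratifying the folds by class, so that each fold keeps roughly the data set's defect ratio (which matters for imbalanced defect data), is an assumption, as is treating label 1 as "defective". AUC is computed through its rank interpretation: the probability that a randomly chosen defective module scores higher than a randomly chosen clean one, with ties counting one half.

```python
import random

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure (F1) for the positive (defective) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_k_folds(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Repeating the cross-validation amounts to calling `stratified_k_folds` with a different `seed` for each repetition, training and scoring the model on every fold, and averaging the metrics over all folds and repetitions.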
Keywords/Search Tags: Software defect prediction, Naïve Bayes, Feature selection, Rule learning, Data set processing