
Research On Software Defect Prediction Method Based On Training Data Selection

Posted on: 2018-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Peng
Full Text: PDF
GTID: 2348330512497931
Subject: Systems analysis and integration
Abstract/Summary:
The unprecedented development of the Internet has completely changed the way we live. Software plays an increasingly prominent role and has penetrated every aspect of our lives, so expectations for software quality keep rising. Software maintenance is commonly estimated to account for about 70% of total software development cost, and predicting and repairing software defects is one of its main tasks. Software defect prediction helps identify the modules most likely to contain defects, which allows test resources to be allocated sensibly and improves both the development process and the quality of the resulting software. It has long been an active topic in software engineering.

The traditional approach builds a prediction model from the historical data of the project itself and then applies it to subsequent versions. A high-quality model requires sufficient historical data, which is hard to obtain for new projects or for projects that are not yet active. In recent years, more and more data has become available on the Internet, and some researchers have used data from other, similar software projects to train cross-project defect prediction models, easing the historical-data bottleneck of traditional defect prediction. However, existing work on cross-project training data selection relies mostly on the similarity of source code metrics and ignores defect attribute information such as the number of defects. In practice, when several training instances have the same similarity to a target instance, the selection process must decide which one, or which few, of them to prefer. From an empirical software engineering point of view, training instances with more defects should be preferred because they carry more information.

This thesis therefore proposes a new cross-project software defect prediction method based on training data selection that takes the number of defects into account. (1) Building on commonly used source code metrics, it introduces defect information into the computation of similarity between instances and applies five typical normalization methods to the defect information. (2) It examines three commonly used similarity measures in combination with the normalization methods from (1) and discusses the quality of the resulting training instances. (3) Based on six typical single classifiers (LR, J48, NB, SVM, KNN and RF), it builds an ensemble defect prediction model that draws on the strengths of each classifier. The F-measure is used to evaluate the predictive performance of each classifier, and voting-based and weighted ensembles are proposed to predict whether a target instance is defective.

Experiments were carried out to verify the rationale and correctness of this idea. The results show that: (1) introducing defect count information helps improve the quality of cross-project training data; (2) the choice of similarity measure and normalization method affects the prediction results, and performance is better when Manhattan distance is used to measure the similarity of source code metrics or when linear normalization is applied to the number of defects; (3) the proposed weighted ensemble model further improves prediction performance.
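The selection and ensemble ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the data layout (a list of `(metrics, defect_count)` pairs), the use of the defect count purely as a tie-breaker, and the "more than half of the total weight" decision rule are all assumptions made for illustration. The abstract only states that Manhattan distance, linear normalization of defect counts, and F-measure-based weighting were among the options studied.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (source code metrics, defect count); assumed layout


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two metric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def min_max(values: Sequence[float]) -> List[float]:
    """Linear (min-max) normalization to [0, 1]; all zeros if the values are constant."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_training(target: Sequence[float],
                    candidates: List[Instance],
                    k: int) -> List[int]:
    """Return indices of the k candidates closest to the target.

    Assumption: equal distances are broken in favour of the instance
    with the larger normalized defect count.
    """
    norm_defects = min_max([float(c[1]) for c in candidates])
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: (manhattan(target, candidates[i][0]), -norm_defects[i]),
    )
    return ranked[:k]


def weighted_vote(predictions: Sequence[int], f_measures: Sequence[float]) -> bool:
    """Combine 0/1 predictions from several classifiers, weighting each by its F-measure.

    Assumption: the target is predicted defective when the classifiers voting
    'defective' hold more than half of the total weight.
    """
    total = sum(f_measures)
    if total == 0:
        return False
    defective_weight = sum(w for p, w in zip(predictions, f_measures) if p)
    return defective_weight > total / 2


if __name__ == "__main__":
    target = [1.0, 2.0]
    candidates = [([1.0, 3.0], 0), ([2.0, 2.0], 5), ([5.0, 5.0], 9), ([1.0, 2.5], 1)]
    # Candidates 0 and 1 are equally distant; candidate 1 has more defects.
    print(select_training(target, candidates, 2))
    print(weighted_vote([1, 0, 1], [0.6, 0.9, 0.5]))
```

In this toy data, two candidates lie at the same Manhattan distance from the target, so the one with more recorded defects is ranked first, mirroring the preference described in the abstract. The voting function stands in for any F-measure-weighted combination of the six classifiers' outputs.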
Keywords/Search Tags: software quality assurance, defect prediction, cross-project defect prediction, similarity