Research On Data Preprocessing And Integrated Forecasting Methods In Cross-project Software Defect Forecasting

Posted on: 2021-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2428330605482448
Subject: Computer technology
Abstract/Summary:
With the development of computer technology, users demand higher software quality. Because software defects degrade that quality, software defect prediction deserves more attention. Some new projects have only limited historical defect information, which makes it very difficult to predict defects in their new modules from their own history. Research on cross-project software defect prediction is therefore necessary. Cross-project software defect prediction builds a model from one or several projects and then applies it to other projects. In general, the data of different projects differ greatly, and different classifier models perform differently. Data preprocessing and model construction therefore require in-depth research. This thesis studies data preprocessing, classifier improvement, and the integrated model, and verifies the proposed methods experimentally on the NASA dataset and five open-source software datasets. The main contributions are threefold:

(1) To address feature redundancy and the data differences between projects, we propose a two-stage feature transformation method consisting of feature selection and baseline conversion, which both reduces redundant features and narrows the difference in the distribution of feature values across projects. In the first step, a good feature subset is selected through feature clustering and feature selection. In the second step, an improved baseline conversion method transforms the selected features to the same order of magnitude, reducing the distribution difference between the features of different projects and improving prediction performance.

(2) Considering the impact of the similarity between the training set and the test set on prediction performance, we propose a method to calculate this similarity and incorporate it into the Naive Bayes classifier. Training samples that are more similar to the test set are assigned higher weights. On this basis, a Weighted Naive Bayes classifier is built to improve prediction performance.

(3) Since different classifiers can identify different subsets of defects, we use the Stacking integration method to integrate multiple heterogeneous classifiers, which ensures diversity among the base classifiers. Moreover, we combine the proposed feature transformation and Weighted Naive Bayes into the Stacking integrated model, building an integrated cross-project software defect prediction model that identifies more defects.
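To illustrate the idea behind contribution (2), the following is a minimal sketch of a similarity-weighted Gaussian Naive Bayes classifier in plain Python. The abstract does not give the similarity formula, so the range-based measure here (the share of a training sample's features that fall inside the value range seen in the test set) is an assumption chosen for illustration. The function names `similarity_weights`, `fit_weighted_nb`, and `predict` and the toy data are likewise hypothetical, not taken from the thesis.

```python
import math

def similarity_weights(train_X, test_X):
    """Weight each training row by the share of its features that lie inside
    the [min, max] range of that feature in the test set.
    ASSUMPTION: this measure is illustrative, not the thesis's formula."""
    n_features = len(test_X[0])
    lows = [min(row[j] for row in test_X) for j in range(n_features)]
    highs = [max(row[j] for row in test_X) for j in range(n_features)]
    weights = []
    for row in train_X:
        hits = sum(1 for j in range(n_features) if lows[j] <= row[j] <= highs[j])
        weights.append((hits + 1) / (n_features + 1))  # +1 keeps every weight positive
    return weights

def fit_weighted_nb(train_X, train_y, weights, var_floor=1e-9):
    """Fit Gaussian Naive Bayes where every statistic is weighted per sample."""
    total = sum(weights)
    n_features = len(train_X[0])
    model = {}
    for label in set(train_y):
        idx = [i for i, y in enumerate(train_y) if y == label]
        w_sum = sum(weights[i] for i in idx)
        means, variances = [], []
        for j in range(n_features):
            mean = sum(weights[i] * train_X[i][j] for i in idx) / w_sum
            var = sum(weights[i] * (train_X[i][j] - mean) ** 2 for i in idx) / w_sum
            means.append(mean)
            variances.append(var + var_floor)  # floor avoids division by zero
        model[label] = (w_sum / total, means, variances)  # weighted class prior
    return model

def predict(model, row):
    """Return the label with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy source-project data (1 = defective); the last row is far from the target data.
train_X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1], [50.0, 60.0]]
train_y = [0, 0, 1, 1, 1]
# Unlabeled target-project data, used only to measure similarity.
test_X = [[0.9, 1.7], [5.3, 6.2], [1.1, 2.1]]

weights = similarity_weights(train_X, test_X)
model = fit_weighted_nb(train_X, train_y, weights)
predictions = [predict(model, row) for row in [[1.1, 2.1], [5.1, 5.9]]]
```

In this sketch the outlying training row receives a lower weight than the rows that resemble the target data, so it contributes less to the class prior, mean, and variance. A classifier of this kind could then serve as one base learner in a Stacking model alongside other heterogeneous classifiers, as contribution (3) describes.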
Keywords/Search Tags:software defects, feature transformation, Naive Bayes, Stacking integration, cross-project defect prediction