Research on a Cross-Project Software Defect Prediction Method Based on Feature Selection and Instance Transfer

Posted on: 2024-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: K X Shi
Full Text: PDF
GTID: 2568306932480344
Subject: Software engineering
Abstract/Summary:
With the popularity and wide application of software products in everyday life, software continues to grow in scale, and software defects have become an important factor affecting software quality. Software defect prediction can identify likely defects in advance based on the information in a project's historical repositories. Newly developed software projects, however, lack the historical data that defect prediction requires. Cross-project software defect prediction uses data from several source projects to predict defects in the current (target) project. This technique addresses the shortage of historical data in traditional software defect prediction, but it also introduces new problems: (1) the project data contain a large number of irrelevant or redundant features; (2) the data distributions of the source and target projects differ substantially; (3) weak correlation between projects leads to negative transfer. This thesis addresses these problems and designs corresponding solutions based on feature selection and instance transfer to improve the performance of cross-project software defect prediction. The specific research work is as follows.

(1) To address feature redundancy and data distribution discrepancy, a two-stage cross-project software defect prediction method, CPDP-FSITr, was proposed. In the feature selection stage, KPCA (Kernel Principal Component Analysis) was used to remove redundant data from the source projects. Then, based on the attribute feature distributions of the source projects and the target project, the candidate source project data whose distribution is closest to that of the target project were selected. In the instance transfer stage, an improved TrAdaBoost method based on an evaluation factor was used to find a small number of instances whose distribution is close to that of the labeled instances in the target project. The experimental results showed that the F1 measure of CPDP-FSITr was 3.34%, 5.84% and 105.42% higher, respectively, than those of the benchmark experiments on the AEEEM dataset, and 1.36%, 5.25% and 85.97% higher, respectively, than those of the benchmark experiments on the NASA dataset, indicating better prediction results.

(2) To address negative transfer, a cross-project software defect prediction method based on multi-source transfer learning, CPDP-MSITr, was proposed. Building on CPDP-FSITr, the features of each source project are first filtered. Then, the data of each source project are trained together with the target project data to obtain several weak classifiers, and the weak classifiers are integrated into candidate classifiers through weight factors. The experiments showed that this method can obtain effective feature information from the source projects. With F1 as the performance evaluation index, the method improves on the prediction performance of the single-source cross-project defect prediction method by 1.32% on the AEEEM dataset and 0.61% on the NASA dataset, and effectively reduces negative transfer.

(3) Following software engineering design principles, a prototype system for cross-project software defect prediction based on feature selection and instance transfer was developed. The test results showed that each functional module of the prototype system operates normally. The system can help software developers and testers predict defects, find software defects as early as possible, and reduce potential risks and losses, which gives it good practical value.
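To make the instance transfer stage more concrete, the following is a minimal sketch of one weight-update round of the standard TrAdaBoost algorithm (Dai et al., 2007). It is not the improved, evaluation-factor-based variant described in the abstract, whose details are not given here. The function name `tradaboost_round`, its parameters, and the clipping of the error rate are assumptions made for illustration only.

```python
import math


def tradaboost_round(source_w, target_w, source_err, target_err, n_rounds):
    """One weight update of standard TrAdaBoost (hypothetical helper).

    source_err and target_err hold |h(x) - y| (0 = correct, 1 = wrong)
    for each source instance and each labeled target instance.
    """
    n_source = len(source_w)

    # Weighted error of the weak learner, measured on target instances only.
    eps = sum(w * e for w, e in zip(target_w, target_err)) / sum(target_w)
    # Assumption: clip eps so that beta_t stays strictly inside (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)

    # Fixed shrink factor for source instances, as in the original algorithm.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    # Misclassified source instances lose weight: they look unlike the target.
    new_source = [w * beta_src ** e for w, e in zip(source_w, source_err)]
    # Misclassified target instances gain weight, as in ordinary AdaBoost.
    new_target = [w * beta_t ** (-e) for w, e in zip(target_w, target_err)]
    return new_source, new_target, beta_t


# Toy example: four source and four labeled target instances, equal weights.
src_w, tgt_w, beta = tradaboost_round(
    source_w=[1.0, 1.0, 1.0, 1.0],
    target_w=[1.0, 1.0, 1.0, 1.0],
    source_err=[0, 1, 0, 1],
    target_err=[0, 0, 0, 1],
    n_rounds=10,
)
print(src_w, tgt_w, beta)
```

Over repeated rounds, source instances that the weak learner keeps misclassifying fade out, so the final model is dominated by source instances that behave like the labeled target data. This is the general mechanism by which instance transfer narrows the distribution gap between projects.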
Keywords/Search Tags:Software Defect Prediction, Feature Selection, Instance Transfer, Kernel Principal Component Analysis, Multi-source Transfer Learning