
Impact Analysis Of The Concept Drift On Evolution-Oriented Software Defect Prediction

Posted on: 2024-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2568307118477304
Subject: Computer technology
Abstract/Summary:
Software development is a continual process of evolution. Owing to factors such as module changes, personnel changes, and source code modifications, the distribution of defects within a project changes over time. This change can degrade the performance of cross-version defect prediction models, a phenomenon known as concept drift, and it substantially reduces the accuracy of defect prediction. Studying concept drift in defect prediction for evolving projects in depth, and revealing the mechanism by which it affects prediction, is therefore of practical significance for improving defect prediction performance.

Cross-version defect prediction for evolving projects usually builds defect prediction models from the datasets of historical versions in order to predict the defect distribution of the version under test. Although researchers in China and abroad have studied concept drift in cross-version defect prediction, shortcomings remain: (1) The characteristics of concept drift in cross-version defect prediction for evolving projects have not been studied in depth, and the mechanism by which concept drift affects cross-version defect prediction needs to be further revealed. (2) Existing cross-version defect prediction approaches for evolving projects handle concept drift in a coarse-grained manner and lack targeted countermeasures.

To address these problems, the main work of this thesis is as follows:

(1) Through a large number of experiments, the characteristics of concept drift in cross-version defect prediction for evolving projects are studied in depth. We build a cross-version defect prediction model based on random forest and use 10-fold cross-validation to obtain the evaluation metrics on the training set. After the model is trained, the same evaluation metrics are obtained on the test set, and the Wilcoxon signed-rank test and Cliff's delta are used to analyze the statistical significance and effect size of the change in model performance. Then, based on the data of all versions of each project, the corresponding performance change curves are constructed, and curve clustering approaches are used to classify concept drift. Finally, according to the classification results, correlation analysis is used to study which project characteristics affect the type of concept drift.

(2) We propose a cross-version defect prediction approach based on the Kolmogorov-Smirnov (KS) test and prediction model selection. Before a defect prediction model is built, this approach uses the KS test to measure the difference in data distribution between the training set and the test set. Based on the detection results, the type of concept drift is then predicted by a concept drift classification model based on the K-Nearest Neighbor algorithm. According to the optimal model selection approach for that concept drift type, a defect prediction model better suited to the project's current concept drift type is constructed. The experimental results show that this approach can effectively mitigate the impact of concept drift on prediction performance and improve the accuracy of cross-version defect prediction.

(3) We design and implement a cross-version defect prediction tool for evolving projects that incorporates concept drift detection. The tool automates the detection of concept drift, the prediction of the concept drift type, and the selection of a cross-version defect prediction model based on the concept drift type.

This thesis has 33 figures, 9 tables, and 93 references.
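The effect-size step in item (1) can be illustrated with a minimal sketch of Cliff's delta, which compares two samples of a metric (for example, cross-validation scores on the training version against scores on the test version). The function names, the sample values, and the magnitude thresholds (0.147, 0.33, 0.474, a commonly used convention) are assumptions for illustration and are not taken from the thesis.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: share of pairs with x > y minus share of pairs with x < y."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta: float) -> str:
    """Label |delta| using conventional thresholds (an assumption, not from the thesis)."""
    size = abs(delta)
    if size < 0.147:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.474:
        return "medium"
    return "large"


# Hypothetical F1 scores: cross-validation on the training version vs. the next version.
cv_scores = [0.71, 0.69, 0.73, 0.70, 0.72]
test_scores = [0.58, 0.61, 0.60, 0.57, 0.62]

delta = cliffs_delta(cv_scores, test_scores)
print(delta, magnitude(delta))
```

A delta near +1 or -1 means the two samples barely overlap, which in this setting would suggest a marked performance change between versions; a delta near 0 means the scores are largely interchangeable.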
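The KS-based detection step in item (2) can be sketched as a per-feature comparison of the training and test versions. This is only an illustration of the two-sample KS statistic under stated assumptions: the feature names (`loc`, `cc`), the data, and the large-sample critical-value approximation used as the decision threshold are hypothetical, and the sketch does not reproduce the thesis's K-Nearest Neighbor drift classifier or its model selection step.

```python
import math
from bisect import bisect_right
from typing import Dict, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    if not a_sorted or not b_sorted:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def drifted_features(
    train: Dict[str, Sequence[float]],
    test: Dict[str, Sequence[float]],
    alpha: float = 0.05,
) -> Dict[str, float]:
    """Return features whose KS statistic exceeds the approximate critical value."""
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)).
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    flagged = {}
    for name in train:
        n, m = len(train[name]), len(test[name])
        d = ks_statistic(train[name], test[name])
        if d > c * math.sqrt((n + m) / (n * m)):
            flagged[name] = d
    return flagged


# Hypothetical metrics for two versions: "loc" shifts, "cc" stays the same.
train_version = {"loc": list(range(50)), "cc": list(range(50))}
test_version = {"loc": [x + 40 for x in range(50)], "cc": list(range(50))}

print(drifted_features(train_version, test_version))
```

In a pipeline like the one described, the per-feature statistics would presumably serve as input to the drift-type classifier rather than as a final decision, so the fixed threshold here is only one possible design choice.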
Keywords/Search Tags:evolving projects, cross-version defect prediction, concept drift, curve clustering, KS test