Impact Analysis Of The Concept Drift On Evolution-Oriented Software Defect Prediction

Posted on:2024-06-27

Degree:Master

Type:Thesis

Country:China

Candidate:S Zhang


GTID:2568307118477304

Subject:Computer technology

Abstract/Summary:

Software development is a continual process of evolution. Owing to factors such as module changes, personnel turnover, and source code modifications, the distribution of defects within a project changes over time. This change can degrade the performance of cross-version defect prediction models, a phenomenon known as concept drift, and it substantially reduces prediction accuracy. Studying concept drift in defect prediction for evolving projects and revealing its impact mechanism is therefore of practical significance for improving defect prediction performance.

Cross-version defect prediction for evolving projects usually builds a prediction model on datasets from historical versions and uses it to predict the defect distribution of the version under test. Although researchers in China and abroad have studied concept drift in cross-version defect prediction in some depth, shortcomings remain: (1) the characteristics of concept drift in cross-version defect prediction for evolving projects have not been studied in depth, and the mechanism by which concept drift affects cross-version defect prediction needs to be further revealed; (2) existing cross-version defect prediction approaches for evolving projects handle concept drift in a coarse-grained manner and lack targeted countermeasures.

To address these problems, the main work of this thesis is as follows:

(1) Through extensive experiments, the characteristics of concept drift in cross-version defect prediction for evolving projects are studied in depth. We build a cross-version defect prediction model based on random forest and use 10-fold cross-validation to obtain evaluation metrics on the training set. After training, the same evaluation metrics are obtained on the test set, and the Wilcoxon signed-rank test and Cliff's delta are used to analyze the significance and effect size of the variation in model performance. Then, based on the data from all versions of each project, performance change curves are constructed and curve clustering approaches are used to classify concept drift. Finally, based on the classification results, correlation analysis is used to identify the project characteristics that affect the type of concept drift.

(2) We propose a cross-version defect prediction approach based on the Kolmogorov-Smirnov (KS) test and prediction model selection. Before building a defect prediction model, this approach uses the KS test to measure the difference in data distribution between the training set and the test set. Based on the detection results, the type of concept drift is then predicted using a concept drift classification model based on the K-Nearest Neighbor algorithm. Using the optimal model selection approach for that drift type, a defect prediction model better suited to the project's current concept drift type is constructed. The experimental results show that this approach effectively mitigates the impact of concept drift on prediction performance and improves the accuracy of cross-version defect prediction.

(3) We design and implement a cross-version defect prediction tool for evolving projects that incorporates concept drift detection. The tool automates concept drift detection, concept drift type prediction, and the selection of a cross-version defect prediction model based on the concept drift type.

This thesis has 33 figures, 9 tables, and 93 references.
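The effect-size analysis in item (1) can be illustrated with a short sketch of Cliff's delta, which compares every training-side score with every test-side score. This is not the thesis's code: the function names, the sample scores, and the magnitude thresholds (0.147, 0.33, 0.474, a commonly used convention) are assumptions made for illustration.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs.

    Ranges from -1 (every x below every y) to +1 (every x above every y).
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Label |delta| using commonly cited thresholds (an assumption here)."""
    size = abs(delta)
    if size < 0.147:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical F1 scores: cross-validation on the training version
    # versus evaluation on the next version of the same project.
    train_f1 = [0.70, 0.72, 0.68, 0.71]
    test_f1 = [0.60, 0.65, 0.69, 0.58]
    d = cliffs_delta(train_f1, test_f1)
    print(d, magnitude(d))
```

A large positive delta here would indicate that training-side scores are consistently higher than test-side scores, which is the kind of performance drop the abstract associates with concept drift.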
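The KS-based detection step in item (2) can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the test is applied per feature, the decision uses the large-sample critical value for the two-sample KS test, and the feature names (`loc`, `cc`), the `alpha` default, and all function names are hypothetical. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead of the hand-written statistic.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The empirical CDFs only change at sample points, so checking those
    # points is enough to find the maximum gap.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / n
        cdf_b = bisect_right(b_sorted, v) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def drifted_features(train, test, alpha=0.05):
    """Map each feature name to (KS statistic, drift flag).

    `train` and `test` map feature names to lists of metric values.
    """
    result = {}
    for name in train:
        d = ks_statistic(train[name], test[name])
        crit = ks_critical_value(len(train[name]), len(test[name]), alpha)
        result[name] = (d, d > crit)
    return result


if __name__ == "__main__":
    # Hypothetical metric values for two consecutive versions.
    train = {"loc": list(range(20)), "cc": list(range(20))}
    test = {"loc": list(range(20, 40)), "cc": list(range(20))}
    for feature, (d, drifted) in drifted_features(train, test).items():
        print(feature, round(d, 3), drifted)
```

The per-feature statistics could then serve as input to a drift-type classifier such as the K-Nearest Neighbor model the abstract describes, although how the thesis encodes the detection results is not stated here.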

Keywords/Search Tags:

evolving projects, cross-version defect prediction, concept drift, curve clustering, KS test

Related items

1	Research On Key Technology Of Cross-Version Software Defect Prediction Based On Machine Learning
2	Research On Software Defect Prediction For Evolving Projects
3	Research On Log Anomaly Detection Method Under Concept Drift
4	Research On Software Defect Prediction For Cross-version Software
5	Analysis On Evolving Clustering For Categorical Data Stream
6	Research And Application Of Online Ensemble Method On Evolving Data Stream With Concept Drift
7	Auxiliary Fabric Defect Detection Algorithm Based On Fuzzy Clustering And Pattern Version
8	Research On Detection And Adaptive For Mixed Types Concept Drift
9	Time Prediction Based On Process Mining Taking Concept Drift Into Consideration
10	Research On Competence Model-Based Adaptive Learning Techniques For Handling Concept Drift