Data Preprocessing Method Based On Sample Weight In Just-in-Time Software Defect Prediction

Posted on: 2024-03-02

Degree: Master

Type: Thesis

Country: China

Candidate: W Y Zhuang

Full Text: PDF

GTID: 2568306941464134

Subject: Computer Science and Technology

Abstract/Summary:


Just-in-time software defect prediction focuses on the actual development process of a project. It aims to predict, as each code change is submitted, whether that change contains defects, helping developers allocate resources and improve code quality. Because commits differ in chronological order and in quality, they deserve different levels of attention; that is, different commit samples should be assigned different weights. However, sample weighting has so far received little research attention in just-in-time software defect prediction. Focusing on the continuous development of a project, this thesis proposes sample weights that are updated as the prediction model iterates and uses them to preprocess the data. The weights let the prediction model concentrate on the more important samples and also help alleviate class imbalance in the data.

(1) This thesis proposes a sample importance weight based on the time dimension, to address the lack of differentiation by commit order in existing defect prediction methods. The method extracts temporal information from the project's version control system and applies a time decay function that assigns higher weights to recently submitted samples and lower weights to older ones. These weights determine how much importance the defect prediction model attaches to each commit, improving prediction performance in a targeted way. Experiments were conducted on 10 open-source projects, each with 5 data intervals and 9 iterations, giving a total of 450 tasks (10 × 5 × 9) in a model-iteration scenario, to verify the effectiveness of the sample weights.

(2) This thesis proposes a sample importance weight based on feature contribution, to address the difficulty of distinguishing commits submitted at similar times. The method focuses on the samples misclassified in the previous round of model iteration: it computes their feature contributions and classification propensity with a machine learning interpretation model, assigns higher weights to commits whose classification propensity deviates significantly from their actual class, and finally applies denoising. Experiments on 450 tasks verify the effectiveness of the feature-contribution-based sample weights.

(3) This thesis proposes a data sampling method based on sample importance weights, to address the problem that the samples selected for synthesis by existing data sampling methods in defect prediction are not representative. On the one hand, the method ranks commit samples by their importance weights and selects the important ones as seeds for synthesizing new samples; on the other hand, it takes the importance weights into account when computing sample distances, which makes the synthesized samples more diverse. Experiments on 450 tasks verify the effectiveness of this sampling method, which integrates the time dimension and feature contribution.

In summary, this thesis proposes solutions to three open problems in existing research and further improves the performance of just-in-time software defect prediction. This is significant for timely defect detection, improving code quality, scheduling testing resources, and reducing maintenance costs.
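The time-dimension weight in contribution (1) can be illustrated with a small sketch. The summary does not state which decay function the thesis uses, so this example assumes an exponential decay, w = exp(-λ·age) with λ = ln 2 / h, where the half-life h (`half_life_days`) and the helper name `time_decay_weights` are hypothetical choices made for illustration only.

```python
import math

SECONDS_PER_DAY = 86400.0


def time_decay_weights(commit_times, reference_time, half_life_days=90.0):
    """Return one importance weight per commit (hypothetical helper).

    commit_times and reference_time are Unix timestamps in seconds, e.g.
    commit dates read from the version control history. A commit that is
    half_life_days older than reference_time gets half the raw weight of a
    commit made at reference_time (exponential decay is an assumption).
    """
    decay_rate = math.log(2) / half_life_days  # lambda in w = exp(-lambda * age)
    raw = []
    for t in commit_times:
        age_days = max(0.0, (reference_time - t) / SECONDS_PER_DAY)
        raw.append(math.exp(-decay_rate * age_days))
    # Rescale so the weights sum to the number of commits (mean weight 1).
    mean_w = sum(raw) / len(raw)
    return [w / mean_w for w in raw]


# Three commits made 180, 90 and 0 days before the reference time.
now = 1_000 * SECONDS_PER_DAY
times = [now - 180 * SECONDS_PER_DAY, now - 90 * SECONDS_PER_DAY, now]
weights = time_decay_weights(times, now)
# Raw weights are 0.25, 0.5 and 1.0 before rescaling; newer commits weigh more.
print([round(w, 3) for w in weights])  # prints [0.429, 0.857, 1.714]
```

Rescaling to a mean of 1 is a design choice, not something the summary specifies: it keeps the total weight equal to the number of commits, so the weights can be passed to any learner that accepts per-sample weights (many scikit-learn classifiers take a `sample_weight` argument in `fit`) without changing the overall loss scale.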
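The feature-contribution weight in contribution (2) can also be sketched, under several stated assumptions. The interpretation model is assumed to be additive (as SHAP is), so that a base value plus the per-feature contributions gives the model's log-odds of "defective"; the sigmoid of that sum is used as the classification propensity. The boost formula `1 + boost * deviation` and the denoising rule (a misclassified sample whose deviation exceeds `noise_threshold` is treated as possible label noise and not boosted) are hypothetical stand-ins, since the summary does not describe either in detail. The helper `contribution_weights` and its parameters are illustrative names only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def contribution_weights(contributions, base_value, y_true, y_pred,
                         boost=2.0, noise_threshold=0.95):
    """Weights for the next training round (hypothetical helper).

    contributions[i]: per-feature contributions (log-odds scale) for sample i,
    assumed to come from an additive explainer such as SHAP.
    y_true / y_pred: actual labels and last round's predictions (1 = defective).
    """
    weights = []
    for contrib, y, y_hat in zip(contributions, y_true, y_pred):
        if y == y_hat:
            weights.append(1.0)  # correctly classified last round: unchanged
            continue
        propensity = sigmoid(base_value + sum(contrib))  # implied P(defective)
        deviation = abs(y - propensity)  # gap to the actual label, in [0, 1]
        if deviation > noise_threshold:
            weights.append(1.0)  # assumed denoising: likely mislabeled, no boost
        else:
            weights.append(1.0 + boost * deviation)
    return weights


contribs = [[0.5, 0.3], [-1.0, 0.2], [4.0, 1.0], [0.4, 0.2]]
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 1]
w = contribution_weights(contribs, 0.0, y_true, y_pred)
print([round(v, 3) for v in w])  # prints [1.0, 2.38, 1.0, 2.291]
```

In this toy run, the second and fourth samples are misclassified with moderate deviations and receive larger weights, while the third sample's propensity (about 0.993 for a commit labeled clean) exceeds the threshold and is left at weight 1 as a suspected noisy label.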
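The weighted sampling idea in contribution (3) can be sketched as a SMOTE-style oversampler with two weight-aware changes: seeds are drawn only from the most important minority samples, and neighbour distances take the weights into account. The summary does not say how weights enter the distance, so dividing the Euclidean distance by the neighbour's weight is an assumed rule; the combined weight (for example, a product of the time and contribution weights) is likewise an assumption. The helper `weighted_smote` and its parameters `k`, `top_fraction` and `seed` are hypothetical; the final interpolation step is the ordinary SMOTE one.

```python
import math
import random


def weighted_smote(minority, weights, n_new, k=3, top_fraction=0.5, seed=0):
    """Synthesize n_new minority-class samples (hypothetical helper).

    minority: feature vectors of the minority (defective) class.
    weights:  positive importance weight per minority sample, for example the
              product of a time weight and a contribution weight (assumed).
    """
    if len(minority) < 2 or any(w <= 0 for w in weights):
        raise ValueError("need at least two samples and positive weights")
    rng = random.Random(seed)
    # Step 1: rank by importance and keep only the top fraction as seeds.
    order = sorted(range(len(minority)), key=lambda i: weights[i], reverse=True)
    seeds = order[:max(1, int(len(order) * top_fraction))]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        x = minority[i]
        # Step 2 (assumed rule): Euclidean distance divided by the neighbour's
        # weight, so heavily weighted neighbours count as closer.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]) / weights[j],
        )[:k]
        j = rng.choice(neighbours)
        # Step 3: standard SMOTE interpolation between seed and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, minority[j])])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
minority_weights = [0.5, 2.0, 1.0, 1.5]
new_samples = weighted_smote(minority, minority_weights, n_new=5)
print(len(new_samples))  # prints 5
```

Each synthetic point is a convex combination of a high-weight seed and one of its weight-adjusted nearest neighbours, so it always lies on the segment between two real defective commits. For an unweighted baseline, the imbalanced-learn library provides a standard `SMOTE` implementation.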

Keywords/Search Tags:

software defect prediction, data preprocessing, sample weights, class imbalance


Related items

1	Research On Software Defect Prediction Technology For Few-sample Data
2	Wide Research Of Data Mining With Machine Learning On Software Defect Prediction
3	Research On Data Preprocessing Technologies For Software Defect Prediction
4	Research On Data Preprocessing Technology In Cross Project Software Defect Prediction
5	Research On Software Defect Prediction Method Based On Feature Selection And Oversampling
6	Software Defect Prediction Strategy Design For Imbalanced Data
7	Research And Implementation Of Software Defect Prediction Model Construction And Sharing Methods
8	Research On Sampling Integration Algorithm Of Unbalanced Data In Software Defect Prediction
9	Research On Unbalanced Data Classification Algorithm In Software Defect Prediction
10	Research On High-dimensional Data Processing In Software Defect Prediction