Research On Semi-supervised Software Defect Prediction And Localization Methods

Posted on:2024-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:W Y Cheng

Full Text:PDF

GTID:2568307064972259

Subject:Computer technology

Abstract/Summary:


Software defects are the antithesis of software quality and a threat to software security. Residual defects cause increasingly thorny problems as the software iterates, and once they surface they can have unpredictable and disastrous consequences. Software defects must therefore be detected and repaired. It is well established that the earlier a defect is found, the lower the repair cost and the greater the losses that can be avoided. However, as the software industry continues to develop, programs grow larger and structurally more complex, so defects are hidden more deeply and are harder to find. Detecting software defects as early as possible and repairing them at lower cost has therefore become an urgent scientific problem. To address it, researchers have proposed machine-learning-based software defect prediction methods that aim to find defects as early as possible. However, these methods struggle with the high dimensionality of defect data, insufficient labeled samples, class imbalance, and overly coarse prediction granularity, which seriously limits improvements in prediction efficiency and accuracy. To solve these problems, this thesis proposes a semi-supervised software defect prediction and localization method. The main contributions and innovations are as follows:

(1) The high feature dimensionality of defect data degrades the classification accuracy of prediction models. To address this, this thesis proposes a filter feature selection method based on correlation and redundancy. The method has two stages: the first stage computes the correlation of each feature, and the second stage computes redundancy and, combined with the correlation ranking from the first stage, selects the optimal feature subset. The innovation of this method is that three feature-ranking sequences are combined and each feature is assigned a weight, which effectively improves the generalization ability of the prediction model and avoids the instability of any single feature selection method. At the same time, because it accounts for the correlation between features, it can effectively eliminate redundant features and reduce the feature dimension. Experimental results show that, compared with other filter feature selection methods, this method better improves the classification accuracy of software defect prediction models.

(2) Insufficient labeled defect samples and class imbalance make it difficult to predict defects in the early stages of software development. To address this, this thesis proposes a semi-supervised software defect prediction model based on tri-training. First, feature normalization is used to smooth the feature data, eliminating the impact of excessively large or small feature values on the model's classification performance. Second, oversampling is used to expand the labeled samples and resolve their class imbalance. Finally, the tri-training algorithm learns from the training samples to build the defect prediction model. Experiments on NASA datasets show that, compared with four existing supervised and semi-supervised learning methods, the proposed method achieves better accuracy, recall, and F1-score.

(3) Existing methods are too coarse-grained to locate software defects precisely. To address this, this thesis proposes a software defect localization method based on defect prediction and code naturalness. First, the source code is tokenized and a code corpus is constructed. Then, an N-gram model is used to compute the cross-entropy of every code line in a defective module, and the lines are sorted in descending order of cross-entropy; lines ranked at the top are more likely to be defective. The method follows a predict-then-locate strategy: it first predicts which modules in the project are likely to be defective and then localizes defects at the code-line level within those modules, thereby addressing the overly coarse granularity of defect prediction. Experimental results show that the proposed method achieves better localization performance than existing methods.
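The two-stage filter in contribution (1) can be sketched as follows. The abstract does not name the three ranking measures, their weights, or the redundancy criterion, so this is a minimal sketch under stated assumptions: the three per-feature relevance scores are supplied by the caller (hypothetical inputs), the rankings are fused by a weighted sum of rank positions, and redundancy is measured as absolute Pearson correlation against already-selected features with a hypothetical threshold of 0.8. The function names `aggregate_rankings` and `select_features` are illustrative only.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def aggregate_rankings(score_lists: List[List[float]], weights: List[float]) -> List[int]:
    """Stage 1 (assumed scheme): fuse several relevance scores into one ranking.

    Each score list (higher = more relevant) is turned into rank positions
    (0 = best); a feature's fused score is the weighted sum of its positions,
    so features with the lowest fused score come first.
    """
    n_features = len(score_lists[0])
    fused = [0.0] * n_features
    for scores, w in zip(score_lists, weights):
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for position, j in enumerate(order):
            fused[j] += w * position
    return sorted(range(n_features), key=lambda j: fused[j])


def select_features(columns: List[List[float]], ranking: List[int],
                    redundancy_threshold: float = 0.8,
                    k: Optional[int] = None) -> List[int]:
    """Stage 2 (assumed criterion): walk the ranking and skip any feature that
    is highly correlated (|r| >= threshold) with a feature already selected."""
    selected: List[int] = []
    for j in ranking:
        if all(abs(pearson(columns[j], columns[s])) < redundancy_threshold for s in selected):
            selected.append(j)
            if k is not None and len(selected) == k:
                break
    return selected


# Toy example: three feature columns and three hypothetical relevance measures.
columns = [
    [1, 2, 3, 4, 5],     # f0
    [2, 4, 6, 8, 10],    # f1 = 2 * f0, fully redundant with f0
    [2, 5, 1, 4, 3],     # f2, weakly correlated with f0 (r = 0.1)
]
scores = [[0.9, 0.8, 0.3], [0.7, 0.9, 0.4], [0.8, 0.7, 0.5]]
ranking = aggregate_rankings(scores, weights=[0.5, 0.3, 0.2])
print(ranking, select_features(columns, ranking))  # [0, 1, 2] [0, 2]
```

In the toy data, f1 has the second-best fused rank but is an exact multiple of f0, so the redundancy stage drops it and keeps the weakly correlated f2 instead.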
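The pipeline in contribution (2) (normalize, oversample, then tri-train) can be sketched with the Python standard library alone. Several choices here are assumptions, not the thesis's method: min-max scaling stands in for the unspecified normalization, random duplication of minority-class rows stands in for the unspecified oversampling method (SMOTE is a common alternative), a hypothetical 3-nearest-neighbour classifier is the base learner, and the loop is a simplified tri-training: each classifier is retrained on the labeled set plus the unlabeled samples on which the other two classifiers agree, omitting the error-rate conditions of the original tri-training algorithm by Zhou and Li.

```python
import random
from collections import Counter
from typing import List, Tuple

Row = List[float]


def fit_min_max(rows: List[Row]) -> Tuple[Row, Row]:
    """Per-feature minimum and maximum (assumed min-max normalization)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(rows: List[Row], lo: Row, hi: Row) -> List[Row]:
    """Map each feature into [0, 1]; constant features become 0.0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)] for r in rows]


def random_oversample(X: List[Row], y: List[int], rng: random.Random):
    """Duplicate random rows of each smaller class until every class matches the largest."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


class KNN:
    """Tiny k-nearest-neighbour classifier (hypothetical base learner)."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, X: List[Row], y: List[int]) -> "KNN":
        self.X, self.y = X, y
        return self

    def predict_one(self, x: Row) -> int:
        dists = sorted((sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
                       for xi, yi in zip(self.X, self.y))
        return Counter(label for _, label in dists[: self.k]).most_common(1)[0][0]


def tri_train(X_l, y_l, X_u, rounds: int = 3, seed: int = 0):
    """Simplified tri-training; returns three classifiers plus the scaling parameters."""
    rng = random.Random(seed)
    lo, hi = fit_min_max(X_l + X_u)              # normalize labeled and unlabeled data together
    X_l, X_u = scale(X_l, lo, hi), scale(X_u, lo, hi)
    X_l, y_l = random_oversample(X_l, y_l, rng)  # balance the labeled classes
    models = []
    for _ in range(3):                           # diverse initial learners from bootstrap samples
        idx = [rng.randrange(len(X_l)) for _ in range(len(X_l))]
        models.append(KNN().fit([X_l[i] for i in idx], [y_l[i] for i in idx]))
    for _ in range(rounds):
        refined = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            agreed = []
            for x in X_u:
                label = models[j].predict_one(x)
                if label == models[k].predict_one(x):  # the other two classifiers agree
                    agreed.append((x, label))
            refined.append(KNN().fit(X_l + [x for x, _ in agreed], y_l + [t for _, t in agreed]))
        models = refined
    return models, lo, hi


def predict(models, lo: Row, hi: Row, x: Row) -> int:
    """Majority vote of the three classifiers on a scaled sample."""
    xs = scale([x], lo, hi)[0]
    return Counter(m.predict_one(xs) for m in models).most_common(1)[0][0]


# Toy, deliberately imbalanced data: 1 = defective module, 0 = defect-free.
X_labeled = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.1], [1.2, 1.4], [10.0, 9.5], [9.6, 10.2]]
y_labeled = [0, 0, 0, 0, 1, 1]
X_unlabeled = [[1.1, 0.9], [1.3, 1.0], [9.8, 9.9], [10.3, 10.1], [0.8, 1.3], [9.9, 10.4]]
models, lo, hi = tri_train(X_labeled, y_labeled, X_unlabeled)
print(predict(models, lo, hi, [9.0, 9.0]), predict(models, lo, hi, [2.0, 1.5]))
```

The two well-separated clusters only demonstrate the control flow; on real defect data such as the NASA datasets, the base learner, oversampling method, and tri-training safeguards would matter far more.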
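The line-ranking step of contribution (3) can be illustrated with a small n-gram model over code tokens. This sketch assumes a trigram model (n = 3), add-one smoothing, and a simple regular-expression tokenizer, none of which the abstract specifies, and it covers only the localization step: the module passed to the hypothetical `rank_lines` function is taken as already flagged defective by the prediction stage. A line's cross-entropy is the average of -log2 P(token | previous n-1 tokens) over its tokens; higher values mark less "natural", and therefore more suspicious, lines.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical tokenizer: identifiers, integer literals, then any other single character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line: str) -> List[str]:
    return TOKEN.findall(line)


class NGramModel:
    """Add-one smoothed n-gram model over code tokens (n and smoothing are assumptions)."""

    def __init__(self, n: int = 3):
        self.n = n
        self.ngrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = set()

    def _padded(self, line: str) -> List[str]:
        return ["<s>"] * (self.n - 1) + tokenize(line)

    def train(self, corpus_lines: List[str]) -> "NGramModel":
        for line in corpus_lines:
            toks = self._padded(line)
            self.vocab.update(toks)
            for i in range(self.n - 1, len(toks)):
                ctx = tuple(toks[i - self.n + 1 : i])
                self.ngrams[ctx + (toks[i],)] += 1
                self.contexts[ctx] += 1
        return self

    def cross_entropy(self, line: str) -> float:
        """Average -log2 P(token | previous n-1 tokens) over the line's tokens."""
        toks = self._padded(line)
        n_scored = len(toks) - (self.n - 1)
        if n_scored == 0:
            return 0.0
        v = len(self.vocab) + 1  # +1 leaves probability mass for unseen tokens
        bits = 0.0
        for i in range(self.n - 1, len(toks)):
            ctx = tuple(toks[i - self.n + 1 : i])
            p = (self.ngrams[ctx + (toks[i],)] + 1) / (self.contexts[ctx] + v)
            bits -= math.log2(p)
        return bits / n_scored


def rank_lines(model: NGramModel, module_lines: List[str]) -> List[Tuple[int, float, str]]:
    """Return (1-based line number, cross-entropy, text), most suspicious first."""
    scored = [(i + 1, model.cross_entropy(t), t) for i, t in enumerate(module_lines) if t.strip()]
    return sorted(scored, key=lambda s: (-s[1], s[0]))


corpus = ["for (int i = 0; i < n; i++) {", "    sum += a[i];", "}"] * 5
module = ["for (int i = 0; i < n; i++) {", "    sum += a[i];", "    sum -= b[j++] * 0x1F;", "}"]
model = NGramModel(n=3).train(corpus)
for line_no, h, text in rank_lines(model, module):
    print(f"{line_no}: {h:.2f}  {text.strip()}")
```

Line 3 of the toy module is built almost entirely from trigrams never seen in the corpus, so it receives the highest cross-entropy and is ranked first; lines that repeat corpus patterns score lower.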

Keywords/Search Tags:

Software defect prediction, software defect localization, feature selection, classification imbalance, semi-supervised learning


Related items

1	Research On Software Defect Prediction Method Based On Semi-supervised Integration
2	Research On Software Defect Prediction For Cross-version Software
3	Research On Machine Learning Based Software Defect Prediction
4	Research On Software Defect Prediction Based On Ensemble Learning
5	Research On Software Defect Prediction Based On Learning Mechanism
6	Research On Software Defect Prediction Method Based On Feature Selection And Oversampling
7	Feature Extraction Based Software Defect Prediction
8	Research On High-dimensional Data Processing In Software Defect Prediction
9	System Study Of Software Defect Mining With Weak Label
10	Research On Software Defect Prediction Method Based On Fusion Feature Selection And Ensemble Learning