
Research And Implementation Of Software Defect Prediction Model Construction And Sharing Methods

Posted on: 2020-09-26   Degree: Master   Type: Thesis
Country: China   Candidate: D Zhang
GTID: 2428330614462355   Subject: Computer technology
Abstract/Summary:
Software defect prediction (SDP) identifies software modules that are likely to be defective. According to the prediction target, SDP can be divided into classification-based and regression-based problems: classification-based SDP predicts whether a module is defective, while regression-based SDP predicts the number of defects in a module. Most previous studies focus on classification-based SDP, and regression-based methods have not been studied thoroughly. In addition, sharing the SDP models constructed in previous studies helps other researchers replicate experiments, reduces implementation costs, avoids implementation errors, and makes it easier to extend and optimize existing methods. However, directly sharing trained models risks leaking the privacy of the training data. New methods are therefore needed to address the privacy leakage involved in sharing models.

For regression-based prediction, to the best of our knowledge, unsupervised methods are investigated for software defect number prediction (SDNP) for the first time. These methods sort modules (for example, in ascending or descending order) by a metric value, and higher-ranked modules are predicted to have more defects. The empirical study applies unsupervised methods to SDNP on datasets collected from open-source projects. The experiments first identify LOC?D as the best-performing method (based on Kendall and FPA values), then compare LOC?D with supervised methods that use the SMOTEND algorithm to address the class imbalance problem, and find that LOC?D outperforms the supervised methods. SMOTEND is then further improved: three of its parameters are extracted and optimized with the differential evolution algorithm. The experimental results show that differential evolution further improves the performance of the supervised methods; when these tuned methods are compared with LOC?D, however, LOC?D still performs better in most cases.

To the best of our knowledge, differential privacy (DP) is also used for SDP model sharing for the first time. DP is a data privacy protection technique that can be applied to traditional machine learning algorithms, and models built by algorithms under DP protect the privacy of their training data. Applying DP to SDP model sharing, this thesis proposes the DP-Share algorithm, which is based on the random forest algorithm. DP-Share first oversamples the minority class, then discretizes continuous features to optimize the allocation of the privacy budget, and then uses the DP-Share-Sampling method to generate a training set for each subtree. Finally, the subtrees are integrated into a random forest. DP-Share is also evaluated on datasets collected from open-source projects, with AUC as the performance measure. The experimental results show that DP-Share outperforms the baseline method DF-Enhance in most cases.

Based on the above research, two suggestions are given for future SDP studies: (1) unsupervised methods are simple and perform well in SDNP, so they should be used as baseline methods; (2) researchers are recommended to share SDP models using differential privacy to protect the privacy of the training data. In addition, prototype tools were designed to implement the methods studied above.
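As a minimal sketch of the unsupervised SDNP idea described in the abstract, the Python below ranks modules by lines of code (LOC) in descending order, with no model training, and scores that ranking with FPA (fault-percentile-average: the mean, over k, of the share of all actual defects found in the k top-ranked modules) and Kendall's rank correlation. The abstract does not spell out the exact ranking rule of its best method or which Kendall variant it reports; using tau-b (which tolerates the many tied zero-defect modules), and all function names and module data, are assumptions for illustration.

```python
import math
from itertools import combinations


def rank_by_metric(modules, metric="loc", descending=True):
    """Unsupervised SDNP: order modules by a single metric; the first module is
    predicted to have the most defects. No model is trained."""
    return sorted(modules, key=lambda m: m[metric], reverse=descending)


def fpa(predicted_scores, actual_defects):
    """Fault-percentile-average: the mean, over k = 1..K, of the share of all
    actual defects that lie in the k modules with the highest predicted scores."""
    total = sum(actual_defects)
    if total == 0:
        raise ValueError("FPA is undefined when there are no defects")
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    found, acc = 0, 0.0
    for i in order:
        found += actual_defects[i]  # defects contained in the top-k modules so far
        acc += found / total
    return acc / len(order)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for tied values (e.g. many 0-defect modules)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx != 0 and dy != 0:
            if (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical modules with their LOC and actual defect counts.
modules = [
    {"name": "A", "loc": 120, "defects": 3},
    {"name": "B", "loc": 40, "defects": 0},
    {"name": "C", "loc": 300, "defects": 5},
    {"name": "D", "loc": 80, "defects": 1},
]
ranked = rank_by_metric(modules)
loc = [m["loc"] for m in modules]
bugs = [m["defects"] for m in modules]
print([m["name"] for m in ranked])  # ['C', 'A', 'D', 'B']
print(round(fpa(loc, bugs), 4))     # 0.8611
print(kendall_tau_b(loc, bugs))     # 1.0
```

Here the LOC value itself serves as the predicted score, which is what makes the method unsupervised: any other metric (or ascending order) can be tried by changing `metric` or `descending`.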
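The abstract states that three parameters of SMOTEND are extracted and tuned with differential evolution (DE), but it does not name them or the DE configuration. The sketch below shows the classic DE/rand/1/bin scheme in plain Python, assuming each candidate is a vector of three numeric parameters scored by a validation loss to be minimized. The parameter names (`k`, `ratio`, `threshold`), their bounds, and the toy objective, which stands in for something like the negative FPA of a supervised model trained on resampled data, are all hypothetical.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimise `objective` over a box using the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, d):
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover; j_rand guarantees at least one mutated gene.
            j_rand = rng.randrange(dim)
            trial = [
                clip(pop[a][d] + f * (pop[b][d] - pop[c][d]), d)
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            # Selection: the trial replaces its parent if it is at least as good.
            trial_fit = objective(trial)
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_objective(params):
    # Hypothetical stand-in for a validation loss; optimum at k=5, ratio=0.6, threshold=2.0.
    k, ratio, threshold = params
    return (round(k) - 5) ** 2 + (ratio - 0.6) ** 2 + (threshold - 2.0) ** 2


bounds = [(1, 10), (0.0, 1.0), (0.0, 5.0)]  # hypothetical ranges for k, ratio, threshold
best, loss = differential_evolution(toy_objective, bounds)
print([round(v, 2) for v in best], round(loss, 4))
```

An integer parameter such as a neighbour count is handled here by rounding inside the objective, which keeps DE working on a continuous vector; in a real tuning run each objective call would train and validate a model, so `pop_size * generations` bounds the training cost.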
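The DP-Share steps (minority oversampling, DP-Share-Sampling) are not described here in enough detail to reproduce, so the following is not DP-Share. As a generic illustration of how a random forest can be trained under epsilon-differential privacy, the sketch swaps in a simpler, well-known technique: data-independent random trees over discretized features whose leaf class counts are perturbed with Laplace noise. It assumes that discretization bounds are public, that leaves partition the records (so one record changes one count by 1), and that the total budget is split evenly across trees by sequential composition. It omits oversampling, because duplicating or synthesizing records from real ones would raise the sensitivity of the counts. All names and data are hypothetical.

```python
import itertools
import random


def discretize(value, low, high, bins):
    """Equal-width binning over public bounds [low, high]; deriving the bounds
    from the private data itself would leak information."""
    if value <= low:
        return 0
    if value >= high:
        return bins - 1
    return int((value - low) / (high - low) * bins)


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


class RandomDPTree:
    """A data-independent random tree: its split features are drawn at random
    without looking at the data, so only the leaf counts consume privacy budget."""

    def __init__(self, n_features, depth, bins, epsilon, rng):
        self.features = rng.sample(range(n_features), depth)
        self.bins = bins
        self.epsilon = epsilon
        self.rng = rng
        self.noisy = {}

    def _leaf(self, row):
        # A leaf is identified by the bin indices of the chosen features.
        return tuple(row[f] for f in self.features)

    def fit(self, rows, labels):
        # Enumerate every leaf, including empty ones: noising only non-empty
        # leaves would reveal which leaves contain records.
        leaves = itertools.product(range(self.bins), repeat=len(self.features))
        counts = {leaf: [0, 0] for leaf in leaves}  # [clean, defective]
        for row, label in zip(rows, labels):
            counts[self._leaf(row)][label] += 1
        # One record changes one count by 1 (L1 sensitivity 1): Laplace scale 1/epsilon.
        scale = 1 / self.epsilon
        self.noisy = {leaf: [c + laplace(self.rng, scale) for c in pair]
                      for leaf, pair in counts.items()}
        return self

    def predict(self, row):
        clean, defective = self.noisy[self._leaf(row)]
        return 1 if defective > clean else 0


def fit_dp_forest(rows, labels, n_trees, depth, bins, total_epsilon, seed=0):
    """Every tree sees all records, so by sequential composition each tree
    gets an even share of the budget: total_epsilon / n_trees."""
    rng = random.Random(seed)
    eps_tree = total_epsilon / n_trees
    n_features = len(rows[0])
    return [RandomDPTree(n_features, depth, bins, eps_tree, rng).fit(rows, labels)
            for _ in range(n_trees)]


def defect_score(forest, row):
    # Fraction of trees voting "defective"; usable as a ranking score.
    return sum(tree.predict(row) for tree in forest) / len(forest)


# Hypothetical modules: LOC in [0, 1000], complexity in [0, 40]; defective iff LOC >= 500.
rows, labels = [], []
for loc in (125, 375, 625, 875):
    for cc in (5, 15, 25, 35):
        for _ in range(20):
            rows.append([discretize(loc, 0, 1000, 4), discretize(cc, 0, 40, 4)])
            labels.append(1 if loc >= 500 else 0)
forest = fit_dp_forest(rows, labels, n_trees=5, depth=2, bins=4, total_epsilon=10.0)
print(defect_score(forest, [3, 1]), defect_score(forest, [0, 2]))
```

Only the noisy leaf counts and the randomly chosen split features would need to be shared. Coarser discretization means fewer, fuller leaves, so each count carries more signal relative to the fixed Laplace noise; this is one reason discretization matters for how far a privacy budget stretches.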
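AUC, the measure used to evaluate DP-Share, can be read as the probability that a randomly chosen defective module receives a higher predicted score than a randomly chosen clean one. The short sketch below computes it directly from that definition, counting ties as one half. It is quadratic in the number of modules, which is fine for illustration; library implementations typically use an equivalent rank-based formula instead. The function name and data are hypothetical.

```python
from itertools import product


def auc(scores, labels):
    """Probability that a random defective module (label 1) scores higher than
    a random clean module (label 0); tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and true labels for four modules.
print(auc([0.9, 0.4, 0.4, 0.1], [1, 1, 0, 0]))  # 0.875
```

Because AUC depends only on how scores order defective versus clean modules, it can compare models whose outputs are on different scales, such as vote fractions from a forest and raw probabilities.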
Keywords/Search Tags: Software defect prediction, Software defect number prediction, Differential evolution, Class imbalance learning