Research On Software Defect Prediction Based On Extreme Learning Machine

Posted on: 2020-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Sun
GTID: 2428330590950990
Subject: Computer Science and Technology
Abstract/Summary:
In today's internet industry, software keeps growing in scale, and its complexity grows with it. As a result, the number of hidden defects increases, and in the worst case these defects cause economic losses for software companies or even threaten people's safety. Finding software defects effectively is therefore particularly important. Software defect prediction builds a prediction model from existing project data and uses it to predict defects in a target project. Mainstream methods currently build such models with machine learning, and most achieve good prediction results. However, existing defect prediction models still have shortcomings: model performance leaves room for improvement, datasets are imbalanced, and data for some target projects is scarce. To address these problems, this thesis proposes software defect prediction models based on the extreme learning machine (ELM). The work consists of three parts:

1) To improve the prediction accuracy and training speed of defect prediction models, this thesis builds supervised classification and unsupervised clustering predictors that exploit the strong generalization and fast training of the extreme learning machine. Two algorithms are proposed: ELM-CLA (ELM-Classification) and ELM-CLU (ELM-Clustering). On 11 NASA datasets, ELM-CLA achieves better prediction results than existing classification algorithms and greatly improves training speed. ELM-CLU achieves better ACC values than the most popular spectral clustering algorithms. In addition, based on data conversion, this thesis discusses the priority of defect data and offers ideas for defect repair work. Future work could combine actual defect-level data with the experience of the relevant developers to help select and maintain defect repair work.

2) To address class imbalance in defect datasets, this thesis introduces relative density information into the weighted extreme learning machine (WELM), which avoids computing probability densities in high-dimensional space. The K-nearest-neighbor probability density estimation method is used to calculate the relative density of each training sample, and a membership function is designed on that basis. Each sample thus receives an individual fuzzy weight, and these weights replace the weight matrix in WELM. This yields two fuzzy weighted extreme learning machine algorithms based on relative density: FWELM-WD (based on intra-class relative density information) and FWELM-ID (based on inter-class relative density information). Experimental results show that the proposed models achieve better prediction performance, with better G-mean, AUC and Balance values. The FWELM family of algorithms can therefore effectively alleviate the class imbalance problem in software defect datasets and improve the accuracy of defect prediction models.

3) To address the question of how to find a latent feature space shared by the source and target projects, this thesis proposes a software defect prediction method based on ensemble learning. The method calculates the similarity between each candidate source project and the target project, then selects the top-k most similar source projects as training datasets for the prediction models. Experiments were carried out on software defect prediction datasets selected from the PROMISE repository. Compared with other CPDP (Cross-Project Defect Prediction) methods, the method performs well on F-measure, AUC and Accuracy.

In summary, this thesis focuses on problems in software defect prediction and in defect datasets. Based on the extreme learning machine, it carries out three pieces of work, proposes corresponding solutions and improvements, and verifies the effectiveness of the proposed methods through experiments. However, the work still has some shortcomings, and further research can proceed in the following directions:

1) The first two pieces of work are based on the NASA datasets and the last on the PROMISE datasets, both of which are publicly available datasets in the software defect prediction field. Although they can largely validate this work, they still have certain limitations. The methods should be validated on more industrial data in the future.

2) The FWELM family of algorithms proposed for the class imbalance problem uses the KNN-PDE (K-nearest neighbors-based probability density estimation) method, which must compute neighbor distances between samples, so its time complexity is high. The improvements made on top of WELM further increase the time complexity of the algorithm. Future work will design methods to reduce it.

3) Cross-project prediction remains a difficult problem in software defect prediction. This thesis mainly addresses it by analyzing the similarity between source and target projects. Future work could draw on currently popular transfer learning methods, mining more source projects and analyzing their internal structural features to design more efficient cross-project defect prediction models.
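
As a rough illustration of the ELM and weighted ELM ideas summarized above, the following Python sketch trains a single-hidden-layer network whose hidden weights are random and fixed, and whose output weights come from a closed-form regularized least-squares solution. It is not the thesis's implementation. The class name `ELMClassifier`, the helper `class_balanced_weights`, the sigmoid activation, the hidden-layer size and the regularization constant are all assumptions chosen for illustration. The inverse-class-frequency weights are a simple stand-in for the fuzzy relative-density weights of FWELM, whose membership function the abstract does not specify.

```python
import numpy as np


class ELMClassifier:
    """Illustrative ELM for binary labels in {0, 1} (hypothetical helper)."""

    def __init__(self, n_hidden=50, reg=1.0, seed=0):
        self.n_hidden = n_hidden  # number of random hidden nodes (assumed)
        self.reg = reg            # regularization constant C (assumed)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output H: sigmoid of a fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, sample_weight=None):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n_samples, n_features = X.shape

        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)

        H = self._hidden(X)
        t = np.where(y == 1, 1.0, -1.0)  # targets encoded as -1 / +1

        if sample_weight is None:
            w = np.ones(n_samples)       # plain (unweighted) ELM
        else:
            w = np.asarray(sample_weight, dtype=float)

        # Weighted ridge solution: beta = (I/C + H^T W H)^{-1} H^T W t,
        # where W = diag(w). With w = 1 this reduces to the plain ELM.
        HtW = H.T * w
        A = np.eye(self.n_hidden) / self.reg + HtW @ H
        self.beta = np.linalg.solve(A, HtW @ t)
        return self

    def decision_function(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


def class_balanced_weights(y):
    """Weight each sample by 1 / (number of samples in its class)."""
    y = np.asarray(y)
    counts = np.bincount(y)
    return 1.0 / counts[y]
```

Calling `ELMClassifier().fit(X, y)` gives the unweighted model, while `fit(X, y, sample_weight=class_balanced_weights(y))` up-weights the minority (defective) class. A fuzzy weighting scheme such as the one the thesis describes would supply a different `sample_weight` vector through the same interface.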
Keywords/Search Tags: software defect prediction, extreme learning machine, class imbalance, cross-project defect prediction