
Research On Software Defect Prediction Method Based On Ensemble Learning And Multi-hierarchical LSTM Feature Fusion

Posted on: 2024-07-10 | Degree: Master | Type: Thesis
Country: China | Candidate: J P Xu
GTID: 2568307130953539 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the software industry, users' quality requirements for software products keep rising. Software reliability is closely tied to software defects. Software defect prediction has therefore been proposed to help testers find potential defects as early as possible and to allocate limited testing resources sensibly, which improves testing efficiency and ensures software reliability. However, defect datasets suffer from a severe class imbalance problem: non-defective instances dominate the distribution, which easily leads to inaccurate defect prediction models. Ensemble learning has been shown to be one of the most effective ways to address class imbalance. Existing ensemble defect prediction methods usually combine the outputs of several base classifiers in a simple way, and most of them ensemble only once. They rarely consider ensemble diversity or the combination of ensemble learning with neural networks. Moreover, traditional software defect prediction relies on manually designed code metrics and builds prediction models by learning from these metrics. Such hand-crafted features cannot capture the semantic information of the source code, and that information is important for building more accurate prediction models. To address these difficulties and challenges, this thesis conducts extensive research and proposes solutions. Its main research content and contributions are as follows:

(1) To address the severe class imbalance problem in defect prediction, this thesis proposes a novel dual ensemble software defect prediction approach combined with a neural network. The approach combines ensemble learning and deep learning. It uses multiple machine learning algorithms to build several base classifiers and uses a deep neural network as the meta-classifier of the dual ensemble framework. It first performs a homogeneous ensemble and then a heterogeneous ensemble on these base classifiers. This combines the strengths of each classifier and alleviates the class imbalance problem. Experimental results on eight open-source datasets show that the proposed method outperforms the compared methods and effectively improves the performance of the defect prediction model.

(2) To address the reliance on hand-crafted features in defect prediction, this thesis proposes a software defect prediction method based on multi-hierarchical LSTM feature fusion. The approach converts source code into word embeddings based on program slicing and the abstract syntax tree (AST) to extract semantic features. This compensates for the semantic information that is lost when semantic features are extracted from the AST alone. A multi-hierarchical Bi-LSTM (Bidirectional Long Short-Term Memory) network is designed to further learn both the hand-crafted features and the semantic features derived from program slicing and the AST. A Squeeze-and-Excitation Network then fuses the semantic and hand-crafted features, which improves the prediction accuracy of the defect prediction model. Experimental results on seven open-source projects verify the effectiveness of the approach and show that it predicts software defects more effectively.

(3) This thesis designs and implements a prototype defect prediction system based on ensemble learning and multi-hierarchical LSTM feature fusion. The two approaches above are implemented with PyTorch, and the prototype system is developed in Python. The system consists of three main modules: data import, defect prediction model management, and defect prediction report output. The model management module provides four functions: data preprocessing, model training, defect prediction, and experiment termination. With this system, the entire defect prediction process can be carried out, and the resulting defect prediction reports can be output to guide testers.
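The dual ensemble in contribution (1) can be pictured as two stages. A homogeneous stage bags several copies of one algorithm. A heterogeneous stage then stacks those bagged ensembles under a meta-classifier. The following is a minimal, dependency-free sketch under assumptions the abstract does not state. The base learners (a decision stump and a nearest-centroid classifier) are hypothetical choices. The class-balanced bootstrap is an assumed imbalance remedy. A class-weighted logistic regression stands in for the thesis's deep neural network meta-classifier. All function names are illustrative only.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_stump(X, y):
    """Hypothetical base learner: predict 1 when one metric reaches a threshold."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            err = sum((1 if x[j] >= t else 0) != label for x, label in zip(X, y))
            if best is None or err < best[0]:
                best = (err, j, t)
    _, j, t = best
    return lambda x: 1.0 if x[j] >= t else 0.0


def fit_centroid(X, y):
    """Hypothetical base learner: nearest-centroid score for the defective class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, label in zip(X, y) if label == 1])
    neg = mean([x for x, label in zip(X, y) if label == 0])

    def score(x):
        dp, dn = math.dist(x, pos), math.dist(x, neg)
        return dn / (dp + dn) if dp + dn > 0 else 0.5
    return score


def balanced_bootstrap(X, y, rng):
    """Assumed remedy: resample both classes with replacement at the minority size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


def homogeneous(fit, X, y, n_models, rng):
    """Stage 1: bag one algorithm and average the members' scores."""
    members = [fit(*balanced_bootstrap(X, y, rng)) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in members) / len(members)


def fit_meta(Z, y, epochs=500, lr=0.5):
    """Stage 2 meta-classifier: class-weighted logistic regression (stand-in for a DNN)."""
    pos_weight = (len(y) - sum(y)) / sum(y)  # up-weight the minority (defective) class
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
            g = (p - label) * (pos_weight if label == 1 else 1.0)
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def dual_ensemble(X, y, n_models=5, seed=0):
    """Homogeneous bagging per algorithm, then heterogeneous stacking of the bags."""
    rng = random.Random(seed)
    bags = [homogeneous(fit, X, y, n_models, rng) for fit in (fit_stump, fit_centroid)]
    # Simplification: the meta-classifier sees in-sample stage-1 scores; proper
    # stacking would use out-of-fold predictions to avoid leakage.
    meta = fit_meta([[bag(x) for bag in bags] for x in X], y)
    return lambda x: meta([bag(x) for bag in bags])
```

Calling `dual_ensemble(X, y)` takes metric vectors `X` and 0/1 labels `y`, where 1 marks a defective module. It returns a scoring function, and scores above 0.5 flag a module as likely defective. Balanced resampling in stage 1 keeps each base learner from being swamped by the majority class. The class weight in stage 2 serves the same purpose for the meta-classifier.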
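Contribution (2) converts source code into token sequences that an embedding layer can consume. The following is a minimal sketch of the AST half of that step only. It assumes Python source and Python's standard `ast` module, because the abstract names neither the target language, the slicing criterion, nor the vocabulary scheme. It does not implement the program-slicing step. The tree is serialized in pre-order: identifiers are kept for variable names and function definitions, and node-type names are used everywhere else. The tokens are then mapped to integer ids, with 0 reserved for unknown tokens and padding. The functions `ast_tokens`, `build_vocab`, and `encode` are hypothetical names.

```python
import ast
from collections import Counter


def ast_tokens(source):
    """Pre-order traversal of the AST, emitting one token per node."""
    tokens = []

    def visit(node):
        # Keep identifiers for names and function definitions; node type otherwise.
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.FunctionDef):
            tokens.append("def:" + node.name)
        else:
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_lists, min_count=1):
    """Assign ids starting at 1; id 0 is reserved for unknown tokens and padding."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {}
    for tok in sorted(counts):
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, then truncate or zero-pad to a fixed length."""
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))
```

For example, `ast_tokens("x = y + 1")` yields `["Module", "Assign", "x", "Store", "BinOp", "y", "Load", "Add", "Constant"]`. In a PyTorch model, such fixed-length id sequences would typically pass through `torch.nn.Embedding` (with `padding_idx=0`) before reaching the Bi-LSTM layers.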
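In contribution (2), the Squeeze-and-Excitation (SE) fusion reweights feature groups according to their estimated importance before they are combined. The following is a minimal, dependency-free sketch of a standard SE block applied to feature sources, under assumptions the abstract does not spell out. Each feature source is treated as one channel, for example AST-based semantic features, slice-based semantic features, and manual metrics projected to a common length. The excitation weights are passed in rather than learned. The gated channels are concatenated. The names `se_fuse`, `W1`, and `W2` are hypothetical.

```python
import math


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    # Numerically stable element-wise logistic function.
    return [1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))
            for a in v]


def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def se_fuse(channels, W1, W2):
    """Squeeze-and-Excitation fusion over C feature channels of equal length.

    W1: (C // r) x C reduction weights; W2: C x (C // r) expansion weights.
    Returns the concatenated, gated features and the per-channel gates.
    """
    # Squeeze: global average of each channel gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    gates = sigmoid(matvec(W2, relu(matvec(W1, z))))
    # Scale each channel by its gate, then concatenate into one fused vector.
    fused = []
    for gate, ch in zip(gates, channels):
        fused.extend(gate * a for a in ch)
    return fused, gates
```

A gate close to 0 effectively suppresses its feature source, and a gate close to 1 passes it through. The gates are computed from the input itself, so the block can emphasize different sources for different modules. With all-zero weights, every gate equals 0.5, and the fused vector is simply half of each input.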
Keywords/Search Tags: Software defect prediction, Class imbalance, Dual ensemble, Semantic feature, Feature fusion