Cross-project Software Defect Prediction Based On Deep Learning

Posted on: 2020-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2518306131461914
Subject: Software engineering
Abstract/Summary:
Software defect prediction identifies defect-prone modules in software systems, helping developers and testers allocate test resources efficiently and fix as many defects as possible with limited time and resources. In real software development, however, a project that requires defect prediction (i.e., the target project) may have too little data of its own to construct a defect prediction model, and directly using data from other projects to construct the model does not achieve satisfactory prediction performance in most cases. Therefore, cross-project defect prediction (CPDP), i.e., leveraging data from other projects to construct an effective defect prediction model for the target project, has attracted extensive attention from researchers. This thesis proposes a CPDP method based on ASTToken2Vec, bidirectional long short-term memory (LSTM), and an attention mechanism. First, each software module is modeled as a simplified abstract syntax tree (S-AST), and token sequences are extracted from the S-ASTs. For each node in an S-AST, only the project-independent node type is retained; other project-specific information (such as variable and method names) is discarded. Because this modeling method does not rely on project-specific information, it is suitable for CPDP problems. Second, to construct semantic representations of the token sequences, an unsupervised vector representation learning algorithm, ASTToken2Vec, is proposed. It automatically learns semantic representations of tokens from the natural structure of the S-AST, and these are then used to construct the semantic representations of the token sequences. Next, a bidirectional LSTM is used to extract contextual semantic features from the sequences, and an attention mechanism is used to learn to pay more attention to the defect-prone parts of the sequences; the CPDP model is then built on these automatically learned features. Finally, taking large-scale open source projects as experimental subjects, extensive experiments are carried out on a large number of source-target project pairs, and statistical analysis methods are applied to the results in order to verify the effectiveness of the method. The experimental results show that (1) ASTToken2Vec effectively learns the semantic representations of tokens and significantly improves the predictive performance of defect prediction, and (2) the CPDP method based on ASTToken2Vec, bidirectional LSTM, and the attention mechanism achieves significantly better predictive performance than five other state-of-the-art CPDP methods. In summary, the proposed method significantly improves the performance of CPDP, which indicates that deep-learning-based CPDP is effective and promising.
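The S-AST modeling step can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the programming language of the studied projects or the exact simplification rules, so Python's standard `ast` module is used here as a stand-in parser, and the function name `sast_tokens` and the choice of a pre-order traversal are assumptions made for illustration. The sketch keeps only each node's type name and drops identifiers, so two modules that differ only in variable and method names yield the same token sequence.

```python
import ast


def sast_tokens(source: str) -> list[str]:
    """Return a pre-order sequence of AST node-type names for `source`.

    Hypothetical helper: only the node type is kept; names of variables,
    functions and other project-specific details are never recorded.
    """
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        # Skip Load/Store/Del context markers; they carry no structure
        # (an assumed simplification rule, not taken from the thesis).
        if not isinstance(node, ast.expr_context):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


if __name__ == "__main__":
    module_a = "def add(x, y):\n    return x + y\n"
    module_b = "def plus(left, right):\n    return left + right\n"
    # Same structure, different identifiers: the token sequences match.
    print(sast_tokens(module_a) == sast_tokens(module_b))
```

Such token sequences would then be mapped to vectors (by ASTToken2Vec in the thesis) before being fed to the bidirectional LSTM; that learning step is not sketched here because the abstract does not specify it.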
Keywords/Search Tags: Software defect prediction, Cross-project, Abstract syntax tree, Vector representation learning, Deep learning