Research And Implementation Of Semantic Based Software Defect Prediction

Posted on: 2020-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
Full Text: PDF
GTID: 2428330575957063
Subject: Intelligent Science and Technology
Abstract/Summary:
Software defect prediction can help developers find potential bugs and reduce maintenance costs. Traditional approaches usually use software metrics (Lines of Code, Cyclomatic Complexity, etc.) as features to build machine learning classifiers that predict defective software modules. However, software metric features often fail to capture the syntactic and semantic information of programs.

This thesis proposes Seml, a novel approach that combines word embedding and deep learning methods to learn programs' semantic information and perform defect prediction. Specifically, for each program source file, the approach first extracts a token sequence from the file's abstract syntax tree. It then maps each token in the sequence to a real-valued vector using a mapping table, which is trained with an unsupervised word embedding model. Finally, it uses the vector sequences and their labels (defective or clean) to train a Long Short-Term Memory (LSTM) network. This model automatically learns from the order of the vectors in each sequence and thus captures the semantic information of programs.

Evaluation results on eight open source projects in the PROMISE data repository show that Seml outperforms state-of-the-art deep learning approaches (the DBN approach and the tree-based LSTM approach, tb-LSTM for short) and a metrics-based defect prediction approach (the ISDA approach) for both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP). Specifically, for WPDP, Seml improves on the DBN, tb-LSTM, and ISDA approaches by 2.1%, 4.3%, and 9.6% in F1 on average, respectively. For CPDP, Seml improves on the three approaches by 3.5%, 0.8%, and 5.6% in F1 on average. In addition, the evaluation of the token embedding step shows that the token vectors trained by the word embedding model effectively capture the semantic information of the tokens, and thus that the token embedding step benefits the Seml approach.
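The three steps described above (token extraction from the abstract syntax tree, token-to-vector lookup, and an LSTM pass over the vector sequence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the thesis: Python's `ast` module stands in for the parser, the choice of which node types become tokens is arbitrary, `build_table` fills the mapping table with random vectors where Seml would use vectors trained by a word embedding model, and `TinyLSTM` is an untrained single-layer LSTM with random weights, so its output is not a meaningful defect score.

```python
import ast
import math
import random

SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += abs(x)
    return s
"""

def extract_tokens(source):
    """Depth-first walk of the AST, emitting tokens for a few node types.

    The node selection below is a hypothetical choice for this sketch.
    """
    tokens = []

    def visit(node):
        if isinstance(node, ast.FunctionDef):
            tokens.append("def:" + node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append("call:" + node.func.id)
        elif isinstance(node, (ast.If, ast.For, ast.While, ast.Return)):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens

def build_table(vocab, dim, seed=0):
    """Placeholder mapping table: random vectors instead of trained embeddings."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in sorted(vocab)}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Untrained single-layer LSTM with a sigmoid output unit (forward pass only)."""

    def __init__(self, in_dim, hid_dim, seed=1):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hid_dim)]
                      for _ in range(hid_dim)] for g in "ifog"}
        self.b = {g: [0.0] * hid_dim for g in "ifog"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]

    def _gate(self, name, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[name], self.b[name])]

    def predict(self, vectors):
        h = [0.0] * self.hid_dim  # hidden state
        c = [0.0] * self.hid_dim  # cell state
        for x in vectors:         # tokens are consumed in order
            z = x + h             # concatenate input vector and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Value in (0, 1); it would only mean "probability defective" after training.
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h)))

tokens = extract_tokens(SAMPLE)
table = build_table(set(tokens), dim=4)
vectors = [table[t] for t in tokens]
score = TinyLSTM(in_dim=4, hid_dim=3).predict(vectors)
```

Because the hidden and cell states are updated one token at a time, the final state depends on the order of the sequence, which is the property the abstract relies on. A real pipeline would replace `build_table` with a trained embedding and fit the LSTM weights on labeled files.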
Keywords/Search Tags: software defect prediction, long short-term memory, word embedding