Research And Implementation Of Semantic Based Software Defect Prediction

Posted on:2020-12-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yu

Full Text:PDF

GTID:2428330575957063

Subject:Intelligent Science and Technology

Abstract/Summary:

Software defect prediction can help developers find potential bugs and reduce maintenance costs. Traditional approaches usually use software metrics (Lines of Code, Cyclomatic Complexity, etc.) as features to build machine learning classifiers that predict defective software modules. However, software metric features often fail to capture the syntactic and semantic information of programs.

This thesis proposes Seml, a novel approach that combines word embedding and deep learning methods to learn the semantic information of programs and perform defect prediction. Specifically, for each program source file, the approach first extracts a token sequence from the file's abstract syntax tree. It then maps each token in the sequence to a real-valued vector using a mapping table, which is trained with an unsupervised word embedding model. Finally, it uses the vector sequences and their labels (defective or clean) to build a Long Short-Term Memory (LSTM) network. This model can automatically learn the order of the vector sequences and thus capture the semantic information of programs.

Evaluation results on eight open source projects in the PROMISE data repository show that Seml outperforms the state-of-the-art deep learning approaches (the DBN approach and the tree-based LSTM approach, tb-LSTM for short) and a metrics-based defect prediction approach (the ISDA approach) for both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP). Specifically, for WPDP, Seml improves on the DBN, tb-LSTM, and ISDA approaches by 2.1%, 4.3%, and 9.6% in F1 on average, respectively. For CPDP, Seml improves on the three approaches by 3.5%, 0.8%, and 5.6% in F1 on average. In addition, the evaluation of the token embedding step shows that the token vectors trained by the word embedding model effectively capture the semantic information of the tokens, and thus that the token embedding step contributes to the Seml approach.
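
The first two steps described above (token sequence from the abstract syntax tree, then lookup in a mapping table) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It makes several assumptions that do not come from the abstract: it parses Python source with the standard `ast` module, the choice of which node types become tokens is hypothetical, the mapping table is filled with random vectors as a stand-in for a trained word embedding model, and the helper names (`node_token`, `extract_tokens`, `build_table`, `embed`) are invented for this sketch. The LSTM classifier is omitted.

```python
import ast
import random

# Hypothetical selection of AST node types that become tokens.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Return)

def node_token(node):
    """Return a token string for a node of interest, or None to skip it."""
    if isinstance(node, ast.FunctionDef):
        return "def:" + node.name
    if isinstance(node, ast.Call):
        func = node.func
        name = getattr(func, "id", None) or getattr(func, "attr", None)
        return "call:" + (name or "<expr>")
    if isinstance(node, CONTROL_NODES):
        return type(node).__name__
    return None

def extract_tokens(source):
    """Depth-first, pre-order walk so tokens follow the order of the code."""
    tokens = []

    def visit(node):
        token = node_token(node)
        if token is not None:
            tokens.append(token)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens

def build_table(vocab, dim=4, seed=0):
    """Stand-in mapping table: random vectors, NOT trained embeddings."""
    rng = random.Random(seed)
    table = {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for tok in sorted(vocab)}
    table["<unk>"] = [0.0] * dim  # vector for tokens missing from the table
    return table

def embed(tokens, table, max_len):
    """Map tokens to vectors, then truncate or zero-pad to max_len."""
    dim = len(table["<unk>"])
    vectors = [table.get(tok, table["<unk>"]) for tok in tokens[:max_len]]
    vectors += [[0.0] * dim for _ in range(max_len - len(vectors))]
    return vectors

SAMPLE = """
def total(items):
    result = 0
    for item in items:
        if item > 0:
            result += item
    return result
"""

tokens = extract_tokens(SAMPLE)
table = build_table(set(tokens))
vectors = embed(tokens, table, max_len=6)
print(tokens)
print(len(vectors), len(vectors[0]))
```

The fixed-length list of vectors in `vectors` is the kind of input a sequence model such as an LSTM would consume, paired with a defective or clean label for each file. Padding to a common length is a design choice of this sketch; the abstract does not say how sequences of different lengths are handled.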

Keywords/Search Tags:

software defect prediction, long short-term memory, word embedding

Related items

1	Research And Implementation Of Software Defect Prediction Method Based On Source Code Semantics
2	Research On Software Defect Prediction Based On Code Representation
3	Adversarial Learning Based Software Defect Prediction For Long And Short Memory Networks
4	Design And Research On Software Defect Prediction System Based On Program Source Code Semantics
5	Research On Software Defect Prediction Method Based On Semantic Information Of Program Source Code
6	Research And Application Of The Short-term Memory Network For Adjusting Gate Length
7	Research On Chinese Word Segmentation Method Based On Two-way Long And Short-term Memory Model
8	Research On Lexical Analysis Based On Neural Networks
9	Lstm Based Short Message Service(SMS) Modeling For Spam Classification
10	Multi-prototype Word Vector Based On Context Word Embedding