
Design And Implementation Of Software Defect Prediction System Based On Knowledge Graph And Representation Learning

Posted on: 2022-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: X D Zheng
GTID: 2518306740495044
Subject: Computer technology
Abstract/Summary:
As the open source community has grown, more and more open source software is distributed over the Internet and widely used. However, much of this software has not been strictly reviewed and may contain defects that threaten the information security of the Internet. Moreover, as software grows in scale, reviewing code entirely by hand becomes increasingly difficult. Software Defect Prediction is well suited to this situation. Current Software Defect Prediction methods typically use the features and attributes of code fragments in code files to build a classifier that decides whether each fragment is defective.

These methods have several shortcomings. First, they usually target small-scale, single-granularity data and cannot effectively predict defects in large-scale, multi-granularity datasets. Second, they usually rely on Feature Engineering to obtain code representations, or simply use the structural or semantic information of code as features, which fails to reflect the relationships between pieces of code and the influence of information propagation on the representation. Finally, the problem of class imbalance in Software Defect Prediction has not been solved well.

To address these issues, this thesis proposes a Software Defect Prediction system based on Knowledge Graph techniques, together with Knowledge Graph Embedding and Graph Embedding methods. First, the semantic representation of code is learned through Knowledge Graph Embedding, and then the information propagation between pieces of code is simulated using Graph Embedding. Optimizing the representation of code addresses the issue of class imbalance, and the Expectation-Maximization Algorithm uses the semantic and structural information of code to learn the label distribution of unlabeled code.

The main contributions of this thesis are as follows:
(1) We propose a Knowledge Graph Embedding method that learns semantic representations at different granularities, such as concepts and instances, while retaining properties such as transitivity and symmetry. Experiments on multiple datasets show that this method outperforms current methods.
(2) We propose an end-to-end Software Defect Prediction method that uses Graph Embedding to simulate information propagation between code at different granularities, uses the Expectation-Maximization Algorithm to combine the semantic and structural information of code, and overcomes the influence of class imbalance on performance. Experiments on public datasets show improvements in the evaluation metrics over current methods.
(3) Based on the methods above, we design a Software Defect Prediction system. After a variety of tests, the system meets the expected performance and functional requirements.

In summary, we propose a Knowledge Graph Embedding method and an end-to-end Software Defect Prediction method to address issues in current Software Defect Prediction methods, namely single detection granularity, the inability to combine multiple kinds of information, and class imbalance. Finally, a Software Defect Prediction system is designed, and its implementation and effectiveness are tested.
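
As a rough illustration of what a Knowledge Graph Embedding method scores, the sketch below uses a TransE-style translation score. TransE is a well-known baseline and is swapped in here only for illustration; it is not the method proposed in this thesis. The entity names, the relation name, and all vectors are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style score: negative L2 distance between (head + relation) and tail.

    A score closer to zero means the triple (head, relation, tail) is
    considered more plausible under this baseline model.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical 2-dimensional embeddings of code entities (illustration only).
embeddings = {
    "parse_input": [0.1, 0.2],   # a method
    "Parser": [0.6, 0.1],        # the class assumed to contain it
    "Logger": [-0.8, 0.9],       # an unrelated class
}
member_of = [0.5, -0.1]  # hypothetical relation vector for "member of"

# Score a plausible triple and an implausible one.
good = transe_score(embeddings["parse_input"], member_of, embeddings["Parser"])
bad = transe_score(embeddings["parse_input"], member_of, embeddings["Logger"])
print(good > bad)
```

In a trained model the vectors would be learned so that true triples score higher than corrupted ones; here they are hand-picked to make the comparison visible.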
Keywords/Search Tags:Software Defect Prediction, Knowledge Graph, Representation Learning