Font Size: a A A

Research On Code Clone Detection Based On Approximate Graph Matching

Posted on:2021-05-28Degree:MasterType:Thesis
Country:ChinaCandidate:Y ZouFull Text:PDF
GTID:2428330602994414Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent years,code clone detection is becoming more and more important in the field of software development,maintenance and bug detection.At present,for de-tecting the gap clones with text differences,which is type-3/type-4 clone defined in the academic field,the existing methods mostly have the problem of low detectiong rage(recall rate).The PDG-based(program dependency graph)methods can detect more gap clones since the PDG contains both the syntax and structure information.However,these methods mostly rely on the accurate graph matching such as subgraph isomor-phism,which have high complexity and low recall rate.Therefore,how to calculate the similarity between PDGs more quickly,comprehensively and accurately is a big chal-lenge for PDG-based code clone detections.The thesis focuses on the simplification of PDG structures,the filtering methods and the approximate graph matching algorithms.The main research contents and contributions are as follows:(1)Preprocessing of PDGs and design of filtering strategyWe have studied and analyzed the PDGs generated by static code analysis tools.There are many meaningless nodes in PDGs which have no relations with the code information.For this problem,we proposed a heuristic strategy to simplify the PDG structures,which eliminates these meaningless nodes and edges,and merges the sub-graphs of the third-party library function calls in the code block.We also proposed a series of filtering strategies to filter the candidate PDG pairs from numerical and seman-tic features of codes to accelerate the clone detection.The experimental results show that the structure simplification of PDGs can reduce the scale size of PDGs more than 20%,while the filtering strategies can reduce the size of candidate PDG pairs more than 90%.(2)Design of methods based on Weisfeiler-Lehman graph kernelMost of the existing PDG-based clone detections use subgraph isomorphism,which is a NP-hard problem with high time complexity and low recovery.Therefore,we proposed an approximate graph matching algorithm based on Weisfeiler-Lehman graph kernel,which counts the number of subtree isomorphism between two graphs as the measure of graph similarity.This method can find the best approximate solution of the subgraph isomorphism with lower complexity and retain the structure information of PDGs.Finally,we combine the preprocessing,filtering and detection methods pro-posed in this paper,and with the existing detection methods.The experimental results show that,on the real and artificial datesets,the proposed method can detect more code clones and the running time is also faster than the existing PDG-based methods.
Keywords/Search Tags:Code Clone Detection, Program Dependency Graph, Weisfeiler-Lehman Graph Kernel, Approximate Graph Matching
PDF Full Text Request
Related items