| “Copy-paste-modify” is a common operation in software development, and it usually introduces a large amount of duplicated code. These identical or similar code fragments are called code clones, and the goal of clone detection is to find them in software systems. Clone detection is a fundamental research topic in software engineering, with applications in software repair, plagiarism detection, code quality assessment and malware detection; it is also important for software maintenance and refactoring in industry. Clones are divided into four levels: textual, lexical, syntactic and semantic, with detection difficulty increasing in that order. Existing techniques achieve excellent results on the first three levels, but no significant results have been achieved at the semantic level. This research first subdivides semantic clones into two types: local semantic clones and global semantic clones. It then proposes a semantic clone detection approach for these two types that combines traditional techniques with neural networks. Detection is performed at method granularity, using the control flow graph (CFG) as the intermediate representation. First, a node feature model is built on a bidirectional LSTM autoencoder and combined with the dynamic time warping (DTW) algorithm to detect local semantic clones. Then, a CFG feature model based on a graph convolutional network (GCN) is built, and its output is used to detect global semantic clones. In the experiments, this research first uses five open source systems (Apache Commons Imaging, Apache Commons Math3, Catalano Framework, Colt and Weka) to form a code corpus, and also builds a global semantic clone dataset. Experiments are run on these two datasets and the results are compared with existing approaches. The experimental results show that, in addition to detecting the first three types of clones, this approach can also detect 9-150 pairs of true local semantic clones in 
the above five systems. When the ratio of global semantic clone pairs to non-clone pairs is not less than 1:20, the proposed global semantic clone detection approach achieves a recall above 0.627 and also has a high accuracy rate. These results confirm that the approach is effective at detecting both local and global semantic clones. |
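
The abstract names dynamic time warping (DTW) as the alignment step for local semantic clones but does not specify the local cost or how node features are ordered. The following is a minimal sketch of classic DTW only, under stated assumptions: each CFG node has already been encoded as a fixed-length vector (for example by the autoencoder), a method is represented as a sequence of such vectors, and Euclidean distance is the local cost. The names `euclidean`, `dtw_distance`, `path_a` and `path_b` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Local cost between two node feature vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    """Classic O(n*m) dynamic time warping distance between two sequences."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] holds the best alignment cost of seq_a[:i] against seq_b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_a advances, seq_b element repeats
                cost[i][j - 1],      # seq_b advances, seq_a element repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical node-feature sequences: path_b repeats one node of path_a.
path_a = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
path_b = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
path_c = [[5.0, 5.0], [6.0, 5.0]]

print(dtw_distance(path_a, path_b))
print(dtw_distance(path_a, path_c))
```

Because DTW lets one element align with several elements of the other sequence, `path_a` and `path_b` align at zero cost despite their different lengths, while `path_c` does not. That tolerance to inserted or repeated steps is presumably why DTW suits comparing node sequences of unequal length; any threshold for declaring a clone would have to be tuned separately.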