
Research On Code Clone Detection Based On Deep Learning

Posted on: 2022-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Lu
GTID: 2518306338486714
Subject: Computer Science and Technology
Abstract/Summary:
As computer technology continues to develop, software keeps growing in scale. Code clones arise during programming for several reasons, such as deliberate plagiarism and the use of code reuse techniques. Code cloning can help software development. In complex software systems, however, such as large-scale national defense systems and commercial systems, clones can introduce vulnerabilities, backdoors and other risks, and they can also cause intellectual property disputes.

Traditional code clone detection relies mainly on manually extracted features and comparison, and it detects clones poorly. Deep-learning-based methods can capture deeper syntactic and semantic information and so improve detection accuracy considerably, which has made them a research hotspot. This thesis first analyzes and summarizes existing deep-learning-based code clone detection techniques in China and abroad. It then defines three research topics, distinguished by the intermediate representation of the code and by whether labeled data are available, and proposes an improved method for each.

(1) Deep supervised code clone detection based on the AST representation. This thesis proposes TBCGSA (Tree Based CNN with BIGRU and Self Attention). This neural network combines a tree-based convolutional network and a bidirectional gated recurrent unit (BiGRU) with self-attention. In experiments on the dataset used in this thesis, its detection accuracy is higher than that of the other existing models compared.

(2) Deep supervised code clone detection based on graph neural networks. This thesis proposes adding edges to the abstract syntax tree of the source code, and it uses two common graph neural networks to extract code feature vectors. Experiments on the dataset used in this thesis show good detection results.

(3) Deep unsupervised code clone detection based on the AST representation. This thesis proposes MTBRAE (Multi Tree Based Recursive Autoencoders), an improved recursive autoencoder network that takes multi-tree input. In experiments on the dataset used in this thesis, it outperforms other traditional and unsupervised code clone detection methods on clones with high syntactic similarity and on semantic clones.
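A small sketch can illustrate the self-attention step of research topic (1). It assumes additive attention pooling: each hidden state is scored as score_i = u · tanh(W h_i), and the weights come from a softmax over those scores. This is one common form of self-attention pooling. The abstract does not give TBCGSA's exact attention formulation, sizes or parameters. `attention_pool`, `W`, `u` and the toy hidden states are hypothetical, untrained values. They stand in for BiGRU outputs computed over tree-convolution features.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: Matrix, w: Matrix, u: Vector) -> Tuple[Vector, Vector]:
    """Additive self-attention pooling over a sequence of hidden states.

    score_i = u . tanh(W h_i);  alpha = softmax(scores);  output = sum_i alpha_i * h_i
    Returns the pooled vector and the attention weights.
    """
    scores = []
    for h in hidden:
        projected = [math.tanh(z) for z in matvec(w, h)]
        scores.append(sum(a * b for a, b in zip(u, projected)))
    alpha = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(alpha[t] * hidden[t][d] for t in range(len(hidden))) for d in range(dim)]
    return pooled, alpha


# Hypothetical BiGRU outputs: 3 time steps, hidden size 4 (2 forward + 2 backward units).
states = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [0.9, 0.3, -0.2, 0.1]]
W = [[0.5, 0.1, 0.0, 0.2], [0.0, 0.3, 0.4, -0.1]]  # attention size 2 x hidden size 4
u = [1.0, -0.5]
vec, weights = attention_pool(states, W, u)
print(len(vec), round(sum(weights), 6))  # 4 1.0
```

The pooled vector has the same size as one hidden state. In a supervised clone detector it would serve as the fragment's code vector. In a real model, `W` and `u` would be learned jointly with the rest of the network.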
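For research topic (2), the abstract does not say which edges are added to the AST or which two graph neural networks are used. The sketch below therefore makes three assumptions:

- Next-sibling edges are added on top of the parent-child tree edges.
- A trained GNN is replaced by parameter-free mean-aggregation message passing over one-hot node-type features.
- The result is mean-pooled into a graph vector, and two graphs are compared by cosine similarity.

The sketch uses Python's standard `ast` module to parse Python source. `build_graph`, `propagate`, `cosine` and `graph_similarity` are hypothetical helper names.

```python
import ast
import math
from typing import List, Tuple

Edge = Tuple[int, int]


def build_graph(source: str) -> Tuple[List[str], List[Edge]]:
    """Parse Python source into an AST graph whose nodes are labeled by node type.

    Edges: parent -> child (the original tree edges) plus next-sibling edges
    between consecutive children (an assumed augmentation).
    """
    labels: List[str] = []
    edges: List[Edge] = []

    def visit(node: ast.AST) -> int:
        idx = len(labels)
        labels.append(type(node).__name__)
        child_ids = [visit(child) for child in ast.iter_child_nodes(node)]
        edges.extend((idx, cid) for cid in child_ids)   # tree edges
        edges.extend(zip(child_ids, child_ids[1:]))     # next-sibling edges
        return idx

    visit(ast.parse(source))
    return labels, edges


def propagate(labels: List[str], edges: List[Edge], vocab: List[str], rounds: int = 2) -> List[float]:
    """Parameter-free message passing: in each round, every node takes the mean of
    its own and its neighbors' features (edges treated as undirected).
    Returns the mean-pooled graph vector."""
    n = len(labels)
    neighbors = [[i] for i in range(n)]  # self-loops
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    feats = [[1.0 if labels[i] == t else 0.0 for t in vocab] for i in range(n)]
    for _ in range(rounds):
        feats = [
            [sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(len(vocab))]
            for nbrs in neighbors
        ]
    return [sum(f[d] for f in feats) / n for d in range(len(vocab))]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def graph_similarity(src_a: str, src_b: str, rounds: int = 2) -> float:
    la, ea = build_graph(src_a)
    lb, eb = build_graph(src_b)
    vocab = sorted(set(la) | set(lb))  # shared one-hot vocabulary of node types
    return cosine(propagate(la, ea, vocab, rounds), propagate(lb, eb, vocab, rounds))


renamed = graph_similarity("x = a + b", "y = c + d")
different = graph_similarity("x = a + b", "for i in range(3):\n    print(i)")
print(round(renamed, 6), different < renamed)  # 1.0 True
```

The node features encode only AST node types, so renaming identifiers leaves the graph vector unchanged. A Type-2-style clone such as `x = a + b` versus `y = c + d` therefore scores 1.0, up to floating-point rounding. A learned GNN would replace the plain averaging with trainable weight matrices.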
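Research topic (3) builds on recursive autoencoders. The sketch below shows a standard recursive-autoencoder cell applied bottom-up over a small tree. The cell encodes two vectors into a parent with tanh, decodes the parent back, and measures the squared reconstruction error. The abstract does not describe how MTBRAE handles multi-tree input. Folding a node's label embedding together with each child vector, left to right, is therefore an assumption, as are the class and function names, the random initialization and the label embeddings. Training is omitted; it would minimize the summed reconstruction error.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Tree = Tuple[str, list]  # (node label, list of subtrees)


def label_embedding(label: str, dim: int) -> Vector:
    """Hypothetical fixed embedding: a pseudo-random vector seeded by the label string."""
    rng = random.Random(label)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class RecursiveAutoencoder:
    """Untrained recursive-autoencoder cell with hidden size `dim`."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(2 * dim)
        # Encoder maps R^(2*dim) -> R^dim; decoder maps R^dim -> R^(2*dim).
        self.w_enc = [[rng.uniform(-scale, scale) for _ in range(2 * dim)] for _ in range(dim)]
        self.b_enc = [0.0] * dim
        self.w_dec = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(2 * dim)]
        self.b_dec = [0.0] * (2 * dim)

    @staticmethod
    def _layer(w: List[Vector], x: Vector, b: Vector) -> Vector:
        """Compute tanh(W x + b)."""
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(w, b)]

    def encode_pair(self, left: Vector, right: Vector) -> Tuple[Vector, float]:
        """Encode two vectors into a parent; return (parent, squared reconstruction error)."""
        children = left + right
        parent = self._layer(self.w_enc, children, self.b_enc)
        recon = self._layer(self.w_dec, parent, self.b_dec)
        return parent, sum((c - r) ** 2 for c, r in zip(children, recon))

    def encode_tree(self, tree: Tree, embed: Callable[[str], Vector]) -> Tuple[Vector, float]:
        """Encode bottom-up; return the root vector and the total reconstruction error."""
        label, subtrees = tree
        acc = embed(label)
        total = 0.0
        # Assumption: fold the node's own embedding with each child vector, left to right.
        for sub in subtrees:
            child_vec, child_err = self.encode_tree(sub, embed)
            acc, err = self.encode_pair(acc, child_vec)
            total += child_err + err
        return acc, total


rae = RecursiveAutoencoder(dim=4, seed=42)
snippet = ("Assign", [("Name", []), ("BinOp", [("Name", []), ("Add", []), ("Name", [])])])
root, loss = rae.encode_tree(snippet, lambda lab: label_embedding(lab, 4))
print(len(root), loss >= 0.0)  # 4 True
```

In an unsupervised clone detector of this kind, the trained root vectors of two code fragments would be compared, for example by Euclidean or cosine distance. This sketch implies no threshold or distance choice from the thesis.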
Keywords/Search Tags: code clone detection, abstract syntax tree, tree convolutional network, self-attention mechanism, graph neural network, recursive autoencoder