Code Clone Detection Based On Sequence Alignment And Deep Learning

Posted on: 2020-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Yang
Full Text: PDF
GTID: 2428330605466650
Subject: Computer Science and Technology
Abstract/Summary:
Because developers copy and paste code during software development, current software systems contain code clones. These clones are generally harmful to software maintenance, so researchers have proposed many methods and techniques to identify candidate code clones. However, current code clone detection research still has some problems. On the one hand, most researchers focus on detection based on source code, even though Java bytecode reflects the semantic information of the source code well. On the other hand, the effectiveness of existing deep learning based detection techniques is not promising, especially for functionality code clone detection. In view of these problems, this thesis explores more effective methods based on Java bytecode and deep learning. The main contributions of this thesis are as follows:

(1) A method based on Bytecode Sequence Alignment (BSA) is proposed. First, the BSA method constructs a bytecode instruction path graph according to the bytecode instruction execution and jump conditions. Then, the instruction sequences are normalized by employing the tree based network instruction model. Finally, the BSA method applies the Smith-Waterman algorithm with a static acceleration penalty strategy to align bytecode sequences and detect code clones.

(2) A novel code clone detection method based on deep learning (DCCDL) is also proposed. First, DCCDL converts the source code into an AST and extracts method-level code fragments. Then, DCCDL extracts the semantic, structural, and functionality features of the source code from the AST, and constructs a feature similarity vector for each method pair. Finally, a classifier based on a deep neural network model is trained on a training set to detect candidate code clones in software systems.

(3) Large-scale experiments are carried out on five open-source software systems and one big data set, and the proposed methods are compared with some existing code clone detection methods. The experimental results show that the BSA method improves F-measure by at least 14.4% over the compared source code based methods, and that DCCDL improves F-measure by at least 8% over some deep learning methods. Moreover, DCCDL can effectively detect functionality code clones.
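
The alignment step in the BSA method can be illustrated with a minimal sketch of plain Smith-Waterman local alignment over two sequences of bytecode mnemonics. This is not the thesis implementation: the scoring values, the linear gap penalty (used here in place of the static acceleration penalty strategy, whose details the abstract does not give), the function names `smith_waterman` and `similarity`, and the normalization by the shorter sequence are all assumptions made for illustration.

```python
from typing import Sequence


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between two opcode sequences.

    Scoring values are illustrative assumptions, not taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a new local alignment
                h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            best = max(best, h[i][j])
    return best


def similarity(a: Sequence[str], b: Sequence[str], match: int = 2) -> float:
    """Normalize the score by the best score the shorter sequence could reach.

    The normalization is a hypothetical choice for turning a score into [0, 1].
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman(a, b, match=match) / (match * shorter)


if __name__ == "__main__":
    add_method = ["iload_1", "iload_2", "iadd", "ireturn"]
    mul_method = ["iload_1", "iload_2", "imul", "ireturn"]
    print(similarity(add_method, mul_method))
```

A pair of methods could then be reported as a candidate clone when the normalized score exceeds a chosen threshold; the threshold, like the scoring values, would need to be tuned and is not specified in the abstract.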
Keywords/Search Tags: Code Clone, Code Clone Detection, Bytecode Sequence Alignment, Smith-Waterman, Deep Learning, Functionality Code Clone Detection