An Ensemble Learning Approach For Code Clone Detection

Posted on: 2020-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhang
GTID: 2428330575959711
Subject: Computer Science and Technology
Abstract/Summary:
Code clones are similar code fragments that appear in several places within a project or across projects. They usually arise from programmers' copying habits or design concepts, and they often result in a software system that is difficult to maintain. How to locate similar code automatically and efficiently, so that the software can be refactored and maintained more easily, is an active research topic. Traditional code clone detection methods are mostly based on static text analysis and perform poorly at capturing semantic similarity. Recently, some studies have integrated deep learning models into code clone detection and achieved good results. However, the parameters of deep neural network models are numerous and complex, which makes the training overhead particularly large.

To address this issue, we adopt a solution based on a simple ensemble learning technique. Instead of using a sophisticated neural network, we combine several weak classifiers to capture many kinds of similarity patterns and thereby obtain high accuracy. We also introduce feature engineering to improve the ability to mine syntactic and semantic information. The main contributions and innovations of this thesis are as follows:

1. We use weak classifiers and an ensemble learning method to solve the code clone detection problem, achieving excellent performance that exceeds that of the existing deep learning method.
2. We propose a feature engineering method that treats the source code as a document with contextual logic and uses a word embedding model to extract a vector representation of each code segment. Because the source code structure does not go through many intermediate-form conversions, this method retains the complete information of the code.

To verify the proposed method, we conduct experiments on a popular dataset named BigCloneBench. The results show that our method achieves state-of-the-art performance on the code clone detection task, especially in semantic clone detection.
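The abstract does not name the embedding model, the pair features, or the ensemble algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's implementation. A hash-derived vector stands in for a trained word embedding, the three pair features are illustrative choices, and the ensemble is a plain majority vote over decision stumps. All function names are hypothetical.

```python
import hashlib
import math
import re

DIM = 16  # embedding size; an arbitrary choice for this sketch

def tokenize(code):
    # Treat the source as a document: identifiers, numbers, single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def token_vector(token):
    # ASSUMPTION: stand-in for a trained word-embedding lookup.
    # A deterministic vector in [-1, 1] derived from a hash of the token.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]

def embed(code):
    # Fragment vector = mean of its token vectors (fragment must be non-empty).
    vecs = [token_vector(t) for t in tokenize(code)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_features(a, b):
    # Illustrative similarity features; higher always means "more alike".
    u, v = embed(a), embed(b)
    ta, tb = set(tokenize(a)), set(tokenize(b))
    jaccard = len(ta & tb) / (len(ta | tb) or 1)
    mean_abs_diff = sum(abs(x - y) for x, y in zip(u, v)) / DIM
    return [cosine(u, v), jaccard, 1.0 - mean_abs_diff]

def fit_stump(values, labels):
    # Weak classifier: "clone if value >= threshold". Choose the observed
    # value that gives the best training accuracy as the threshold.
    best_t, best_hits = values[0], -1
    for t in values:
        hits = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        if hits > best_hits:
            best_t, best_hits = t, hits
    return best_t

def train(pairs, labels):
    # One stump per feature; the model is just the list of thresholds.
    feats = [pair_features(a, b) for a, b in pairs]
    return [fit_stump([f[i] for f in feats], labels) for i in range(3)]

def predict(thresholds, a, b):
    # Ensemble decision: majority vote of the stumps.
    votes = sum(f >= t for f, t in zip(pair_features(a, b), thresholds))
    return votes * 2 > len(thresholds)

# Toy training data (made up for illustration; not from BigCloneBench).
pairs = [
    ("int add(int a, int b) { return a + b; }",
     "int sum(int x, int y) { return x + y; }"),
    ("for (int i = 0; i < n; i++) total += arr[i];",
     "for (int j = 0; j < n; j++) total += data[j];"),
    ("int add(int a, int b) { return a + b; }",
     "String s = reader.readLine();"),
    ("for (int i = 0; i < n; i++) total += arr[i];",
     "if (file.exists()) file.delete();"),
]
labels = [1, 1, 0, 0]
model = train(pairs, labels)
print(predict(model, pairs[0][0], pairs[0][1]))
```

In a real setting the hash-based vectors would be replaced by embeddings learned from a code corpus, and the voting stumps by a stronger ensemble learner; the structure (embed each fragment, build pair features, combine weak classifiers) is the part this sketch is meant to show.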
Keywords: Code Clone, Ensemble Learning, Feature Engineering