| Code clone detection has important research value in software engineering and is widely used for code quality analysis, reuse detection, bug detection, and similar tasks. Among the various families of methods, token-based detection is attracting increasing attention for its time performance and generality. However, traditional token-based methods are not effective at detecting the more heavily modified Type-3 and Type-4 clones. To address this, this thesis proposes labeling tokens with different weights based on their frequency and type, which improves the detection ability of token-based clone detection, and builds on it a general method that can be used for performance bug detection. The main work and contributions of this thesis are as follows: (1) Code clone detection based on labeled tokens. Traditional token-based clone detection methods lack a fine-grained distinction between the importance of different token types and therefore cannot highlight the key tokens related to code structure. Based on token frequency from TF-IDF (term frequency-inverse document frequency) and on token type, this thesis designs a method that labels tokens by weight. We first build an index and filter at line granularity to obtain candidate clone pairs, then compute the longest common subsequence and measure similarity based on token weights. Experimental results on the clone pairs of the public BigCloneBench dataset show that, compared with existing token-based clone detection methods, our method significantly improves detection of Type-3 and Type-4 clones. (2) Application of the method: finding performance bugs based on code templates. Performance bug detection aims to find code snippets that significantly affect program performance, and plays an important role in improving software quality. However, existing methods are mostly specialized: each finds certain types of performance bugs and therefore lacks generality. Moreover, these methods need information from the control flow 
graph of the program or from dynamic analysis during detection, which leads to a high time cost. To address this, this thesis proposes using clone detection techniques to find performance bugs, and realizes the idea as a method that finds performance bugs using code templates. For each type of performance bug, we first build code templates, then use the clone detection method to find code snippets in the code base that are similar to the templates. Experimental results on a dataset of open source projects show that our method reaches detection performance similar to that of existing methods at a much lower time cost. In addition, our method finds new types of performance bugs in these open source projects that existing methods have not found. |
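
The weighted-token similarity described in contribution (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the type multipliers in `TYPE_WEIGHT`, the smoothed IDF formula, the Dice-style normalization, and all function names are assumptions chosen for illustration, and the line-granularity index and filter step is omitted. Each token is a `(text, type)` pair, its weight is the IDF of its text times a per-type multiplier, and similarity is the weight of the heaviest common subsequence relative to the total weight of both snippets.

```python
import math
from collections import Counter

# Assumed type multipliers (illustrative values, not taken from the thesis).
TYPE_WEIGHT = {"keyword": 2.0, "identifier": 1.0, "literal": 0.5}


def build_idf(corpus):
    """Smoothed inverse document frequency for every token text.

    `corpus` is a list of snippets; each snippet is a list of (text, type) pairs.
    """
    n = len(corpus)
    df = Counter()
    for snippet in corpus:
        df.update({text for text, _ in snippet})  # count each text once per snippet
    return {text: math.log((1 + n) / (1 + count)) + 1.0 for text, count in df.items()}


def token_weight(token, idf):
    """Weight of one token: IDF of its text times its type multiplier."""
    text, kind = token
    return idf.get(text, 1.0) * TYPE_WEIGHT.get(kind, 1.0)


def weighted_lcs(a, b, idf):
    """Maximum total weight of a common subsequence of token lists a and b."""
    prev = [0.0] * (len(b) + 1)
    for tok_a in a:
        cur = [0.0]
        for j, tok_b in enumerate(b, start=1):
            best = max(prev[j], cur[j - 1])
            if tok_a == tok_b:
                best = max(best, prev[j - 1] + token_weight(tok_a, idf))
            cur.append(best)
        prev = cur
    return prev[-1]


def similarity(a, b, idf):
    """Dice-style similarity in [0, 1] based on weighted common tokens."""
    total = sum(token_weight(t, idf) for t in a) + sum(token_weight(t, idf) for t in b)
    if total == 0:
        return 0.0
    return 2.0 * weighted_lcs(a, b, idf) / total


def find_matches(template, snippets, idf, threshold=0.7):
    """Return indices of snippets whose similarity to the template meets the threshold."""
    return [i for i, s in enumerate(snippets) if similarity(template, s, idf) >= threshold]
```

Under the same assumptions, `find_matches` mirrors the template idea in contribution (2): a code template for one kind of performance bug is tokenized, and any snippet in the code base whose weighted similarity to it reaches the (hypothetical) threshold is reported as a candidate.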