
Function-Based Detection of Structural Clones and Semantic Clones

Posted on: 2020-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Yang
Full Text: PDF
GTID: 2428330590996783
Subject: Software engineering
Abstract/Summary:
During software development, developers often copy and paste existing code segments to shorten development time and reduce costs. This practice is generally called code cloning, and the duplicated code is called cloned code. Although code cloning can effectively reduce the cost of software development, it may threaten overall software quality and significantly increase maintenance costs. To ensure software quality, it is therefore necessary to perform code clone detection; refactoring the detected clones allows maintainers to reduce the number of clones in the software. To this end, existing studies have proposed many automated techniques and tools for code clone detection. However, most of them mainly detect the first two types of clones and identify them at a coarse granularity, such as the project, file, or class level. Fine-grained structural clones and semantic clones, in contrast, have a large impact on software quality and are difficult to detect in both development and maintenance. Considering that the function is the smallest meaningful unit of a system, we attempt to detect structural clones and semantic clones at the function level. In this thesis, we conduct the following work:

(1) To detect structural clones at the function level effectively, we investigate function-based code clone detection and use structural similarity to measure the similarity of code fragments. The method first enhances the Abstract Syntax Tree (AST) to obtain a more abstract code representation by replacing the original node representations with defined node types, and then applies a local alignment algorithm, Smith-Waterman, to calculate similarity scores for pairs of code fragments at the function level. To evaluate the effectiveness of our method, we construct five publicly available datasets from open-source software projects and share them for further research. Experiments on the five datasets show that our method achieves 90.46% precision and 96.24% recall on average and outperforms the comparative algorithms by up to 10.94% and 4.02%, respectively. In addition, our method achieves 92.22% precision and 97.30% recall on average in cross-project code clone detection.

(2) To detect semantic clones effectively, we propose a new semantic clone detection method called SCDVP. The method first constructs a program dependency graph (PDG) by transforming the code into an intermediate representation, and extracts control dependency and data dependency information from the PDG. Then we use a deep-learning semantic model, Paragraph2Vec, to compute a semantic vector representation of each function. Finally, we use the cosine similarity between the vectors to determine whether two functions form a clone pair. To evaluate the detection performance of this method, we use a standard clone dataset, BigCloneBench. The experimental results show that SCDVP achieves 95% precision and 96% recall, which indicates that our method works well for function-based code clone detection. Moreover, SCDVP significantly outperforms the baseline methods on this dataset.
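
The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the node-type labels, the scoring parameters (match, mismatch, gap), the normalisation in `structural_similarity`, and the example vectors are all assumptions chosen for illustration. The abstract does not state these details, and the vectors merely stand in for Paragraph2Vec output.

```python
from math import sqrt
from typing import Sequence


def smith_waterman_score(a: Sequence[str], b: Sequence[str],
                         match: int = 2, mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best Smith-Waterman local-alignment score of two label sequences.

    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def structural_similarity(a: Sequence[str], b: Sequence[str],
                          match: int = 2) -> float:
    """Score divided by the best possible score of the shorter sequence.

    This normalisation is an assumption, not taken from the thesis.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman_score(a, b, match=match) / (match * shorter)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical abstracted AST node-type sequences for two functions.
func_a = ["FunctionDecl", "Param", "Loop", "Assign", "Return"]
func_b = ["FunctionDecl", "Param", "Loop", "Call", "Assign", "Return"]
print(structural_similarity(func_a, func_b))

# Hypothetical function embeddings standing in for Paragraph2Vec vectors.
print(cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.65, 0.05]))
```

In a detector built along these lines, a pair of functions would be reported as a clone when its score exceeds a chosen threshold; the abstract does not give the threshold values used.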
Keywords/Search Tags: Code Clone Detection, Semantic Clone, Abstract Syntax Tree, Program Dependency Graph, Paragraph2Vec