Research On Large Scale And Efficient Code Clone Detection Method Based On Sequence Alignment

Posted on: 2020-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Liu
GTID: 2518306548993829
Subject: Computer Science and Technology
Abstract/Summary:
In software development, it is common for developers to copy an existing piece of source code and modify it as needed. In software engineering this practice is called code reuse. Although code reuse makes software systems more convenient to develop, it leaves multiple identical or very similar code fragments in the system, which are called code clones. Code clones bloat the source code and increase the maintenance cost of the software system. They also degrade software quality, because they introduce and propagate code vulnerabilities. For these reasons, code clone detection has long been an active research topic. This thesis studies the code clone detection problem in depth. The main work and contributions are as follows:

1) A large-gap code clone detection method based on sequence alignment is proposed and implemented.

A large-gap clone is a complex type of code clone in which many statement blocks have been added, deleted, or modified within the cloned code fragments. Detecting and tracking such clones is of great significance. However, because the statement structure of large-gap clones differs so much, they are difficult to detect with conventional techniques. To address this, the thesis proposes a code clone detection method that combines code fingerprints with sequence alignment. An MD5-based code fingerprint generation algorithm represents each code fragment in the source code under analysis as a sequence of code fingerprints, which turns code clone detection into a matching problem between fingerprint sequences. Building on the Smith-Waterman algorithm, a sequence alignment algorithm with dynamic parameter optimization is proposed. It addresses the matching regions that are missed when the original Smith-Waterman algorithm is applied directly to cloned statements. Finally, based on the directed code clone pairs defined in the thesis, a clone identification criterion for large-gap clone detection is designed. The experimental results show that the approach performs well on large-gap clone detection while remaining competitive with advanced code clone detection tools on general clones.

2) A distributed, efficient clone detection platform for large-scale open source code is proposed and implemented.

Building a large-scale code clone detection platform is a valuable application of code clone detection research. However, building a large database of open source project code, and the code volume that results, place enormous storage and computational pressure on clone detection, which restricts the development of such a platform. To address this, the thesis proposes and implements a distributed, efficient clone detection platform for large-scale open source code. The core ideas are as follows. First, a code processing method based on code fingerprints is proposed, which greatly reduces the storage overhead of each open source project. Second, a real-time update system for open source projects is designed to build the project code database for GitHub. Third, a cluster-based distributed computing framework is designed to improve the detection efficiency of the platform. The experimental results show that the platform detects code clones effectively and is significantly more efficient than the stand-alone operation mode.
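The fingerprint-and-alignment idea in contribution 1 can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes one fingerprint per non-blank line (the abstract does not state the granularity), normalizes only whitespace, and uses the plain Smith-Waterman algorithm with fixed, arbitrarily chosen scores rather than the dynamic parameter optimization the thesis proposes. The function names `fingerprint_sequence` and `smith_waterman` are hypothetical.

```python
import hashlib
import re


def fingerprint_sequence(code: str) -> list[str]:
    """Map each non-blank line to an MD5 fingerprint (assumed granularity)."""
    prints = []
    for line in code.splitlines():
        normalized = re.sub(r"\s+", "", line)  # ignore whitespace differences
        if normalized:
            prints.append(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return prints


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Plain local alignment with fixed scores (an assumption of this sketch).

    Returns (best_score, pairs), where pairs lists the (index_in_a, index_in_b)
    positions whose fingerprints are identical in the best local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # statement present only in a
        else:
            j -= 1  # statement present only in b
    pairs.reverse()
    return best, pairs


original = "a = 1\nb = 2\nc = a + b\nprint(c)"
clone = "a = 1\nb  =  2\nlog('x')\nc = a + b\nprint(c)"  # one inserted statement
score, pairs = smith_waterman(fingerprint_sequence(original),
                              fingerprint_sequence(clone))
print(score, pairs)
```

In this toy case the inserted `log('x')` line becomes a single gap in the alignment, so the four shared statements are still matched. With fixed penalties, longer inserted blocks can drive the running score to zero and split the match, which is the kind of missed region the thesis attributes to using the original algorithm directly.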
Keywords/Search Tags: Code Clone, Large-gap Clone, Directed Clone Pair, Sequence Alignment, Distributed Computing