Design And Implementation Of Code Clone Analysis System Based On Sequence Matching

Posted on:2010-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:T Q Xin

GTID:2178360272470120

Subject:Computer application technology

Abstract/Summary:

With the development of computer technology and the growing demand for software, code reuse has become increasingly prominent in software development and re-engineering. A good approach to code reuse reflects good design, reduces development costs and improves software quality, while a bad approach can introduce many problems. Analyzing code reuse has therefore become a concern. Code clone analysis, as an effective method for evaluating code reuse, plays an important role in software development, maintenance and quality assurance.

Code clone detection is a technology that detects and analyzes reused code segments. By applying detection algorithms, it finds similar code structures within the source code at different granularities, in terms of either text structure or semantic logic. It has been applied in various fields, such as program analysis, software comprehension, software quality analysis, program plagiarism detection, system evolution and software re-engineering. This thesis presents a system based on this technology that uses the Smith-Waterman algorithm to detect clones.

In practice, traditional clone detection is either text-based or tree-based. The former is fast but has low accuracy, while the latter has high accuracy but also a high computational cost. This thesis investigates abstract representations of source code, clone detection granularities, and code clone detection technologies. Based on the common clone detection process and matching algorithms, it implements a method for generating code sequences from the Abstract Syntax Tree (AST), and presents a code clone detection method that combines tree-based detection with a sequence matching algorithm.

A code clone analysis system is then designed and implemented. In the front-end, the AST generated by lexical and syntax analysis is flattened, and the transformed code is stored in a hash table as sequences, which allows the detection granularity to be controlled. Both steps greatly reduce the input data for detection. The back-end consists mainly of a clone dot-plot view, a clone file-bar view and an extensible code browser with syntax highlighting. The system provides output as text, CSV and XML, which facilitates further integration and development. It can serve as a basis that helps developers with software comprehension, software re-engineering and quality assurance.
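
The combination of AST flattening and Smith-Waterman matching described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `flatten` and `smith_waterman`, the use of Python's built-in `ast` module as the parser, the choice of node type names as sequence elements, and the scoring values (match 2, mismatch -1, gap -1) are all assumptions made for illustration.

```python
import ast


def flatten(source):
    """Flatten a Python AST into a pre-order sequence of node type names.

    Identifier names and literal values are dropped (an assumption of this
    sketch), so renamed variables still produce the same sequence.
    """
    sequence = []

    def visit(node):
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sequence


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, (i, j)) of the best local alignment of a and b.

    (i, j) are the 1-based positions in a and b where that alignment ends.
    Scoring values are illustrative defaults, not taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a new local alignment
                h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    seq_a = flatten("total = price + tax")
    seq_b = flatten("result = left + right")
    score, end = smith_waterman(seq_a, seq_b)
    print(seq_a == seq_b, score, end)
```

Because the two statements in the usage example differ only in identifier names, their flattened sequences are identical and the local alignment spans the whole sequence. A real detector would additionally trace back through the score matrix to recover the aligned region and would apply a score threshold to decide what counts as a clone.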

Keywords/Search Tags:

Code Clone, Sequence Matching, Abstract Syntax Tree, Smith-Waterman

Related items

1	Automatically Based On The Abstract Syntax Tree And Static Analysis Of The Cloned Code Refactoring
2	Code Clone Detection Based On Sequence Alignment And Byte Code
3	Code Clone Detection Based On Sequence Alignment And Deep Learning
4	Detection Of Function-based The Structural Clone And The Semantic Clone
5	Research On Code Clone Detection Based On Deep Learning
6	Pyreview:A Python Source Code Analysis Tool Based On Abstract Syntax Tree Differencing Algorithm
7	Research On Clone Detection Based On Intermediate Representation Of Source Code
8	Research And Implementation Of Code Clone Detection Technology Based On Deep Learning
9	Research On Source Code Plagiarism Detection Based On Abstract Syntax Tree
10	Design And Implementation Of Abstract Syntax Tree Based Code Defect Detection