
Design And Implementation Of Code Clone Analysis System Based On Sequence Matching

Posted on: 2010-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Xin
Full Text: PDF
GTID: 2178360272470120
Subject: Computer application technology
Abstract/Summary:
With the development of computer technology and the growing demand for software, code reuse receives increasing emphasis in software development and re-engineering. A good approach to code reuse reflects good design, reduces development costs, and improves software quality, while a poor one can introduce many problems. Analyzing code reuse has therefore become a concern. Code clone analysis, as an effective method for evaluating code reuse, plays an important role in software development, maintenance, and quality assurance.

Code clone detection is a technique for detecting and analyzing reused code segments. By applying detection algorithms, it finds code structures within the source code that are similar in text structure or semantic logic, at different granularities. It has been applied in fields such as program analysis, software comprehension, software quality analysis, program plagiarism detection, system evolution, and software re-engineering. This thesis presents a system based on this technique that uses the Smith-Waterman algorithm to detect clones.

In practice, traditional clone detection is either text-based or tree-based. The former is fast but has low accuracy, while the latter has high accuracy but also a high computational cost. This thesis investigates abstract representations of source code, clone detection granularities, and code clone detection techniques. Based on the common clone detection process and matching algorithms, it implements a method for generating code sequences from the Abstract Syntax Tree (AST) and proposes a clone detection method that combines tree-based detection with a sequence matching algorithm.

A code clone analysis system is then designed and implemented. In the front end, the AST generated by lexical and syntax analysis is flattened, and the transformed code is stored in a hash table as sequences, which allows the detection granularity to be controlled. Both of these measures greatly reduce the input data for detection. The back end consists mainly of a clone dot-plot view, a clone file-bar view, and an extensible code browser with syntax highlighting. The system provides its output as text, CSV, and XML, which facilitates further integration and development. It can serve as a basis that helps developers with software comprehension, software re-engineering, and quality assurance.
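The combination of AST flattening and sequence matching described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state the implementation language, the flattening order, or the scoring scheme, so Python's standard `ast` module stands in for the front end, a pre-order walk over node type names is an assumed flattening, and the `match`, `mismatch`, and `gap` values are arbitrary illustrative choices. The function names `flatten` and `smith_waterman` are hypothetical.

```python
import ast


def flatten(source):
    """Flatten a Python AST into a pre-order list of node type names.

    Assumption: identifiers and literals are dropped, so fragments that
    differ only in variable names yield the same sequence.
    """
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, (a_start, a_end), (b_start, b_end)), where the
    ranges are half-open slices of a and b covered by the best alignment.
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


if __name__ == "__main__":
    left = flatten("x = a + b")
    right = flatten("total = p + q")
    print(smith_waterman(left, right))
```

In this sketch, two fragments that differ only in identifier names flatten to identical sequences, so the local alignment covers both in full. A real detector would additionally apply a minimum score or length threshold before reporting a clone; the abstract does not say what threshold the system uses.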
Keywords/Search Tags: Code Clone, Sequence Matching, Abstract Syntax Tree, Smith-Waterman