Research On Techniques Of Code Clone Detection Based On Indexing And Sequence Match

Posted on:2016-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Shu

GTID:2308330467982301

Subject:Computer application technology

Abstract/Summary:

Copying, pasting, and then editing code is a common practice in software development. This form of reuse usually leaves many duplicated or similar code fragments, known as code clones, in the code base. Code reuse is convenient for developers, but it consumes resources and makes software harder to maintain. Since the 1990s, both academia and industry have analyzed code clones and studied methods for detecting them. As software products grow, the original methods hit resource bottlenecks and can no longer run on a single machine. For example, similarity comparison may have to be carried out across large-scale software systems developed by different companies and organizations. If a detection method cannot report code clones in time, the effectiveness of the clone detection approach drops significantly.

This thesis discusses the basic principles and key technologies of traditional code clone detection methods and presents two approaches: index-based detection and sequence-match detection. The author confirms that the proposed methods are effective for code clone detection in large-scale software systems. The main work includes:

(1) An index-based code clone detection method. It normalizes source code into lexeme sequences and statement segments, and then uses the hash values of the segments to find cloned code. Because the lexeme sequences persist throughout the detection process, the method avoids repeatedly generating intermediate sequences and re-fragmenting them, which significantly improves detection speed.

(2) An improved Smith-Waterman algorithm that detects code clones on lexeme sequences. By adjusting the score matrix, the method effectively solves the mosaic problem that occurs in the conventional Smith-Waterman algorithm, and it improves the backtracking step to obtain the best locally similar code sequence.

(3) Experiments on four software systems written in Java. The results show that, compared with the traditional approach, the index-based approach improves detection efficiency without sacrificing precision or recall. Compared with the conventional Smith-Waterman algorithm, the improved one increases precision by about 1% and recall by about 2%.
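
The index-based idea in item (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the tokenizer, the tiny keyword set, the window size of three statements, the SHA-256 hash, and the names `normalize`, `build_index`, and `clone_candidates` are all assumptions made for illustration. Statements are normalized so that renamed identifiers and changed literals still hash to the same value, and every window of consecutive statements is stored in a hash index.

```python
import hashlib
import re
from collections import defaultdict

# Tiny illustrative keyword subset (assumption); a real tool would use the full Java grammar.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}


def normalize(statement):
    """Map identifiers to ID and integer literals to NUM; keep keywords and punctuation."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", statement)
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)


def build_index(files, window=3):
    """Hash every window of `window` normalized statements to the places it occurs."""
    index = defaultdict(list)
    for name, statements in files.items():
        normalized = [normalize(s) for s in statements]
        for start in range(len(normalized) - window + 1):
            chunk = "\n".join(normalized[start:start + window])
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            index[digest].append((name, start))
    return index


def clone_candidates(index):
    """Windows whose hash occurs at more than one location are clone candidates."""
    return [locations for locations in index.values() if len(locations) > 1]


if __name__ == "__main__":
    # Hypothetical input: each file is already split into statements.
    files = {
        "A.java": ["int a = b + 1;", "if (a > 10) {", "return a;", "}"],
        "B.java": ["int x = y + 2;", "if (x > 99) {", "return x;", "}", "int z = 0;"],
    }
    for locations in clone_candidates(build_index(files)):
        print(locations)
```

Because the index is keyed by hash, the normalized statements of each file only need to be produced once; adding a file means hashing its windows and looking them up, rather than comparing it pairwise against every other file.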
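
For reference, the conventional Smith-Waterman local alignment that item (2) builds on can be sketched over token sequences as follows. This is the textbook algorithm, not the improved variant proposed in the thesis: it does not include the score-matrix adjustment for the mosaic problem or the modified backtracking, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name `smith_waterman` are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned slice of a, aligned slice of b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below zero, which keeps the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Backtrack from the best cell until a zero cell is reached.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_a], b[j:end_b]


if __name__ == "__main__":
    # Hypothetical normalized lexeme sequences from two code fragments.
    left = ["int", "ID", "=", "ID", "+", "NUM", ";", "return", "ID", ";"]
    right = ["ID", "=", "ID", "-", "NUM", ";", "return", "ID", ";", "}"]
    print(smith_waterman(left, right))
```

The quadratic score matrix is what makes this approach expensive on large code bases, which is one reason a sequence-match detector is usually applied to candidate regions rather than to whole systems at once.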

Keywords/Search Tags:

software maintenance, clone detection, index, sequence match

Related items

1	Research On Code Clone Extension Analysis And Management Technology
2	Research On Software Clone Genealogies Construction And Evolution Features Extraction
3	Research On Analysis And Consistency Maintenance Of Code Clone Based On Software Evolution
4	Research On Large Scale And Efficient Code Clone Detection Method Based On Sequence Alignment
5	Research On Code Clone Detection And Clone Bug Finding
6	Research On Clone Code Consistency Maintenance Based On Clone Genealogy
7	Code Clone Detection Based On Sequence Alignment And Deep Learning
8	Research And Implementation Of Software Maintenance Technology For College Lab Based On Lan
9	The Amorphous Clone Code Detection And Reconstruction System Design And Implementation
10	Quality Analysis And Improvement Of The Code Clone On Large Software Systems Maintenance