Research And Design Of Source Code Homology Detection System Based On Text And Abstract Syntax Tree Compare

Posted on:2012-12-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Z Ren

Full Text:PDF

GTID:2178330335460872

Subject:Signal and Information Processing

Abstract/Summary:


With the development of the computer industry and the enormous economic and social benefits it brings, the value of software is receiving more and more attention. How to protect software intellectual property rights and the rights and interests of software developers is therefore worth studying. Source code homology identification, which finds the similarity between two or more software systems, can provide strong evidence in software intellectual property cases.

At present, source code homology identification techniques fall into two categories: those that work at the text level and those that work at the level of grammatical structure. Text-based techniques detect identical or similar character sequences in the source files and derive the similarity of two or more files from them. They are simple and effective against low-level code plagiarism, and many mature tools exist. Techniques based on grammatical structure generate an abstract syntax tree (AST) for each source file and then compare the structural information of the trees to determine how similar the files are. These are deeper identification techniques that can detect much more complex code plagiarism, but because of their complexity, few tools are currently available.

Integrated measurement of software homology is a relatively new field. How to use a variety of homology identification methods effectively and combine their results reasonably is worth studying. Integrated measurement methods fall into two categories, quantitative and qualitative, which suit different application scenarios.

Based on theoretical study and a large number of experiments, this thesis describes a system for identifying source code homology.
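The two identification levels can be illustrated with a minimal sketch. It assumes Python source files and uses only the standard-library `difflib` and `ast` modules; the thesis does not state which programming language it targets or which matching algorithm it uses, and the helper names (`normalize_lines`, `text_similarity`, `ast_node_types`, `ast_similarity`) are hypothetical. At the text level, normalized lines are compared with `difflib.SequenceMatcher`; at the AST level, pre-order sequences of node-type names are compared the same way, which is a simplification of full tree comparison.

```python
from __future__ import annotations

import ast
import difflib


def normalize_lines(source: str) -> list[str]:
    """Drop '#' comments, blank lines and extra whitespace (Python-only heuristic)."""
    lines = []
    for line in source.splitlines():
        code = line.split("#", 1)[0].strip()  # naive: also cuts a '#' inside strings
        if code:
            lines.append(" ".join(code.split()))  # collapse runs of whitespace
    return lines


def text_similarity(a: str, b: str) -> float:
    """Text level: proportion of matching normalized lines, in [0, 1]."""
    return difflib.SequenceMatcher(
        None, normalize_lines(a), normalize_lines(b), autojunk=False
    ).ratio()


def ast_node_types(source: str) -> list[str]:
    """Pre-order list of AST node-type names.

    Identifier names, literal values and comments are not recorded, so renaming
    variables or editing comments leaves the sequence unchanged.
    """
    out: list[str] = []

    def visit(node: ast.AST) -> None:
        out.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return out


def ast_similarity(a: str, b: str) -> float:
    """AST level: compare the node-type sequences of the two trees, in [0, 1]."""
    return difflib.SequenceMatcher(
        None, ast_node_types(a), ast_node_types(b), autojunk=False
    ).ratio()


ORIGINAL = "def add(x, y):\n    return x + y\n"
RENAMED = "def plus(p, q):\n    # sum of two values\n    return p + q\n"

print(f"text={text_similarity(ORIGINAL, RENAMED):.2f}")  # text=0.00
print(f"ast={ast_similarity(ORIGINAL, RENAMED):.2f}")    # ast=1.00
```

Renaming the function and its parameters and adding a comment drives the text score to zero while the AST score stays at 1, the kind of deeper plagiarism the abstract attributes to syntax-tree comparison. Flattening the tree into a node-type sequence discards some shape information; tree edit distance or subtree hashing would be more faithful alternatives.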
The system implements both text-level and AST-level identification techniques and proposes two integrated homology measurement algorithms, multiple linear regression analysis and hierarchical analysis (the analytic hierarchy process), for identifying homologous source code. Together they provide both quantitative and qualitative analysis of the results, making the identification more comprehensive.
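The two integration methods can be sketched as follows, assuming each file pair has already been scored at the text and AST levels and, for the regression, that some pairs carry a reference homology rating (for example from expert review). This is not the thesis's own implementation: the function names (`fit_linear_regression`, `predict`, `ahp_weights`) are hypothetical, the training data is synthetic (generated from a known linear rule so the fit can be checked), the AHP weights use the common geometric-mean approximation, and the pairwise judgment matrix is a made-up example.

```python
from __future__ import annotations

import math


def fit_linear_regression(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bk*xk; returns [b0, ..., bk].

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination,
    where A is X with a leading column of ones for the intercept.
    """
    A = [[1.0, *row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(A, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few samples")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef: list[float], x: list[float]) -> float:
    """Integrated homology score for one file pair's per-level similarities."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Saaty's random consistency index RI for matrix orders 1..9.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(pairwise: list[list[float]]) -> tuple[list[float], float]:
    """AHP criterion weights (geometric-mean method) and consistency ratio CR."""
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    w = [g / sum(gm) for g in gm]
    # Estimate lambda_max from A*w ~ lambda_max*w.
    aw = [sum(a * wj for a, wj in zip(row, w)) for row in pairwise]
    lam = sum(v / wi for v, wi in zip(aw, w)) / n
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n - 1]
    return w, (ci / ri if ri > 0 else 0.0)  # CR < 0.1 is the usual acceptance rule


# Synthetic pairs: [text_score, ast_score] -> reference rating (not real data).
X = [[0.2, 0.1], [0.5, 0.4], [0.9, 0.7], [0.3, 0.8], [0.7, 0.2]]
y = [0.1 + 0.3 * t + 0.6 * s for t, s in X]
coef = fit_linear_regression(X, y)
print([round(c, 3) for c in coef])  # [0.1, 0.3, 0.6]

# Hypothetical judgment: AST-level evidence is 3x as important as text-level.
weights, cr = ahp_weights([[1, 1 / 3], [3, 1]])
print([round(w, 2) for w in weights], cr)  # [0.25, 0.75] 0.0
print(sum(w * s for w, s in zip(weights, [0.4, 0.9])))  # weighted score for one pair
```

In this sketch the regression yields a numeric combined score fitted to reference ratings (the quantitative side), while AHP turns pairwise judgments into weights (the qualitative side). With only two criteria any reciprocal judgment matrix is perfectly consistent, so the consistency ratio only becomes informative with three or more criteria.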

Keywords/Search Tags:

source code homology, text comparison, abstract syntax tree, integrated measurement


Related items

1	Design And Implement Of Software Defects Detection System Based On Source Code Homology Detection Technology
2	Research On Source Code Plagiarism Detection Based On Abstract Syntax Tree
3	Automatically Based On The Abstract Syntax Tree And Static Analysis Of The Cloned Code Refactoring
4	Study On Software Homology Detection Technology Based On Ast Structure Optimization And CFG Comparison
5	Short Text Similarity Research Based On Abstract Syntax Tree
6	Design And Implementation Of Abstract Syntax Tree Based Code Defect Detection
7	Pyreview:A Python Source Code Analysis Tool Based On Abstract Syntax Tree Differencing Algorithm
8	Development Of Static Code Defect Detection Tool Based On Abstract Syntax Tree
9	An Optimization Algorithm For Homology Comparison Based On Abstract Syntax Tree
10	Code Homology Analysis Based On Ast And Improved Particle Swarm Optimization Algorithm