Software Plagiarism Detection Algorithm Based On Abstract Syntax Tree

Posted on: 2012-12-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J S Li
GTID: 2178330335960870 | Subject: Signal and Information Processing
Abstract/Summary:
Software plagiarism detection plays an important role in plagiarism investigation and software evaluation. Software plagiarism mainly takes the form of copy-and-paste, sometimes followed by minor modifications that do not change the function of the code. Current plagiarism detection tools fall into three kinds: text-based, token-based, and syntax-structure-based. Because they do not take syntax into account, the first two kinds have obvious limitations. This thesis proposes a source code comparison algorithm based on the abstract syntax tree (AST). Exploiting the characteristics of syntax trees, the algorithm calculates their hash values, transforms their storage form, and then compares them node by node, which improves the efficiency of the comparison. When calculating hash values, the algorithm also gives special treatment to cases in which exchanging the positions of operands changes the semantics, such as subtraction and division, in order to reduce the false alarm rate.

The thesis first introduces the research background of software plagiarism detection and the related knowledge of abstract syntax trees, including the basics of compiler theory and the structure of the abstract syntax tree. It then presents the proposed algorithm, focusing on how abstract syntax trees are compared through their hash values. Finally, a detailed experimental evaluation shows the actual effect of the algorithm; the results indicate that it performs well in code comparison.
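
The idea of hashing syntax trees while treating order-sensitive operators separately can be illustrated with a minimal sketch. It is not the thesis's implementation: the use of Python's `ast` module, the SHA-1 digest, the helper names (`subtree_hash`, `fingerprint`, `similarity`), the choice to ignore identifier names and literal values, and the multiset-overlap score are all assumptions made for illustration.

```python
import ast
import hashlib
from collections import Counter

# Assumption: these operators are treated as commutative, so operand order
# is ignored. This is a simplification (e.g. "+" on strings is not commutative).
COMMUTATIVE = (ast.Add, ast.Mult, ast.BitAnd, ast.BitOr, ast.BitXor)


def _digest(*parts):
    """Combine a node label and child hashes into one hex digest."""
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def subtree_hash(node, table):
    """Hash the subtree rooted at `node` and record it in `table`.

    Only node types are hashed, so renaming variables does not change
    the result (an illustrative choice, not taken from the thesis).
    """
    label = type(node).__name__
    if isinstance(node, ast.BinOp):
        operands = [subtree_hash(node.left, table),
                    subtree_hash(node.right, table)]
        if isinstance(node.op, COMMUTATIVE):
            # a + b and b + a should hash the same.
            operands.sort()
        # For subtraction, division, etc. the operand order is kept,
        # so a - b and b - a are not reported as identical.
        h = _digest(label, type(node.op).__name__, *operands)
    else:
        kids = [subtree_hash(c, table) for c in ast.iter_child_nodes(node)]
        h = _digest(label, *kids)
    table.append(h)
    return h


def fingerprint(source):
    """Return the list of all subtree hashes of a piece of source code."""
    table = []
    subtree_hash(ast.parse(source), table)
    return table


def similarity(src_a, src_b):
    """Score in [0, 1]: share of subtree hashes common to both programs."""
    a, b = Counter(fingerprint(src_a)), Counter(fingerprint(src_b))
    shared = sum((a & b).values())
    return 2 * shared / (sum(a.values()) + sum(b.values()))


if __name__ == "__main__":
    # Operands swapped around a commutative operator, variables renamed.
    print(similarity("x = a + b * c", "y = q * p + r"))
    # Same swap around subtraction, where order changes the meaning.
    print(similarity("x = a - b * c", "y = q * p - r"))
```

In this sketch the first pair is scored as fully similar, because addition is treated as commutative and names are ignored, while the second pair scores lower, because swapping the operands of a subtraction changes the hash of the enclosing nodes.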
Keywords/Search Tags: software plagiarism detection, syntax tree, hash value