
Research On Source Code Plagiarism Detection Based On Abstract Syntax Tree

Posted on: 2018-04-17 | Degree: Master | Type: Thesis
Country: China | Candidate: D Q Fu | Full Text: PDF
GTID: 2428330575998758 | Subject: Computer software and theory
Abstract/Summary:
In code plagiarism detection, judging programs that share common fragments or have similar structures has always been a major challenge. Traditional plagiarism detection methods based on text matching cannot give accurate results for code with these characteristics. This paper proposes two source code plagiarism detection methods, ASTK (Abstract Syntax Tree Kernel) and WASTK (Weighted ASTK). Exploiting the hierarchical structure of program code, they convert a program's source code into an abstract syntax tree and then obtain the similarity by computing the tree kernel of the two syntax trees. In addition, WASTK applies an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from information retrieval to address these problems: depending on the frequency of each piece of code, the corresponding node in the abstract syntax tree is assigned a TF-IDF weight. ASTK and WASTK are evaluated on several datasets and perform much better than other popular tools such as Sim and JPlag.
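The pipeline described above (parse to an abstract syntax tree, compute a tree kernel, optionally weight nodes by frequency) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes Python's built-in `ast` module as the parser, a subset-tree kernel in the style of Collins and Duffy as the kernel, and a smoothed inverse-document-frequency weight on node types as a stand-in for the TF-IDF weighting. All function names and the `decay` parameter are hypothetical.

```python
import ast
import math
from collections import Counter


def label(node):
    # Use the AST node type as the label, so identifier names are ignored.
    return type(node).__name__


def children(node):
    return list(ast.iter_child_nodes(node))


def production(node):
    # A node's label together with the ordered labels of its children.
    return (label(node), tuple(label(c) for c in children(node)))


def common_subtrees(n1, n2, weight, decay, memo):
    # Weighted count of tree fragments rooted at both n1 and n2.
    key = (id(n1), id(n2))
    if key in memo:
        return memo[key]
    if production(n1) != production(n2):
        result = 0.0
    else:
        result = decay * weight(label(n1))
        for c1, c2 in zip(children(n1), children(n2)):
            result *= 1.0 + common_subtrees(c1, c2, weight, decay, memo)
    memo[key] = result
    return result


def tree_kernel(t1, t2, weight, decay):
    # Sum the fragment counts over every pair of nodes from the two trees.
    memo = {}
    return sum(
        common_subtrees(a, b, weight, decay, memo)
        for a in ast.walk(t1)
        for b in ast.walk(t2)
    )


def similarity(src1, src2, weight=lambda _label: 1.0, decay=0.5):
    # Normalized kernel: K(a, b) / sqrt(K(a, a) * K(b, b)).
    t1, t2 = ast.parse(src1), ast.parse(src2)
    k11 = tree_kernel(t1, t1, weight, decay)
    k22 = tree_kernel(t2, t2, weight, decay)
    if k11 == 0.0 or k22 == 0.0:
        return 0.0
    return tree_kernel(t1, t2, weight, decay) / math.sqrt(k11 * k22)


def idf_weights(sources):
    # Assumption: weight each node type by a smoothed IDF over a corpus,
    # so constructs found in almost every program contribute less.
    trees = [ast.parse(s) for s in sources]
    doc_freq = Counter()
    for tree in trees:
        doc_freq.update({label(n) for n in ast.walk(tree)})
    n_docs = len(trees)
    table = {
        lab: math.log((1 + n_docs) / (1 + df)) + 1.0
        for lab, df in doc_freq.items()
    }
    unseen = math.log(1 + n_docs) + 1.0
    return lambda lab: table.get(lab, unseen)


if __name__ == "__main__":
    original = "def f(x):\n    return x + 1\n"
    renamed = "def g(y):\n    return y + 1\n"
    other = "for i in range(3):\n    print(i)\n"
    weights = idf_weights([original, renamed, other])
    print(similarity(original, renamed))
    print(similarity(original, other))
    print(similarity(original, renamed, weights))
```

Because node labels are AST node types rather than identifier text, renaming variables or functions leaves the similarity unchanged in this sketch, which is the kind of disguise that plain text matching tends to miss. Calling `similarity` without a `weight` argument corresponds loosely to the unweighted (ASTK-like) case, and passing the result of `idf_weights` to the weighted (WASTK-like) case.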
Keywords/Search Tags: Abstract Syntax Tree, Tree Kernel, TF-IDF