
Code Plagiarism Detection Research Based On Suffix Tree

Posted on: 2012-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
Full Text: PDF
GTID: 2178330335974901
Subject: Computer application technology
Abstract/Summary:
With the continuous development of information technology, plagiarism is becoming ever easier to commit and harder to prevent. Code plagiarism is also common among students in programming course assignments and online tests. Monash University in Australia once conducted a survey on plagiarism in programming courses; the statistics show that more than 85.4% of students admitted to plagiarism in homework and tests. The growth of plagiarism has seriously undermined the normal order of teaching and has harmed both teaching quality and the development of students' abilities. To curb this poor academic ethos, research on efficient code plagiarism detection methods is increasingly necessary.
In this paper, after analyzing the state of the art in plagiarism detection techniques, we propose a suffix tree-based method for detecting similarity in C code that targets common means of plagiarism. First, we use the open-source tool ANTLR (Another Tool for Language Recognition) to generate a lexical and syntax analyzer for the C language, perform lexical and syntax analysis on the C code, and generate a suffix tree of the code. Second, we eliminate redundant information by optimizing the syntax tree and then generate the sequence of the suffix tree. Third, we use an improved GST (Greedy-String-Tiling) string matching algorithm to match the suffix tree sequences. Finally, we select an appropriate decision function to calculate the similarity of two source files and then determine whether plagiarism has occurred.
Based on this method, we designed and implemented an experimental plagiarism detection system that can compute the similarity of every pair of programs in a program set. We also developed a sample set of program code that reflects common code plagiarism techniques. The experimental results show that the method achieves higher detection accuracy, because the hierarchical relationship of the nodes in the character sequence reflects the original code logic and some semantic information. Compared with the MOSS system, the experimental system is superior in both accuracy and efficiency.
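The matching and decision steps described above can be illustrated with a minimal sketch. It implements the basic Greedy-String-Tiling algorithm, not the improved variant used in the thesis, whose details the abstract does not give. The token names, the `min_match` parameter, the similarity formula (twice the tiled length divided by the combined sequence length, a commonly used measure) and the 0.8 threshold are all assumptions made for illustration.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Basic Greedy-String-Tiling over two token sequences.

    Returns a list of tiles (start_a, start_b, length). This is the
    textbook algorithm, not the improved variant from the thesis.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Phase 1: find the longest common runs of unmarked tokens.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
                elif k == max_match:
                    matches.append((i, j, k))
        if not matches:
            break
        # Phase 2: turn non-overlapping maximal matches into tiles.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue  # overlaps a tile created earlier in this round
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def similarity(a, b, min_match=3):
    """Assumed decision function: 2 * tiled tokens / (len(a) + len(b))."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2.0 * covered / (len(a) + len(b))


def likely_plagiarised(a, b, threshold=0.8, min_match=3):
    """Hypothetical threshold rule; 0.8 is an illustrative value only."""
    return similarity(a, b, min_match) >= threshold


# Hypothetical node sequences, e.g. from a serialized tree of two C programs
# in which the second program reorders blocks of the first.
seq_a = ["FUNC", "DECL", "ASSIGN", "LOOP", "CALL", "RETURN"]
seq_b = ["FUNC", "LOOP", "CALL", "RETURN", "DECL", "ASSIGN"]
tiles = greedy_string_tiling(seq_a, seq_b, min_match=2)
score = similarity(seq_a, seq_b, min_match=2)
```

Because tiles are matched independently of their position, reordering blocks of code lowers the score only slightly, which is why tiling-style matching suits the common plagiarism techniques the abstract mentions.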
Keywords/Search Tags:Plagiarism detection, Code similarity, Suffix tree, ANTLR, Greedy-String-Tiling, Decision Function