Code Plagiarism Detection Research Based On Suffix Tree

Posted on:2012-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Li

GTID:2178330335974901

Subject:Computer application technology

Abstract/Summary:

With the continuous development of information technology, plagiarism is becoming increasingly easy to commit and difficult to prevent. Code plagiarism is also common among students in programming course assignments and online test evaluation. Monash University in Australia once conducted a survey on plagiarism in programming courses; its statistics show that more than 85.4% of students admitted to plagiarism in homework and tests. The growing prevalence of plagiarism has seriously undermined the normal order of teaching and has harmed both teaching quality and the development of students' abilities. To curb this poor academic ethos, research on efficient code plagiarism detection methods is increasingly necessary.

In this paper, after analyzing the state of the art in plagiarism detection techniques, we propose a suffix tree-based method for detecting similarity in C code that targets common means of plagiarism. First, we use the open source tool ANTLR (Another Tool for Language Recognition) to generate a lexical and syntax analyzer for the C language, perform lexical and syntax analysis on the C code, and generate a suffix tree of the code. Second, we eliminate redundant information by optimizing the syntax tree and then generate the sequence of the suffix tree. Third, we use an improved GST (Greedy-String-Tiling) string matching algorithm to match the suffix tree sequences. Finally, we select an appropriate decision function to calculate the similarity of the two source codes and determine whether plagiarism has occurred.

Based on this method, we designed and implemented an experimental plagiarism detection system that can detect the similarity of every pair of programs in a program set. We also developed a sample set of program code that follows the common ways of plagiarizing code. The experimental results show that the method achieves higher detection accuracy, because the hierarchical relationship of the nodes in the character sequence reflects the original code logic and some semantic information. Compared with the MOSS system, the experimental system is superior in both accuracy and efficiency.
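
As an illustration of the matching and decision steps in the abstract, the following is a minimal Python sketch of the basic Greedy-String-Tiling algorithm applied to two token sequences. It is not the improved variant used in the thesis, whose details the abstract does not give. The function names, the `min_match_len` parameter, and the similarity formula (twice the matched length divided by the total length of both sequences) are assumptions chosen for illustration; the thesis's actual decision function may differ.

```python
def greedy_string_tiling(a, b, min_match_len=3):
    """Basic GST sketch: return non-overlapping tiles (start_a, start_b, length)."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match_len
        matches = []
        # Find all maximal common runs of unmarked tokens in this round.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
        if not matches:
            break
        # Mark the longest matches as tiles, skipping any that overlap
        # a tile already placed in this round.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def similarity(a, b, min_match_len=3):
    """Hypothetical decision function: share of tokens covered by tiles."""
    if not a and not b:
        return 0.0
    covered = sum(k for _, _, k in greedy_string_tiling(a, b, min_match_len))
    return 2.0 * covered / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical token sequences; the second swaps two blocks of the first.
    original = list("ABCDEFGH")
    reordered = list("EFGHABCD")
    print(greedy_string_tiling(original, reordered))
    print(similarity(original, reordered))
```

Because tiles are matched independently of their position, swapping whole blocks of code leaves the score unchanged in this sketch, which is one reason tiling-based matching is used against reordering as a plagiarism technique. A pair of programs would then be flagged when the score exceeds a chosen threshold.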

Keywords/Search Tags:

Plagiarism detection, Code similarity, Suffix tree, ANTLR, Greedy-String-Tiling, Decision Function

Related items

1	A Research On Program Coding-oriented Plagiarism Detection Techniques By AST-based Strategy
2	Research On Similarity Measure Method Of Program Code
3	Research And Implementation Of Code Plagiarism Detection Based On Subtree Tracking
4	Open Electronic Document Plagiarism Detection Services To Build Technology Research
5	Research Of Ch-En Cross-Lingual Plagiarism Detection Based On Translation Features And Contents
6	C Code Similarity Measurement Algorithm Based On Levenshtein Distance
7	Research Of Source Code Plagiarism Detection Method Based On N-gram
8	Research And Application Of Plagiarism Detection Technology Based On Code Style Classification
9	Research On Code Similarity Detection Algorithm Based On Deep Learning
10	Research And Application Of Program Code Similarity Detection Method