A Research On Program Coding-oriented Plagiarism Detection Techniques By AST-based Strategy

Posted on: 2011-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhao
GTID: 2178360305492669
Subject: Computer application technology
Abstract/Summary:
Plagiarism has long been common. In recent years it has become more serious and harder to spot, and a growing number of students condone plagiarism and engage in it themselves. In higher education, computer science programs are oriented toward engineering applications, so teachers emphasize practical coursework and computer-based examinations. Electronic texts, however, are easy to copy and distribute by computer. Together, these factors directly contribute to the spread of plagiarism. Researchers abroad surveyed code plagiarism in programming courses at Monash University, Australia; the statistics show that more than 85.4% of students admitted to having plagiarized others' programming assignments. These problems have seriously affected students' education and teachers' normal teaching. Code plagiarism detection techniques can find plagiarized code samples quickly and efficiently, which helps discourage poor study habits, improve teaching quality, and keep evaluation objective.

The author analyzes the state of the art in plagiarism detection techniques for program code, in China and abroad, and proposes a detection method based on the Abstract Syntax Tree (AST). The essential steps of the method are as follows. First, construct an AST for each C source file with the GCC compiler, and attach the semantics contained in the program code to each AST node. The AST carries a great deal of detailed information that is valuable for compilation. Second, optimize the structure of the AST by eliminating redundant nodes and preserving useful ones. Third, from the remaining nodes, create characteristic token strings. Unlike linear strings, these are sets of nodes, and they carry more of the program's semantics. Finally, compute the similarity between token pairs in the set of characteristic token strings with a decision function to complete the plagiarism detection. In addition, to identify existing plagiarized pairs effectively, the author proposes an adaptive similarity-threshold mechanism that chooses the threshold automatically for each detection run.

Beyond the theoretical work, the author also designs and implements an experimental system for AST-based plagiarism detection on program code. The prototype can optimize the AST structure, analyze nodes, and detect plagiarized code pairs automatically. Using a uniform set of C programs as the test corpus, the author compares the system's experimental results with those of MOSS. The results show that the AST-based system efficiently detects plagiarism in complex program structures disguised by various plagiarism techniques, including function calls.
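The steps described in the abstract (prune the AST, build set-like characteristic tokens, score pairs, pick a threshold adaptively) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, since the abstract gives no formulas: the `(kind, children)` tuple trees and node-type names are hypothetical and not GCC's actual tree codes, a multiset Jaccard measure stands in for the thesis's decision function, and "mean plus one standard deviation" stands in for its adaptive threshold.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical AST form: (node_kind, [children]). Kinds are illustrative only.
REDUNDANT = {"paren_expr", "nop", "cast"}  # assumed set of "redundant" nodes


def prune(node):
    """Drop redundant nodes and promote their children; returns a list of subtrees."""
    kind, children = node
    kept = [sub for child in children for sub in prune(child)]
    return kept if kind in REDUNDANT else [(kind, kept)]


def characteristic_tokens(node):
    """One token per node: its kind plus the sorted kinds of its children.

    Sorting makes a token insensitive to sibling reordering, so the result
    behaves like a set of nodes rather than a linear string.
    """
    kind, children = node
    tokens = Counter({(kind, tuple(sorted(c[0] for c in children))): 1})
    for child in children:
        tokens += characteristic_tokens(child)
    return tokens


def similarity(a, b):
    """Multiset Jaccard similarity in [0, 1] (an assumed decision function)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def adaptive_threshold(scores, k=1.0):
    """Assumed rule: flag pairs scoring k standard deviations above the mean."""
    return min(1.0, mean(scores) + k * pstdev(scores))


# x = x + 1; return x;
original = ("function", [
    ("assign", [("var", []), ("add", [("var", []), ("const", [])])]),
    ("return", [("var", [])]),
])
# Same program disguised with extra parentheses and swapped operands.
disguised = ("function", [
    ("assign", [("var", []),
                ("paren_expr", [("add", [("const", []), ("var", [])])])]),
    ("return", [("var", [])]),
])
# A structurally different program: a loop that calls a function.
unrelated = ("function", [
    ("while", [("less", [("var", []), ("const", [])]),
               ("call", [("var", [])])]),
])

programs = {"a": original, "b": disguised, "c": unrelated}
# prune() returns a list; the root is assumed not to be a redundant node.
tokens = {name: characteristic_tokens(prune(tree)[0])
          for name, tree in programs.items()}
scores = {pair: similarity(tokens[pair[0]], tokens[pair[1]])
          for pair in combinations(sorted(tokens), 2)}
threshold = adaptive_threshold(list(scores.values()))
flagged = [pair for pair, score in scores.items() if score >= threshold]
```

In this toy corpus the threshold is derived from the score distribution rather than fixed in advance, which is the general idea behind an adaptive threshold; a real system would build the trees from GCC's output and would need a far richer notion of which nodes are redundant.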
Keywords/Search Tags: Plagiarism Detection, the C Programming Language Code, Abstract Syntax Tree, Characteristic Token String, Similarity Computing, Decision Function