C Code Similarity Measurement Algorithm Based On Levenshtein Distance

Posted on: 2018-03-18   Degree: Master   Type: Thesis
Country: China   Candidate: J X Zhang   Full Text: PDF
GTID: 2428330569975169   Subject: Computer system architecture
Abstract/Summary:
Code plagiarism is very common, especially in C language courses. Detecting it by hand places a heavy workload on teachers and reduces the quality of teaching, so the main problem addressed here is how to detect code plagiarism automatically and accurately with a computer program. By analyzing source code from a C language course, the author proposes a code structure comparison algorithm based on a structure tree and validates it experimentally on that course code, finding that programs with similar internal logic structures are more likely to be plagiarized. An analysis of code written by C beginners shows four most commonly used types of plagiarism: adding, deleting, or modifying comments; changing identifier names or data types; changing the location of function definitions; and adding redundant code. Four corresponding formatting algorithms are proposed, one for each type. Combining the structure-tree comparison algorithm with the code formatting algorithms, the author proposes a C code similarity measurement algorithm based on Levenshtein distance. The experimental results show that this algorithm detects the four kinds of plagiarism above with high accuracy. Finally, the author designed and implemented a C code plagiarism detection system on Linux, using C and PHP. The system also lets users view the detection history, inspect the source code, and compare code differences, and it can serve as a teaching aid that helps teachers detect code plagiarism automatically.
Keywords/Search Tags:Code Plagiarism, Code Block Tree, Structural Similarity, Code Formatting, Similarity Measurement
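
For readers unfamiliar with the underlying metric, the following is a minimal sketch of a Levenshtein-distance similarity score in C. It is not the thesis's algorithm: the abstract does not give the scoring formula, so normalizing by the longer string's length is an assumption, the function names `levenshtein` and `similarity` are hypothetical, and the sketch compares raw strings rather than the formatted code and structure trees the abstract describes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Classic Levenshtein (edit) distance between two C strings, computed
 * with two rolling rows of the dynamic-programming table.
 * Returns (size_t)-1 if memory allocation fails. */
size_t levenshtein(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    size_t *prev = malloc((lb + 1) * sizeof *prev);
    size_t *curr = malloc((lb + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }
    /* Distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= lb; j++) {
        prev[j] = j;
    }
    for (size_t i = 1; i <= la; i++) {
        curr[0] = i; /* delete all i characters of a's prefix */
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = min3(prev[j] + 1,         /* deletion */
                           curr[j - 1] + 1,     /* insertion */
                           prev[j - 1] + cost); /* substitution or match */
        }
        size_t *tmp = prev; /* the finished row becomes the previous row */
        prev = curr;
        curr = tmp;
    }
    size_t d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

/* ASSUMPTION: similarity = 1 - distance / max(len(a), len(b)).
 * This normalization is illustrative and not taken from the thesis.
 * Returns a value in [0, 1], or -1.0 if allocation fails. */
double similarity(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    if (longest == 0) {
        return 1.0; /* two empty strings are treated as identical */
    }
    size_t d = levenshtein(a, b);
    if (d == (size_t)-1) {
        return -1.0;
    }
    return 1.0 - (double)d / (double)longest;
}
```

Under these assumptions, `similarity` would be called on two submissions after a normalization pass (for example, after comments are removed and identifiers renamed consistently), so that cosmetic edits contribute little to the distance.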