
Design And Implementation Of Code Duplicate Checking Software

Posted on: 2022-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: R H Duan
GTID: 2518306605970209
Subject: Master of Engineering
Abstract/Summary:
Code duplicate checking is the process of detecting, during software development and maintenance, whether one piece of code has been plagiarized from or borrowed from another. Such copying is generally called code cloning, and the similar code fragments are called clone code. As large amounts of open-source code have become available, code plagiarism has become widespread in colleges and universities. It not only prevents teachers from knowing how well students are actually learning but also seriously undermines the fairness of examinations. Because checking for code clones by hand is laborious and difficult, software that detects code plagiarism and judges the similarity of students' submissions is clearly needed. To help teachers understand students' programming ability in the course, and to discover and combat plagiarism in homework code in a timely way, this thesis proposes and implements duplicate-checking software for students' homework code. The main research work is as follows:

(1) An efficient code preprocessing pipeline is designed around the textual characteristics of student code. By studying a large number of student submissions, the common patterns students use when plagiarizing code are summarized. The token sequence of each student code block is extracted from its abstract syntax tree, a global token frequency map is built from the token frequencies in student code and used to sort each sequence, and candidate code pairs are then compared against a preset similarity threshold so that redundant pairs are filtered out. This completes preprocessing and improves the performance of the whole duplicate-checking software.

(2) Targeting the characteristics of student-written code, a code duplicate-checking algorithm based on the longest common subsequence (LCS) and the abstract syntax tree is designed. The algorithm uses dynamic programming and improves the LCS algorithm by combining it with the serialized tokens of candidate clone code blocks, so that the optimized LCS algorithm detects Type I and Type II plagiarized student code more accurately and quickly. The remaining candidate clone pairs are then checked further: by optimizing the abstract syntax tree structure and the detection logic of the Diff algorithm, Type III plagiarism in student code is detected efficiently. Applying the two methods in sequence improves the accuracy and completeness of duplicate checking.

(3) A complete duplicate-checking system for student code is designed and implemented, with functional modules for code preprocessing, duplicate checking, and visual analysis of checking results. The software has been tested for functionality and performance, and its effectiveness has been verified by comparative analysis against the more mature MOSS plagiarism-detection system from Stanford University. It has been integrated into the school's smart education platform, where it supports duplicate checking of homework in the school's programming courses.
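The preprocessing in step (1) can be illustrated with a minimal Python sketch. It assumes the submissions are Python source (the abstract does not name the course language), uses node-type names from Python's standard `ast` module as tokens, and substitutes a bag-of-tokens overlap measure with prefix filtering in the style of the SourcererCC clone detector. The 70% threshold, the overlap measure, and the names `ast_tokens`, `ceil_pct`, and `candidate_pairs` are hypothetical and are not taken from the thesis.

```python
import ast
from collections import Counter
from itertools import combinations


def ast_tokens(source: str) -> list[str]:
    """Hypothetical tokenizer: pre-order list of AST node-type names.

    Identifier names and literal values are not recorded, so renaming
    variables leaves the sequence unchanged.
    """
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def ceil_pct(pct: int, n: int) -> int:
    """Exact ceil(pct * n / 100) with integers, avoiding float rounding."""
    return -(-pct * n // 100)


def candidate_pairs(
    submissions: dict[str, str], threshold_pct: int = 70
) -> list[tuple[str, str]]:
    """Hypothetical filter: keep pairs whose token-bag overlap reaches
    threshold_pct percent of the larger bag."""
    tokens = {sid: ast_tokens(src) for sid, src in submissions.items()}

    # Global token frequency map over every submission.
    freq = Counter(tok for seq in tokens.values() for tok in seq)

    # Number repeated tokens as (tok, 0), (tok, 1), ... so each bag becomes a
    # set, then sort by ascending global frequency so rare tokens come first.
    ordered = {}
    for sid, seq in tokens.items():
        seen = Counter()
        elems = []
        for tok in seq:
            elems.append((tok, seen[tok]))
            seen[tok] += 1
        elems.sort(key=lambda e: (freq[e[0]], e[0], e[1]))
        ordered[sid] = elems

    pairs = []
    for a, b in combinations(sorted(ordered), 2):
        ea, eb = ordered[a], ordered[b]
        # Prefix filter: if the overlap can reach the threshold, these two
        # prefixes must share an element, so disjoint prefixes are skipped.
        pa = len(ea) - ceil_pct(threshold_pct, len(ea)) + 1
        pb = len(eb) - ceil_pct(threshold_pct, len(eb)) + 1
        if not set(ea[:pa]) & set(eb[:pb]):
            continue
        overlap = len(set(ea) & set(eb))
        if overlap * 100 >= threshold_pct * max(len(ea), len(eb)):
            pairs.append((a, b))
    return pairs
```

Sorting every bag by ascending global frequency puts the rarest, most discriminating tokens at the front, so the short prefixes examined by the filter are enough to discard most dissimilar pairs before the full overlap is computed. A copy with renamed variables produces the same token bag and survives; an unrelated program is filtered out.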
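The first stage of step (2) builds on the textbook dynamic-programming recurrence for the longest common subsequence. The sketch below shows only that baseline, not the thesis's optimized variant. It assumes Python submissions, normalizes them with the standard `tokenize` module (identifiers become `ID`, literals become `LIT`, comments and layout are dropped, which is what lets Type I and Type II copies collapse to the same sequence), and scores a pair as 2 × LCS / (|A| + |B|). The normalization scheme, the scoring formula, and the names `normalized_tokens`, `lcs_length`, and `lcs_similarity` are assumptions for illustration.

```python
import io
import keyword
import tokenize
from typing import Sequence


def normalized_tokens(source: str) -> list[str]:
    """Hypothetical normalizer: keywords and operators are kept, identifiers
    become "ID", literals become "LIT"; comments and layout are dropped."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
    return out


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic O(len(a) * len(b)) dynamic program that keeps only two rows."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter sequence so each row stays small
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip an element of a or b
        prev = cur
    return prev[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed score: 2 * LCS / (len(a) + len(b)), in the range [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```

With this normalization, a copy that only renames variables or changes comments scores 1.0, while inserted or deleted statements lower the score gradually. Pairs that fall below whatever cut-off is chosen would then move on to the Type III check described in the abstract.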
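For the Type III stage of step (2), one way to picture an AST-based diff is to reduce each statement to the node types of its own subtree (ignoring names and literal values) and diff the resulting statement sequences with `difflib.SequenceMatcher` from Python's standard library. This is a hedged stand-in: the abstract does not describe the thesis's optimized AST structure or Diff logic, and the signature scheme and the names `stmt_signatures` and `ast_diff` are hypothetical. The sketch again assumes Python submissions.

```python
import ast
import difflib


def stmt_signatures(source: str) -> list[str]:
    """Hypothetical statement signatures in pre-order: each statement is
    described by the node types of its own subtree, excluding nested
    statements, so names and literal values do not matter."""

    def own_types(node: ast.AST) -> list[str]:
        out = [type(node).__name__]
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.stmt):  # nested bodies get their own entries
                out.extend(own_types(child))
        return out

    sigs: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.stmt):
            sigs.append(" ".join(own_types(node)))
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sigs


def ast_diff(src_a: str, src_b: str):
    """Hypothetical statement-level diff; returns (similarity, edits)."""
    a, b = stmt_signatures(src_a), stmt_signatures(src_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    edits = [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted statements
    ]
    return matcher.ratio(), edits
```

For example, comparing a five-statement function that sums a list in a loop with a copy that renames every variable and adds one `print(v)` call inside the loop yields a single `insert` edit and a ratio of 10/11, because five of the six statements still match. Working at statement level keeps the added line visible as a gap instead of letting it disrupt the comparison of the surrounding code.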
Keywords/Search Tags: code duplication, code plagiarism, similarity, student code