The development of computers and the Internet has changed how people work and live, and many traditional fields can use computers to improve efficiency. For example, in programming project lab courses in education, manually reviewing the programs submitted by students is very inefficient, and code plagiarism is difficult to detect by hand. Designing and implementing a duplication-checking system that detects plagiarism in students' experimental source code is therefore of practical significance.

In the duplicate-checking scenario for project coursework, the assignment generally provides a framework containing the basic modules, and the remaining modules are developed jointly by small teams that divide the work among their members. Checking such a project requires removing the template code, checking the project as a whole, and checking each team member's contribution in detail. At the same time, the system should be able to cope with common code obfuscation methods. To meet these requirements, the duplication-checking system adopts a token-based structural analysis method for detection. The system consists of a code preprocessing module, a parsing and conversion module, a duplicate-check execution module, a result analysis module, and a detailed result generation module. The code preprocessing module filters and extracts the project files submitted by each team to obtain the content to be checked and stores it in the comparison database. The parsing and conversion module reads the team files to be checked from the comparison database and parses them into token sequences that can be used for similarity matching. While scanning the code to generate a token sequence, the lexical analyzer ignores comments and whitespace and maps variable names to a unified token, among other normalizations. The duplicate-check execution module checks the parsed and converted objects against the comparison database, uses the Greedy String Tiling (GST) algorithm for similarity matching, and collects the results of each match. The result analysis module obtains the matching results from the duplicate-check execution module and analyzes them statistically to obtain the overall plagiarism ratio of each team and the plagiarism ratio of each module and its corresponding team member. The detailed result generation module produces detailed matching results so that teachers can conveniently review the code suspected of plagiarism.

System tests show that, by comparing token sequences generated through lexical analysis, the duplication-checking system is effective against conventional code obfuscation methods such as changes to comments and output statements, variable renaming, and code reformatting. The Greedy String Tiling algorithm deals effectively with code rearrangement. The system can check a project against the comparison database, analyze the overall plagiarism ratio of the project and the plagiarism status of its modules and the corresponding team members, and generate an overall report and detailed result files.
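
The token-based approach described here can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names `tokenize`, `greedy_string_tiling`, and `similarity`, the small keyword set, the C-like token pattern, and the `min_match` threshold are all assumptions made for illustration. The lexer drops comments and whitespace and maps every identifier to a single `ID` token, and the matcher is the basic cubic-time form of Greedy String Tiling without any hashing speed-up.

```python
import re

# Hypothetical keyword set for a C-like language (illustrative, not exhaustive).
KEYWORDS = {"int", "float", "double", "char", "void",
            "if", "else", "for", "while", "return"}

TOKEN_RE = re.compile(r'''
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<op>[^\s\w])
''', re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Turn source text into a normalized token sequence."""
    tokens = []
    for m in TOKEN_RE.finditer(source):  # whitespace never matches, so it is skipped
        kind, text = m.lastgroup, m.group()
        if kind == "comment":
            continue                      # comments carry no structure
        if kind == "string":
            tokens.append("STR")          # hides changes to output text
        elif kind == "number":
            tokens.append("NUM")
        elif kind == "word":
            # Keywords are kept; every other identifier maps to one token.
            tokens.append(text if text in KEYWORDS else "ID")
        else:
            tokens.append(text)           # operators and punctuation
    return tokens


def greedy_string_tiling(a, b, min_match=3):
    """Return non-overlapping tiles (start_a, start_b, length), longest first."""
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match, matches = min_match, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_match:         # longer match: restart the candidate list
                    max_match, matches = k, [(i, j, k)]
                elif k == max_match:
                    matches.append((i, j, k))
        if not matches:
            return tiles
        for i, j, k in matches:
            # Skip candidates that overlap a tile placed earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for t in range(k):
                marked_a[i + t] = marked_b[j + t] = True
            tiles.append((i, j, k))


def similarity(a, b, min_match=3):
    """Fraction of tokens in both sequences that are covered by tiles."""
    if not a and not b:
        return 0.0
    covered = sum(k for _, _, k in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))
```

Because tiles may be matched in any order, swapping two blocks of code leaves the covered token count unchanged, which is why this style of matching tolerates code rearrangement. The nested scan is slow on large inputs, so a practical system would likely need a faster matching strategy.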