
Research On Code Similarity Detecting Based On CNN And Code Plagiarism Checking System

Posted on: 2019-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: D P Yin
GTID: 2348330545462552
Subject: Electronics and Communications Engineering
Abstract/Summary:
The rapid development of the Internet has multiplied the channels for sharing information, and it has also made copying code much easier. In college teaching, code plagiarism persists despite repeated prohibitions and has even intensified, which seriously disrupts teaching order and lowers teaching quality. This thesis studies the code submitted for programming assignments in college programming-language courses. On the one hand, building on traditional text-based code similarity detection, it proposes an unsupervised similarity detection scheme with better detection performance. On the other hand, it proposes an end-to-end supervised similarity detection model based on a convolutional neural network (CNN) that automatically learns code content and copying techniques. On the basis of these two schemes, a code plagiarism checking system for teaching tasks is built.

The thesis first studies how to improve traditional code text similarity detection based on the winnowing algorithm. Because winnowing depends heavily on how the code text is preprocessed, eight code-copying techniques are identified by analyzing the existing 200 GB of student assignment code, and preprocessing rules are designed to counter them. The simhash algorithm is then introduced into code text similarity detection for the first time. Combining the design characteristics of simhash and winnowing, the thesis builds a weighted model over simhash, winnowing, and code-attribute metrics, called the sim-win weighted method, whose accuracy on the assignment-code dataset is much higher than that of traditional code text similarity detection methods.

Next, the thesis moves similarity calculation from the unsupervised setting to a supervised learning model. Given the strength of CNNs in image and text feature extraction, it builds an end-to-end CNN-based model for code text similarity detection called CPOC (Code Plagiarism On CNN). The model takes the original code text as input, learns code text features through the network, and, by training on a large number of samples, learns new plagiarism techniques, yielding an adaptive code text similarity detection model. Experiments show that CPOC performs much better than traditional code text similarity detection methods and than other neural-network-based detection algorithms such as LSTM.

Finally, the thesis designs a code plagiarism checking system, covering the system architecture, the design and implementation of the database structure, data interaction, and back-end development. By providing a code-checking service for programming-language teachers, the system turns the algorithms above into working applications and makes it convenient for teachers to manage and evaluate students' code assignments. The system is now deployed on the school's intranet.
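The winnowing-based detection with counter-plagiarism preprocessing can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the preprocessing rules (strip C-style comments, map identifiers to `V` and numbers to `N`), the `KEYWORDS` list, the k-gram length `k`, the window size `w`, and the Jaccard scoring are all assumptions chosen for the example. The thesis derives its own rules from the eight copying techniques it observed.

```python
import hashlib
import re

# Hypothetical keyword list: these tokens are kept verbatim; every other
# identifier is normalized so that renaming variables has no effect.
KEYWORDS = {"int", "char", "float", "double", "void", "return", "if", "else",
            "for", "while", "do", "break", "continue", "switch", "case"}

def preprocess(code):
    """Strip C-style comments and normalize identifiers and numeric literals."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.S)  # block comments
    code = re.sub(r"//[^\n]*", " ", code)               # line comments
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    normalized = []
    for tok in tokens:
        if tok[0].isdigit():
            normalized.append("N")
        elif tok[0].isalpha() or tok[0] == "_":
            normalized.append(tok if tok in KEYWORDS else "V")
        else:
            normalized.append(tok)  # operators and punctuation are kept
    return normalized

def kgram_hashes(tokens, k):
    """Hash every run of k consecutive tokens to a 64-bit integer."""
    result = []
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k]).encode()
        result.append(int.from_bytes(hashlib.md5(gram).digest()[:8], "big"))
    return result

def winnow(hashes, w):
    """Keep the minimum hash of every window of w consecutive k-gram hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}  # shorter than one window: keep its minimum
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

def winnowing_similarity(a, b, k=5, w=4):
    """Jaccard similarity of the two programs' fingerprint sets."""
    fa = winnow(kgram_hashes(preprocess(a), k), w)
    fb = winnow(kgram_hashes(preprocess(b), k), w)
    if not fa and not fb:
        return 1.0  # two programs too short to fingerprint count as identical
    return len(fa & fb) / len(fa | fb)

original = "int sum(int a, int b) { return a + b; } // add two numbers"
renamed = "/* my own work */ int total(int x, int y) { return x + y; }"
print(winnowing_similarity(original, renamed))  # 1.0
```

Because identifiers collapse to `V` and comments are removed, the renamed copy scores 1.0. Reordering statements or inserting dead code would still lower the score, which is why the choice of preprocessing rules matters so much for winnowing.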
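A simhash fingerprint for code text can be sketched as below. The specific choices are assumptions, since the abstract does not give them: tokens come from a simple regex split, features are token bigrams weighted by frequency, MD5 supplies the per-feature hash, and similarity is one minus the normalized Hamming distance between fingerprints.

```python
import hashlib
import re
from collections import Counter

def tokenize(code):
    """Split code into identifiers/numbers and single punctuation characters."""
    return re.findall(r"\w+|\S", code)

def simhash(tokens, bits=64):
    """Simhash over frequency-weighted token bigrams (bits <= 128, multiple of 8)."""
    features = Counter(" ".join(tokens[i:i + 2])
                       for i in range(max(1, len(tokens) - 1)))
    totals = [0] * bits
    for feature, weight in features.items():
        h = int.from_bytes(hashlib.md5(feature.encode()).digest()[:bits // 8], "big")
        for i in range(bits):
            # Add the weight where the feature hash has a 1 bit, subtract where it has 0.
            totals[i] += weight if (h >> i) & 1 else -weight
    # The fingerprint has a 1 bit wherever the weighted vote is positive.
    return sum(1 << i for i in range(bits) if totals[i] > 0)

def simhash_similarity(a, b, bits=64):
    """1 - (Hamming distance between the two fingerprints) / bits."""
    distance = bin(simhash(tokenize(a), bits) ^ simhash(tokenize(b), bits)).count("1")
    return 1 - distance / bits

a = "for (i = 0; i < n; i++) sum += v[i];"
b = "for (i = 0; i < n; i++) total += v[i];"
print(simhash_similarity(a, b))
```

Unlike winnowing, simhash compresses a whole file into one fixed-size fingerprint, so near-duplicate files differ in only a few bits and can be compared very cheaply. That complementary behavior is what makes combining the two methods attractive.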
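The abstract describes sim-win only as a weighted model over simhash, winnowing, and code-attribute metrics. The sketch below shows one way such a combination could look; the attribute metrics (non-blank lines, loops, branches, semicolons), the min/max ratio used to compare them, and the default weights are hypothetical placeholders, not the thesis's values.

```python
import re

def attribute_vector(code):
    """Hypothetical code-attribute metrics used only for illustration."""
    return [
        sum(1 for line in code.splitlines() if line.strip()),  # non-blank lines
        len(re.findall(r"\b(for|while)\b", code)),               # loops
        len(re.findall(r"\b(if|switch)\b", code)),               # branches
        code.count(";"),                                         # statement terminators
    ]

def attribute_similarity(a, b):
    """Average per-attribute min/max ratio (1.0 when both counts are zero)."""
    ratios = [1.0 if max(x, y) == 0 else min(x, y) / max(x, y)
              for x, y in zip(attribute_vector(a), attribute_vector(b))]
    return sum(ratios) / len(ratios)

def sim_win_score(simhash_sim, winnow_sim, attr_sim, weights=(0.3, 0.5, 0.2)):
    """Weighted average of the three scores; weights are placeholders and are
    normalized, so they need not sum to 1."""
    ws, ww, wa = weights
    return (ws * simhash_sim + ww * winnow_sim + wa * attr_sim) / (ws + ww + wa)

p = "int main() {\n  for (int i = 0; i < 3; i++)\n    if (i) f(i);\n}\n"
q = "int main() {\n  int j = 0;\n  while (j < 3) { if (j) f(j); j++; }\n}\n"
# The simhash and winnowing scores here are made-up inputs for the example.
score = sim_win_score(simhash_sim=0.9, winnow_sim=0.7,
                      attr_sim=attribute_similarity(p, q))
print(score)
```

Here `p` and `q` have identical attribute vectors even though one uses `for` and the other `while`, which illustrates why attribute metrics alone are weak evidence and are best used as one weighted signal among several.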
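The abstract does not specify CPOC's layers, so the sketch below only illustrates the general shape of a CNN-based code similarity model under assumptions of its own: a siamese setup in which one shared encoder embeds each token, applies a 1-D convolution over the token sequence with ReLU, and max-pools over positions, after which the two code vectors are compared by cosine similarity. The sizes (`EMBED_DIM`, `NUM_FILTERS`, `WIDTH`) are hypothetical, and the embeddings and filters are fixed pseudo-random values, so this is an untrained forward pass; in a supervised model such as CPOC the parameters would be learned from labeled code pairs.

```python
import math
import random
import re

EMBED_DIM, NUM_FILTERS, WIDTH = 8, 16, 3  # hypothetical sizes

def embed(token):
    """Deterministic pseudo-random vector per token (stands in for a learned embedding)."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

_init = random.Random(0)
# FILTERS[f][j][d]: weight of filter f at window offset j and embedding dimension d.
FILTERS = [[[_init.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
            for _ in range(WIDTH)] for _ in range(NUM_FILTERS)]

def encode(code):
    """Token embeddings -> 1-D convolution -> ReLU -> global max pooling."""
    x = [embed(t) for t in re.findall(r"\w+|\S", code)]
    while len(x) < WIDTH:  # pad very short inputs to one full window
        x.append([0.0] * EMBED_DIM)
    features = []
    for filt in FILTERS:
        best = 0.0  # starting at 0.0 makes the running max also act as ReLU
        for i in range(len(x) - WIDTH + 1):
            response = sum(filt[j][d] * x[i + j][d]
                           for j in range(WIDTH) for d in range(EMBED_DIM))
            best = max(best, response)
        features.append(best)
    return features

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cnn_similarity(a, b):
    """Siamese comparison: the same encoder is applied to both programs."""
    return cosine(encode(a), encode(b))

print(cnn_similarity("int f(int a) { return a * 2; }",
                     "int g(int b) { return b * 2; }"))
```

Because the parameters are untrained, the printed score says nothing about plagiarism; the point is the data flow from raw code text to a fixed-length vector and a pairwise score, which is what training on labeled pairs would then shape.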
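The abstract lists database design among the system components but gives no schema. As an illustration only, a minimal relational layout for assignments, submissions, and pairwise similarity results might look like the following. Every table name, column, and the review threshold in `flagged_pairs` are hypothetical, and SQLite (through Python's standard `sqlite3` module) is used purely so the sketch runs standalone.

```python
import sqlite3

SCHEMA = """
CREATE TABLE assignment (
    id     INTEGER PRIMARY KEY,
    course TEXT NOT NULL,
    title  TEXT NOT NULL
);
CREATE TABLE submission (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignment(id),
    student_id    TEXT NOT NULL,
    source_code   TEXT NOT NULL,
    submitted_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE similarity_result (
    submission_a INTEGER NOT NULL REFERENCES submission(id),
    submission_b INTEGER NOT NULL REFERENCES submission(id),
    method       TEXT NOT NULL,  -- e.g. 'sim-win' or 'cpoc'
    score        REAL NOT NULL CHECK (score BETWEEN 0 AND 1),
    PRIMARY KEY (submission_a, submission_b, method)
);
"""

def flagged_pairs(conn, assignment_id, threshold=0.8):
    """Pairs from one assignment whose score meets a hypothetical review threshold."""
    return conn.execute(
        """SELECT r.submission_a, r.submission_b, r.method, r.score
           FROM similarity_result r
           JOIN submission s ON s.id = r.submission_a
           WHERE s.assignment_id = ? AND r.score >= ?
           ORDER BY r.score DESC""",
        (assignment_id, threshold),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO assignment VALUES (1, 'C Programming', 'Lab 3')")
conn.executemany(
    "INSERT INTO submission (id, assignment_id, student_id, source_code) "
    "VALUES (?, 1, ?, ?)",
    [(1, "s01", "int main() { return 0; }"), (2, "s02", "int main() { return 0; }")],
)
conn.execute("INSERT INTO similarity_result VALUES (1, 2, 'sim-win', 0.93)")
print(flagged_pairs(conn, 1))
```

Storing one row per (pair, method) lets a teacher compare, for the same two submissions, the score from an unsupervised method such as sim-win with the score from a learned model.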
Keywords/Search Tags: similarity, plagiarism checking, simhash, Convolutional Neural Network, system design