Research And Implementation Of Code Plagiarism Detection Based On Subtree Tracking

Posted on:2019-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Zhang

Full Text:PDF

GTID:2428330566968735

Subject:Computer technology

Abstract/Summary:


With the development of Internet technology, communication has become increasingly convenient, which also makes code plagiarism easier. Code plagiarism is a complex behavior that is difficult to define, and manual detection is inefficient, ineffective, and subjective. Because no credible inspection system is available, plagiarism checking for programming assignments in most universities in China is still done manually. This thesis aims to address this problem and ease the current difficulties in code plagiarism detection. Based on an analysis of existing code plagiarism detection techniques, we propose a code plagiarism detection method based on subtree tracking. In addition, existing research only measures the similarity between pairs of samples and rarely considers that plagiarism groups may exist among the plagiarized samples. We therefore further propose, based on an improved k-means method, a detection grouping method that can effectively identify plagiarism groups during detection. The specific research content of this thesis is as follows:

(1) A code plagiarism detection method based on subtree tracking is proposed for detecting sophisticated code masquerading. The main steps of the method are: transform the code into an abstract syntax tree; extract features from the abstract syntax tree and track the feature vector of each subtree; calculate the distance between each pair of feature vectors to obtain the feature similarity matrix; and finally, quantify code similarity by the code distance and a distance threshold. The code distance is calculated by weighting the nearest distance of each vector in the feature similarity matrix according to the information contained in that feature vector. Experimental results show that this method can handle a variety of plagiarism types, especially the "code reordering" type, and that its detection efficiency is better than that of existing systems.

(2) The k-means clustering algorithm requires the initial value of k (the number of clusters) to be specified in advance, so it is not suitable for clustering code plagiarism samples. We therefore propose an improved k-means clustering method that searches for k automatically, comparing the cluster diameter with the distance between clusters to determine whether a cluster has completely formed. All clusters are searched progressively. Experimental results show the efficiency of the method.

(3) An online plagiarism detection system is designed and implemented based on the two methods proposed in this thesis. The system mainly serves teachers and students. To make it more convenient to use, in addition to the code plagiarism detection and grouping functions, a question-and-answer platform is developed to give teachers and students an online platform for communication and learning. To improve the user experience, MySQL master-slave replication and Nginx+keepalived are used to improve the high availability of the online plagiarism detection system at both the data and application levels.
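The pipeline in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and the abstract does not name the target language, so several assumptions are made here: the input is Python source parsed with the standard `ast` module; a subtree's feature vector is the count of each AST node type inside it (identifiers and literal values are ignored); vectors are compared with Euclidean distance; and subtree size stands in for "the information contained in the feature vector". The names `subtree_vectors`, `code_distance`, `is_plagiarism`, `min_nodes`, and the threshold of 1.0 are hypothetical.

```python
from __future__ import annotations

import ast
import math
from collections import Counter


def subtree_vectors(source: str) -> list[Counter]:
    """Parse source and return one feature vector per AST subtree.

    Assumption: a subtree's feature vector counts each AST node type in it.
    """
    vectors: list[Counter] = []

    def visit(node: ast.AST) -> Counter:
        counts = Counter({type(node).__name__: 1})
        for child in ast.iter_child_nodes(node):
            counts += visit(child)
        vectors.append(counts)  # track this subtree's vector
        return counts

    visit(ast.parse(source))
    return vectors


def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() | b.keys()))


def code_distance(src_a: str, src_b: str, min_nodes: int = 3) -> float:
    # Hypothetical filter: drop tiny subtrees (bare names, constants, ...).
    va = [v for v in subtree_vectors(src_a) if sum(v.values()) >= min_nodes]
    vb = [v for v in subtree_vectors(src_b) if sum(v.values()) >= min_nodes]
    if not va or not vb:
        return math.inf
    # Feature similarity matrix: pairwise distances between subtree vectors.
    matrix = [[distance(x, y) for y in vb] for x in va]
    # Nearest distance of each vector, in both directions (rows and columns).
    nearest = [min(row) for row in matrix] + [min(col) for col in zip(*matrix)]
    # Assumed weight: subtree size, as a proxy for the information it carries.
    weights = [sum(v.values()) for v in va + vb]
    return sum(w * d for w, d in zip(weights, nearest)) / sum(weights)


def is_plagiarism(src_a: str, src_b: str, threshold: float = 1.0) -> bool:
    # The threshold value is illustrative, not taken from the thesis.
    return code_distance(src_a, src_b) <= threshold


if __name__ == "__main__":
    original = "def f(x):\n    return x + 1\n\ndef g(y):\n    return y * 2\n"
    reordered = "def g(b):\n    return b * 2\n\ndef f(a):\n    return a + 1\n"
    print(code_distance(original, reordered))  # 0.0
```

Because these count vectors ignore identifier names and statement order, swapping and renaming the two functions yields a code distance of 0.0, which mirrors the "code reordering" case the abstract highlights. The same property means the sketch cannot distinguish code that uses the same node types in a different structure, so a real system would need richer subtree features.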
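Item (2) can be sketched in the same hedged way. The abstract only states that k is found by comparing cluster diameter with inter-cluster distance and that clusters are searched progressively, so the concrete rule below is an assumption: run k-means for k = 2, 3, ... and stop at the first k for which every cluster's diameter (largest pairwise distance) is smaller than its single-linkage distance to every other cluster. It also assumes each submission has already been mapped to a numeric vector, and it uses farthest-first initialization so that results are deterministic. The functions `kmeans`, `fully_formed`, and `auto_kmeans` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    return math.dist(a, b)  # Euclidean distance, Python 3.8+


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[List[Point]]:
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return [c for c in clusters if c]


def diameter(cluster: List[Point]) -> float:
    return max((dist(a, b) for a in cluster for b in cluster), default=0.0)


def gap(a: List[Point], b: List[Point]) -> float:
    """Single-linkage distance between two clusters."""
    return min(dist(p, q) for p in a for q in b)


def fully_formed(clusters: List[List[Point]]) -> bool:
    # Assumed criterion: each cluster is narrower than its gap to any other cluster.
    for i, c in enumerate(clusters):
        others = [gap(c, o) for j, o in enumerate(clusters) if j != i]
        if others and diameter(c) >= min(others):
            return False
    return True


def auto_kmeans(points: List[Point]) -> List[List[Point]]:
    """Increase k progressively until every cluster is fully formed."""
    if len(points) < 2:
        return [list(points)]
    for k in range(2, len(points) + 1):
        clusters = kmeans(points, k)
        if fully_formed(clusters):
            return clusters
    return [[p] for p in points]  # fallback; not reached for distinct points
```

For distinct points the search always terminates by k = n, because singleton clusters have diameter 0. Starting at k = 2 means this sketch never reports the whole sample set as one group; how the thesis handles that case is not stated in the abstract.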

Keywords/Search Tags:

Code plagiarism, Similarity, Clustering, k-means, Abstract syntax tree


Related items

1	Research On Source Code Plagiarism Detection Based On Abstract Syntax Tree
2	Research On Similarity Measure Method Of Program Code
3	A Research On Program Coding-oriented Plagiarism Detection Techniques By AST-based Strategy
4	Software plagiarism detection using abstract syntax tree and graph-based data mining
5	Software Plagiarism Detection Algorithm Based On Abstract Syntax Tree
6	Automatically Based On The Abstract Syntax Tree And Static Analysis Of The Cloned Code Refactoring
7	Short Text Similarity Research Based On Abstract Syntax Tree
8	Research And Application Of Automatic Scoring Scheme For C Programming Problems Based On Abstract Syntax Tree
9	Design And Implementation Of Abstract Syntax Tree Based Code Defect Detection
10	Development Of Static Code Defect Detection Tool Based On Abstract Syntax Tree