
Research On Source Code Similarity Detection Based On Process Mining

Posted on: 2021-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: L L Li
Full Text: PDF
GTID: 2518306032959259
Subject: Software engineering
Abstract/Summary:
Source code similarity detection has important applications in computer programming teaching and software intellectual property protection. In programming courses, students may use complex code obfuscation techniques, such as opaque predicates, loop unrolling, and function inlining and outlining, to reduce the similarity between source code fragments and evade plagiarism detection. Existing source code similarity detection approaches consider only static features of source code, which makes it difficult for them to counter these more complex obfuscation techniques. Targeting computer programming teaching, we propose a source code similarity detection approach based on process mining that considers the runtime dynamic features of source code. First, output statements are inserted into the code through code instrumentation to capture its dynamic characteristics. To this end, we define instrumentation statements and rules suited to code similarity detection, and obtain the running logs of two code fragments by instrumenting and running them. Next, process mining is applied to the running logs to obtain flow charts of the two source code fragments that reflect their dynamic features at runtime. Finally, the flow charts obtained by process mining are used to measure the similarity of the code's dynamic features: the similarity between the two flow charts is calculated by a graph similarity algorithm and taken as the similarity between the code fragments. Experimental results show that, compared with the representative existing code similarity detection approaches Sim and GPLAG, the proposed approach can handle more complex obfuscation techniques, including opaque predicates, loop unrolling, and function inlining and outlining, that those approaches cannot defeat. It is therefore more robust against code obfuscation than existing source code similarity detection approaches.
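The pipeline described above (running logs, then a mined flow graph, then a graph similarity score) can be illustrated with a minimal Python sketch. The abstract does not name the process mining algorithm or the graph similarity algorithm used in the thesis, so this sketch substitutes two simple stand-ins: a directly-follows graph built from the logs, and Jaccard similarity over edge sets. The event labels, the sample logs, and the function names `directly_follows_graph` and `edge_jaccard` are all hypothetical.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Build a directly-follows graph from running logs.

    Each trace is a list of event labels emitted by instrumented code.
    The result maps an edge (a, b) to how often b immediately follows a.
    This is a stand-in for the thesis's process mining step.
    """
    edges = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


def edge_jaccard(g1, g2):
    """Jaccard similarity of the two graphs' edge sets (frequencies ignored).

    This is an assumed, simple graph similarity measure, not necessarily
    the one used in the thesis.
    """
    e1, e2 = set(g1), set(g2)
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)


# Hypothetical running logs: one trace per program run.
log_a = [["start", "assign", "loop", "loop", "loop", "print", "end"]]
log_b = [["start", "assign", "loop", "loop", "print", "end"]]
log_c = [["start", "branch", "print", "end"]]

g_a = directly_follows_graph(log_a)
g_b = directly_follows_graph(log_b)
g_c = directly_follows_graph(log_c)

sim_ab = edge_jaccard(g_a, g_b)  # same control-flow shape, different iteration count
sim_ac = edge_jaccard(g_a, g_c)  # different control-flow shape

print(f"similarity(a, b) = {sim_ab:.3f}")
print(f"similarity(a, c) = {sim_ac:.3f}")
```

Because the comparison is made on the graph's structure rather than on the source text, logs `a` and `b` yield the same edge set even though their loops run a different number of times, while `c` shares only one edge with `a`.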
Keywords/Search Tags: Source code similarity detection, Code obfuscation, Code instrumentation, Process mining, Flow chart