
Research On Cross-Language Code Similarity Detection Method Based On Program Flow Chart

Posted on: 2020-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q Song
GTID: 2518306305486324
Subject: Software engineering
Abstract/Summary:
Source code similarity detection has important applications in detecting code plagiarism and protecting software intellectual property. Currently, there are two types of code similarity detection methods: attribute counting and structure metrics. Structure metrics are the mainstream approach and include string-based, tree-based and graph-based detection methods. However, these methods mainly target similarity detection between programs written in the same programming language. Because the basic grammars of programming languages differ, they are not suitable for cross-language similarity detection.

To address this problem, this thesis proposes a cross-language code similarity detection (CSDCL) method based on program flow charts. In this method, code written in different programming languages is converted into a standard flow chart that can express the dependencies between code statements, and the similarity between programs in different languages is measured by the similarity of their flow charts. First, building on the traditional Program Dependency Graph (PDG), a program dependency flow chart (PDFC) model is presented that expresses both the flow structure of the code and the relationships between statements, so that the flow charts of code in different languages are standardized. The PDFC model defines the node types, the sequential execution edge, which expresses the logical order of the code, and the control dependency edge, which expresses the dependency between statements. The flow charts generated from code in different programming languages are then converted into PDFCs. Second, based on the VF2 sub-graph isomorphism algorithm, the PDFC-VF2 sub-graph isomorphism algorithm is proposed to calculate the similarity between two PDFCs. This algorithm reduces the node search space in the PDFC and improves detection efficiency while supporting cross-language code similarity detection.

To verify the effectiveness of the CSDCL method, experiments and analysis were carried out on two aspects: the accuracy of the method and its effect in handling code obfuscation. The experimental results show that, compared with traditional PDG-based code similarity detection methods, the CSDCL method has higher accuracy and detection efficiency in cross-language code similarity detection. In addition, for plagiarism that uses code obfuscation, the method can detect plagiarism by completely copying the original program, modifying comments, changing the program format and blank lines, renaming variables, changing data types, and so on. Furthermore, higher-level plagiarism, such as adjusting the order of code statements, replacing a control structure with an equivalent one, and adding redundant code, can also be detected with nearly 90% accuracy.
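To make the graph-matching idea in the abstract concrete, the following is a minimal sketch in Python. It is not the thesis's PDFC-VF2 algorithm, whose details the abstract does not give. It uses plain backtracking with node-type pruning as a simplified stand-in, and it checks only that every edge of the smaller chart appears in the larger one, which is a weaker condition than full induced sub-graph isomorphism. The names `FlowChart` and `find_embedding`, the node type labels, and the edge kinds `SEQ` and `CTRL` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

SEQ = "seq"    # sequential execution edge (assumed label)
CTRL = "ctrl"  # control dependency edge (assumed label)


@dataclass
class FlowChart:
    """Toy flow chart with typed nodes and typed directed edges (hypothetical model)."""
    node_type: dict = field(default_factory=dict)  # node id -> type label
    edges: set = field(default_factory=set)        # (src, dst, kind) triples

    def add_node(self, nid, ntype):
        self.node_type[nid] = ntype

    def add_edge(self, src, dst, kind):
        self.edges.add((src, dst, kind))


def find_embedding(pattern, target):
    """Return a mapping of pattern nodes to distinct target nodes that
    preserves node types and every pattern edge, or None if none exists."""
    p_nodes = list(pattern.node_type)

    def consistent(mapping):
        # Every pattern edge with both endpoints mapped must exist in target.
        for src, dst, kind in pattern.edges:
            if src in mapping and dst in mapping:
                if (mapping[src], mapping[dst], kind) not in target.edges:
                    return False
        return True

    def extend(i, mapping, used):
        if i == len(p_nodes):
            return dict(mapping)
        p = p_nodes[i]
        for t, ttype in target.node_type.items():
            # Prune candidates that are already used or have a different type.
            if t in used or ttype != pattern.node_type[p]:
                continue
            mapping[p] = t
            used.add(t)
            if consistent(mapping):
                result = extend(i + 1, mapping, used)
                if result is not None:
                    return result
            del mapping[p]
            used.discard(t)
        return None

    return extend(0, {}, set())


# Original program: start -> assign -> loop, loop controls one assignment.
original = FlowChart()
for nid, ntype in [("p0", "start"), ("p1", "assign"), ("p2", "loop"),
                   ("p3", "assign"), ("p4", "end")]:
    original.add_node(nid, ntype)
for edge in [("p0", "p1", SEQ), ("p1", "p2", SEQ),
             ("p2", "p3", CTRL), ("p2", "p4", SEQ)]:
    original.add_edge(*edge)

# Suspect program in another language: same shape plus a redundant
# assignment ("t4") inside the loop body.
suspect = FlowChart()
for nid, ntype in [("t0", "start"), ("t1", "assign"), ("t2", "loop"),
                   ("t3", "assign"), ("t4", "assign"), ("t5", "end")]:
    suspect.add_node(nid, ntype)
for edge in [("t0", "t1", SEQ), ("t1", "t2", SEQ), ("t2", "t3", CTRL),
             ("t2", "t4", CTRL), ("t2", "t5", SEQ)]:
    suspect.add_edge(*edge)

embedding = find_embedding(original, suspect)
```

Because both charts use language-neutral node types, the original chart embeds in the suspect chart despite the extra statement, while the reverse call `find_embedding(suspect, original)` returns `None`. A real detector would also need a similarity score for partial matches, which this sketch does not attempt.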
Keywords/Search Tags: Code similarity detection, Cross-program language, Program flow chart, Sub-graph isomorphism