Code Homology Analysis Based On Ast And Improved Particle Swarm Optimization Algorithm

Posted on: 2018-01-04 | Degree: Master | Type: Thesis
Country: China | Candidate: L Y Zhang
GTID: 2348330542451813 | Subject: Engineering
Abstract/Summary:
With the development of the software industry, the demand for software developers is increasing year by year, and the volume of code has become too large to review by hand. Meanwhile, the convenience of the Internet has made plagiarism more serious. Source code homology analysis compares two pieces of source code to determine how they differ, so that code plagiarism can be identified more effectively. It is therefore very important for protecting the intellectual property rights of code. There are three common kinds of homology analysis: text-based, token-based, and syntax-tree-based. Of these, syntax-tree-based analysis is the most widely used. In this thesis, we propose a new method based on the abstract syntax tree (AST) and an improved particle swarm optimization (PSO) algorithm. First, we obtain intermediate code through preprocessing. Then we perform lexical and syntax analysis and store the grammatical structure in an AST. Finally, the improved PSO algorithm is used to solve the matching problem. The specific work is as follows.

First, we study how to construct a lexer and a parser with the ANTLR v4 tool. Lexical analysis reads the source code and converts the character stream of the source program into a stream of recognizable tokens, which serves as the input to syntax analysis. Syntax analysis accepts the token stream output by the lexer and transforms it into an abstract syntax tree.

Second, we describe the AST structure according to the characteristics of the C++ language. To avoid matching failures, we propose adding virtual nodes. We traverse the syntax tree with ANTLR's listener mechanism, which supplies the data for the PSO algorithm.

Third, because the basic PSO algorithm easily falls into local optima, we propose three improvements to it: an improved particle encoding, improved particle diversity, and an optimized convergence factor.

Fourth, we design and analyze an anti-plagiarism detection system composed mainly of an input module, a preprocessing module, an AST analysis module, a similarity calculation module, and an output module.

Fifth, we use Java and Spring MVC to implement the experimental homology analysis program. Experiments show that the improved PSO algorithm improves both the accuracy and the efficiency of source code plagiarism detection.
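The abstract does not give the details of the improved algorithm, so the following is only a minimal Python sketch of how AST matching with PSO could fit together, under stated assumptions. ASTs are flattened to preorder label lists, standing in for the ANTLR listener traversal. The shorter list is padded with virtual nodes that never count as matches. Each particle encodes a node-to-node mapping through random-key decoding (the argsort of a real vector). The velocity update uses the standard Clerc and Kennedy constriction factor as the "convergence factor", and re-seeding the worst particle each iteration is a crude stand-in for a diversity improvement. The names `Node`, `preorder_labels`, `pso_similarity`, and `VIRTUAL` are hypothetical and do not come from the thesis.

```python
import math
import random

VIRTUAL = None  # hypothetical label for padded "virtual" nodes; never matches anything


class Node:
    """Hypothetical minimal AST node: a label plus ordered children."""
    def __init__(self, label, *children):
        self.label, self.children = label, list(children)


def preorder_labels(node):
    """Flatten a tree into its preorder label list (stand-in for a listener walk)."""
    out = [node.label]
    for child in node.children:
        out.extend(preorder_labels(child))
    return out


def pad(seq, length):
    """Append virtual nodes so both sequences have the same length."""
    return list(seq) + [VIRTUAL] * (length - len(seq))


def decode(position):
    """Random-key decoding: argsort of a real vector is always a valid permutation."""
    return sorted(range(len(position)), key=lambda i: position[i])


def fitness(perm, a, b):
    """Fraction of mapped pairs (a[i] -> b[perm[i]]) whose real labels agree."""
    hits = sum(1 for i, j in enumerate(perm) if a[i] is not VIRTUAL and a[i] == b[j])
    return hits / len(a)


def pso_similarity(a, b, n_particles=20, iters=100, c1=2.05, c2=2.05, seed=0):
    """Search node mappings with constricted PSO; return the best similarity found."""
    rng = random.Random(seed)
    n = max(len(a), len(b))
    if n == 0:
        return 1.0  # convention: two empty trees are identical
    a, b = pad(a, n), pad(b, n)
    phi = c1 + c2  # the constriction formula requires phi > 4
    chi = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))  # ~0.7298
    pos = [[rng.random() for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p), a, b) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        if gbest_fit == 1.0:
            break  # a perfect mapping cannot be improved
        for k in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = chi * (vel[k][d]
                                   + c1 * r1 * (pbest[k][d] - pos[k][d])
                                   + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            f = fitness(decode(pos[k]), a, b)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
        # Assumed diversity step: restart the worst particle's position (its pbest is kept).
        worst = min(range(n_particles), key=lambda k: pbest_fit[k])
        pos[worst] = [rng.random() for _ in range(n)]
        vel[worst] = [0.0] * n
    return gbest_fit
```

For example, `pso_similarity(preorder_labels(t1), preorder_labels(t2))` scores two hypothetical trees `t1` and `t2`. Random-key encoding leaves the continuous PSO update unchanged while guaranteeing that every particle decodes to a one-to-one mapping. Padding with virtual nodes means trees of different sizes can always be compared.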
Keywords/Search Tags:code plagiarism, abstract syntax tree, lexical analyzer, parser analyzer, particle swarm algorithm