
Code Homology Detection Method For Java Source Code

Posted on: 2021-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xiong
Full Text: PDF
GTID: 2428330632963031
Subject: Computer Science and Technology
Abstract/Summary:
With the continued growth of the computer industry, numerous complex software systems keep emerging. This software has become a daily necessity, integrated into every aspect of life, work, and study, and the demand for software developers rises year by year. Source code is the core of software: it embodies developers' hard work as well as their ingenuity and craft. However, because some software functions are generic, plagiarism occurs not only in design but also in the underlying source code across different software products, which leads to frequent software infringement cases. Code homology analysis measures the similarity between pieces of code and is widely used in code plagiarism detection, software intellectual property protection, vulnerability detection, and other fields.

Based on a systematic study of existing methods and tools, this thesis proposes two improved homology analysis methods for Java source code, one based on sequences and one based on syntax trees. The first method is sequence-based alignment. In addition to general preprocessing, it exploits characteristics of the Java language itself: it uses compilation and decompilation, code optimization, and lower-cost syntax tree operations to replace and transform several special Java syntactic constructs, which improves homology analysis for code that has the same semantics but different syntax. A filtering mechanism based on a similarity measurement model reduces the size of the candidate comparison set. Finally, combining the longest common subsequence and longest common substring algorithms improves detection efficiency while maintaining the detection rate.

The second method optimizes comparison at the syntax tree level. Building on the traditional tree edit distance algorithm, it proposes a TF-IDF-based weighted tree edit distance algorithm. The algorithm computes a weight for each node in the syntax tree, and the cost of editing a node corresponds to that node's weight. Because it accounts for the differing importance of nodes in the tree, the method adapts better than traditional tree edit distance algorithms that use fixed or empirically chosen weights.
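
As a rough illustration of the two ideas summarized above, the following Java sketch computes a sequence similarity from the longest common subsequence and longest common substring of two token lists, and derives TF-IDF weights for syntax tree node types. The abstract does not say how the two sequence measures are combined or which TF-IDF variant is used, so the class name `HomologySketch`, the `alpha` blending parameter, the normalization by the shorter sequence, and the smoothed IDF formula are all assumptions made for this example, not the thesis's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and formulas are assumptions, not the thesis's code.
class HomologySketch {

    // Longest common subsequence length (matching tokens need not be contiguous).
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.size()][b.size()];
    }

    // Longest common substring length (matching tokens must be contiguous).
    static int longestCommonSubstringLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        int best = 0;
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
            }
        }
        return best;
    }

    // Hypothetical combined score in [0, 1]; alpha weights the subsequence term.
    static double sequenceSimilarity(List<String> a, List<String> b, double alpha) {
        int shorter = Math.min(a.size(), b.size());
        if (shorter == 0) {
            return 0.0;
        }
        double subsequence = (double) lcsLength(a, b) / shorter;
        double substring = (double) longestCommonSubstringLength(a, b) / shorter;
        return alpha * subsequence + (1 - alpha) * substring;
    }

    // TF-IDF weight per node type; each tree is flattened to its node type labels.
    // Uses a smoothed IDF (an assumption) so that common node types keep a positive cost.
    static Map<String, Double> nodeWeights(List<String> tree, List<List<String>> corpus) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : tree) {
            counts.merge(label, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            long df = corpus.stream().filter(t -> t.contains(e.getKey())).count();
            double tf = (double) e.getValue() / tree.size();
            double idf = Math.log((1.0 + corpus.size()) / (1.0 + df)) + 1.0;
            weights.put(e.getKey(), tf * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> a = List.of("int", "x", "=", "1", ";");
        List<String> b = List.of("int", "y", "=", "1", ";");
        System.out.println(sequenceSimilarity(a, b, 0.5));

        List<String> t1 = List.of("MethodDeclaration", "Block", "ReturnStmt");
        List<String> t2 = List.of("MethodDeclaration", "Block", "ForStmt");
        System.out.println(nodeWeights(t1, List.of(t1, t2)));
    }
}
```

In a weighted tree edit distance, the weights returned by `nodeWeights` would stand in for the unit cost of inserting, deleting, or relabeling a node, so that edits to rarer node types cost more than edits to node types that appear in every tree.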
Keywords/Search Tags: Java, code transformations, tree edit distance, TF-IDF, homology detection