
Code Similarity Analysis Research Based On Program Features

Posted on: 2022-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Tian
Full Text: PDF
GTID: 2518306338986959
Subject: Computer technology
Abstract/Summary:
Code cloning is a common practice in software development. As open source components, code reuse techniques, and development frameworks play an increasingly important role in program development, the amount of cloned code has grown rapidly. Although code cloning improves development efficiency to a certain extent, it also harms software management and maintenance through code base redundancy, the propagation of software defects, and the propagation of malicious code. Code similarity analysis aims to detect cloned code automatically and reduce these negative effects. With the continued growth of the software industry, it is also increasingly used in software copyright protection, software security, program plagiarism detection, and other fields. Research on efficient and accurate code similarity analysis has therefore become increasingly important.

Among traditional code similarity analysis techniques, text-based methods are efficient but ignore lexical and grammatical information, so their accuracy is poor. Detection schemes based on syntactic structures such as graphs and trees consume too much time, and their detection granularity is too coarse: they are often limited to the function level or the complex statement block level, which makes them difficult to apply in real production environments. In addition, traditional solutions often cannot effectively identify and analyze cloned code that contains large-scale insertions and modifications. To address these shortcomings, this thesis proposes a code similarity analysis approach based on program attribute features, symbolic features, and context features. It uses as much of the program's useful information as possible to improve detection accuracy, while also taking the execution efficiency of the similarity analysis process into account.

(1) Program feature extraction is based on the program's abstract syntax tree. The context feature addresses the difficulty earlier schemes had in identifying cloned code that contains a large amount of interfering content, and it improves the completeness and accuracy of the analysis; this is the innovation of this thesis. The symbolic feature and the attribute feature capture the program's lexical information and attribute information. Together, these three features support the subsequent similarity analysis.

(2) For code similarity analysis, this thesis implements similarity analysis of code fragments based on a corresponding symbolic feature distance algorithm. This effectively handles inserted or modified code, renamed variables, and changed code layout, and ensures the effectiveness of the analysis process.

(3) To address the difficulty earlier schemes had in detecting cloned code that contains large-scale insertions and modifications, this thesis proposes an algorithm that merges code fragments based on the program's context features.

Based on this research, the thesis builds a code similarity analysis subsystem through secondary development on the laboratory's software security analysis and verification tool system, and provides services to users through client software. Feature extraction tests, similarity analysis tests, and horizontal comparison tests against similar products were carried out on multiple open source projects, verifying that the system's functions meet the expected requirements.
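The abstract does not give the thesis's feature definitions or its distance algorithm, so the following is only a minimal sketch of the general idea of comparing code through features taken from an abstract syntax tree. It assumes Python source as input and uses the standard `ast` and `difflib` modules as stand-ins; the function names `symbol_sequence` and `similarity` are hypothetical and are not taken from the thesis. Because the sequence records node types rather than identifier names or layout, renaming variables or reformatting the code does not change the score, while inserted statements lower it only partially.

```python
import ast
from difflib import SequenceMatcher


def symbol_sequence(source: str) -> list[str]:
    """Return the AST node type names of `source` in pre-order.

    Identifier names, comments, and layout are not part of the sequence.
    This is an illustrative stand-in, not the thesis's symbolic feature.
    """
    def visit(node: ast.AST):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)

    return list(visit(ast.parse(source)))


def similarity(source_a: str, source_b: str) -> float:
    """Score two code fragments in [0, 1] by matching their node sequences.

    SequenceMatcher.ratio() is used here as a simple substitute for the
    (unspecified) symbolic feature distance algorithm.
    """
    seq_a = symbol_sequence(source_a)
    seq_b = symbol_sequence(source_b)
    return SequenceMatcher(None, seq_a, seq_b).ratio()


# Hypothetical fragments: the second renames identifiers and changes layout,
# the third inserts an extra statement into the first.
ORIGINAL = "def add(a, b):\n    total = a + b\n    return total\n"
RENAMED = "def plus(x, y):\n\n    s = x + y   # sum\n    return s\n"
INSERTED = "def add(a, b):\n    print(a)\n    total = a + b\n    return total\n"
```

Calling `similarity(ORIGINAL, RENAMED)` compares two identical node sequences, whereas `similarity(ORIGINAL, INSERTED)` compares sequences that differ only by the inserted statement's nodes. A fuller treatment along the lines of the abstract would add attribute and context features and a fragment merging step, none of which are sketched here.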
Keywords/Search Tags:code clone, program feature, feature extraction, code similarity analysis