
Research On ACFG-Based Binary Clone Detection Against Code Obfuscation

Posted on: 2020-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zheng
Full Text: PDF
GTID: 2428330590958351
Subject: Cyberspace security
Abstract/Summary:
In recent years, the spread of the Internet has brought great convenience to people's lives. Software of every kind has appeared in an endless stream, and its volume has grown explosively. This growth has also brought problems such as code plagiarism and malicious code. To address them, reverse engineering has become an indispensable technique: by identifying duplicate or similar fragments between unknown code and known code, it can detect software infringement, malicious code variants, and similar cases. However, as obfuscation tools mature, logically similar programs that have been obfuscated disassemble into code that differs greatly in both opcodes and program structure. Existing binary similarity techniques, such as testing whether control flow graphs are isomorphic or comparing text by longest common subsequence, no longer withstand these tools. Research on a binary clone detection method that resists code obfuscation has therefore become an important direction in software copyright protection.

This thesis designs a binary clone detection system that resists code obfuscation for programs written in C. First, the obfuscated binary is disassembled, and each function is converted into a numeric vector that describes both its functional features and its obfuscation features. These comprise control flow graph features, opcode features describing the basic blocks of the graph, and constant features; in other words, the Control Flow Graph (CFG) is converted into an Attributed Control Flow Graph (ACFG). Second, the XGBoost algorithm learns the vector features of the different obfuscation strategies, and the trained model is saved to classify unknown obfuscated vectors. Third, for each obfuscation strategy, the system performs targeted de-obfuscation ("shelling") of the feature vector, and a similarity measure based on the Manhattan distance and tailored to the system is used for clone detection on the resulting vectors. Finally, the XGBoost model parameters are tuned to the system's actual conditions to improve classification accuracy.

Compared with traditional code clone detection algorithms and recently popular natural language processing algorithms, the system offers advantages in performance, accuracy, and scalability. The data set contains 21,249 functions. The experiments show that training the XGBoost model takes about 24 s, the average detection time is 41 µs per function, and the classification accuracy of XGBoost is 87.56%, rising to 88.80% after parameter optimization. Clone detection accuracy for Instruction Substitution, Bogus Control Flow, and Control Flow Flattening is 99.50%, 89.44%, and 77.16%, respectively. Because of how feature extraction, classification, and similarity comparison are designed, the system can be extended to detect additional obfuscation strategies.
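The vectorization and distance steps described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's actual feature set or similarity formula, so everything here is an assumption made for illustration: the opcode categories, the function names (`function_vector`, `manhattan`, `similarity`), and the normalization that maps the Manhattan distance into the range [0, 1]. The XGBoost classification and de-obfuscation stages are omitted.

```python
from collections import Counter

# Hypothetical opcode categories; the thesis's real feature set is not stated in the abstract.
OPCODE_CATEGORIES = {
    "arith": {"add", "sub", "mul", "imul", "div", "xor", "and", "or", "neg"},
    "transfer": {"mov", "lea", "push", "pop"},
    "branch": {"jmp", "je", "jne", "jg", "jl", "call", "ret"},
}
CATEGORY_ORDER = ["arith", "transfer", "branch", "other"]


def categorize(opcode):
    """Map an opcode mnemonic to one of the illustrative categories."""
    for name, members in OPCODE_CATEGORIES.items():
        if opcode in members:
            return name
    return "other"


def function_vector(blocks, edges):
    """Flatten a function's attributed CFG into a numeric vector.

    blocks: dict mapping block id -> list of (opcode, constant_or_None)
    edges:  list of (source_block, target_block) pairs
    Layout: [#blocks, #edges, arith, transfer, branch, other, #constants]
    """
    counts = Counter()
    n_constants = 0
    for instructions in blocks.values():
        for opcode, constant in instructions:
            counts[categorize(opcode)] += 1
            if constant is not None:
                n_constants += 1
    return ([len(blocks), len(edges)]
            + [counts[c] for c in CATEGORY_ORDER]
            + [n_constants])


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(a, b):
    """Assumed normalization: 1.0 for identical vectors, 0.0 for disjoint non-negative ones."""
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0
    return 1.0 - manhattan(a, b) / total


# Toy function and a variant where `add 5` is replaced by `neg` + `sub 5`,
# loosely imitating an instruction-substitution pass.
edges = [(0, 1), (0, 2)]
original = {
    0: [("mov", None), ("add", 5), ("je", None)],
    1: [("mov", 1), ("ret", None)],
    2: [("mov", 0), ("ret", None)],
}
substituted = {
    0: [("mov", None), ("neg", None), ("sub", 5), ("je", None)],
    1: [("mov", 1), ("ret", None)],
    2: [("mov", 0), ("ret", None)],
}

vec_original = function_vector(original, edges)
vec_substituted = function_vector(substituted, edges)
print(vec_original, vec_substituted, similarity(vec_original, vec_substituted))
```

In this toy case the substituted variant changes only the arithmetic opcode count, so the two vectors stay close under the L1 distance. Obfuscations that restructure the graph would move the block and edge counts as well, which is consistent with the lower accuracies the abstract reports for Bogus Control Flow and Control Flow Flattening.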
Keywords/Search Tags: Clone detection, Anti-obfuscation, Binary, Obfuscation classification, Attributed Control Flow Graph