
Research On Binary Code Obfuscation From The Perspective Of Reverse Analysis

Posted on: 2022-05-16  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y J Zhao  Full Text: PDF
GTID: 1488306521464474  Subject: Information security
Abstract/Summary:
Obfuscation is an important technique for preventing reverse engineering from infringing software intellectual property. It transforms a program into a poorly readable form while keeping the program's semantics unchanged. Although code obfuscation has been actively researched for a long time, how to systematically measure the effectiveness of an obfuscator remains an open problem. It has been proven that a perfect obfuscator does not exist. There is therefore a widely accepted consensus that obfuscation aims to protect code by making reverse engineering so technically difficult that it becomes impossible or, at the very least, economically unviable. It has been proposed to evaluate an obfuscation technique along four dimensions: potency, resilience, cost, and stealth. Reverse engineering is the process of extracting design and implementation information by analyzing a software system, in whole or in part. Code obfuscation and reverse analysis constrain and drive each other, so approaching obfuscation from the perspective of reverse analysis can suggest ways to improve along these four dimensions.

However, existing work leaves several questions open. First, the goal of code obfuscation is to reduce the readability of the obfuscated code. How can this “readability” be quantified effectively in order to improve the potency of an obfuscating transformation? Second, code obfuscation tries, by disguising itself, to reduce the possibility of being discovered by reverse analysts. How can this “possibility” be quantified effectively in order to improve stealth? Third, code obfuscation essentially increases the difficulty of reverse analysis. How can this “difficulty” be quantified effectively in order to improve resilience? Fourth, security always comes at the expense of performance. How can security and performance be balanced when obfuscation is enabled? This thesis therefore studies obfuscation evaluation, obfuscation detection, and deobfuscation from the perspective of reverse analysis, with the aim of guiding the design of obfuscation with better potency, stealth, and resilience. The contributions of this thesis can be summarized as follows:

(i) Research on obfuscation evaluation to improve potency. Potency describes how difficult the obfuscated program is for human experts to analyze and understand. This thesis proposes a reverse-engineering-oriented quantitative evaluation method for the potency of obfuscation. First, the reverse engineering process is divided into four key stages: disassembly, control flow analysis, data flow analysis, and decompilation. Then, it is shown that the instruction is the fundamental element of both control flow and data flow. After that, we derive two metrics, based on instruction entropy and instruction N-grams, that quantify the extent to which code obfuscation makes the attacker's analysis more difficult. Finally, these two metrics are applied to several kinds of obfuscating transformations. Experimental results indicate that the two metrics can assess the effectiveness of an obfuscating transformation and quantitatively compare the potency of different obfuscating transformations.

(ii) Research on obfuscation detection to improve stealth. Stealth describes how difficult it is to determine that the target program is obfuscated and which types of obfuscation schemes were used. Obtaining such information is challenging without access to the source code of the original program. This thesis presents a new way to estimate the obfuscation scheme of a compiled binary. It uses semantic information from the disassembled binary to predict whether the program has been obfuscated and, if so, which type of obfuscation scheme may have been used. At the core of our approach is a set of deep neural networks that can effectively characterize and leverage the contextual information available in the assembly code. Our models are first trained offline, and the learned models can then be applied to new, previously unseen obfuscated binaries. We evaluate our approach on a large dataset of over 277,000 obfuscated samples covering different individual obfuscation schemes and their combinations. Experimental results show that our approach is highly effective in identifying the obfuscation scheme, with a prediction accuracy of at least 83% (up to 98%).

(iii) Research on deobfuscation to improve resilience. Deobfuscation based on program synthesis offers a good solution by treating the target program as a black box. Deobfuscation thus becomes the problem of finding the shortest instruction sequence that synthesizes a program with the same input-output behavior as the target program. Existing work has two limitations: it assumes that the obfuscated code snippets in the target program are already known, and it uses a stochastic search algorithm, which results in low efficiency. In this thesis, we propose a fine-grained, machine-learning-based obfuscation detection method for locating obfuscated code snippets. We also combine program synthesis with a heuristic search algorithm, Nested Monte Carlo Search. We have applied a prototype implementation of our ideas to data obfuscation produced by different tools, including OLLVM and Tigress. Our experimental results suggest that this approach is highly effective in locating and deobfuscating binaries with data obfuscation, with an accuracy of at least 90.34%. Compared with the state-of-the-art deobfuscation technique, our approach improves efficiency by 75% and the success rate by 5%.

(iv) Research on balancing security and performance, in order to propose an obfuscating transformation with good stealth, high resilience, strong potency, and modest cost. This thesis presents a novel approach for Android applications that moves code virtualization from the DEX level to the native level. Our approach contains two components: pre-compilation and compile-time virtualization. Pre-compilation is designed to improve performance by identifying and decompiling the critical functions that consume a significant fraction of execution time. Compile-time virtualization builds upon the widely used LLVM compiler framework. It automatically translates the DEX bytecode into the common LLVM intermediate representation, where a unified code virtualization pass can be applied to the DEX code. We have implemented a working prototype of our technique, Dex2VM, and applied it to eight representative Android applications. Experimental results show that Dex2VM can effectively protect the target code against a state-of-the-art reverse engineering tool specifically designed for code virtualization, and that it achieves good stealth and strong potency at only modest cost.
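As a rough illustration of the two potency metrics named in contribution (i), the sketch below computes a Shannon entropy over opcode frequencies and a simple N-gram comparison between an original and an obfuscated instruction sequence. The abstract does not give the thesis's exact definitions, so everything here is an assumption: the function names, the "novelty" ratio, and the toy opcode lists are hypothetical and chosen only to show the general idea.

```python
import math
from collections import Counter


def opcode_entropy(opcodes):
    """Shannon entropy (in bits) of the opcode frequency distribution.

    Assumption: the thesis's instruction-entropy metric may be defined
    differently; this is the textbook entropy over opcode counts.
    """
    total = len(opcodes)
    if total == 0:
        return 0.0
    counts = Counter(opcodes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def opcode_ngrams(opcodes, n=2):
    """Count every contiguous run of n opcodes in the sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def ngram_novelty(original, obfuscated, n=2):
    """Fraction of distinct obfuscated n-grams that never occur in the original.

    Hypothetical comparison measure, not taken from the thesis.
    """
    base = set(opcode_ngrams(original, n))
    obf = set(opcode_ngrams(obfuscated, n))
    if not obf:
        return 0.0
    return len(obf - base) / len(obf)


# Toy opcode sequences (made up for illustration, not real disassembly).
original = ["mov", "add", "mov", "ret"]
obfuscated = ["mov", "xor", "and", "shl", "add", "mov", "ret"]

print(opcode_entropy(original), opcode_entropy(obfuscated))
print(ngram_novelty(original, obfuscated, n=2))
```

In practice the opcode lists would come from a disassembler, and higher entropy or a larger share of unseen N-grams would be read as a sign that the transformation changed the instruction stream more, under the assumptions stated above.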
Keywords/Search Tags:Reverse analysis, Code obfuscation, Deobfuscation, Neural network, Program synthesis