| Binary code is the most prevalent form of software in the computer ecosystem and is widely used across architectures. Code obfuscation techniques increase code complexity and make code harder to analyze without changing its original semantics. As a result, malware developers use code obfuscation techniques, such as the O-LLVM obfuscation framework and virtualization obfuscation, to make binary malware more difficult to analyze. Eliminating code obfuscation is therefore a critical part of the code analysis process. Current deobfuscation methods rely on dynamic software analysis or simulated execution, but both have limitations. Dynamic software analysis relies on expert experience and domain knowledge, and less experienced researchers must first learn how to use binary debuggers; simulated execution requires manually emulating the execution environment and consumes substantial runtime resources. In addition, most existing deobfuscation methods are specific to a single system and architecture and do not generalize to multiple architectures. The main work of this paper can be summarized in the following two aspects: (1) For the O-LLVM obfuscation framework, this paper proposes a generic automated multi-architecture deobfuscation framework referred to as GOAMD. GOAMD has a strategy for each of the three major obfuscation methods of O-LLVM. For instruction substitution obfuscation, GOAMD uses pattern matching. For bogus control flow obfuscation, GOAMD uses concolic execution to obtain the call relations between basic blocks and reconstruct the original program, and it avoids infinite loops during symbolic execution by assigning values to opaque predicates in advance. For control flow flattening obfuscation, GOAMD first classifies the basic blocks of the program, uses concolic execution to obtain the invocation relationships between useful blocks, and reconstructs the original program based on these relationships. Experimental
results show that the framework generates functional programs that are more than 90% similar to the unobfuscated programs. (2) For virtualization obfuscation based on the LLVM framework, this paper proposes a static analysis method that relies on machine-independent optimization techniques. The program is reconstructed by analyzing the opcodes that virtualization obfuscation stores inside the program as data. First, the virtual machine opcodes are extracted. Next, control flow analysis is applied to the opcodes to build a control flow graph, which is then simplified using data flow analysis. Other obfuscations, such as control flow flattening, are then eliminated. Finally, the program is reconstructed and recompiled to generate an executable binary with the obfuscation removed. Experimental results show that this method can devirtualize virtualization-obfuscated programs with more than 99% accuracy in a very short time. Together, this work effectively addresses a series of code obfuscation problems faced by binary software in the computer ecosystem and provides innovative methods and ideas for software analysts. |
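
To illustrate the pattern-matching idea mentioned for instruction substitution, the following is a minimal sketch and not GOAMD's actual implementation. It assumes a hypothetical tuple-based expression representation (real tools work on machine instructions or an intermediate representation), and the three rewrite rules are illustrative identities of the kind a substitution pass may introduce, such as `a - (-b)` for `a + b`. The names `simplify` and `is_op` are assumptions made for this example.

```python
# Hypothetical expression form: nested tuples such as ("add", x, y),
# ("sub", x, y), ("neg", x), ("and", x, y), ("or", x, y), ("not", x).
# Leaves are variable names given as strings.

def is_op(expr, name):
    """Return True if expr is an operator node with the given name."""
    return isinstance(expr, tuple) and expr[0] == name


def simplify(expr):
    """Apply the illustrative rewrite rules in one bottom-up pass."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]  # simplify operands first
    expr = (op, *args)

    # a - (-b)  ->  a + b
    if op == "sub" and is_op(args[1], "neg"):
        return ("add", args[0], args[1][1])

    # a + (-b)  ->  a - b
    if op == "add" and is_op(args[1], "neg"):
        return ("sub", args[0], args[1][1])

    # (~a & b) | (a & ~b)  ->  a ^ b
    if op == "or":
        left, right = args
        if (is_op(left, "and") and is_op(right, "and")
                and is_op(left[1], "not") and is_op(right[2], "not")
                and left[1][1] == right[1] and left[2] == right[2][1]):
            return ("xor", right[1], left[2])

    return expr


if __name__ == "__main__":
    obfuscated = ("or", ("and", ("not", "x"), "y"), ("and", "x", ("not", "y")))
    print(simplify(obfuscated))
```

A single bottom-up pass is enough for these rules because each operand is simplified before its parent is matched; a larger rule set would typically repeat the pass until no rule fires.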