Research On Code Obfuscation Model

Posted on: 2016-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Yang
Full Text: PDF
GTID: 1108330482457815
Subject: Information security
Abstract/Summary:
As the IT industry has developed and matured, computers and software systems have become necessary tools of daily life that significantly improve work efficiency. As a result, improving the security of software systems has become a general concern for security institutions and software manufacturers. Code obfuscation, a mainstream technology in software security, has gradually become widely accepted as a mechanism for protecting software systems. With code obfuscation, a software system can protect its important internal information and key algorithms using its own program logic. The security of the software system can thus be protected without any security hardware or secret keys, which in turn safeguards the interests of software developers and users.

Recent years have seen progress in code obfuscation techniques and in research on them, which has effectively promoted intellectual property protection and data protection inside software systems. However, many problems remain open.

This paper focuses on the obfuscation preprocessing model based on CO-Vine IL (Vine Intermediate Language for Code Obfuscation), the static obfuscation algorithm based on SJT-BF (Second Jump and Transform based on Branch Function), the dynamic obfuscation algorithm based on demand-driven symbolic execution, and the dynamic-static obfuscation algorithm based on multipoint functions, in an effort to investigate how to improve the obfuscation of binary programs from different perspectives. The research efforts and contributions are detailed as follows:

First, there are two kinds of obfuscation for original programs: source code obfuscation and binary program obfuscation.
This paper concentrates on binary program obfuscation, for which translating binary programs into assembly language is a precondition of obfuscation preprocessing. However, because assembly code does not provide detailed information about how the program is implemented, analysis tends to suffer from a higher error rate and lower efficiency. To address this problem, this paper proposes an obfuscation preprocessing model based on CO-Vine IL, in which CO-Vine IL serves as an intermediate form that gives obfuscation preprocessing more information about the program's registers, memory, and variables and improves the accuracy of the preprocessing algorithm. With this model, the control flow graph also becomes clearer, which helps improve efficiency.

Second, static code obfuscation for binary programs is concentrated mainly at the assembly layer, where it effectively resists the linear sweep and recursive traversal disassembly methods. Complicating the control flow is currently the main obfuscation method, but the more complicated the control flow becomes, the more program performance is impaired. This research contributes a static obfuscation algorithm based on SJT-BF, in which junk instructions are used to delay the resynchronization of linear sweep, and the branch function, by obfuscating JMP instructions, effectively resists recursive traversal. The complexity of the control flow can be reduced through data reuse in the junk instructions and through the transformation and concealment of the branch function, so the obfuscation effect is preserved while program performance is improved. The algorithm is designed and implemented at the assembly layer, with instruction locations determined on the basis of CO-Vine IL.
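
The general idea of replacing a direct jump with a call to a branch function, followed by junk that is never executed, can be illustrated with a toy model. The sketch below is not the SJT-BF algorithm itself, whose details the abstract does not give. The instruction names (`emit`, `jmp`, `call_bf`, `junk`, `halt`), the interpreter `run`, and the helper `obfuscate` are all hypothetical and exist only for this illustration.

```python
# Toy model of a branch function; NOT the thesis's SJT-BF algorithm.
# A "program" is a list of tuples run by a small hypothetical interpreter.

def run(program, branch_table=None):
    """Execute a toy program and return the list of emitted values."""
    out, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "emit":
            out.append(args[0])
            pc += 1
        elif op == "jmp":
            # Direct jump: the target is visible to a static disassembler.
            pc = args[0]
        elif op == "call_bf":
            # Branch function: the target is derived from the return address
            # and a displacement table, so no target appears at the call site.
            return_addr = pc + 1
            pc = return_addr + branch_table[return_addr]
        elif op == "halt":
            break
        else:
            # Junk slots must never be reached at run time.
            raise RuntimeError(f"executed junk at index {pc}")
    return out


def obfuscate(program):
    """Replace every jmp with call_bf plus one junk slot and fix up targets."""
    # Each jmp grows by one slot, so map old indices to new ones first.
    new_index, shift = {}, 0
    for i, ins in enumerate(program):
        new_index[i] = i + shift
        if ins[0] == "jmp":
            shift += 1

    result, table = [], {}
    for i, ins in enumerate(program):
        if ins[0] == "jmp":
            return_addr = new_index[i] + 1
            table[return_addr] = new_index[ins[1]] - return_addr
            result.append(("call_bf",))
            result.append(("junk", 0xE8))  # sits at the return address
        else:
            result.append(ins)
    return result, table


original = [("emit", 1), ("jmp", 3), ("emit", 99), ("emit", 2), ("halt",)]
obfuscated, table = obfuscate(original)
same_output = run(original) == run(obfuscated, table)
```

In this model, a linear sweep would try to decode the junk slot as an instruction, and a recursive traversal would follow the call and expect control to return to the junk slot, while actual execution never reaches it.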
A comparison suggested that the performance cost of the SJT-BF algorithm increased by only about 2% when the average obfuscation resilience rose by 20%.

Third, there are two kinds of dynamic code obfuscation techniques: self-modifying code and dynamic path execution. Self-modifying code is flawed because there is a window during execution in which the instructions are restored, which makes the program easy to trace. Dynamic path execution avoids this flaw, but its fixed path distribution function is easy to locate. In addition, program performance drops because both methods run additional obfuscation operations. Building on dynamic path execution, this paper proposes a dynamic obfuscation algorithm based on demand-driven symbolic execution, which creates new execution paths from combinations of jump nodes. Obfuscation is achieved through path explosion, while the dynamic distribution of jump nodes prevents the distribution function from being traced. The state of path execution is stored in node summary information to avoid duplicate computation and improve execution efficiency. Demand-driven information is used to reduce the number of times key nodes are executed and to strengthen the black-box character of the program. Experiments suggested that as the number of jump nodes grows linearly, the number of execution paths generated by reverse-engineering tools and the traversal time increase dramatically, which makes recovering the program substantially more difficult.

Finally, opaque predicates remain a key technology for implementing code obfuscation. However, opaque predicates have two flaws. First, the functional logic of each opaque predicate is independent and isolated from the others, which leaves the predicates fragile.
Second, the insufficient strength of their black-box character impairs the obfuscation effect. This paper therefore proposes a dynamic-static obfuscation algorithm based on multipoint functions, which uses the black-box and convergence characteristics of the multipoint function to strengthen the obfuscation. The multi-input, multi-output structure of the multipoint function is the key mechanism for strengthening the correlation between opaque predicates, so that their fragility no longer weakens the obfuscation effect. In addition, this algorithm acts as a functional algorithm within the code obfuscation model: its logical features can be used to integrate the static and dynamic obfuscation algorithms, and combining dynamic with static obfuscation improves the obfuscation effect. A comparison of this algorithm with the AA algorithm and the PA algorithm suggested that obfuscation resilience improved by 20% when average obfuscation potency rose by 10%, while the performance cost grew only linearly.
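
For readers unfamiliar with the two building blocks named above, the following sketch shows a textbook opaquely true predicate and a simple hash-based multipoint function. It is only an illustration under stated assumptions and does not reproduce the thesis's construction: the names `opaquely_true`, `make_multipoint`, and `eval_multipoint` are hypothetical, and hiding the secret points behind SHA-256 digests is an assumed choice, not one taken from the abstract.

```python
import hashlib


def opaquely_true(x: int) -> bool:
    """Classic opaque predicate: x * (x + 1) is even for every integer x."""
    return (x * (x + 1)) % 2 == 0


def _digest(tag: bytes, x: int) -> bytes:
    """Hash an integer under a domain-separation tag (assumed scheme)."""
    return hashlib.sha256(tag + str(x).encode()).digest()


def make_multipoint(points: dict) -> dict:
    """Hide a map {secret input: output byte in 0..255}.

    Only hashed inputs and masked outputs are stored, so reading the table
    reveals neither the secret inputs nor their outputs directly.
    """
    return {
        _digest(b"key", x): y ^ _digest(b"mask", x)[0]
        for x, y in points.items()
    }


def eval_multipoint(table: dict, x: int) -> int:
    """Return the hidden output for x, or 0 if x is not a secret point."""
    masked = table.get(_digest(b"key", x))
    if masked is None:
        return 0
    return masked ^ _digest(b"mask", x)[0]


# Hypothetical usage: several branch conditions draw on one shared table,
# so they are no longer independent of one another.
table = make_multipoint({1234: 7, 99: 200})
take_branch_a = opaquely_true(42) and eval_multipoint(table, 1234) == 7
take_branch_b = eval_multipoint(table, 5) != 0
```

A multipoint function is zero everywhere except at a small set of secret inputs, which is what gives it the black-box character the abstract refers to. How the thesis couples opaque predicates through such a function is not specified in the abstract, so the shared-table usage here is only one possible reading.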
Keywords/Search Tags: Code Obfuscation, Intermediate Language, Symbolic Execution, Opaque Predicate, Multi-point Function, Static-dynamic Combined