Font Size: a A A

Research On Software Obfuscation Technique

Posted on:2009-12-18Degree:DoctorType:Dissertation
Country:ChinaCandidate:X S ZhangFull Text:PDF
GTID:1118360272476567Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Software piracy has long been a confusing challenge to the software industry; especially with the popularity of the Internet today, and this threat is growing more seriously. Rapid sharing and spreading of knowledge and technology, free distribution of software analysis tools, simplification of operating systems, all these provide the breeding ground for the rampancy of software piracy. Software is a valuable form of data, representing significant intellectual property, and reverse engineering of software code by competitors may reveal important technological secrets, cause great harm to software developers and software providers. Because of its deterministic and self-cleared behaviour, as well as the envirmental dependency property, when running under a malicious host, software may be accessed and modified by infinite resources and tools, all useful information would be definitely exposed to the attacker, which brings about great difficulty to software protection.Everybody dealing with program understanding knows that, in many cases, even small programs require considerable efforts to reveal their meaning. Software obfuscation is received more attention gradually based on this viewpoint, which is a kind of software protection technique developed in the last decade. The practical goal of obfuscation is to make reverse engineering uneconomical by various semantic preserving transformations. It is sufficient that the program code be difficult to understand, requiring more effort from the attacker than writing a program with the same functionality from scratch. This paper makes a brief review of commonly used software protection methods firstly, and points out their strength and weakness respectively. Then the ordinary measures and classification of software obfuscation technique, as well as its main research direction currently focused on are introduced with emphasis. The corresponding research results are as follows:1. There is no universal software algorithm and layout obfuscator. Obfuscating theory tries to construct a strictly provable security based on cryptography, that is, expecting the un-analytical result of the obfuscated program under the virtual black-box assumption. However, Rice's Theory tells us that all non-trivial property of any language is undecidable, and a series of concrete problem extending from this meaning are also undecidable, such as the halting problem and the universal interpreter problem etc. For any given specific program, there are always some non-trivial properties that belong to the algorithm or the layout. It is impossible for an obfuscator to transform these non-trivial properties automatically. Thus, the obfuscator cannot change some inherent characteristics of the program. The applications of program obfuscation are notably diversified, and the objectives to be pursued in these applications are also different. When obfuscation is intended to turn a private-key cryptosystem into a public-key one it is expected that such transformation hides a secret key but not necessarily an implementation of the encryption algorithm. On the contrary, when obfuscation is designed for software protection against reverse engineering it is often bound to hide only the implementation of the most valuable algorithms but not a processed data. This means that the term"obfuscation problem"is but a generic name for the set of particular problems, each being determined by the specific security requirement. Two kinds of program classes are given, which are obfuscatable. This positive result comes from the normalization to the program, i.e. only the trivial property is preserved. However, for most programs, it is impossible to perform normalization.2. Quantitative analysis of resistance of data encodings. Data encoding is one kind of data obfuscation technique, which maps some variables into a series of new variables, and the corresponding arithmetic operation is performed in the new vector space. The original variable in the program usually has some sensible meanings in real world, which will lost in encode world, leading to a more obscure program that is hard to understand. By carefully choosing encoding methods and parameters, we can change the data-flow and control-flow of program, conceal the original arithmetic operation, and obtain better obfuscation result. On the premise of the adversary using only information from encoded world, we analyzed four kind of data encoding methods—linear encoding, mixed encoding, affine encoding and exclusive-or encoding, and get the result that mixed encoding has the strongest resisting ability, affine encoding and linear encoding take the second place, exclusive-or encoding is the weakest. However, the more resisting ability one encoding method has, the more calculations it will take, and the more remarkable influence it will have on the program performance; Further, if some un-appropriate parameters are chosen, it will cause some upper data overflow or lower data overflow, leading to a wrong result. Therefor, the choice of encoding method also relies on the concrete application's performance and security requirements, and compromise should be made between these two aspects.3. Java call-flow obfuscation technique. Due to its cross platform movability and network loading property, Java language has been the most widespread program language. In order to meet platform independent characteristic, Java introduces a form of symbolic linking technique, which stores all the type and interface information into classfile's constant pool. But this also facilitates decompilation at the same time, and makes valuable info easier to be found and stolen. Because of the"sandbox"characteristic of virtual machine, and the strong type feature of Java language, certain restriction will receive when obfuscating Java programs, such as stack cannot be accessed directly, and its own code cannot be modified dynamically. A Java call-flow obfuscation technique is proposed to encapsulates the signature (type and number of parameters, as well as the return type) of some methods in user-defined classes uniformly, extract their codes, transfer them into the object pool, interleave these codes into some interface's methods, and upcast all objects inherit from this interface to their base type. The program will send request to the object pool when it tries to call a merged method. The object pool will return a different object during each method call even request with the same parameters. This kind of transformation both disturbs the level info and drastically obscured the program flow by introducing object alias that cause the static analysis unable to determine the true method being called. By introducing universal hashing, each time the program startup, it will construct the object pool dynamically using randomly generated parameters, which makes this technique able to resist dynamic analysis to a certain extent. Experimental result shows there is little influence on the program size and performance.4. Java instruction obfuscation. With the popularity of smart phone, handset Java game software market is growing gradually. Free from time, location and option, handset game is attracting more and more users. It is more and more valuable to protect this kind of software. Till now, nearly all Java Virtual Machines on mobile phone are customized based on J2ME as their prototype, which makes Java software migration among different mobile phone models much easier without modifications or with small modification. We present the instruction obfuscation scheme targeting primarily mobile phone Java applications. The basic idea is: if the interpretation of program is obscure, it cannot be understood by an adversary, and the interpreted program itself is also obscure, because the user lacks information of how to read them. Instruction set is divided into subgroups according to their operation on stack, and one-to-many map is constructed among all instructions in the same subgroup. These specific permutation rules are stored in the auxiliary input encrypted using the SIM card info, and saved in the code attribute table. The interpreter integrated into the Java Virtual Machine determines the real semantics of the replaced instruction according to the auxiliary input. In this way, each user has the unique version for the same application, which can effectively protect application program from being generally attacked due to attack on specific smart phone. Analysis shows that the permutation space is sufficiently large, and experimental result shows there is little influence on the program performance.
Keywords/Search Tags:Software protection, Software obfuscation, Virtual black-box, Data encoding, Call-flow
PDF Full Text Request
Related items