Research On Key Technologies Of Malicious Code Binary Program Behavior Analysis

Posted on: 2013-06-19  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J X Zhong  Full Text: PDF
GTID: 1228330374499557  Subject: Information security
Abstract/Summary:
With the rapid growth of information technology, the Internet has become an indispensable part of daily life. It brings significant benefits, but it also exposes people to frequent attacks by malicious code. Now that the creation and propagation of malicious code have become an underground industrial chain, the volume of malicious code keeps growing and new samples are produced ever faster. As a result, the negative effect of malicious code on Internet security has become more significant, to the point of threatening national Internet security.

Because of the severe security threat posed by malicious code, its analysis has attracted great attention from security organizations and vendors. In particular, malicious code analysis based on binary executable programs is widely applied in day-to-day analysis work. It allows security analysts to examine malicious code and respond quickly, so that its propagation can be suppressed and the rights of legitimate users are protected from intrusion.

With the development of Internet technology and the emergence of the underground industrial chain, the self-protection techniques of malicious code have matured, and current malicious code analysis technology no longer meets the needs of analysis and detection. It therefore faces great challenges.

First, kernel programming techniques are increasingly applied to malicious code, allowing it to hide deep inside the system. Such code is harder than ever for malicious code analysis tools to detect. Moreover, its malicious actions are hard for ordinary users to perceive at the application layer of the system, which makes detection and analysis even more difficult.
How to analyze kernel-level malicious code efficiently has become a major challenge in current research.

Second, in maliciousness analysis, the malicious code and its actions are all described in assembly language. Assembly code is hard to interpret, so security analysts spend a great deal of time and effort understanding it, which lowers their efficiency and at the same time increases the chance of mistakes. Currently there is no appropriate intermediate language for describing the behavior of malicious code, and the results of behavioral analysis research cannot be expressed clearly with a formal description method. Building an intermediate language for malicious code analysis has therefore become another active topic.

Finally, the underground industrial chain accelerates the production of malicious code. Polymorphism and code morphing techniques sharply increase the amount of homologous code, which raises the bar for analysis efficiency: detection and analysis must finish as soon as possible to suppress rapid propagation. At the same time, the self-protection of malicious code produces a large number of path branches during analysis. For example, code obfuscation greatly increases the number of execution paths, causing the so-called path explosion problem, which severely limits the efficiency and accuracy of analysis. How to guarantee the accuracy of malicious code analysis without lowering its efficiency is therefore another topic of current research.

Although malicious code behavior analysis technology has been under development for several years, it still has various drawbacks and cannot satisfy these needs.
These drawbacks fall into several categories.

First, most current analysis methods describe behavior at the lowest level, in assembly language, which is hard to understand intuitively. Some researchers use a compiler's intermediate language to abstract the behavior, but such languages are designed for forward compilation toward program execution and do not support decompilation effectively, so many constructs encountered during analysis cannot be described correctly. The resulting abstract description of malicious code behavior is incomplete and fails to show the full pipeline of how the malicious code operates. For lack of a suitable intermediate language, many analysis methods also remain stuck at the level of technical details and cannot present a comprehensive, complete analysis pipeline.

Second, whether dynamic or static, current behavior analysis usually focuses on monitoring the function calls made by malicious code and does not watch changes in memory and registers. Without deep exploration of the lowest level of the operating system, the behavior of kernel-level malicious code cannot be monitored efficiently. Although some existing tools can monitor kernel system calls, they track kernel-level function calls rather than data changes in memory and registers. This approach cannot recognize and analyze malicious code written for the kernel.

Finally, authors of malicious code keep improving its self-protection and anti-analysis capabilities. In particular, code obfuscation produces many execution path branches when the malicious code is analyzed. Most of these are redundant paths, which greatly inflate the number of branches in the program.
This causes path explosion, so that the analysis fails or cannot finish within a tolerable time. The analysis then cannot cover all the conditions that trigger malicious behavior, which undermines the completeness of malicious code analysis and costs a great deal of time.

This thesis discusses and addresses the above problems. Its main contents and contributions are as follows.

To address the lack of a formal intermediate language, MDIL (Malicious code Detection Intermediate Language) is proposed. MDIL can be used effectively to formalize the binary behavior analysis of malicious code. It can formalize the operations of malicious code at the lowest level of the operating system, together with a symbolic abstraction of memory and registers. MDIL also conveniently expresses all the details of malicious code binary analysis, which facilitates the analysis pipeline.

To address the lack of low-level monitoring in current malicious code analysis tools, a malicious code behavior analysis model based on MDIL is proposed. The method detects malicious behavior by tracking the behavior of the binary program, monitoring changes in memory and registers, and performing formal analysis at the intermediate-language level. Through binary program analysis of the malicious code, monitoring of the lowest level of the operating system, and tracking of memory and registers, the method effectively solves the problem that kernel-level malicious code cannot be analyzed effectively. In addition, its similarity measurement method improves the accuracy of malicious code detection and lowers the rate of false detections.

To address the efficiency problem of current methods, a binary program analysis method based on a Partitioned Symbolic Execution (PSE) model is proposed, shortening the running time to keep pace with the growing volume of malicious code.
The method is defined on MDIL and subdivides the program into several small units that can be analyzed one by one. Because each unit contains fewer branches, path explosion becomes less significant within it, and computation with high time complexity is avoided. The total analysis time equals the sum of the times for all units, so the overall time is reduced. Experimental results show that the method is more effective at handling path explosion than existing methods. Enhanced machine learning techniques are integrated to obtain the optimal subdivision strategy.

To deal with the path explosion caused by the self-protection of malicious code, a malware analysis method based on a symbolic execution tree is proposed. The method constructs a symbolic execution tree to constrain the path conditions of malicious code. By marking them as sink nodes, repeated traversal is avoided, which reduces the number of branches and addresses the path explosion problem. The method also has advantages in time complexity, which significantly speeds up the analysis. Experiments show that it greatly shortens the time needed for malicious code analysis, improving both efficiency and completeness.

To verify the methods in real malicious code detection and analysis, and to uncover technical problems in implementation, a prototype system has been developed. It is based on the analysis model and uses the binary program analysis method based on Partitioned Symbolic Execution (PSE) and the malware analysis method based on the symbolic execution tree proposed in this thesis.
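
As a rough illustration of why partitioning helps with path explosion, the sketch below counts execution paths in a toy program made of sequential, independent two-way branches. It is not the thesis's PSE algorithm: the function names, and the assumption that each unit can be analyzed on its own with no state carried across unit boundaries, are hypothetical simplifications introduced here.

```python
def whole_program_paths(num_branches: int) -> int:
    """Paths to explore when all branches are analyzed together.

    Assumption: every branch is a two-way branch and branches occur in
    sequence, so each one doubles the number of execution paths.
    """
    if num_branches < 0:
        raise ValueError("num_branches must be non-negative")
    return 2 ** num_branches


def partitioned_paths(num_branches: int, unit_size: int) -> int:
    """Paths to explore when the program is split into units.

    Assumption (hypothetical simplification): units are analyzed one by
    one and independently, so the total is the SUM of per-unit path
    counts rather than their product.
    """
    if num_branches < 0:
        raise ValueError("num_branches must be non-negative")
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    full_units, remainder = divmod(num_branches, unit_size)
    total = full_units * (2 ** unit_size)
    if remainder:
        # The last, smaller unit holds the leftover branches.
        total += 2 ** remainder
    return total


if __name__ == "__main__":
    n = 20
    print("whole program:", whole_program_paths(n))          # 1048576
    print("units of 5 branches:", partitioned_paths(n, 5))   # 128
```

Under these assumptions, 20 branches give 2^20 paths when analyzed as a whole but only 4 × 2^5 = 128 when split into four units of five branches. A real partitioned analysis would also have to summarize the state passed between units, which this sketch ignores.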
Keywords/Search Tags: Malicious Code Behavior, Intermediate Language, Binary Analysis, Path Explosion, Symbolic Execution