Font Size: a A A

A Research On Key Technologies Of Concolic Testing

Posted on:2017-04-04Degree:DoctorType:Dissertation
Country:ChinaCandidate:X LeiFull Text:PDF
GTID:1318330518494029Subject:Information security
Abstract/Summary:PDF Full Text Request
As the important strategic sources,all kinds of game behaviors in cyber security including vulnerability detection, malware behavior analysis, and automatic model extraction gain positive attentions from security analysts and hackers. How to exploit vulnerability and detect malware behavior effectively through software testing is the basic method to solve software defects as well as a research emphasis in software security. For different goals of security analysis,vulnerability detection,malware behavior analysis, and automatic model extraction are all based on the automatic generation for test cases with typical features. Therefore, the efficiency of security analysis depends on the quality of test cases and the generation of automation degree.It is worth noting that concolic testing solves the problem of automatic generation for effective test cases covering specific execution paths. With high code coverage, low redundant test cases and high automated testing, concolic testing contributes to security analysts’ understanding of data flow, conditions and modes for malicious behaviors and so on. On one hand, in the process of vulnerability exploiting, researchers use concolic testing to capture data constraints precisely in complex data dependency and record path constraints in the runtime,thus to further determine the path feasibility and completeness of path constraints,which is the key to detect vulnerabilities with various vulnerability models. On the other hand,in the process of malware analysis and defense,researchers use concolic testing to accurately detect conditions for malicious code with some searching strategy and to capture malicious behaviors.Concolic testing is a mixed program analysis technology, which is able to cover special execution paths in program analysis theory. With the combination of abstraction,inference and model checking, concolic testing provides the basis for security field including program optimization, vulnerability analysis and detection, and malware behavior analysis. The term "concolic" is composed by two words "concrete" and"symbolic",representing a combination of two technologies,which are fuzzing and symbolic execution. Both of them are the typical technologies in the field of static and dynamic program analysis. The key idea behind symbolic execution is to emulate running the program with symbolic rather than concrete inputs and to represent the values of program variables as symbolic expressions over symbolic inputs. On the contrary,the essence of fuzzing is to execute the program with an initial random or a given input. Although both of them are path oriented,symbolic execution is not running program with path explosion,and fuzzing has a large amount of random inputs driven the same path with redundant test cases. Hence, researchers have been focused on the combination of their advantages.Nevertheless, the main reason that concolic testing has not been used at large scale in industry is that there are still some challenges, such as path explosion, environmental problem,complex mathematical function problem, symbol pointer, intermediate language optimization and floating-point computation, etc. The above challenges hinder the development of concolic testing, so that concolic testing fails to achieve 100%path coverage, and is doubted on the efficiency and accuracy. In this thesis, this article aims to significantly improve the performance and usability of concolic testing and research on three typical problems, which are intermediate language optimization,environmental problem and complex mathematical function problem. In summary,this dissertation makes contributions as follows:Firstly, this dissertation proposes a systematic approach for automatically detecting and eliminating redundant Eflags to optimize intermediate language. The goal of intermediate language optimization is to increases the efficiency of the subsequent analysis by simplifying the structure and deleting redundant instructions of intermediate language. This paper analyzes the reason why redundant Eflags exist in intermediate language and the relationship between Eflags and assembly language. Then, an indexing model for x86 instructions is established, according to which the procedure of redundant Eflags detection and elimination are discussed. Moreover, this article proposes a systematic approach for automatically detecting and eliminating redundant Eflags and implements the approach on top of Vine IL. The experiment results demonstrate that our approach can diminish the volume of Vine IL obviously, and provide accurate representation about the assembly code. The approach includes the following three parts. The first one is the indexing model for x86 instructions basing on the relationship between Eflags and assembly language. The second is the algorithm for inversely linear-sweep instructions of code segment and constructing flag relevant chain (FRC) by context. The last part is the design of FRC optimization strategy, so the system is able to filter redundant Eflags and maximally simplify intermediate language.Secondly, this dissertation proposes a unified approach for identifying and modifying outputs of the data input interface function (DIIF) to address environmental problem. Environmental problem refers to analyze the essence of the phenomenon that partial codes which are explored in particular exterior environments caused by DIIF and aims to discover hidden paths of environment-intensive program. The dissertation discusses the reasons for the environmental problem caused by DIIF and analyzes the process of identification and modification for DIIF. Based on two different instrumentation tools, the prototype to find hidden paths and potential vulnerabilities for environment-intensive programs is developed. The experiment results show that our approach can identify the DIIF precisely and discover hidden path obviously. The approach includes the following three parts. The first one is the definition of DIIF according to its parameters with output characteristics,based on which we build indexing models for DIIF from Windows SDK and C standard library. The second is the algorithm of a fine-grained taint analysis for detecting and marking the data that interacts with the environment in the memory. The last one is precise correction for the output parameter of DIIF according to its function signature.Lastly,this dissertation proposes a unified approach for identifying and modifying return values of complex mathematical function (CMF) based on local fuzzing to address unsolved paths. CMF problem refers to analyze the essence of the phenomenon that the constraint solver fails to understand complex logic operation of CMF which leads to a large number paths unsolved. The article analyzes the characteristic of CMF and discusses the reason why constraint solver fails on the path with CMF. Then a return value model for CMF is established and an approach to trace and detect the return value of CMF during the runtime is proposed. This dissertation designs and implements the prototype on the binary instrumentation tool in the user-mode. The experiment results show that our approach can precisely revised the return value of CMF and significantly reduce the number of path unsolved. This approach includes the following three parts. The first one is the algorithm of a fine-grained taint analysis for tracing and marking the return value of CMF based on the return value model in the memory.According to the essence of black-box function, once identified a CMF, we only focus on the predicates at the conditional branch with the return value of the corresponding CMF. The second one is the path searching strategy for priority exploring the path with CMF based on the depth first algorithm. The last one is the local fuzzing test for CMF to obtain the range of CMF return value, which is able to avoid infeasible paths being explored.
Keywords/Search Tags:dynamic symbolic execution, intermediate language, environmental problem, complex mathematical functions
PDF Full Text Request
Related items