Research On Key Technology About Coverage-guided Fuzzing

Posted on:2021-05-05

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y C Wang

Full Text:PDF

GTID:1488306230472064

Subject:Cyberspace security

Abstract/Summary:

PDF Full Text Request

Software vulnerabilities usually refer to software errors or defects that can be maliciously exploited.They are the main threats to the security and reliability of information systems.Many well-known Internet security incidents are directly related to the vulnerability exploitation.With the increasing scale and complexity of software,how to discover hidden vulnerabilities in them efficiently is a very challenging and important task.Coverage-guided fuzzing is one of the current hot topics in both academia and industry.It has been proved that it is the most successful automatic dynamic vulnerability detection method and it has found a large number of high-risk vulnerabilities in many mainstream software which help vendors to guarantee the security of their software.Through the analysis of the current state-of-the-art coverage-guided fuzzing tools,this paper finds and summarizes four key issues that limit the effectiveness of vulnerability discovery:(1)Inefficiency of vulnerability discovery caused by insensitivity to program paths;(2)Poor quality of mutated samples caused by unawareness of the input format;(3)Insufficient testing for deep code caused by unawareness of semantics of structured input;(4)Low code coverage caused by a large number of complex branch constraints in the programs.In order to alleviate the above limitations,this paper proposes an improved coverage-guided fuzzing framework,which optimizes seed selection strategies,seed mutation strategies and testing methods from the perspectives of program understanding,input awareness,and method integration,which can help improve the ability of effective sample generation,code coverage and vulnerability triggering,thus achieving the overall improvement of vulnerability discovery efficiency.This framework found a large number of unknown vulnerabilities in a variety of real applications and third-party libraries.Most of these vulnerabilities have been confirmed by corresponding vendor and twenty-one of them are collected by CVE(Common Vulnerabilities and Exposures).The main work of this paper is summarized as follows:1.This paper proposes a path-sensitive seed selection and energy allocation strategy.The current fuzzing method is not sensitive to the program execution path,tests all paths indiscriminately,and uniformly allocates mutation energy,which results in low efficiency of vulnerability discovery.For this reason,from the perspective of program understanding,this paper uses deep neural networks to optimize the seed selection strategy of current fuzzing tools.First,leverage the deep neural network's feature learning capability to learn a potential vulnerability pattern from a large number of labeled vulnerable and non-vulnerable paths to obtain a prediction model,and then this model is used to guide the online seed selection,and perform vulnerability prediction on the paths triggered by mutated seeds.The seeds that trigger the vulnerable paths will be prioritized and assigned more mutation energy in the next iteration of testing,while reducing the mutation energy of the non-vulnerable seeds.This method can guarantee that more effort is focused on vulnerable code area especially within limited time budget,thereby speeding up vulnerability discovery.Experiments show that this method can find more vulnerabilities faster,which greatly improves the efficiency of vulnerability discovery.2.This paper proposes a field-aware seed mutation strategy.The current fuzzing is not aware of the input format,sees the input as a sequence of continuous binary byte,treats all bytes equally and mutates them blindly,resulting in poor quality of generated samples and reducing the efficiency of triggering vulnerabilities.Therefore,this paper optimizes the seed mutation strategy from the perspective of input understanding.First,extract the boundary and type information of the input field automatically with the existing input specifications,then use the code risk assessment method to evaluate the importance of the field according to the field's usage behaviour in the program,and finally based on the obtained field boundary,type and importance information,we can achieve field-level mutation,prioritize more important fields for mutation,and adopt the best mutation strategy according to the field type to improve the quality of generated sample and the capability of triggering vulnerability.Experiments show that this method can find more code vulnerabilities with less but more meaningful input.3.This paper proposes a semantic-aware sample generation strategy for structured inputs.Current fuzzing tools that use bit/byte flipping as main mutation strategy are difficult to apply to structured input,because they struggle to pass the stage of program syntax checking.Although grammar-based fuzzing can improve the grammar validity of generated samples to a certain extent,a lot of them have runtime errors due to its insensitivity to code semantics.To this end,this paper optimizes the sample generation strategy for structured input and gives the fuzzers ability of syntax and semantic awareness.This paper takes the currently widely used Java Script code as research target.First,a large number of JS samples are parsed to obtain the abstract syntax tree.Code components are built by code fragmenting and the grammar rules are automatically learned.Then replace the subtree of the syntax tree based on obtained code components and learned grammar rules.And finally repair the wrong semantics based on the code context and test the JS engine with in the newly generated samples.Experiments show that this method can generate more samples with valid syntax and semantics,thereby improving the vulnerability discovery ability in deep layer of code.4.This paper proposes a hard-to-reach branch driven hybrid fuzzing method.Usually,real-world programs contain a large number of complex branch constraints,which makes it difficult to find some code paths hidden behind them by fuzzing alone.Although the current hybrid fuzzing method can alleviate this problem to some extent,it is extremely inefficient or difficult to apply to real-world applications,which greatly limits its scalability.To this end,this paper proposes a hard-to-reach branch driven fuzzing method.By analyzing the branch coverage information during fuzzing and using lightweight program analysis methods to extract candidate hard branches,then the hardest branch is prioritized to be solved by concolic execution.At the same time,in order to reduce the overhead of symbolic emulation,dynamic taint analysis is used to identify input bytes related to the target branch to be flipped and only symbolize them,thereby greatly reducing the number of symbolic instructions that need to be emulated during concolic execution.Experiments show that this method can effectively improve the efficiency of current hybrid fuzzing tools and achieve higher code coverage.

Keywords/Search Tags:

Vulnerability Discovery, Fuzzing, Seed Selection, Seed Mutation, Concolic Execution

PDF Full Text Request

Related items

1	Optimization And Improvement Of Grey-box Fuzzing Techniques
2	An Automated Vulnerability Verification System For C/C++ Programs
3	Research On Vulnerability Detection Technology Based On Fuzzing Test And Concolic Execution
4	Fuzz Based On Analysis Of The Potential Of Seeds And Identification Of Key Data
5	Research On Seed Selection Strategy Of White-box Fuzzing Based On Combination Of Dynamic And Static Analysis
6	Research On Optimization Technology Of Fuzzing Test Based On Key Block Inference Network
7	Research And Implementation Of Fuzzing Based On Symbolic Execution
8	Research On Greybox Fuzzing Test Method Integrating Heat Among Seeds
9	Research And Implementation Of Concolic Execution Assisted Fuzzing Technique
10	Research Of Vulnerability Discovery Based On Fuzzing