As software grows more complex, software errors become more likely, the number of security vulnerabilities keeps rising, and the security situation in cyberspace is becoming increasingly severe. As a fast and efficient vulnerability mining technology, fuzzing has received increasing attention from security researchers. However, many existing vulnerability mining solutions rely on source-code instrumentation and compilation; without source code they either cannot be used at all or fail to achieve the expected vulnerability mining effect. In recent years, researchers have combined fuzzing with symbolic execution, dynamic taint analysis, machine learning, and other technologies to improve the quality of newly generated testcases. These technologies can alleviate the low coverage of fuzzing binary programs without source code, but they also bring additional challenges to the fuzzing system, such as the path explosion caused by symbolic execution and the memory and performance overhead caused by dynamic taint analysis.

To address the challenges of input verification, unintelligent seed mutation, and low code coverage in fuzzing without source code, while preserving the efficiency of fuzzing-based vulnerability mining when new technologies are introduced, this thesis starts from initial seed generation and the fuzzing mutation strategy. Based on the three key challenges faced by the fuzzing model, namely "difficult to pass verification, how to mutate to generate a better set of testcases, and how to improve code coverage", we use static analysis, dynamic analysis, information extraction, biological genetic mutation, and multi-dimensional mutation strategies to put forward three testcase generation optimization technologies. The main work and contributions of this thesis are as follows:

1. Proposing and implementing a seed file generation technology based on lightweight dynamic
taint analysis. To address the weak penetration and low coverage of testcases generated by current fuzzing systems when they face complex logic code blocks, we first study the solutions adopted by existing fuzzing systems. On that basis, we introduce lightweight dynamic taint analysis to heuristically generate the initial seed set, so that valuable, high-quality initial seed files are produced quickly and the code coverage of newly generated testcases in the fuzzing system improves.

2. Proposing and implementing a magic byte bypass technology based on static analysis. Current fuzzers for binary programs mutate blindly, cannot obtain detailed internal information about the target program, and cannot pass verification checks to execute deep, protected paths. To address this, we use static analysis to extract instruction types and their corresponding operand contents, then pass this static information to the mutation stage so that newly generated testcases bypass the complicated internal verification of the binary program more quickly.

3. Proposing and implementing an intelligent optimization technology for fuzzing testcase generation. Because it is difficult for black-box fuzzing to bypass complex verification in an application using one-dimensional mutation, we propose a multi-dimensional mutation strategy based on gene mutation and crossover to increase the diversity of testcases and keep the fuzzer from being stuck in one code block for a long time making meaningless mutations. To decide when to invoke the multi-dimensional mutation strategy, we introduce the particle swarm optimization algorithm for mutation operator scheduling. To prevent the fuzzer from executing
the high-frequency branch for a long time and missing effective low-frequency branches, we introduce a rare-branch-first seed file selection strategy that raises the performance score of rare-branch seeds, so that the system spends more time testing rare branches that may trigger vulnerabilities.

Finally, we design and implement Ex AFL, a prototype system for fuzzing testcase generation optimization, and comprehensively evaluate it on three different types of data sets: LAVA-M, CGC, and real programs. The evaluation verifies that the system is lightweight and efficient and achieves high code coverage. The experiments show that the system can quickly and accurately detect vulnerabilities in binary programs and quickly find more unique crashes.
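
The rare-branch-first seed selection idea can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the base score of 100, and the rarity formula (one plus the inverse of the hit count of the seed's rarest branch) are all assumptions chosen only to show how seeds that reach rarely executed branches could receive a higher performance score.

```python
from collections import Counter


def branch_hit_counts(seed_coverage):
    """Count how many seeds in the corpus exercise each branch.

    seed_coverage maps a (hypothetical) seed name to the set of branch
    identifiers that the seed was observed to reach.
    """
    counts = Counter()
    for branches in seed_coverage.values():
        counts.update(set(branches))
    return counts


def rare_branch_score(branches, hit_counts, base_score=100.0):
    """Boost a seed's base score by the rarity of its rarest branch.

    Assumed formula: base_score * (1 + 1 / hits of the rarest branch).
    A branch reached by a single seed doubles the score; a branch
    reached by many seeds adds very little.
    """
    if not branches:
        return base_score
    # Guard against branches missing from hit_counts (treated as one hit).
    rarest = min(max(1, hit_counts[b]) for b in branches)
    return base_score * (1.0 + 1.0 / rarest)


def rank_seeds(seed_coverage):
    """Return seed names ordered from highest to lowest score."""
    hit_counts = branch_hit_counts(seed_coverage)
    scores = {
        name: rare_branch_score(branches, hit_counts)
        for name, branches in seed_coverage.items()
    }
    # Ties are broken by name so the ordering is deterministic.
    return sorted(scores, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    corpus = {"a": {1, 2}, "b": {1, 2}, "c": {1, 3}}
    print(rank_seeds(corpus))
```

In this toy corpus, branch 3 is reached only by seed `c`, so `c` is scheduled first; a real fuzzer would derive the branch sets from its coverage bitmap and fold the boost into its existing energy calculation.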