Research On Optimization Of Fuzzing Seed Input Based On Machine Learning

Posted on: 2021-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
GTID: 2428330605480067
Subject: Cyberspace security
Abstract/Summary:
With the continuing development of the computer industry, software vulnerabilities have become a growing problem worldwide. Fuzzing, one of the most commonly used approaches for finding hidden vulnerabilities, is widely applied to software of all kinds. However, fuzzing has low sensitivity to input format, so a large number of generated seeds fail the target program's format checks. Learning the internal grammar of the input in order to generate higher-quality seeds is therefore a meaningful line of study. In recent years, machine learning (ML) methods have been widely used in many fields with good results, and combining fuzzing with ML shows promise for vulnerability detection. Existing research has made some progress, but most studies work only on seeds with simple formats. For context-sensitive complex formats such as PDF files, essentially all existing work either makes only small modifications to the original seed files or does not work in practice, so a completely new set of seed files cannot be generated. The effectiveness of current approaches in improving fuzzing performance is therefore limited. Accordingly, this thesis studies the use of machine learning models to generate seeds with complex formats and improve the effect of fuzzing. The contributions of this thesis are as follows:

1. This thesis presents an ML-based framework to improve the quality of seed inputs for fuzzing programs that take PDF files as input; the new seed files it generates are also in PDF format. Several machine learning models were evaluated using a set of indicators, and the Transformer model was selected as the best one for generating seed files. This is the first time that the Transformer model has been applied to a complex-format seed generation task. The framework is divided into three parts: a PDF object parser, a PDF object generator, and a PDF encapsulator, which respectively analyze PDF grammar rules, generate new PDF objects according to those rules, and encapsulate the generated objects into complete PDF seed files.

2. This thesis also proposes two sampling algorithms, Sample and SampleFunction, which sample the learned distribution to increase the diversity of the seed files generated by the framework. Some low-probability cases are sampled to produce special sequences, while the object sequence is still predicted according to the probability distribution. This addresses the problem of excessive repeated seeds generated in related research.

Finally, the Transformer model was selected as the optimal model based on three evaluation indicators and combined with the framework of this thesis to generate new PDF seeds in the experiment. The initial coverage of the generated seeds was 0.47% higher than that of the original seeds. After 24 hours of fuzzing, the generated seeds covered 24.03% more paths than the original seeds and triggered 23 crashes, whereas the original seeds did not cause any crashes. These experimental results indicate that the framework generates higher-quality seeds and demonstrate its advantage.
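
The motivation for sampling in contribution 2 can be illustrated with a small sketch. The abstract does not give the details of Sample or SampleFunction, so the code below is not the thesis's algorithm: it only contrasts greedy decoding (always take the most probable next token) with sampling in proportion to the predicted probabilities. The `TOY_MODEL` table, the token names, and every function name are hypothetical stand-ins for a trained model's output.

```python
import random

END = "<end>"

# Hypothetical stand-in for a learned model: next-token probabilities
# conditioned on the previous token only. A real model (e.g. a Transformer)
# would condition on the whole prefix.
TOY_MODEL = {
    "<start>": {"<<": 1.0},
    "<<": {"/Type": 0.7, "/Length": 0.2, "/Filter": 0.1},
    "/Type": {"/Page": 0.6, "/Font": 0.3, "/XObject": 0.1},
    "/Page": {">>": 1.0},
    "/Font": {">>": 1.0},
    "/XObject": {">>": 1.0},
    "/Length": {"42": 1.0},
    "42": {">>": 1.0},
    "/Filter": {"/FlateDecode": 1.0},
    "/FlateDecode": {">>": 1.0},
    ">>": {END: 1.0},
}


def greedy_next(dist):
    """Always return the most probable token."""
    return max(dist, key=dist.get)


def sampled_next(dist, rng):
    """Draw a token in proportion to its predicted probability."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(pick, max_len=16):
    """Generate one token sequence using the given next-token picker."""
    token, out = "<start>", []
    for _ in range(max_len):
        token = pick(TOY_MODEL[token])
        if token == END:
            break
        out.append(token)
    return " ".join(out)


rng = random.Random(0)  # fixed seed so runs are repeatable
greedy_outputs = {generate(greedy_next) for _ in range(200)}
sampled_outputs = {generate(lambda d: sampled_next(d, rng)) for _ in range(200)}

print(len(greedy_outputs), "distinct greedy sequence(s)")
print(len(sampled_outputs), "distinct sampled sequence(s)")
```

Greedy decoding produces the same sequence on every run, which mirrors the repeated-seed problem the abstract attributes to related work, while sampling also reaches low-probability branches such as `/Filter`. Every sampled sequence still follows the toy grammar, because tokens with zero probability are never drawn.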
Keywords/Search Tags: Fuzzing, Machine learning, Generated seeds, Transformer model