The Design And Implementation Of Static Code Analysis System Based On Machine Learning For Java

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Xu
Full Text: PDF
GTID: 2428330647450870
Subject: Software engineering
Abstract/Summary:
As the application area of software systems continues to expand, attacks on software and information systems are increasing, and software security receives more and more attention. During development and testing, engineers often scan source code with static code analysis systems, and only software that has passed security testing can be deployed online. However, traditional static code analysis is often conservative: to avoid missing any security risk, it tends to report a large number of false positives. These false positives not only increase the workload of security engineers but also delay software development. With the development of machine learning, researchers have applied it to code analysis to discover vulnerabilities or to reduce false positives. However, this work is mostly applicable only to small-scale programs.

This system aims to apply academic research to the real world. Targeting Java, one of the most common languages in Web development, it combines taint analysis, program slicing, and a BLSTM model to provide more accurate code analysis for development and security engineers. For taint analysis, the system uses a large number of rules from Find Security Bugs to keep false negatives low, and it reports taint propagation paths to make the results more readable. The system then slices each vulnerability instance. To keep slicing efficient and stable, it optimizes the slicer for real Jar packages and proposes an idea called segmented slicing: the taint paths of a taint vulnerability are divided into a small set of taint flow fragments, and backward program slicing is applied to each fragment. Finally, the system pre-processes the slices of each vulnerability, classifies them with the BLSTM model, and infers whether the vulnerability instance is a false positive from the slice predictions and the taint flow logic.

This system has replaced the traditional taint analysis engine online. The experimental results show that it obtains more accurate scanning results within an acceptable scanning time. In terms of efficiency, the optimized slicing ensures that the scan time of each project does not exceed 1 hour. In terms of accuracy, the system's precision reaches 90.53%. In other words, compared with Find Security Bugs, the system eliminates 25.44% of the false positives, which greatly reduces the code audit workload.
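The segmented slicing idea and the final false-positive inference can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class and method names, the two-node fragment granularity, the score threshold, and the aggregation rule (a report is treated as a false positive when any fragment is predicted not to propagate taint) are all assumptions made for illustration, and the per-fragment scores stand in for the output of the BLSTM model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the thesis.
final class SegmentedSlicingSketch {

    /** One taint flow fragment: taint is reported to flow from `from` to `to`. */
    static final class Fragment {
        final String from;
        final String to;

        Fragment(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Splits a taint path (source, intermediate nodes, sink) into consecutive
     * two-node fragments. Assumption: one fragment per hop of the path.
     */
    static List<Fragment> splitIntoFragments(List<String> taintPath) {
        List<Fragment> fragments = new ArrayList<>();
        for (int i = 0; i + 1 < taintPath.size(); i++) {
            fragments.add(new Fragment(taintPath.get(i), taintPath.get(i + 1)));
        }
        return fragments;
    }

    /**
     * Assumed aggregation rule: each score is the model's estimate that taint
     * really propagates across one fragment. If any fragment scores below the
     * threshold, the whole path is considered broken and the report is treated
     * as a false positive. With no scores at all, the report is kept.
     */
    static boolean isFalsePositive(List<Double> fragmentScores, double threshold) {
        for (double score : fragmentScores) {
            if (score < threshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Placeholder node labels for a reported SQL injection path.
        List<String> path = List.of("getParameter", "buildQuery", "executeQuery");
        List<Fragment> fragments = splitIntoFragments(path);

        // Placeholder scores; a real system would obtain them by slicing
        // each fragment backward and feeding the slice to the classifier.
        List<Double> scores = List.of(0.97, 0.12);

        System.out.println("fragments: " + fragments.size());
        System.out.println("false positive: " + isFalsePositive(scores, 0.5));
    }
}
```

Working on short fragments rather than on the whole source-to-sink path keeps each backward slice small, which is consistent with the efficiency and stability motivation given in the abstract.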
Keywords/Search Tags: SCA, Taint Analysis, Program Slicing, BLSTM