
Research On Java Code Vulnerability Mining Algorithm Based On Deep Forest

Posted on: 2024-03-30 | Degree: Master | Type: Thesis
Country: China | Candidate: W K Fu
GTID: 2568307109495454 | Subject: Computer technology
Abstract/Summary:
The value of vulnerability mining lies in significantly reducing the information security risks of computer systems. The development of J2EE technology and its wide application in enterprises, government agencies and other industries have brought unprecedented challenges to vulnerability mining. Machine learning-based source code vulnerability mining has become a significant area of study in recent years. Code representation methods based on abstract syntax trees (ASTs) are a research hotspot in this area, but current methods lose semantic structure information when converting syntax trees into sequences: syntax trees with different structures can be converted into the same sequence. In addition, redundant information from irrelevant nodes in the AST lengthens training time and increases the risk of overfitting. How to make the classifier better mine deep features in the code also remains to be studied. To address these issues, this thesis proposes a code representation method based on pruned statement trees and builds a Java source code vulnerability classification model (PSTDF) based on pruned statement trees and deep forests. The main content is as follows:

(1) To address the loss of semantic structure information that occurs when syntax trees with different structures are converted into the same sequence, and the longer training time and overfitting caused by redundant information in irrelevant syntax-tree nodes, this thesis proposes a code representation based on pruned statement trees (the PST method). The PST method parses Java source code into an AST and then traverses the AST breadth-first to generate a sequence of statement trees. A pruning algorithm then removes irrelevant nodes from each statement tree in the sequence, yielding a sequence of pruned statement trees. Comparative experiments on public datasets show that the PST method effectively avoids the loss of semantic structure information during tree-to-sequence conversion and eliminates the negative impact of redundant information in irrelevant nodes. Compared with existing methods, classification accuracy improves by about 2.17% and training time is reduced by about 4.79%.

(2) To enable the classifier to better mine deep features in source code, this thesis builds on the PST method and the deep forest to propose a vulnerability classification model based on pruned statement trees and an improved deep forest (PSTDF). In its improved deep forest stage, PSTDF introduces maximum pooling scanning, which extends the multi-granularity scanning of the original deep forest with a preprocessing stage. In the preprocessing stage, maximum pooling scanning takes as input a variable-length two-dimensional vector of fixed width and varying height. The input vector is transposed and each row is max-pooled, converting the variable-length two-dimensional vector into a fixed-length one-dimensional vector. In the scanning stage, the pooled vector is scanned, and the output probability vectors are used as input to the cascade forest. Comparative experiments against the original multi-granularity scanning show that maximum pooling scanning mines deep features in code more effectively, improving accuracy by about 1.54%.

This thesis compares PSTDF with existing models on public datasets through theoretical analysis and experimental evaluation metrics such as precision, recall, F1 score and the confusion matrix. The experimental results indicate that PSTDF outperforms the other models compared.
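The abstract does not give the PST algorithm itself, so the following Python sketch only illustrates the pipeline it describes: breadth-first traversal of an AST into a sequence of statement trees, then pruning of irrelevant nodes from each tree. The `Node` class, the `STATEMENT_TYPES` and `IRRELEVANT_TYPES` sets, and all function names are hypothetical; a real implementation would start from a Java parser's AST, and the thesis's actual statement and pruning criteria are not stated in the abstract.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node (hypothetical stand-in for a Java parser's node type)."""
    label: str
    children: list = field(default_factory=list)


# Assumed node categories, for illustration only.
STATEMENT_TYPES = {"IfStatement", "WhileStatement", "ReturnStatement",
                   "LocalVariableDeclaration", "StatementExpression"}
IRRELEVANT_TYPES = {"Modifier", "Annotation", "Javadoc"}


def statement_tree(node):
    """Copy the subtree under a statement node as nested (label, children) tuples,
    leaving out nested statements: they get statement trees of their own."""
    return (node.label, tuple(statement_tree(c) for c in node.children
                              if c.label not in STATEMENT_TYPES))


def statement_tree_sequence(root):
    """Breadth-first traversal of the AST, emitting one statement tree per statement node."""
    sequence, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node.label in STATEMENT_TYPES:
            sequence.append(statement_tree(node))
        queue.extend(node.children)
    return sequence


def prune(tree):
    """Recursively drop subtrees whose root label is considered irrelevant."""
    label, children = tree
    return (label, tuple(prune(c) for c in children if c[0] not in IRRELEVANT_TYPES))


def pst(root):
    """AST -> sequence of pruned statement trees."""
    return [prune(t) for t in statement_tree_sequence(root)]


# Usage on a hand-built tree: a method with an if/return and a local variable.
method = Node("MethodDeclaration", [
    Node("Modifier"),
    Node("IfStatement", [
        Node("BinaryOperation", [Node("MemberReference"), Node("Literal")]),
        Node("ReturnStatement", [Node("Literal")]),
    ]),
    Node("LocalVariableDeclaration", [
        Node("Modifier"), Node("BasicType"),
        Node("VariableDeclarator", [Node("Literal")]),
    ]),
])
seq = pst(method)
print([t[0] for t in seq])  # ['IfStatement', 'LocalVariableDeclaration', 'ReturnStatement']
print(seq[1])  # ('LocalVariableDeclaration', (('BasicType', ()), ('VariableDeclarator', (('Literal', ()),))))
```

The nested `ReturnStatement` appears after the `LocalVariableDeclaration` because it sits one level deeper in the tree, and the `Modifier` node is gone from the pruned declaration. Keeping each statement tree as a nested tuple instead of a flat token list is one way to retain the tree shape that a plain tree-to-sequence conversion would lose.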
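A minimal sketch of maximum pooling scanning as the abstract describes it, assuming each code sample has already been encoded as n statement-tree vectors of a common width d, where n varies between samples. The window size, stride and forests used in the thesis are not given in the abstract: here the scanning stage is assumed to slide fixed-size windows over the pooled vector in the style of multi-granularity scanning, and the trained forests are replaced by hypothetical stand-in functions that return class-probability lists.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Forest = Callable[[Vector], Vector]  # window -> class-probability vector


def max_pool_rows(matrix: Sequence[Sequence[float]]) -> Vector:
    """Preprocessing: transpose an n x d input (n varies, d is fixed) and
    max-pool each of the d rows, giving a vector of fixed length d."""
    if not matrix:
        raise ValueError("need at least one statement-tree vector")
    transposed = zip(*matrix)  # d rows, each of length n
    return [max(row) for row in transposed]


def sliding_windows(vec: Vector, window: int, stride: int = 1) -> List[Vector]:
    """All windows of the given size over a 1-D vector, as in multi-granularity scanning."""
    return [vec[i:i + window] for i in range(0, len(vec) - window + 1, stride)]


def max_pooling_scan(matrix, window: int, forests: List[Forest]) -> Vector:
    """Pool, then feed every window to every forest and concatenate the
    class-probability vectors; the result would be the cascade forest's input."""
    pooled = max_pool_rows(matrix)
    features: Vector = []
    for w in sliding_windows(pooled, window):
        for forest in forests:
            features.extend(forest(w))
    return features


# Hypothetical stand-ins for trained forests: map a window to [P(safe), P(vuln)].
def share_above_half(w: Vector) -> Vector:
    p = sum(x > 0.5 for x in w) / len(w)
    return [1.0 - p, p]


def max_score(w: Vector) -> Vector:
    p = max(w)
    return [1.0 - p, p]


sample = [[0.1, 0.9, 0.3, 0.0],   # 3 statement-tree vectors of width 4
          [0.5, 0.2, 0.4, 0.6],
          [0.0, 0.0, 0.8, 0.2]]
print(max_pool_rows(sample))  # [0.5, 0.9, 0.8, 0.6]
features = max_pooling_scan(sample, window=3, forests=[share_above_half, max_score])
print(len(features))  # 2 windows x 2 forests x 2 classes = 8
```

Because pooling always yields d values, every sample produces the same number of windows and therefore a feature vector of the same length for the cascade forest, however many statement trees it contains.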
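The evaluation metrics named in the abstract can all be read off a confusion matrix. The sketch below is generic, not the thesis's evaluation code: the labels `"safe"` and `"vuln"` and the predictions are made up, and precision, recall and F1 are computed per class in a one-vs-rest fashion.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(cm, labels):
    """One-vs-rest precision, recall and F1 for each label, read off the confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fp = sum(row[i] for row in cm) - tp  # predicted as label, truly another class
        fn = sum(cm[i]) - tp                 # truly label, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


labels = ["safe", "vuln"]
y_true = ["safe", "vuln", "vuln", "safe", "vuln"]
y_pred = ["safe", "vuln", "safe", "vuln", "vuln"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm)  # [[1, 1], [1, 2]]
print(per_class_scores(cm, labels)["safe"])  # (0.5, 0.5, 0.5)
```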
Keywords: Abstract Syntax Tree, Code Representation, Deep Forest, Vulnerability Classification