Deep neural networks have enabled breakthroughs in many fields, especially text classification, and underpin many advanced artificial intelligence applications. In recent years, researchers have paid increasing attention to neural-network-based vulnerability detection frameworks that identify vulnerability patterns, with the aim of detecting software vulnerabilities automatically, at large scale, and with high precision. Much work has emerged in this field, but existing methods neither fully capture the semantics of source code nor adopt an appropriate neural network design. As a result, they perform well on data sets but have difficulty detecting vulnerabilities in real-world software. Specifically, in code preprocessing, existing methods struggle to extract path semantic information from source code, so changes in the paths of statements are not captured sensitively. In network construction, predefined network structures limit the number of tokens in the code to be processed, so code must be truncated or padded, which discards or interferes with its semantics.

This paper proposes SEVulDet, a deep learning framework based on semantic enhancement, which enhances semantics in both code preprocessing and network construction. By preserving path semantics in code slices and accommodating code of flexible length, it can determine vulnerability patterns precisely. The main contributions of this paper are as follows. (1) The semantics available to the vulnerability detection framework are enhanced in both code preprocessing and network construction. Specifically, this paper proposes a path-sensitive code slicing algorithm that extracts sufficient path semantics and control flow logic from source code during code preprocessing. The path semantics separate control ranges that are not semantically connected in a code gadget reconstructed in a stacked manner. In addition, this paper replaces max pooling in the convolutional neural network with an improved spatial pyramid pooling to avoid the semantic loss caused by truncation or padding during network construction, so that code of flexible length can be processed. This paper also builds a multi-layer attention mechanism that better captures the bounded hierarchy of source code in order to learn more potential vulnerabilities. (2) Extensive experiments are designed to fully evaluate the effectiveness of the semantic-enhancement-based deep learning framework. Based on the above design, the vulnerability detector SEVulDet is implemented and released as open source. Comparison and ablation experiments demonstrate the effectiveness of semantic enhancement and of the multi-layer attention mechanism. Comparisons with classical static detection frameworks and deep learning detection frameworks show that SEVulDet is state of the art in vulnerability detection. When applied to the real-world software Xen, SEVulDet found three previously unreported bugs. Finally, the decisions of the neural network and their basis are explained by visualizing the attention weights of tokens.

Experimental results show that SEVulDet significantly outperforms both the classical static methods and the most advanced deep-learning-based solutions, raising the F1-measure to about 94.5%. In particular, SEVulDet found more vulnerabilities in real-world software than prior work, demonstrating the effectiveness of semantically enhanced vulnerability detectors on both synthetic data sets and real-world software products.
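
The role of spatial pyramid pooling in handling code of flexible length can be illustrated with a minimal sketch. The Python below is not the improved variant used in the proposed framework. It is a plain one-dimensional spatial pyramid max pooling, and the function name `spatial_pyramid_pool_1d`, the pyramid levels `(1, 2, 4)`, and the input layout (one feature vector per token) are assumptions made only for illustration. It shows how token sequences of different lengths can be mapped to a vector of the same size without truncation or padding.

```python
def spatial_pyramid_pool_1d(features, levels=(1, 2, 4)):
    """Max-pool a variable-length sequence into a fixed-length vector.

    features: list of T token feature vectors, each a list of D floats (T >= 1).
    levels:   number of bins at each pyramid level (illustrative default,
              not taken from the paper).
    Returns a flat list of D * sum(levels) floats.
    """
    if not features:
        raise ValueError("need at least one token feature vector")
    length = len(features)
    dim = len(features[0])
    pooled = []
    for bins in levels:
        for i in range(bins):
            # Bin i covers [floor(i*T/bins), ceil((i+1)*T/bins)), which is
            # never empty for T >= 1; neighbouring bins may overlap.
            start = (i * length) // bins
            end = ((i + 1) * length + bins - 1) // bins
            window = features[start:end]
            # Channel-wise maximum over the tokens that fall in this bin.
            pooled.extend(max(vec[d] for vec in window) for d in range(dim))
    return pooled
```

With feature dimension D and levels `(1, 2, 4)`, the output always holds 7D values, whether the input has three tokens or three hundred, so a fixed-size fully connected layer could follow the pooling step without constraining the length of the code being processed.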