Research And Implementation Of Software Source Code Security Vulnerability Representation Learning And Detection Technology

Posted on: 2022-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2518306341482314
Subject: Cyberspace security
Abstract/Summary:
Software security vulnerabilities are among the main root causes of information system security risks, and they seriously threaten and constrain the development of information technology. Because they are harmful, diverse, and widespread, they have become a key research subject in cyberspace security. Vulnerability detection can be divided into binary analysis and source code analysis according to the form of the code under test. Source code analysis is better suited than binary analysis to comprehensive, in-depth discovery of software vulnerabilities, so research on representation learning and detection for source code security vulnerabilities is of great practical significance. Traditional machine-learning-based vulnerability detection relies mainly on expert feature engineering and is therefore only semi-automated. Feature quality depends heavily on expert experience, which tends to produce weak vulnerability representations and low efficiency, so these methods can no longer meet the growing demand for vulnerability detection. Because deep learning has strong representation capability on large data sets, applying it to vulnerability detection has become an active research topic. This thesis studies the application of deep representation learning and heterogeneous ensemble (integration) learning to vulnerability detection, and proposes a framework for representation learning and detection of source code security vulnerabilities. The main research contents are as follows.

1. A word-vector-based representation method for vulnerable source code is proposed to address weak vulnerability characterization, over-reliance on expert experience, and inefficient feature engineering. Borrowing the idea of text representation, the thesis preprocesses source code into sliced text and applies the Word2Vec word embedding method from natural language processing to extend the expressiveness of the source code language. From a feature engineering perspective, code of the same vulnerability type obtains similar semantic representations, which provides a basis for subsequent detection.

2. A vulnerability detection method based on deep representation learning is proposed. It addresses the limited ability of traditional machine learning to learn from large data sets in vulnerability detection, as well as the high false alarm rate and low recall of its detection results. The thesis combines deep representation learning with machine learning classification for source code vulnerability detection. To obtain richer vulnerability features, a convolutional neural network with three parallel branches is designed for representation learning. By using different convolution kernel sizes, the network learns multi-level features of sliced code under different long- and short-term execution dependencies. Heterogeneous ensemble learning methods then classify these multi-level features to complete vulnerability detection.

3. A framework for source code vulnerability representation learning and detection is designed and implemented. It contains a data preprocessing module, a word embedding representation module, a representation learning module, and a vulnerability detection module. First, code slicing is used to implement the data preprocessing module. Next, Word2Vec word embedding produces the embedded representation of the code, which is fed to the representation learning module to obtain high-level semantic features of the source code. Finally, the ensemble learning module completes the vulnerability detection.

4. The proposed vulnerability detection framework is validated and evaluated. Comparative experiments are carried out at each stage: word embedding, feature extraction, representation learning, and vulnerability detection. The results show that the dynamic CBOW word embedding approach proposed in this thesis has certain advantages over static word embedding, Skip-Gram word embedding, and other algorithms in distinguishing semantic representations across different contexts. In learning high-dimensional feature representations of source code, the three-branch parallel convolutional neural network offers lower resource consumption and better detection results than recurrent neural networks, serial convolutional neural networks, and other models. Finally, a comprehensive performance comparison of the overall framework shows that it achieves higher accuracy and recall than current advanced deep-learning-based vulnerability detection work.
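
As a rough illustration of the parallel convolution idea described in the abstract, the following pure-Python sketch runs three convolution branches with different kernel sizes over one embedded code slice, applies max-over-time pooling to each filter, and concatenates the results into a single feature vector. The abstract does not give the kernel sizes, filter count, embedding dimension, or pooling scheme, so the values and helper names used here (`KERNEL_SIZES`, `N_FILTERS`, `DIM`, `conv1d_max`, `make_branch`, `parallel_cnn_features`) are hypothetical, and the weights are random rather than trained.

```python
import random


def conv1d_max(seq, kernel, bias=0.0):
    """Slide one filter along the token axis, apply ReLU, keep the maximum.

    seq:    list of token vectors, each of length d
    kernel: list of k weight vectors, each of length d (the filter spans k tokens)
    """
    k = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(seq) - k + 1):
        act = bias
        for offset in range(k):
            act += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        best = max(best, act)
    return best


def make_branch(kernel_size, dim, n_filters, rng):
    """Create one branch: n_filters random filters of the given kernel size."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(dim)]
             for _ in range(kernel_size)]
            for _ in range(n_filters)]


def parallel_cnn_features(seq, branches):
    """Run every branch on the same input and concatenate the pooled outputs."""
    features = []
    for filters in branches:
        features.extend(conv1d_max(seq, f) for f in filters)
    return features


rng = random.Random(0)
DIM = 8                   # hypothetical embedding dimension
KERNEL_SIZES = (3, 5, 7)  # hypothetical: short, medium, long token windows
N_FILTERS = 4             # hypothetical number of filters per branch

branches = [make_branch(k, DIM, N_FILTERS, rng) for k in KERNEL_SIZES]

# Stand-in for one embedded code slice: 20 tokens, each a DIM-dimensional vector.
embedded_slice = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(20)]

features = parallel_cnn_features(embedded_slice, branches)
print(len(features))  # 3 branches x 4 filters = 12
```

In a setup like the one the abstract outlines, a concatenated feature vector of this kind would be passed to the ensemble classifiers. A practical implementation would train the filters with a deep learning framework instead of using random weights.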
Keywords/Search Tags: Software Security, Vulnerability Detection, Word Embedding, Representation Learning, Heterogeneous Integration