
Research And Implementation Of A PHP Code Auditing Technology

Posted on: 2021-02-10  Degree: Master  Type: Thesis
Country: China  Candidate: W Zhao
GTID: 2518306050468044  Subject: Cyberspace security
Abstract/Summary:
In the online world, Web applications provide rich content for users. As people become increasingly connected to the Internet, security problems in Web applications seriously threaten the security of users' personal information, the normal development of enterprises, and national cyberspace. How to find potential vulnerabilities in Web applications effectively has therefore become a key research problem in software engineering and network security. Close to 80% of Web applications are based on PHP, and the features of the language make it easy to introduce security vulnerabilities into them. In recent years, academia and industry have been committed to research and innovation in PHP vulnerability mining. With the successful application of machine learning in many fields, training a vulnerability mining model on historical vulnerability samples from Web applications offers a new approach to PHP vulnerability mining. However, traditional machine-learning-based vulnerability mining methods cannot effectively represent the information in PHP code. This paper therefore recasts the vulnerability mining problem as a PHP code vector-representation problem: an effective vectorized representation method transforms PHP code into a vector that fully captures the syntax, semantics, and context information of the code, and on this basis an effective automated vulnerability mining model for PHP can be built. The main work of this paper is as follows:

(1) The execution principles of the PHP language and the common vulnerabilities of Web applications are analyzed and summarized, and the causes of vulnerabilities in PHP-based Web applications are studied in depth. On this basis, machine learning is applied to construct vulnerability mining models based on the abstract syntax tree and the opcode sequence of PHP.

(2) A PHP vulnerability mining model based on the abstract syntax tree is proposed. Existing methods analyze the abstract syntax tree along a single dimension and lose code information when they transform it into a binary tree. To address this, the model first divides the abstract syntax tree into expression trees. It then obtains the syntax and inter-process information of the code by aggregating the path information of the expression trees, and retains the semantics and in-process information of the code by aggregating the node information. Finally, an attention mechanism is introduced into the model to provide accurate vulnerability localization.

(3) A PHP vulnerability mining model based on the opcode sequence is proposed. Traditional opcode-sequence models lose information because they treat the sequence as one-dimensional text and ignore the variable information in it. To address this, the model divides the sequence into multiple sub-sequences at jump statements. It also proposes a method that dynamically updates the variable vectors along the data flow, so that variables in the opcode sequence are represented reasonably.

(4) Four sets of experiments based on SARD-testsuite-103 are designed to compare the vulnerability mining models based on ASTEncoder and Opcode2Vec with the traditional model. The results show that the accuracy, recall, and F1 of the ASTEncoder-based model reach 98.42%, 96.69%, and 98.03% on average, while those of the Opcode2Vec-based model reach 88.27%, 84.43%, and 83.88% on average. In addition, a further set of experiments compares the models with Seay and RIPS, two PHP code auditing tools commonly used in industry. The results show that the accuracies of the ASTEncoder-based and Opcode2Vec-based models reach 98% and 50% respectively, both higher than those of Seay and RIPS.
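
As an illustration of the sub-sequence division described for the opcode-sequence model, the following minimal Python sketch cuts a flat opcode list after every jump instruction. It is not the thesis's implementation: the function name `split_at_jumps`, the set `JUMP_OPCODES`, and the sample sequence are assumptions made for this example, and the abstract does not say which opcodes count as jump statements or whether jump targets also start a new sub-sequence.

```python
from typing import Iterable, List

# Assumed set of jump opcodes (names as printed by opcode dumpers such as VLD);
# the abstract does not list which ones the model actually uses.
JUMP_OPCODES = frozenset({"JMP", "JMPZ", "JMPNZ"})


def split_at_jumps(opcodes: Iterable[str]) -> List[List[str]]:
    """Split a flat opcode sequence into sub-sequences, closing one after each jump."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for op in opcodes:
        current.append(op)
        if op in JUMP_OPCODES:
            # A jump ends the current sub-sequence; the next opcode starts a new one.
            blocks.append(current)
            current = []
    if current:
        # Keep the trailing opcodes that follow the last jump.
        blocks.append(current)
    return blocks


# Hypothetical opcode sequence for a small if/else body.
sample = ["ASSIGN", "IS_EQUAL", "JMPZ", "ECHO", "JMP", "ECHO", "RETURN"]
blocks = split_at_jumps(sample)
```

Each resulting sub-sequence could then be encoded separately instead of treating the whole sequence as one line of text. A fuller treatment would probably also split at jump targets, as in basic-block construction, which this sketch omits.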
Keywords/Search Tags:Code Audit, Vectorized Representation of Code, Abstract Syntax Trees, Opcode Sequences, Attention Mechanisms