
Research and Implementation of Fine-Grained Webshell Detection Based on Deep Learning

Posted on: 2023-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: B J Cheng
GTID: 2568306914460184
Subject: Computer technology
Abstract/Summary:
A webshell is a web script containing malicious code fragments, usually used by attackers as a medium for launching network attacks. Attackers implant it into a website's server directory, where it poses a serious threat to the security of user data. Detecting webshells is therefore of great significance to network security, and researchers have designed and developed many detection tools. However, the flexibility of scripting languages such as PHP gives attackers many ways to obfuscate webshells, which makes it difficult for traditional detection tools to find the malicious code fragments. Deep learning offers new ideas for webshell detection and improves on existing detectors. However, the performance of a deep learning detector depends on feature engineering and on the model adopted. The feature representations and models used by existing schemes cannot mine the syntactic and semantic characteristics of webshells well, and these methods cannot locate malicious behavior. To address these shortcomings, this thesis does the following work:

(1) Datasets are collected from open-source platforms, then cleaned and analyzed. This thesis then improves the current feature representation according to the characteristics of webshells, so as to fully mine the semantic information they contain.

(2) Existing detectors cannot mine the syntactic characteristics of PHP well. This thesis therefore introduces the CodeBERT model, which is based on the Transformer architecture, to mine the syntactic characteristics of PHP code. Specifically, the CodeBERT model learns the syntactic characteristics of PHP code through pre-training tasks and is then fine-tuned on the webshell detection task so that it learns the high-level semantic characteristics of webshells.

(3) To solve the problem that existing detectors cannot locate malicious behavior, this thesis introduces and optimizes an interpretation algorithm proposed in the field of deep-learning-based source code vulnerability detection. After optimization, the interpretation algorithm can select important features from the feature representation of a webshell as the basis for the model's judgment, and then map these features back to the corresponding positions in the source code as the result of malicious behavior location.

Experimental results show that the new code feature representation proposed in this thesis mines the semantic information of webshells better: in the webshell detection task, the script-sequence-based detector achieves an F1 score 4.5% higher than the text-sequence-based detector. The pre-trained CodeBERT model mines the syntactic characteristics of PHP code well, achieving 95.3% accuracy in the syntax tree labeling task and thereby improving webshell detection. After optimization, the interpretation algorithm adopted in this thesis can select the features most relevant to the webshell from the source code feature representation and so better perform fine-grained detection: the average accuracy of locating malicious behavior is 6.2% higher than before optimization.
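The idea of mapping important features back to source positions can be illustrated with a minimal Python sketch. This is not the thesis's interpretation algorithm: the regex tokenizer, the function names `tokenize_with_lines` and `locate_lines`, and the hand-written importance scores are all assumptions made for illustration. In a real detector the scores would come from the interpretation algorithm applied to the trained model.

```python
import re
from collections import defaultdict

# Hypothetical, very rough tokenizer: identifiers, PHP variables, numbers,
# and any other single non-space character. Not a real PHP lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\$\w+|\d+|\S")


def tokenize_with_lines(source):
    """Split source into (token, line_number) pairs; line numbers start at 1."""
    pairs = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            pairs.append((match.group(), line_no))
    return pairs


def locate_lines(pairs, scores, k=3):
    """Map the k highest-scoring tokens back to their source lines.

    Returns line numbers ordered by the summed score of the selected tokens.
    """
    if len(pairs) != len(scores):
        raise ValueError("need exactly one score per token")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    per_line = defaultdict(float)
    for i in top:
        per_line[pairs[i][1]] += scores[i]
    return sorted(per_line, key=per_line.get, reverse=True)


sample = "<?php\n$cmd = $_POST['c'];\neval($cmd);\n"
pairs = tokenize_with_lines(sample)

# Made-up importance scores standing in for the output of an interpretation step.
toy_importance = {"eval": 0.9, "$_POST": 0.6}
scores = [toy_importance.get(token, 0.01) for token, _ in pairs]

suspicious_lines = locate_lines(pairs, scores, k=2)
print(suspicious_lines)
```

Keeping the line number alongside each token at tokenization time is the design choice that makes the reverse mapping trivial: whatever feature selection the model-side algorithm performs, its output indices can be translated into source locations with a single lookup.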
Keywords/Search Tags: webshell detection, feature representation, Transformer model, interpretation algorithm