
Research And Implementation Of Malicious WebShell Detection Algorithm Based On Ensemble Learning

Posted on: 2022-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Ai
Full Text: PDF
GTID: 2518306539998379
Subject: Engineering
Abstract/Summary:
With the arrival of the 5G era and the rapid development of networks, Web-based applications have had a great impact on people's lives. Clothing, food, housing and transportation can all be handled through applications, which makes daily life more convenient. As website traffic grows rapidly, websites also store a large amount of personal information, so protecting this private information has become the primary task of website maintenance personnel. The number of websites implanted with backdoors increases year by year, and the problem of network security is becoming more and more prominent. Detecting website backdoors is therefore key to data security. Malicious WebShell files are an important branch of website attacks, so detecting them is very important. At present, detecting malicious WebShell files requires large feature libraries and the cooperation of professional staff, and the detection effect is limited by those feature libraries. To address these problems, this thesis proposes WSLSMR, a malicious WebShell attack detection algorithm based on ensemble learning. The main research work consists of the following three points.

(1) Construction of a feature index set suitable for malicious WebShell files. The first step is data collection: this thesis collects the typical WebShell data sets currently available. The second step is feature extraction: static and dynamic features are extracted from both the malicious WebShell samples and the normal website samples in the data set. There are five static features: string length variance, file coincidence index, information entropy, file compression rate and feature code matching. Unigram and 4-gram feature vocabularies are extracted with the TF-IDF algorithm. The third step is the construction of the feature index set: a random forest is used to analyze the importance of the unigram features and obtain an importance score for each one. The 4-gram features are first screened on the basis of these importance scores and then filtered a second time. In this way, dynamic and static feature vocabularies are built that can detect both encrypted and unencrypted WebShell files.

(2) An ensemble learning detection algorithm suitable for malicious WebShell files. The first step is the selection of base classifiers suited to ensemble learning: after comparing common machine learning algorithms, logistic regression, a multi-layer perceptron, a support vector machine and a random forest are chosen to form the ensemble learning model. The second step is the construction of the ensemble learning model: the weight of each base classifier is calculated mainly from its prediction accuracy on the validation set, and the ensemble model in this thesis is built from these weights.

(3) To verify the effectiveness of the proposed feature index set and ensemble learning model, a series of comparative experiments was conducted on the data set. The proposed algorithm is compared with common machine learning algorithms, ensemble detection models and common detection tools using evaluation criteria such as recall and accuracy. The experimental results show that, compared with a traditional single machine learning algorithm, recall and accuracy increase to 99.14% and 94.29%, respectively, which indicates that the proposed method has better detection performance.
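As an illustration of the static features named in point (1), the following Python sketch computes four of them for the raw bytes of a file. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not define the features precisely, so the byte-level formulas, the use of zlib, the reading of "compression rate" as compressed size divided by original size, and the reading of "string length variance" as the variance of whitespace-delimited token lengths are all assumptions. The function names are hypothetical, and feature code matching is omitted because it depends on a signature library that the abstract does not describe.

```python
import math
import zlib
from collections import Counter


def information_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes drawn without replacement are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(data).values()) / (n * (n - 1))


def compression_ratio(data: bytes) -> float:
    """Assumed definition: zlib-compressed size divided by original size."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def token_length_variance(data: bytes) -> float:
    """Assumed definition: population variance of whitespace-delimited
    token lengths."""
    lengths = [len(token) for token in data.split()]
    if not lengths:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return sum((x - mean) ** 2 for x in lengths) / len(lengths)


def static_features(data: bytes) -> dict:
    """Collect the four illustrative static features for one file."""
    return {
        "entropy": information_entropy(data),
        "index_of_coincidence": index_of_coincidence(data),
        "compression_ratio": compression_ratio(data),
        "token_length_variance": token_length_variance(data),
    }
```

Calling `static_features(open(path, "rb").read())` on each sample would give one numeric row per file. Encoded or encrypted payloads usually show higher entropy and compress less well than ordinary source code, which is the usual motivation for features of this kind.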
Keywords/Search Tags: Network Security, Ensemble Learning, Information Entropy, WebShell