CNN Webshell Detection Based On Active Learning

Posted on: 2022-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: M L Song
GTID: 2518306560990679
Subject: Software engineering
Abstract/Summary:
In the first half of 2020, about 35,900 websites in China were implanted with backdoors, an increase of 36.9% over the first half of 2019. Companies therefore still face many Webshell attacks. In industry, Webshell detection relies mainly on alerts from security devices, combined into an integrated defense-in-depth system. Academic research on Webshell detection mainly takes two forms: detection based on text content and detection based on text features. The mainstream algorithms include decision trees, SVM, CNN, LSTM, and so on. In both industrial applications and academic research, detection is centered on content detection and anomaly detection, while misuse detection and abnormal behavior detection are ignored. As a result, detection of adversarial samples, variant samples, and 0-day vulnerability samples is not ideal, and Webshell attacks remain a serious threat to enterprises. This thesis studies Webshell behavior feature detection based on misuse behavior.

On the one hand, this thesis implements a two-layer network global pooling CNN Webshell detection model with misuse behavior as the core detection point. The main work here is to find suitable behavior characteristics and assemble them into a behavior matrix that serves as the model input. The two-layer network addresses the problem of data sparseness, so that the model achieves a better hit rate and accuracy. Three different convolution kernels are used simultaneously, so that the local receptive field of the model covers contexts of different sizes and captures more feature sets. The global pooling model merges the convolutional layers before the output. This reduces the number of features in the fully connected layer and addresses the over-fitting caused by convolution kernels of different sizes.

On the other hand, this thesis implements an Active Learning algorithm for the CNN detection model. Active Learning alleviates two problems: excessively high sample labeling costs and the difficulty of adapting to the actual security operation requirements of enterprises. On the basis of the CNN detection model, a suitable selection function Q and termination strategy are found for the Active Learning method, so that the model actively selects samples, the influence of noisy samples on the model is reduced, and the model depends less on labeled data. The improved maximum feature distance algorithm is used as the sample selection function, together with the improved minimum estimated risk strategy. It considers both the feature distance and the model prediction value, and uses the machine to label samples automatically, which further improves the detection speed of the algorithm and reduces the cost of manual labeling.

The experiments use traffic collected from a bank during a randomly selected time window in busy business hours, and implement the CNN Webshell detection model characterized by misuse behavior. The best result of this experiment reaches an accuracy of 96%, with a hit rate of 96%, a false alarm rate of 6.00%, and an AUC of 96.70%. In addition, the model performed well against adversarial Webshell attacks: 41% of the attack traffic passed the security monitoring device without triggering a warning, whereas the model's capture rate for the attacker reached 98.3%, so the model can effectively supplement the security monitoring system. The improved Active Learning algorithm for the CNN Webshell detection model benefits from active sample selection. When the LU is 60%, the model learns best: the accuracy reaches 96.8%, with a hit rate of 97%, a false alarm rate of 7.74%, and an AUC of 97.30%. At this point, the accuracy and hit rate are even better than those of the single CNN Webshell detection model based on behavioral misuse. The model training time and false alarm cost stay within acceptable limits, while the cost of manual labeling drops by 40%, reducing cost and improving efficiency.
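
The combination of several convolution kernel sizes with global pooling described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: a behavior matrix with one row per request and one column per behavior feature, kernel heights of 2, 3, and 4 rows, random untrained weights, ReLU activation, and global max pooling (the abstract does not say which pooling type is used). The function names are hypothetical.

```python
import random


def conv_over_rows(matrix, kernel):
    """Slide a kernel that spans all columns down the rows.

    Returns one ReLU activation per valid row position.
    """
    k = len(kernel)
    out = []
    for start in range(len(matrix) - k + 1):
        total = 0.0
        for i in range(k):
            for x, w in zip(matrix[start + i], kernel[i]):
                total += x * w
        out.append(max(0.0, total))  # ReLU
    return out


def global_max_pool(feature_map):
    """Collapse a whole feature map into a single value."""
    return max(feature_map)


def extract_features(matrix, kernels):
    """One pooled value per kernel, concatenated into a feature vector."""
    return [global_max_pool(conv_over_rows(matrix, k)) for k in kernels]


def make_kernels(heights, width, filters_per_height, seed=0):
    """Random (untrained) kernels; a real model would learn these weights."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(h)]
        for h in heights
        for _ in range(filters_per_height)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed shape: 20 requests (rows) x 8 behavior features (columns).
    behavior_matrix = [[rng.random() for _ in range(8)] for _ in range(20)]
    kernels = make_kernels(heights=(2, 3, 4), width=8, filters_per_height=4)
    features = extract_features(behavior_matrix, kernels)
    print(len(features))  # 3 kernel heights x 4 filters = 12
```

Because each filter is pooled down to a single value, the layer that follows sees 12 inputs here regardless of how many rows the behavior matrix has, which is the sense in which global pooling reduces the number of features reaching the fully connected layer.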
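
The sample selection step of the Active Learning method can be sketched in the same spirit. The abstract states only that the selection function considers both feature distance and the model prediction value, and that confident samples are labeled by the machine; it does not give the formula. The sketch below is therefore a generic stand-in, not the thesis's improved maximum feature distance or minimum estimated risk algorithm. The weighting `alpha`, the `auto_threshold`, the `budget`, and all function names are hypothetical.

```python
import math


def min_distance(x, labeled):
    """Euclidean distance from x to its nearest labeled sample."""
    return min(math.dist(x, y) for y in labeled)


def uncertainty(p):
    """1.0 when the predicted Webshell probability is 0.5, 0.0 at 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def selection_score(x, p, labeled, alpha=0.5):
    """Assumed score: far from labeled data and uncertain => more valuable."""
    d = min_distance(x, labeled)
    return alpha * d / (1.0 + d) + (1.0 - alpha) * uncertainty(p)


def split_pool(pool, probs, labeled, budget, auto_threshold=0.95):
    """Return (indices to send to a human, {index: machine-assigned label})."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: selection_score(pool[i], probs[i], labeled),
        reverse=True,
    )
    to_query = ranked[:budget]
    auto = {}
    for i in ranked[budget:]:
        confidence = max(probs[i], 1.0 - probs[i])
        if confidence >= auto_threshold:
            auto[i] = int(probs[i] >= 0.5)  # 1 = Webshell, 0 = normal
    return to_query, auto


if __name__ == "__main__":
    labeled = [(0.0, 0.0), (1.0, 1.0)]
    pool = [(0.1, 0.0), (3.0, 3.0), (0.9, 1.0), (0.5, 0.5)]
    probs = [0.02, 0.55, 0.99, 0.50]  # hypothetical model outputs
    to_query, auto = split_pool(pool, probs, labeled, budget=2)
    print(to_query, auto)
```

In this toy run, samples 1 and 3 (far from the labeled data or close to a 0.5 prediction) go to a human analyst, while samples 0 and 2 are labeled by the machine as normal and Webshell respectively. Splitting the pool this way is one simple route to the reduction in manual labeling that the abstract reports.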
Keywords/Search Tags:Webshell, behavioral characteristics, CNN model, Active Learning, misuse detection