A botnet is a cluster of infected hosts remotely controlled by a botmaster. With the development of the Internet and the proliferation of Internet of Things (IoT) devices, intelligent terminals, cloud platforms, and social platforms, botnets have come to exhibit diversified platforms, concealed communications, and intelligent control. In addition to traditional detection technologies such as port-scanning-based and deep-packet-inspection-based detection, research on botnet detection based on statistics and network behavior has matured considerably in recent years. In this line of research, researchers have made great progress by building machine learning models that identify botnets from multiple features. These features are usually chosen empirically by the researcher before the model is built, and the resulting detection models have achieved high recall in experiments. However, the approach has two drawbacks. First, manual feature selection demands extensive domain knowledge from the designer. Second, fixed features give attackers the opportunity to alter the characteristics of botnet traffic in a targeted manner and thereby evade detection. As botnet structures and command-and-control mechanisms continue to evolve, manual feature selection is becoming increasingly difficult. Meanwhile, deep learning is now widely used, and techniques such as neural networks, reinforcement learning, and knowledge graphs are gradually being applied to botnet detection.

This thesis studies how to use deep learning methods to extract effective spatial and temporal features of botnet traffic, and how to address the low F1 scores caused by imbalanced datasets in multi-class classification tasks. The main contents of this thesis are as follows:

1. The characteristics and hazards of botnets are introduced, and previous research literature is analyzed and summarized. The technologies used in current botnet detection and the methods for processing imbalanced datasets in botnet multi-class classification tasks are described in detail; theories and technologies related to feature extraction and deep learning are elaborated; and previous methods for handling imbalanced multi-class datasets are reviewed in a comparative study.
2. To address the poor generalization and strong feature dependence of previous botnet detection methods, a detection model based on a spatiotemporal residual network is proposed. The spatial and temporal features of botnets are learned in parallel by a deep one-dimensional convolutional neural network (1DCNN) and a long short-term memory (LSTM) network, and residual connections (shortcut connections) are introduced between layers to obtain higher-level feature representations. The CTU-13 dataset is used for binary and multi-class classification tasks, and the model's generalization is tested on the heterogeneous N-BaIoT dataset.
3. To address the low multi-class F1 scores caused by the imbalanced distribution of real-world botnet data, a model combining the G-SMOTE algorithm with a one-dimensional residual neural network (1DMsResNet) is proposed. The model effectively strengthens learning on minority-class samples and is computationally inexpensive. It is trained and tested on the Bot-IoT dataset.

The innovations of this thesis are as follows:

1. A new botnet detection model is proposed. Because botnet traffic exhibits both spatial and temporal features, a deep 1DCNN and an LSTM are used to extract spatiotemporal features in parallel, and a residual network (ResNet) is then used to solve the network degradation problem. The shortcut connections of the residual network pass the fused spatiotemporal features across layers, and binary and multi-class classification are finally performed on the output representation. In the multi-class classification task, the Res-1DCNN-LSTM model improves the F1 score by 0.63% and 1.33% over CNN-LSTM, a fusion model of CNN and LSTM, on the CTU-13 and N-BaIoT datasets, respectively.
2. A new model for handling botnet data imbalance is proposed. Minority-class samples are oversampled with the G-SMOTE algorithm, and a 1DMsResNet-based model is then trained and tested on the Bot-IoT dataset. In the multi-class classification task, the G-SMOTE-1DMsResNet model achieves 9.48% higher accuracy than the GRU model.
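The abstract singles out low F1 scores, rather than low accuracy, as the symptom of imbalanced multi-class botnet data. The plain-Python sketch below shows why, using a hypothetical three-class split (90 flows of one class, 5 each of two rare classes) and a degenerate classifier that always predicts the majority class. The class counts are invented for illustration and are not taken from CTU-13, N-BaIoT, or Bot-IoT; macro averaging is an assumption, since the abstract does not say which F1 average is reported.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class that is never predicted gets F1 = 0."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 90 + [1] * 5 + [2] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.900
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # macro-F1 = 0.316
```

The majority class alone scores F1 = 1.8 / 1.9 ≈ 0.947, but the two rare classes contribute 0, so the macro average drops to about 0.316 while accuracy stays at 0.9. Oversampling the rare classes targets exactly those zero terms.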
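The idea of learning spatial and temporal features in parallel and joining layers with shortcut connections can be illustrated with a forward pass in NumPy. This is a minimal sketch under stated assumptions, not the thesis's Res-1DCNN-LSTM: the input is assumed to be a window of T consecutive traffic-feature vectors, and the input projection, the two residual convolution blocks, the LSTM gate ordering, mean pooling, and the single residual dense layer after fusion are all hypothetical choices. The weights are random rather than trained; NumPy is used only so the example runs without a deep learning framework.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_same(x, w, b):
    """1D convolution over time, stride 1, zero 'same' padding.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T, C_out)."""
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])  # (T, K, C_in)
    return np.einsum("tkc,kco->to", windows, w) + b


def residual_conv_block(x, p):
    """Spatial branch block: two convolutions plus an identity shortcut, relu(F(x) + x)."""
    h = relu(conv1d_same(x, p["w1"], p["b1"]))
    h = conv1d_same(h, p["w2"], p["b2"])
    return relu(h + x)


def lstm(x, wx, wh, b):
    """Temporal branch: single-layer LSTM returning every hidden state, (T, D) -> (T, H)."""
    hidden = wh.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in x:
        i, f, g, o = np.split(x_t @ wx + h @ wh + b, 4)  # assumed gate order: i, f, g, o
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)


def init_params(n_features, channels, n_classes, kernel=3, seed=0):
    """Random weights for illustration only; a real model would be trained."""
    rng = np.random.default_rng(seed)

    def n(*shape):
        return rng.normal(0.0, 0.1, shape)

    def block():
        return {"w1": n(kernel, channels, channels), "b1": np.zeros(channels),
                "w2": n(kernel, channels, channels), "b2": np.zeros(channels)}

    return {
        "w_in": n(n_features, channels),  # project raw features to C channels
        "conv_blocks": [block(), block()],
        "lstm": (n(channels, 4 * channels), n(channels, 4 * channels), np.zeros(4 * channels)),
        "w_fuse": n(2 * channels, 2 * channels), "b_fuse": np.zeros(2 * channels),
        "w_out": n(2 * channels, n_classes), "b_out": np.zeros(n_classes),
    }


def forward(x, p):
    """x: (T, F) window of traffic features -> class probabilities."""
    h = x @ p["w_in"]                                   # (T, C)
    spatial = h
    for blk in p["conv_blocks"]:
        spatial = residual_conv_block(spatial, blk)     # spatial branch, (T, C)
    temporal = lstm(h, *p["lstm"]) + h                  # temporal branch with shortcut, (T, C)
    fused = np.concatenate([spatial.mean(axis=0), temporal.mean(axis=0)])  # (2C,)
    fused = relu(fused @ p["w_fuse"] + p["b_fuse"]) + fused  # shortcut across the fusion layer
    logits = fused @ p["w_out"] + p["b_out"]
    e = np.exp(logits - logits.max())                   # numerically stable softmax
    return e / e.sum()


params = init_params(n_features=20, channels=16, n_classes=3)
window = np.random.default_rng(1).normal(size=(10, 20))  # 10 time steps, 20 features
probs = forward(window, params)
print(probs.shape)  # (3,)
```

An identity shortcut such as `h + x` only works when a block's input and output widths match, which is why the sketch first projects the raw features to a fixed channel count; ResNet uses projection shortcuts for the same reason when dimensions change.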
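The imbalance model oversamples minority classes with G-SMOTE before training. The abstract does not expand the acronym; the sketch below assumes it denotes Geometric SMOTE (Douzas and Bação), which places each synthetic point inside a truncated, deformed hypersphere around a minority sample instead of on the line segment used by plain SMOTE. It is a simplified NumPy re-implementation, not the thesis's code. It supports only the "minority" neighbor-selection strategy, and the function names, parameter names, and defaults (`k_neighbors`, `truncation_factor`, `deformation_factor`) are hypothetical. The data in the usage lines are synthetic.

```python
import numpy as np


def _random_point_in_unit_ball(dim, rng):
    """Uniform sample from the interior of the unit hypersphere."""
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    return rng.uniform() ** (1.0 / dim) * v


def geometric_smote(x_min, n_samples, k_neighbors=5, truncation_factor=0.0,
                    deformation_factor=0.0, seed=None):
    """Simplified G-SMOTE ('minority' strategy only). x_min: (n, d) -> (n_samples, d)."""
    rng = np.random.default_rng(seed)
    n, d = x_min.shape
    if n < 2:
        # No neighbor to define a radius: fall back to duplicating the sample.
        return np.repeat(x_min, n_samples, axis=0)
    k = min(k_neighbors, n - 1)
    dists = np.linalg.norm(x_min[:, None, :] - x_min[None, :, :], axis=-1)
    # Drop the closest entry (normally the point itself) and keep the next k.
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]
    out = np.empty((n_samples, d))
    for s in range(n_samples):
        i = rng.integers(n)
        center = x_min[i]
        surface = x_min[rng.choice(neighbours[i])]
        radius = np.linalg.norm(surface - center)
        if radius == 0.0:  # duplicate points: nothing to spread over
            out[s] = center
            continue
        e_par = (surface - center) / radius
        x_gen = _random_point_in_unit_ball(d, rng)
        # Truncation: reflect points outside the half-space chosen by truncation_factor.
        proj = x_gen @ e_par
        if abs(truncation_factor - proj) > 1.0:
            x_gen = x_gen - 2.0 * proj * e_par
        # Deformation: shrink the component perpendicular to the center->surface axis.
        proj = x_gen @ e_par
        x_gen = x_gen - deformation_factor * (x_gen - proj * e_par)
        out[s] = center + radius * x_gen  # translate and scale back to data space
    return out


def balance_classes(X, y, seed=0, **kwargs):
    """Oversample every class up to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for offset, (c, cnt) in enumerate(zip(classes, counts)):
        if cnt < target:
            synth = geometric_smote(X[y == c], target - cnt, seed=seed + offset, **kwargs)
            X_parts.append(synth)
            y_parts.append(np.full(len(synth), c))
    return np.vstack(X_parts), np.concatenate(y_parts)


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (20, 4)), rng.normal(-3, 1, (8, 4))])
y = np.array([0] * 200 + [1] * 20 + [2] * 8)
X_bal, y_bal = balance_classes(X, y, truncation_factor=0.5, deformation_factor=0.3)
print(np.unique(y_bal, return_counts=True)[1])  # [200 200 200]
```

With both factors at 0, points are drawn from the full hypersphere around each minority sample. Setting `truncation_factor = 1` and `deformation_factor = 1` restricts generation to the segment between the center and its selected neighbor, as in SMOTE (though not uniformly along it). Intermediate values trade diversity of the synthetic samples against staying close to the existing minority region.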