| With the rapid development of Internet applications, the traffic generated by network services and applications is growing explosively. As the carrier of information transmission and interaction in cyberspace, network traffic contains a large amount of important information. Because network viruses and attacks are becoming increasingly complex, quickly detecting and accurately identifying abnormal network traffic is essential to network information security. However, abnormal traffic is diverse and highly concealed, so existing traffic analysis methods cannot detect abnormal network behavior effectively. Targeting the imbalanced, high-dimensional, and complex nonlinear characteristics of network traffic, this dissertation applies machine learning and data mining methods to study abnormal traffic detection, with the goal of effectively analyzing and accurately detecting abnormal network traffic. First, because redundant traffic features increase the complexity and reduce the accuracy of detection models, a multi-dimensional redundancy elimination method for network traffic features based on Extra Tree-Recursive Feature Elimination (ET-RFE) is proposed. The method uses the Extra Tree algorithm as the base model for recursive elimination, ranks the traffic features through repeated iterations, and evaluates model performance after each round of elimination by Cross-Validation (CV). This yields a subset of traffic features highly correlated with the objective function and thereby eliminates redundant features. Second, because traditional feature selection methods cannot capture all globally significant features, an all-relevant significant feature selection method based on ET-Boruta is proposed. The method uses a randomization strategy to
generate shadow features, obtains feature importance scores from the Extra Tree algorithm, and selects globally significant features by comparing the importance of each original feature with that of the shadow features. The method can effectively screen out the optimal feature set that is significantly related to abnormal network behavior, reducing dimensionality while preserving the effectiveness of abnormal traffic analysis. Third, to address sample imbalance and dispersion in traffic feature data, a data balancing method based on All-KNN is proposed. Built on the K-Nearest Neighbor (KNN) algorithm, the method reconstructs the traffic data with one-hot encoding, uses distances from each central traffic sample to determine its nearest neighbors, and decides whether to retain or remove the central sample according to how many of those neighbors belong to each class. A Bayesian-optimized LightGBM (LGBM) model is then used to evaluate the balanced data. All-KNN effectively mitigates class imbalance without adding model complexity, and the balanced data improves detection accuracy and achieves high recall for attack types with few samples. Then, to address the low detection accuracy and high false alarm rate caused by the high dimensionality and nonlinearity of traffic feature data, a network anomaly traffic detection method based on a Multi-Head Attention-Multilayer Perceptron (MHA-MLP) is proposed. The method balances the traffic data with All-KNN and selects the optimal salient feature subspace with the ET-RFECV and ET-Boruta algorithms. On this basis, the optimal feature space is stacked and combined with the multilayer perceptron to realize a nonlinear mapping of the input space. The output of each attention head of
the multi-head attention mechanism is weighted by its attention weight and summed to obtain the final output of the multi-head attention layer, which is fed to the next layer. Gradients are computed from the loss function by backpropagation to train the MHA-MLP network traffic anomaly detection model, enabling accurate detection of different attack types. Finally, the proposed anomaly traffic analysis methods are validated experimentally. Their performance is evaluated with metrics such as accuracy and recall on two public network traffic datasets, CICIDS2017 and UNSW-NB15, and compared with several existing classical methods to verify the effectiveness of the proposed methods. |
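The ET-RFE step described above (recursive elimination with an Extra Trees base model, scored by cross-validation) maps closely onto scikit-learn's `RFECV`. The sketch below runs it on synthetic data from `make_classification` as a stand-in for a traffic feature matrix; the tree count, fold count, step size, and scoring metric are illustrative assumptions, not the dissertation's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for a traffic feature matrix: 12 features, 4 informative.
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)

# Extra Trees is the base model; RFECV removes the least important feature
# (by feature_importances_) each round (step=1) and scores every candidate
# subset size with stratified 5-fold cross-validation.
selector = RFECV(
    estimator=ExtraTreesClassifier(n_estimators=50, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
selector.fit(X, y)

# Indices of the features kept at the best cross-validated subset size.
kept = [i for i, keep in enumerate(selector.support_) if keep]
print(selector.n_features_, kept)
```

After fitting, `selector.transform(X)` returns the reduced matrix that a downstream detector would be trained on.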
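The ET-Boruta selection described above can be sketched as follows, assuming scikit-learn's `ExtraTreesClassifier` supplies the impurity-based importance scores. This is a simplified version of Boruta: it performs only the acceptance test (a one-sided binomial test on how often a real feature beats the best shadow feature, with a Bonferroni correction) and omits the rejection of unimportant features and the iterative removal that the full algorithm performs. The function name `et_boruta`, the iteration count, and the toy data are hypothetical.

```python
import math

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def et_boruta(X, y, n_iter=20, alpha=0.05, seed=0):
    """Simplified Boruta acceptance step with Extra Trees importances."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: every column permuted independently, which keeps its
        # distribution but breaks any link to the labels.
        shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        model = ExtraTreesClassifier(n_estimators=100, random_state=it)
        model.fit(np.hstack([X, shadow]), y)
        imp = model.feature_importances_
        # A "hit" means the real feature beat the best shadow feature this round.
        hits += (imp[:n_features] > imp[n_features:].max()).astype(int)
    # Accept features whose hit count is unlikely under a fair coin (p = 0.5),
    # using a Bonferroni-corrected threshold across all features.
    threshold = alpha / n_features
    return [j for j in range(n_features) if binom_sf(int(hits[j]), n_iter) < threshold]


# Toy data: columns 0 and 1 carry the label (plus noise); columns 2-7 are pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
X = np.hstack([
    np.column_stack([y + 0.3 * rng.normal(size=300), -y + 0.3 * rng.normal(size=300)]),
    rng.normal(size=(300, 6)),
])
accepted = et_boruta(X, y)
print(accepted)
```

Comparing against the maximum shadow importance, rather than a fixed cutoff, is what makes the test "all-relevant": a feature is kept only if it is consistently more useful than the best purely random column.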
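The All-KNN balancing step described above can be sketched in plain Python as repeated edited-nearest-neighbour cleaning for k = 1, ..., K, in the spirit of imbalanced-learn's `AllKNN`. The abstract does not state the exact retention rule, distance metric, or K, so this sketch assumes Euclidean distance, a strict-majority rule, K = 3, and that minority-class samples are never removed; it also assumes categorical fields have already been one-hot encoded into numeric features. The helper names `nearest_labels` and `all_knn` are hypothetical.

```python
import math
from collections import Counter


def nearest_labels(X, y, i, k):
    """Labels of the k nearest neighbours of sample i (Euclidean, excluding i)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [y[j] for _, j in dists[:k]]


def all_knn(X, y, max_k=3):
    """Apply edited-nearest-neighbour cleaning for k = 1..max_k on non-minority classes."""
    minority = min(Counter(y).items(), key=lambda kv: kv[1])[0]
    X, y = list(X), list(y)
    for k in range(1, max_k + 1):
        keep = []
        for i in range(len(X)):
            if y[i] == minority:
                keep.append(i)  # minority samples are never removed
                continue
            labels = nearest_labels(X, y, i, k)
            same = sum(1 for lab in labels if lab == y[i])
            # Retain the central sample only if a strict majority of its
            # k neighbours share its class; otherwise treat it as noise.
            if 2 * same > k:
                keep.append(i)
        X = [X[i] for i in keep]
        y = [y[i] for i in keep]
    return X, y


# Toy 2-D data: class 0 (majority) near the origin, class 1 (minority) near (5, 5),
# plus one class-0 point sitting inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (5.5, 5.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0]
X_res, y_res = all_knn(X, y)
print(X_res, y_res)
```

Because All-KNN only removes samples, it adds no synthetic data and leaves the downstream model unchanged, which matches the abstract's point about not increasing model complexity.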
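The attention computation described above can be illustrated with a dependency-free sketch. Within one head, scaled dot-product attention weights each value vector by its softmax attention weight and sums them. The abstract then describes combining the heads by a weighted sum, whereas the standard Transformer layer concatenates head outputs and applies a linear projection. How the per-head weights are obtained is not stated, so the sketch assumes a softmax over hypothetical learnable per-head scores (`head_scores`); the projection matrices are made-up toy values, and the MLP layers and backpropagation are omitted.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def attention_head(X, Wq, Wk, Wv):
    """One head: softmax(Q K^T / sqrt(d)) V, i.e. values weighted by attention and summed."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K] for qi in Q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V)


def combine_heads(head_outputs, head_scores):
    """Weighted sum of head outputs; weights = softmax(head_scores) (assumed parameterization)."""
    alphas = softmax(head_scores)
    rows, cols = len(head_outputs[0]), len(head_outputs[0][0])
    return [[sum(a * H[r][c] for a, H in zip(alphas, head_outputs)) for c in range(cols)]
            for r in range(rows)]


# Toy input: 3 tokens with 2 dimensions; two heads with different (made-up) projections.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
heads = [attention_head(X, I, I, I), attention_head(X, swap, I, swap)]
print(combine_heads(heads, head_scores=[0.2, -0.1]))
```

In a trained MHA-MLP the projections and head scores would be learned by backpropagation; here they are fixed only to show the data flow.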
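For the evaluation described above, accuracy, per-class recall, and the false alarm rate can be computed directly from labels. The sketch treats the false alarm rate as the share of benign samples predicted as any attack; the benign label string (`"BENIGN"`), the function name `evaluate`, and the toy labels are assumptions.

```python
from collections import Counter


def evaluate(y_true, y_pred, benign="BENIGN"):
    """Return accuracy, per-class recall, and false alarm rate."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    support = Counter(y_true)
    correct = Counter(t for t, p in pairs if t == p)
    # Recall per class: correctly detected samples / all samples of that class.
    recall = {c: correct[c] / support[c] for c in support}
    # False alarm rate: benign samples flagged as any attack / all benign samples.
    false_alarms = sum(1 for t, p in pairs if t == benign and p != benign)
    far = false_alarms / support[benign] if support[benign] else 0.0
    return accuracy, recall, far


y_true = ["BENIGN", "BENIGN", "BENIGN", "BENIGN", "DoS", "DoS", "PortScan"]
y_pred = ["BENIGN", "BENIGN", "BENIGN", "DoS", "DoS", "BENIGN", "PortScan"]
print(evaluate(y_true, y_pred))
```

Reporting recall per class, rather than only overall accuracy, is what exposes how well rare attack types are detected after balancing.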