| The contest between attack and defense in network security has long been a research focus in the information age, and attack methods, techniques, and tools keep evolving. The payload of network traffic carries rich information about that traffic, so identifying attack behavior from payload data is an important way to defend against network attacks. However, payload data are long sequences with much irrelevant content, have a complex structure with strong internal correlation, and yield high-dimensional features, all of which pose new challenges for payload-oriented anomaly detection. Specifically, first, payloads are mostly long sequences that contain a large amount of content irrelevant to deciding whether behavior is anomalous, which reduces the accuracy of anomaly detection. Second, the payload structure is complex and contains a great deal of security-domain content, so general-purpose feature extraction algorithms fall short of the desired effect. Payload content also carries contextual semantics and is strongly correlated, which traditional feature extraction methods struggle to express fully. Third, feature extraction often raises the representation dimension so that payloads with richer semantics can be embedded in a high-dimensional space, but the computational complexity of traditional anomaly attribution methods, such as subspace search and feature scoring, grows exponentially with the feature dimension, so they cannot handle high-dimensional payload representations effectively. To achieve accurate and effective payload anomaly detection, this paper studies in depth a payload anomaly detection algorithm based on deep learning, a payload feature extraction algorithm based on tree structure representation, and a payload anomaly
attribution algorithm based on neighborhood analysis. The main research advances are as follows. To address the problem that payload sequences are long and contain much content unrelated to anomalies, which lowers the accuracy of anomaly detection, this paper proposes an Anomaly Detection algorithm for Payload based on Deep Learning (ADP). ADP first parses the payload according to its protocol type, filters out content irrelevant to anomalies, and maps the payload string to a numerical representation that serves as input to the deep learning network. An encoder built from a recurrent neural network then encodes the payload to obtain a feature representation of each byte, and an attention mechanism weights these byte features to produce a feature vector for the whole payload from a global perspective. A decoder, also built from a recurrent neural network, decodes this feature vector, and the network is optimized with a loss function based on the reconstruction error. Data with large reconstruction errors are treated as anomalous, which allows abnormal payloads to be identified. ADP also marks anomalous fragments by visualizing the attention weights, which helps determine the anomaly type. Experimental results show that, compared with existing payload-oriented anomaly detection algorithms, ADP improves ROC-AUC and PR-AUC by 16.24% and 34.38% on average, respectively. Payload data have a complex structure, contain a great deal of security-related content, and are highly correlated internally. To fully express this correlation and improve the quality of payload feature extraction, this paper proposes a Feature Extraction Algorithm for Payload based on Tree Structure Representation (TSR). TSR first represents the payload as a tree structure through a protocol parsing algorithm and then embeds relevant domain knowledge into feature extraction based on the
tree structure to improve the quality of feature extraction. TSR then integrates byte-level and syntax-level payload features in a unified feature space to obtain contextual semantic features. Finally, the payload features are generated by structural recursion from the child nodes up to the root node. Experimental results show that, compared with existing feature extraction algorithms, TSR improves ROC-AUC and PR-AUC by 3.32% and 24.15% on average, respectively. Attributing detected anomalies helps find the features that cause them, so that rules can be formulated for the anomalous features with higher priority. This paper proposes an Anomaly Interpretation Algorithm based on Neighborhood Analysis (AINA). Based on the anomaly detection results, AINA identifies the features most relevant to the anomaly in an abnormal payload. The algorithm groups the normal data surrounding the abnormal sample to be attributed into several clusters, extracts representative data from them to mine the normal pattern, and obtains the feature distribution of the normal data. Because abnormal data are far scarcer than normal data, the abnormal data are expanded with a synthetic sampling algorithm to prevent overfitting. Finally, a linear classifier is trained on the normal clusters and the abnormal data, yielding a weight for each feature. Features with higher weights contribute more to the classification and are identified as the features causing the anomaly. Experimental results show that, compared with existing anomaly attribution algorithms, AINA improves ROC-AUC and PR-AUC by 15.02% and 10.53% on average, respectively. |
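
The neighborhood-based attribution idea described for AINA can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `explain_anomaly` and all of its parameters are hypothetical, the clustering and representative-selection step is replaced by simply taking the `k` nearest normal samples, the synthetic sampling step is replaced by a SMOTE-like interpolation toward those neighbors, and the linear classifier is assumed to be logistic regression trained by gradient descent. Features are assumed to share a comparable scale.

```python
import math
import random


def explain_anomaly(anomaly, normals, k=20, n_synth=20, epochs=300, lr=0.1, seed=0):
    """Return one non-negative score per feature; larger means more implicated."""
    rng = random.Random(seed)
    dim = len(anomaly)

    # 1. Neighborhood: keep the k normal samples closest to the anomaly
    #    (assumption: stands in for clustering plus representative selection).
    neighbors = sorted(normals, key=lambda p: math.dist(p, anomaly))[:k]

    # 2. Synthetic sampling: interpolate a short way from the anomaly toward
    #    random neighbors so the anomalous class is not a single point.
    abnormal = [list(anomaly)]
    for _ in range(n_synth):
        q = rng.choice(neighbors)
        t = rng.uniform(0.0, 0.2)
        abnormal.append([a_j + t * (q_j - a_j) for a_j, q_j in zip(anomaly, q)])

    # 3. Build the labeled training set (0 = normal, 1 = abnormal) and
    #    mean-center each feature so no feature acts as a hidden bias term.
    samples = [(x, 0.0) for x in neighbors] + [(x, 1.0) for x in abnormal]
    n = len(samples)
    mean = [sum(x[j] for x, _ in samples) / n for j in range(dim)]
    samples = [([x_j - m_j for x_j, m_j in zip(x, mean)], y) for x, y in samples]

    # 4. Fit a logistic-regression classifier with batch gradient descent.
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in samples:
            z = b + sum(w_j * x_j for w_j, x_j in zip(w, x))
            z = max(-30.0, min(30.0, z))  # keep exp() within a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * x[j]
        w = [w_j - lr * g_j / n for w_j, g_j in zip(w, grad_w)]
        b -= lr * grad_b / n

    # 5. The magnitude of each weight serves as the attribution score.
    return [abs(w_j) for w_j in w]
```

As a toy usage, calling `explain_anomaly([0.2, 3.0, 0.1], normals)` with normal samples whose second feature stays below 1 would be expected to rank the second feature highest, since that is the only dimension in which the anomaly departs from its neighborhood.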