
Research and Implementation on the Key Classification Technologies for Encrypted Network Data Based on Machine Learning

Posted on: 2021-08-24; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Ju
GTID: 2428330647957262; Subject: Information and Communication Engineering
Abstract/Summary:
The effective classification of encrypted data in communication networks is of great significance to network supervision and security. In computer networks such as the Internet, network traffic usually refers to consecutive packets that share the same five-tuple, that is, a flow. Traditional methods can classify such encrypted traffic effectively by using statistical characteristics of the flows or their payloads. However, in some special networks, such as the Internet of Things and satellite communication networks, the nature of the applications and the limited data-processing capacity of network devices mean that most communication protocols are private protocols with simple interaction procedures and compact data structures. As a result, the data exist as discrete, short messages. Unlike ordinary network traffic, these data carry no flow information, and because private protocols are widely used, their labels are difficult to obtain. Since such encrypted network data are short and lack flow features, traditional classification methods based on the statistical features of flows or payloads are not applicable, and classification without protocol specifications is even more difficult. To address these problems, this thesis focuses on the classification of such encrypted network data: it uses deep learning methods to extract features automatically, distinguishes encrypted from unencrypted network data, and identifies the start and end positions of encrypted fields. The main contributions are as follows:

1. For target data that consist of a single public protocol and are unlabeled, an unsupervised classification algorithm for encrypted network data based on long short-term memory (LSTM) and model transfer is proposed. Treating the network data as time series, a classification model based on LSTM is constructed. Public or generated data in the source domain are used to pre-train the model, and the trained LSTM network is then transferred to the target data for classification. The dynamic time warping (DTW) method is used to guide the selection of the source domain. Experimental results show that the accuracy and F1 value on the Text, ACARS, HTTP and SSH data all exceed 96%. In addition, the LSTM network effectively handles data that are discrete and short.

2. For target data that consist of mixed public protocols and are sparsely labeled, a two-stage semi-supervised classification algorithm for encrypted network data based on a generative adversarial network (GAN) is proposed. In stage one, a data-filtering network based on a traditional GAN is constructed: using a small amount of labeled data in the target domain, the GAN discriminator filters for data that follow the same distribution, which addresses the problem of differing data distributions. In stage two, a GAN-based semi-supervised classification network is constructed to overcome the sparse labeling of the target data. Domain data including ACARS, AIS1, AIS4, HTTP, DNS, SMTP, FTP and SSH are used to classify target data mixed from ACARS, AIS and HTTP. The results show that even when the labeling rate is as low as 0.06%, the accuracy and F1 value remain above 91%, more than 10 percentage points higher than those of traditional semi-supervised methods.

3. For target data that consist of mixed private protocols and are unlabeled, a classification algorithm for encrypted network data based on data reconstruction is proposed. First, the encryption probability of each byte of the whole message is obtained through data reconstruction and transfer of a convolutional neural network (CNN) model. Then, the jump points of the encryption probability are extracted using the derivative of the discrete sequence to generate a set of suspected encrypted fields. Finally, encrypted-field matching is performed on the samples based on a four-dimensional moment feature vector, which determines in a single pass whether the data are encrypted and where the encrypted fields start and end. In an experiment on distinguishing encrypted from unencrypted complex data, the proposed method achieved a recall of 93% and a precision of 72%. For identifying encrypted fields, the forward coverage, reverse coverage and F1 value reached 89%, 90% and 90%, respectively. Compared with traditional encrypted-field matching methods, the proposed method shows clear advantages.

4. To meet the application requirements of encrypted network data classification, an intelligent-algorithm software module that can be plugged into an existing data-processing platform in offline mode is designed and implemented. The overall framework is designed according to actual needs, and the required software and hardware support is analyzed. The software provides data preprocessing, classification of encrypted network data, and display of results and evaluation-metric statistics. Finally, the functions of each module are verified on real data.
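The abstract contrasts conventional traffic, where packets sharing a five-tuple form a flow, with the discrete short messages of private protocols. The following Python sketch (stdlib only) groups packets into unidirectional five-tuple flows and computes a few simple per-flow statistics. The `Packet` record, the helper names and the chosen statistics are illustrative assumptions, not the feature set used in the thesis.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

class Packet(NamedTuple):
    # Hypothetical minimal packet record for illustration only.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes

FiveTuple = Tuple[str, str, int, int, str]

def group_into_flows(packets: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Group packets sharing the same (unidirectional) five-tuple, keeping arrival order."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)].append(p)
    return dict(flows)

def flow_features(flow: List[Packet]) -> Dict[str, float]:
    """Simple flow statistics: packet count, total bytes, mean and variance of payload size."""
    sizes = [len(p.payload) for p in flow]
    n = len(sizes)
    mean = sum(sizes) / n
    var = sum((s - mean) ** 2 for s in sizes) / n  # population variance
    return {"packets": n, "bytes": sum(sizes), "mean_size": mean, "size_var": var}
```

For a discrete short message, the "flow" is a single packet: the count is 1 and the size variance is 0, so the statistics that flow-based classifiers depend on carry almost no information.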
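The first contribution uses dynamic time warping to guide the choice of source domain, but the abstract does not say how DTW scores are aggregated. The sketch below assumes each message is a byte sequence. It computes the classic DTW distance with an absolute-difference cost and picks the candidate domain whose messages are, on average, nearest to the target messages. The helper names and the nearest-neighbour averaging rule are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def dtw_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Classic DTW: minimal cumulative |a_i - b_j| cost over all monotone alignments."""
    n, m = len(a), len(b)
    # D[i][j] = minimal cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend from a match, an insertion, or a deletion step
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

def select_source_domain(target_msgs: List[bytes],
                         candidates: Dict[str, List[bytes]]) -> str:
    """Hypothetical rule: pick the domain minimising the mean nearest-neighbour DTW distance."""
    def mean_dist(source_msgs: List[bytes]) -> float:
        total = sum(min(dtw_distance(t, s) for s in source_msgs) for t in target_msgs)
        return total / len(target_msgs)
    return min(candidates, key=lambda name: mean_dist(candidates[name]))
```

Iterating over a `bytes` object yields integers, so messages can be passed directly. The full DTW table costs O(nm) per pair; for many or long messages, a windowed variant such as a Sakoe-Chiba band, or subsampling of the source domain, would be the usual remedy.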
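Stage two of the second contribution uses a GAN-based semi-supervised classifier, but the abstract does not state its loss. One widely used formulation (Salimans et al., 2016, "Improved Techniques for Training GANs") treats the discriminator as a $(K+1)$-class classifier. Classes $1,\dots,K$ are the real data classes (for instance, encrypted and unencrypted), and class $K+1$ means "generated by $G$". Whether the thesis uses exactly this objective is an assumption.

```latex
\begin{aligned}
L &= L_{\text{supervised}} + L_{\text{unsupervised}},\\
L_{\text{supervised}} &= -\,\mathbb{E}_{(x,y)\sim p_{\text{data}}}\,\log p_{\text{model}}\big(y \mid x,\ y < K+1\big),\\
L_{\text{unsupervised}} &= -\Big\{\mathbb{E}_{x\sim p_{\text{data}}}\log\big[1 - p_{\text{model}}(y = K+1 \mid x)\big]
  + \mathbb{E}_{x\sim G}\,\log p_{\text{model}}(y = K+1 \mid x)\Big\}.
\end{aligned}
```

At a labeling rate of 0.06%, only about 6 of every 10,000 target samples contribute to $L_{\text{supervised}}$. The unsupervised term is what lets the unlabeled majority shape the decision boundary.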
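In the third contribution, jump points in the per-byte encryption probability are located with the derivative of a discrete sequence and then turned into a set of suspected encrypted fields. The minimal sketch below assumes the probabilities are already available as a list of floats and that the discrete derivative is the first-order difference. The thresholds, the rule of pairing every two boundaries, and the mean-probability filter are illustrative assumptions rather than the thesis's exact procedure.

```python
from itertools import combinations
from typing import List, Tuple

def jump_points(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Indices i where the first-order difference |probs[i] - probs[i-1]| exceeds threshold."""
    return [i for i in range(1, len(probs)) if abs(probs[i] - probs[i - 1]) > threshold]

def suspected_fields(probs: List[float], threshold: float = 0.5,
                     min_mean: float = 0.5) -> List[Tuple[int, int]]:
    """Candidate half-open fields [start, end) bounded by jump points or message ends,
    kept only if their mean encryption probability is at least min_mean."""
    bounds = sorted({0, len(probs), *jump_points(probs, threshold)})
    fields = []
    for a, b in combinations(bounds, 2):  # every pair of boundaries with a < b
        if sum(probs[a:b]) / (b - a) >= min_mean:
            fields.append((a, b))
    return fields
```

For `probs = [0.1, 0.1, 0.9, 0.95, 0.9, 0.1, 0.1]` the jump points are 2 and 5. The candidate set deliberately over-generates: (0, 5) and (2, 7) survive alongside the true field (2, 5), so a later matching step has to choose among them.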
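The final matching step of the third contribution relies on a four-dimensional moment feature vector, but the abstract does not list the moments. This sketch assumes they are the mean, variance, skewness and excess kurtosis of the byte values. It compares them against the exact moments of a uniform distribution over 0..255, which well-encrypted bytes approximate. The tolerance values and the all-components-within-tolerance rule are hypothetical.

```python
import math
from typing import Tuple

Moments = Tuple[float, float, float, float]

def moment_vector(data: bytes) -> Moments:
    """Population mean, variance, skewness and excess kurtosis of the byte values."""
    n = len(data)
    if n == 0:
        raise ValueError("empty field")
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    if var == 0:
        # Constant field: skewness and kurtosis are undefined; report 0 by convention (assumption).
        return (mean, 0.0, 0.0, 0.0)
    sd = math.sqrt(var)
    skew = sum(((x - mean) / sd) ** 3 for x in data) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in data) / n - 3.0
    return (mean, var, skew, kurt)

# Exact moments of a discrete uniform distribution over the 256 byte values.
N = 256
UNIFORM_REF: Moments = (
    (N - 1) / 2,                             # mean = 127.5
    (N ** 2 - 1) / 12,                       # variance = 5461.25
    0.0,                                     # skewness (symmetric distribution)
    -6 * (N ** 2 + 1) / (5 * (N ** 2 - 1)),  # excess kurtosis, about -1.2
)

def looks_encrypted(data: bytes, tol: Moments = (8.0, 600.0, 0.2, 0.3)) -> bool:
    """Hypothetical matching rule: every moment lies within its tolerance of the uniform reference."""
    return all(abs(v - r) <= t for v, r, t in zip(moment_vector(data), UNIFORM_REF, tol))
```

Short fields give noisy moment estimates, so in practice the tolerances would need to depend on field length. The fixed values here are for illustration only; an ASCII field such as `b"hello world " * 20` fails mainly on its mean, which is about 95.7.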
Keywords/Search Tags: Network traffic classification, Encrypted network data, Machine learning, Transfer learning, Data reconstruction