
The Research Of Unknown Network Protocol Identification

Posted on: 2020-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Fu
GTID: 2428330575961962
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the spread of mobile terminals and the rapid development of networks have given rise to a new Internet structure and driven the growth of network traffic. At this scale, effective supervision of network traffic is the cornerstone of network security protection. Many current studies on network supervision focus on identifying unknown network protocol types, and protocol identification combined with machine learning is an active topic in this research. This approach extracts data stream features, builds data sets, and analyzes unknown network traffic with machine learning algorithms; it can obtain better recognition results than traditional network protocol analysis methods.

This thesis focuses on identifying different protocol types and the encryption methods applied to application data. Because they carry different applications, data streams differ in duration, data distribution, application data security, and client-server interaction. Based on these differences, this thesis proposes an algorithm model that identifies and analyzes different protocol streams and the encryption algorithms used for application ciphertext.

First, for the unknown network protocol identification problem, this thesis proposes, after an in-depth study of the K-Means algorithm, an unknown protocol identification model based on the K-Means process and outlier analysis. K-Means is simple and efficient and performs well on large data sets. However, it has several defects in traffic identification. It selects the initial cluster centers at random. Data flows that perform the same function show different characteristics in different network environments, so the data set contains many abnormal points, and the results of the algorithm are susceptible to them. In addition, redundant dimensions introduced during feature construction and differences in scale between dimensions reduce identification accuracy. To compensate for these defects, this thesis proposes a more reasonable unknown traffic identification algorithm that introduces feature normalization preprocessing, feature selection, and LOF (Local Outlier Factor) outlier analysis. The clustering process uses the K-Means++ algorithm, and the point with the maximum local reachability density found in the outlier analysis is used to position the initial cluster center accurately.

Second, building on the analysis of unknown network protocol types, this thesis proposes a random-forest-based model for identifying the type of encryption algorithm applied to the application data. If encryption algorithm identification is performed directly on all network data, the unencrypted application data lowers recognition accuracy and also reduces execution efficiency, because too much irrelevant data is processed. The encryption algorithm recognition model is therefore built on the protocol identification model of Chapter 3. From the results of the clustering algorithm, the protocol classes whose application data is encrypted are selected, and the ciphertext data is constructed according to the protocol stream characteristics. A random forest model is then established to classify the different encryption algorithms. Because it builds on the clustering results and skips unencrypted application data, this classification method reduces the amount of data processed and obtains better recognition results.

The data sets used in this thesis are obtained by feature engineering on the raw data. While as many model features as possible are extracted, the data sets are also processed and optimized to a certain extent. The final experimental results show that the improved algorithm raises the accuracy of unknown protocol identification to some degree and also recognizes the type of encryption algorithm well. With the complete algorithm model, the types of different Internet protocol flows can be effectively identified, and the application data ciphertext can be given a preliminary identification and analysis.
Keywords/Search Tags:Unknown traffic identification, K-Means++ algorithm, Encryption algorithm identification, Random forest, Feature engineering
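
The clustering stage summarized in the abstract (normalization, LOF outlier analysis, then K-Means++ with the first center placed at the point of maximum local reachability density) might look roughly like the sketch below. This is an illustration under stated assumptions, not the thesis's implementation: every function name, the two example features, the neighbour count, and the LOF cut-off of 2.0 are hypothetical, the feature selection step is omitted, and dropping high-LOF points before clustering is one plausible reading of how the outlier analysis is used.

```python
import math
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] so no dimension dominates distances."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def lrd_and_lof(points, k):
    """Local reachability density and LOF score of every point (k neighbours)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in knn[i]) / k
        lrd.append(1.0 / (reach + 1e-12))  # epsilon guards against duplicate points
    lof = [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]
    return lrd, lof


def seed_centers(points, lrd, k, rng):
    """First center = densest point (max LRD); the rest by K-Means++ D^2 sampling."""
    centers = [points[max(range(len(points)), key=lambda i: lrd[i])]]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, centers, iters=50):
    """Plain Lloyd iterations starting from the supplied centers."""
    for _ in range(iters):
        labels = assign(points, centers)
        new = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centers[c])
        if new == centers:
            break
        centers = new
    return assign(points, centers), centers


def cluster_flows(features, k, n_neighbors=5, lof_threshold=2.0, seed=0):
    """Normalize, drop LOF outliers (assumed cut-off), run density-seeded K-Means."""
    pts = min_max_normalize(features)
    lrd, lof = lrd_and_lof(pts, n_neighbors)
    keep = [i for i, score in enumerate(lof) if score <= lof_threshold]
    inliers = [pts[i] for i in keep]
    centers = seed_centers(inliers, [lrd[i] for i in keep], k, random.Random(seed))
    labels, centers = kmeans(inliers, centers)
    return keep, labels, centers


# Hypothetical flow features: (duration in seconds, mean packet size in bytes).
flows = [[1.0, 100], [1.1, 102], [0.9, 98], [1.05, 101], [0.95, 99], [1.0, 103],
         [5.0, 500], [5.1, 505], [4.9, 495], [5.05, 502], [4.95, 498], [5.0, 507],
         [20.0, 1500]]
kept, labels, centers = cluster_flows(flows, k=2, n_neighbors=3)
print(kept, labels)
```

On this toy input the isolated last flow receives a very high LOF score and is excluded before clustering, while the two dense groups fall into separate clusters. Seeding the first center at the densest point removes the random choice of the first center that the abstract identifies as a weakness of plain K-Means; the remaining centers still use the usual K-Means++ distance-weighted sampling.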