Research On Encrypted Network Traffic Classification Based On Data Mining

Posted on: 2020-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Su
Full Text: PDF
GTID: 2428330575491210
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the demand for protecting network data keeps increasing and the requirements placed on encryption technologies have become more stringent. Although encryption protects personal privacy and improves data security, it also makes network management based on packet inspection more difficult than before. Payload-based, port-based and statistics-based traffic classification techniques have been proposed to address this issue and are widely used. The main difficulty in classifying encrypted network traffic, however, is finding effective features, since the choice of features directly affects the performance of the machine learning models. Feature mining is therefore central to encrypted traffic classification. Building on an analysis of previous work, this thesis analyzes in depth the operating mechanism of the anonymity software under study, relates its operating process to the SOCKS5 protocol, and uses data mining to extract strongly correlated features from several angles. Multi-level processing of the feature data improves the performance of the machine learning models. First, an encrypted network traffic identification method based on Isolation Forest and XGBoost is proposed. Isolation Forest performs noise reduction to limit the influence of noisy points in the feature data. After noise reduction on the features extracted from encrypted and normal traffic, the XGBoost model is used to identify the encrypted traffic. Second, an encrypted network traffic classification method based on Spark-accelerated DBSCAN and XGBoost is proposed. The purpose of this method is to find the features that are most relevant to encrypted traffic. Because DBSCAN clustering needs the distance between every pair of samples, which is time-consuming, this thesis uses Spark multi-threaded parallelization to accelerate the clustering process. Building on previously known characteristics of encrypted network traffic, the anonymous proxy is analyzed in depth, effective features are extracted from the two-way interaction between client and server, and feature data from different aspects are combined. The experimental results show that different features influence the machine learning models to different degrees. In summary, this thesis applies data mining and machine learning models to the identification of encrypted network traffic, analyzes anonymous proxies in depth, and mines effective features of encrypted traffic from multiple aspects and dimensions, contributing a better approach to encrypted traffic identification.
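The cost the abstract attributes to DBSCAN, computing the distance between every pair of samples, can be made concrete with a small sketch. The code below is a minimal single-machine DBSCAN in plain Python, not the thesis's Spark implementation. The function names, the two-dimensional sample points, and the `eps` and `min_pts` values are assumptions chosen for illustration. The all-pairs loop in `pairwise_neighbors` is the quadratic step that a parallel framework would distribute.

```python
import math
from itertools import combinations


def pairwise_neighbors(points, eps):
    """For each point index, return the set of indices within eps (itself included)."""
    neighbors = [{i} for i in range(len(points))]
    # All-pairs distance computation: O(n^2) in the number of samples.
    # This is the step the abstract describes as time-consuming.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    neighbors = pairwise_neighbors(points, eps)
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


# Hypothetical 2-D feature vectors (illustrative values, not data from the thesis).
points = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1),
          (5.0, 5.0), (5.1, 5.1), (4.9, 5.0),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

With n samples the neighbor search performs n(n-1)/2 distance computations, so doubling the data roughly quadruples the work. Because each pair is independent of the others, this stage is a natural candidate for parallel execution.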
Keywords/Search Tags: encrypted traffic, traffic identification, anonymous proxy, data mining