
Research On Network Traffic Identification Based On Data Stream Mining

Posted on: 2017-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: P Gao
Full Text: PDF
GTID: 2428330566953060
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, network applications have become increasingly diverse and network traffic is growing explosively. The traffic generated by the many P2P services occupies network bandwidth and causes congestion, which steadily degrades quality of service. To regulate the Internet, network traffic must be identified in real time so that differentiated services can be provided, which helps keep the Internet safe and makes better use of network resources. However, faced with massive, continuously arriving network traffic, traditional machine learning methods and centralized processing platforms can no longer meet these requirements. By combining data stream mining methods for network traffic identification with Spark Streaming, a platform for processing streaming big data, we propose a solution for online traffic identification.

The key points of online traffic identification are the selection of network flow features and the choice of data stream mining methods. This thesis proposes an effective method for selecting network flow features according to the characteristics of network traffic. We also study data stream classification and data stream clustering algorithms for network traffic identification, and analyze the characteristics and main application scenarios of each. The specific work in this thesis includes:

(1) Feature selection for network flows. Network flow features are complex and high-dimensional, so they cannot be applied directly to online traffic identification. This thesis combines two methods, ReliefF and CFS, and proposes a feature selection method based on a voting strategy. The method efficiently removes irrelevant and redundant features and yields a set of network flow features suitable for online traffic identification.

(2) An improved data stream classification algorithm applied to network traffic identification. Data stream classification offers high precision and speed. We analyzed the shortcomings of CVFDT and proposed AG_CVFDT, an improved version of CVFDT. AG_CVFDT effectively addresses concept drift and skewed class distribution in network traffic. We also implemented AG_CVFDT in parallel on Spark Streaming, which improves the efficiency of network traffic identification.

(3) An efficient data stream clustering algorithm and a corresponding solution for network traffic identification. Clustering can discover emerging application types on the Internet. We analyzed the advantages and disadvantages of the data stream clustering algorithms CluStream and D-Stream for network traffic identification. Combining the advantages of both, we proposed a data stream clustering algorithm named GDDSC and designed a corresponding solution for network traffic identification. The method supports evolution analysis and can find clusters of arbitrary shape. It also introduces a trend judgement, which improves the precision of identification.
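The abstract does not spell out how the voting strategy in item (1) combines ReliefF and CFS. As a minimal sketch of one plausible reading, the code below assumes each selector has already produced a list of candidate features and keeps only those features that receive enough votes. The function name `vote_select`, the `min_votes` threshold, and the feature names are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def vote_select(candidate_sets, min_votes):
    """Return features chosen by at least `min_votes` selectors (hypothetical helper)."""
    votes = Counter()
    for candidates in candidate_sets:
        # Each selector casts at most one vote per feature.
        votes.update(set(candidates))
    # Sort for a deterministic result.
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Assumed inputs: the top features by ReliefF weight and the subset chosen by CFS.
relieff_top = ["pkt_size_mean", "duration", "iat_mean"]
cfs_subset = ["duration", "pkt_size_mean", "pkt_count"]

selected = vote_select([relieff_top, cfs_subset], min_votes=2)
print(selected)  # ['duration', 'pkt_size_mean']
```

With two selectors, `min_votes=2` amounts to an intersection and `min_votes=1` to a union; the thesis may weight or combine the votes differently.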
Keywords/Search Tags: network traffic identification, data stream classification, data stream clustering, Spark Streaming