Network Traffic Classification Based On Spark Frame

Posted on: 2020-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Liu
Full Text: PDF
GTID: 2428330602461451
Subject: Software engineering
Abstract/Summary:
In the age of big data, with the rapid development of the Internet and the exponential growth of network scale, new network services emerge constantly while network composition grows more complicated. Network traffic classification plays a very important role in analyzing user behavior, enhancing network controllability, improving quality of service, and ensuring network security. As the Internet continues to expand in scale and improve in performance, traffic on today's large-scale high-speed networks is characterized by large data volume, diversity, fast transmission, and low value density. Traditional network traffic classification methods have difficulty coping with these characteristics. To classify the traffic of large-scale high-speed networks quickly and accurately, this thesis uses similarity and weight to improve the random forest algorithm. Decision redundancy among decision trees is eliminated according to the similarity between trees, which improves classification efficiency. Each decision tree is assigned a weight according to its classification performance, and the random forest is formed by integrating the decision trees according to these weights, which preserves the generalization ability of the model while improving its classification performance. In addition, the heuristic characteristics of flows are used to introduce strong correlation that guides the aggregation of network traffic, so as to achieve better classification performance. The limited resources of a single machine make it difficult to apply network traffic classification methods to large-scale high-speed network environments. To break through this limitation, Spark and its related technologies are used to build a distributed system, and a large-scale high-speed network traffic classification system is constructed by combining it with the parallelized improved random forest algorithm. The experimental results show that the proposed classification system is robust, feasible, and scalable, and that it greatly reduces classification time and improves classification efficiency.
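The two modifications to random forest described in the abstract (removing redundant trees by similarity, then combining the survivors by weighted vote) can be illustrated with a minimal sketch. The abstract does not give the similarity measure or the weight formula, so the choices here are assumptions made only for illustration: similarity is the fraction of validation samples on which two trees agree, the weight is a tree's validation accuracy, and `max_similarity` is a hypothetical threshold. Trees are modeled as plain callables, and the sketch is single-machine Python, not the thesis's Spark implementation.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Assumption: a "tree" is any callable mapping a flow's feature vector to a label.
Tree = Callable[[Sequence[float]], str]


def accuracy(tree: Tree, samples, labels) -> float:
    """Fraction of validation samples the tree labels correctly."""
    correct = sum(1 for x, y in zip(samples, labels) if tree(x) == y)
    return correct / len(samples)


def agreement(a: Tree, b: Tree, samples) -> float:
    """Assumed similarity measure: share of samples where both trees agree."""
    same = sum(1 for x in samples if a(x) == b(x))
    return same / len(samples)


def prune_and_weight(trees: List[Tree], samples, labels,
                     max_similarity: float = 0.95) -> List[Tuple[Tree, float]]:
    """Greedily keep the most accurate trees, dropping any tree that is
    too similar to one already kept. Returns (tree, weight) pairs."""
    scored = sorted(
        ((accuracy(t, samples, labels), i, t) for i, t in enumerate(trees)),
        key=lambda s: (-s[0], s[1]),  # best accuracy first, stable on ties
    )
    kept: List[Tuple[Tree, float]] = []
    for acc, _, tree in scored:
        if all(agreement(tree, k, samples) < max_similarity for k, _ in kept):
            kept.append((tree, acc))  # assumed weight: validation accuracy
    return kept


def weighted_vote(forest: List[Tuple[Tree, float]], x) -> str:
    """Sum the weights of the trees voting for each label; return the winner."""
    votes = defaultdict(float)
    for tree, weight in forest:
        votes[tree(x)] += weight
    return max(votes, key=votes.get)


def stump(index: int, threshold: float, low: str, high: str) -> Tree:
    """Hypothetical one-split tree used only for the demo."""
    return lambda x: low if x[index] < threshold else high


# Toy validation set: (mean packet size in bytes, flow duration in seconds).
samples = [(100, 0.1), (120, 0.2), (900, 0.1), (950, 3.0), (1000, 2.5), (80, 4.0)]
labels = ["web", "web", "video", "video", "video", "web"]

trees = [
    stump(0, 500, "web", "video"),
    stump(0, 510, "web", "video"),  # redundant: agrees with the first tree everywhere
    stump(1, 1.0, "web", "video"),
]

forest = prune_and_weight(trees, samples, labels)
prediction = weighted_vote(forest, (900, 0.1))
```

In this toy run the second stump is discarded because it agrees with the first on every validation sample, and the remaining two trees vote with weights equal to their validation accuracy. Other similarity measures or weighting schemes would fit the same structure.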
Keywords/Search Tags: large-scale high-speed network, traffic classification, relevance, weight, Random Forest, Spark