Research On High-speed IP Service Awareness Based On Traffic Measurement

Posted on: 2013-12-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Zhang
GTID: 1228330395980710
Subject: Communication and Information System
Abstract/Summary:
Traffic classification is the basis for understanding the nature of a network and how it operates. It is also an important component of network applications, including trending network applications, QoS management, network optimization, and anomalous behavior detection. With the rapid development of information technology, the Internet has undergone great changes in overall scale and architecture. The number of computer users is growing rapidly. Network services are becoming more diverse. P2P traffic consumes backbone bandwidth at all times. Illegal information pervades every corner of the Internet. In particular, port disguising and application-layer data encryption are widely used in practice. Faced with these challenges, traffic classification based on ports and payload signatures is no longer effective.

Combined with the fundamental technique research task of identifying user terminals and services in the Common Security and Control Framework in Tri-Network Convergence project, part of the National High-Tech Research and Development Program of China (863 Program), this dissertation primarily discusses how to better classify network traffic based on measurement on high-speed backbone links. Considering the great potential of Deep Flow Inspection (DFI) based on machine learning and Deep User Inspection (DUI) based on user behavior in traffic classification, the dissertation addresses two central scientific questions from the flow-level and user-level viewpoints: "How can traffic characteristics be extracted from a high-speed backbone link?" and "How can the performance of traffic classification be improved?" Its main work and achievements are outlined as follows:

1. Considering the naive algorithm's high false-negative probability, a novel scheme called LRU-BF (Least Recently Used & Bloom Filters) is presented.
To achieve high accuracy, the algorithm combines LRU eviction with a Bloom filter representation to separate heavy-hitter filtering from heavy-hitter recognition. Based on the Pareto distribution and the hypergeometric distribution, analytical expressions for the upper bound of the error probability are derived. Simulation results indicate that LRU-BF saves space and achieves a lower error probability than the Naive-LRU algorithm. It can also support 40 Gbps line-speed processing.

2. Considering the deficiencies of Naive Counting Bloom Filters (NCBF), namely lower accuracy and lower space efficiency, a novel data structure called Geometric Bloom Filters (GBF) is presented. To achieve space-efficient storage and fast queries, the structure introduces hash fingerprints, partitions the Bloom filter twice, and stores elements according to bucket load. Based on the theory of differential equations, analytical expressions are derived, along with the relationship between error probability and space complexity. In addition, the inherent characteristic of GBF, that it follows a geometric distribution, is proved. Simulation results indicate that, compared with NCBF, GBF can decrease the error probability to 10^{-2} and achieve a 20% space saving without sacrificing computational complexity.

3. Considering the inferior accuracy of traditional classification methods, a novel scheme called Semi-supervised Internet traffic identification based on Affinity Propagation (SAP) is presented. To avoid the problem of choosing initial points, the method introduces affinity propagation clustering to construct the classification model simply and effectively. Following the semi-supervised idea, a few constraints from labeled flows and the prior manifold distribution of the sample space are extracted. Manifold similarity is also defined.
As a result, the semi-supervised method not only greatly reduces the effort of labeling sampled flows but also improves the performance of the classifier. Based on the central limit theorem and Chernoff bounds, the cohesion performance is analyzed. Experimental results show that the algorithm achieves 90% classification accuracy while keeping a lower sum of squared errors.

4. Considering the complexity and accuracy of Affinity Propagation (AP), an improved affinity propagation clustering algorithm called Semi-supervised Affinity Propagation clustering based on Stratified Combination (SAP-SC) is devised. SAP-SC inherits from and extends SAP. By introducing stratified clustering, the proposed algorithm partitions the overall clustering process equally into several smaller blocks. Furthermore, focusing on hard-to-cluster data, every layer employs semi-supervised learning to form pairwise constraints and map each sub-cluster to the corresponding label. To improve clustering performance, an ensemble boosting method is used to weight and combine the results of all layers. Finally, theoretical analysis and experimental results show that computational complexity is reduced by O(N^{1/2}) and overall classification precision is raised to 98%.

5. Considering the concept drift problem of traditional machine learning identification methods, a novel algorithm called traffic classification based on Host Connection Graph (HCG) is proposed. Taking {IP Address, Port} as the unique user identifier, HCG constructs a host connection graph and introduces the concept of user similarity. Based on graph mining theory, social communities are extracted from communications among hosts by partitioning the graph into mutually overlapping behavior clusters.
To perform traffic classification, HCG not only defines a User Behavior Mode (UBM) to analyze implicit traffic characteristics, but also maps application labels to every host behavior by using UBM and port. Finally, simulations are conducted on a real network trace. The results demonstrate that HCG can avoid the concept drift problem and gracefully reduce computational complexity without sacrificing accuracy.
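
The abstract does not define the host connection graph or user similarity in detail, so the following Python sketch is only an illustration of the general idea in item 5. It assumes that each endpoint is an (IP address, port) pair, that an edge means two endpoints exchanged at least one flow, and that similarity is the Jaccard overlap of neighbor sets. The function names `build_host_graph` and `user_similarity`, the Jaccard metric, and the sample addresses are all hypothetical choices, not the dissertation's definitions.

```python
from collections import defaultdict


def build_host_graph(flows):
    """Build an undirected graph: endpoint -> set of endpoints it talked to.

    Each flow is a (source, destination) pair of (ip, port) tuples.
    """
    graph = defaultdict(set)
    for src, dst in flows:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def user_similarity(graph, a, b):
    """Jaccard similarity of two endpoints' neighbor sets.

    This metric is an assumption made for illustration only.
    """
    neighbors_a = graph.get(a, set())
    neighbors_b = graph.get(b, set())
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


if __name__ == "__main__":
    web = ("203.0.113.10", 80)       # hypothetical web server
    p2p = ("198.51.100.7", 6881)     # hypothetical P2P peer
    h1 = ("192.0.2.1", 51000)
    h2 = ("192.0.2.2", 51001)
    h3 = ("192.0.2.3", 51002)
    flows = [(h1, web), (h2, web), (h1, p2p), (h3, p2p)]
    graph = build_host_graph(flows)
    print(user_similarity(graph, h1, h2))
    print(user_similarity(graph, h2, h3))
```

Endpoints that share many neighbors score close to 1 and would tend to fall into the same behavior cluster. A real system would need a community detection step on top of this, which the sketch leaves out.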
Keywords/Search Tags: Traffic Classification, Deep Flow Inspection, Deep User Inspection, Traffic Measurement, Machine Learning, Affinity Propagation Clustering, Semi-supervised Learning
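
For readers unfamiliar with the baseline named in item 2, the following Python sketch shows a textbook counting Bloom filter, the kind of structure the abstract refers to as NCBF. It is not the GBF structure proposed in the dissertation, whose details the abstract does not give. The class name, the parameter defaults, and the use of salted SHA-256 to derive indexes are assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Textbook counting Bloom filter: an array of counters instead of bits."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.m = num_counters
        self.k = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, item):
        # Derive k counter indexes by salting one hash with the hash number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item):
        # Only decrement when the item appears present, so no counter
        # ever drops below zero.
        if item in self:
            for idx in self._indexes(item):
                self.counters[idx] -= 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

    def estimate(self, item):
        # Upper bound on how many times the item was added.
        return min(self.counters[idx] for idx in self._indexes(item))


if __name__ == "__main__":
    cbf = CountingBloomFilter(num_counters=64, num_hashes=3)
    for _ in range(3):
        cbf.add("flow-a")
    print("flow-a" in cbf, cbf.estimate("flow-a"))
    for _ in range(3):
        cbf.remove("flow-a")
    print("flow-a" in cbf)
```

Because every element touches `k` full-width counters, this baseline spends space on counters that mostly stay small, which is the kind of inefficiency the abstract says GBF targets.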