Research On Machine Learning Based P2P Traffic Identification

Posted on: 2012-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: H L Chu
Full Text: PDF
GTID: 2218330371962534
Subject: Communication and Information System
Abstract/Summary:
The emergence and rapid growth of P2P applications have remarkably changed the composition of Internet traffic. P2P traffic has become the dominant component of Internet traffic, which has created many problems for network management and raised the requirements for traffic monitoring. Meanwhile, to escape identification, P2P applications are moving quickly toward dynamic ports and encrypted payloads, so traditional port-based and payload-based identification techniques have difficulty identifying P2P traffic effectively. Because it depends on neither ports nor payload, machine learning based P2P traffic identification has become a research hotspot in recent years.

Supported by the "High Creditability Network Traffic Management and Control System" project of the National High Technology Research and Development Program of China (863 Program), and aimed at the problems of machine learning based P2P traffic identification and application-level classification, this dissertation proposes a P2P traffic identification method based on a hybrid feature selection algorithm and Support Vector Machine (SVM), and a P2P traffic application-level classification method based on an improved kernel fuzzy C-means clustering algorithm. Combining these two methods, the dissertation designs and implements a real-time P2P traffic identification prototype system. Details are as follows:

1. Current network traffic identification work mostly adopts filter feature selection algorithms, which results in lower identification accuracy. This dissertation therefore proposes a hybrid feature selection algorithm named R-GA, which combines the merits of filter and wrapper feature selection algorithms. R-GA first applies the fast ReliefF algorithm to remove irrelevant features, and then uses a genetic algorithm (GA) together with the concrete learning algorithm to remove redundant features, which yields an optimal feature set efficiently. Based on R-GA and SVM, a new P2P traffic identification method called R-GA-SVM is put forward. It uses R-GA to select the optimal flow feature set and, with the SVM model parameters also optimized by R-GA, obtains the best SVM identification model. Experimental results indicate that this method achieves higher P2P traffic identification performance with fewer flow statistics than SVM identification without feature selection or with a filter feature selection algorithm.

2. Research on application-level classification of P2P traffic is not yet mature. After thoroughly analysing the flow statistics of P2P file sharing and P2P multimedia applications, this dissertation concludes that these two major classes of P2P applications differ in three flow features: packet length, packet inter-arrival time and TCP flags. It then proposes an improved algorithm named I-PSO-KFCM to address the shortcomings of the Kernel Fuzzy C-means (KFCM) clustering algorithm. In this algorithm, the particle swarm is initialized from the result of KFCM clustering on a subset of the data in order to obtain near globally optimal cluster centers; KFCM clustering on the full set then starts from these centers, so that it converges quickly to the global optimum and clustering accuracy increases. The dissertation uses I-PSO-KFCM to classify P2P traffic at the application level, and the experimental results demonstrate its effectiveness.

3. This dissertation extends the above two methods to real-time operation, designs and implements a real-time P2P traffic identification prototype system, and then analyzes and tests its performance. Test results show that the system runs stably. Using the selected flow features computed from the first ten IP packets of a flow, it achieves accuracy above 85% for real-time P2P identification and classification, balances identification performance against resource consumption, and thus meets the needs of the High Creditability Network Traffic Management and Control System.
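As an illustration of what flow features computed from the first ten IP packets might look like, the following Python sketch summarizes the three feature families named in the abstract (packet length, packet inter-arrival time and TCP flags) for a single flow. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function name `early_flow_features`, the `(timestamp, length, flags)` input layout and the particular statistics chosen are hypothetical, and the abstract does not say which features R-GA actually selects.

```python
from statistics import mean, pstdev

# Standard TCP header flag bit masks.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10}


def early_flow_features(packets, max_packets=10):
    """Summarize the first `max_packets` packets of one flow.

    `packets` is a list of (timestamp_seconds, length_bytes, tcp_flags)
    tuples in arrival order. This layout is an assumed input format.
    """
    head = packets[:max_packets]
    if not head:
        raise ValueError("flow has no packets")

    lengths = [length for _, length, _ in head]
    times = [ts for ts, _, _ in head]
    # Inter-arrival times: gaps between consecutive packet timestamps.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    features = {
        "pkt_count": len(head),
        "len_mean": mean(lengths),
        "len_std": pstdev(lengths),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "iat_mean": mean(gaps) if gaps else 0.0,
        "iat_max": max(gaps) if gaps else 0.0,
    }
    # Count how many of the early packets carry each TCP flag.
    for name, mask in TCP_FLAGS.items():
        features[f"flag_{name.lower()}"] = sum(
            1 for _, _, flags in head if flags & mask
        )
    return features


if __name__ == "__main__":
    # Hypothetical flow: SYN, SYN+ACK, ACK, then a full-size PSH+ACK segment.
    demo = [(0.0, 60, 0x02), (0.1, 60, 0x12), (0.2, 52, 0x10), (0.5, 1500, 0x18)]
    print(early_flow_features(demo))
```

A feature vector of this kind could then be passed to a trained classifier or clustering model. Limiting the computation to the first few packets is what allows a decision to be made while the flow is still active.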
Keywords/Search Tags:P2P traffic identification, flow statistics, machine learning, hybrid feature selection, genetic algorithm, support vector machine, kernel fuzzy C-means, particle swarm optimization