Research On Key Technologies Of P2P Traffic Identification Based On Support Vector Machine

Posted on: 2017-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Gong
GTID: 1368330491450254
Subject: Information networks
Abstract/Summary:
The development and maturity of Internet technologies have led to a proliferation of peer-to-peer (P2P) applications, making it much more difficult for network administrators and network service providers to perform traffic management and bandwidth control. Consequently, the accurate recognition of P2P traffic has been the focus of many studies. Existing techniques fall into four categories: port-based recognition, application-layer feature recognition, traffic feature-based recognition and machine learning-based recognition. This dissertation proposes a novel method of recognizing P2P traffic using a support vector machine (SVM)-based classification algorithm.

In order to apply this classification algorithm to P2P traffic recognition accurately and efficiently, and to adapt it to real-world network environments, this dissertation makes the following contributions in terms of kernel function parameter optimization, classification structure, incremental learning and the P2P traffic recognition model:

(1) An interval estimation-based method for optimizing the penalty factor is presented, as part of finding the optimal parameters for the kernel function used in the SVM. The penalty factor can be used to adjust the confidence interval for the deterministic data subspace. In the proposed method, an interval containing the optimal parameter is first determined through interval estimation. Then, the value of the penalty factor C is determined by searching the obtained interval based on the confidence coefficient and a semi-bisection strategy with a small step size. This substantially reduces the time needed to find the optimal penalty factor C and enables the SVM-based classification algorithm to be applied to P2P traffic recognition in real time.

(2) A golden mean-based method is proposed to find the optimal values of the two parameters in the Gaussian kernel function.
This method builds on the discussion of the penalty factor C in Chapter 3 and provides a fast and efficient search algorithm for the optimal kernel parameter $\sigma$, which can alleviate the impact of the original learning data on the result. According to the criterion that parameters near the line $\log\sigma^2 = \log C - \log\tilde{C}$ in the "good region" constitute the optimal combination of parameters $(C, \sigma)$, this dissertation proposes to perform an iteration using the golden mean scheme. The interval is partitioned and the maximal value of each subinterval is found. Multiple parallel lines are chosen to improve the coverage of the good region, thereby obtaining the optimal combination of parameters $(C, \sigma)$. The improved optimization algorithm allows P2P traffic recognition to be done more accurately in less time.

(3) P2P traffic usually contains abnormal data. If abnormal data occurs in the training samples, the size of the training set increases. In this case, the classification algorithm has to recalculate, which reduces classification efficiency. Thus, using the basic theory of SVM-based incremental algorithms, an improved classification structure based on the directed acyclic graph is devised in Chapter 5. After the multi-class graph is established by checking whether the KKT conditions are met, the training data of the classifier is processed iteratively, and the improved incremental learning method is adopted to eliminate the need for retraining when the training samples increase, making the learning process more efficient. This method is effective in dealing with the frequent abrupt traffic changes in various P2P applications.

(4) Based on the characteristics of P2P traffic, the features that can be used for SVM classification and training are presented and an SVM-based P2P traffic recognition model is established. Network traffic is sampled online; its distribution is unstable and the sampled data vary with time.
By adopting the concept of feedback, the proposed model provides positive feedback in cases where the classification result deviates only slightly from the true class, and negative feedback when there is serious deviation. In this way, the entire system gains the ability to learn, and the rule base is updated promptly through incremental learning during the process, thereby achieving optimal learning and classification effectiveness.

In this dissertation, the SVM-based classification algorithm is analyzed, and the key processes involved are improved. An SVM-based P2P traffic recognition model is then proposed. A simulation environment is developed and a laboratory experiment is conducted to verify that the model effectively recognizes P2P traffic from real-world networks.
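
As a rough illustration of the golden mean idea in contribution (2), the sketch below runs a plain golden-section search for the maximum of a one-dimensional score. It is not the dissertation's algorithm: the function names, the search interval and the toy accuracy curve are all assumptions made for this example, and a real use would replace the toy curve with cross-validated SVM accuracy evaluated along the chosen parameter line.

```python
import math

# Inverse golden ratio, about 0.618; each step keeps this fraction of the interval.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_max(score, lo, hi, tol=1e-3):
    """Approximate the maximizer of a unimodal function `score` on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)  # left interior point
    d = a + INV_PHI * (b - a)  # right interior point
    fc, fd = score(c), score(d)
    while b - a > tol:
        if fc < fd:
            # The maximum lies in [c, b]; the old right point becomes the new left point.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = score(d)
        else:
            # The maximum lies in [a, d]; the old left point becomes the new right point.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = score(c)
    return (a + b) / 2


def toy_cv_accuracy(log_sigma):
    # Hypothetical stand-in for cross-validated accuracy, peaking at log_sigma = 0.5.
    return 0.95 - 0.1 * (log_sigma - 0.5) ** 2


best = golden_section_max(toy_cv_accuracy, -5.0, 5.0)
print(round(best, 2))
```

Each iteration reuses one of the two previous evaluations, so only one new score is computed per step. That matters when every evaluation means training and validating a classifier.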
Keywords/Search Tags: Support Vector Machine (SVM), P2P flow identification, Kernel function, Parameter optimization, Penalty factor, Incremental algorithm