
Research On Some Key Issues For Classification Of Network Protocol

Posted on: 2016-05-26 | Degree: Master | Type: Thesis
Country: China | Candidate: R Q Lin
GTID: 2308330482479148 | Subject: Military information science
Abstract/Summary:
With the development of network technology, new and unknown application protocols keep emerging, making the network environment increasingly complex. At the same time, the rapid growth of network bandwidth has greatly increased the volume of network traffic. As a result, accurate classification of network protocols has become a major challenge. In particular, the widespread use of dynamic port numbers and encryption means that traditional classification methods, which rely on port numbers, payload feature matching, or network behavior patterns, are gradually losing their ability to classify traffic. Methods based on machine learning classify network protocols using the statistical characteristics of flows, which removes the dependence on port numbers and payload data and yields better classification accuracy. They can also classify unknown and encrypted protocols, so they have broader prospects for development.

This thesis studies some key issues in network protocol classification based on machine learning. In the classification of known network protocols, multimedia traffic accounts for the majority of network traffic, so the class proportions become seriously imbalanced, which interferes with the classification of other network protocols. Moreover, the high dimensionality of the flow features greatly increases the complexity of the learning algorithm and seriously degrades classification performance. This thesis analyzes these two issues in depth. In the classification of unknown network protocols, it focuses on the problem of unknown network protocol data appearing in the training set. The main work and achievements are as follows:

1. An adaptive packet sampling method is proposed that effectively downsizes the multimedia traffic data and reduces the imbalance rate of the original network protocol data while preserving the completeness of the packet information. The main sampling principle is to select as many low-occurrence packets as possible, based on two features that are useful for multimedia traffic: Packet Size (PS) and Packet Inter Arrival Time (IAT). We build a model of the ideal packet sampling technique for classifying multimedia traffic, which adaptively adjusts the probability of selecting a packet according to the PS and IAT predicted simultaneously by multi-output support vector regression, and we define general indexes for evaluating the sampling performance of the proposed approach. We compare our approach with other sampling methods and evaluate their impact on traffic classification performance using two machine learning methods and real multimedia traffic data. The experimental results show that the proposed method removes redundant packets and achieves better sampling and traffic classification performance.

2. A semi-supervised feature selection method based on label extension is proposed to address the inability of traditional semi-supervised methods to select a strongly correlated feature set from the original network data. The model starts from a small number of labeled samples, extends labels to the unlabeled samples with the K-means algorithm, and then applies the MDrSVM algorithm to perform feature selection on multi-class network data. Compared with other methods, the experimental results show that the proposed method achieves better classification performance by selecting a strongly correlated feature set.

3. An unknown network protocol classification method based on an improved transductive support vector machine is proposed to solve the problem of classifying an augmented class when unknown network protocol data appear in the training process. Following a semi-supervised classification strategy, the method uses a large number of unlabeled samples to assist in training the classification model, and the augmented loss of samples from the new unknown class is described by a loss augmentation function. The UPCTSVM optimization model is established and its solution process is derived in detail, so that the classification model can classify samples of the unknown class. Compared with other methods, the experimental results demonstrate the feasibility and effectiveness of the proposed method for classifying unknown network applications.

4. An unknown network protocol classification method based on semi-supervised clustering ensemble learning is proposed to address the scarcity of labeled samples and the instability of clustering results. Following a semi-supervised clustering strategy, the method first uses flow correlation to extend the set of labeled samples, increasing the proportion of labeled samples in the training set. A novel semi-supervised clustering method assisted by an ensemble learning strategy is then applied to identify the unknown protocol samples. Finally, the mixture of unknown protocols is partitioned at a finer granularity. Compared with other methods, the experimental results show that the proposed method identifies unknown protocols more effectively and improves the stability of the clustering results, even when the proportion of labeled samples in the training set is low.
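
The sampling principle of the first contribution (keep as many low-occurrence packets as possible, judged by PS and IAT) can be illustrated with a minimal Python sketch. This is not the thesis's model: it is a simplification that assumes observed (PS, IAT) values are binned into a histogram, whereas the thesis predicts PS and IAT with multi-output support vector regression. The function names, bin widths, and `base_rate` floor are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def bin_key(size, iat, size_step=100, iat_step=0.01):
    """Map a packet to a coarse (PS, IAT) histogram bin (bin widths are assumptions)."""
    return (size // size_step, int(iat / iat_step))


def sampling_probabilities(packets, base_rate=0.1):
    """Assign each (size, iat) packet a keep-probability that grows with its rarity."""
    counts = Counter(bin_key(size, iat) for size, iat in packets)
    rarest = min(counts.values())
    probs = []
    for size, iat in packets:
        # Packets in the rarest bin get probability 1.0; packets in common
        # bins are scaled down, but never below the base_rate floor.
        probs.append(max(base_rate, rarest / counts[bin_key(size, iat)]))
    return probs


def sample_packets(packets, base_rate=0.1, seed=0):
    """Keep each packet independently with its rarity-based probability."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    probs = sampling_probabilities(packets, base_rate)
    return [pkt for pkt, p in zip(packets, probs) if rng.random() < p]
```

For a trace made of 95 large, closely spaced packets and 5 small, widely spaced ones, `sample_packets` keeps all 5 rare packets and thins the 95 common ones, which is the downsizing behavior the abstract describes for redundant multimedia packets.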
Keywords/Search Tags:Unknown Protocols Classification, Semi-supervised Learning, Packet Sampling, Feature Selection, Traffic Classification