| Network traffic classification identifies the application-layer protocol of network traffic according to a set of rules and strategies. Because traditional port-based and pattern-matching methods are inefficient, many researchers have applied machine learning to this problem. Statistics-based methods extract a series of features from network packets and flows, then use those features to train a classifier that can distinguish different protocols. Previous work presumes that every protocol is represented in the training data, whereas in practice new and unknown protocols appear frequently. If those protocols are ignored, classification accuracy decreases. To address this, this paper proposes unknown-protocol-aware methods that focus on classifying unknown protocols. Our main contributions are as follows: 1. To extract unknown network traffic data, we propose constrained K-Means, an extension of K-Means. K-Means is a typical clustering method that uses distance as its criterion. When clustering network traffic data, we use the IP address and port number as an additional criterion, which increases accuracy. 2. To classify network traffic data, we combine binary classifiers with a multi-class classifier to improve accuracy. We train one binary classifier per protocol. When a test sample is recognized by more than one binary classifier, a multi-class classifier makes the final decision. In our experiments, our method achieves an overall accuracy of 73%. In contrast, the previous semi-supervised method and the one-class SVM method achieve 96% and 97% accuracy, respectively, on test data containing only known protocols, but only 25% and 38%, respectively, on test data that includes unknown protocols. |
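
The abstract does not spell out how the IP address and port number enter the clustering, so the following is only a minimal sketch of one plausible reading: flows that share the same (IP address, port) endpoint are assumed to belong to the same protocol and are therefore assigned to a cluster as a group. The function name `constrained_kmeans`, its parameters, and the sample data are hypothetical and are not taken from the paper.

```python
import numpy as np


def constrained_kmeans(features, endpoints, k, n_iter=50, seed=0):
    """K-Means in which flows sharing an endpoint are assigned as one group.

    features  : (n_flows, n_features) array of per-flow statistics
    endpoints : length-n_flows sequence of hashable keys, e.g. (ip, port)
    k         : number of clusters; must not exceed the number of endpoints
    Returns (labels, centroids).

    Assumption (not from the paper): the IP/port constraint is a must-link
    between all flows that have the same endpoint key.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)

    # Map every flow to the index of its endpoint group.
    keys = {}
    group_of = np.array([keys.setdefault(ep, len(keys)) for ep in endpoints])
    n_groups = len(keys)
    if k > n_groups:
        raise ValueError("k must not exceed the number of distinct endpoints")

    # Per-group size and mean feature vector.
    sizes = np.bincount(group_of, minlength=n_groups).astype(float)
    sums = np.zeros((n_groups, features.shape[1]))
    np.add.at(sums, group_of, features)
    means = sums / sizes[:, None]

    # Initialise centroids from k distinct group means.
    centroids = means[rng.choice(n_groups, size=k, replace=False)].copy()
    group_labels = np.full(n_groups, -1)

    for _ in range(n_iter):
        # Assigning a whole group to the centroid nearest its mean minimises
        # the summed squared distance of the group's flows to that centroid.
        dists = np.linalg.norm(means[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, group_labels):
            break
        group_labels = new_labels
        # Size-weighted mean of group means equals the mean of member flows.
        for c in range(k):
            members = group_labels == c
            if members.any():
                centroids[c] = np.average(
                    means[members], axis=0, weights=sizes[members]
                )

    return group_labels[group_of], centroids


# Illustrative data only: five flows, three endpoints, two feature columns.
flows = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.2, 0.1]])
eps = [("10.0.0.1", 80), ("10.0.0.1", 80),
       ("10.0.0.2", 443), ("10.0.0.2", 443), ("10.0.0.3", 80)]
labels, centroids = constrained_kmeans(flows, eps, k=2)
print(labels)
```

Grouping by endpoint before assignment means a single outlying flow cannot be split from the other flows of the same service, which is one way the extra criterion could raise accuracy over plain distance-based K-Means.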