Network Flow Classification Algorithm Based On Statistical Features

Posted on: 2014-02-17; Degree: Master; Type: Thesis
Country: China; Candidate: Y Zhang; Full Text: PDF
GTID: 2248330392461042; Subject: Computer technology
Abstract/Summary:
Interest in traffic classification has grown dramatically in the past few years in both industry and academia. Classic methods based on port numbers and DPI (deep packet inspection) have lost much of their overall classification accuracy because of the growth of P2P applications and their weak resistance to encryption. As a result, an increasing number of researchers have turned to traffic classification based on statistical features. This approach relies only on packet headers and externally observable statistical features, combined with machine learning algorithms, instead of inspecting packet content.
The work contained in this thesis is as follows:
1) We describe the theoretical basis for constructing a flow classification system, including the classification unit, classification evaluation criteria, classification granularity, classification features, and classification algorithm.
2) Building on that theoretical part, we label the packets using a DPI method and extract the source port, destination port, transport layer protocol, and the sizes of the first five packets to form the feature set for machine learning. We then apply two different machine learning algorithms, the C4.5 decision tree and SVM, to construct two classification models for comparison. Although the recognition precision of C4.5 (96.93%) is slightly lower than that of SVM (98.20%), C4.5 recognizes flows twenty times faster than SVM.
3) Finally, we analyze complicated network conditions (out-of-order packet arrival, packet loss, and flows with fewer than five packets) and find that out-of-order packet arrival dramatically reduces the recognition precision of the C4.5 classification model. We therefore propose a method that combines the Bag-of-Words model with a new feature set (source port, destination port, transport layer protocol, and packet size), which achieves about 10% higher recognition precision than before.
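
To illustrate why a Bag-of-Words representation can help when packets arrive out of order, the following is a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the bag is built, so the function names, the five-packet window in both functions, and the example packet sizes are all assumptions made for illustration. The ordered feature vector changes when two packets swap positions, while the bag (a multiset of packet sizes) stays the same.

```python
from collections import Counter


def ordered_features(src_port, dst_port, proto, sizes, n=5):
    """Position-sensitive features: ports, protocol, and the sizes of the
    first n packets in arrival order (zero-padded for short flows)."""
    first = list(sizes[:n])
    first += [0] * (n - len(first))
    return (src_port, dst_port, proto, *first)


def bag_features(src_port, dst_port, proto, sizes, n=5):
    """Order-insensitive features (hypothetical encoding): the same packet
    sizes treated as a bag of 'words', i.e. size -> occurrence count."""
    bag = Counter(sizes[:n])
    return (src_port, dst_port, proto, frozenset(bag.items()))


# Illustrative packet sizes only; not data from the thesis.
in_order = [74, 66, 583, 1514, 1514]
reordered = [74, 583, 66, 1514, 1514]  # second and third packets swapped

flow = (51234, 80, "TCP")
ordered_changed = ordered_features(*flow, in_order) != ordered_features(*flow, reordered)
bag_unchanged = bag_features(*flow, in_order) == bag_features(*flow, reordered)
```

In this sketch the bag is invariant only to reordering inside the chosen window; a packet that moves across the window boundary, or is lost, would still change the features.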
Keywords/Search Tags: Internet flow classification, statistical learning, C4.5, packet disorder arrival, Bag-of-Words