Instant Messaging Traffic Classification Technology Based On Machine Learning

Posted on: 2019-08-16; Degree: Doctor; Type: Dissertation
Institution: University; Candidate:; Full Text: PDF
GTID: 1368330590973177; Subject: Information security
Abstract/Summary:
The dissertation first reviews the core traffic classification and identification techniques and summarizes the challenges and research progress. It then methodically addresses four problems in Instant Messaging (IM) application traffic classification: accurate IM traffic classification, effective feature selection, robust feature selection, and selection of the effective packet number. Several algorithms and models are proposed to improve classification accuracy, stability and classifier performance. The main contents of this dissertation are as follows.

It is very important to understand network traffic classification, and IM application traffic classification in particular. First, the dissertation classifies IM application service flow traffic using Machine Learning (ML) classifiers. Four ML classifiers are used, namely Support Vector Machine (SVM), C4.5 decision tree, Bayes Net and Naïve Bayes, on two datasets from different network environments, HIT Trace 1 and Jinyuan dormitory. For this study, 50 features are extracted for training and testing. Experimental results show that all of the ML classifiers are effective, but C4.5 performs better than the other classifiers applied.

Due to inappropriate feature selection, ML classifiers are prone to misclassifying Internet flows, since this traffic makes up the majority of traffic flows. Selecting effective features for IM traffic classification is therefore a great challenge. To address the problem, a novel feature selection metric named Weighted Mutual Information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with the WMI metric and accuracy. We evaluate the proposed approach with five well-known ML classifiers on two datasets captured at different network locations. To identify the robust features, the Wilcoxon pairwise statistical test is applied to the results of the proposed approach. Experimental results show that the proposed algorithm achieves very promising results in terms of accuracy, recall and precision.

Similarly, due to imbalanced traffic, a number of traffic flows are misclassified by ML classifiers, since these flows make up the majority of flows on the Internet. To address this problem, a novel feature selection metric named Weighted Mutual Information (WMI) is proposed. We design a hybrid feature selection algorithm named WMI_AUC, which filters most of the features with the WMI metric and then uses a wrapper method to select effective features for specific classifiers with the Area Under the ROC Curve (AUC) metric. Additionally, to cope with the dynamic flows of traffic, we propose an algorithm named RFS (Robust Features Selection) that selects robust features from the results. We evaluate our approach using 11 well-known ML classifiers on the trace datasets. Experimental results show that our proposed algorithms give promising results.

However, an important issue remains unaddressed: whether there is an essential difference in effectiveness between the two kinds of feature selection techniques. In this chapter, we set out to evaluate the effectiveness of feature selection techniques and to select the optimum features for accurate traffic classification. We first propose a Feature Selection Approach (FSA) and design a feature selection algorithm of the same name. To evaluate the effectiveness of the feature selection technique, we propose another approach, the Feature Evaluation Approach (FEA), and design the FEA algorithm based on mutual information analysis. We evaluate the proposed approaches using nine well-known ML classifiers on two datasets from different network environments. Our experimental results show that the proposed approaches achieve more than 98% accuracy on average. All of the applied ML algorithms perform well, but with RandomForest and C4.5 the features selected by the FSA and FEA approaches carry more identification information than with the other ML classifiers.

Selecting the effective packet number is also a major problem. Finding the effective packet number and the effective ML classifiers for early-stage Internet traffic classification is a big challenge that still needs to be studied in depth. To address the issue, five Internet traffic datasets are used. First, the sizes of the 20 earliest packets are extracted, and the mutual information between the packet sizes and the flow type is then used. Thereafter, 10 well-known ML classifiers are applied. Two statistical tests, the Friedman test and the Wilcoxon pairwise test, are conducted to confirm and identify the effective packet number for early-stage WeChat traffic classification. The same statistical tests are also applied to the ML classifiers to identify the effective ones. The results show that 13-19 packets, together with the Random Forest and C4.5 classifiers, are very effective for early-stage classification of WeChat IM application traffic.

It is evident that our proposed approaches are able to accurately classify IM application traffic flows and can select effective features for accurate network traffic classification. Nevertheless, it is important to apply the proposed approaches to an imbalanced dataset and to evaluate more ML classifiers for network traffic classification, which is our future work.
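The filter stage of the feature selection approaches described above relies on mutual information between a feature and the traffic class. The sketch below is only an illustration of that idea using plain (unweighted) mutual information on discretized features; the weighting used by the WMI metric is not specified in this abstract and is not reproduced here. The function names, the toy flows and the labels are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature_index, score) pairs sorted by decreasing mutual information."""
    n_features = len(rows[0])
    scores = [
        (j, mutual_information([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def filter_top_k(rows, labels, k):
    """Filter step: keep the indices of the k highest-scoring features."""
    return [j for j, _ in rank_features(rows, labels)[:k]]


# Hypothetical toy flows: (packet-size bucket, transport protocol, port bucket)
flows = [
    ("small", "tcp", "a"),
    ("small", "udp", "b"),
    ("large", "tcp", "a"),
    ("large", "udp", "b"),
]
labels = ["im", "im", "other", "other"]

selected = filter_top_k(flows, labels, 1)
```

In a hybrid scheme such as the one the abstract outlines, the features kept by a filter like this would then be passed to a wrapper step that scores candidate subsets with a specific classifier and a metric such as accuracy or AUC.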
Keywords/Search Tags: Traffic classification, Instant Messaging, Feature selection, Machine learning, Class imbalance