Research On Mobile Application Traffic Identification And Anomaly Detection Based On Machine Learning

Posted on: 2020-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Zhou
Full Text: PDF
GTID: 2428330596476766
Subject: Engineering
Abstract/Summary:
With the widespread use of the Internet, mobile application traffic is growing explosively, and identifying mobile applications and detecting abnormal traffic from network traffic has become a difficult task. Although many methods have been proposed in this field, several important problems remain: (1) identifying encrypted traffic and the applications that generate it while supporting online, real-time operation; (2) the random forest classifier is prone to biased classification on unbalanced data and tends to ignore minority-class samples; (3) there is no authoritative data set of abnormal mobile application traffic, so anomaly detection for mobile application traffic is neither comprehensive nor accurate. This thesis studies traffic identification and anomaly detection for mobile applications to address these problems. The main work is divided into the following two parts.

First, both encrypted and unencrypted traffic are identified online using a random forest based algorithm improved for unbalanced data. (1) This thesis proposes an optimized random forest algorithm to identify mobile application traffic at a scale of tens of thousands. The algorithm uses packet length as the basis for feature extraction and optimizes the data pre-processing method. By introducing the concepts of burst and network flow, the traffic data is discretized into traffic blocks of finer granularity. Combining this pre-processing method with the optimized random forest learning method enables the model to identify mobile apps online. (2) To address the bias problem of unbalanced data, this thesis proposes an improved algorithm for unbalanced data based on a sparsity weight method. Unlike previous studies, the entire data set is clustered rather than only the majority or minority samples, which avoids over-fitting. The sparsity weight method is then applied after clustering, which fully accounts for the data distribution and edge conditions. The algorithm thereby addresses problems that exist in previous research.

Second, considering the incompleteness of current abnormal traffic data, a semi-synthetic traffic generation method is designed to bring the data set closer to real and comprehensive conditions. In addition, this thesis combines correlation-based feature selection with the C4.5 decision tree algorithm. First, the optimal feature subset is selected, that is, the features most relevant to the abnormal traffic types in the data set. Second, the multi-class capability of the C4.5 decision tree algorithm makes it possible to detect abnormal traffic automatically and identify its type at the same time.

In conclusion, the model designed in this thesis is lightweight, highly extensible, and portable. For identification of mobile app traffic, a complete set of experiments and comparative experiments was run, with parameters optimized by the control variable method (varying one parameter at a time). The results show that accuracy can exceed 98%. For anomaly detection of mobile app traffic, detection accuracy for the three types of abnormal traffic is above 94%. Combining the two algorithms improves detection accuracy by 7% for the third type of abnormal traffic, which supports the reliability and effectiveness of the semi-synthetic data processing method and of the two combined algorithms.
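The burst and flow discretization mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not define how bursts are delimited, so the sketch assumes a common convention in which a new burst starts whenever the gap between consecutive packets exceeds a threshold, and a flow is the set of packets within a burst that share a remote endpoint. The function names, the `gap` parameter, the `(timestamp, endpoint, length)` packet tuple, and the choice of summary statistics are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_bursts(packets, gap=1.0):
    """Group packets into bursts (assumed rule: a new burst starts when
    the time between consecutive packets exceeds `gap` seconds)."""
    bursts, current, last_ts = [], [], None
    for pkt in sorted(packets, key=lambda p: p[0]):
        ts = pkt[0]
        if last_ts is not None and ts - last_ts > gap:
            bursts.append(current)
            current = []
        current.append(pkt)
        last_ts = ts
    if current:
        bursts.append(current)
    return bursts


def split_flows(burst):
    """Within one burst, collect packet lengths per remote endpoint."""
    flows = defaultdict(list)
    for _ts, endpoint, length in burst:
        flows[endpoint].append(length)
    return dict(flows)


def length_features(lengths):
    """Summarize a flow by packet-length statistics only (no payload)."""
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


def traffic_blocks(packets, gap=1.0):
    """Discretize a packet trace into (endpoint, features) traffic blocks."""
    blocks = []
    for burst in split_bursts(packets, gap):
        for endpoint, lengths in split_flows(burst).items():
            blocks.append((endpoint, length_features(lengths)))
    return blocks


# Hypothetical trace: (timestamp in seconds, remote endpoint, packet length)
packets = [(0.0, "a", 100), (0.2, "b", 60), (0.4, "a", 300), (5.0, "a", 80)]
blocks = traffic_blocks(packets, gap=1.0)
```

Because each block is built from packet lengths and timing alone, features of this kind can be computed for encrypted and unencrypted traffic alike, and each block can be handed to a classifier as soon as its burst closes.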
Keywords/Search Tags:online identification, traffic identification, imbalanced data, random forest, anomaly detection