
Design Of Mobile Application Traffic Identification System Based On Gain Factor Weighted Feature Extraction Algorithm

Posted on: 2020-10-16   Degree: Master   Type: Thesis
Country: China   Candidate: X Y Zhang
GTID: 2428330572467211   Subject: Communication and Information System
Abstract/Summary:
In recent years, with the rapid growth in the number of mobile users, mobile traffic has accounted for an increasing share of network traffic, and managing it has become more important. However, the characteristics of mobile network traffic differ from those of PC traffic, so traditional PC-side traffic identification methods are not fully applicable to mobile networks, which causes considerable difficulty. Traffic identification on the mobile side therefore has great practical significance for network security management. This thesis studies and analyzes the development status of mobile traffic and outlines the principles and methods of mobile traffic identification used at home and abroad. The investigation finds that more than 90% of the traffic in mobile networks is carried at the application layer by the HTTP and HTTPS protocols, so the thesis focuses on identifying mobile application-layer HTTP and HTTPS traffic.

First, a mobile traffic collection and labeling system based on Wireshark is designed. Second, the features of HTTP and HTTPS traffic are extracted and optimized separately: for HTTP traffic, the HOST and URL fields are extracted as preliminary features; for encrypted HTTPS traffic, the requested host name carried in the SNI field is used as the feature. A feature extraction optimization algorithm based on gain factor weighting is proposed to optimize the initially extracted features. Finally, the optimized features are fed into an XGBoost model for training.

In the algorithm design, the gain-weighted feature optimization algorithm takes into account both how frequently a feature appears in the samples and the actual meaning of the traffic message data, and performs gain-weighted extraction on the features, which further improves the efficiency of the features. To a certain extent, it also reduces the feature dimension and the occurrence of null values in the training samples. In the system design, deep packet inspection is combined with machine learning recognition based on statistical features, which effectively compensates for the accuracy loss that the large volume of background network traffic causes in the recognition system.

The system is tested on real mobile traffic data, and a test set of correctly labeled application traffic is used to verify the correctness of the recognition model. The training efficiency, accuracy and precision of the same model are compared with those obtained using the traditional feature extraction algorithm. The comparison results verify that the feature optimization algorithm based on gain factor weighting can improve the training efficiency of the machine learning model as well as the accuracy and recall of model recognition.
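The abstract does not give the gain factor weighting formula, so the following Python sketch is only an illustration of the general idea, not the thesis's algorithm. It parses the HOST and URL fields from a plaintext HTTP request, splits them into tokens, and scores each token by its information gain with respect to the application label, multiplied by the fraction of samples that contain it. The tokenization rule, the frequency weighting, and all function names (`parse_http_request`, `tokenize`, `gain_weighted_scores`) are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def parse_http_request(raw: bytes) -> dict:
    """Return the Host header and request URL of a plaintext HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    _method, url, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    return {"host": host, "url": url}


def tokenize(fields: dict) -> set:
    """Split host and URL path into lowercase tokens (an illustrative choice)."""
    host_tokens = fields["host"].lower().split(".")
    path = fields["url"].split("?", 1)[0]
    path_tokens = [p for p in path.lower().split("/") if p]
    return set(host_tokens) | set(path_tokens)


def entropy(labels) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(token, samples) -> float:
    """Information gain of 'token is present' over the application labels.

    samples is a list of (token_set, app_label) pairs.
    """
    labels = [lab for _, lab in samples]
    with_tok = [lab for toks, lab in samples if token in toks]
    without_tok = [lab for toks, lab in samples if token not in toks]
    conditional = 0.0
    for part in (with_tok, without_tok):
        if part:
            conditional += len(part) / len(samples) * entropy(part)
    return entropy(labels) - conditional


def gain_weighted_scores(samples) -> dict:
    """Assumed weighting: information gain times the token's sample frequency."""
    vocab = set().union(*(toks for toks, _ in samples))
    n = len(samples)
    scores = {}
    for tok in vocab:
        freq = sum(1 for toks, _ in samples if tok in toks) / n
        scores[tok] = information_gain(tok, samples) * freq
    return scores
```

Under this assumed weighting, a token that appears in every sample (such as a shared top-level domain) scores zero because it carries no information gain, and a token that appears only once is down-weighted by its low frequency. Keeping only the top-scoring tokens would then shrink the feature vector passed to a classifier such as XGBoost, which is consistent with the dimension reduction the abstract describes.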
Keywords/Search Tags:Mobile traffic identification, Feature extraction algorithm, Machine learning method, Recognition evaluation index