Research On Internet Traffic Classification And Mobile APP Traffic Identification

Posted on:2021-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Huang

Full Text:PDF

GTID:2428330620464021

Subject:Engineering

Abstract/Summary:

With the continuous enrichment of network application types and the explosive growth of network traffic, how to flexibly adjust the network to meet the needs of diverse users has become an urgent problem in the "Internet +" era. Classifying and identifying the data traffic on the entire network link is a prerequisite for achieving this control. Knowing the traffic distribution across the whole link helps upper-layer network management applications deploy strategies according to current network conditions. However, existing identification techniques face several difficult problems. Machine learning algorithms tend to be biased toward the majority-class samples in an unbalanced data set, which results in a high overall error rate for the model. It is also necessary to select features with high class discrimination and low redundancy for network traffic, construct training sample sets, and reduce the time and space overhead of model training. This thesis studies network traffic classification and mobile APP traffic identification to address these issues. The main work is divided into the following two parts.

Firstly, an improved data balancing algorithm based on random forest is proposed to classify network application traffic. (1) For the problem of category bias in unbalanced sample sets, this thesis proposes an improved data balancing algorithm based on sparsity weighting. When generating new samples, the improved algorithm fully considers the distribution characteristics of the minority samples and the fuzzy boundary conditions at the class edges, so as to avoid the negative impact that a loss of information richness would have on model training. At the same time, new minority samples are synthesized by linear interpolation between the minority samples and their neighbors, which avoids the overfitting that directly copying minority samples would cause during training. (2) When selecting the optimal feature subset, information gain and application category correlation are measured together to obtain an efficient comprehensive feature evaluation index, which reduces the performance overhead of the system.

Secondly, in the C4.5 decision tree based mobile app traffic identification method, this thesis optimizes the handling of hundreds of thousands of mobile app traffic data records collected with Wireshark. (1) Packet length and packet inter-arrival time are used as the objects of feature extraction. In contrast to using the TCP session, the burst is introduced as the basic unit of traffic collection in data preprocessing. This characterizes mobile app behavior at a fine granularity and supports online classification of network traffic. (2) When selecting the optimal feature subset, a Pearson feature dimensionality reduction method based on category-related mutual information is used to reduce the impact that entropy changes in the target variable have on classification performance, which improves the robustness of model classification and reduces model complexity.

Combining the above performance optimization methods, the network traffic classification and mobile APP traffic identification framework built in this thesis is lightweight, strong in identification capability, and scalable, and is suitable for real network application scenarios.

Keywords/Search Tags:

traffic classification, APP identification, unbalanced data, random forest, C4.5

Related items

1	Research On Optimization Of Random Forest Algorithm And Its Application In Text Parallel Classification
2	The Application Of Ensemble Classification On Unbalanced Data In Bank Marketing
3	Research And Application Of High Dimensional Imbalanced Data Classification Based On Random Forest
4	The Research On Random Forest And Its Parallelization Oriented To Unbalanced High-dimensional Data
5	Research And Application Of Classification Technology For Unbalanced Data
6	Research On Network Traffic Classification Technology Based On Midway Identification
7	Optimization Research And Application Of Unbalanced Data Classification Algorithm
8	Research On Classification Method Of Unbalanced Data Based On Oversampling
9	Research And Implementation Of Classification Algorithm Based On Message Content And User Behavior Relationship
10	Improvement And Application Of Random Forest Algorithm In Credit Card Fraud Detection