Research on Network Traffic Identification and Its Application Based on Machine Learning

Posted on: 2021-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: L B Yang
GTID: 2428330626455878
Subject: Communication and Information System
Abstract/Summary:
With the continuous development of information technology, data center networks are growing rapidly in scale. Some data center services, such as data migration and file backup, generate only a small number of flows but transmit an extremely large volume of data. Such flows are commonly called elephant flows. To make better use of network resources and reduce congestion, elephant flows must be distinguished from mice flows so that each can be optimized separately. This thesis uses machine learning to identify elephant flows in data center networks, discusses which evaluation metrics suit the identification model in a route optimization scenario, and finally extends the binary (two-class) identification of flow size to multi-class identification. The main research content is as follows:

(1) To detect elephant flows as early as possible, we extract multiple effective features from the first few packets of a flow and then use the LightGBM algorithm to identify elephant flows rapidly. To address the sample imbalance between elephant flows and mice flows, we introduce the Focal Loss and, building on it, propose the Biphasic Focal Loss (BFL), which makes the model pay more attention to hard samples. We verify the effectiveness of the learning algorithms on three different real-world datasets. The experimental results show that the LightGBM model with BFL not only achieves a higher TPR (true positive rate) and TNR (true negative rate) but is also more robust to changes in the threshold that defines an elephant flow.

(2) Improving the performance of a data center network often requires optimizing its routing. However, some current route optimization systems based on elephant flow identification consider only the identification model's Recall on elephant flows and ignore other metrics such as Precision. Taking an SDN-based routing algorithm as an example, this thesis explores how the Recall and Precision of the identification model affect the route optimization result. It proposes using the F-measure to combine Recall and Precision into a single metric and adjusting the β coefficient to balance their importance, so as to maximize the final route optimization effect.

(3) Because conditions in a data center change dynamically, a single division threshold is difficult to determine. Moreover, dividing flows into only two size categories is too coarse and does not support more flexible and efficient routing algorithms. This thesis therefore uses multiple thresholds to classify traffic at a finer granularity and proposes two routing optimization algorithms based on multi-class classification: a multi-class random routing algorithm and a multi-class multi-weighted routing algorithm. Routing simulation experiments show that the multi-class routing algorithms can further reduce the transmission time of network traffic.
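The abstract does not list which features are extracted from a flow's first packets in (1). As a minimal sketch, assume each packet is described only by its arrival timestamp and size in bytes. Under that assumption, the hypothetical helper `early_features` summarizes the first `k` packets into a fixed-length vector that a classifier such as LightGBM could consume. The companion helper `label_flow` marks a flow as an elephant when its total byte count reaches a division threshold. The feature set, `k`, and the 10 MiB threshold are illustrative assumptions, not the thesis's choices.

```python
from statistics import mean, pstdev


def early_features(packets, k=5):
    """Summarize the first k packets of a flow as a fixed-length feature vector.

    packets: list of (timestamp_seconds, size_bytes) tuples in arrival order.
    Hypothetical feature set: packet-size statistics and inter-arrival-time
    statistics computed over the first k packets only.
    """
    head = packets[:k]
    if not head:
        raise ValueError("need at least one packet")
    sizes = [size for _, size in head]
    times = [ts for ts, _ in head]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return [
        len(head),                    # packets observed so far (<= k)
        sum(sizes),                   # bytes carried by the first packets
        mean(sizes),                  # average packet size
        max(sizes),
        min(sizes),
        pstdev(sizes),                # size spread (population std. dev.)
        mean(gaps) if gaps else 0.0,  # average inter-arrival time
        max(gaps) if gaps else 0.0,   # longest gap seen so far
    ]


def label_flow(total_bytes, threshold=10 * 1024 * 1024):
    """Ground-truth label: 1 (elephant) if the whole flow reaches the division
    threshold, else 0 (mouse). The 10 MiB default is an assumed example value."""
    return 1 if total_bytes >= threshold else 0
```

In training, `early_features` would be computed from each flow's first packets, while `label_flow` would use the flow's final byte count. The classifier therefore learns to predict the eventual size class from early evidence only.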
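The abstract does not define the Biphasic Focal Loss, so the sketch below shows only the standard binary Focal Loss it builds on: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The factor (1 - p_t)^γ shrinks the loss of well-classified (easy) samples so training concentrates on hard ones, and α_t reweights the minority class. The function name is hypothetical, and α = 0.25, γ = 2 are merely the commonly used defaults from the original Focal Loss work. Using it as a LightGBM custom objective would additionally require its gradient and Hessian, which are not derived here.

```python
import math


def focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean binary focal loss (standard form, not the thesis's BFL).

    y_true: iterable of 0/1 labels (1 = elephant flow, the minority class).
    p_pred: iterable of predicted probabilities that the label is 1.
    alpha:  weight for the positive class; (1 - alpha) weights the negative class.
    gamma:  focusing parameter; gamma = 0 reduces this to alpha-weighted cross-entropy.
    """
    pairs = list(zip(y_true, p_pred))
    if not pairs:
        raise ValueError("empty input")
    total = 0.0
    for y, p in pairs:
        p = min(max(p, eps), 1.0 - eps)         # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pairs)
```

For an elephant flow predicted with p = 0.9 (an easy sample), γ = 2 multiplies its cross-entropy term by (1 - 0.9)² = 0.01. A hard sample predicted with p = 0.1 keeps 81% of its term, so hard samples dominate the gradient signal.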
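The F-measure in (2) is the standard F_β score, F_β = (1 + β²) · P · R / (β² · P + R), where P is Precision and R is Recall. Setting β > 1 weights Recall more heavily, and β < 1 weights Precision more heavily. The sketch below computes it, together with the TPR and TNR used in (1), from confusion-matrix counts with the elephant flow as the positive class. The function names are hypothetical, and the abstract does not state which β the thesis finally selects.

```python
def rates(tp, fp, tn, fn):
    """Return (precision, recall, tnr) from confusion-matrix counts.

    Positive class = elephant flow, so recall is the TPR.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, tnr


def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 favors recall; beta < 1 favors precision; beta = 1 gives F1.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

Consider a hypothetical model with tp = 80, fp = 40, tn = 860, fn = 20. It has Precision 2/3 and Recall 0.8, so F_0.5 ≈ 0.69, F_1 ≈ 0.73 and F_2 ≈ 0.77. Choosing β thus encodes how costly a missed elephant flow is compared with a mouse flow wrongly routed as an elephant.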
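Point (3) replaces the single elephant/mice threshold with several thresholds. The sketch below shows one way to assign the resulting size classes, assuming the thresholds are byte counts sorted in strictly increasing order. It uses binary search from Python's `bisect` module: a flow falls into class i when its size is at least threshold i-1 and below threshold i. The helper name and the threshold values in the note below are illustrative assumptions. The two multi-class routing algorithms are not reproduced because the abstract does not describe how they work.

```python
from bisect import bisect_right


def size_class(flow_bytes, thresholds):
    """Map a flow size to a class index in 0..len(thresholds).

    thresholds must be strictly increasing; class 0 holds the smallest flows
    and class len(thresholds) the largest. A size equal to a threshold goes
    to the higher class, matching a ">= threshold" elephant rule.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    return bisect_right(thresholds, flow_bytes)
```

With hypothetical thresholds of 10 KiB, 1 MiB and 10 MiB, a 5 MiB flow lands in class 2 of 0 to 3. With a single threshold, `size_class` reduces to the binary elephant/mice labeling of (1).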
Keywords: traffic identification, machine learning, data center network, route optimization