Research On Imbalanced Oversampling Method For Internet Video Traffic Identification

Posted on: 2020-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Xie
GTID: 2428330578467290
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of network technologies and applications, video traffic has become the fastest-growing type of Internet traffic and now accounts for the majority of network traffic. This rapid growth poses a serious challenge to Internet management. In addition, a large number of unhealthy and illegal videos are transmitted over the network, seriously jeopardizing social stability and people's physical and mental health. Effective management of Internet video traffic from the network's perspective is therefore an urgent problem. Internet video traffic is a typical example of imbalanced data: pornographic and violent videos make up only a small share compared with normal video traffic, so identifying Internet video traffic is an imbalanced classification problem. Many methods address imbalanced problems, and data-level methods have attracted wide attention because they are independent of the classifier. However, these methods have a common defect: they consider only local neighbor information and generate new data by linear interpolation, which can produce erroneous instances. This thesis studies the key issues of imbalanced Internet video traffic identification and builds a complete solution, from basic data collection to the video traffic identification method.

First, to prepare for identifying Internet video traffic types, this thesis proposes a new and effective feature extraction method, byte code distribution (BCD). BCD first counts the occurrences of each byte code value (0 to 255) in a video flow and then computes the frequency of each byte code; these 256 frequencies are the features extracted from the flow. Compared with traditional packet-level features, BCD features carry more information about the video type and therefore allow more accurate identification.

To address the imbalance of video traffic, this thesis proposes a new data-level approach, generative learning (GL). In GL, a Gaussian mixture model (GMM) is used to fit the distribution of the original data, and new data are generated from the fitted distribution. The generated data, covering both the minority and the majority classes, are used to train the learning model. Experimental results show that GL is competitive with the other imbalanced oversampling methods compared, and Wilcoxon signed-rank test results further confirm its significant advantages. The method identifies imbalanced Internet video traffic with a high AUC.

Because GL still has shortcomings, this thesis proposes a second oversampling method, Gaussian distribution guided oversampling (GDGO), to further improve the identification of imbalanced Internet video traffic. In GDGO, each minority class instance is first weighted by a counting factor and a distance factor; anchor minority class instances are then selected through a probabilistic selection mechanism; finally, new instances are generated from a Gaussian distribution around the selected anchors. Experimental results show that GDGO outperforms the other imbalanced oversampling methods compared, and hypothesis test results again verify its effectiveness for imbalanced problems. GDGO further improves the identification of imbalanced Internet video traffic.
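
The BCD extraction described in the abstract can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the thesis's code: it assumes a flow is supplied as a list of raw packet payloads (`bytes`) and that every payload byte is counted; the function name `byte_code_distribution` is hypothetical.

```python
import numpy as np

def byte_code_distribution(payloads):
    """Return a 256-dim byte code distribution (BCD) vector for one flow.

    payloads: iterable of bytes objects, e.g. the packet payloads of a flow.
    Assumption: all payload bytes of the flow are counted.
    """
    data = b"".join(payloads)
    if not data:
        # Empty flow: nothing to count, return an all-zero vector.
        return np.zeros(256)
    # Step 1: count how many times each byte value 0..255 occurs.
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    # Step 2: convert counts to frequencies; the 256 ratios are the features.
    return counts / counts.sum()

# Example: two packets, four bytes in total.
features = byte_code_distribution([b"\x00\x00\xff", b"\x01"])
print(features[0], features[1], features[255])  # 0.5 0.25 0.25
```

Because each vector sums to 1, flows of different lengths map to directly comparable feature vectors.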
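
A minimal sketch of the generative learning (GL) idea, under stated assumptions: one GMM is fitted per class with a hand-written EM loop using diagonal covariances and `k=3` components, and a balanced, fully synthetic training set is drawn from the fitted models. The covariance type, component count, per-class fitting, and the names `fit_diag_gmm`, `sample_gmm`, and `generative_learning` are hypothetical choices; the abstract does not specify them.

```python
import numpy as np

def fit_diag_gmm(X, k=3, n_iter=100, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM to X (n, d) with plain EM (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = min(k, n)
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    var = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log N(x | mu_j, diag(var_j)) + log w_j for every point/component.
        diff = X[:, None, :] - means[None, :, :]                     # (n, k, d)
        log_p = -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                        + np.sum(np.log(2 * np.pi * var), axis=1)[None])
        log_p += np.log(weights)[None]
        log_p -= log_p.max(axis=1, keepdims=True)                     # stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        var = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + reg
    return weights, means, var

def sample_gmm(weights, means, var, n, rng):
    """Draw n points: pick a component by weight, then sample its Gaussian."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], np.sqrt(var[comp]))

def generative_learning(X, y, n_per_class, k=3, seed=0):
    """Fit one GMM per class and return a balanced, fully synthetic data set."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        w, mu, var = fit_diag_gmm(X[y == label], k=k, seed=seed)
        X_parts.append(sample_gmm(w, mu, var, n_per_class, rng))
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Example: 200 majority vs 20 minority instances -> 200 synthetic per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([0] * 200 + [1] * 20)
X_syn, y_syn = generative_learning(X, y, n_per_class=200)
print(X_syn.shape, np.bincount(y_syn))  # (400, 4) [200 200]
```

A library implementation such as scikit-learn's `GaussianMixture` (with its `fit` and `sample` methods) could replace the hand-written EM loop; it is written out here only so the example depends on NumPy alone.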
Keywords/Search Tags: video traffic identification, byte code distribution, imbalanced learning, oversampling methods