| In recent years, with the rapid rise of Internet video applications, network video traffic has grown explosively. This growth poses a major challenge to network management and has made video traffic identification an important new research topic. To identify video traffic accurately, a large number of features must be extracted from the raw video stream data along temporal, spatial, packet-level, flow-level, and other dimensions. However, these raw features include many irrelevant and redundant ones; using them directly for recognition not only fails to achieve the desired recognition performance but also incurs considerable computational overhead. Selecting a subset of features that is relevant to the current problem and free of redundancy is therefore a key issue in video traffic recognition. Feature selection is an important dimensionality reduction technique that aims to find the optimal feature subset and thereby reduce the dimensionality of the data. It not only reduces computational cost but also retains the intrinsic information of the data with a high degree of interpretability. Feature selection methods generally fall into three categories: filter, wrapper, and embedded methods. Filter methods select the most relevant feature subset by evaluating the correlation between each feature and the target variable. Wrapper methods treat feature selection as part of model selection, using a learning algorithm to evaluate each feature subset and choosing the subset with the highest performance. Embedded methods build feature selection into model training, selecting features and optimizing the model simultaneously during learning. Distribution distance can be used to evaluate the degree of similarity between samples of different categories, and its assessment of overall sample similarity is more accurate than that of the traditional 
correlation coefficient and mutual information techniques. It is therefore widely used in problems such as generative adversarial networks (GANs) and optimal transport. At present, few feature selection studies focus on measuring feature distributions that have little or no overlap. This thesis therefore studies this problem using a feature selection method based on distribution distance, and its main work is as follows. (1) A feature selection algorithm based on adaptive distribution distance (ADDFS) is proposed. The Wasserstein distance is used to measure the distance between the adaptive feature distributions. The algorithm favors features that are strongly related to the class labels, and it also selects features that provide additional joint information. ADDFS is compared with five other feature selection algorithms on eight public data sets, and the results show that it has a clear advantage in improving classification accuracy. (2) The ADDFS algorithm is applied to real-world web video traffic recognition. This thesis collects video traffic data from different platforms and uses these data to conduct identification experiments. The experimental results demonstrate that the video traffic feature set selected by ADDFS achieves high recognition performance. The feature selection method proposed in this thesis aims to identify the most relevant features in the original feature set, thereby improving the predictive performance and generalization ability of the model. It also helps to mitigate issues such as the curse of dimensionality, overfitting, and increased computational cost. In addition, the thesis explores the application of feature selection techniques to network video traffic recognition, which has both theoretical significance and practical value in enabling network administrators to monitor and manage network video traffic effectively. |
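
To make the idea of distribution-distance-based filtering concrete, the following is a minimal sketch and not the ADDFS algorithm itself, whose adaptive weighting and joint-information terms are not specified in this abstract. It assumes a plain filter criterion: each feature is scored by the mean pairwise Wasserstein distance between its class-conditional empirical distributions, using SciPy's `wasserstein_distance`. The function names `wasserstein_feature_scores` and `select_top_k`, the min-max scaling step, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def wasserstein_feature_scores(X, y):
    """Score each feature by the mean pairwise 1-D Wasserstein distance
    between its class-conditional empirical distributions.

    Hypothetical helper for illustration; assumes at least two classes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # Min-max scale each column so distances are comparable across features.
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    Xs = (X - col_min) / span

    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        dists = [
            wasserstein_distance(Xs[y == a, j], Xs[y == b, j])
            for i, a in enumerate(classes)
            for b in classes[i + 1:]
        ]
        scores[j] = np.mean(dists)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = wasserstein_feature_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Synthetic example (assumed data): only feature 2 depends on the class.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 4))
X[:, 2] += 5.0 * y  # class 1 is shifted so the two distributions barely overlap

scores = wasserstein_feature_scores(X, y)
top = select_top_k(X, y, k=1)
print("scores:", scores)
print("selected feature:", top)
```

Because the Wasserstein distance keeps growing as two distributions move further apart, it continues to rank features meaningfully even when the class-conditional distributions do not overlap, which is the setting the thesis targets. This sketch scores features independently and therefore does not account for redundancy or joint information between features.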