
Two-Stream Video Classification Based On Deep Learning

Posted on: 2021-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Ren
Full Text: PDF
GTID: 2518306548494064
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of network communication and multimedia information technology, the ways in which people acquire knowledge, record their lives, and communicate have changed dramatically. In recent years, 5G technology has matured, the cameras on mobile devices have improved, and the number of videos on the Internet continues to grow rapidly. On the one hand, this massive volume of video enriches people's lives and brings convenience; on the other hand, it poses a huge challenge to the supervision and retrieval of online video. Efficient video processing methods are therefore urgently needed.

Video classification is the task of automatically determining the category of a video; it is an important part of video processing and a basic research topic in computer vision. After the success of deep learning in image processing, more and more researchers have applied it to other kinds of multimedia information processing. Compared with image processing, however, deep learning research on video is not yet as mature, and the differences between video data and image data mean that video processing still faces great challenges. This thesis therefore takes the two-stream video classification model as its baseline and addresses the two main difficulties of the video classification task, feature extraction and feature fusion, by proposing deep feature extraction improved with an attention mechanism and a deep feature fusion method with learnable weights, in order to improve classification accuracy. The main innovations of this thesis are as follows:

(1) A multi-scale attention mechanism is introduced to enhance feature extraction. The feature extraction network is divided into four stages, and a multi-scale attention module is added after the feature-map output of each stage. The module strengthens the informative features in each stage's output and suppresses irrelevant ones; by reinforcing the attention effect layer by layer, more representative features are obtained and classification improves.

(2) A pluggable spatiotemporal connection module is designed to link the two feature extraction streams. In the two-stream convolutional neural network for video action recognition proposed by Simonyan et al., the temporal-stream and spatial-stream features are extracted independently, ignoring the high correlation between temporal and spatial information in video data. Building on the compact bilinear pooling structure, this thesis establishes a connection between the spatiotemporal feature extraction streams so that the features of each stream enhance the feature learning of the other, better integrating the information of the two streams and ultimately improving feature quality.

(3) An algorithm is designed to learn the fusion weights of spatiotemporal features. When classifying different kinds of video, static and dynamic features influence the result to different degrees, so fusing temporal and spatial features with fixed weights limits classification performance. This thesis designs a spatiotemporal feature fusion method with learnable weights and obtains better classification results.
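The thesis gives no code, but the idea in (1) can be sketched abstractly: pool the same features at several window sizes ("scales") and turn the pooled summaries into a multiplicative gate that strengthens or suppresses each position. Everything below (the 1-D simplification, the function name, the choice of scales) is a hypothetical illustration, not the thesis's actual module, which operates on convolutional feature maps.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_scale_attention(feature_map, scales=(1, 2, 4)):
    """Toy multi-scale attention over a 1-D feature map (list of floats).

    For each scale, average-pool with that window size, then combine the
    pooled summaries into a per-position sigmoid gate in (0, 1) that
    re-weights the input features.
    """
    n = len(feature_map)
    gates = [0.0] * n
    for s in scales:
        for i in range(n):
            # causal window of up to `s` elements ending at position i
            window = feature_map[max(0, i - s + 1): i + 1]
            gates[i] += sum(window) / len(window)
    # average the pooled summaries across scales, gate, and apply
    return [f * sigmoid(g / len(scales)) for f, g in zip(feature_map, gates)]

out = multi_scale_attention([0.5, -1.0, 2.0, 0.0])
```

Because the gate lies in (0, 1), each output feature keeps its sign but is scaled down according to how much support it receives across scales.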
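The operation underlying compact bilinear pooling in (2) is the flattened outer product of two feature vectors, which captures all pairwise interactions between the streams. The `cross_stream_enhance` function below is a hypothetical simplification of the connection module (its names and the scalar-summary modulation are illustrative assumptions, not the thesis's design):

```python
def bilinear_pool(u, v):
    """Flattened outer product of two feature vectors -- the basic
    bilinear pooling operation that compact variants approximate."""
    return [ui * vj for ui in u for vj in v]

def cross_stream_enhance(spatial, temporal, alpha=0.1):
    """Toy cross-stream connection: modulate each stream's features with
    a scalar summary of the other stream, so that information flows in
    both directions before fusion."""
    t_sum = sum(temporal) / len(temporal)
    s_sum = sum(spatial) / len(spatial)
    enhanced_spatial = [s + alpha * s * t_sum for s in spatial]
    enhanced_temporal = [t + alpha * t * s_sum for t in temporal]
    return enhanced_spatial, enhanced_temporal

z = bilinear_pool([1.0, 2.0], [3.0, 4.0])  # -> [3.0, 4.0, 6.0, 8.0]
```

The outer product grows quadratically with feature dimension, which is exactly why compact approximations are used in practice.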
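A minimal sketch of the learnable-weight fusion in (3), assuming one scalar weight per stream normalised by softmax (the function names and the scalar form are illustrative assumptions; in training, the weights would be updated by backpropagation along with the rest of the network):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(spatial_scores, temporal_scores, w_s, w_t):
    """Fuse per-class scores from the two streams with learnable scalar
    weights (w_s, w_t), softmax-normalised so they sum to 1."""
    a_s, a_t = softmax([w_s, w_t])
    return [a_s * s + a_t * t for s, t in zip(spatial_scores, temporal_scores)]

# A video whose appearance cues dominate: a larger spatial weight
# lets the spatial stream's prediction win.
fused = fuse_scores([0.9, 0.1], [0.2, 0.8], w_s=1.0, w_t=0.0)
```

Fixed-weight fusion (e.g. simple averaging) is the special case w_s = w_t; letting the network learn the weights allows it to lean on whichever stream is more informative for a given class.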
Keywords/Search Tags: Video Classification, Two-Stream Model, Attention Mechanism, Bilinear Pooling, Learnable Weight