
Research Of Video Action Recognition Based On Two-stream Convolutional Neural Network

Posted on: 2018-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y Mi  Full Text: PDF
GTID: 2518305897477034  Subject: Information and Communication Engineering
Abstract/Summary:
As an important part of video processing, video action recognition is of fundamental importance in computer vision. Recognizing and analyzing the actions in a video yields high-level video information, which has a wide range of applications in video surveillance, user behavior analysis and artificial intelligence. With the rapid development of neural network theory and computational methods, and of computing hardware such as the graphics processing unit (GPU), the convolutional neural network (CNN) has made important progress in computer vision, including video action recognition. This thesis focuses on the two-stream convolutional neural network and makes major improvements to the network architecture and the fusion method of the traditional two-stream network. A multi-level precision-progressive recognition network and a spatial-temporal asynchronous network are proposed.

First, in convolutional neural networks for video recognition, features have different scales: features at different depths correspond to different receptive fields in the original image, and their semantics also differ greatly. Second, different actions are highly similar to one another. Once an action is decomposed into sub-actions or sub-scenes, many of these sub-scenes and sub-actions turn out to be shared between actions. Based on these two observations, this thesis presents a multi-level precision-progressive recognition network. Recognition networks with different recognition precision are trained separately on the low-level, mid-level and high-level features of the convolutional neural network, so that recognition precision increases progressively with the depth of the convolutional layer. To improve the accuracy of single-stream video recognition, a long short-term memory (LSTM) network is used to fuse the outputs of the networks at the different precision levels.

The temporal and spatial features of video are strongly related, both synchronously and asynchronously. Traditional two-stream fusion usually adopts simple average fusion, and the fused feature is the recognition result for the complete video. It does not take the internal information of the video into account, and the temporal and spatial asynchronous relationship between video and scene has received little study. To address these shortcomings of existing fusion networks, this thesis proposes a video spatial-temporal asynchronous network. The network matches the motion information of the video with scene information covering a certain range before and after the motion. Finally, we obtain the result of spatial-temporal asynchronous fusion of the video...
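The contrast between simple average fusion and fusion over a temporal window can be illustrated with a minimal sketch. It is not the network proposed in the thesis. The function names, the `radius` parameter, and the use of per-frame class scores as inputs are all assumptions made for illustration; the abstract does not specify how the asynchronous network is built.

```python
from typing import List

Scores = List[List[float]]  # scores[t][c]: score of class c at time step t


def _mean_over_time(scores: Scores) -> List[float]:
    """Average per-step class scores over all time steps."""
    n = len(scores)
    return [sum(col) / n for col in zip(*scores)]


def average_fusion(spatial: Scores, temporal: Scores) -> List[float]:
    """Baseline: average each stream over the whole video, then average the streams."""
    s = _mean_over_time(spatial)
    t = _mean_over_time(temporal)
    return [(a + b) / 2 for a, b in zip(s, t)]


def windowed_fusion(spatial: Scores, temporal: Scores, radius: int = 1) -> List[float]:
    """Hypothetical asynchronous variant (an assumption, not the thesis method):
    pair the motion scores at step t with the mean scene scores over the
    steps [t - radius, t + radius], then average the fused steps over time."""
    if len(spatial) != len(temporal):
        raise ValueError("both streams must have the same number of time steps")
    n = len(temporal)
    fused_steps = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        context = _mean_over_time(spatial[lo:hi])  # scene info around step t
        fused_steps.append([(a + b) / 2 for a, b in zip(context, temporal[t])])
    return _mean_over_time(fused_steps)


if __name__ == "__main__":
    spatial = [[1.0, 0.0], [0.0, 1.0]]   # toy scene-stream scores, 2 steps x 2 classes
    temporal = [[0.0, 1.0], [0.0, 1.0]]  # toy motion-stream scores
    print(average_fusion(spatial, temporal))
    print(windowed_fusion(spatial, temporal, radius=1))
```

With `radius=0` the windowed variant pairs each motion step only with the scene scores of the same step, which, because both operations are plain averages, gives the same result as the baseline. A larger `radius` lets scene information from before and after the motion contribute to each step.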
Keywords/Search Tags: action recognition, convolutional neural network, two-stream, multi-level precision-progressive network, spatial-temporal asynchronous fusion