Action Recognition Based On Convolution Recurrent Neural Network With Attention Mechanism

Posted on: 2018-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: W H Yu
GTID: 2348330536960958
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of video has grown explosively, and video has become one of today's mainstream media. Understanding video content has become a research hotspot in computer vision, and it underpins video retrieval and video caption generation. Action recognition is an important branch of video classification. Over its history, action recognition has moved from hand-crafted features to today's deep learning models; the methods have become more efficient, the models more automatic, and the results more accurate. Whatever the method, the goal is to learn a good video representation from a limited training set by modeling both the static spatial information and the dynamic temporal information in the video. Video is more complex than a still image because it has temporal dynamics. Two actions may be partially similar yet differ at particular points in time, so a model must capture those differences.

This thesis presents an action recognition method based on deep neural networks. First, features are extracted with a deep convolutional neural network. Next, a spatial transformer network processes the features. Finally, a recurrent neural network models the temporal information and classifies the video.

When recognizing an image, humans focus on certain parts of it and identify it from those parts, so this attention mechanism is added to the recognition of action videos. The spatial transformer network processes the extracted features, keeping the spatially useful feature vectors and discarding the uninformative ones, which reduces the influence of noise in the video and improves recognition accuracy.

A standard recurrent neural network flattens the feature map into a vector, which discards spatial information. The feature map contains spatial information that can be exploited to improve the model's accuracy. In this thesis, convolution operations are added to the original long short-term memory (LSTM) network so that spatial structure is preserved as the state propagates through the intermediate steps. The experimental results show that this method is effective and greatly improves action recognition accuracy.

Finally, a temporal coherence analysis algorithm is applied to the video frames and their features to address the slow speed of the original model. It greatly reduces the redundant information in the video, so the model can be extended easily to large datasets. Experimental results show that this method reduces the amount of video data without harming the final results.
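
The abstract does not say how the temporal coherence analysis decides which frames are redundant. The sketch below is one possible reading, not the thesis's algorithm: it assumes each frame is already represented by a feature vector and drops any frame whose cosine similarity to the last kept frame exceeds a threshold. The names `cosine_similarity` and `select_keyframes`, the toy feature vectors, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_keyframes(features, threshold=0.95):
    """Return indices of frames kept after dropping near-duplicates.

    A frame is kept only if it is sufficiently different (similarity
    below `threshold`) from the most recently kept frame. The threshold
    is an assumed value, not one taken from the thesis.
    """
    kept = []
    for i, feat in enumerate(features):
        if not kept or cosine_similarity(features[kept[-1]], feat) < threshold:
            kept.append(i)
    return kept


# Toy per-frame features: frames 1 and 3 nearly repeat frames 0 and 2.
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.02],
    [0.0, 0.0, 1.0],
]

print(select_keyframes(frames))  # [0, 2, 4]
```

Comparing against the last kept frame, not the immediately preceding one, keeps slow drift from accumulating unnoticed: a sequence that changes gradually still produces a new keyframe once it has moved far enough from the previous one.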
Keywords/Search Tags: Action Recognition, Spatial Transformer Network, Convolution Recurrent Neural Network, Analysis of Temporal Coherence