Research On Action Recognition Technology Based On Video

Posted on: 2021-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhao
GTID: 2428330605456101
Subject: Instrument Science and Technology
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence, deep learning has come to play an important role in video action recognition, and using convolutional neural networks to extract the spatial features of images has become the mainstream approach. However, complex backgrounds, lighting conditions, and other action-irrelevant visual information in video frames introduce considerable redundancy and noise into the spatial features of actions, which reduces recognition accuracy. In addition, videos of different action classes may share similar temporal context, which leads the network model to make incorrect predictions.

To address these two problems, this thesis designs a recurrent region attention mechanism and a video frame attention mechanism for video action recognition. The former targets the redundancy and noise in the spatial features of actions, and the latter targets the interference caused by similar temporal context between actions. Based on the spatial and temporal characteristics of video, the thesis then designs a deep spatio-temporal network model that can be trained end to end. The model consists of a convolutional neural network, the recurrent region attention mechanism, the video frame attention mechanism, and a long short-term memory (LSTM) network. The convolutional neural network serves as a feature extractor for the spatial features of each video frame. The recurrent region attention cell within the recurrent region attention mechanism captures the action-related regional visual information in the spatial features; because the cell iterates along the temporal order of the video, the mechanism can capture the action-relevant regional information in the spatial features of every frame of the action video sequence. The video frame attention mechanism highlights the more important frames in the whole video sequence to reduce the interference caused by similar context between video sequences of different action classes. The LSTM network learns the temporal dependencies between video frames. The cross-entropy loss function is used to update the model parameters so that the model can better distinguish between action categories.

On this basis, the thesis makes full use of both the appearance information and the motion information of actions by constructing an RGB modality network model and an optical flow modality network model. Finally, the outputs of the two modality models are combined by probability fusion to improve recognition accuracy. Experimental results on two public video action recognition datasets show that the recurrent region attention mechanism and the video frame attention mechanism designed in this thesis reduce both the redundancy and noise in the spatial features and the interference caused by similar temporal context between actions. The results verify the effectiveness of the two attention mechanisms and show that the recognition accuracy of the network model is improved.
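As a rough illustration of two ideas from the abstract, the sketch below shows softmax-weighted pooling over frames (one simple form of frame attention) and probability fusion of an RGB stream and an optical flow stream. It is not the thesis's implementation. The function names, the use of precomputed per-frame scores, and the equal fusion weight `rgb_weight=0.5` are all assumptions made for the example; the abstract does not say how frame scores are produced or how the two streams are weighted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors into one vector.

    frame_scores are hypothetical importance scores, one per frame; in a
    real model they would come from a learned scoring layer (assumption).
    Returns the attention weights and the weighted sum of the features.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, frame_features):
        for i in range(dim):
            pooled[i] += w * feat[i]
    return weights, pooled


def fuse_probabilities(rgb_logits, flow_logits, rgb_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    rgb_weight is an assumed hyperparameter, not a value from the thesis.
    """
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    return [rgb_weight * a + (1.0 - rgb_weight) * b
            for a, b in zip(p_rgb, p_flow)]


def cross_entropy(probs, label):
    """Cross-entropy loss for a single example with an integer class label."""
    return -math.log(probs[label])


if __name__ == "__main__":
    # Two toy frames with 2-dimensional features and equal scores.
    weights, pooled = frame_attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
    print(weights, pooled)

    # Toy logits for three action classes from each stream.
    fused = fuse_probabilities([2.0, 1.0, 0.1], [0.5, 2.5, 0.3])
    predicted = max(range(len(fused)), key=lambda k: fused[k])
    print(predicted, cross_entropy(fused, predicted))
```

Averaging probabilities rather than logits keeps the fused output a valid distribution for any weight between 0 and 1, which is why the fused vector can be passed directly to the cross-entropy function in the example.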
Keywords/Search Tags:Video action recognition, Recurrent region attention mechanism, Video frame attention mechanism, CNN, LSTM