
Human Action Recognition Based On Two-stream Convolutional Network

Posted on: 2022-12-09    Degree: Master    Type: Thesis
Country: China    Candidate: W Q Zhang
GTID: 2568306488479544    Subject: Engineering
Abstract/Summary:
With the emergence of massive amounts of video data, human action recognition has broad application value in many emerging fields. With the development of deep learning in recent years, human action recognition based on convolutional neural networks has achieved good results. However, because computer hardware is costly and network models are complex to design and train, action recognition still needs time before it can be fully applied. Therefore, how to use convolutional neural networks effectively to build deep feature representations of actions is a key question for action recognition. Building on the current two-stream convolutional network, this thesis explores deeper representations of the temporal and spatial characteristics of video and proposes improvements that address current problems. The main work is as follows:

1. To better model the long-term temporal information of human motion, a human action recognition algorithm based on sequential dynamic images and a two-stream convolutional network is proposed. First, sequential dynamic images are constructed with the sequential pooling algorithm; they fully characterize the spatial appearance and the long-term motion of human actions in the video. Then, according to the specific properties of sequential dynamic images, a two-stream convolutional network based on Inception V3 is designed. It takes sequential dynamic images and stacked optical-flow frame sequences as input and extracts the spatiotemporal information of the video with the help of data augmentation, pre-trained models, sparse sampling and similar techniques. The classification scores of the two branches are fused by average pooling. Experimental results show that the method improves the recognition rate and, compared with the traditional two-stream convolutional network, makes fuller use of the temporal and spatial information of actions.

2. To explore a more efficient way of extracting temporal features and to improve the ability of the action recognition model to distinguish similar actions, a human action recognition method combining a temporal shift module with a channel attention mechanism is proposed. The idea of temporal shift is introduced into the two-stream convolutional network: shifting features along the time dimension mixes the information of adjacent frames, so that temporal modeling is achieved while each frame is still processed individually. Because introducing the temporal shift module may alter the data in some channels, a channel attention mechanism is introduced to recalibrate the information across channels and enhance the expressive power of the spatial features of the video. Experimental results show that the temporal shift module improves the model's temporal modeling ability and that the channel attention mechanism improves its ability to distinguish similar actions.
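The sequential dynamic images of contribution 1 can be illustrated with a small NumPy sketch. It assumes that the "sequential pooling" step corresponds to the approximate rank pooling used for dynamic images (Bilen et al.), whose frame weights are alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_t the t-th harmonic number. The sliding-window helper `sequential_dynamic_images` and its `window`/`stride` parameters are hypothetical and only show one way a sequence of dynamic images might be cut from a video; this is not the thesis's implementation.

```python
import numpy as np


def rank_pooling_coefficients(T: int) -> np.ndarray:
    """Approximate rank pooling weights alpha_t for t = 1..T.

    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), where H_t is the t-th
    harmonic number and H_0 = 0. The weights sum to zero.
    """
    H = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, T + 1))))
    t = np.arange(1, T + 1)
    return 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])


def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into a single (H, W, C) image."""
    alpha = rank_pooling_coefficients(frames.shape[0])
    # Weighted sum over time: early frames get negative weights and later
    # frames positive ones, so the result encodes how appearance evolves.
    return np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))


def sequential_dynamic_images(frames: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Hypothetical helper: one dynamic image per sliding window of frames."""
    starts = range(0, frames.shape[0] - window + 1, stride)
    return np.stack([dynamic_image(frames[s:s + window]) for s in starts])


def to_uint8(image: np.ndarray) -> np.ndarray:
    """Min-max rescale to [0, 255] so the image can be fed to an image CNN."""
    lo, hi = image.min(), image.max()
    if hi - lo < 1e-12:  # constant image, e.g. from a static clip
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round(255.0 * (image - lo) / (hi - lo)).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 112, 112, 3))      # 16 RGB frames in [0, 1)
    seq = sequential_dynamic_images(clip, window=8, stride=4)
    print(seq.shape)                          # (3, 112, 112, 3)
    print(to_uint8(seq[0]).dtype)             # uint8
```

Because the weights sum to zero, a clip with no change over time produces an all-zero dynamic image; only motion and appearance change survive the pooling.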
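Sparse sampling and average-score fusion from contribution 1 can be sketched as follows. The sampling follows the segment-based scheme popularized by Temporal Segment Networks (one frame per equal-length segment), which is an assumption about what "sparse sampling" means here. Averaging softmax probabilities, first over segments and then equally over the two streams, is likewise an assumed concrete form of "fused by average pooling"; the function names are hypothetical.

```python
import numpy as np
from typing import Optional


def sparse_sample_indices(num_frames: int, num_segments: int,
                          rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Pick one frame index from each of `num_segments` equal-length segments.

    With an rng (training) the index is drawn uniformly inside its segment;
    without one (testing) the segment centre is used.
    """
    edges = np.linspace(0.0, num_frames, num_segments + 1)
    if rng is None:
        positions = (edges[:-1] + edges[1:]) / 2.0
    else:
        positions = rng.uniform(edges[:-1], edges[1:])
    return np.minimum(positions.astype(int), num_frames - 1)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def two_stream_average_fusion(spatial_logits: np.ndarray,
                              temporal_logits: np.ndarray) -> np.ndarray:
    """Late fusion of two streams, each given as (num_segments, num_classes) scores."""
    spatial = softmax(spatial_logits).mean(axis=0)    # consensus over sampled segments
    temporal = softmax(temporal_logits).mean(axis=0)
    return (spatial + temporal) / 2.0                 # equal-weight average of the branches


if __name__ == "__main__":
    print(sparse_sample_indices(30, 3).tolist())      # [5, 15, 25]
    scores = two_stream_average_fusion(np.array([[0.0, 5.0, 0.0], [0.0, 4.0, 0.0]]),
                                       np.array([[0.0, 3.0, 0.0], [1.0, 2.0, 0.0]]))
    print(int(scores.argmax()))                       # 1
```

Sampling a fixed number of segments keeps the cost per video constant regardless of its length, while still covering the whole clip.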
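The temporal shift in contribution 2 is a parameter-free operation, sketched below in NumPy on features laid out as (N, T, C, H, W). The split of 1/8 of the channels shifted backward and 1/8 forward (`fold_div=8`) is the default of the original Temporal Shift Module paper (Lin et al.) and is only an assumption here; the abstract does not say where the module sits in the thesis's network, so only the shift itself is shown.

```python
import numpy as np


def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """Shift part of the channels along time for features of shape (N, T, C, H, W).

    The first C // fold_div channels take their values from the next frame,
    the next C // fold_div channels from the previous frame, and the rest are
    left in place. Positions vacated at the clip boundaries are zero-filled.
    """
    fold = x.shape[2] // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # frame t sees frame t + 1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # frame t sees frame t - 1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted channels
    return out


if __name__ == "__main__":
    n, t, c, h, w = 2, 4, 16, 7, 7
    feats = np.random.default_rng(0).random((n * t, c, h, w))  # layout a 2D CNN uses
    shifted = temporal_shift(feats.reshape(n, t, c, h, w)).reshape(n * t, c, h, w)
    print(shifted.shape)                                       # (8, 16, 7, 7)
```

A 2D convolution applied after the shift mixes channels that now come from frames t - 1, t and t + 1, which is how temporal modeling is obtained without 3D convolutions or extra multiply-adds.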
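The abstract does not name the channel attention mechanism used in contribution 2. The sketch below assumes a Squeeze-and-Excitation style block (Hu et al.): global average pooling per channel, a bottleneck of two fully connected layers, and sigmoid weights that rescale each channel. The weight arguments and the reduction ratio `r` are illustrative placeholders, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, b1, w2, b2):
    """SE-style channel attention for features of shape (N, C, H, W).

    w1: (C, C // r), b1: (C // r,), w2: (C // r, C), b2: (C,), r = reduction ratio.
    Returns the recalibrated features and the per-channel weights.
    """
    s = x.mean(axis=(2, 3))                 # squeeze: global average pool -> (N, C)
    z = np.maximum(s @ w1 + b1, 0.0)        # bottleneck FC + ReLU -> (N, C // r)
    a = sigmoid(z @ w2 + b2)                # FC + sigmoid -> weights in (0, 1), shape (N, C)
    return x * a[:, :, None, None], a       # rescale each channel of each sample


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, r = 16, 4
    x = rng.standard_normal((2, c, 7, 7))
    w1, b1 = rng.standard_normal((c, c // r)) * 0.1, np.zeros(c // r)
    w2, b2 = rng.standard_normal((c // r, c)) * 0.1, np.zeros(c)
    y, a = channel_attention(x, w1, b1, w2, b2)
    print(y.shape, a.shape)                 # (2, 16, 7, 7) (2, 16)
```

Since the weights depend on the whole feature map of each sample, channels whose statistics were disturbed by the temporal shift can be damped or amplified per input.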
Keywords/Search Tags: human action recognition, two-stream convolutional network, sequential dynamic images, temporal shift module, attention mechanism