
Research On CNN_RNN Human Action Recognition Algorithm Based On Attention Mechanism

Posted on: 2023-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: J S Yu
Full Text: PDF
GTID: 2568306794957319
Subject: Control engineering
Abstract/Summary:
Human action recognition is currently one of the most active research directions in computer vision, with broad application prospects and economic value in fields such as video surveillance, human-computer interaction, and motion analysis. In addition, a video-based human action recognition framework can be applied to a variety of video understanding tasks, so research in this area has important scientific value. As research has progressed, various novel network architectures and algorithms have been proposed, from the initial hand-crafted feature approaches to the current deep learning-based approaches. Based on a deep learning framework, this thesis studies several open problems in human action recognition. The main contents and results of the work are summarized as follows:

(1) A feature-fusion CNN-BI-LSTM human action recognition algorithm based on split-attention is proposed. To address the problem that complex backgrounds and motion-irrelevant noise in video data degrade recognition, the algorithm first uses a split-attention network to extract spatial features of the video, applying group convolution and channel attention mechanisms to extract spatial features of the motion and enhance the feature representation. Next, information interaction between different convolutional layers is facilitated by fusing detailed information such as contour position from the lower layers with high-level semantic information from the deeper layers, which enhances network performance. Finally, to capture the long-term dependencies of actions, a bidirectional long short-term memory network (BI-LSTM) models the temporal information in both the forward and reverse directions of the video to obtain more accurate temporal patterns of the actions. The effectiveness of the proposed algorithm is verified through extensive experiments on two publicly available datasets, UCF101 and HMDB51.

(2) To address the problem that 2D CNNs combined with LSTM networks ignore the potential spatio-temporal correlation of motion, the second algorithm uses more efficient and compact 3D convolution to extract local short-term spatio-temporal information from the video. By introducing residual connections to increase network depth, a 3D residual convolutional neural network with 18 convolutional layers and a 3-layer bidirectional residual LSTM are constructed to extract the short-term spatio-temporal features of the motion and to model the long-term dependencies, respectively. In addition, to reduce the interference of redundant frames and backgrounds with action recognition, a spatio-temporal attention (STA) module is proposed that enhances the feature representation of key frames and focal regions by generating attention weights that adjust the features. Finally, experimental validation on two publicly available datasets demonstrates that the proposed algorithm effectively improves recognition accuracy.

(3) A Web-based human action recognition system is designed and implemented. The system consists mainly of browser-side page display and server-side algorithm and logic control. After registering and logging in, the user uploads the video to be recognized through the front-end page. The system then calls the back-end algorithm to analyze the video and classify the action, and the recognition result is returned to the front end and displayed to the user.
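The abstract does not specify how the STA module computes its weights, so the following is only a generic illustration of the underlying idea of re-weighting frames by attention, not the thesis's method. It is a minimal pure-Python sketch of soft attention over per-frame feature vectors; the function names `softmax` and `temporal_attention_pool` and the `query` vector are hypothetical (in a trained network the scoring would be learned rather than supplied by hand).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, query):
    """Weight each frame by its relevance to `query` and pool the frames.

    frame_features: list of per-frame feature vectors (equal length).
    query: hypothetical relevance vector; an assumption for this sketch.
    Returns (attention weights per frame, attention-weighted feature vector).
    """
    # Score each frame with a dot product against the query vector.
    scores = [
        sum(f * q for f, q in zip(frame, query)) for frame in frame_features
    ]
    # Normalize the scores so the frame weights sum to 1.
    weights = softmax(scores)
    # Combine frames into one vector, emphasizing high-weight (key) frames.
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
    weights, pooled = temporal_attention_pool(frames, [1.0, 0.0])
    print(weights, pooled)
```

In this toy input the third frame aligns most strongly with the query, so it receives the largest weight and dominates the pooled feature, which is the effect an attention module aims for with key frames.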
Keywords/Search Tags: action recognition, attention mechanisms, convolutional neural networks, recurrent neural networks, residual networks