Research On Action Recognition And Detection Technology Based On Video

Posted on: 2024-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
GTID: 2568307157483004
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
To extract useful information from massive volumes of video data accurately and efficiently, video action recognition and detection technology has emerged and is now widely used in security monitoring, human-computer interaction, and other scenarios. Advances in neural networks have brought breakthroughs in this technology and broadened the range of scenarios it can serve. However, because human behavior is complex and video scenes are diverse, existing models still need improvement in convergence speed during training and in detection accuracy. This thesis uses deep learning to address these shortcomings of existing action recognition and detection models. The detailed work is as follows:

(1) An action recognition algorithm based on the NL-C3D model is proposed. The NL-C3D model is an improvement over the C3D network. Because the C3D architecture is shallow, it cannot make effective use of global feature information, which results in low recognition accuracy. This thesis integrates non-local modules into the network architecture so that the feature extraction network can capture the connections between video frames and link global features together, which supports a deeper understanding of the video. In addition, to address the long training time and slow convergence of the C3D model, the original ReLU activation function is replaced with the Mish function, whose gradient is smoother; this speeds up convergence and improves the model’s recognition accuracy. In comparative experiments between the NL-C3D and C3D models, the NL-C3D model achieves higher recognition accuracy on the public datasets UCF101 and HMDB51 and converges faster during training.

(2) An improved video action detection model based on attention mechanisms is proposed. To address missed and false detections of small targets in the YOWO model, this thesis first uses a deeper two-dimensional feature extraction network in the 2D CNN branch to strengthen the model’s ability to extract spatial information from video frames. Second, the SKNet attention module is embedded in the network of this branch, so that the model can dynamically adjust the size of its receptive field, capture information from receptive fields of different sizes, and detect small targets more reliably. Finally, to address the incomplete fusion of the two branches’ features in the YOWO model’s feature fusion section, the CBAM attention mechanism is applied in that section. This module performs channel and spatial attention operations on the concatenated feature information, so that the features extracted by the two branches are fused more effectively, which improves the model’s detection accuracy. The improved model was evaluated on the UCF101-24 and JHMDB datasets, and the results show that it is more sensitive to small targets and achieves higher detection accuracy.
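The replacement of ReLU with Mish described in part (1) can be illustrated with a minimal sketch. It uses only the standard published definition of Mish, x * tanh(softplus(x)); the abstract does not give the thesis's own implementation, so the function names below (softplus, mish, relu, numerical_slope) are illustrative assumptions rather than the author's code. The finite-difference slope is there only to show that Mish changes smoothly around zero, whereas ReLU has a kink there.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Standard Mish definition: x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    # ReLU, the activation used in the original C3D network per the abstract.
    return max(x, 0.0)


def numerical_slope(f, x: float, h: float = 1e-5) -> float:
    # Central finite difference; used here only to illustrate smoothness.
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    # Compare values and approximate slopes at a few sample points.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(
            f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}  "
            f"d_relu={numerical_slope(relu, x):.4f}  "
            f"d_mish={numerical_slope(mish, x):.4f}"
        )
```

Unlike ReLU, Mish passes small negative values through and has a nonzero gradient for negative inputs, which is the property usually cited when it is said to ease optimization. Whether that accounts for the faster convergence reported here is something only the thesis's experiments can establish.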
Keywords/Search Tags: action recognition and detection, non-local modules, Mish activation function, YOWO model, attention mechanism