Video Action Recognition Based On Hybrid Attention Mechanism And Multi-scale Feature Fusion

Posted on: 2024-09-08

Degree: Master

Type: Thesis

Country: China

Candidate: W Y Niu

Full Text: PDF

GTID: 2568306920463344

Subject: Computer Science and Technology

Abstract/Summary:

Human action recognition in videos is an important research topic in computer vision. It aims to recognize the actions of people in video scenes and determine their categories. Because video is three-dimensional data that carries spatiotemporal information, feature extraction is challenging. Three-dimensional convolutional neural networks, a successful technique for human action recognition in videos, can model spatiotemporal information directly, which simplifies its extraction. However, three-dimensional convolutional neural network models are difficult to optimize, have insufficient feature extraction capability, and extract features at a single scale, so they lack multi-scale features. To address these problems, this thesis improves the C3D three-dimensional convolutional neural network through two main pieces of work.

(1) An improved C3D network based on a multi-scale feature extraction structure and a channel attention mechanism is proposed. Compared with traditional convolutional structures, the PPM (Pyramid Pooling Module) multi-scale pooling module extracts features at different scales and obtains richer information, while the channel attention mechanism emphasizes the important channels in the features. By replacing the convolutional structure in the original network with the multi-scale feature extraction module and applying channel attention to the extracted features, the network obtains richer and more effective features and therefore classifies video actions better. Experiments on the UCF-101 and HMDB-51 datasets show that the improved method raises accuracy to some degree and has certain performance advantages over more classic networks.

(2) An improved C3D network combined with a hybrid attention mechanism is proposed. The channel attention mechanism compresses spatial information when it emphasizes features, so spatial information is lost. To address this, a spatial attention mechanism is introduced into the network. Building on the GCNet channel attention module, a 3D-Crisscross spatial attention module is added to construct a hybrid attention module. Both attention modules perform global context modeling, which establishes long-range dependencies in three-dimensional features, strengthens the network's feature extraction in the channel and spatial dimensions, and improves the model's modeling performance. Experiments are conducted on the large UCF-101 and HMDB-51 video datasets and compared with other deep learning models. The results show that the proposed method achieves higher accuracy than those models and a significant improvement over the original C3D method.

In summary, this thesis proposes two improved methods based on the C3D three-dimensional convolutional neural network, both of which improve the network's recognition performance. The effectiveness of the proposed methods is verified through theoretical analysis and experimental results.
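
The two ideas named in the abstract, multi-scale pooling and channel attention, can be illustrated with a minimal NumPy sketch. This is not the thesis's architecture: the function names `pyramid_pool` and `channel_attention`, the bin sizes, the bottleneck weights `w1` and `w2`, and the (C, T, H, W) feature layout are all assumptions chosen for illustration, and the pooling here only produces per-scale descriptors rather than a full PPM with upsampling and concatenation. The sketch also shows why channel attention loses spatial detail: its first step averages each channel down to a single number.

```python
import numpy as np


def pyramid_pool(x, bins=(1, 2)):
    """Average-pool a (C, T, H, W) feature map onto b x b x b grids.

    Illustrative only: assumes T, H and W are divisible by every b in bins.
    Returns an array of shape (C, sum(b**3 for b in bins)).
    """
    c, t, h, w = x.shape
    pooled = []
    for b in bins:
        # Split each axis into b equal blocks and average inside each block.
        blocks = x.reshape(c, b, t // b, b, h // b, b, w // b)
        pooled.append(blocks.mean(axis=(2, 4, 6)).reshape(c, -1))
    # Concatenate coarse and fine descriptors along the feature axis.
    return np.concatenate(pooled, axis=1)


def channel_attention(x, w1, w2):
    """Reweight the channels of a (C, T, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical bottleneck weights that would be learned in practice.
    """
    # Squeeze: one global average per channel. Spatial layout is discarded here.
    squeezed = x.mean(axis=(1, 2, 3))
    hidden = np.maximum(w1 @ squeezed, 0.0)           # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid, one weight per channel
    # Excite: scale every position of a channel by that channel's weight.
    return x * weights[:, None, None, None], weights


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 6, 6))          # C=8, T=4, H=W=6
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))

multi_scale = pyramid_pool(features, bins=(1, 2))
reweighted, channel_weights = channel_attention(features, w1, w2)
print(multi_scale.shape, reweighted.shape, channel_weights.shape)
```

Because every position in a channel is scaled by the same weight, this kind of attention cannot favor one spatial location over another, which is the gap the abstract's spatial attention module is meant to fill.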

Keywords/Search Tags:

Deep Learning, 3D Convolutional Neural Network, Attention Mechanism, Multi-scale Feature Extraction

Related items

1	Research On Iris Recognition Algorithm Based On Deep Neural Network
2	Research On Image Semantic Segmentation Algorithm Based On Deep Learning
3	Research On Single Image Super Resolution Based On Deep Learning
4	Research On Face Sketch Synthesis Algorithm Based On Generative Adversarial Networks
5	Research On Emotion Recognition Based On EEG
6	Research On Person Re-identification Method Based On Multi-scale And Attention Learning
7	Research On Single Color Image Shadow Detection Method Based On Convolutional Neural Network
8	Research On DAS Vibration Source Identification Method Based On Multi-scale Structure Feature Extraction And Sequential Information Mining
9	Research On Cross-domain Recommendation Algorithm Based On Graph Convolutional Neural Network
10	Research On Multi-scale Deep Learning Fusion Method For Infrared And Visible Images