Research On Spatiotemporal Information Fusion And Attention Enhancement Based Human Action Recognition

Posted on:2022-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Deng

GTID:2518306527978749

Subject:Electronics and Communications Engineering

Abstract/Summary:

Human action recognition is one of the most important research topics in artificial intelligence, pattern recognition and machine learning, and a popular topic in computer vision and multimedia analysis. It has significant academic value and broad application value in security monitoring, human-computer interaction, medical diagnosis, video classification and other fields. Researchers have made great progress in human action recognition. In practical applications, however, the data are often disturbed by illumination, complex backgrounds, occlusion, the human body itself and other factors, so human action recognition remains a very challenging topic. Existing methods aim to improve the internal structure of a single data-stream network, but ignore the information interaction, fusion and enhancement between multiple data-stream networks. To tackle these problems, this thesis studies the topic from two aspects: multi-level spatiotemporal information fusion and multi-branch attention-based information enhancement. The main contributions and achievements of this thesis are summarized as follows:

(1) A human action recognition method based on multi-level spatiotemporal information fusion is proposed. To exploit multi-level spatiotemporal features effectively, a multi-level spatiotemporal information compact fusion module is proposed. The module reduces the dimension of the spatiotemporal features and enables information interaction and fusion between the spatial features and the temporal features. Moreover, it solves the problem that the compact bilinear pooling algorithm cannot directly fuse the spatiotemporal features of multiple convolutional layers. A three-stream prediction score fusion network is introduced, whose branch networks are kept separate in order to relieve the influence of the fusion operations on the feature extraction networks. The network uses the temporal segment network for long-range temporal structure modeling. Experiments on two RGB video-based human action recognition datasets, UCF101 and HMDB51, show that the proposed method achieves excellent recognition performance.

(2) A human action recognition method based on multi-receptive-field spatial-channel attention for feature enhancement is proposed. Building on the method proposed in the previous chapter, a multi-receptive-field spatial-channel attention module is introduced to adjust each part of the fused feature and make the network focus on the informative regions of the input data. The module combines a spatial branch and a channel branch in parallel to generate the attention weights that adjust the features. The spatial branch uses convolutions with different kernel sizes to expand its receptive field. In addition, the residual connection of the module makes it plug-and-play within the network. Experiments on UCF101 and HMDB51 indicate that the proposed method achieves satisfactory recognition accuracy.

(3) A skeleton-based human action recognition method using multi-perspective feature fusion enhancement is proposed. A multi-perspective feature fusion enhancement module is introduced to strengthen and fuse skeleton data. The module combines a spatial branch, a channel branch and a temporal branch in parallel. When its inputs are the same, the module acts as an attention module that enhances the input data and extracts more effective features. When its inputs differ, the module acts as a spatiotemporal information fusion module: it captures the effective information provided by one input to strengthen the information of the other, thereby accomplishing the fusion. The module is used both to strengthen the graph convolutional feature extraction network and to fuse the spatiotemporal features of multiple graph convolutional layers. In addition, a skeleton-diff data extraction method is proposed to make full use of the temporal information in skeleton data. Combining the first-order and second-order information of the skeleton data, a three-stream fusion network based on skeleton data is proposed. Experiments on the skeleton-based human action recognition datasets Kinetics-Skeleton, NTU-RGBD60 and NTU-RGBD120 show that the proposed method is effective.
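
As a rough illustration of two ideas in contribution (3), the Python sketch below computes a frame-to-frame difference of joint coordinates and fuses the class scores of three streams by a weighted average of softmax probabilities. The abstract does not define the skeleton-diff data or the fusion rule, so both are assumptions of this sketch, as are the function names `temporal_diff` and `fuse_scores`, the zero-filled first frame, the equal stream weights and the toy numbers. The thesis may define them differently.

```python
import math


def temporal_diff(frames):
    """Frame-to-frame displacement of every joint (assumed form of "skeleton-diff").

    frames: list of T frames, each a list of joints, each a list of coordinates.
    Returns T frames; the first is all zeros so the length is preserved
    (a choice made for this sketch, not taken from the thesis).
    """
    if not frames:
        return []
    diffs = [[[0.0] * len(joint) for joint in frames[0]]]
    for prev, curr in zip(frames, frames[1:]):
        diffs.append(
            [[c - p for p, c in zip(jp, jc)] for jp, jc in zip(prev, curr)]
        )
    return diffs


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_scores, weights):
    """Late fusion: weighted average of per-stream softmax probabilities."""
    probs = [softmax(s) for s in stream_scores]
    norm = sum(weights)
    n_classes = len(probs[0])
    return [
        sum(w * p[k] for w, p in zip(weights, probs)) / norm
        for k in range(n_classes)
    ]


# Toy sequence: 3 frames, 2 joints, 2-D coordinates (hypothetical values).
frames = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.5, 0.0], [1.0, 2.0]],
    [[1.5, 0.0], [1.0, 2.0]],
]
diff = temporal_diff(frames)

# Hypothetical raw class scores from three separately trained streams.
first_order_scores = [2.0, 1.0, 0.1]
second_order_scores = [1.5, 1.2, 0.3]
skeleton_diff_scores = [0.2, 2.5, 0.1]

fused = fuse_scores(
    [first_order_scores, second_order_scores, skeleton_diff_scores],
    weights=[1.0, 1.0, 1.0],
)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(diff[1], predicted)
```

Because the fusion acts only on prediction scores, each stream can be trained independently, which matches the abstract's remark that separate branch networks keep the fusion from affecting feature extraction. In practice the arrays would be handled by a tensor library rather than nested lists.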

Keywords/Search Tags:

Human action recognition, Feature fusion, Attention enhancement, Multi-stream network, Multi-level feature

Related items

1	Human Action Recognition Based On Multi-level Feature Fusion
2	Based On Joint Points Extraction And Multi-angle Feature Level Fusion Of Human Action Recognition
3	Human Action Recognition Based On Attention Mechanism And Multi-Modality Feature Fusion
4	Human Action Recognition Based On Multi-features Fusion
5	Human Action Recognition Based On Multi-mode Feature Fusion
6	Key-Frame Based Multi-Feature Fusion Human Action Recognition System
7	Research On Representation-level Features Extraction And Fusion Classification Method Of Human Actions In Video Sequences
8	Research On Key Techniques Of 4D Human Action Recognition
9	Research On Human Action Recognition Based On RGB-D Image Sequences
10	Joint-based Feature Fusion For Human Action Recognition