
Research And Implementation Of Video Action Recognition Based On Feature Fusion And Hybrid Attention Mechanism

Posted on: 2022-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Liu
GTID: 2518306746481964
Subject: Software engineering
Abstract/Summary:
With the development of Internet and computer technology, large volumes of data must be processed promptly and effectively in fields such as short-video entertainment, urban security, accident early warning, and fire protection. The amount of data is still growing rapidly, creating an urgent demand for video understanding and analysis technology. In recent years, the application of deep learning has driven rapid progress in this field, and action recognition is one of its key technologies. Because video data is so large, achieving high-precision recognition at low computational cost remains a major challenge for video action recognition.

In video action recognition tasks, deep neural networks usually rely on high-level features for prediction and classification. As network depth increases, however, the resolution of the feature maps decreases, which makes it difficult to judge subtle actions accurately. To address the low resolution of high-level features and the weak semantic information of low-level features, we fuse multi-scale features in the spatial dimension. The speed of an action is also an important cue for judging its category, which requires further fusion of features in the temporal dimension. This thesis therefore adopts spatio-temporal multi-scale feature fusion to improve the recognition accuracy of subtle and speed-sensitive actions.

Much as people do when judging actions, an attention mechanism helps the network extract the key information of an action efficiently by learning various characteristics. Existing attention methods, however, usually focus on only a single type of feature. Building on a summary of existing attention mechanisms, this thesis proposes a hybrid attention module that further improves recognition accuracy by screening and fusing spatio-temporal, channel, and motion features. Within this module, spatio-temporal attention characterizes the spatial and temporal features of actions; channel attention enhances the interdependence of channels in the time domain; and motion attention models the feature-level temporal difference between two adjacent frames.

Combining these two lines of research, this thesis proposes a video action recognition method based on feature fusion and a hybrid attention mechanism. The hybrid attention mechanism is embedded into the EfficientNet framework to improve feature screening, and the multi-scale feature fusion module is introduced to enhance the representational power of features at each level. Extensive experiments are performed on the EgoGesture, Something-Something V2, and Mini-Kinetics datasets. The results show that the proposed method effectively improves the recognition accuracy of subtle and speed-sensitive actions and better handles video action recognition in complex scenes.
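
The abstract's description of motion attention, modeling the feature-level difference between two adjacent frames, can be illustrated with a minimal sketch. The thesis does not give the exact formulation, so everything here is an assumption for illustration: the function name `motion_attention`, the use of per-channel descriptors (for example, spatially averaged feature maps) as input, the sigmoid gate, the residual connection, and the zero difference used for the last frame. A real module would also include learned convolutions, which are omitted.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a hypothetical attention gate."""
    return 1.0 / (1.0 + math.exp(-x))


def motion_attention(frames):
    """Re-weight per-frame channel descriptors by adjacent-frame differences.

    frames: list of T frames, each a list of C floats (assumed to be
            spatially pooled channel descriptors).
    Returns a list with the same shape as the input.
    """
    num_frames = len(frames)
    output = []
    for t in range(num_frames):
        if t + 1 < num_frames:
            # Feature-level temporal difference between frame t+1 and frame t.
            diff = [nxt - cur for cur, nxt in zip(frames[t], frames[t + 1])]
        else:
            # Assumption: the last frame has no successor, so use a zero difference.
            diff = [0.0] * len(frames[t])
        # Channels that change more between frames receive a larger gate value.
        gate = [sigmoid(d) for d in diff]
        # Assumption: a residual connection keeps the original features.
        output.append([x + x * g for x, g in zip(frames[t], gate)])
    return output


if __name__ == "__main__":
    clip = [[1.0, 2.0], [1.0, 4.0]]  # two frames, two channels
    print(motion_attention(clip))
```

In this toy clip the second channel changes between frames while the first does not, so the second channel of the first frame is amplified more strongly than the first.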
Keywords/Search Tags: Video Understanding, Action Recognition, Feature Fusion, Mixed Attention