
Research On Action Recognition Method Based On Motion Feature Extraction And Spatio-temporal Feature Fusion

Posted on: 2022-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: X C Liu
Full Text: PDF
GTID: 2558307154475144
Subject: Engineering
Abstract/Summary:
Since the invention of the camera, video has become an important medium for storing information. With the rise and spread of short videos in recent years, video understanding has attracted growing attention. Human action recognition is a challenging task within video understanding; its main goal is to model and classify human actions in videos. To extract motion features effectively and fuse spatio-temporal features, this thesis focuses on two stages of the pipeline, the input frames and the feature maps, and proposes a method for the problem at each stage:

1. An action recognition method based on local motion modeling is proposed to supplement each input frame with motion information from its local time window. A single frame within a local time window provides static information but cannot express motion. The method combines sparse and dense sampling to obtain inter-frame RGB differences in each local time window, encodes them in the early stage of the network, and captures local motion features. These motion features are then added to the original features of the input frame to enrich the input information, laying the foundation for the network's subsequent modeling.

2. An action recognition method based on channel attention and spatio-temporal information fusion is proposed to emphasize action-related features and fully integrate spatio-temporal context. The method first computes a motion-related attention weight for each channel of the feature map through a channel-level attention mechanism, then rescales the feature values by these weights to enhance motion-related features. It then uses the Temporal Shift Module to shift part of the channels along the time dimension, which promotes information exchange between adjacent frames, integrates spatio-temporal context without adding parameters, and addresses the difficulty 2D convolutional networks have in modeling temporal relationships.

Systematic experiments on multiple benchmark datasets verify the recognition accuracy of the two proposed methods, both individually and in combination. The experimental results demonstrate the effectiveness of the proposed methods.
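
As an illustration of two operations named in the abstract, the inter-frame RGB difference and the Temporal Shift Module, the following is a minimal, dependency-free sketch and not the thesis's implementation. It assumes each frame is a flat list of pixel values and each feature frame is a list of per-channel scalars. The function names, the `fold_div` parameter, and the zero padding at clip boundaries are illustrative assumptions, since the abstract does not specify these details.

```python
def rgb_differences(frames):
    """Element-wise differences between consecutive frames.

    frames: list of T frames, each a flat list of pixel values.
    Returns T - 1 difference frames; entry i is frames[i + 1] - frames[i].
    """
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def temporal_shift(features, fold_div=8):
    """Shift a fraction of channels along the time axis (no learned parameters).

    features: list of T frames, each a list of C per-channel values.
    fold_div: assumed hyperparameter; C // fold_div channels take their value
              from the next frame, the same number from the previous frame,
              and the remaining channels are left unchanged.
    Positions with no source frame (clip boundaries) are filled with 0.
    """
    t, c = len(features), len(features[0])
    fold = c // fold_div
    out = [[0] * c for _ in range(t)]
    for i in range(t):
        for ch in range(c):
            if ch < fold:
                src = i + 1      # value comes from the following frame
            elif ch < 2 * fold:
                src = i - 1      # value comes from the preceding frame
            else:
                src = i          # channel is not shifted
            if 0 <= src < t:
                out[i][ch] = features[src][ch]
    return out


# Toy usage: 3 frames of 2 pixels, then 3 feature frames of 4 channels.
print(rgb_differences([[0, 0], [1, 2], [3, 2]]))
# [[1, 2], [2, 0]]

feats = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
print(temporal_shift(feats, fold_div=4))
# [[20, 0, 12, 13], [30, 11, 22, 23], [0, 21, 32, 33]]
```

In the toy output, channel 0 of each frame now holds the next frame's value and channel 1 holds the previous frame's value, so a per-frame 2D operation applied afterwards sees information from adjacent frames.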
Keywords/Search Tags: computer vision, action recognition, video understanding, motion feature extraction, attention mechanism