Research On Action Recognition Algorithm Based On Attention Mechanism And Multi-kernel Convolutional LSTM

Posted on: 2023-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: R X Zhang
GTID: 2568306779987289
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of cloud computing and edge computing in the domestic Internet industry, daily life increasingly takes place on social platforms and in front of cameras. At the same time, intelligent surveillance, smart transportation, and popular online short-video services produce large amounts of video data. How to better understand the rich behavioral information in video, so as to support decision-making in a wide range of downstream applications, has become an important research topic. Compared with image tasks, video tasks depend more heavily on spatio-temporal information. This thesis studies action recognition from two perspectives: how to effectively capture the spatio-temporal information between video frames, and how to handle the long-range dependencies in video.

(1) An action recognition method that incorporates an attention mechanism is proposed. The method aims to efficiently encode spatio-temporal and motion features in a unified 2D CNN framework. First, a "spatio-temporal-motion" block incorporating attention is proposed, which consists of a channel-wise spatio-temporal aggregation block that extracts spatio-temporal features and a channel-wise motion excitation block that efficiently encodes motion features. The "spatio-temporal-motion" block then replaces the original residual blocks in the ResNet architecture, forming the action recognition network of this thesis.

(2) An action recognition method based on a multi-kernel convolutional LSTM network is proposed. The relationship between convolutional kernel size and spatio-temporal modeling in convolutional LSTM networks is first determined, which motivates replacing a single convolutional kernel with multiple output channels by a set of convolutional kernels of different sizes. An additional convolutional layer then integrates the multi-kernel outputs, which finally constitutes an action recognition network for efficient and accurate classification of videos.

To verify the feasibility of the two proposed methods, multiple sets of comparison and validation experiments are conducted on the publicly available datasets UCF101, HMDB51, Something-Something V1, and Sports-1M. The experimental results show that both methods effectively improve action recognition accuracy.
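
The multi-kernel idea in method (2) can be illustrated with a minimal sketch. It is not the thesis implementation: it covers only the spatial convolution, leaves out the LSTM gates, works on a single channel, and uses fixed averaging weights where a real network would learn them. The names `conv2d_same`, `box_kernel`, and `multi_kernel_conv`, as well as the kernel sizes 1, 3, and 5, are assumptions chosen for illustration.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D list with an odd-sized square kernel,
    using zero padding so the output keeps the input size."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def box_kernel(size):
    """Hypothetical placeholder weights: a normalized averaging kernel."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_kernel_conv(image, kernel_sizes=(1, 3, 5), fuse_weights=None):
    """Run one branch per kernel size, then fuse the branch outputs.

    The fusion is a per-pixel weighted sum across branches, which is what
    a 1x1 convolution over the stacked branch outputs computes.
    """
    branches = [conv2d_same(image, box_kernel(k)) for k in kernel_sizes]
    if fuse_weights is None:
        # Assumed equal weights; a trained layer would learn these.
        fuse_weights = [1.0 / len(branches)] * len(branches)
    h, w = len(image), len(image[0])
    fused = [
        [sum(wt * b[i][j] for wt, b in zip(fuse_weights, branches))
         for j in range(w)]
        for i in range(h)
    ]
    return branches, fused


# A 5x5 frame with a single active pixel at the center.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
branches, fused = multi_kernel_conv(image)
```

In this toy input, only the 5x5 branch responds at the corner pixel, because only its window reaches the center. Mixing kernel sizes therefore gives the fused output access to several receptive-field scales at once.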
Keywords/Search Tags: Action recognition, Video classification, Attention mechanism, Convolutional LSTM, Sequence learning