Skeletal Action Recognition Based On Attention Mechanism Preferences And Local Information Enhancement

Posted on: 2023-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Zhu
Full Text: PDF
GTID: 2568307103485254
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer vision and artificial intelligence across many fields, intelligent video understanding is becoming indispensable. Within this area, action recognition, which analyzes human motion information in space and time, has become a research hotspot. Its goal is to obtain human motion data, analyze the motion state and motion intention, and accurately classify the action. Compared with the traditional approach of using videos and images as information carriers, action recognition based on human skeleton information has attracted more attention because skeleton data is robust to changes in light intensity and to complex backgrounds, and generalizes well. The most commonly used technique is the graph convolutional network, which models the action information of the human skeleton in the spatial and temporal dimensions as spatiotemporal feature maps and extracts co-occurrence features from them. It mainly learns long-term interactions through a series of 3D convolutions, but these connections are limited by the size of the convolution kernel. To solve this problem, this paper introduces the self-attention mechanism of the Transformer to capture long-range dependencies and obtain global information, and designs a convolutional self-attention module to address the Transformer's strong dependence on data and its computational cost.

The main work and contributions are as follows:

(1) A skeleton action recognition model that combines graph convolution with a Transformer is proposed. The self-attention mechanism of the Transformer is introduced to establish long-range dependencies and is combined with a graph convolutional network for action recognition, so the model can extract local information through graph convolution and also capture rich long-range dependencies through the Transformer. Because the Transformer's self-attention is computed at the pixel level, it has a huge computational cost. The model therefore adopts a stage division strategy that splits the network into two stages: the first stage uses pure convolution to extract shallow spatial features, and the second stage uses the proposed Conv T block to capture high-level semantic information, which reduces the computational complexity.

(2) A convolutional self-attention module is designed to replace the linear embedding in the original Transformer, and a multi-scale framework is used to model the multi-order data of the skeleton simultaneously. The original Transformer architecture loses position and order information when it vectorizes the input sequence, so it adds a fixed position encoding and then maps the data with a linear embedding. This paper replaces the linear embedding with a convolutional self-attention module that uses the characteristics of graph convolution to enhance local spatial information and to obtain position information implicitly. The position encoding can therefore be removed, which improves the performance of the model and makes it more lightweight. In addition, this paper uses a multi-scale framework to combine the first-order joint information and the second-order bone information of the skeleton data, and fuses the multi-scale features to obtain better feature extraction results.

In summary, this paper studies action recognition based on the human skeleton and finds an effective way for the model to obtain both local and global information while reducing model complexity. Experiments are carried out on two classic action recognition datasets, NTU-RGB+D and Kinetics-Skeleton. The experimental results show that the proposed method is effective and improves the performance of the model.
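
The contrast the abstract draws between local graph convolution and global self-attention can be illustrated with a minimal NumPy sketch. This is not the thesis's Conv T block or its convolutional self-attention module. The function names (`graph_conv`, `self_attention`), the five-joint chain skeleton, the feature sizes, and the random weights are all assumptions made for illustration, and the sketch covers a single frame with no temporal dimension. It only shows the general idea of feeding graph-convolved joint features into self-attention in place of a plain linear embedding.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_conv(x, adj, weight):
    """Hypothetical one-layer graph convolution.

    Each joint averages its own features with those of its direct
    neighbors, so information travels one edge per layer (local).
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1)[:, None]  # row-normalize
    return a_norm @ x @ weight

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over joints (global)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

# Toy skeleton (assumption): 5 joints connected in a chain 0-1-2-3-4.
num_joints, in_dim, out_dim = 5, 3, 4
adj = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(num_joints, in_dim))     # e.g. 3D joint coordinates
w_g = rng.normal(size=(in_dim, out_dim))
w_q, w_k, w_v = (rng.normal(size=(out_dim, out_dim)) for _ in range(3))

# Local step: graph convolution stands in for the linear embedding.
local = graph_conv(x, adj, w_g)
# Global step: every joint attends to every other joint.
global_out, attn = self_attention(local, w_q, w_k, w_v)

print(local.shape, global_out.shape, attn.shape)
```

In this sketch, moving joint 4 leaves the graph-convolution output at joint 0 unchanged, because the two joints are four edges apart, while the attention output at joint 0 does change. That difference is the sense in which attention captures long-range dependencies that a fixed-size convolution kernel cannot reach in one layer.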
Keywords/Search Tags:action recognition, human skeleton, Transformer, graph convolutional network, self-attention mechanism