
Attention Mechanism Based Action Recognition

Posted on: 2021-01-10    Degree: Master    Type: Thesis
Country: China    Candidate: J Y Chen    Full Text: PDF
GTID: 2518306050471564    Subject: Circuits and Systems
Abstract/Summary:
With the development of the information age, video data appears in large quantities in daily life and work. Human action recognition in video has great application value, with uses in intelligent surveillance, human-computer interaction, autonomous driving, and other fields, and it is one of the important tasks in the development of the intelligent industry. Human action in video is a dynamic change in both time and space. In recent years, research on deep-learning-based action recognition has developed rapidly: network structures are constantly being innovated, and the data involve multiple modalities such as skeleton, RGB, and depth maps. At present, most action recognition methods rely on big data and high-performance computing, but further improvement in performance still requires the guidance of human thinking. Visual attention is one mechanism well suited to neural networks. An attention mechanism helps the network extract the key information in the data efficiently, and for different data and tasks a corresponding attention mechanism should be designed to improve network performance. Therefore, this thesis designs attention-based action recognition methods for skeleton data and RGB data respectively. By enhancing the learning of key information, the networks achieve high recognition accuracy. The main work of this thesis includes the following two points.

Firstly, on skeleton data, this thesis proposes a two-person interaction recognition method based on guided attention. Skeleton data is lightweight and contains little interference, and it is widely used in action recognition tasks. However, when dealing with two-person interaction there is still a great deal of redundant information, and most existing methods cannot effectively extract interaction features. To solve this problem, this thesis proposes a graph convolutional network based on guided attention for two-person interaction recognition. The network takes human observation experience as prior knowledge and designs a skeleton graph for two-person interaction, which establishes relations between body parts that are likely to interact. With this preset two-person relation, interaction features can be learned better. Experiments show that the proposed method greatly improves the accuracy of two-person interaction recognition (a minimal sketch of this idea follows the abstract).

Secondly, on RGB video, this thesis proposes an action recognition method based on spatial-temporal attention. RGB video is widely available, but it contains a large amount of interfering information in both space and time: human action generally occurs only in a local spatial region, and different moments differ in importance. To solve this problem, this thesis proposes an action recognition network based on spatial-temporal attention. By building the attention mechanism into pseudo-3D convolution, the network enhances the learning of important information. In the spatial dimension, the network automatically locates the discriminative region and learns the key information of the action; the final result combines local and global information. In the temporal dimension, a temporal attention module is designed to compute the importance of features and then enhance the features of important moments. Spatial attention and temporal attention are built into the network in a unified way to enhance spatial-temporal key information (a second sketch follows the abstract).

To sum up, this thesis mainly studies action recognition based on the attention mechanism, which takes its inspiration from the analysis of human vision. The corresponding attention network is designed according to the data and the task. By improving the feature learning ability, the methods proposed in this thesis achieve good results on skeleton data and RGB data respectively.
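The following is a minimal sketch, in PyTorch, of the guided-attention idea from the first point: a graph-convolution layer whose adjacency over the joints of both persons is a hand-designed prior graph, gated by a learnable attention mask. The class name, tensor shapes, and the toy prior graph are illustrative assumptions, not the thesis's exact design.

import torch
import torch.nn as nn

class GuidedAttentionGCN(nn.Module):
    """One graph-convolution layer over a two-person skeleton graph (sketch)."""
    def __init__(self, in_channels, out_channels, prior_adj):
        super().__init__()
        # prior_adj: (V, V) adjacency over the joints of BOTH persons, with extra
        # edges between body parts assumed likely to interact (prior knowledge).
        self.register_buffer("prior_adj", prior_adj)
        # Learnable gate over the prior edges: the layer starts from the prior
        # graph and learns data-driven refinements of the interaction relations.
        self.edge_gate = nn.Parameter(torch.zeros_like(prior_adj))
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        attn = self.prior_adj * torch.sigmoid(self.edge_gate)  # guided attention over edges
        x = torch.einsum("nctv,vw->nctw", x, attn)              # aggregate along prior edges
        return self.proj(x)

# Toy usage: 2 persons x 25 joints = 50 nodes; self-loops plus one cross-person
# edge (e.g. the two right hands) stand in for the two-person skeleton graph.
V = 50
prior = torch.eye(V)
prior[24, 49] = prior[49, 24] = 1.0
layer = GuidedAttentionGCN(3, 64, prior)
print(layer(torch.randn(8, 3, 30, V)).shape)  # torch.Size([8, 64, 30, 50])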
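Similarly, here is a minimal sketch, in PyTorch, of the second point: a pseudo-3D block (a spatial 1x3x3 convolution followed by a temporal 3x1x1 convolution) with a spatial attention map that highlights the discriminative region and a per-frame temporal attention weighting. The factorisation, the residual combination of attended and global features, and all names are assumptions based on the abstract, not the thesis's exact modules.

import torch
import torch.nn as nn

class P3DAttentionBlock(nn.Module):
    """Pseudo-3D convolution block with spatial and temporal attention (sketch)."""
    def __init__(self, channels):
        super().__init__()
        # Pseudo-3D factorisation: spatial 1x3x3 conv, then temporal 3x1x1 conv.
        self.spatial_conv = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        self.temporal_conv = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        # Spatial attention: a one-channel map over (H, W) locating the action region.
        self.spatial_attn = nn.Conv3d(channels, 1, kernel_size=1)
        # Temporal attention: per-frame importance scores from pooled features.
        self.temporal_attn = nn.Linear(channels, 1)

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        x = self.spatial_conv(x)
        s = torch.sigmoid(self.spatial_attn(x))              # (n, 1, t, h, w) region map
        x = x + x * s                                        # combine global and attended local features
        x = self.temporal_conv(x)
        pooled = x.mean(dim=(3, 4)).transpose(1, 2)          # (n, t, c) per-frame descriptors
        w = torch.softmax(self.temporal_attn(pooled), dim=1) # (n, t, 1) frame importance
        return x * w.transpose(1, 2).unsqueeze(-1).unsqueeze(-1)

# Toy usage: 8 clips, 64 channels, 16 frames, 56x56 feature maps.
block = P3DAttentionBlock(64)
print(block(torch.randn(8, 64, 16, 56, 56)).shape)  # torch.Size([8, 64, 16, 56, 56])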
Keywords/Search Tags: action recognition, attention mechanism, graph convolution, pseudo-3D convolution, feature enhancement