
The Research On Skeleton Action Recognition Based On Multi-scale Attention Mechanism

Posted on: 2021-12-09  Degree: Master  Type: Thesis
Country: China  Candidate: X Pan  Full Text: PDF
GTID: 2518306107468004  Subject: Electronics and Communications Engineering
Abstract/Summary:
Skeleton action recognition operates on time-series representations of human joint positions and studies how a computer can automatically recognize the action category by analyzing motion patterns. It is an emerging field of computer vision and is widely used in sports analysis, advanced human-computer interaction, intelligent monitoring, and video retrieval. Existing skeleton action recognition methods mainly focus on feature representations for skeleton data and its special data structure. These methods extract motion features at a single scale in both the temporal and spatial domains, which makes it hard to distinguish complex and easily confused action categories. To address this problem, this thesis proposes a skeleton action recognition method based on a multi-scale attention mechanism. The main work is as follows:

(1) A multi-scale characterization of skeleton data in the time dimension is proposed. Multi-scale convolutions along the time dimension yield joint features that cover different time spans with different receptive fields. An attention mechanism then integrates the features from the different receptive fields, takes full account of the structural relationship between features of different lengths, and strengthens the temporal information expressed by the joint features.

(2) A multi-scale representation of skeleton data in the spatial dimension is proposed, together with an attention mechanism that passes information effectively between the scales. Skeleton data has a fixed physical structure. In addition to obtaining joint features through a convolutional network, we obtain a spatial representation of the human body that contains parts (each consisting of three adjacent joints) and bodies (each consisting of two adjacent parts), both with clear physical meaning, through cascade operations that follow the physical connections of the human body. An attention mechanism is also established on the spatial feature representation at each scale. Following the inclusion relationship between scales, two-way attention transmission between the fine-grained and coarse-grained scales helps the network focus on where the action occurs and obtain a more effective spatial feature representation.

Finally, the two models above are cascaded into an overall model that generates an effective feature representation of the skeleton sequence, and the prediction score is obtained through a fully connected layer. The method is verified on the NTU RGB+D dataset, where classification accuracy under its two protocols, cross-subject and cross-view, reaches 87.6% and 93.9%, respectively. Compared with the baseline method (HCN), this is an improvement of 2.8% and 1.7%, respectively.
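The two multi-scale ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's network: the abstract gives no layer details, so every name here (`temporal_filter`, `fuse_temporal_scales`, `group_concat`), the kernel sizes 3, 5 and 7, the fixed averaging kernels standing in for learned convolutions, the mean-based attention scoring, and the toy six-joint grouping are assumptions made for illustration. The two-way attention between spatial scales is left out.

```python
import numpy as np


def temporal_filter(x, kernel):
    """Slide `kernel` along the time axis with edge padding.

    x: (T, C) features of one joint; kernel: (k,) with odd k. Returns (T, C).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    # windows has shape (T, C, k): one length-k time window per frame and channel
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=0)
    return windows @ kernel


def fuse_temporal_scales(x, kernel_sizes=(3, 5, 7), score_weights=None):
    """Filter x at several temporal scales and fuse the branches with softmax attention.

    Assumption: averaging kernels replace learned filters, and each branch is
    scored by a linear map of its time-averaged descriptor.
    """
    _, channels = x.shape
    branches = np.stack(
        [temporal_filter(x, np.full(k, 1.0 / k)) for k in kernel_sizes]
    )  # (S, T, C), one branch per receptive field
    descriptors = branches.mean(axis=1)  # (S, C)
    if score_weights is None:
        score_weights = np.full(channels, 1.0 / channels)  # hypothetical scoring vector
    scores = descriptors @ score_weights  # (S,)
    scores = scores - scores.max()  # numerical stability for the softmax
    attention = np.exp(scores) / np.exp(scores).sum()
    fused = np.tensordot(attention, branches, axes=1)  # weighted sum over scales, (T, C)
    return fused, attention


def group_concat(features, groups):
    """Build coarser units by concatenating the features of each index group.

    features: (N, C); groups: equally sized index tuples.
    Returns (len(groups), group_size * C).
    """
    return np.stack([features[list(g)].reshape(-1) for g in groups])
```

With joint features of shape `(6, C)`, the hypothetical grouping `group_concat(joints, [(0, 1, 2), (3, 4, 5)])` yields two part features of width `3 * C`, and `group_concat(parts, [(0, 1)])` yields one body feature of width `6 * C`, mirroring the joint, part, body hierarchy described above. A real skeleton would use groupings that follow the physical connections of the body.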
Keywords/Search Tags: skeleton action recognition, convolutional neural network, multi-scale, attention mechanism