
Research On End-to-End Person Action Recognition Algorithm Based On Skeleton Data

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
GTID: 2518306512475174
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of artificial intelligence, human action recognition has been widely used in video surveillance, human-computer interaction, action analysis, and intelligent security. Traditional human action recognition is mostly based on RGB video or images, but target occlusion, lighting changes, and complex backgrounds limit its accuracy. With the popularity of depth sensors such as Kinect, human action recognition based on skeleton data has attracted wide attention. Existing skeleton-based action recognition methods for RGB video usually use a two-stage serial structure that chains a skeleton data detection network with a skeleton-based action recognition network. To avoid recognition errors caused by inaccurate skeleton data detection, this paper proposes an end-to-end human action recognition algorithm based on skeleton data.

The end-to-end network designed for the algorithm contains a skeleton data detection module and a skeleton-based action recognition module, so the two-stage serial network is optimized into a single end-to-end action recognition network. In the skeleton data detection module, this paper designs a detection structure with ResNet-50 as the backbone network, which extracts skeleton data from the input image sequence and outputs three values for each skeleton point: its 2D coordinate position and a confidence score indicating whether it is a skeleton point. In the action recognition module, a dual-stream network structure is designed. The first stream takes the three-dimensional coordinates of the skeleton data, and the second stream takes the motion information of the skeleton data between frames, that is, the difference of the coordinate values. Finally, a multi-task loss function is designed for the proposed end-to-end network: the location loss of the skeleton data detection module is linearly combined with the recognition loss of the recognition module. In this way, the skeleton data detection results positively affect the recognition results, and the recognition results in turn feed back to adjust the accuracy of skeleton data detection.

To verify the effectiveness of the proposed algorithm, experiments were performed on two public datasets: NTU RGB-D and the Northwestern-UCLA Multiview Action 3D Dataset (Northwestern-UCLA). The proposed method reached accuracies of 85.9% and 93.2% under the Cross-subject and Cross-view settings of NTU RGB-D, and 87.8% and 95.7% under the Cross-subject and Cross-view settings of Northwestern-UCLA. Compared with the two-stage serial method that combines AlphaPose (RMPE: Regional Multi-Person Pose Estimation) with ST-GCN (Spatial Temporal Graph Convolutional Networks), recognition accuracy on NTU RGB-D improved by 12.4% and 13.3% under the Cross-subject and Cross-view settings respectively, and average recognition accuracy on Northwestern-UCLA improved by 8.3% and 10.5% respectively, which verifies the effectiveness of the proposed method. In terms of skeleton data detection, compared with the same two-stage serial method, the end-to-end method increased detection accuracy on NTU RGB-D and Northwestern-UCLA by 8.5% and 11.4% respectively, which also demonstrates the correcting effect of the recognition network on the detection results. In terms of running speed, the method reached 21 and 25 frames per second on NTU RGB-D and Northwestern-UCLA respectively, which can meet real-time requirements. The experimental results show that the end-to-end human action recognition algorithm based on skeleton data proposed in this paper can effectively recognize human actions in videos.
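
The dual-stream input and the linearly combined loss described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `motion_stream` and `combined_loss`, the weights `w_loc` and `w_rec`, and the choice to difference only the (x, y) coordinates and leave the confidence score out of the motion stream are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: a joint is (x, y, confidence), a frame is a list of joints,
# and a clip is a list of frames with the same number of joints in each frame.
Joint = Tuple[float, float, float]
Frame = List[Joint]
Motion = Tuple[float, float]


def motion_stream(clip: List[Frame]) -> List[List[Motion]]:
    """Second stream: per-joint (dx, dy) between consecutive frames.

    Assumption: only the 2D coordinates are differenced; the confidence
    score is ignored here.
    """
    motion = []
    for prev, curr in zip(clip, clip[1:]):
        motion.append([
            (c[0] - p[0], c[1] - p[1])
            for p, c in zip(prev, curr)
        ])
    return motion


def combined_loss(location_loss: float, recognition_loss: float,
                  w_loc: float = 1.0, w_rec: float = 1.0) -> float:
    """Linear combination of the two task losses.

    The weights are hypothetical; the abstract does not state them.
    """
    return w_loc * location_loss + w_rec * recognition_loss


if __name__ == "__main__":
    # Two frames with two joints each.
    clip = [
        [(0.0, 0.0, 0.9), (1.0, 1.0, 0.8)],
        [(0.5, 1.0, 0.9), (1.0, 3.0, 0.7)],
    ]
    print(motion_stream(clip))
    print(combined_loss(2.0, 3.0, w_loc=0.5, w_rec=1.0))
```

A clip of T frames yields T - 1 motion frames, so a real implementation would need to pad or trim one stream to align the two. Because the total loss is a single scalar, gradients from the recognition term can reach the detection module during training, which is the feedback effect the abstract describes.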
Keywords/Search Tags: end-to-end, skeleton data detection, action recognition, graph convolution, deep learning