Lightweight Person-person Interaction Recognition Based On Skeleton Sequence

Posted on: 2020-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Xu
GTID: 2428330602451868
Subject: Engineering
Abstract/Summary:
As one of the most important research fields in video understanding, human action recognition has been widely used in intelligent surveillance, intelligent nursing, human-computer interaction and robot control. In human behavior recognition, RGB video data is easily affected by background diversity, illumination changes, clothing changes and other factors. In contrast, human skeleton data is a highly abstract representation of the human body and is robust to environmental changes. As a result, skeleton-based action recognition has become an active research field. At present, skeleton-based recognition methods that use convolutional neural networks can accurately model skeleton sequences in time and space. However, these models have a large number of parameters and need to be pre-trained on large-scale image classification datasets, which poses challenges for hardware storage and computing performance. In addition, person-person interaction, a subset of human action, is not modeled specifically by current methods, so the accuracy of interaction recognition needs to be improved. To solve these problems, this thesis proposes a lightweight person-person interaction recognition method that accurately recognizes person-person interactions with very few parameters and without a pre-training step. The main work of this thesis is as follows.

Firstly, this thesis conducts action recognition using 3D skeleton data obtained from a depth camera. To address the large parameter count of action recognition models, a lightweight convolutional neural network with few parameters is constructed for person-person interaction recognition. The model consists of a feature extraction network, a correlation feature learning module and an action classification module. To model the spatio-temporal features in a skeleton sequence, the skeleton sequence of one person is converted to a skeleton image according to its spatio-temporal distribution. A feature extraction network with very few parameters is then built from convolutional layers. Two parallel networks with shared parameters extract the spatio-temporal features from the skeleton sequences of the two persons in an interaction. To learn the relationship between the two persons effectively, this thesis designs a correlation feature learning module that fuses the features of the two persons. Finally, the parameters of the different layers in the proposed network are analyzed. Compared with existing models, the proposed model is much lighter. Experimental results show that the method recognizes person-person interactions with high accuracy while using far fewer parameters.

Secondly, because depth cameras are expensive and restricted in the scenes they can capture, this thesis investigates human pose estimation to extract 2D skeleton data from RGB images, which can be obtained from cost-effective traditional cameras. When applied to videos, existing pose estimation methods often miss joints because of motion blur and pose changes. This thesis proposes a joint tracking algorithm based on optical flow, which uses the temporal context information extracted by a real-time optical flow network to track joints in video. The detection results and tracking results are then fused to form the final results. The experimental results show that the method effectively solves the problem of missing joints in video and ensures the continuity of joint detection. In addition, the skeletons extracted by this method yield a significant improvement in the action recognition task compared with the skeletons extracted by the existing pose estimation method.

In summary, the proposed lightweight person-person interaction recognition method requires very little storage and computing resources, achieves good results with both depth cameras and traditional cameras, and has significant research and application value.
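The two data-handling ideas in the abstract, converting a skeleton sequence to a skeleton image and fusing detected joints with flow-tracked joints, can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the exact encoding or fusion rule, so the joints-as-rows, frames-as-columns layout, the per-coordinate min-max scaling to 0..255, the confidence threshold `thresh`, and the function names `skeleton_to_image` and `track_and_fuse` are all assumptions made for illustration. The optical flow field is assumed to be already computed and passed in as a dense grid.

```python
def skeleton_to_image(seq):
    """Encode a skeleton sequence as an image (assumed encoding, not the thesis's).

    seq[t][j] is the (x, y, z) position of joint j in frame t.
    Returns image[j][t] = [r, g, b], each coordinate scaled to 0..255.
    """
    n_frames, n_joints = len(seq), len(seq[0])
    # Per-coordinate minimum and maximum over the whole sequence.
    lo = [min(seq[t][j][c] for t in range(n_frames) for j in range(n_joints))
          for c in range(3)]
    hi = [max(seq[t][j][c] for t in range(n_frames) for j in range(n_joints))
          for c in range(3)]
    image = []
    for j in range(n_joints):          # rows: joints (spatial axis)
        row = []
        for t in range(n_frames):      # columns: frames (temporal axis)
            pixel = []
            for c in range(3):         # channels: x, y, z
                span = hi[c] - lo[c]
                value = (seq[t][j][c] - lo[c]) / span if span > 0 else 0.0
                pixel.append(round(value * 255))
            row.append(pixel)
        image.append(row)
    return image


def track_and_fuse(prev_joints, flow, detected, conf, thresh=0.5):
    """Fuse detected joints with joints tracked by optical flow (assumed rule).

    prev_joints: (x, y) joint positions in the previous frame.
    flow[y][x]:  (dx, dy) displacement from the previous to the current frame.
    detected:    (x, y) joint positions detected in the current frame.
    conf:        detection confidence per joint.
    A confident detection is kept; otherwise the previous joint is moved
    along the flow vector sampled at its (rounded, clamped) position.
    """
    height, width = len(flow), len(flow[0])
    fused = []
    for (px, py), det, c in zip(prev_joints, detected, conf):
        if c >= thresh:
            fused.append((float(det[0]), float(det[1])))
            continue
        x = min(max(int(round(px)), 0), width - 1)
        y = min(max(int(round(py)), 0), height - 1)
        dx, dy = flow[y][x]
        fused.append((px + dx, py + dy))
    return fused
```

In this sketch a missed joint (low confidence) inherits its position from the previous frame plus the local flow, which is one simple way to keep the joint trajectory continuous; the fusion used in the thesis may weight detection and tracking differently.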
Keywords/Search Tags: skeleton sequence, skeleton image, correlation feature learning, optical flow, joint tracking