
Research On Spatiotemporal Two-Stream Human Action Recognition Method Based On Skeleton

Posted on: 2021-11-12 | Degree: Master | Type: Thesis
Country: China | Candidate: T T Hou
GTID: 2568306104464514 | Subject: Engineering
Abstract/Summary:
Human action recognition is an important task in computer vision. In research on family service robots, interaction between humans and robots is extremely important: only by accurately classifying human actions can a robot provide more accurate, higher-level services. The early action recognition algorithms were all based on color image sequences. With the emergence of affordable depth sensors and real-time skeleton estimation algorithms, skeleton-based action recognition has attracted the attention of many scholars. Skeleton data not only improves robustness to changes in illumination and viewpoint, but also ignores differences caused by clothing, skin color, hair style, etc. This thesis comprehensively analyzes the skeleton-based human action recognition methods of recent years. Considering the characteristics of family service robots, and the complex training, heavy computation, and low recognition accuracy of existing human action recognition models, it proposes a human action recognition method based on skeleton data. The specific research contents are as follows.

Firstly, a Kinect sensor is used to obtain 3D skeleton coordinate data and to represent the skeleton model. Considering temporal evolution and spatial geometry, representations of a temporal subnet skeleton model and a spatial subnet skeleton model are established. The temporal subnet skeleton model computes the differences between frames of an action sequence, and the spatial subnet skeleton model computes the geometric information of the edges. The algorithms for key point extraction and model representation of the human skeleton are summarized.

Secondly, based on the spatiotemporal subnet framework, an end-to-end two-stream human action recognition model is designed that combines a convolutional network and a long short-term memory (LSTM) network with spatiotemporal information. The model is divided into three parts: the first is the temporal subnet, the second is the spatial subnet, and the last is their fusion. The representation matrices of the spatiotemporal skeleton model are taken as the input of the model and fed to a 1D CNN for sampling. After the features of the action sequence are extracted, they are fed into an LSTM-based deep neural network for learning, which captures deeper temporal dependencies. To select the most significant skeleton joint motion in each frame, an attention mechanism is introduced for the temporal data. The model classifies actions by combining the high-level features of the temporal and spatial domains. In addition, to improve the generalization ability of the model, this thesis applies the data augmentation techniques of rotation and scaling to the 3D skeleton coordinates during training.

Finally, the model is validated on the 60 action categories of the NTU RGB+D dataset and the 20 action categories of the MSR Action 3D dataset. Experimental results show that the proposed method is feasible and effective, and that it achieves higher recognition accuracy than the other methods compared.
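
The skeleton representations and the augmentation step described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the thesis's implementation: the toy five-joint edge list `EDGES`, the function names, the direction of the frame difference, the definition of an edge as "child joint minus parent joint", and the choice of the vertical (y) axis for rotation are all assumptions made for illustration. A sequence is a list of frames, and each frame is a list of `(x, y, z)` joint coordinates.

```python
import math

# Hypothetical toy skeleton with 5 joints; each edge is (child, parent).
# The real Kinect joint layout is not given in the abstract.
EDGES = [(1, 0), (2, 1), (3, 1), (4, 1)]


def temporal_features(seq):
    """Temporal stream input: joint displacement between consecutive frames."""
    return [
        [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def spatial_features(seq, edges=EDGES):
    """Spatial stream input: per-frame edge vectors (child minus parent)."""
    return [
        [tuple(c - p for c, p in zip(frame[child], frame[parent]))
         for child, parent in edges]
        for frame in seq
    ]


def augment(seq, angle_rad, scale):
    """Rotate all joints about the y axis (assumed), then scale uniformly."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [
        [((cos_a * x + sin_a * z) * scale,
          y * scale,
          (-sin_a * x + cos_a * z) * scale)
         for x, y, z in frame]
        for frame in seq
    ]


if __name__ == "__main__":
    # Two frames of a two-joint skeleton, used only to exercise the functions.
    demo = [
        [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(1.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
    ]
    augmented = augment(demo, angle_rad=0.3, scale=1.1)
    print(temporal_features(augmented))
    print(spatial_features(augmented, edges=[(1, 0)]))
```

A sequence of T frames yields T-1 temporal difference frames and T spatial edge frames, so the two streams would need padding or cropping to a common length before fusion. In practice these operations would more likely be written with an array library; plain lists are used here only to keep the sketch self-contained.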
Keywords/Search Tags: human action recognition, spatiotemporal representation, skeleton sequence, CNN, LSTM