Multi-Modal Space-Time Feature Learning Based 3D Human Action Recognition

Posted on: 2018-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhao
GTID: 2428330518983062
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of machine learning and imaging sensor technology, intelligent applications based on RGB-D sensors have in recent years attracted growing attention in fields such as intelligent surveillance, intelligent retrieval, human-computer interaction, and automatic labeling. Analysis of moving targets is one of the key technologies behind these applications, so human action recognition in imaged scenes, and the space-time feature learning it relies on, has become a widely studied problem. At the technical level, human action recognition based on an RGB-D sensor should make full use of all the sequence information the sensor provides, especially the depth sequence. Building on the theory of visual motion analysis and on recent advances in machine learning, artificial intelligence, computer vision, and pattern recognition, the goal is to design or learn highly discriminative space-time features that effectively characterize different types of actions, and thereby to build an action recognition system with high accuracy and high reliability. This goal has strong research value. The main work of this thesis is as follows:

(1) We design a dual-stream 3D space-time convolutional neural network framework for action recognition. To learn the global space-time characteristics of each action category, the original depth map sequence serves as the first input modality. Because human actions are highly correlated in the time domain, the depth motion map sequence is introduced as the second input modality and fed to the other stream of the 3D space-time convolutional network. The corresponding 3D skeleton sequence serves as the third input modality of the overall framework. Skeleton sequences have the advantage of providing 3D joint coordinates, but they also suffer from problems such as execution-rate variation, temporal misalignment, and noise, so we handle them with hand-crafted space-time features. This allows the recognition system to exploit discriminative space-time features of human actions from different perspectives and ultimately improves its classification accuracy. Comparative evaluations on several public 3D datasets illustrate the effectiveness of the proposed method.

(2) We represent the human skeleton sequence based on human joint data. First, we use the special Euclidean group to describe the rotations and translations of the body parts of the skeleton, and then use the Lie group structure to represent the skeleton sequences of the different action categories. Because of the intrinsic Riemannian geometry of this representation, we study 3D action recognition from skeleton sequences via sparse Riemannian manifold subspace learning within a multi-task learning framework. Finally, the space-time convolutional features of the depth motion maps and the space-time features of the skeletal manifold are locally discriminative, while the original depth map sequence and the original depth motion map sequence are well suited to capturing the global space-time features of human motion. Taking these complementary strengths into account, we propose a multi-task joint learning model for action recognition based on a variety of heterogeneous space-time features, and we demonstrate the effectiveness of the method on several public datasets.
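To make the notion of a depth motion map concrete, the following is a minimal sketch under stated assumptions, not the thesis's own implementation. It uses one common definition, in which absolute differences between consecutive depth frames are accumulated over time, and it builds a map "sequence" by applying that definition over sliding windows. The abstract does not specify the exact formulation, so the function names, the `threshold`, `window`, and `stride` parameters, and the windowing scheme are all illustrative assumptions.

```python
import numpy as np


def depth_motion_map(depth_frames, threshold=0.0):
    """Accumulate absolute differences between consecutive depth frames.

    depth_frames: array-like of shape (T, H, W) with T >= 2.
    threshold: differences at or below this value are treated as noise
               (hypothetical parameter, not taken from the thesis).
    Returns an (H, W) array of accumulated motion energy.
    """
    frames = np.asarray(depth_frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames shaped (T, H, W)")
    diffs = np.abs(np.diff(frames, axis=0))  # shape (T-1, H, W)
    diffs[diffs <= threshold] = 0.0          # suppress small fluctuations
    return diffs.sum(axis=0)


def depth_motion_map_sequence(depth_frames, window=8, stride=4):
    """Compute one depth motion map per sliding temporal window.

    The windowing scheme is an assumption made for illustration only.
    Returns an array of shape (num_windows, H, W).
    """
    frames = np.asarray(depth_frames, dtype=np.float64)
    if window < 2 or stride < 1:
        raise ValueError("window must be >= 2 and stride must be >= 1")
    if frames.ndim != 3 or frames.shape[0] < window:
        raise ValueError("sequence is shorter than one window")
    starts = range(0, frames.shape[0] - window + 1, stride)
    return np.stack([depth_motion_map(frames[s:s + window]) for s in starts])
```

Each map keeps the spatial layout of the depth frames and summarizes where motion occurred, so a sequence of such maps could in principle be fed to a 3D convolutional stream in the same way as the raw depth frames.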
Keywords/Search Tags: Action Recognition, Convolutional Neural Network, Manifold Space, Multi-Task Learning