Research On Visual Human Action Recognition

Posted on: 2018-08-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L L Wang    Full Text: PDF
GTID: 1318330536981127    Subject: Mechanical and electrical engineering
Abstract/Summary:
Enabling robots to perceive the world visually, much as humans do, is essential for non-contact intelligent interaction between humans and robots. Recognizing human actions is the most straightforward and effective approach to visual human-robot interaction. Meanwhile, with the increasing demand for more intelligent robots and the development of image processing, artificial intelligence and robotics technologies, visual action recognition has recently become a very active research topic. However, achieving efficient and robust action recognition in complicated scenarios remains challenging because of the 3D complexity of action signals. At the heart of visual action recognition is the extraction of suitable features that represent actions both spatially and temporally, followed by classification of those features with pattern recognition techniques. Focusing on the problem of spatial-temporal action representation, this paper explores the related issues at three layers: a bottom layer for feature extraction, a middle layer for feature description, and a top layer for spatial-temporal feature expression. On this basis, action recognition is achieved by combining supervised learning approaches. Generally, this paper consists of the following aspects.

Designing spatial-temporal features that capture the details of human motion is a challenging issue in action recognition. Since the conventional Horn-Schunck (HS) optical flow method is sensitive to illumination changes and random noise, an energy flow algorithm is proposed to accurately analyze the spatial-temporal variations of actions. Specifically, energy flow first constructs an energy map as the fundamental feature to alleviate the influence of illumination changes. An energy invariance assumption and an energy smoothness assumption are then introduced, from which the energy flow descriptors are obtained by solving a constrained Lagrange equation. Human motion can then be analyzed directly from the energy flow, while actions are recognized by fusing the energy flow into a bag-of-words model.
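The abstract does not state the energy flow formulation explicitly, but by analogy with the Horn-Schunck optical flow model it contrasts itself with, the two assumptions can plausibly be read as the variational problem sketched below, where E(x, y, t) denotes the energy map, (u, v) the energy flow field, and alpha a smoothness weight; the notation is illustrative rather than taken from the thesis.

% Energy invariance assumption: the energy of a point is conserved along its
% motion trajectory (the analogue of the brightness-constancy constraint).
\[
  \frac{\mathrm{d}E}{\mathrm{d}t} \;=\; E_x\,u + E_y\,v + E_t \;=\; 0
\]
% Energy smoothness assumption: the flow field (u, v) varies smoothly over the
% image domain \Omega, yielding a constrained minimization whose Euler-Lagrange
% equations determine the energy flow descriptors.
\[
  \min_{u,v} \int_{\Omega} \Big[ \big(E_x u + E_y v + E_t\big)^2
    + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big]\,\mathrm{d}x\,\mathrm{d}y
\]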
Studying local spatial-temporal features for action detection and recognition is also an important problem. Since conventional gradient-based action recognition methods do not yield very promising results, a gradient transfer algorithm is proposed to represent local action features. Specifically, the gradient transfer method first extracts spatial gradients as fundamental features, and then applies forward-backward differencing and 2D projection statistics to describe these features at the middle layer with high accuracy. Actions are then detected by thresholding the 2D projections and recognized by expressing the descriptors with encoding templates.

Using a global spatial-temporal representation template to recognize actions is also very effective. Since conventional global templates are not robust, a power difference template is proposed. Specifically, the power difference template first extracts an image power map as the fundamental feature, and then a normalized projection histogram and a motion kinetic velocity descriptor are introduced for feature description at the middle layer and fused into a global template. Action recognition is then completed by classifying the template descriptors with principal component analysis and a sequence fusion scheme.

Combining deep learning techniques with appearance models holds great promise for action recognition. Since conventional deep-learning-based action recognition methods are not well suited to handling 3D action signals, three-stream convolutional neural networks (CNNs) are proposed. Specifically, to attain comprehensive deep learning, the three-stream CNNs first adopt the action image, a local optical flow image and a global difference template image as fundamental input features. A deep learning architecture comprising 5 convolution layers, 3 pooling layers and 2 fully connected layers is then built to extract deep descriptors. Moreover, a soft-VLAD algorithm is presented to express the descriptors, based on which actions are recognized (a minimal architectural sketch, under stated assumptions, follows the summary below).

In summary, this paper explores the problems of action feature extraction and spatial-temporal representation based on global features, global templates and deep learning architectures, respectively, and proposes multiple applicable methods. The proposed approaches are evaluated on popular public action datasets, and the experimental results demonstrate their efficiency, effectiveness and robustness. In addition, an application-oriented software system is designed, with which the related and proposed algorithms are analyzed in constrained realistic scenarios.
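For concreteness, the following is a minimal PyTorch sketch of a three-stream network of the kind described above. The 5-convolution / 3-pooling / 2-fully-connected layout per stream follows the abstract; the input resolution, channel widths, late concatenation fusion and plain linear classifier (used here in place of the soft-VLAD encoding) are illustrative assumptions, not details from the thesis.

# Minimal sketch of a three-stream CNN; layer counts follow the abstract,
# all other hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """One stream: 5 convolution layers, 3 pooling layers, 2 fully connected layers."""
    def __init__(self, in_channels: int, feat_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # pool 1
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # pool 2
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # pool 3
        )
        self.fc = nn.Sequential(                               # 2 fully connected layers
            nn.Flatten(),
            nn.Linear(256 * 28 * 28, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x):
        return self.fc(self.features(x))

class ThreeStreamCNN(nn.Module):
    """Fuses appearance, local optical-flow and global difference-template streams."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.rgb_stream = StreamCNN(in_channels=3)    # action (appearance) image
        self.flow_stream = StreamCNN(in_channels=2)   # local optical-flow image (u, v)
        self.diff_stream = StreamCNN(in_channels=1)   # global difference-template image
        self.classifier = nn.Linear(3 * 512, num_classes)

    def forward(self, rgb, flow, diff):
        feats = torch.cat(
            [self.rgb_stream(rgb), self.flow_stream(flow), self.diff_stream(diff)],
            dim=1,
        )
        return self.classifier(feats)

# Usage with assumed 224x224 inputs (batch of 4):
model = ThreeStreamCNN(num_classes=10)
rgb = torch.randn(4, 3, 224, 224)
flow = torch.randn(4, 2, 224, 224)
diff = torch.randn(4, 1, 224, 224)
logits = model(rgb, flow, diff)   # -> shape (4, 10)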
Keywords/Search Tags:Machine vision, Action recognition, Action feature extraction, Action spatial-temporal representation, Action feature classification