
Research On Continuous Action Recognition Based On Combining Deep Network And Probabilistic Graphical Model

Posted on: 2018-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Lei
Full Text: PDF
GTID: 1368330623450439
Subject: Control Science and Engineering
Abstract/Summary:
Human action recognition in video is an important research topic in computer vision. It is widely applied in fields such as video surveillance, video retrieval, and human-computer interaction. With the arrival of the big data era, the scale of video data keeps growing, and the demand for human action recognition is becoming more urgent. Compared with traditional recognition of isolated actions, continuous action recognition has more practical value and faces more challenges: it must not only handle the diversity of actions and the complexity of scenes, but also perform segmentation and recognition at the same time.

Targeting the two steps of continuous action recognition, namely feature extraction and description, and action classification, this dissertation proposes combining deep networks with probabilistic graphical models. In contrast to the traditional approach of constructing features by hand, it uses deep networks to learn spatio-temporal action features automatically, so that the features are learned in a data-driven way and adapt better to the specific application. For action classification, it studies sequence modeling methods for continuous action based on probabilistic graphical models. Mathematical models are built with probabilistic graphical models such as the latent dynamic conditional random field (LDCRF) and the hidden Markov model (HMM) to describe the dynamic process within an action and the transitions between actions, so that actions are segmented and recognized simultaneously. In addition, an integrated model of the deep network and the probabilistic graphical model is built. The integrated model is optimized as a whole during learning and incorporates action feature learning and continuous action recognition in a unified way. In detail, the main innovations and achievements of this dissertation are as follows.

(1) We propose an object tracking method based on the convolutional restricted Boltzmann machine, which learns object features automatically in a data-driven way and improves the accuracy and robustness of object tracking. Object tracking is a fundamental problem for many computer vision applications. This dissertation presents a framework and procedure for object tracking that adopts the naive Bayes classifier. To cope with complex factors such as viewpoint variation, illumination change, occlusion, and deformation, it focuses on the key element, the object feature, and uses a convolutional restricted Boltzmann machine to learn features automatically. The convolutional restricted Boltzmann machine has a convolutional structure suited to images, and it learns a number of feature extractors under a sampling strategy in an unsupervised way. After the features are extracted, a max-pooling operation follows, which yields discriminative and robust object features. The experimental results show that, compared with traditional tracking methods, our method tracks more accurately and robustly under the interference of complicated factors.

(2) We propose a continuous action recognition method based on the CNN-LDCRF model, which achieves feature learning and action modeling with strongly labeled samples. The supervised CNN-LDCRF model is proposed for strongly labeled samples, which contain the label and location of each action. First, we design and construct a 3D CNN to extract motion information from the video. The network extracts 3D spatio-temporal features from the spatial and temporal domains through 3D convolution kernels. Data from the raw pixel, gradient, and optical flow channels are fed into the 3D CNN, which both separates the information of the spatial and temporal dimensions and adds prior knowledge about the features. Second, we adopt the LDCRF for the sequence modeling of continuous action. The LDCRF model can learn dynamic transitions both between action primitives and between actions. Finally, the CNN and LDCRF are integrated under a unified framework, forming a seamless network. The CNN-LDCRF network is trained end-to-end, so that feature learning and action modeling are optimized as a whole and the feature learning ability of the CNN and the dynamic modeling ability of the LDCRF are used to the fullest extent. The experimental results show that the CNN features are superior to traditional handcrafted features and that the CNN-LDCRF model achieves better results in continuous action recognition.

(3) We propose a continuous action recognition method based on a hybrid CNN-HMM model, which achieves feature learning and action modeling with weakly labeled samples. Because strongly labeled samples are difficult to obtain, we label the samples weakly and study the unsupervised learning problem under weakly labeled samples. We use the proposed 3D CNN to learn action features and model the continuous action with the HMM, a generative model capable of unsupervised learning. A corresponding HMM is built for each action, and the continuous action is described by composing these HMMs. To learn from weakly labeled samples, a hybrid CNN-HMM model is constructed in which the CNN replaces the Gaussian mixture model for modeling the emission probability of the HMM. The HMM estimates the optimal action label sequence for each weakly labeled sample and thereby labels the sample automatically. The labeled sample can then be used to train the CNN. The CNN and HMM are interdependent and are trained alternately to achieve overall optimization. The experimental results show that the CNN-HMM model can be trained effectively with weakly labeled samples and achieves better results in continuous action recognition.

(4) We propose an end-to-end training method for the CNN-LDCRF model under weakly labeled samples, which solves the learning problem of the CNN-LDCRF model in that setting. We introduce the ECTC algorithm on top of the CNN-LDCRF model by connecting an ECTC layer to the top of the CNN-LDCRF network. For a weakly labeled sample, the ECTC layer efficiently evaluates all alignments between the input video and the label sequence by dynamic programming. During this evaluation, the ECTC layer takes the visual similarity between video frames into account, so that frames with similar visual features receive consistent class labels. We derive the backpropagation procedure: the errors of the objective function are propagated back from the ECTC layer to the CNN-LDCRF network and update its parameters, so that the CNN-LDCRF model is trained end-to-end. The experimental results show that the CNN-LDCRF model can be trained effectively end-to-end under weakly labeled samples and achieves satisfactory performance in continuous action recognition.
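
The idea of segmenting and recognizing actions simultaneously with a sequence model can be illustrated with a small Viterbi decoder. This is only a sketch under strong assumptions, not the dissertation's implementation: each action is reduced to a single state (the models above use several hidden states per action), the per-frame probabilities are made-up numbers standing in for the output of a feature network, and the names `viterbi`, `path_to_segments`, `stay`, and `switch` are hypothetical.

```python
import math


def viterbi(frame_log_scores, log_trans, log_prior):
    """Return the most likely state path given per-frame log scores."""
    n_states = len(log_prior)
    # delta[s]: best log score of any path that ends in state s at the current frame
    delta = [log_prior[s] + frame_log_scores[0][s] for s in range(n_states)]
    backptr = []
    for scores in frame_log_scores[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s, taking the transition score into account
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][s] + scores[s])
        delta = new_delta
        backptr.append(ptr)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def path_to_segments(path, labels):
    """Collapse a frame-level path into (label, start_frame, end_frame) segments."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((labels[path[start]], start, t - 1))
            start = t
    return segments


# Hypothetical two-action example; all numbers are invented for illustration.
labels = ["walk", "wave"]
stay, switch = 0.9, 0.1  # assumed transition probabilities
log_trans = [[math.log(stay), math.log(switch)],
             [math.log(switch), math.log(stay)]]
log_prior = [math.log(0.5), math.log(0.5)]
# Assumed per-frame class probabilities; frame 2 is a noisy frame favoring "wave".
frame_probs = [[0.8, 0.2], [0.9, 0.1], [0.4, 0.6], [0.7, 0.3],
               [0.2, 0.8], [0.1, 0.9], [0.2, 0.8]]
frame_log_scores = [[math.log(p) for p in row] for row in frame_probs]

path = viterbi(frame_log_scores, log_trans, log_prior)
segments = path_to_segments(path, labels)
print(segments)  # [('walk', 0, 3), ('wave', 4, 6)]
```

A frame-by-frame argmax would label frame 2 as "wave" and split the first action into three pieces. The transition scores penalize such short switches, so the decoder returns two clean segments, and the segment boundaries and the action labels come out of the same decoding pass.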
Keywords/Search Tags: continuous action recognition, deep learning, convolutional neural network, probabilistic graphical model, conditional random field, hidden Markov model