
Multi-task Action Recognition Via Integrating With Latent Information

Posted on: 2018-07-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Yang
GTID: 1368330542993495
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Recently, human action recognition has attracted ever-increasing attention in the computer vision community because it can be applied in many areas, such as smart surveillance, human-computer interaction, video analysis, and medical assistance. Over the past few decades, human action recognition has made great progress, and many methods have been reported. However, these existing methods focus only on low-level feature representations, such as silhouettes, optical flow, gradients, spatio-temporal features, and deep features. In addition, some methods have considered action representation and computation from the perspective of brain cognition; such work can provide experimental evidence for research on cognitive behavioral mechanisms and, through experiments, can improve the basic theory of cognitive science. Research on human action recognition therefore has important academic significance and application value.

Compared with object recognition in images, human action recognition is more concerned with the spatio-temporal changes caused by object movement in video. The observation space is extended from 2D to 3D, which greatly increases the difficulty of action representation and recognition. The majority of existing methods deal with monocular RGB video data, which are extremely sensitive to external factors such as illumination changes, viewpoint changes, occlusion, and noisy backgrounds, and this makes their results unsatisfactory. Furthermore, monocular video cameras cannot capture human movement information in a 3D scene, so they are unsuitable for recognition tasks in real-world scenarios. Human action recognition therefore still has many challenging issues to be addressed.

Given limited video data, this thesis takes RGB video and 3D skeleton video as the objects of study, and regards the inherent relationships between features and tasks, as well as among different tasks, as latent information. By developing new multi-task learning frameworks with proper constraints, the impact of this latent information on recognition performance can be explored and exploited. As a result, the ambiguity among different action classes, and especially among similar action classes, can be reduced. The main contributions of this thesis are summarized as follows:

We propose a novel human action recognition method based on a multi-task learning framework with super-categories. We employ the Fisher vector as the action representation, formed by concatenating the gradients of the log-likelihood with respect to the mean and covariance parameters of a Gaussian Mixture Model (GMM). By integrating the explored super-category information as a prior, the multi-task learning framework simultaneously encourages feature sharing within a super-category and feature competition between super-categories. Experimental results show that the proposed method achieves higher accuracy with lower-dimensional features than several state-of-the-art approaches.

Most existing approaches overlook the intrinsic interdependencies between skeleton joints and action classes, and thus suffer from unsatisfactory recognition performance. We present a latent max-margin multi-task learning model for 3D action recognition. Specifically, we exploit skelets, collections of joints at a mid-level granularity, to describe actions. We then apply the learning model to capture the correlations between the latent skelets and the action classes. By leveraging structured-sparsity-inducing regularization, the common information shared within the same class can be discovered from the latent skelets, while the private, class-specific information of different classes is also preserved. Experimental results show that our model consistently outperforms recent state-of-the-art approaches.

We propose a discriminative multi-instance multi-task learning (MIMTL) framework to discover the intrinsic relationship between joint configurations and action classes. First, by regarding an action as a bag and its joint configurations as the instances of that bag, a multi-instance learning model captures a set of discriminative and informative joint configurations for each action class. Then a multi-task learning model with group structure constraints is used to further reveal the intrinsic relationship between the joint configurations and the different action classes. Experimental results show that the proposed MIMTL framework performs favorably compared with several state-of-the-art approaches.
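The Fisher vector encoding named in the first contribution can be sketched as follows. This is a minimal illustration, assuming a diagonal-covariance GMM whose parameters have already been estimated; it uses the commonly used closed-form gradients of the average log-likelihood with respect to the component means and standard deviations (with the usual diagonal Fisher-information normalization) and concatenates them into a 2KD-dimensional vector. The function name `fisher_vector` and the diagonal-covariance choice are assumptions for illustration; the thesis's descriptors, GMM settings, and post-processing are not given in the abstract.

```python
import numpy as np


def fisher_vector(x, weights, means, sigmas):
    """Fisher vector of local descriptors x (T x D) under a diagonal GMM (illustrative sketch).

    weights: (K,) mixture weights summing to 1
    means:   (K, D) component means
    sigmas:  (K, D) per-dimension standard deviations (diagonal covariance)
    Returns a vector of length 2*K*D: [gradients w.r.t. means, gradients w.r.t. sigmas].
    """
    x = np.atleast_2d(x)
    T, D = x.shape
    # Normalized deviations (x_t - mu_k) / sigma_k, shape (T, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # log(w_k * N(x_t; mu_k, diag(sigma_k^2))), shape (T, K).
    log_prob = (np.log(weights)[None, :]
                - 0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * D * np.log(2.0 * np.pi))
    # Posterior responsibilities gamma_t(k), computed stably by subtracting the row max.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradient w.r.t. the means.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff) / (T * np.sqrt(weights)[:, None])
    # Normalized gradient w.r.t. the standard deviations.
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0) / (T * np.sqrt(2.0 * weights)[:, None])
    # Concatenate both gradient blocks into a single action representation.
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])
```

In practice the GMM parameters might be fitted beforehand, for example with scikit-learn's `GaussianMixture(covariance_type="diag")`, passing `weights_`, `means_`, and `np.sqrt(covariances_)`. Power and L2 normalization are often applied to the resulting vector but are omitted here.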
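The abstract does not state which regularizer encourages feature sharing within a super-category and feature competition between super-categories. One common way to express those two effects, assumed here purely for illustration, is an l2,1-style group norm over the classes of each super-category (sharing) combined with an exclusive-lasso-style squared l1 term across super-categories for each feature (competition). The function `super_category_penalty` and its weights `lam_share` and `lam_compete` are hypothetical, not the thesis's formulation.

```python
import numpy as np


def super_category_penalty(W, groups, lam_share=1.0, lam_compete=1.0):
    """Illustrative regularizer on a weight matrix W of shape (D features, C classes).

    groups: list of lists of class indices, one list per super-category (a partition of 0..C-1).
    Sharing term:     sum_{d,g} ||W[d, g]||_2, an l2,1-style group norm that lets the
                      classes of one super-category select the same features together.
    Competition term: sum_d (sum_g ||W[d, g]||_2)^2, an exclusive-lasso-style term that
                      discourages a single feature from serving many super-categories.
    """
    # norms[d, g] = l2 norm of feature d's weights over the classes of super-category g.
    norms = np.stack([np.linalg.norm(W[:, g], axis=1) for g in groups], axis=1)
    share = norms.sum()
    compete = np.sum(norms.sum(axis=1) ** 2)
    return lam_share * share + lam_compete * compete
```

With unit weights, a single feature used by two classes costs about 3.41 when both classes belong to one super-category, but 6.0 when they belong to two different super-categories, which mirrors the sharing-within / competition-between behavior described above.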
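The latent max-margin idea in the second contribution can be sketched in a simplified form: each action class is a task with its own weight vector, the best-scoring skelet of an action is treated as the latent variable, and a one-vs-rest hinge loss is combined with an l2,1 norm over feature dimensions as one example of a structured-sparsity regularizer. The skelet feature vectors, the l2,1 choice, and both function names are assumptions. This penalty only captures information shared across tasks; the class-specific (private) part that the thesis also preserves, and the optimization procedure, are not modeled here.

```python
import numpy as np


def latent_scores(W, skelet_feats):
    """Class scores with latent skelet selection (illustrative sketch).

    W: (C, D) one weight vector per action class (one task per class).
    skelet_feats: (S, D) feature vector of each candidate skelet of one action.
    Returns (scores, chosen): scores[c] = max_s W[c] . phi_s, and the argmax skelet per class.
    """
    all_scores = W @ skelet_feats.T          # (C, S): every class scores every skelet
    chosen = all_scores.argmax(axis=1)       # latent variable: best skelet per class
    return all_scores.max(axis=1), chosen


def latent_mtl_objective(W, actions, labels, lam=0.1):
    """One-vs-rest latent hinge loss summed over classes, plus an l2,1 penalty on W."""
    C = W.shape[0]
    loss = 0.0
    for feats, y in zip(actions, labels):
        scores, _ = latent_scores(W, feats)
        # +1 for the true class task, -1 for every other class task.
        targets = np.where(np.arange(C) == y, 1.0, -1.0)
        loss += np.maximum(0.0, 1.0 - targets * scores).sum()
    # l2,1 norm: sum over feature dimensions d of ||W[:, d]||_2 across all class tasks.
    return loss + lam * np.linalg.norm(W, axis=0).sum()
```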
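The multi-instance step of the MIMTL framework can be illustrated under the standard multi-instance assumption, where an action (the bag) scores as its best-scoring joint configuration (an instance). The top-k selection rule and both function names below are hypothetical stand-ins for the thesis's discriminative selection, and the subsequent group-constrained multi-task model is not shown.

```python
import numpy as np


def bag_score(w, instances):
    """MIL scoring: a bag (one action) scores as its best-scoring instance (joint configuration)."""
    return float(np.max(instances @ w))


def select_discriminative_instances(w, positive_bags, k=1):
    """From each positive bag, keep the k instances that the class model w scores highest.

    This top-k rule is a hypothetical illustration of selecting discriminative configurations.
    """
    selected = []
    for inst in positive_bags:
        top = np.argsort(inst @ w)[::-1][:k]   # indices of the k highest-scoring instances
        selected.append(inst[top])
    return np.vstack(selected)
```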
Keywords/Search Tags: Human action recognition, 3D skeleton, Multi-task learning, Multi-instance learning, Latent information, Mutual information