Research On Pose Adaptive Human Action Recognition

Posted on: 2020-01-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Dong
Full Text: PDF
GTID: 1368330626450312
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Human action recognition in static images is one of the main research fields in computer vision. Its goal is to recognize what a person is doing, given that person's bounding box. There are two main kinds of recognition methods: one treats human action recognition as a general image classification problem, without considering the specific characteristics of action recognition; the other models the key elements related to human action (such as human pose, objects, and scene) to capture effective features for action recognition. The emergence of deep convolutional neural networks has improved the performance of many computer vision tasks and has allowed human action recognition in static images to meet the requirements of many real applications. However, because of viewpoint changes, the variety of poses people adopt when performing the same action, and other factors, the appearance of a human action projected onto a 2D static image is diverse, which makes recognition very difficult.

This dissertation addresses the problem from three aspects. The first treats it as a general image classification problem and uses sparse coding and dictionary learning to enhance the representation ability of local features. The second models the scene and pose information related to human action within a deep learning framework, which further improves the representation ability of image features. The third uses human action videos to add dynamic information to the recognition model for static images and to enrich the image representation. Finally, the dissertation studies how to combine this topic with real applications and make it applicable to mobile devices. The main work and contributions are summarized as follows:

(1) In traditional dictionary learning algorithms, a Principal Component Analysis (PCA) feature computed from spatial pyramid features is used to represent the whole image, which loses some detailed information and local discriminative ability. To address this problem, two schemes are proposed. The first is a cascade dictionary learning algorithm: its first stage is a standard dictionary learning and sparse coding algorithm, and its second stage is a discriminative block and group dictionary learning method whose inputs are the matrix form of the spatial pyramid features built from the sparse codes of the first stage. The second is a supervised dictionary learning and supervised sparse coding algorithm based on local descriptors. Both the dictionary learning stage and the sparse coding stage take label information as input, which makes the sparse codes of the local descriptors more discriminative. To obtain effective classifiers, a discriminative weighting model based on the max-margin criterion is proposed. This model is solved efficiently by incorporating it into the multiple-kernel learning framework.

(2) The 2D coordinates of human keypoints in static images cannot be used directly for human action recognition. To solve this problem, two methods based on deep convolutional neural networks are proposed that take human pose as auxiliary information to improve human action recognition in static images. The first uses human pose estimation to locate the key regions where the person interacts with the context. The detected regions and the person's bounding-box region are all fed into an end-to-end convolutional neural network for feature extraction and feature fusion. The second trains a network for the general image classification task that takes the human pose estimation network as its main network and the parameters of the pose estimation network as its initialization. This model provides information complementary to the first.

(3) Static images lack dynamic information for action recognition. To solve this problem, a knowledge-transfer model based on convolutional neural networks is proposed to transfer the knowledge in the RGB space and the flow space of videos to the recognition model for static images. By leveraging a generative structure, a single static image can generate discriminative features of both RGB sequences and flow sequences. A weighted loss combining a reconstruction loss and a classification loss is introduced to guide the generation process. This model enriches the representation of static images and endows the recognition model for static images with some dynamic information.

(4) For applications on embedded devices, a deep network compression method based on feature map reconstruction is proposed for human pose estimation. A teacher-student structure is adopted in which a student network with fewer parameters reconstructs the output of a teacher network with more parameters. An effective multi-stage training scheme is also proposed, with which the parameters and operations of the network are greatly reduced while the performance of human pose estimation remains nearly unchanged. The human action recognition network is likewise compressed through model distillation. Both human pose estimation and human action recognition are successfully applied in intelligent photography.
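
Contribution (3) mentions a weighted loss built from a reconstruction loss and a classification loss, but the abstract does not give its form. The sketch below is only one plausible reading: it assumes a convex combination controlled by a hypothetical weight `alpha`, mean squared error as the reconstruction term, and softmax cross-entropy as the classification term. All function and parameter names are illustrative and are not taken from the dissertation.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    if len(pred) != len(target):
        raise ValueError("feature vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def weighted_loss(generated, video_feat, logits, label, alpha=0.5):
    """Assumed form: alpha * reconstruction + (1 - alpha) * classification.

    generated  -- features produced from the static image (hypothetical)
    video_feat -- target features from the RGB or flow stream (hypothetical)
    logits     -- action-class scores predicted from the generated features
    alpha      -- hypothetical trade-off weight in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reconstruction = mse(generated, video_feat)
    classification = cross_entropy(logits, label)
    return alpha * reconstruction + (1.0 - alpha) * classification
```

Setting `alpha=1.0` reduces this to pure feature reconstruction, and `alpha=0.0` to pure classification; the weighting actually used in the dissertation may differ.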
Keywords/Search Tags: action recognition in static images, dictionary learning, deep neural network, human pose estimation, knowledge-transfer