Research On Weakly Supervised Human Action Analysis Based On Deep Learning

Posted on: 2022-03-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Zheng
Full Text: PDF
GTID: 1488306734479394
Subject: Signal and Information Processing
Abstract/Summary:
With the spread of monitoring equipment, our daily activities are increasingly recorded by cameras, which has created a large amount of visual content. In addition, with the growth of short-video social platforms and video websites, the amount of visual content on the Internet is increasing at an exponential rate. Analyzing this volume of content by manpower alone is impossible, so using computer vision to analyze human actions automatically is of practical value. With advances in computing and machine learning, deep learning has been widely applied in computer vision, and human action analysis based on deep learning has made great progress. However, training deep models requires supervision information that is extremely costly to obtain, which makes research on weakly supervised human action analysis highly significant. Although weakly supervised human action analysis based on deep learning has made some progress, the following problems remain: 1) unsupervised discriminative representation learning for actions; 2) weakly supervised learning of spatial clues for actions; 3) weakly supervised discriminative representation learning for abnormal events. In view of these problems, this dissertation studies the theory and methods of weakly supervised human action analysis from four aspects. The main research content and contributions are as follows:

(1) Surrogate-label-based unsupervised deep learning for human action categorization. It is very difficult to learn discriminative deep representations for action recognition in still images without the supervision of specific action categories. This dissertation therefore builds a training dataset with surrogate labels from an unlabeled dataset, and then learns discriminative representations by alternately updating the CNN parameters and the surrogate training dataset in an iterative manner. Extensive experiments are conducted on multiple datasets. The proposed method achieves a 59.7% NMI score on the Stanford40 dataset, which demonstrates its effectiveness.

(2) Spatial-attention-based visual semantic learning for action recognition. To learn action-related spatial semantics in still images under the supervision of action labels only, this dissertation proposes a spatial attention layer and a region selection strategy that learn action-specific semantic parts. Moreover, to integrate the information of the scene and the action-specific semantic parts, it uses two feature attention layers to create fusion weights for them and learn discriminative representations. In extensive experiments, the proposed method achieves mAP scores of 93.0% and 94.1% on the Stanford40 and Willow datasets respectively, which demonstrates its effectiveness.

(3) Weakly supervised spatial clues learning for action recognition. To eliminate the need for strict supervision when learning spatial clues for action recognition in still images, this dissertation proposes a weakly supervised method for learning multiple spatial clues. The method can locate the main human bodies and the action-specific semantic parts with only the supervision of action labels, and it obtains discriminative representations by effectively fusing the learned spatial clues. Extensive experiments are conducted on multiple datasets. The proposed method achieves a 30.5% mAP score on the MPII dataset, which demonstrates its effectiveness.

(4) Temporal-attention-based semantic learning for weakly supervised abnormal event detection. To learn both the temporal locations and the specific categories of abnormal events under the supervision of abnormal event labels only, this dissertation proposes a trilinear attention pooling module and an abnormal discrimination loss. The trilinear attention pooling module finds the video segments that are most likely to contain abnormal events, and the abnormal discrimination loss guides the neural network to learn discriminative representations. On the UCF Crime dataset, the proposed method achieves an AUC score of 76.5% in abnormal event detection and a classification accuracy of 31.5% in abnormal event classification, which demonstrates its effectiveness.
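The unsupervised categorization result in contribution (1) is reported as normalized mutual information (NMI), which compares predicted cluster assignments with ground-truth categories without requiring the cluster ids to match the category ids. The following is a minimal sketch of that metric, not the dissertation's evaluation code. It assumes the arithmetic-mean normalization, 2·I(U;V) / (H(U) + H(V)); the abstract does not state which NMI variant was used, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true_labels, cluster_labels):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))  # co-occurrence counts
    count_u = Counter(true_labels)
    count_v = Counter(cluster_labels)

    mutual_info = 0.0
    for (u, v), c in joint.items():
        # p(u, v) * log( p(u, v) / (p(u) * p(v)) ), written with raw counts
        mutual_info += (c / n) * math.log(c * n / (count_u[u] * count_v[v]))

    denom = entropy(true_labels) + entropy(cluster_labels)
    if denom == 0.0:
        # Both assignments put everything in one group; treat as identical.
        return 1.0
    return 2.0 * mutual_info / denom


if __name__ == "__main__":
    # Hypothetical toy labels: the second list is the first with ids swapped,
    # so the score should be close to 1 despite the different ids.
    print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))
    # A clustering that is statistically independent of the true categories
    # should score close to 0.
    print(normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because NMI is invariant to how cluster ids are numbered, it suits a setting where surrogate labels carry no fixed correspondence to the true action categories.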
Keywords/Search Tags: Deep Learning, Action Recognition, Weakly Supervised Learning, Abnormal Event Detection