Action Recognition Using Discriminative Topics And Temporal Structures

Posted on: 2019-12-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T W Wang
Full Text: PDF
GTID: 1368330575969854
Subject: Computer application technology
Abstract/Summary:
Action recognition in videos is a core technology in computer vision. Given training samples and their predefined action categories, the task is to automatically predict the category labels of actions in test videos. Because of its wide application prospects in intelligent video surveillance, human-machine interaction, content-based information retrieval, ambient assisted living and other fields, research on action recognition is of great significance and value. To date, despite considerable research, open problems remain in semantic analysis, long-term dynamic evolution modeling, temporal structure modeling between sub-actions, and hierarchical modeling. This thesis studies action recognition at these four levels and proposes four novel methods.

(1) A method named supervised probabilistic Latent Semantic Analysis (spLSA) is proposed. Probabilistic Latent Semantic Analysis (pLSA) is essentially an unsupervised semantic analysis method. When pLSA and its extensions are used for video action classification, the category labels of the training samples are not fully used during training, so the learned topics lack adequate discriminative power. To learn discriminative topics, spLSA introduces class information into the generation process of words and video samples, and adopts a conditional probability to describe the mapping between latent topics and categories. spLSA is a unified architecture in which latent semantic analysis and the classification of action videos are performed simultaneously. During model fitting, the parameters of spLSA are learned by an Expectation Maximization (EM) algorithm, an iterative procedure that maximizes the expected log-likelihood of the complete data at each iteration. By using category information, spLSA can learn discriminative topics in action videos while preserving the ability of semantic analysis.

(2) A method named Multi-scale Rank Pooling (MSRP) is proposed. This method uses video frames as the basic modeling objects to capture the multi-scale, long-term dynamic evolution in action videos. Most existing methods treat evolution modeling and multi-scale feature fusion as two separate phases, and therefore cannot capture the optimal dynamic evolution. To address this issue, this thesis introduces a temporal multi-scale smoothing vector into the Rank Pooling process. This vector defines how the representations at different temporal scales are combined for frame smoothing. MSRP uses two structural risk minimization methods (regression structural risk and classification structural risk) to optimize the objective function in a joint learning framework, where smoothing vector learning, evolution modeling and classifier training are performed jointly. As a result, MSRP can learn a discriminative and flexible multi-scale representation rather than a single-scale or fixed multi-scale one. In addition, because the multi-scale evolution is modeled in the pooling stage, MSRP learns compact evolution feature vectors whose dimensionality is the same as that of the original features.

(3) A method named Latent Duration Model (LDM) is proposed. LDM is a temporal variant of the Deformable Part Model (DPM) and takes video segments as the basic modeling objects. For each action class, LDM learns a composite template, which includes a root template and several sub-action templates with strict temporal ordering. To enhance the discriminability of the sub-action templates, three types of latent variables are introduced into LDM. Latent duration variables describe intra-class temporal scale variation. Latent location variables and latent representation variables help search for the most discriminative segments within the durations. For temporal structure and relationships, the model takes into account both the temporal ordering and the duration ratio between consecutive parts, which are robust to variations in the motion speed and viewing angle of action videos. Thus, LDM automatically discovers not only discriminative parts with adaptive durations but also robust pairwise relationships.

(4) A hierarchical modeling method is proposed. This method constructs a Dynamic Hierarchical Tree (DHT) for each action video from the bottom up. Unlike existing methods that use only feature vectors to build hierarchies, this thesis takes into account two indicators: the similarity of feature vectors and the compatibility of dynamic evolution modes. This makes the generated tree structures more suitable for describing actions in videos. To ensure that the video segments in leaf nodes are meaningful atomic actions, a modified DTW algorithm with minimum and maximum length constraints is used to segment the action videos initially. The minimum length constraint keeps the motion mode within an atomic action stable, while the maximum length constraint ensures that the atomic action contains only a simple and consistent motion mode. A k-Nearest Neighbors Edge Pairs kernel (kNNEP kernel) is also proposed. Following the idea of "k-nearest neighbors", the kNNEP kernel measures the similarity between two trees by the mean of multiple edge similarities, which effectively reduces the interference of noise nodes with classification performance.

Thorough experiments were carried out on several public datasets, and the experimental results show that the proposed methods outperform related state-of-the-art methods.
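
For readers unfamiliar with the EM fitting mentioned in (1), the following is a minimal sketch of EM for standard, unsupervised pLSA, the baseline that spLSA extends. It is not the thesis's spLSA: the abstract does not give the supervised model's equations, so no class information is used here. The function names (`plsa_em`, `normalize`, `log_likelihood`), the toy count matrix and the small smoothing floor are assumptions made for illustration. The sketch tracks the log-likelihood of the observed counts after each iteration.

```python
import math
import random


def normalize(row):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(row)
    return [x / total for x in row]


def log_likelihood(counts, p_w_z, p_z_d):
    """Log-likelihood of the observed word counts under the current parameters."""
    n_topics = len(p_w_z)
    ll = 0.0
    for d, doc in enumerate(counts):
        for w, n in enumerate(doc):
            if n > 0:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics))
                ll += n * math.log(p)
    return ll


def plsa_em(counts, n_topics, n_iters=30, seed=0):
    """Fit unsupervised pLSA with EM (illustrative baseline, not spLSA).

    counts[d][w] is how often visual word w occurs in video d.
    Returns P(w|z), P(z|d) and the log-likelihood after each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random strictly positive initialisation, normalised into distributions.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    history = []
    for _ in range(n_iters):
        # Tiny floor (an assumption of this sketch) avoids division by zero.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        history.append(log_likelihood(counts, p_w_z, p_z_d))
    return p_w_z, p_z_d, history


if __name__ == "__main__":
    # Hypothetical toy data: 4 videos, 6 visual words.
    toy_counts = [[5, 4, 6, 0, 0, 0], [6, 5, 4, 0, 1, 0],
                  [0, 0, 1, 5, 6, 4], [0, 0, 0, 4, 5, 6]]
    _, topic_mix, ll_history = plsa_em(toy_counts, n_topics=2)
    print(topic_mix[0], ll_history[-1])
```

Nothing in this baseline ties a topic to an action category, which is the gap the abstract says spLSA addresses by adding class information to the generative process.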
Keywords/Search Tags:Action Recognition, Supervised pLSA, Rank Pooling, Evolution Modeling, Multi-scale Representation, Discriminative Segment, Latent Duration, Parts-based Model, Hierarchical Model, Tree Kernel