Action Recognition Using Discriminative Topics And Temporal Structures

Posted on:2019-12-24

Degree:Doctor

Type:Dissertation

Country:China

Candidate:T W Wang

Full Text:PDF

GTID:1368330575969854

Subject:Computer application technology

Abstract/Summary:

Action recognition in videos is a core technology in the field of computer vision. Given training samples and their predefined action categories, the task is to automatically predict the category labels of actions in test videos. Because of its wide application prospects in intelligent video surveillance, human-machine interaction, content-based information retrieval, ambient assisted living and other fields, research on action recognition is of great significance and value. To date, despite considerable research, problems remain in semantic analysis, long-term dynamic evolution modeling, temporal structure modeling between sub-actions, and hierarchical modeling. This thesis studies action recognition at these four levels and proposes four novel methods.

(1) A method named supervised probabilistic Latent Semantic Analysis (spLSA) is proposed. Probabilistic Latent Semantic Analysis (pLSA) is essentially an unsupervised semantic analysis method. When pLSA and its extensions are used for video action classification, the category labels of the training samples are not fully used during training, so the learned topics lack discriminative power. To learn discriminative topics, spLSA introduces class information into the generation process of words and video samples, and uses a conditional probability to describe the mapping between latent topics and categories. spLSA is a unified architecture in which latent semantic analysis and the classification of action videos are performed simultaneously. During model fitting, the parameters of spLSA are learned by an Expectation Maximization (EM) algorithm, an iterative procedure that maximizes the log-likelihood of the complete data. By using category information, spLSA can learn discriminative topics in action videos while preserving the ability of semantic analysis.

(2) A method named Multi-scale Rank Pooling (MSRP) is proposed. This method uses video frames as the basic modeling objects to capture the multi-scale, long-term dynamic evolution in action videos. Most existing methods treat evolution modeling and multi-scale feature fusion as two separate phases and therefore cannot capture the optimal dynamic evolution. To address this issue, this thesis introduces a temporal multi-scale smoothing vector into the Rank Pooling process. This vector defines how the representations at different temporal scales are combined for frame smoothing. MSRP uses two structural risk minimization methods (i.e., regression structural risk and classification structural risk) to optimize the objective function in a joint learning framework, in which smoothing vector learning, evolution modeling and classifier training are performed jointly. As a result, MSRP can learn a discriminative and flexible multi-scale representation rather than a single-scale or fixed multi-scale one. In addition, because the multi-scale evolution is modeled in the pooling stage, MSRP learns compact evolution feature vectors whose dimension is the same as that of the original features.

(3) A method named Latent Duration Model (LDM) is proposed. LDM is a temporal variant of the Deformable Part Model (DPM) and takes video segments as the basic modeling objects. For each action class, LDM learns a composite template, which includes a root template and several sub-action templates with strict temporal ordering. To enhance the discriminability of the sub-action templates, three types of latent variables are introduced into LDM. Latent duration variables describe intra-class temporal scale variation. Latent location variables and latent representation variables help search for the most discriminative segments within the durations. To model temporal structure and relationships, LDM considers both the temporal ordering of consecutive parts and the duration ratio between them, which are robust and flexible with respect to variation in the motion speed and view angle of action videos. Thus, LDM automatically discovers not only discriminative parts with adaptive durations but also robust pairwise relationships.

(4) A hierarchical modeling method is proposed. This method constructs a Dynamic Hierarchical Tree (DHT) for each action video from bottom to top. Unlike existing methods that use only feature vectors to build hierarchies, this thesis takes into account two important indicators: the similarity of feature vectors and the compatibility of dynamic evolution modes, which makes the generated tree structures more suitable for describing actions in videos. To ensure that the video segments in the leaf nodes are meaningful atomic actions, a modified dynamic time warping (DTW) algorithm with minimum-length and maximum-length constraints is used to segment the action videos initially. The minimum-length constraint keeps the motion mode within an atomic action stable, while the maximum-length constraint ensures that the atomic action contains only a simple and consistent motion mode. A k-Nearest Neighbors Edge Pairs kernel (kNNEP kernel) is also proposed. Following the idea of "k-nearest neighbors", the kNNEP kernel measures the similarity between two trees by the mean value of multiple edge similarities, which effectively limits the interference of noise nodes with classification performance.

Thorough experiments on several public datasets show that the proposed methods outperform related state-of-the-art methods.
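
The rank pooling idea behind method (2) can be illustrated with a minimal Python sketch. It is not the thesis's MSRP implementation. The smoothing weights are fixed by hand rather than learned jointly with the classifier, ridge regression on the frame index stands in for the ranking objective, and the function names, scales and weights are hypothetical choices made for illustration.

```python
import numpy as np


def smooth_multiscale(frames, scales, weights):
    """Blend causal moving averages of frame features at several temporal scales.

    frames:  (T, D) array, one feature vector per frame.
    scales:  window lengths in frames (hypothetical values).
    weights: one non-negative weight per scale; normalized here to sum to 1.
             In MSRP this vector is learned; in this sketch it is fixed.
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    T, D = frames.shape
    # Prefix sums let each window mean be computed from two lookups.
    csum = np.vstack([np.zeros((1, D)), np.cumsum(frames, axis=0)])
    out = np.zeros_like(frames)
    for w, s in zip(weights, scales):
        for t in range(T):
            start = max(0, t - s + 1)  # window covers frames start..t
            out[t] += w * (csum[t + 1] - csum[start]) / (t + 1 - start)
    return out


def rank_pool(frames, ridge=1e-3):
    """Fit u so that u . v_t approximates the frame index t.

    The fitted u serves as the video descriptor and has the same
    dimension D as a single frame feature.
    """
    V = np.asarray(frames, dtype=float)
    T, D = V.shape
    t = np.arange(1, T + 1, dtype=float)
    # Closed-form ridge solution of (V^T V + ridge * I) u = V^T t.
    return np.linalg.solve(V.T @ V + ridge * np.eye(D), V.T @ t)


# Toy usage: 40 frames of 8-dimensional, slowly drifting features.
rng = np.random.default_rng(0)
frames = np.cumsum(rng.normal(size=(40, 8)), axis=0)
smoothed = smooth_multiscale(frames, scales=[1, 5, 15], weights=[0.5, 0.3, 0.2])
descriptor = rank_pool(smoothed)
print(descriptor.shape)  # (8,)
```

The descriptor keeps the dimension of the per-frame features, which matches the abstract's remark that modeling multi-scale evolution in the pooling stage yields compact vectors. Replacing the fixed weights with learned ones would require the joint objective described in the thesis, which is not reproduced here.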

Keywords/Search Tags:

Action Recognition, Supervised pLSA, Rank Pooling, Evolution Modeling, Multi-scale Representation, Discriminative Segment, Latent Duration, Parts-based Model, Hierarchical Model, Tree Kernel

Related items

1	Human Action Recognition In Videos Of Realistic Scenes Based On Multi-Scale CNN Feature
2	Research Based On Local Spatiotemporal Features And Parts For Human Action Recognition From Videos
3	Human Action Recognition And Suspected Cheating Behavior Detection Based On Latent SVM
4	Analyzing And Understanding Human Actions In Videos
5	Study On The Skeleton-based Human Action Recognition Model
6	Research On Hierarchical Action Recognition Based On Attention Guidance
7	Research On Low-rank Presentation And Recognition Of Human Actions In Video Sequences
8	Research On Action Prediction In Videos
9	Research On Face Recognition Algorithms Based On Sparseness And Low-rank Constraints
10	Structural Low-rank Representation Based Human Action Recognition