Video Action Analysis And Recognition Based On Middle Level Semantical Representation

Posted on:2022-06-17

Degree:Master

Type:Thesis

Country:China

Candidate:D Xia

Full Text:PDF

GTID:2518306494986419

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

Video action recognition is a foundation of video analysis and understanding; it aims to identify the category of human behavior shown in a video. Traditionally, action recognition has been treated as a high-level video classification problem. Such approaches, however, ignore the detailed, middle-level understanding of human actions and therefore cannot handle intra-class and inter-class variations well. To fill this gap, this thesis investigates video action analysis and recognition based on middle-level semantic representation, explicitly encoding human actions as spatio-temporal compositions of body-part actions. This gives a better understanding of human actions and improves the performance of video action recognition methods.

Specifically, this thesis proposes a progressive action graph network that recognizes human actions in a bottom-up manner: it progressively assembles body parts, their gestures, and relevant objects into a compositional human representation, and then exploits the spatio-temporal relations of humans across frames for action recognition. The network consists of a middle-level semantic action graph module and a high-level semantic action graph module. First, the middle-level module assembles body parts into middle-level semantic action representations, taking interacting objects and other body parts into account. Then, the high-level module assembles these middle-level representations into human representations according to the natural structure of the human body, and learns the spatio-temporal relations of humans across frames to form high-level semantic action representations. Finally, the high-level semantic action representations are fused with video features extracted by a video feature extractor to perform video action recognition.

The experiments cover three settings. First, on the fully supervised Explain Action dataset, this thesis validates that the network boosts the performance of existing action recognition methods through action composition. Next, it proposes a way of using the network in a semi-supervised setting and validates that the network still improves action recognition performance when the data are only partially annotated or not annotated at all. Finally, it shows that the network achieves strong performance on unseen actions in a few-shot setting, which demonstrates its generalization ability.
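
The bottom-up pipeline described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract gives no layer definitions, so every learned graph operation is replaced here by plain averaging, and all names (`mid_level`, `high_level`, `fuse_with_video`, the `parts`/`objects` keys) are hypothetical. It only shows the order of the stages: body parts gather context from other parts and objects, parts are pooled into one human node per frame, human nodes are related across neighbouring frames, and the result is fused with a video feature.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mid_level(parts: List[Vector], objects: List[Vector]) -> List[Vector]:
    """Update each body-part node with context from other parts and objects.

    Assumption: averaging stands in for the learned message passing.
    """
    updated = []
    for i, part in enumerate(parts):
        context = [p for j, p in enumerate(parts) if j != i] + objects
        ctx = mean(context) if context else part
        updated.append(mean([part, ctx]))
    return updated


def high_level(frames_parts: List[List[Vector]]) -> Vector:
    """Pool parts into one human node per frame, then relate frames in time.

    Assumption: all parts attach to a single human node, and each frame
    only exchanges information with its immediate temporal neighbours.
    """
    humans = [mean(parts) for parts in frames_parts]
    related = []
    for t in range(len(humans)):
        neighbours = humans[max(0, t - 1): t + 2]  # previous, current, next
        related.append(mean(neighbours))
    return mean(related)  # one video-level action representation


def fuse_with_video(frames: List[Dict[str, List[Vector]]],
                    video_feature: Vector) -> Vector:
    """Run both stages and concatenate the result with a video feature."""
    mid = [mid_level(f["parts"], f["objects"]) for f in frames]
    return high_level(mid) + video_feature  # list concatenation as fusion
```

In a real model the fused vector would be passed to a classifier; concatenation is used here only because the abstract does not say how the fusion is performed.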

Keywords/Search Tags:

deep learning, action recognition, action composition, middle level semantical representation

Related items

1	Online Human Action Analysis Based On Deep Learning
2	Research On Action Recognition Method Based On Feature Representation
3	Human Action Recognition Based On Feature Representation And Attribute Mining
4	Research On Action Recognition In Videos
5	Research On Video Action Recognition Method Based On Visual Representation And Deep Neural Networks
6	Research On Temporal Action Detection And Action Recognition Based On Deep Learning
7	Research On Visual Human Action Recognition
8	Research And Implementation Of Video Action Detection Task Based On Deep Learning
9	Research On Human Action Recognition In Videos
10	Analyzing And Understanding Human Actions In Videos