
Video Action Analysis And Recognition Based On Middle Level Semantical Representation

Posted on: 2022-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: D Xia
Full Text: PDF
GTID: 2518306494986419
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Video action recognition is a foundation of video analysis and understanding; it aims to identify the category of human behavior in a video. Traditionally, action recognition has been treated as a high-level video classification problem. However, such approaches ignore the detailed, middle-level understanding of human actions and cannot handle intra- and inter-class variations of human actions well. To fill this gap, this thesis investigates video action analysis and recognition based on middle level semantical representation, explicitly encoding human actions as spatio-temporal compositions of body part actions, which yields a better understanding of human actions and improves the performance of video action recognition methods.

Specifically, this thesis proposes a progressive action graph network that recognizes human actions in a bottom-up manner: it progressively assembles body parts, their gestures and relevant objects into a compositional human representation, and subsequently exploits spatio-temporal relations of humans among frames for action recognition. The progressive action graph network is composed of a middle level semantical action graph module and a high level semantical action representation graph module. First, the middle level semantical action graph module assembles body parts into middle level semantical action representations, taking into account interactive objects and other body parts. Then, the high level semantical action representation graph module assembles the middle level semantical action representations into human representations according to the natural structure of the human body, and subsequently learns spatio-temporal human relations among frames to form the high level semantical action representations. Finally, the high level semantical action representations are fused with video features extracted by a video feature extractor to perform video action recognition.

In the experiments, this thesis first validates, on the fully-supervised Explain Action dataset, that the network can boost the performance of existing action recognition methods via action composition. Next, it proposes a method for using the network in a semi-supervised data setting, validating that the network can still boost action recognition performance with partially annotated or unannotated data. Finally, it shows that the network performs well on unseen actions in a few-shot data setting, which demonstrates its generalization ability.
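The bottom-up pipeline described in the abstract (body parts and objects, then per-frame human representations, then relations among frames, then fusion with video features) can be illustrated with a rough NumPy sketch. This is not the thesis's implementation: the attention-style message passing, the mean pooling used to assemble parts into a human representation (the thesis instead follows the natural structure of the human body), the concatenation used for fusion, and all names (`graph_layer`, `progressive_action_graph`, `w_mid`, `w_high`, `w_cls`) and tensor shapes are assumptions made only for illustration.

```python
import numpy as np


def graph_layer(nodes, weight):
    """One round of attention-style message passing on a fully connected graph.

    nodes: (N, D) node features; weight: (D, D) projection (hypothetical).
    """
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])   # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)          # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)              # row-wise softmax
    return np.maximum(nodes + attn @ nodes @ weight, 0.0)  # residual update + ReLU


def progressive_action_graph(parts, objects, video_feat, params):
    """parts: (T, P, D) body-part features per frame; objects: (T, O, D);
    video_feat: (D,) clip-level feature. Returns class logits of shape (C,)."""
    T, P, D = parts.shape
    humans = np.empty((T, D))
    for t in range(T):
        # Middle level: each body part exchanges messages with other parts and objects.
        nodes = np.concatenate([parts[t], objects[t]], axis=0)
        nodes = graph_layer(nodes, params["w_mid"])
        # Assumption: mean-pool the updated part nodes into one human representation.
        humans[t] = nodes[:P].mean(axis=0)
    # High level: relate the per-frame human representations across time.
    humans = graph_layer(humans, params["w_high"])
    action_repr = humans.mean(axis=0)
    # Assumption: fuse with the video feature by concatenation, then classify linearly.
    fused = np.concatenate([action_repr, video_feat])
    return fused @ params["w_cls"]


# Toy usage with random features: 4 frames, 6 body parts, 2 objects, 8-dim features, 5 classes.
rng = np.random.default_rng(0)
T, P, O, D, C = 4, 6, 2, 8, 5
params = {
    "w_mid": rng.normal(scale=0.1, size=(D, D)),
    "w_high": rng.normal(scale=0.1, size=(D, D)),
    "w_cls": rng.normal(scale=0.1, size=(2 * D, C)),
}
logits = progressive_action_graph(
    rng.normal(size=(T, P, D)),
    rng.normal(size=(T, O, D)),
    rng.normal(size=D),
    params,
)
print(logits.shape)  # (5,)
```

In a real system the part, object and video features would come from learned extractors and the weights would be trained end to end; the sketch only shows how the two graph stages could be chained.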
Keywords/Search Tags: deep learning, action recognition, action composition, middle level semantical representation