Multiple Domain Knowledge Based Deep Convolutional Neural Networks For Action Recognition

Posted on:2021-05-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Tasweer Ahmad


GTID:1488306464482244

Subject:Information and Communication Engineering

Abstract/Summary:


Human action recognition has numerous practical applications in computer vision, video surveillance and retrieval, and the entertainment industry. With the recent advent of deep learning and convolutional neural networks, action recognition has achieved great success, and a variety of deep learning techniques have been proposed for it. The task remains challenging because it requires an efficient spatio-temporal representation. Moreover, it is pragmatic to identify the most relevant features and to draw on multiple domain knowledge for action recognition. This dissertation proposes three new methods to address these issues.

In the first method, multiple domain knowledge (raw RGB, pose and skeleton) is combined through a residual attention network in order to extract the most relevant features from the input video frames. Path-signature features are then used to encode the spatio-temporal information for a convolutional neural network.

In the second method, attention-joints are carefully devised to emphasize the most relevant joints of the body skeleton. To capture spatial information, these attention-joints are encoded by their spatial distances from the center of the body and by the distances between neighboring joints. Meanwhile, the flow of the attention-joints across consecutive frames provides the temporal details. These spatio-temporal details are integrated into attention-node feature vectors that are fed to a graph convolutional network, which performs the classification.

The third method uses graph sparsification for skeleton-based action recognition. A long-term spatio-temporal graph contains spatial and temporal information at the same time, but it also inherently contains redundant information, which leads to over-fitting. Therefore, graph sparsification based on edge effective-resistance modeling is proposed to obtain a sparse graph with fewer nodes and edges. A graph convolutional neural network with self-attention graph pooling is then devised to emphasize the local graph structure for action classification.

The proposed models are evaluated on challenging action recognition datasets, including J-HMDB, HMDB-51, UCF-101, Stanford-40 Action, PKU-MMD, NTU RGB+D, NTU RGB+D-120, Kinetics-Skeleton and UTD-MHAD. These datasets contain a variety of data, such as YouTube videos, multi-camera videos and action images, and the proposed architectures achieve state-of-the-art performance compared with contemporary methods.
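The path-signature encoding used in the first method can be illustrated with a minimal sketch. It assumes that a joint trajectory is treated as a piecewise-linear path and computes only the depth-2 truncated signature via Chen's identity. The function name `signature_level2` and the truncation depth are illustrative assumptions; the dissertation's actual signature depth and feature layout are not stated in this abstract.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def signature_level2(path: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Depth-2 truncated signature of the piecewise-linear path through `path`.

    Hypothetical helper for illustration; returns (level-1 terms, level-2 terms).
    """
    if len(path) < 2:
        raise ValueError("need at least two points")
    d = len(path[0])
    s1 = [0.0] * d                        # level 1: total increment per coordinate
    s2 = [[0.0] * d for _ in range(d)]    # level 2: iterated integrals S^{ij}
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity: append a linear segment whose own signature is
        # (delta, delta (x) delta / 2); level 2 must use the *old* level 1.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2
```

For the 2-D path (0,0) → (1,0) → (1,1), this gives level 1 `[1.0, 1.0]` and level 2 `[[0.5, 1.0], [0.0, 0.5]]`. The antisymmetric part `(s2[0][1] - s2[1][0]) / 2 = 0.5` is the signed Lévy area, which captures the order in which the coordinates moved, information that plain displacements discard.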
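The attention-node features of the second method (distance from the center of the body, distances to neighboring joints, and flow across consecutive frames) can be sketched as below. This is a hypothetical layout: the choice of attention-joints, the center joint, the neighbor lists, the function name `attention_node_features` and the feature ordering `[distance to center, mean neighbor distance, flow x, flow y, flow z]` are all assumptions for illustration. The dissertation's selection of attention-joints and its graph convolutional network are not reproduced here.

```python
import math
from typing import Dict, List, Sequence

Joint = Sequence[float]  # assumed (x, y, z) coordinates of one joint


def attention_node_features(
    frames: Sequence[Sequence[Joint]],
    attention_joints: Sequence[int],
    center: int,
    neighbors: Dict[int, Sequence[int]],
) -> List[List[List[float]]]:
    """Per-frame, per-attention-joint feature vectors (hypothetical layout)."""
    feats = []
    for t, joints in enumerate(frames):
        # Flow is zero in the first frame because there is no previous frame.
        prev = frames[t - 1] if t > 0 else joints
        frame_feats = []
        for j in attention_joints:
            # Spatial cue 1: distance from the assumed center-of-body joint.
            d_center = math.dist(joints[j], joints[center])
            # Spatial cue 2: mean distance to the joint's skeleton neighbors.
            nbrs = neighbors.get(j, [])
            d_nbr = (
                sum(math.dist(joints[j], joints[k]) for k in nbrs) / len(nbrs)
                if nbrs
                else 0.0
            )
            # Temporal cue: displacement of the joint since the previous frame.
            flow = [c - p for c, p in zip(joints[j], prev[j])]
            frame_feats.append([d_center, d_nbr, *flow])
        feats.append(frame_feats)
    return feats
```

Each inner list would serve as the feature vector of one graph node; how these vectors are arranged into a graph for the GCN is left open here.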
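Edge effective resistance, which the third method uses for sparsification, can be computed from the graph Laplacian $L$ as $R_{uv} = (e_u - e_v)^\top L^{+} (e_u - e_v)$. The sketch below assumes a connected, weighted, undirected graph and solves $(L + J/n)\,x = e_u - e_v$, which yields the same $x$ as $L^{+}(e_u - e_v)$. It then keeps the `keep` edges with the largest $w_e R_e$. This deterministic top-k rule is a simplified stand-in for effective-resistance sampling (Spielman and Srivastava), not the dissertation's sparsification procedure; `effective_resistances`, `sparsify` and `keep` are hypothetical names.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def effective_resistances(n: int, edges: Dict[Edge, float]) -> Dict[Edge, float]:
    """R_e for every edge of a connected graph with n nodes and weights w_e."""
    # L + J/n is invertible for a connected graph and acts like L^+ on
    # vectors orthogonal to the all-ones vector, such as e_u - e_v.
    a = [[1.0 / n] * n for _ in range(n)]
    for (u, v), w in edges.items():
        a[u][u] += w
        a[v][v] += w
        a[u][v] -= w
        a[v][u] -= w
    res = {}
    for u, v in edges:
        b = [0.0] * n
        b[u], b[v] = 1.0, -1.0
        x = _solve(a, b)
        res[(u, v)] = x[u] - x[v]
    return res


def sparsify(n: int, edges: Dict[Edge, float], keep: int) -> Dict[Edge, float]:
    """Keep the `keep` edges with the largest w_e * R_e (simplified heuristic)."""
    r = effective_resistances(n, edges)
    ranked = sorted(edges, key=lambda e: edges[e] * r[e], reverse=True)
    return {e: edges[e] for e in ranked[:keep]}
```

In a unit-weight triangle with one pendant edge, each triangle edge has resistance 2/3 because it is backed up by a parallel two-edge path, while the pendant (bridge) edge has resistance 1, so it is retained first. For any connected graph, the values $w_e R_e$ sum to $n - 1$ (Foster's theorem). Low-resistance edges are therefore the redundant ones.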

Keywords/Search Tags:

Multiple domain knowledge, Spatio-temporal representation, Residual attention network, Convolutional neural network


Related items

1	Research On Human Action Recognition Based On Convolutional Neural Network
2	Multi-scale 3D Residual Attention Network For Facial Expression Recognition
3	Research On Representation And Reasoning Of Fuzzy Spatio-Temporal Knowledge Based On Description Logics
4	Research On Spatio-Temporal Representation And Reasoning Based On RCC
5	Research On Video Event Recognition Using Deep Network Spatio-temporal Consistency
6	Spatio-Temporal Context-Aware QoS Collaborative Prediction
7	Research On Image Classification Based On Multi-resolution Convolutional Neural Network
8	Research On Image Super-resolution Reconstruction Algorithm Based On Convolutional Neural Network
9	New Machine Translation Models Based On Improved Self-attention Mechanism
10	Research On Representation Of Fuzzy Spatio-temporal Knowledge With Ontology And Construction Method Based On Petri Net