
Multiple Domain Knowledge Based Deep Convolutional Neural Networks For Action Recognition

Posted on: 2021-05-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Tasweer Ahmad
GTID: 1488306464482244
Subject: Information and Communication Engineering
Abstract/Summary:
Human action recognition has numerous practical applications in computer vision, video surveillance and retrieval, and the entertainment industry. With the recent advent of deep learning and convolutional neural networks (CNNs), action recognition has achieved great success, and a variety of deep learning techniques have been proposed for it. The task remains challenging because it requires an efficient spatio-temporal representation. Moreover, it is practical to identify the most relevant features and to incorporate knowledge from multiple domains. This dissertation proposes three new methods to address these issues.

The first method uses multiple-domain knowledge (raw RGB, pose and skeleton) with a residual-attention network to extract the most relevant features from the input video frames. Path-signature features are then used to encode the spatio-temporal information for a convolutional neural network.

The second method introduces carefully designed attention-joints that emphasize the most relevant joints of the body skeleton. To capture spatial information, each attention-joint is encoded by its distance from the center of the body and by its distances to neighboring joints. Meanwhile, the flow of the attention-joints across consecutive frames provides the temporal details. These spatio-temporal details are combined into attention-node feature vectors and fed to a graph convolutional network, which performs the classification.

The third method applies graph sparsification to skeleton-based action recognition. A long-term spatio-temporal graph contains spatial and temporal information simultaneously, but it also inherently contains redundant information, which leads to over-fitting. Therefore, graph sparsification based on edge effective-resistance modeling is proposed to obtain a sparse graph with fewer nodes and edges. A graph convolutional neural network with self-attention graph pooling is then designed to emphasize the local graph structure for action classification.

The proposed models are evaluated on challenging action recognition datasets, including J-HMDB, HMDB-51, UCF-101, Stanford-40 Action, PKU-MMD, NTU RGB+D, NTU RGB+D-120, Kinetics-Skeleton and UTD-MHAD. These datasets contain a variety of material, such as YouTube videos, multi-camera videos and action images, and the proposed architectures achieve state-of-the-art performance compared with contemporary methods.
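The path-signature encoding used in the first method can be illustrated with a minimal sketch. The abstract does not specify the truncation level, which trajectories are encoded, or how the signature is laid out as CNN input, so the code below assumes a joint trajectory given as a list of points and computes its signature truncated at level 2, combining per-segment signatures with Chen's identity. All function names (`segment_signature`, `chen_product`, `signature_level2`, `signature_features`) are hypothetical, and the flattened feature layout is an assumption.

```python
from typing import List, Sequence, Tuple

Level1 = List[float]
Level2 = List[List[float]]
Signature = Tuple[Level1, Level2]


def segment_signature(delta: Sequence[float]) -> Signature:
    """Signature (levels 1 and 2) of a single straight-line segment with increment delta."""
    d = len(delta)
    level1 = [float(x) for x in delta]
    # For a linear segment the level-2 term is (delta outer delta) / 2.
    level2 = [[delta[i] * delta[j] / 2.0 for j in range(d)] for i in range(d)]
    return level1, level2


def chen_product(a: Signature, b: Signature) -> Signature:
    """Chen's identity: signature of path a followed by path b, truncated at level 2."""
    a1, a2 = a
    b1, b2 = b
    d = len(a1)
    level1 = [a1[i] + b1[i] for i in range(d)]
    level2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(d)] for i in range(d)]
    return level1, level2


def signature_level2(path: Sequence[Sequence[float]]) -> Signature:
    """Level-2 truncated signature of the piecewise-linear path through the given points."""
    d = len(path[0])
    # Start from the signature of a constant path (all zeros beyond level 0).
    sig: Signature = ([0.0] * d, [[0.0] * d for _ in range(d)])
    for p, q in zip(path, path[1:]):
        delta = [q[k] - p[k] for k in range(d)]
        sig = chen_product(sig, segment_signature(delta))
    return sig


def signature_features(path: Sequence[Sequence[float]]) -> List[float]:
    """Flatten levels 1 and 2 into one vector (hypothetical layout for a CNN input)."""
    level1, level2 = signature_level2(path)
    return level1 + [v for row in level2 for v in row]
```

For the path `[(0, 0), (1, 0), (1, 1)]` the sketch gives level 1 `[1.0, 1.0]` (the total displacement) and level 2 `[[0.5, 1.0], [0.0, 0.5]]`. The antisymmetric part, `(1.0 - 0.0) / 2 = 0.5`, is the Lévy area between the path and its chord, which is the kind of ordering information that plain displacements cannot capture.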
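The attention-node encoding of the second method can be sketched as follows. The abstract names the three ingredients (distance from the center of the body, distances to neighboring joints, and flow across consecutive frames) but not their exact formulas, so this sketch makes several assumptions: the body center is the mean of all joints in a frame, neighbor distances are averaged so that every node vector has the same length, and the flow is the displacement of each attention-joint since the previous frame. The `attention_joints` indices and the `neighbors` mapping are hypothetical inputs, not the dissertation's actual joint selection.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def attention_node_features(
    frames: Sequence[Sequence[Point]],
    attention_joints: Sequence[int],
    neighbors: Dict[int, Sequence[int]],
) -> List[List[List[float]]]:
    """Return features[t][a] = [d_center, mean_neighbor_dist, flow_0, flow_1, ...]."""
    features = []
    for t, joints in enumerate(frames):
        dim = len(joints[0])
        # Assumption: the body center is the mean position of all joints in this frame.
        center = [sum(p[k] for p in joints) / len(joints) for k in range(dim)]
        frame_feats = []
        for j in attention_joints:
            # Spatial cue 1: distance of the attention-joint from the body center.
            d_center = math.dist(joints[j], center)
            # Spatial cue 2: distances to neighboring joints, averaged (assumption).
            d_neigh = [math.dist(joints[j], joints[n]) for n in neighbors[j]]
            mean_neigh = sum(d_neigh) / len(d_neigh) if d_neigh else 0.0
            # Temporal cue: displacement since the previous frame (zero for the first frame).
            prev = frames[t - 1][j] if t > 0 else joints[j]
            flow = [float(joints[j][k] - prev[k]) for k in range(dim)]
            frame_feats.append([d_center, mean_neigh] + flow)
        features.append(frame_feats)
    return features
```

Each `features[t]` is a fixed-length vector per attention-joint, which is the shape a graph convolutional network expects as node input; the graph convolution and classifier themselves are not shown here.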
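The edge effective-resistance idea behind the third method can be illustrated on a small weighted graph. The effective resistance of an edge (u, v) is the voltage difference between u and v when a unit current enters at u and leaves at v, i.e. R_uv = (e_u - e_v)^T L^+ (e_u - e_v), where L is the graph Laplacian; edges with a large score w_uv * R_uv (bridges, for example) carry connectivity that no other path provides. The abstract does not describe how the dissertation turns these resistances into a sparse graph, so the sketch below swaps in a deterministic simplification of Spielman and Srivastava's sampling scheme: it simply keeps the `keep` edges with the largest w * R score. It assumes a connected graph, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def laplacian(n: int, edges: Sequence[Edge]) -> List[List[float]]:
    """Weighted graph Laplacian L = D - W as a dense matrix."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def effective_resistance(n: int, edges: Sequence[Edge], u: int, v: int) -> float:
    """Solve L x = e_u - e_v with node 0 grounded (x_0 = 0); then R_uv = x_u - x_v."""
    L = laplacian(n, edges)
    reduced = [row[1:] for row in L[1:]]  # invertible when the graph is connected
    b = [0.0] * n
    b[u] += 1.0
    b[v] -= 1.0
    x = [0.0] + solve(reduced, b[1:])
    return x[u] - x[v]


def sparsify_top_k(n: int, edges: Sequence[Edge], keep: int) -> List[Edge]:
    """Keep the `keep` edges with the largest w * R score (deterministic simplification)."""
    scored = [(w * effective_resistance(n, edges, u, v), (u, v, w)) for u, v, w in edges]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [edge for _, edge in scored[:keep]]
```

For a unit-weight triangle with one pendant edge, each triangle edge has R = 2/3 (one unit resistor in parallel with a path of two) and the pendant bridge has R = 1, so the bridge is the first edge kept. By Foster's theorem the scores of a connected graph always sum to n - 1 (here 3), which is why w * R behaves like a normalized edge-importance measure. Solving one dense system per edge is only for illustration; it does not scale to large spatio-temporal graphs.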
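Self-attention graph pooling, as named in the third method, is commonly formulated (following the SAGPool layer of Lee et al.) in three steps: score every node with a one-channel graph convolution, keep the top-ranked fraction of nodes, and gate the retained features by their scores. The sketch below follows that general formulation with a fixed projection vector `theta` standing in for learned weights. The pooling ratio, the `tanh` activation and the function name `sag_pool` are assumptions, since the abstract does not give the exact layer used in the dissertation.

```python
import math
from typing import List, Sequence, Tuple


def sag_pool(
    x: Sequence[Sequence[float]],
    adj: Sequence[Sequence[float]],
    theta: Sequence[float],
    ratio: float = 0.5,
) -> Tuple[List[int], List[List[float]], List[List[float]]]:
    """Return (kept node indices, score-gated features, induced adjacency)."""
    n = len(x)
    # Add self-loops, A_hat = A + I, and compute degrees for symmetric normalization.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # One-channel graph convolution: project each node's features to a scalar with theta ...
    proj = [sum(f * t for f, t in zip(x[i], theta)) for i in range(n)]
    # ... then aggregate: score_i = tanh(sum_j A_hat[i][j] * proj[j] / sqrt(d_i * d_j)).
    scores = [
        math.tanh(sum(a_hat[i][j] * proj[j] / math.sqrt(deg[i] * deg[j]) for j in range(n)))
        for i in range(n)
    ]
    # Keep the ceil(ratio * n) highest-scoring nodes, listed in their original order.
    k = max(1, math.ceil(ratio * n))
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    # Gate the retained features by their attention scores and take the induced subgraph.
    x_pool = [[f * scores[i] for f in x[i]] for i in keep]
    adj_pool = [[adj[i][j] for j in keep] for i in keep]
    return keep, x_pool, adj_pool
```

Because the score of each node is computed through the graph convolution, it depends on the node's neighborhood as well as on its own features. This is how such a layer can emphasize local graph structure while halving (for `ratio=0.5`) the number of nodes passed to the next layer.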
Keywords/Search Tags: Multiple domain knowledge, Spatio-temporal representation, Residual attention network, Convolutional neural network