
Research On Spatio-Temporal Feature Based Human Action Recognition

Posted on: 2021-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Miao
Full Text: PDF
GTID: 2518306308968769
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of artificial intelligence, computer vision has drawn increasing attention because of its wide range of applications. Human action recognition, a research hotspot in computer vision, uses computers in place of human eyes to recognize human actions in videos. It is applied in many scenarios, including video understanding, human-computer interaction, and intelligent surveillance. Most human action recognition methods work on RGB videos collected by 2D cameras and skeleton data captured by 3D cameras, both of which contain rich spatio-temporal features. Designing a model that extracts these spatio-temporal features sufficiently and accurately is therefore key to improving recognition accuracy. The specific contents of the thesis are as follows:

First, to address inaccurate spatial feature extraction caused by complex backgrounds and moving cameras in RGB video, the thesis proposes a Pose Mask Spatio-temporal Network (PM-STN). For spatial feature extraction, PM-STN fuses a Pose Mask with the original image so that the network focuses on the key spatial features of the human body, which improves the accuracy of its feature extraction. For temporal feature extraction, the thesis studies how different temporal network structures affect the Pose Mask and designs an architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) to fully exploit its spatio-temporal feature extraction ability. Experimental results on multiple benchmarks show that PM-STN achieves state-of-the-art performance in human action recognition.

Second, existing spatio-temporal feature extraction methods for 3D skeletons are limited to local features and therefore lack the ability to represent high-level features. To solve this problem, the thesis proposes a temporal-aware graph convolution network. For spatial feature extraction, an improved global human topology representation enhances the network's ability to extract high-level spatial features. For temporal feature extraction, the thesis introduces a global memory unit that expands the receptive field and selectively extracts temporal features from skeleton sequences, compensating for the lack of high-level feature extraction. Experiments on an open dataset show that the method achieves higher accuracy than state-of-the-art methods.
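The two spatial ideas named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives no formulas, so the square mask around each keypoint, the `background_weight` parameter, and the function names `fuse_pose_mask` and `graph_conv` are all assumptions made for illustration. The graph convolution shown is the standard symmetrically normalized form with self-loops, not the improved global topology or the global memory unit that the thesis proposes.

```python
import numpy as np


def fuse_pose_mask(frame, keypoints, radius=2, background_weight=0.2):
    """Down-weight pixels that are far from every pose keypoint.

    frame: (H, W, 3) image array; keypoints: iterable of (row, col) pairs.
    The square mask shape and the weighting rule are illustrative
    assumptions, not the fusion rule used by PM-STN.
    """
    frame = np.asarray(frame, dtype=np.float64)
    height, width = frame.shape[:2]
    # Start with a uniformly dimmed background.
    mask = np.full((height, width), background_weight)
    for row, col in keypoints:
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        mask[r0:r1, c0:c1] = 1.0  # keep full intensity near the body
    # Broadcast the 2D mask over the colour channels.
    return frame * mask[..., None]


def graph_conv(features, adjacency, weight):
    """One graph convolution step: D^-1/2 (A + I) D^-1/2 X W.

    features: (joints, in_dim); adjacency: (joints, joints);
    weight: (in_dim, out_dim).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return a_norm @ features @ weight


# Toy inputs: an 8x8 frame with one keypoint, and a 3-joint chain skeleton.
fused = fuse_pose_mask(np.ones((8, 8, 3)), [(4, 4)])
chain = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
joint_features = graph_conv(np.ones((3, 2)), chain, np.ones((2, 4)))
```

In this sketch each joint only aggregates features from its direct neighbours, which is the local behaviour the abstract identifies as a limitation of existing skeleton-based methods.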
Keywords/Search Tags: human action recognition, spatio-temporal feature, graph convolutional network, long short-term memory