Action Recognition Based On Graph Convolutional Networks

Posted on: 2021-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: R Liu
Full Text: PDF
GTID: 2518306512987739
Subject: Computer technology
Abstract/Summary:
Human action recognition is the task of assigning an action category to a video sequence that contains human motion. The uncertainty and complexity of human behavior make this task difficult. As a key problem in understanding human behavior, human action recognition has broad prospects in applications such as human-computer interaction, video retrieval, video surveillance, and smart homes. Skeleton data has the advantage of being robust to affine transformations, to changes in illumination and environment, and to changes in viewpoint. With the development of depth-sensing equipment, 3D skeleton data has attracted wide attention from researchers. This thesis takes skeleton-based action recognition as its starting point and combines deep learning methods with graph model theory to study it in depth. It focuses on several issues, such as effective and fast extraction of spatial features from the data and the difficulty of modeling similar trajectories. It also designs and implements an action recognition system. Specifically, the work mainly includes the following aspects:

(1) We propose a structure-induced graph convolution model for action recognition that extracts robust spatial features. The skeleton data is first structured into a skeleton graph. Then, to represent the skeleton graph of discrete joints effectively, we design a structured graph convolution module that encodes the partitioned body parts as well as their interactions. Concretely, following the natural divisions of the body, we define a collection of structure-induced intra-part graphs, each representing a specific body part, and perform intra-part graph convolution on them to obtain the internal characteristics of each part. An inter-part graph is then built to model the correlations between different body parts, and inter-part graph convolution is performed on it. After the intra- and inter-part levels of spatial context/coordinate cues are integrated, a convolutional filtering process is applied across time slices so that the temporal dynamics of human motion can be captured. The model is trained and tested in an end-to-end fashion.

(2) We propose a dual-stream structured graph convolution network that models the skeleton and RGB modalities of a video in a unified graph convolution framework. Taking the skeleton joints as the reference stream, appearance features are extracted from the image patches around the human joints in the RGB video, and an appearance graph corresponding to the skeleton graph is constructed. Graph convolution networks with the same architecture are then used to extract high-level skeleton motion features and appearance features. Finally, a dual-modal fusion layer fuses the high-level features of the two modalities so that their advantages complement each other, and the action category of the sample is predicted in an end-to-end manner.

(3) Based on the above work, we design and implement an action recognition system. Its visualization module displays skeleton data and RGB videos. The system calls the corresponding model according to the type of input data to predict the category of each sample. Separate functional modules are designed and implemented for recognizing and displaying a single sample and for recognizing samples in batches.

For the action recognition models proposed in (1) and (2), this thesis conducts detailed experiments and evaluations on four public skeleton-based action recognition datasets. Compared with existing methods, the proposed models achieve state-of-the-art performance, which verifies their effectiveness.
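The intra-part and inter-part convolution described in (1), followed by filtering over time slices, can be illustrated with a small sketch. This is not the thesis's model. The four-joint skeleton, the two-part split, the unweighted mean aggregation, the additive integration of the two levels, and the moving-average temporal filter are all assumptions made for illustration, and the function names `graph_conv`, `structured_conv`, and `temporal_filter` are hypothetical. A real model would use learned weights at each step.

```python
# Toy sketch only: skeleton layout, part split, and fixed (unlearned)
# averaging weights are illustrative assumptions, not the thesis's model.

def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def graph_conv(feats, edges):
    """One unweighted graph-convolution step: every node becomes the mean
    of its own feature (self-loop) and the features of its neighbours."""
    neighbours = {i: [i] for i in range(len(feats))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return [mean_vec([feats[k] for k in neighbours[i]])
            for i in range(len(feats))]

def structured_conv(feats, edges, parts, part_edges):
    """Intra-part convolution on joints, then inter-part convolution on
    pooled part nodes, then integration of both levels per joint."""
    part_of = {j: p for p, joints in enumerate(parts) for j in joints}
    # Intra-part graphs: keep only bones whose two joints share a part.
    intra_edges = [(a, b) for a, b in edges if part_of[a] == part_of[b]]
    intra = graph_conv(feats, intra_edges)
    # Inter-part graph: one node per part (mean-pooled), linked by part_edges.
    pooled = [mean_vec([intra[j] for j in joints]) for joints in parts]
    inter = graph_conv(pooled, part_edges)
    # Integrate: add each part's context to the joints it contains.
    return [[x + c for x, c in zip(intra[j], inter[part_of[j]])]
            for j in range(len(feats))]

def temporal_filter(frames, window=3):
    """Moving average over time for every joint; a fixed-kernel stand-in
    for a learned temporal convolution. frames[t][j] is a feature vector."""
    half = window // 2
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        out.append([mean_vec([frames[s][j] for s in range(lo, hi)])
                    for j in range(len(frames[t]))])
    return out

# Usage: a chain of 4 joints split into two parts, 1-D features per joint.
joints = [[0.0], [2.0], [4.0], [6.0]]
bones = [(0, 1), (1, 2), (2, 3)]
body_parts = [[0, 1], [2, 3]]
frame = structured_conv(joints, bones, body_parts, part_edges=[(0, 1)])
print(frame)  # [[4.0], [4.0], [8.0], [8.0]]
print(temporal_filter([frame, frame, frame]))
```

In this toy run the bone between joints 1 and 2 crosses the part boundary, so it is ignored at the intra-part level, and information flows between the two parts only through the pooled part nodes.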
Keywords/Search Tags: skeleton-based action recognition, graph convolutional network, deep learning, multi-modal model fusion