
Multi-stream Slow Fast Graph Convolutional Networks For Skeleton-based Action Recognition

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Leng
GTID: 2518306557969029
Subject: Electronics and Communications Engineering
Abstract/Summary:
Action recognition is a core problem in computer vision and has been studied for decades. Methods based on RGB image sequences are easily affected by complex backgrounds, whereas skeleton sequences are compact and robust to such interference. With the spread of depth cameras and the development of high-performance human pose estimation algorithms, accurate human skeleton sequences are also becoming increasingly easy to obtain. In recent years, many studies have used Graph Convolutional Networks (GCN) to model skeleton sequences in space and time and to recognize the actions they contain from the resulting features. In the spatial dimension, a skeleton sequence summarizes the current body posture with only a few joints; in the temporal dimension, however, skeleton data still contains a great deal of redundant information. Inspired by the Slow Fast network used to model RGB video, this thesis samples the skeleton data at different rates along the time dimension and proposes a multi-stream Slow Fast graph convolutional network to improve the accuracy of skeleton-based action recognition. The main work of the thesis is as follows:

(1) Commonly used action recognition algorithms are surveyed and analyzed. The research status of RGB video-based methods and skeleton-based methods is introduced, the current state of mainstream deep learning algorithms for skeleton-based action recognition is reviewed, and the pros and cons of existing algorithms are analyzed.

(2) A Slow Fast graph convolutional network is proposed. Data sampled at a high frame rate with a small interval is fed to the slow network to strengthen the extraction of spatial semantics of the action; data sampled at a low frame rate with a large interval is fed to the fast network to strengthen the extraction of temporal semantics. The two networks aggregate their features through a side connection. Because the amount of input data of the fast network and the number of channels of the slow network are both greatly reduced, the Slow Fast graph convolutional network can obtain better spatiotemporal feature extraction than traditional GCN methods while requiring far less computation.

(3) In addition to learning the first-order information of the joints in the skeleton data, the Slow Fast graph convolutional network is extended to learn the second-order information of the joints in space and time, as well as the first-order and second-order information of the bone (edge) data. The result is a multi-stream structure made up of six Slow Fast graph convolutional networks, called Multi-Stream Slow Fast Graph Convolutional Networks (MSSF-GCN). To further strengthen spatiotemporal feature extraction, three attention mechanisms (channel, temporal, and spatial) are also embedded in MSSF-GCN.

Finally, extensive experiments evaluate the proposed method on three skeleton-based action recognition datasets: NTU RGB+D, NTU RGB+D 120, and Skeleton-Kinetics. The results show that the proposed method is effective for skeleton-based action recognition and achieves a clear advantage in recognition accuracy over state-of-the-art methods.
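
The input preparation described in the abstract (sampling one skeleton sequence at two temporal rates, and deriving bone and frame-difference data from the joints) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the 5-joint skeleton, the strides 1 and 4, the function names, and the reading of "second-order" data as bone vectors and frame-to-frame differences are all hypothetical, and the sketch does not try to reproduce the exact six-stream layout of MSSF-GCN.

```python
import numpy as np

# Hypothetical 5-joint skeleton given as (child, parent) index pairs.
# Joint 0 is treated as the root. This layout is NOT taken from the thesis.
BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]


def sample_frames(seq, stride):
    """Keep every `stride`-th frame of a (T, V, C) skeleton sequence."""
    return seq[::stride]


def bone_vectors(seq, bones=BONES):
    """Bone data: child joint minus parent joint; the root's bone stays zero."""
    out = np.zeros_like(seq)
    for child, parent in bones:
        out[:, child] = seq[:, child] - seq[:, parent]
    return out


def temporal_difference(seq):
    """Frame-to-frame difference, zero-padded at the last frame to keep T."""
    out = np.zeros_like(seq)
    out[:-1] = seq[1:] - seq[:-1]
    return out


def build_inputs(seq, dense_stride=1, sparse_stride=4):
    """Return {stream name: (densely sampled, sparsely sampled)} arrays.

    The strides are illustrative assumptions; the abstract does not give
    the sampling intervals actually used.
    """
    bones = bone_vectors(seq)
    streams = {
        "joint": seq,
        "joint_motion": temporal_difference(seq),
        "bone": bones,
        "bone_motion": temporal_difference(bones),
    }
    return {
        name: (sample_frames(x, dense_stride), sample_frames(x, sparse_stride))
        for name, x in streams.items()
    }
```

For a sequence of 16 frames, 5 joints, and 3 coordinates, `build_inputs` returns a densely sampled array of shape `(16, 5, 3)` and a sparsely sampled array of shape `(4, 5, 3)` for each stream. The differences are computed before sampling here so that both rates see the same per-frame motion; computing them after sampling would be an equally plausible choice.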
Keywords/Search Tags:Action recognition, Graph convolutional network, Human skeleton, Slow Fast network, Attention mechanism