
Action Recognition Method Based On Multi-frequency Spatio-temporal Feature Learning

Posted on: 2022-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Xiao | Full Text: PDF
GTID: 2518306737456534 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, driven by the rapid development of deep learning and the wide range of applications for action recognition algorithms, deep-learning-based action recognition has become a research focus in computer vision. Action recognition differs from image recognition: image recognition mainly concerns the spatial features of a single image, whereas action recognition must learn both the spatial and the temporal features of a video, which places higher demands on the recognition algorithm. However, current deep-learning-based action recognition algorithms tend to favor spatial features when learning spatio-temporal features and lack the ability to model temporal features. In particular, two-stream convolutional neural networks suffer from slow optical flow extraction, while methods based on 3D convolution depend heavily on the spatial features of the video, are easily affected by background noise, and have poor robustness.

To address these problems, we analyze two-stream convolutional neural networks and 3D-convolution-based methods, combine the advantages of both, and propose a multi-frequency spatio-temporal feature extraction method with better generalization. On the one hand, 3D convolution is used to extract spatio-temporal features directly from the video, avoiding the time-consuming extraction of dense optical flow. On the other hand, the two-stream convolutional neural network is extended from the perspective of multiple frequencies to improve the modeling of temporal features and reduce the dependence on spatial features.

The main contributions of this work are as follows:

1. Combining the advantages of two-stream convolutional neural networks and 3D convolution, we propose a multi-frequency spatio-temporal feature learning method. It avoids the time-consuming optical flow extraction of two-stream convolutional neural networks and alleviates the dependence of 3D convolution on spatial information. On the large-scale Moments in Time dataset, top-1 and top-5 accuracy improve by about 1% and 2%, respectively, and the computational cost of the model is about 20 GFLOPs lower than that of the C3D model.

2. We extend multi-frequency spatio-temporal features to skeleton-based action recognition and propose a multi-frequency spatio-temporal feature learning method based on graph convolution. It improves recognition accuracy by about 5% on the NTU RGB+D and Kinetics-Skeleton datasets and surpasses some recent action recognition methods based on graph convolution.

3. Combining the respective advantages of the three main types of action recognition methods, we propose a way to understand the spatio-temporal information of a video from the perspective of multiple sampling frequencies, and we verify its generalization, scalability, and effectiveness on both video datasets and skeleton datasets.
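As a rough illustration of what "multiple sampling frequencies" could mean in practice, the sketch below builds several views of the same video by keeping every k-th frame for a few temporal strides. The function names and the stride values (1, 2, 4) are hypothetical and are not taken from the thesis, which may sample frames differently; the sketch only shows how one frame sequence can yield a dense (high-frequency) view and sparser (low-frequency) views.

```python
def sample_frame_indices(num_frames, stride):
    """Return the indices kept when sampling every `stride`-th frame.

    Hypothetical helper: a larger stride means a lower sampling frequency.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(range(0, num_frames, stride))


def multi_frequency_views(num_frames, strides=(1, 2, 4)):
    """Map each temporal stride to the frame indices of that view.

    The default strides are assumed values chosen for illustration only.
    """
    return {s: sample_frame_indices(num_frames, s) for s in strides}


if __name__ == "__main__":
    # A 16-frame clip viewed at three assumed sampling frequencies.
    for stride, indices in multi_frequency_views(16).items():
        print(f"stride={stride}: {len(indices)} frames -> {indices}")
```

Each view could then be fed to its own feature extractor (for example a 3D convolutional branch), with the dense view carrying fine temporal detail and the sparse views covering the same time span with fewer frames.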
Keywords/Search Tags: video action recognition, deep learning, convolutional neural network, multi-frequency spatio-temporal features, skeleton