
The Research On Video Action Recognition Based On Lightweight 3D Convolutional Neural Network

Posted on: 2021-04-14    Degree: Master    Type: Thesis
Country: China    Candidate: Y Chen
GTID: 2518306104988379    Subject: Computer application technology
Abstract/Summary:
With the popularity of the Internet and the development of smart cities, video resources are becoming increasingly abundant. Video action recognition has attracted wide attention, and its applications include video surveillance, video auditing, and intelligent security. The latest research trend is to use 3D convolutional neural networks for video action recognition. However, the extra temporal dimension greatly increases the computational load of the model, which ultimately makes such models difficult to deploy on terminal devices. In addition, video action recognition is more complicated than image recognition because the extracted action features must be globally coherent and salient, matching the intuitive characteristics of video actions. To realize video action recognition in resource-constrained scenarios, this thesis completes the following innovative work:

(1) A 3D Xwise Separable Convolution is designed, and a lightweight 3D convolutional neural network, Xwise Net, is constructed on top of it. The main innovation is that the 3D Xwise Separable Convolution, following the idea of separable convolution, is a lightweight 3D convolution that extracts features independently along the channel, temporal, and spatial dimensions of the video. Combining the 3D Xwise Separable Convolution with an efficient backbone network framework yields the lightweight 3D convolutional neural network Xwise Net.

(2) Because action recognition needs temporal global information, Xwise Net is further optimized using temporal global context. Specifically, a temporal global information module, the TGC Block, is built and combined with Xwise Net to obtain TGC-Xwise Net, which can establish global dependencies and capture both the overall action state and the key action points.

Extensive experiments on three classic datasets (Kinetics-part A, Kinetics-part B, and KTH) validate the effectiveness of the proposed algorithm in terms of both lightweight design and high accuracy. On the three datasets, compared with most mainstream models at equivalent accuracy, the number of parameters is reduced by more than 54.42% and the amount of computation by more than 36.29%. The Xwise Net optimized with temporal global context improves accuracy on Kinetics-part A by 4.8%.
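The abstract does not give the exact factorization used by the 3D Xwise Separable Convolution. As a rough illustration of why handling the channel, temporal, and spatial dimensions separately saves parameters, the Python sketch below compares a standard 3D convolution with one hypothetical three-way factorization: a per-channel temporal convolution (kt x 1 x 1), a per-channel spatial convolution (1 x kh x kw), and a 1 x 1 x 1 pointwise convolution that mixes channels. The function names and the factorization itself are assumptions made for illustration, not the thesis's actual design.

```python
def conv3d_params(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Weight count of a standard 3D convolution (bias omitted)."""
    return c_in * c_out * kt * kh * kw


def xwise_like_params(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Weight count of a HYPOTHETICAL three-way separable 3D convolution
    (bias omitted); an illustration, not the thesis's exact Xwise design."""
    temporal = c_in * kt        # depthwise kt x 1 x 1, one filter per channel
    spatial = c_in * kh * kw    # depthwise 1 x kh x kw, one filter per channel
    channel = c_in * c_out      # pointwise 1 x 1 x 1, mixes channels
    return temporal + spatial + channel


def mult_adds(params: int, t: int, h: int, w: int) -> int:
    """Multiply-adds assuming stride 1 and 'same' padding: every weight
    is applied once per output position (t x h x w)."""
    return params * t * h * w


# Usage: a 64 -> 64 channel layer with a 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)     # 110592 weights
sep = xwise_like_params(64, 64, 3, 3, 3)  # 4864 weights
print(f"parameter reduction: {1 - sep / full:.1%}")  # parameter reduction: 95.6%
```

Under these single-layer assumptions the computation ratio equals the parameter ratio. The reductions reported in the abstract are measured on whole networks against other models, so they are not expected to match this single-layer figure.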
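The abstract describes the TGC Block only at a high level, as a module that establishes global temporal dependencies. One plausible reading is sketched below in NumPy. It is modeled on the global context block from GCNet rather than on the thesis's own code: pool each frame spatially, compute softmax attention weights over the frames, form a single context vector as the attention-weighted sum, pass it through a small bottleneck transform, and add the result back to every position of the clip. The function name `temporal_global_context`, the weight shapes, and the reduction ratio r are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def temporal_global_context(x, w_k, w1, w2, eps=1e-5):
    """HYPOTHETICAL GCNet-style temporal context block for one clip.

    x:  (C, T, H, W) feature map
    w_k: (C,) projection giving one attention logit per frame
    w1: (C // r, C) and w2: (C, C // r) bottleneck transform weights
    """
    frames = x.mean(axis=(2, 3))      # spatial average pool -> (C, T)
    attn = softmax(w_k @ frames)      # one weight per frame, sums to 1 -> (T,)
    context = frames @ attn           # attention-weighted sum of frames -> (C,)
    h = w1 @ context                  # squeeze to C // r channels
    h = (h - h.mean()) / np.sqrt(h.var() + eps)  # layer norm (no affine)
    h = np.maximum(h, 0.0)            # ReLU
    delta = w2 @ h                    # expand back to C channels
    # Broadcast-add the same clip-level context to every (t, h, w) position.
    return x + delta[:, None, None, None]


# Usage with random weights (C = 8 channels, reduction ratio r = 4).
rng = np.random.default_rng(0)
C, T, H, W, r = 8, 4, 5, 5, 4
x = rng.standard_normal((C, T, H, W))
out = temporal_global_context(
    x,
    rng.standard_normal(C),
    rng.standard_normal((C // r, C)),
    rng.standard_normal((C, C // r)),
)
print(out.shape)  # (8, 4, 5, 5)
```

Because one context vector is shared by every frame and pixel, this style of block adds clip-level information for about 2C²/r + C weights, which stays small relative to a 3D convolution when r is 4 or more.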
Keywords/Search Tags: Lightweight, Deep learning, 3D convolutional neural network, Temporal global context, Action recognition