
Research On Action Recognition Based On 3D Convolutional Neural Network

Posted on: 2020-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: S W Zhou
GTID: 2518306548993459
Subject: Control Science and Engineering
Abstract/Summary:
With the development of internet technology and the popularity of smart devices such as mobile phones, the volume of internet video is growing explosively. Analyzing and identifying internet video plays a vital role in video classification, storage and retrieval. Action recognition based on video content is a key task in video analysis, with broad application prospects in areas such as detection of anomalous internet video, intelligent monitoring, human-computer interaction and military applications. Effectively extracting spatiotemporal information from video is the most critical issue in video action recognition. The 3D convolutional neural network is the main method in this field because it can extract dynamic video features effectively. However, the 3D convolutional neural network has two serious defects: an excessive number of parameters and limited recognition accuracy. Using deep learning methods, this thesis constructs a 3D residual convolutional neural network framework that can effectively extract dynamic video features for action recognition, and mainly studies how to reduce the number of parameters of a 3D convolutional neural network and how to improve its recognition accuracy. The main research work covers the following three aspects:

(1) Building on the classical and effective residual convolutional neural network from the image domain, this thesis constructs a deep 3D residual convolutional neural network by introducing 3D convolution and 3D pooling. Compared with a shallow 3D convolutional neural network, the 3D residual convolutional neural network constructed here extracts video action features better and solves the video action recognition task more effectively.

(2) To address the problem that a 3D convolutional neural network has too many parameters and is prone to over-fitting, this thesis proposes introducing depthwise separable convolution to reduce the number of model parameters. Specifically, the 3D residual network is selected as the base model, and a 3D residual neural network based on depthwise separable convolution is constructed. For comparison, a 3D residual neural network based on group convolution, and one based on both group convolution and depthwise separable convolution, are also constructed. Experimental results show that introducing depthwise separable convolution is the most effective way to reduce model parameters while maintaining the performance of the 3D convolutional neural network.

(3) To address the problem that a 3D convolutional neural network cannot effectively extract the key information in a video, this thesis first designs a channel attention module and a spatial attention module, which assign different weights to different channels or to different spatial locations of the video feature, so that key channel information or key spatial information is extracted more efficiently. On this basis, this thesis constructs a global attention module, which can assign different weights to information along any dimension of the video feature, so that the 3D convolutional neural network can more effectively extract the key information that determines the video action category. The related experimental results also verify the importance of the global attention module for key video feature extraction in 3D convolutional neural networks...
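The parameter savings discussed in aspect (2) can be illustrated with a short calculation. The following is a minimal sketch, not the thesis's implementation: the function names, the 64-channel layer width, the 3x3x3 kernel and the group count of 4 are all assumptions chosen for illustration. It counts the weights of a single standard 3D convolution, a grouped 3D convolution, and a depthwise separable 3D convolution (a depthwise 3x3x3 convolution followed by a pointwise 1x1x1 convolution), with bias terms ignored.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Number of weights in a 3D convolution (bias ignored).

    Each output channel only sees c_in // groups input channels,
    so grouping divides the weight count by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    return c_out * (c_in // groups) * kt * kh * kw


def depthwise_separable_params(c_in, c_out, kernel=(3, 3, 3)):
    """Depthwise conv (groups == c_in) followed by a 1x1x1 pointwise conv."""
    depthwise = conv3d_params(c_in, c_in, kernel, groups=c_in)
    pointwise = conv3d_params(c_in, c_out, kernel=(1, 1, 1))
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 64 output channels (illustrative only).
standard = conv3d_params(64, 64)                # 64 * 64 * 27
grouped = conv3d_params(64, 64, groups=4)       # 64 * 16 * 27
separable = depthwise_separable_params(64, 64)  # 64 * 27 + 64 * 64

print(standard, grouped, separable)
```

For this assumed layer, the depthwise separable variant needs roughly 5% of the weights of the standard convolution. This only compares parameter counts for one layer; it says nothing about accuracy, which the thesis evaluates experimentally.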
Keywords/Search Tags: action recognition, deep learning, 3D convolutional neural network, depthwise separable convolution, attention mechanism