
Research On Video Behavior Recognition Based On Convolutional Neural Network

Posted on: 2021-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: R L Huang
Full Text: PDF
GTID: 2518306047981659
Subject: Master of Engineering
Abstract/Summary:
With the arrival of the 5G era and the development of deep learning and Turing equipment, deep convolutional neural networks are increasingly applied in people's daily lives. Although deep neural networks have a large number of trainable parameters and a complex computation process, they extract implicit feature information automatically, and they are therefore widely used in many fields. Among them, the 3D convolutional neural network is an end-to-end model that processes consecutive video frames directly, without manual feature extraction, and it is often used for human behavior recognition from video data. This thesis studies the classic 3D convolutional neural network in depth and, according to different requirements, proposes several improved 3D network models. The main work and contributions of this thesis are as follows:

(1) To obtain more representative features, a 3D convolutional neural network model based on deep-level feature fusion is proposed. The model uses the ReLU activation function and batch normalization (BN), and it fuses high-level and low-level features to form a new feature representation.

(2) To enlarge the receptive field of the model, reduce its computational cost, and accelerate its convergence, this thesis draws on the "Inception-Res" structure proposed by Google and proposes a 3D neural network model based on deep fusion of multi-scale features. In the 3D convolution layers of the model, multiple small 3D convolution kernels replace large 3D convolution kernels, and residual connections and BN layers are added to prevent the model from over-fitting.

(3) To let the network accept inputs at different scales and extract deeper semantic information, this thesis further proposes a 3D convolutional neural network model that integrates a multi-level pyramid network with an attention mechanism. Multi-level feature fusion and the attention mechanism improve the robustness and recognition accuracy of the traditional 3D convolutional neural network.

(4) To learn the visual attributes in the video data set explicitly and refine similar features, an integrated neural network based on visual attribute enhancement is proposed. It consists of three subnetworks: the first is the 3D convolutional network based on deep fusion of multi-scale features, the second is the 3D convolutional network based on the multi-level feature pyramid network and attention mechanism, and the third is a convolutional network based on visual attribute enhancement. The third subnetwork uses the mature object detection algorithm Faster-RCNN to discover and extract the visual attributes in the video data, associates those attributes with the video action categories, and then feeds them into the full convolution layer for action classification and recognition.

Finally, experiments on the UCF-101 data set show the effectiveness of the proposed models.
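The trade-off behind contribution (2), replacing a large 3D convolution kernel with several small ones, can be illustrated with a short calculation. The sketch below is not the model from the thesis: the kernel sizes (one 5x5x5 kernel versus two stacked 3x3x3 kernels), the channel width of 64, and the helper names are assumptions chosen only for illustration, since the abstract does not state them.

```python
# Illustrative only: kernel sizes, channel width, and function names are
# assumptions, not values taken from the thesis.

def conv3d_params(in_ch, out_ch, kernel, bias=True):
    """Number of learnable parameters in one 3D convolution layer."""
    kt, kh, kw = kernel  # temporal, height, width extent of the kernel
    weights = in_ch * out_ch * kt * kh * kw
    return weights + (out_ch if bias else 0)


def stacked_receptive_field(kernels):
    """Receptive field of stacked stride-1, undilated 3D convolutions.

    Each layer grows the receptive field by (k - 1) along every axis.
    """
    field = [1, 1, 1]
    for kernel in kernels:
        field = [f + (k - 1) for f, k in zip(field, kernel)]
    return tuple(field)


channels = 64  # assumed channel width, same for input and output

# One large kernel.
large = conv3d_params(channels, channels, (5, 5, 5))

# Two stacked small kernels covering the same 5x5x5 receptive field.
small_stack = [(3, 3, 3), (3, 3, 3)]
small = sum(conv3d_params(channels, channels, k) for k in small_stack)

print("receptive field of stack:", stacked_receptive_field(small_stack))
print("parameters, one 5x5x5 layer:", large)
print("parameters, two 3x3x3 layers:", small)
print("ratio small/large: %.3f" % (small / large))
```

Under these assumptions the two stacked 3x3x3 layers cover the same 5x5x5 receptive field with roughly 43% of the parameters of the single large layer, which is the kind of saving the small-kernel design aims for.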
Keywords/Search Tags:Three-dimensional neural network, Behavior recognition, Feature fusion, Ensemble learning, UCF-101 data set