
Video Human Behavior Recognition Based On Spatio-temporal Convolutional Neural Network

Posted on: 2021-02-24    Degree: Master    Type: Thesis
Country: China    Candidate: Y Qin    Full Text: PDF
GTID: 2428330614953860    Subject: Computer technology
Abstract/Summary:
The development of the Internet has produced an explosion of video data, and effectively processing and analyzing this massive amount of video has become an important task. Fully exploiting video data benefits applications such as video security monitoring, video retrieval and classification, human-computer interaction, and autonomous driving. Video behavior recognition therefore has significant research value and has produced many results both in China and abroad. Video-based behavior classification methods fall into two broad categories: traditional methods and deep learning methods. Traditional hand-crafted feature methods, however, face many limitations when confronted with massive datasets of hundreds of thousands of videos. With the continuous improvement of computer hardware, deep learning methods have been widely studied and applied in this field. Convolutional neural networks perform well on image classification, but they are not directly suited to video data, which adds a temporal dimension. Two-stream neural network methods extract temporal and spatial features separately; because the two networks do not learn the correspondence between pixels, action-recognition cues are easily lost, and the limited temporal scale makes it impossible to recognize long videos. The C3D network, based on 3D convolution, extracts spatio-temporal features directly and is fast enough to meet the requirements of video recognition, but its large number of parameters and computations makes the network difficult to train.

To address these problems, this paper proposes a video human behavior recognition algorithm based on a spatio-temporal convolutional neural network. To overcome the two-stream network's inability to jointly exploit spatio-temporal features, the two-stream network is improved: the feature maps extracted by the convolutional layers of the temporal and spatial streams are combined by weighted fusion, and the influence of the fusion location is examined through experimental analysis. To further improve accuracy, the designed two-stream fusion algorithm is combined with the R(2+1)D algorithm to form a spatio-temporal convolutional neural network (Spatiotemporal-R(2+1)D). R(2+1)D is an improvement on C3D in which the 3D convolution kernel is factorized into a 2D convolution kernel and a 1D convolution kernel; the added ResNet residual learning structure reduces the amount of computation and the number of parameters and alleviates the vanishing-gradient problem in deep networks. Randomly sampled video frames and stacked optical flow maps are fed into the two-stream network, the features extracted by the two streams are adjusted and fused to obtain mid-level semantic features, and these are input to the improved R(2+1)D convolution blocks, adapted to the feature dimensions of the two-stream fusion output, for spatio-temporal modeling and feature extraction; behavior classification is finally completed by a softmax layer.

The proposed algorithm is trained and tested on the public UCF-101 and HMDB-51 datasets. Compared with several existing classical behavior recognition algorithms, it achieves better accuracy, demonstrating the effectiveness of the proposed method.
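To make the architecture concrete, below is a minimal PyTorch-style sketch of the two ideas described above: weighted fusion of the two-stream feature maps and an R(2+1)D residual block that factorizes a 3D convolution into a spatial (1×d×d) convolution followed by a temporal (t×1×1) convolution. All tensor shapes, channel widths, and fusion weights are illustrative assumptions, not values taken from the thesis.

```python
import torch
import torch.nn as nn


class R2Plus1DBlock(nn.Module):
    """Sketch of an R(2+1)D residual block: a t x d x d 3D convolution is
    factorized into a spatial (1 x 3 x 3) convolution followed by a
    temporal (3 x 1 x 1) convolution, wrapped in a ResNet-style shortcut."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Intermediate width: the original R(2+1)D paper chooses it so the
        # parameter count matches a full 3D convolution; here we simply
        # reuse out_ch (an illustrative simplification).
        mid_ch = out_ch
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3),
                                 stride=(1, stride, stride),
                                 padding=(0, 1, 1), bias=False)
        self.bn1 = nn.BatchNorm3d(mid_ch)
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1),
                                  stride=(stride, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when the feature shape changes, else identity.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride,
                          bias=False),
                nn.BatchNorm3d(out_ch))
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        # x: (batch, channels, time, height, width)
        out = self.relu(self.bn1(self.spatial(x)))
        out = self.bn2(self.temporal(out))
        return self.relu(out + self.shortcut(x))


if __name__ == "__main__":
    # Hypothetical mid-level feature maps from the spatial (RGB) stream and
    # the temporal (stacked optical flow) stream; the 0.6/0.4 fusion weights
    # are placeholders, not the weights tuned in the thesis.
    rgb_feat = torch.randn(2, 64, 8, 56, 56)
    flow_feat = torch.randn(2, 64, 8, 56, 56)
    fused = 0.6 * rgb_feat + 0.4 * flow_feat
    block = R2Plus1DBlock(64, 128, stride=2)
    print(block(fused).shape)  # torch.Size([2, 128, 4, 28, 28])
```

The sketch only covers a single block; in the described pipeline several such blocks would follow the fusion point, with a softmax classification layer at the end.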
Keywords/Search Tags: video behavior recognition, deep learning, convolutional neural network, ResNet, R(2+1)D