
Research On Human Action Recognition Based On Convolutional Neural Network

Posted on: 2022-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
GTID: 2518306353479874
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, human action recognition in video has attracted growing research attention in the field of intelligent analysis of massive data. It plays an important role in building smarter surveillance systems and more natural human-computer interaction systems, and it has broad application prospects in automatic video classification and video recommendation. Although traditional human action recognition methods perform well in some specific settings, they require manually designed and extracted features and are strongly affected by the individual performing the action and by the surrounding environment. Convolutional neural networks can automatically extract features from low level to high level, are less affected by individual and environmental factors, and generalize better. The research on human action recognition in this thesis therefore uses methods based on convolutional neural networks. The specific research content is as follows:

(1) The UCF-101 data set is preprocessed. The videos in UCF-101 are first decomposed into image frames, and the SSD network is then used to locate the human body in each image. Based on the detected body position, the images in the data set are cropped and expanded.

(2) The C3D convolutional neural network used for action recognition has a relatively shallow structure and a low input image resolution, which leads to insufficient feature learning, and its 3D convolution kernels have so many parameters that overfitting is likely during training. This thesis replaces 3D convolution with spatiotemporally separated (2+1)D convolution, deepens the original C3D network, adds a batch normalization layer after each group of convolutional layers, and redesigns the network structure. Experiments show that the resulting network achieves higher recognition accuracy than the original network.

(3) The two-stream network used for action recognition adopts VGG-16 for both the spatial stream and the temporal stream. VGG-16 is shallow, so its feature extraction is insufficient, and the network fuses the predictions of the spatial-stream and temporal-stream networks only by averaging them, which ignores the correlation between spatial and temporal information. To address these shortcomings, this thesis replaces VGG-16 with the deeper ResNet-34. Experiments verify that this change of network structure slightly improves recognition accuracy. The thesis then studies where and how to fuse spatiotemporal information. The best fusion method and fusion location are determined experimentally, which fixes the structure of the proposed spatiotemporal fusion convolutional neural network. Experiments show that the proposed network achieves an improved recognition rate.
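
The cropping step in point (1) can be illustrated with a minimal sketch. It assumes a detector such as SSD has already returned a pixel box `(x1, y1, x2, y2)` for the person; the function name `crop_person`, the `margin` parameter, and the toy frame are hypothetical and are not taken from the thesis, which does not describe its cropping rule in the abstract.

```python
def crop_person(frame, box, margin=0.0):
    """Crop a detected person from one frame.

    frame  : image as a list of rows (each row a list of pixel values)
    box    : (x1, y1, x2, y2) in pixels, as a detector might return it
    margin : fraction of the box width/height added on every side
             (an illustrative assumption, not the thesis's setting)
    """
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = box
    pad_x = int((x2 - x1) * margin)
    pad_y = int((y2 - y1) * margin)
    # Clip the padded box to the frame so the slices stay inside the image.
    left, top = max(0, x1 - pad_x), max(0, y1 - pad_y)
    right, bottom = min(width, x2 + pad_x), min(height, y2 + pad_y)
    return [row[left:right] for row in frame[top:bottom]]


# Toy 6x8 "frame" whose pixel value records its (row, column) position.
frame = [[(r, c) for c in range(8)] for r in range(6)]
patch = crop_person(frame, box=(2, 1, 6, 5), margin=0.25)
print(len(patch), len(patch[0]))
```

Clipping to the frame bounds keeps the crop valid when the detected person is close to the image border.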
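
The parameter argument in point (2) can be made concrete with a small weight count. The sketch below compares a full 3D convolution with a (2+1)D block, that is, a spatial 1 x d x d convolution followed by a temporal t x 1 x 1 convolution. The function names, the channel sizes, and the intermediate width `c_mid` are assumptions for illustration; the abstract does not state the values used in the thesis, and the saving depends on how `c_mid` is chosen.

```python
def conv3d_params(c_in, c_out, t, d):
    """Weights in one full 3D convolution with a t x d x d kernel (bias ignored)."""
    return c_in * c_out * t * d * d


def conv2plus1d_params(c_in, c_out, t, d, c_mid):
    """Weights in a (2+1)D block: a 1 x d x d spatial convolution into c_mid
    channels, followed by a t x 1 x 1 temporal convolution (bias ignored)."""
    spatial = c_in * c_mid * d * d
    temporal = c_mid * c_out * t
    return spatial + temporal


# Hypothetical layer: 64 -> 64 channels, 3 x 3 x 3 kernel, c_mid = 64.
full = conv3d_params(64, 64, t=3, d=3)
factored = conv2plus1d_params(64, 64, t=3, d=3, c_mid=64)
print(full, factored)  # 110592 49152
```

With these illustrative sizes the factored block has fewer than half the weights of the full 3D layer. A larger `c_mid` narrows or removes that gap, so the count should be recomputed for any concrete architecture.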
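
The averaging fusion criticized in point (3) can be sketched as follows. Each stream produces class scores independently, and the final prediction is the mean of the two probability vectors, so no interaction between spatial and temporal features is learned. The function names and the scores are made up for illustration, and applying a softmax before averaging is an assumption about how the predictions are combined.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial_scores, temporal_scores):
    """Late fusion: average the per-class probabilities of the two streams."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [(a + b) / 2 for a, b in zip(p_spatial, p_temporal)]


# Made-up scores for three classes from each stream.
fused = average_fusion([2.0, 1.0, 0.1], [0.5, 2.5, 0.3])
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted)
```

Because the two streams meet only at the output, this scheme cannot relate what appears in a frame to how it moves, which is the motivation the abstract gives for studying fusion at other locations in the network.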
Keywords/Search Tags: Convolutional Neural Network, Human Action Recognition, 3D Convolutional Neural Network, Residual Neural Network, Spatio-temporal Fusion