
Research on Human Action Recognition Based on Deep Learning

Posted on: 2020-09-14    Degree: Master    Type: Thesis
Country: China    Candidate: J Y Zheng    Full Text: PDF
GTID: 2428330572485653    Subject: Engineering
Abstract/Summary:
In recent years, with the rapid development of computer hardware and software, computer vision has received extensive attention in many fields. Video-based human action recognition is both an active research topic and a hard problem in computer vision, and it plays an important role in intelligent video surveillance and action analysis. Because actions differ in how they are performed, in their environment, and in their timing, it is difficult to extract effective features from video. For real-world video with complex, highly variable backgrounds in particular, the recognition accuracy of existing algorithms remains low. Meanwhile, deep learning has driven substantial progress on a number of computer vision tasks. This thesis therefore carries out the following work within the deep learning framework:

1. The GoogLeNet network, which has achieved good results in major competitions, is studied, and its structure is applied to video-based human action recognition. A two-stream structure processes the spatial and the temporal information of the action video separately. The spatial and temporal streams are each classified with softmax, and their classification scores are then fused to recognize the action category of the video. The comparison shows that the GoogLeNet-based network achieves higher recognition accuracy than the two-stream convolutional neural network.

2. The Inception structure in GoogLeNet is studied. The 5×5 convolution kernel in the traditional Inception structure is replaced by two 3×3 kernels in series, and each 3×3 kernel is further split into a 3×1 kernel and a 1×3 kernel in series, which reduces the number of network parameters and speeds up convergence. Batch normalization is introduced before and after the convolution operations in the improved Inception structure, which effectively mitigates the low recognition accuracy caused by shifts in feature information in the later layers of a very deep network.

3. Three problems are addressed: 2D convolution in the two-stream structure extracts the temporal features of video incompletely, 3D convolution brings an excessive number of parameters to human action recognition, and the resulting network performance is poor. To address them, a human action recognition network based on a pseudo-three-dimensional residual network is proposed. Pseudo-three-dimensional convolution extracts video feature information completely while reducing the number of network parameters several-fold compared with 3D convolution. The residual network allows the network to be deepened while preventing over-fitting. Combining the pseudo-three-dimensional convolution network with the residual network for human action recognition both improves recognition accuracy and preserves the real-time performance of the algorithm.
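The score fusion described in point 1 can be illustrated with a minimal sketch. The abstract does not state how the two streams' scores are combined, so the weighted average below, the `temporal_weight` parameter, and the function names are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion: weighted average of the two streams' softmax scores.

    temporal_weight is a hypothetical knob; 0.5 gives a plain average.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    w = temporal_weight
    return [(1.0 - w) * s + w * t for s, t in zip(spatial, temporal)]


def predict(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up logits for three action classes: the spatial stream favours
# class 0, the temporal stream strongly favours class 1.
spatial_example = [2.0, 1.0, 0.1]
temporal_example = [0.1, 3.0, 0.2]
fused_example = fuse_scores(spatial_example, temporal_example)
predicted_class = predict(spatial_example, temporal_example)
```

With these made-up inputs the temporal stream's stronger confidence outweighs the spatial stream, so the fused prediction is class 1. Because each stream's softmax output sums to 1, the weighted average also sums to 1 and can still be read as a score distribution.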
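The parameter reductions mentioned in points 2 and 3 can be checked with simple weight counts. The sketch below is illustrative only: the channel width `C = 64` is a hypothetical value, biases are ignored, and the pseudo-3D split into a 1×3×3 spatial kernel followed by a 3×1×1 temporal kernel is an assumed decomposition that the abstract does not spell out.

```python
def conv_params(kernel_shape, in_channels, out_channels):
    """Number of weights in one convolution layer (bias terms ignored)."""
    size = 1
    for k in kernel_shape:
        size *= k
    return size * in_channels * out_channels


C = 64  # hypothetical channel width, used for both input and output

# Point 2: one 5x5 kernel versus two 3x3 kernels in series.
p_5x5 = conv_params((5, 5), C, C)
p_two_3x3 = 2 * conv_params((3, 3), C, C)

# Point 2: one 3x3 kernel versus a 3x1 kernel followed by a 1x3 kernel.
p_3x3 = conv_params((3, 3), C, C)
p_3x1_1x3 = conv_params((3, 1), C, C) + conv_params((1, 3), C, C)

# Point 3: a full 3x3x3 kernel versus the assumed pseudo-3D pair
# (1x3x3 spatial convolution, then 3x1x1 temporal convolution).
p_3d = conv_params((3, 3, 3), C, C)
p_pseudo_3d = conv_params((1, 3, 3), C, C) + conv_params((3, 1, 1), C, C)

for name, before, after in [
    ("5x5 -> two 3x3", p_5x5, p_two_3x3),
    ("3x3 -> 3x1 + 1x3", p_3x3, p_3x1_1x3),
    ("3x3x3 -> 1x3x3 + 3x1x1", p_3d, p_pseudo_3d),
]:
    print(f"{name}: {before} -> {after} weights ({after / before:.2f}x)")
```

For equal input and output widths the ratios do not depend on `C`: 18/25 for the first replacement, 6/9 for the second, and 12/27 for the pseudo-3D split. These per-layer figures only show the direction of the saving; the reduction for a whole network depends on its architecture.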
Keywords/Search Tags: action recognition, deep learning, convolutional neural network, residual network