Improved ResNeXt Based Human Action Recognition

Posted on: 2021-07-27  Degree: Master  Type: Thesis
Country: China  Candidate: J C Li  Full Text: PDF
GTID: 2568306632967019  Subject: Control engineering
Abstract/Summary:
Rapid progress in computer hardware continues to drive the development of deep learning. Human behavior recognition based on deep learning has important application value in intelligent real-time monitoring, human-computer interaction, video retrieval, and other fields. Human motion is highly complex and is easily affected by environmental disturbances and camera shake, so improving recognition accuracy remains a challenging research problem. The work also has practical significance: real-time monitoring of abnormal behavior in public places has an important influence on people's daily safety. After in-depth research in the field of human behavior recognition, this thesis makes the following contributions.

First, the thesis studies the two main families of algorithms in human behavior recognition, the two-stream network and the 3D convolutional neural network. Because extracting the optical-flow features required by the two-stream network is time-consuming, that network is relatively slow in scenarios that require real-time recognition. The thesis therefore chooses the 3D convolutional neural network as the basis for improving recognition accuracy.

Second, working from the 50-layer ResNeXt, the thesis finds that the large number of parameters in a 3D convolutional neural network makes the network difficult to optimize and demands more computational resources. The residual module built on 3D convolution in ResNeXt is therefore replaced by a residual module built on spatiotemporal feature combined convolution. Both networks were trained and tested on two datasets, UCF-101 and HMDB51. The ResNeXt network based on 3D convolution reached accuracies of 88.52% on UCF-101 and 46.88% on HMDB51, while the ResNeXt network based on spatiotemporal feature combined convolution reached 92.08% and 53.34%, respectively. The combined-convolution network thus improves accuracy significantly on both datasets while using fewer parameters than the 3D-convolution network.

Third, because deep neural networks are difficult to optimize, the thesis proposes a new feature fusion algorithm that combines features from earlier and later layers, exploiting the different characteristics of shallow and deep features, and integrates it into the ResNeXt network. With this addition, the ResNeXt network based on spatiotemporal feature combined convolution reached 93.04% on UCF-101 and 55.45% on HMDB51. The gain in accuracy demonstrates the effectiveness of the algorithm, and training speed also improved to some extent.

Fourth, the SE-Net algorithm, originally designed for 2D image data, is modified so that it can be applied to 3D video data and is embedded into the ResNeXt network. By weighting feature channels, SE-Net improves both the optimization speed and the accuracy of the network.

Finally, both algorithms were embedded into the ResNeXt network based on spatiotemporal feature combined convolution. The network was trained on the Kinetics dataset, and the resulting parameters were saved as a pre-trained model. After loading and fine-tuning this pre-trained model, test accuracy reached 97.96% on UCF-101, 73.41% on HMDB51, and 89.18% on NTU RGB+D. The proposed model comes close to the highest reported accuracy on UCF-101 and NTU RGB+D, while its result on HMDB51 is slightly weaker. The results on these three datasets demonstrate the robustness and effectiveness of the model.
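
The abstract states that replacing 3D convolution with spatiotemporal feature combined convolution reduces the number of network parameters, but it does not give the exact form of that convolution. As a rough illustration only, the sketch below assumes the combined convolution factorizes a t x k x k kernel into a 1 x k x k spatial convolution followed by a t x 1 x 1 temporal convolution, which may differ from the thesis's actual design. The function names, the channel widths (64), and the intermediate width `c_mid` are hypothetical choices made for this example, and bias terms are ignored.

```python
def conv3d_params(c_in, c_out, kt, kh, kw, groups=1):
    """Weight count of a (grouped) 3D convolution, ignoring bias."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kt * kh * kw


def factorized_params(c_in, c_mid, c_out, kt, kh, kw, groups=1):
    """Weight count when a kt x kh x kw kernel is split into a
    1 x kh x kw spatial conv followed by a kt x 1 x 1 temporal conv.
    This factorization is an assumption, not the thesis's stated design."""
    spatial = conv3d_params(c_in, c_mid, 1, kh, kw, groups)
    temporal = conv3d_params(c_mid, c_out, kt, 1, 1, groups)
    return spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
split = factorized_params(64, 64, 64, 3, 3, 3)
print(full, split, round(split / full, 3))
```

Under these assumptions the factorized layer needs fewer than half the weights of the full 3D layer. The exact saving depends on the intermediate width `c_mid`, which some factorized designs tune to match the 3D parameter count rather than to reduce it.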
Keywords/Search Tags: human behavior recognition, ResNeXt-50, fusion of shallow and deep features, SE-Net, pre-training