
Research On Human Action Recognition Method Based On Deep Learning

Posted on: 2021-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: S N Jiang
Full Text: PDF
GTID: 2428330602999691
Subject: Information and Communication Engineering
Abstract/Summary:
Human action recognition is an important research direction in computer vision, with practical value in fields such as intelligent security, human-computer interaction, intelligent medical treatment, and video retrieval. The development of deep learning and the arrival of the big-data era have brought new opportunities: action recognition methods have moved from traditional approaches based on hand-crafted features to end-to-end approaches based on deep learning. This thesis focuses on human action recognition with two-stream deep neural networks. The main work is as follows.

First, a human action recognition method based on a ResNeXt two-stream network model is proposed. To improve on the accuracy of the commonly used two-stream models built on VGGNet, Inception, or ResNet, this thesis replaces those convolutional backbones with the newer ResNeXt architecture and builds a two-stream network for human action recognition in video. To extract richer spatiotemporal features, the model takes both RGB and optical-flow data as input, so that it captures the appearance and the temporal (motion) information in the video, which complement each other. The thesis also applies the idea of the end-to-end temporal segment network (TSN) to the proposed ResNeXt model: the video is divided into K segments to model the long-range temporal structure of the sequence, and the best value of K is determined experimentally. This lets the model better distinguish actions that share sub-actions and reduces misclassifications caused by similar sub-actions. In addition, data augmentation is applied to increase sample diversity and make the network more robust and better at generalizing. Experimental results on the UCF101 and HMDB51 datasets show that the proposed method achieves higher recognition accuracy than the mainstream action recognition models and methods in the current literature.

Second, a human action recognition method based on a multi-modal two-stream network with motion-energy-guided video segmentation and frame extraction is proposed. To make full use of the motion information acquired by multiple sensors and to select input frames more efficiently, thereby improving the recognition ability of the model, this thesis uses multi-modal data to improve the TSN algorithm. The proposed method uses motion energy information captured from the depth data to guide the temporal sampling of video frames, and sends the sampled frames to the ResNeXt network for classification. The two-stream model takes the information from both the depth and RGB sensors as input and uses an adaptive multi-stream fusion method to fuse the two modalities into the final action classification result. Experimental results on the NTU RGB+D dataset show the effectiveness of the proposed algorithm.
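The segment-based sampling described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names are hypothetical, segment scores are combined with a plain average (one common TSN consensus choice), and the abstract does not say how motion energy guides the sampling. The `energy_guided_bounds` function therefore shows only one assumed possibility, namely cutting segments so that each holds roughly the same total motion energy.

```python
import bisect
import random
from itertools import accumulate


def segment_bounds(num_frames, k):
    """Split frame indices [0, num_frames) into k near-equal contiguous segments."""
    if k <= 0 or num_frames < k:
        raise ValueError("need k >= 1 and at least one frame per segment")
    return [(i * num_frames // k, (i + 1) * num_frames // k) for i in range(k)]


def sample_frames(bounds, training=False, rng=None):
    """Pick one frame per segment: random in training, centre frame otherwise."""
    rng = rng or random.Random()
    return [
        rng.randrange(start, end) if training else (start + end - 1) // 2
        for start, end in bounds
    ]


def energy_guided_bounds(energy, k):
    """ASSUMED variant: cut segments so each holds about 1/k of the total energy.

    `energy` is a list of non-negative per-frame motion-energy values
    (for example, derived from depth-frame differences).
    """
    n = len(energy)
    if k <= 0 or n < k:
        raise ValueError("need k >= 1 and at least one frame per segment")
    total = sum(energy)
    if total <= 0:
        return segment_bounds(n, k)  # no motion at all: fall back to uniform
    prefix = [0] + list(accumulate(energy))  # prefix[c] == sum(energy[:c])
    cuts = [0]
    for i in range(1, k):
        c = bisect.bisect_left(prefix, total * i / k)
        # Keep every segment non-empty and leave room for the remaining ones.
        cuts.append(min(max(c, cuts[-1] + 1), n - (k - i)))
    cuts.append(n)
    return list(zip(cuts[:-1], cuts[1:]))


def average_consensus(segment_scores):
    """Average per-class scores over segments into one video-level score vector."""
    k = len(segment_scores)
    return [sum(column) / k for column in zip(*segment_scores)]


if __name__ == "__main__":
    uniform = segment_bounds(10, 3)
    print(uniform, sample_frames(uniform))
    guided = energy_guided_bounds([0, 0, 0, 0, 1, 1, 1, 1], 2)
    print(guided, sample_frames(guided))
```

With uniform bounds, every part of the video contributes one frame regardless of content. The energy-guided variant places shorter segments where motion is concentrated, so more of the K sampled frames fall on the active part of the action.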
Keywords/Search Tags: Human Action Recognition, Two-stream Network, ResNeXt, Long-range Temporal Structure, Motion Energy, Adaptive Fusion