Research On Optimization Technology Of Human Action Recognition In Video

Posted on: 2022-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Zhang
GTID: 2518306731452624
Subject: Electronics and Communications Engineering
Abstract/Summary:
Human action recognition in video has become a research hotspot in computer vision. It is widely applicable to security, gaming, sports, and other scenarios, and has high academic and commercial value. Deep learning methods can automatically extract human action features and support end-to-end training and inference, and they have become the mainstream approach to action recognition. This thesis uses a deep-learning-based 3D spatiotemporal residual network for action recognition and, on that basis, proposes several optimization methods to strengthen the network's feature extraction, reduce its computational cost, and improve overall performance. The main research contents are as follows:

(1) To address the heavy computation and limited recognition accuracy of 3D spatiotemporal residual networks (3D ResNets), an efficient two-path 3D spatiotemporal residual network (2p-3D ResNets) is proposed. It consists of 1) a high-speed path with more input frames and fewer convolution channels, which captures motion information, and 2) a low-speed path with fewer input frames and more convolution channels, which captures spatial semantic information. Comparative experiments on the two networks were carried out on the public Kinetics-400 dataset. The results show that, compared with 3D ResNets, 2p-3D ResNets effectively reduces the amount of computation and reaches a Top-1 accuracy of 67.97%. The experiments also report single-path recognition results to verify the effectiveness of the two-path structure.

(2) After an in-depth analysis of the structure of several ResNet variants, ResNet V2 and ResNeXt are used as backbone networks to strengthen feature extraction and improve recognition accuracy. Experimental results show that ResNet V2 improves the Top-1 accuracy of 2p-3D ResNets to 69.12%.

(3) A channel attention mechanism is introduced and combined with the time dimension to form a Temporality-Channel Attention (TCA) module, which models feature correlations in the channel dimension and the time dimension respectively. This improves the extraction of spatiotemporal features of human actions in video and further improves recognition accuracy. Experimental results show that embedding the TCA module further improves the Top-1 accuracy of 2p-3D ResNets to 72.20%.
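
The computational trade-off behind the two-path design in item (1) can be illustrated with a rough multiply-accumulate (MAC) count for a single dense 3D convolution layer. This is only a sketch under stated assumptions: the channel widths, frame counts, kernel size, and spatial resolution below are hypothetical values chosen for illustration, and `conv3d_macs` is a helper introduced here. None of them come from the thesis.

```python
def conv3d_macs(c_in, c_out, kernel, out_shape):
    """MACs for one dense 3D convolution (groups=1, bias ignored).

    Each output element needs c_in * k_t * k_h * k_w multiply-accumulates,
    and there are c_out * t * h * w output elements.
    """
    k_t, k_h, k_w = kernel
    t, h, w = out_shape
    return c_in * c_out * k_t * k_h * k_w * t * h * w


# Hypothetical layer sizes, for illustration only (not taken from the thesis).
KERNEL = (3, 3, 3)
SPATIAL = (56, 56)

# Low-speed path: fewer frames, more channels (spatial semantics).
slow = conv3d_macs(64, 64, KERNEL, (4, *SPATIAL))

# High-speed path: 8x the frames, 1/8 of the channels (motion).
fast = conv3d_macs(8, 8, KERNEL, (32, *SPATIAL))

# Assumed single-path baseline: many frames AND many channels.
single = conv3d_macs(64, 64, KERNEL, (32, *SPATIAL))

print(f"low-speed path : {slow:,} MACs")
print(f"high-speed path: {fast:,} MACs")
print(f"single path    : {single:,} MACs")
print(f"high-speed / low-speed : {fast / slow:.3f}")
print(f"two-path / single path : {(slow + fast) / single:.3f}")
```

Because the cost grows linearly with the number of frames but quadratically with the channel width (input channels times output channels), a path that sees 8 times as many frames with one eighth of the channels costs only one eighth as much as the low-speed path in this toy setting. Real networks also include strides, pooling, and connections between paths, so the actual savings reported in the thesis will differ.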
Keywords/Search Tags: Action Recognition, Deep Learning, 3D ResNets, Attention Mechanism