Research On Optimization Technology Of Human Action Recognition In Video

Posted on: 2022-10-28

Degree: Master

Type: Thesis

Country: China

Candidate: Q J Zhang

GTID: 2518306731452624

Subject: Electronics and Communications Engineering

Abstract/Summary:

Human action recognition in video has become one of the research hotspots in computer vision. It has applications in security, gaming, sports, and other scenarios, and has high academic and commercial value. Deep learning methods, which extract human action features automatically and support end-to-end training and inference, have become the mainstream approach to action recognition. This thesis uses a deep-learning-based 3D spatiotemporal residual network for action recognition and, on this basis, proposes several optimizations that strengthen the network's feature extraction, reduce its computational cost, and improve its overall performance. The main research contents are as follows:

(1) To address the high computational cost and limited recognition accuracy of 3D spatiotemporal residual networks (3D ResNets), an efficient two-path 3D spatiotemporal residual network (2p-3D ResNets) is proposed. It consists of: 1) a high-speed path with more input frames and fewer convolution channels, which captures motion information; and 2) a low-speed path with fewer input frames and more convolution channels, which captures spatial semantic information. The two networks were compared experimentally on the public Kinetics-400 dataset. Compared with 3D ResNets, 2p-3D ResNets effectively reduces the amount of computation and reaches a Top-1 accuracy of 67.97%. The experiments also report single-path recognition results to verify the effectiveness of the two-path structure.

(2) After an analysis of the structure of several ResNet variants, ResNet V2 and ResNeXt are adopted as backbone networks to enhance feature extraction and improve recognition accuracy. Experimental results show that the ResNet V2 backbone raises the Top-1 accuracy of 2p-3D ResNets to 69.12%.

(3) A channel attention mechanism is introduced and extended to the time dimension, yielding a Temporality-Channel Attention (TCA) module that models feature correlations along the channel dimension and the time dimension separately. This strengthens the extraction of spatiotemporal features of human actions in video and further improves recognition accuracy. Experimental results show that embedding the TCA module further raises the Top-1 accuracy of 2p-3D ResNets to 72.20%.
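The efficiency argument behind the two-path design in (1) can be illustrated with a rough cost estimate. The sketch below is a hypothetical illustration, not the thesis's configuration: the abstract gives no frame counts or channel widths, so the clip length (32), the sampling strides (2 and 8), and the channel widths (64 and 8) are assumptions, and lateral connections between the paths are ignored. It compares the multiply-accumulate count of one 3x3x3 convolution layer in a single-path network against the same layer split into a high-speed and a low-speed path.

```python
"""Back-of-envelope cost of a two-path 3D conv layer vs. a single-path one.

All sizes are hypothetical; lateral connections between paths are ignored.
"""

CLIP_LEN = 32                       # frames in the decoded clip (assumed)
FAST_STRIDE, SLOW_STRIDE = 2, 8     # temporal sampling strides (assumed)
SLOW_CHANNELS = 64                  # low-speed path width (assumed)
FAST_CHANNELS = SLOW_CHANNELS // 8  # high-speed path is 8x thinner (assumed)


def sample_indices(clip_len: int, stride: int) -> list[int]:
    """Frame indices kept by a path that samples every `stride`-th frame."""
    return list(range(0, clip_len, stride))


def conv3d_macs(frames: int, c_in: int, c_out: int,
                k: int = 3, h: int = 56, w: int = 56) -> int:
    """Multiply-accumulates of one k x k x k 3D conv (stride 1, 'same' padding)."""
    return frames * h * w * c_in * c_out * k ** 3


fast_idx = sample_indices(CLIP_LEN, FAST_STRIDE)  # high-speed path: more frames
slow_idx = sample_indices(CLIP_LEN, SLOW_STRIDE)  # low-speed path: fewer frames

# Single-path baseline: as many frames as the fast path, as many channels as the slow path.
single = conv3d_macs(len(fast_idx), SLOW_CHANNELS, SLOW_CHANNELS)
two_path = (conv3d_macs(len(slow_idx), SLOW_CHANNELS, SLOW_CHANNELS)
            + conv3d_macs(len(fast_idx), FAST_CHANNELS, FAST_CHANNELS))

print(f"fast path frames: {len(fast_idx)}, slow path frames: {len(slow_idx)}")
print(f"two-path / single-path cost: {two_path / single:.3f}")  # 0.266
```

With these assumed settings the two-path layer costs about a quarter of the single-path layer: the wide low-speed path sees only 4 frames, and the high-speed path, although it sees 16 frames, is 8 times thinner, and per-layer cost grows with the product of input and output channels.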
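The TCA module in (3) is described only at a high level. One plausible reading, sketched below in NumPy, applies a squeeze-and-excitation-style gate first over channels and then over time to a single clip's feature map of shape (C, T, H, W). The global average pooling, the bottleneck ratios, the channel-then-time ordering, and the function names (`excite`, `tca`) are all assumptions, and random weights stand in for learned parameters. This is not the thesis's exact module.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def excite(z, w1, w2):
    """Two-layer bottleneck MLP with ReLU, then a sigmoid gate (SE-style, assumed)."""
    return sigmoid(w2 @ np.maximum(w1 @ z, 0.0))


def tca(x, wc1, wc2, wt1, wt2):
    """Hypothetical TCA sketch. x: (C, T, H, W) features of one clip.

    Returns the reweighted features plus the channel and temporal gates.
    """
    # Channel branch: squeeze T, H, W by global average pooling -> (C,)
    c_gate = excite(x.mean(axis=(1, 2, 3)), wc1, wc2)
    x = x * c_gate[:, None, None, None]
    # Temporal branch: squeeze C, H, W -> (T,), computed on channel-gated features
    t_gate = excite(x.mean(axis=(0, 2, 3)), wt1, wt2)
    return x * t_gate[None, :, None, None], c_gate, t_gate


rng = np.random.default_rng(0)
C, T, H, W, r = 16, 8, 7, 7, 4  # assumed sizes; r is the channel reduction ratio
x = rng.standard_normal((C, T, H, W))
wc1 = rng.standard_normal((C // r, C)) * 0.1  # random stand-ins for learned weights
wc2 = rng.standard_normal((C, C // r)) * 0.1
wt1 = rng.standard_normal((T // 2, T)) * 0.1
wt2 = rng.standard_normal((T, T // 2)) * 0.1

y, c_gate, t_gate = tca(x, wc1, wc2, wt1, wt2)
print(y.shape)  # (16, 8, 7, 7)
```

Because each gate is a sigmoid in (0, 1), the module only rescales features and the output keeps the input shape, which is what allows such a block to be embedded into an existing 3D ResNet stage.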

Keywords/Search Tags:

Action Recognition, Deep Learning, 3D ResNets, Attention Mechanism

Related items

1	Research On Human Action Recognition Method Based On Deep Learning
2	Studies On Action Recognition In Video Based On Deep Learning
3	Research On Human Action Recognition Method Integrating Visual Attention Mechanism And Deep Learning
4	Action Recognition Based On Two Stream Spatial-Temporal Attention Network
5	Research On Visual Action Recognition Based On Deep Learning
6	Attention Mechanism Based Deep Network For Human Action Recognition In Video
7	Action Recognition And Localization Based On Deep Learning
8	Video Action Recognition Technology Research Based On Deep Learning
9	Research On Human Action Recognition Method Based On Deep Learning
10	Human Skeleton-based Action Recognition Based On Deep Learning