Research On Visual Human Action Recognition Based On Deep Learning

Posted on: 2022-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z An
Full Text: PDF
GTID: 2568306488979239
Subject: Control Science and Engineering
Abstract/Summary:
As an active topic in artificial intelligence and computer vision research, human action recognition has high application value in security, assisted driving, intelligent human-machine interfaces, and social video recommendation. However, real scenes pose many problems, such as complex backgrounds, variation in subject appearance, and the diversity of action categories, so action recognition remains challenging. To address how to extract human action features efficiently and accurately and how to distinguish similar actions reliably, this thesis studies visual human action recognition based on deep learning. The main work and results are as follows:

To address the difficulty of identifying similar actions, an algorithm based on a two-stream pyramid convolution network is proposed. First, the video is preprocessed to obtain frames, and optical flow images containing motion information are obtained by computing the optical flow between two frames. Then, a temporal segment network based on pyramid convolution is designed, which takes the RGB frames and the optical flow images as separate inputs to extract appearance and motion features, respectively. Finally, the output scores of the two branches are fused by weighted averaging, and the recognition result is obtained with Softmax.

The traditional two-stream network cannot capture temporal relationships in a video sequence, which leads to poor recognition of temporally dependent actions. To address this, a method based on a temporal modeling two-stream spatiotemporal network is proposed. First, a convolutional neural network models the temporal relationship using the temporal shift idea, so that spatiotemporal information in the video is captured efficiently. Further, an attention mechanism is used to compensate for the decline in spatial feature learning ability caused by shifting channel information along the temporal dimension. On this basis, a two-stream network is designed that includes a spatiotemporal appearance information stream and a spatiotemporal motion information stream. Finally, weighted averaging is used to fuse the two streams and obtain the final result.

The results on UCF101 and HMDB51 show that the algorithm based on the two-stream pyramid convolution network extracts more discriminative action features and improves accuracy on similar actions. The algorithm based on the temporal modeling two-stream spatiotemporal network efficiently captures spatiotemporal information in the video and improves the ability to recognize temporally dependent actions.
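
The two operations that the abstract names most concretely, shifting channel information along the temporal dimension and fusing the two streams by weighted average before Softmax, can be illustrated with a minimal sketch. This is not the thesis implementation. The function names, the `fold_div` parameter, the shift directions, the zero-filling at clip boundaries, and the fusion weights are all assumptions chosen for illustration, and spatial dimensions are omitted so that each frame is just a list of per-channel values.

```python
import math


def temporal_shift(clip, fold_div=4):
    """Shift part of the channels along the time axis (illustrative only).

    clip: list of T frames, each a list of C per-channel values.
    The first C // fold_div channels take their value from the next frame,
    the following C // fold_div channels from the previous frame, and the
    remaining channels are left unchanged. Positions that have no source
    frame are zero-filled. fold_div=4 is an assumed setting.
    """
    num_frames = len(clip)
    num_channels = len(clip[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # value comes from the next frame
            elif c < 2 * fold:
                src = t - 1   # value comes from the previous frame
            else:
                src = t       # channel is not shifted
            if 0 <= src < num_frames:
                shifted[t][c] = clip[src][c]
    return shifted


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_scores, flow_scores, rgb_weight=1.0, flow_weight=1.5):
    """Weighted average of per-class scores from the two streams, then softmax.

    The default weights are hypothetical; the abstract does not state them.
    """
    weight_sum = rgb_weight + flow_weight
    fused = [
        (rgb_weight * r + flow_weight * f) / weight_sum
        for r, f in zip(rgb_scores, flow_scores)
    ]
    return softmax(fused)


if __name__ == "__main__":
    # Three frames with four channels each (made-up numbers).
    clip = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    print(temporal_shift(clip))
    # Made-up per-class scores for three action classes.
    print(fuse_two_stream([2.0, 1.0, 0.0], [0.0, 3.0, 1.0]))
```

In the sketch, the shift itself adds no parameters or multiplications: it only moves existing channel values between neighboring frames, which is why a 2D convolutional network can pick up temporal information at low cost. The shifted channels no longer describe the current frame, which matches the loss of spatial feature learning ability that the abstract says the attention mechanism is meant to offset.
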
Keywords/Search Tags: Action recognition, Deep learning, Two-stream network, Convolutional neural network, ResNet