
Deep Learning On Video Based Human Action Recognition

Posted on: 2021-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Chen
Full Text: PDF
GTID: 2518306050971999
Subject: Computer Science and Technology
Abstract/Summary:
Human action recognition is the task of automatically analyzing video data by computer to determine the category of human action it shows. It is one of the most challenging topics in computer vision. Advances in electronics have greatly increased computing power and storage capacity, and the wide deployment of high-definition video equipment both makes action recognition promising in areas such as intelligent security and human-computer interaction and facilitates research on recognition algorithms. Traditional human action recognition algorithms usually rely on manually designed feature extraction, which involves a heavy workload and complicated algorithm design. They are sensitive to occlusion, illumination, and viewpoint changes, which leads to low recognition accuracy. Deep learning has brought new ideas to the design of action recognition algorithms. Existing deep learning-based algorithms mainly work on three forms of data: video sequences, skeleton sequences, and depth map sequences. Among them, skeleton sequences are less affected by noise and describe the pose of the human body in the video, while video sequences are easier to collect and more widely applicable. This thesis uses two forms of data, video sequences and skeleton sequences extracted by a pose estimation algorithm, to build two types of human action recognition network models, both of which achieve higher recognition rates.

For skeleton sequence data, which represents pose information, this thesis studies action recognition based on pose estimation. First, YOLO, an object detection algorithm with high accuracy and detection speed, is used to detect and segment the human body. Second, the parallel multi-resolution network HRNet is improved, yielding MHRNet, a pose estimation algorithm with improved multi-scale fusion. Together, these steps make it possible to extract more detailed and accurate skeleton key point features from the segmented human body in the video. Furthermore, two recurrent neural network models based on the LSTM structure are designed to learn the skeleton key point coordinates and the temporal information of the video, and thereby recognize human actions in the video. Experiments on public human pose estimation and human action datasets show that the pose estimation and action recognition algorithms designed in this thesis improve recognition accuracy.

This thesis also addresses action recognition on video sequences, which are easier to obtain than skeleton sequences in practical applications. To exploit the spatial and temporal dimensions of a video sequence, it designs MSFNet, a segmented fast-and-slow-sampling dual-branch network based on the SlowFast network (SFNet), and models the feature information of the two dimensions separately. An attention mechanism is introduced to address the high background complexity and insufficient motion information of video sequence data: it assigns different weights to different frames and to different positions within the same frame, which helps the model learn more effective features for action recognition. Finally, experiments on a public human action dataset verify that the attention-based model achieves better recognition results.
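
The abstract does not spell out how MSFNet samples frames, so the following is only a minimal sketch of one plausible reading of "segmented fast and slow sampling": the video is split into equal segments, and each segment contributes a few frames to a slow branch and more frames to a fast branch. The function names, the segment count, and the per-segment frame counts are all assumptions made for illustration, not values from the thesis.

```python
def segment_bounds(num_frames, num_segments):
    """Split [0, num_frames) into contiguous, near-equal (start, stop) ranges."""
    return [
        (s * num_frames // num_segments, (s + 1) * num_frames // num_segments)
        for s in range(num_segments)
    ]


def sample_evenly(start, stop, count):
    """Pick `count` indices at the centres of equal sub-ranges of [start, stop)."""
    length = stop - start
    return [start + (2 * i + 1) * length // (2 * count) for i in range(count)]


def segmented_slow_fast_indices(num_frames, num_segments=4,
                                slow_per_segment=1, fast_per_segment=4):
    """Return (slow, fast) frame index lists (hypothetical sampling scheme).

    The slow branch sees few frames per segment (spatial detail); the fast
    branch sees more frames per segment (motion). Segments shorter than the
    requested count would yield repeated indices.
    """
    slow, fast = [], []
    for start, stop in segment_bounds(num_frames, num_segments):
        slow.extend(sample_evenly(start, stop, slow_per_segment))
        fast.extend(sample_evenly(start, stop, fast_per_segment))
    return slow, fast


slow_idx, fast_idx = segmented_slow_fast_indices(64)
print(slow_idx)  # [8, 24, 40, 56]
print(fast_idx)  # 16 indices, four per segment
```

In this sketch both branches cover every segment, so the two index lists stay aligned in time; the ratio between `fast_per_segment` and `slow_per_segment` plays the role of the speed ratio between the two pathways.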
Keywords/Search Tags:Action Recognition, Pose Estimation, Deep Neural Network, Attention