Video Action Recognition And Analysis Based On Deep Learning

Posted on: 2022-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: L M Jin
Full Text: PDF
GTID: 2518306572959869
Subject: Computer technology
Abstract/Summary:
With the popularity of intelligent devices and the rapid development of information technology, video has become an irreplaceable information carrier in daily life. To cope with the explosive growth in the number of videos, understanding and analyzing video content and applying that analysis to real-world scenes will help improve people's production and living standards. In this paper, we design and implement a deep-learning-based video action recognition model for dangerous behaviors in parks, parking lots, and other public places. It adds automatic recognition and feedback of dangerous behaviors to the monitoring equipment in such places and provides an intelligent auxiliary solution for monitoring personnel.

The proposed model is a general two-person interaction recognition model. Given suitable datasets for specific scenes, it can be extended to other settings, such as intelligent recognition of abnormal behavior among prisoners, or helping to detect students whispering in the classroom. In this paper, the model is trained on the UT-Interaction dataset, in which the "Punching", "Pushing", and "Kicking" behaviors are defined as dangerous. When the model identifies such a behavior, it returns feedback information.

To avoid the overfitting caused by the limited size and diversity of the dataset, this paper proposes a series of training strategies: data augmentation is used to expand the sample set, increasing both the number and the quality of training samples; dropout and batch normalization (BN) are introduced into the model; and cross-dataset, cross-task, and cross-modal pre-training are used to provide valuable and stable weight initialization for the model.

First, this paper proposes the Recurrent Pose Estimation Model Based On Attention, which introduces a human pose estimation model to extract spatial information about the key points and limbs of the human pose. Then, by combining a Bi-LSTM model with an attention mechanism, attention weights are automatically assigned according to each video frame's contribution to the recognition result, and the temporal information between frames is modeled. Experiments show that the human pose estimation model, the Bi-LSTM model, and the attention mechanism improve the performance of the model. They demonstrate the effectiveness of the Recurrent Pose Estimation Model Based On Attention for action recognition, as well as the role of the overfitting mitigation strategies in addressing the overfitting caused by a small dataset.

Although the Recurrent Pose Estimation Model Based On Attention can extract the temporal and spatial information of video, it focuses on capturing coarse, long-term temporal structure. To retain low-level temporal correlation, this paper proposes the Optical Flow Prediction Model, which introduces optical flow to explicitly encode video motion information. The structure of the deep convolutional neural network and the value range of the optical flow field are adjusted to suit learning from optical flow data.

Finally, this paper uses an aggregation function to combine the Recurrent Pose Estimation Model Based On Attention with the Optical Flow Prediction Model. The experimental results show that the two sub-models learn complementary information and that combining them improves the recognition accuracy of the joint model.
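
As a rough illustration of two mechanisms mentioned in the abstract (attention weights assigned to video frames, and an aggregation function that combines the two sub-models), the following Python sketch may help. It is not the thesis implementation. The dot-product scoring, the `query` vector, the weighted-average fusion, the `pose_weight` parameter, and the placeholder "Other" class are all assumptions made for illustration; the abstract does not specify the scoring function or the aggregation function actually used.

```python
import math

# Behaviors the abstract defines as dangerous on the UT-Interaction dataset.
DANGEROUS = {"Punching", "Pushing", "Kicking"}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Attention-weighted sum of per-frame features.

    frame_features: T vectors of length D (standing in for Bi-LSTM outputs).
    query: hypothetical learned vector of length D (an assumption).
    Returns (pooled vector of length D, attention weights of length T).
    """
    # One scalar relevance score per frame (dot product with the query).
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


def fuse_scores(pose_probs, flow_probs, pose_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    Weighted averaging is an assumed stand-in for the aggregation function.
    """
    return [
        pose_weight * p + (1.0 - pose_weight) * f
        for p, f in zip(pose_probs, flow_probs)
    ]


def predict(labels, fused):
    """Return the top-scoring label and whether it counts as dangerous."""
    best = max(range(len(labels)), key=lambda i: fused[i])
    return labels[best], labels[best] in DANGEROUS


if __name__ == "__main__":
    labels = ["Punching", "Pushing", "Kicking", "Other"]  # "Other" is a placeholder
    pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    fused = fuse_scores([0.7, 0.1, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1])
    print(weights, pooled, predict(labels, fused))
```

In this sketch, frames whose features align more closely with `query` receive larger weights, which mirrors the idea of weighting frames by their contribution to the recognition result. Averaging class probabilities is only one of several plausible aggregation choices.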
Keywords/Search Tags: Video action recognition, Human pose estimation, Recurrent Neural Network, Optical flow