Font Size: a A A

Application Research On Video Action Recognition Based On Deep Neural Networks

Posted on:2020-09-27Degree:MasterType:Thesis
Country:ChinaCandidate:Q HeFull Text:PDF
GTID:2428330596475102Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Human action recognition in video has long been an important field in computer vision.In recent years,it has attracted extensive attention in academia and industry,and is the basis of video surveillance,human-computer interaction and other applications.Before the large-scale appliance of deep neural networks to action behavior recognition,traditional methods require experienced experts to design and extract target features from the video manually.With the rapid improvement of computer computing power and the emergence of a large number of training data sets,deep neural network learning algorithms have achieved remarkable achievements in various fields such as image recognition and speech recognition.Deep neural networks have powerful self-learning capabilities and can theoretically fit any function that automatically extracts valid information from the data.After successfully applied deep neural networks to the image classification,it has been gradually extended to the study of video action recognition with temporal correlation.This thesis studied the deep neural network model applied to video action recognition using the public data sets.The attention mechanism was studied and the temporal information was modeled using convolutional neural networks.Following are the works in this thesis:(1)Data preprocessing for different public data sets to establish a unified training and testing process.(2)The pretrained deep neural network model applied to image classification is used to extract the features of video frames,and the spatial pyramid pooling method is adopted to adapt to different sizes of video input.(3)A model based on the improved attention mechanism named VideoAttn is proposed,the temporal attention mechanism is added on the basis of the original spatial attention.To correctly classify the human actions in videos,the action context information contained in the video frame sequence is encoded by the temporal attention mechanism.Through experimental verification on the public dataset,the VideoAttn model's spatiotemporal attention mechanism selectively focuses on different regions of the frame image and video frames related to human behavior in the video.(4)A model consisting entirely of convolutional layers named VideoCNN is proposed,which has the advantages of fast calculation speed and small parameter space.Commonly used deep models for video temporal modeling mostly use recurrent neural networks such as LSTM,GRU,and so on.Recurrent neural networks have problems such as huge computational complexity and slow calculation speed.Such networks also have the insufficiency of encoding all the temporal information for long sequences.Secondly,during the training process,the recurrent neural network will also suffers from gradient vanish or explosion problems,this phenomenon poses a great challenge to our training.Through the experimental verification on the public data sets,the VideoCNN model proposed in this paper has good training and reasoning efficiency.
Keywords/Search Tags:action recognition, deep neural networks, feature extraction, attention mechanism
PDF Full Text Request
Related items