Human Action Detection Research Based On Convolutional Neural Network

Posted on: 2019-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhou
Full Text: PDF
GTID: 2428330545498917
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, with the introduction of high-definition video surveillance, intelligent monitoring systems based on human action detection have developed rapidly in smart cities, military security, and smart homes. At the same time, the popularization of intelligent terminals and the development of mobile communication networks have produced a large and rapidly growing number of short videos. Retrieving, classifying, and reviewing these videos requires understanding their content, and the main subject of video is human action. These broad application prospects and economic value have quickly made human action detection a research hotspot in computer vision. Traditional human action detection algorithms require features to be hand-engineered for specific actions, which involves a heavy workload and yields limited robustness. This thesis uses convolutional neural networks (CNNs) to design separate network structures for short videos and for untrimmed long videos, with the aim of improving the robustness, accuracy, and practicality of the algorithm.

For short videos, drawing on object detection algorithms, this thesis proposes a human action detection method that combines object detection with a dynamic linking algorithm. To improve detection accuracy, sequential frames are used as input to capture the temporal information of the video, and a spatiotemporal fusion algorithm is used to obtain more robust features. An effective dynamic linking algorithm is then designed to obtain human action sequences from the object detection results. Finally, the network is trained, validated, and compared with previous work on multiple human action datasets. The experiments verify the validity of object detection plus dynamic linking, and show that sequential-frame input and spatiotemporal fusion further improve accuracy.

For untrimmed long videos, a three-dimensional convolutional neural network combined with a recurrent neural network (RNN) is proposed. First, three-dimensional convolution encodes low-level features of the video; then a recurrent memory module is designed to further extract temporal features; finally, action detection is carried out by the detection part. In the recurrent memory part, two parallel semantic constraint modules, P (Proposal) and C (Classification), are designed. Through a refined loss function design, they perform video segment proposal and classification, respectively. During training, the weights of the semantic constraint terms in the loss function are dynamically adjusted to speed up training and improve accuracy. Experiments show that accuracy is significantly improved compared with previous studies. This demonstrates the effectiveness of the proposed approach and moves human action detection a step closer to practical use.
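
The abstract does not spell out the dynamic linking algorithm, so the following is only a minimal sketch of one common way to link per-frame detections into an action sequence, and should not be read as the thesis's method. It assumes each frame yields a list of (box, score) detections and picks the path that maximizes the sum of detection scores plus a weighted overlap (IoU) between consecutive boxes, using dynamic programming. The names `link_detections`, `iou`, and `overlap_weight` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, action confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Detection]],
                    overlap_weight: float = 1.0) -> List[int]:
    """Return, per frame, the index of the detection on the best path.

    Path score = sum of detection scores
                 + overlap_weight * IoU of each pair of consecutive boxes.
    Assumption: every frame contains at least one detection.
    """
    # best[t][j]: best score of any path ending at detection j of frame t
    best = [[score for _, score in frames[0]]]
    back: List[List[int]] = []  # back[t-1][j]: predecessor index in frame t-1
    for t in range(1, len(frames)):
        row, ptr = [], []
        for box, score in frames[t]:
            candidates = [
                best[t - 1][i] + overlap_weight * iou(prev_box, box)
                for i, (prev_box, _) in enumerate(frames[t - 1])
            ]
            i_best = max(range(len(candidates)), key=candidates.__getitem__)
            row.append(candidates[i_best] + score)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final detection to recover the whole path.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path
```

With `overlap_weight` set to 0 the path simply follows the highest-scoring detections regardless of position; a positive weight favors spatially consistent boxes, which is the intuition behind turning frame-level detections into a continuous action sequence.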
Keywords/Search Tags:human action detection, object detection, Convolutional Neural Network, spatiotemporal feature, recurrent memory module