Human Action Recognition Based On Deep Learning

Posted on: 2020-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
GTID: 2428330575998440
Subject: Electronic and communication engineering
Abstract/Summary:
Human action recognition has recently been widely used in applications such as video surveillance, human-machine interaction, and elderly care, which gives the task considerable research value. The task is to identify the class of human action recorded in a video by means of a recognition algorithm. Existing algorithms fall into two groups: traditional algorithms and deep learning algorithms. Traditional algorithms depend on handcrafted features, which are complicated and time-consuming to design and generalize poorly. By comparison, the deep learning algorithms proposed in recent years extract features automatically and are therefore more accurate and efficient. Nevertheless, current deep learning algorithms still have problems. In particular, they do not make proper use of both low-level spatio-temporal features and high-level temporal features, and they neglect the efficient combination of the multi-modality data provided by depth video. To address these two problems, this work conducts further research on human action recognition based on deep learning. The main contributions are summarized as follows:

(1) A novel network, I3D-LSTM, is proposed for human action recognition and is pretrained on the large video dataset Kinetics. After analyzing the advantages and disadvantages of current deep learning algorithms for human action recognition, we find that the three-dimensional convolutional neural network (3D CNN) is better suited to extracting low-level spatio-temporal features from adjacent frames, whereas the Long Short-Term Memory (LSTM) network is better suited to high-level temporal modelling. In addition, current models are generally pretrained on the large image dataset ImageNet, which is a poor match for a video-based task such as human action recognition. By combining the two networks and pretraining on video data, I3D-LSTM learns both low-level spatio-temporal features and high-level temporal features efficiently and therefore recognizes human actions more accurately.

(2) We also propose a new I3D-GRU network for human action recognition, which further improves recognition accuracy. I3D-GRU is an improved version of I3D-LSTM in which the LSTM is replaced by a Gated Recurrent Unit (GRU) network, a variant of the LSTM. The GRU has the same sequence-modelling ability as the LSTM but fewer parameters, so I3D-GRU is less prone to overfitting. On the UCF-101 dataset, I3D-GRU achieves better performance than I3D-LSTM.

(3) We propose an efficient multi-stream network for human action recognition on RGB-D video datasets. Current RGB-D video datasets provide three modalities: the depth map sequence, the skeleton joint data, and the RGB video. Considering the advantages and disadvantages of each modality, we select for each one a deep neural network model suited to extracting its features, and then fuse the three single-modality models. We also study different fusion mechanisms, including feature-level fusion and score-level fusion, and identify the fusion strategy with which the multi-stream network achieves its best performance.
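The difference between the two fusion mechanisms named in contribution (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the equal default weights, and the example logits are all assumptions made for illustration. Score-level fusion combines the per-class probabilities that each single-modality network outputs, whereas feature-level fusion concatenates the feature vectors of the streams before a shared classifier (not shown here).

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_level_fusion(
    stream_logits: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of the per-stream class probabilities.

    Equal weights are an assumption; the abstract does not state them.
    """
    n = len(stream_logits)
    if weights is None:
        weights = [1.0 / n] * n
    probs = [softmax(logits) for logits in stream_logits]
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]


def feature_level_fusion(
    stream_features: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate per-stream feature vectors into one joint vector."""
    fused: List[float] = []
    for features in stream_features:
        fused.extend(features)
    return fused


# Hypothetical logits for three action classes from the three streams.
rgb_logits = [2.0, 0.5, 0.1]
depth_logits = [0.3, 1.8, 0.2]
skeleton_logits = [1.5, 1.4, 0.0]

fused_scores = score_level_fusion([rgb_logits, depth_logits, skeleton_logits])
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In this sketch score-level fusion needs no extra training once the three streams are trained, while feature-level fusion would require a classifier trained on the concatenated vector.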
Keywords/Search Tags: Human Action Recognition, Deep Learning, Convolutional Neural Network, Recurrent Neural Network, Multi-stream Network