
Deep Feature Modeling For Human Action Recognition And Detection

Posted on: 2020-05-16; Degree: Master; Type: Thesis
Country: China; Candidate: M Q Li; Full Text: PDF
GTID: 2428330602452354; Subject: Circuits and Systems
Abstract/Summary:
Action recognition has developed rapidly in recent years and has become one of the most active research topics in computer vision. Its purpose is to analyze and understand video content by learning network models, and ultimately to classify the actions of targets automatically. It is widely used in autonomous driving, robotics, unmanned retail, security monitoring, sports highlight collection, and many other applications. It is also a multidisciplinary research direction, covering computer vision, artificial intelligence, and pattern recognition. However, video-based action recognition remains very challenging because of complex video backgrounds, external factors such as illumination changes and camera motion, and internal factors such as the many deformations of human movement. Traditional action recognition algorithms rely mainly on hand-crafted features, which require a large amount of computation and are inefficient. In recent years, action recognition algorithms based on deep learning have become the mainstream, and most of them use a two-stream structure to learn spatio-temporal features. However, deep network models also have limitations, owing to the semantic problem of optical flow information and the redundancy of data. This thesis focuses on using convolutional neural networks, without additional motion information, together with a Long Short-Term Memory (LSTM) network to encode discriminative representations of the video, and on designing fast-to-train, well-generalizing structures to analyze and understand body movement in video. The main contributions are as follows:

(1) We propose a deep algorithm for action recognition based on discriminative information learning. First, drawing on context-driven operations, we learn the contribution weights of different frames to extract discriminative information, which allows the network to be trained end to end. Second, we use a bi-directional LSTM, which exploits both past and future context, in place of the two-stream network to perform long-term modeling and to infer the temporal relationships among the global discriminative information. Finally, we propose a temporal relation inference network that infers the relationships among local adjacent information by simulating the inference mechanism of the human brain. To the best of our knowledge, the proposed method achieves new state-of-the-art performance on two popular action recognition benchmarks: 95.8% on UCF101 and 72.0% on HMDB51.

(2) We extend the action recognition task to temporal action detection, that is, identifying and localizing actions in untrimmed video. Based on the two-stage structure used in detection tasks, we propose a hierarchical feature network for temporal action detection, which consists of two parts: a) an action classification network, which uses the deep residual network (ResNet-101) to build a two-stream network. It takes video frames and stacked optical flows as its respective inputs, learns the motion score of the corresponding frames, and generates the original action proposals from these scores. b) a coordinate regression network, which coarsely divides the original proposals generated by the classification network into fixed-scale units, constructs feature pyramids from the unit features, and refines the bounding boxes of the final proposals using temporal coordinate regression.
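
Two ideas in the abstract can be illustrated with a minimal sketch: weighting frames by their contribution before pooling, and turning per-frame scores into temporal proposals. The abstract does not specify how either step is computed, so everything below is an assumption made for illustration: the function names `weighted_frame_pooling` and `proposals_from_scores` are hypothetical, the softmax weighting is one common choice rather than the thesis's method, and the `threshold` and `min_length` values are arbitrary. The bi-directional LSTM, the relation inference network, and the coordinate regression stage are not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-frame relevance scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_frame_pooling(
    frame_features: Sequence[Sequence[float]],
    relevance: Sequence[float],
) -> List[float]:
    """Pool per-frame feature vectors into one video-level vector.

    Assumption: a frame's contribution weight is the softmax of a scalar
    relevance score. The thesis may compute these weights differently.
    """
    if not frame_features or len(frame_features) != len(relevance):
        raise ValueError("need at least one frame and one score per frame")
    weights = softmax(relevance)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for weight, feature in zip(weights, frame_features):
        for i in range(dim):
            pooled[i] += weight * feature[i]
    return pooled


def proposals_from_scores(
    frame_scores: Sequence[float],
    threshold: float = 0.5,   # assumed value, not taken from the thesis
    min_length: int = 2,      # assumed value, not taken from the thesis
) -> List[Tuple[int, int]]:
    """Group consecutive frames scoring >= threshold into proposals.

    Each proposal is (start, end) in frame indices, with `end` exclusive.
    Runs shorter than `min_length` frames are discarded.
    """
    proposals: List[Tuple[int, int]] = []
    start = None
    for idx, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = idx  # a new run of high-scoring frames begins
        elif score < threshold and start is not None:
            if idx - start >= min_length:
                proposals.append((start, idx))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frame_scores) - start >= min_length:
        proposals.append((start, len(frame_scores)))
    return proposals
```

For example, `proposals_from_scores([0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.6, 0.9, 0.95])` yields the spans `(1, 3)` and `(6, 9)`; the single high-scoring frame at index 4 is dropped because it is shorter than `min_length`. In a two-stage detector of the kind the abstract describes, spans like these would then be passed to a regression stage that refines their start and end positions.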
Keywords/Search Tags: Deep Learning, Convolutional Neural Network, Recurrent Neural Network, Action Recognition, Temporal Action Detection