Research On Some Problems Of Image Sequence Recognition Based On Recurrent Neural Networks

Posted on: 2021-03-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Liu
Full Text: PDF
GTID: 1368330647460721
Subject: Computer application technology
Abstract/Summary:
With the popularity of video surveillance devices and smart mobile devices, video data is growing explosively in areas such as security and entertainment. Using artificial intelligence (AI) technology to understand video data has become an important part of building the “Smart City”. As an important branch of video analysis technology, image sequence recognition (ISR) is a hot topic in computer vision, with a wide range of applications in human-robot interaction (HRI), intelligent monitoring, and autonomous driving. With the development of deep learning, RNN-based ISR methods have achieved remarkable results. However, it is still challenging to learn discriminative representations from image sequences affected by appearance variation, background changes, and poor image quality. This dissertation focuses on RNN-based image sequence recognition. First, it addresses the question “Who is it?” in videos with a gait recognition method (work 1). Then, it recognizes human behaviors with action recognition methods from three aspects (works 2, 3, and 4), which addresses the question “What did he do?” in videos. The four related works and their contributions are as follows:

(1) Existing gait recognition is limited by complex backgrounds and differences in shooting conditions, and hence often yields sub-optimal performance. The third chapter presents a memory-based gait recognition algorithm. First, in order to transfer an existing articulated human detection model to gait joint extraction, we manually label a small number of human gait joint positions. We then fine-tune the model parameters so that the human pose estimation model can produce 2D joint information from gait image sequences. Inspired by the mechanism of sequence processing in the brain, we use a memory neural network (LSTM) to learn and recognize human gaits. Finally, experimental results on two available gait datasets with varying appearance and views validate the feasibility and effectiveness of the proposed memory-based gait recognition method.

(2) The continuous movement of key positions of the human body can depict a variety of complex actions. Conventional methods usually construct classifiers with hand-crafted or learned features to recognize human actions. Rather than constructing a direct action classifier, this research attempts to identify human actions based on the development trends of behavior sequences. The fourth chapter proposes an action recognition algorithm based on sequence prediction learning. First, we construct an RNN-based action predictor for each kind of activity. Given an input sequence, these action predictors predict the action trend at the next time step. Then, according to the prediction outputs of the action predictors and a removal rule, poor predictors are eliminated step by step. Finally, the identity (ID) of the remaining predictor is taken as the label of the action sequence to be classified. Comprehensive experiments on one-person action and two-person interaction datasets verify the effectiveness of the proposed method and the importance of prediction learning for action recognition.

(3) Action recognition methods in video are often subject to various kinds of interfering information, so they perform poorly at classifying actions correctly. In order to selectively focus on important information and learn discriminative action representations, the fifth chapter proposes a 3D-CBAM-based spatiotemporal-stream model for action recognition in videos. Specifically, we first extract intra-frame spatial features and inter-frame optical flow features for each video action using a pre-trained deep model. Then, we implement an effective 3D attention module, which sequentially infers attention maps along three separate dimensions: channel, spatial, and temporal. After adaptive feature refinement based on the attention maps, temporal pooling is performed to squeeze the temporal dimension. Finally, we learn comprehensive spatiotemporal representations with the RNN-based two-stream network to recognize the action sequences. Additionally, we collect and construct a new Ping-Pong action dataset from YouTube for subsequent human-robot interaction tasks. The proposed 3D-CBAM-based two-stream network obtains competitive results on the collected Ping-Pong action dataset and the public HMDB51 action dataset.

(4) In cross-dataset action recognition, the source domain and the target domain have different data distributions. The goal of domain adaptation is to solve the domain shift problem and transfer the knowledge learned from the source domain to handle target domain tasks. As a cross-domain feature learning method, domain alignment is a challenging task due to the lack of target data labels. The sixth chapter proposes an unsupervised domain adaptation method based on analogical co-training learning for cross-dataset action recognition. First, inspired by co-training learning, we design an analogical co-training learning based pseudo-label prediction model. By adding a class-aware constraint, the inter-class distance gradually increases and the intra-class distance gradually decreases between domains. Then, this pseudo-label prediction model is used to label the unlabeled target samples. Finally, the classifier for the target domain is trained using the pseudo-labeled target samples. We select the common actions from four publicly available action datasets to construct cross-domain action dataset pairs. Experimental results on these dataset pairs show that the proposed action recognition method can adaptively transfer knowledge across datasets.
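
As an illustration of the predictor-elimination idea in work (2), the following is a minimal sketch and not the dissertation's implementation. The abstract does not state the removal rule or the predictor architecture, so several things here are assumptions: the function names `classify_by_elimination` and `make_rotation_predictor` are hypothetical, simple 2D rotation maps stand in for the per-class RNN predictors, and the removal rule is taken to be "after each time step, drop the surviving predictor with the largest accumulated squared prediction error".

```python
import math


def classify_by_elimination(sequence, predictors):
    """Return the ID of the last surviving predictor.

    sequence   : list of feature vectors (tuples), one per time step.
    predictors : dict mapping a class ID to a function f(x_t) -> predicted x_{t+1}.
                 These are stand-ins for per-class RNN predictors (assumption).
    Removal rule (assumption): after each time step, drop the surviving
    predictor with the largest accumulated squared prediction error.
    """
    alive = {cid: 0.0 for cid in predictors}  # accumulated error per survivor
    for t in range(len(sequence) - 1):
        if len(alive) == 1:
            break
        for cid in alive:
            pred = predictors[cid](sequence[t])
            alive[cid] += sum((p - x) ** 2 for p, x in zip(pred, sequence[t + 1]))
        # Eliminate the worst predictor at this step.
        del alive[max(alive, key=alive.get)]
    # If the sequence ends early, fall back to the lowest accumulated error.
    return min(alive, key=alive.get)


def make_rotation_predictor(angle):
    """Toy predictor: rotates a 2D point by a fixed angle (hypothetical stand-in)."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda x: (c * x[0] - s * x[1], s * x[0] + c * x[1])


# Three toy "action" predictors with different dynamics (labels are illustrative).
predictors = {
    "walk": make_rotation_predictor(0.1),
    "wave": make_rotation_predictor(0.3),
    "kick": make_rotation_predictor(0.6),
}

# Toy sequence generated by the "wave" dynamics.
seq = [(1.0, 0.0)]
for _ in range(9):
    seq.append(predictors["wave"](seq[-1]))

label = classify_by_elimination(seq, predictors)
print(label)  # prints "wave"
```

In this toy setting the predictor whose dynamics match the sequence accumulates no error and survives, so its ID becomes the label. A real system would replace the rotation maps with trained recurrent predictors and might use a different elimination schedule.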
Keywords/Search Tags: Recurrent Neural Network (RNN), Image Sequence Recognition (ISR), Gait Recognition, Action Recognition, Unsupervised Domain Adaptation