Research On Human Action Recognition In Videos

Posted on: 2019-04-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z K Liu
Full Text: PDF
GTID: 1318330542994138
Subject: Control Science and Engineering
Abstract/Summary:
Human action recognition in videos is one of the most promising applications in the field of computer vision and pattern recognition. It has been widely applied in intelligent interaction, autonomous driving, virtual reality, and so on, and the topic has drawn increasing interest from academia, commerce, and industry. However, owing to the ambiguity of the semantic boundaries between different actions, changes in illumination, and differences in viewpoint and background, recognizing human actions in video is very difficult. This thesis carries out extensive research on action recognition from four aspects: action feature extraction, action representation, classification, and temporal action detection in untrimmed videos. The contributions are as follows:

(1) A stacked Overcomplete Independent Component Analysis (OICA) model is proposed to extract discriminative features from videos for human action recognition (an illustrative sketch follows the summary below). It is a data-driven feature extraction model that learns appropriate parameters from human action samples by unsupervised learning. Moreover, overcomplete ICA uses more basis vectors than traditional unsupervised algorithms, and thus has a stronger reconstruction ability for complex human behavior and varied video environments. Since layered models usually achieve more promising performance in modern feature learning, OICA layers are stacked to form a two-layer OICA network; this layered structure produces more robust and discriminative high-layer features. Furthermore, a block-energy-based sampling method is proposed to enhance the foreground information in the videos. The experimental results show that, compared with traditional feature extraction methods, the proposed stacked OICA network not only improves the accuracy of action recognition but also enlarges the application range.

(2) An action representation method named the Part Movement Model (PMM) is proposed for short video clips (sketched below). The PMM explicitly captures the spatial-temporal structure of human actions and divides actions into discriminative part movements, so that actions can be better represented. The model is introduced to handle the broad variations of human actions: inter-class variations are captured by emphasizing the most discriminative part movements, and intra-class variations are approximated by making the locations of part movements configurable. Since manually labeling part movements in videos is subjective and labor-intensive, an information-theoretic approach is proposed to infer part movements automatically from the training data. Moreover, the task of recognizing human actions with minimal observational latency is considered, and a feature extraction method that exploits both motion (local flow) and appearance (local shape) features is proposed to mitigate the information insufficiency. Experimental results on three benchmark datasets show that short clips of 6-7 frames (0.2-0.3 seconds of video) are enough for the proposed method to achieve recognition performance comparable to that of high-latency baselines.
(3) A temporal attention model is proposed that learns to classify human actions in videos while selectively focusing on the informative frames (sketched below); it needs no explicit annotation of such frames during training or testing. Specifically, a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units is adopted, and it attaches higher importance to the frames that are discriminative for the task at hand. The attention mechanism is designed to be differentiable, allowing end-to-end learning with the underlying RNN model without any additional inference step. The proposed temporal attention classifier can work with any per-frame or per-clip ConvNet, as well as other types of feature extractors. In cooperation with RGB- and optical-flow-based deep ConvNets, the proposed method consistently improves over no-attention baselines and achieves state-of-the-art performance on two challenging datasets.

(4) A temporal action detection model named the Single-stage Multi-location Convolutional Network (SMC) is proposed (sketched below). Temporal action detection aims to detect action instances in untrimmed videos, where both the categories and the temporal boundaries of action instances must be predicted. SMC completely eliminates additional proposal generation and external spatio-temporal feature resampling, and directly predicts frame-level action locations together with action classes through a unified, end-to-end trainable convolutional neural network. Since the convolution operation can be accelerated by modern GPUs, SMC is very efficient. The experimental results demonstrate that SMC produces more reliable action detections at much higher speed than existing state-of-the-art methods.

In summary, this thesis addresses the problems of action feature extraction, action representation, action classification, and temporal action detection based on local features, the part movement model, the attention mechanism, and deep neural networks, respectively. The introduced methods are evaluated on benchmark action datasets, and the experimental results demonstrate their effectiveness and practicality.
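The following minimal numpy sketch illustrates the two-layer stacked-OICA feature path of contribution (1), together with a simple block-energy sampler. The basis matrices would be learned by overcomplete ICA in the actual model; the random matrices, all shapes, and the tanh nonlinearity here are illustrative assumptions, not the dissertation's settings.

```python
# Sketch of the stacked-OICA inference path; W1 and W2 stand in for
# bases that would be learned by overcomplete ICA on video blocks.
import numpy as np

rng = np.random.default_rng(0)

def sample_by_energy(blocks, k):
    """Keep the k blocks with the highest energy (variance): a simple
    stand-in for block-energy sampling that favors foreground motion."""
    return blocks[np.argsort(blocks.var(axis=1))[-k:]]

def oica_layer(x, W):
    """Project onto an overcomplete basis (more basis vectors than
    input dimensions) and apply a smooth nonlinearity."""
    return np.tanh(W @ x)

W1 = rng.standard_normal((400, 200))   # layer 1: 200-dim blocks, 400 bases
W2 = rng.standard_normal((600, 400))   # layer 2: stacked on layer-1 outputs

all_blocks = rng.standard_normal((200, 200))  # flattened spatio-temporal blocks
blocks = sample_by_energy(all_blocks, 50)     # enhance foreground information
h1 = oica_layer(blocks.T, W1)                 # (400, 50) first-layer features
h2 = oica_layer(h1, W2)                       # (600, 50) second-layer features
video_descriptor = h2.max(axis=1)             # pool over blocks -> one descriptor
print(video_descriptor.shape)                 # (600,)
```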
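For contribution (2), the sketch below scores a clip with a part-movement-style model: each part movement has a template and a preferred location, and the score trades template response against displacement, which is what makes part locations configurable. This pictorial-structures scoring form is an assumption made for illustration, not the dissertation's exact formulation, and the information-theoretic part discovery is omitted.

```python
# Hedged sketch of scoring a clip with configurable part movements.
import numpy as np

rng = np.random.default_rng(3)
L, D, P = 20, 32, 3                        # candidate locations, feature dim, parts

F = rng.standard_normal((L, D))            # local features at each location
templates = rng.standard_normal((P, D))    # one template per part movement
anchors = np.array([3, 10, 16])            # preferred location of each part
penalty = 0.5                              # cost per unit of displacement

def score(F):
    """Place each part at its best location: template response
    minus a penalty for drifting from the part's anchor."""
    total = 0.0
    for p in range(P):
        resp = F @ templates[p] - penalty * np.abs(np.arange(L) - anchors[p])
        total += resp.max()
    return total

print(score(F))
```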
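For contribution (3), the sketch below shows a differentiable soft attention over per-frame features: each frame is scored, the scores are softmax-normalized into weights, and the weighted sum forms the clip descriptor fed to a classifier. In the actual model the frame features come from an LSTM over ConvNet outputs; the random features and parameter shapes here are stand-in assumptions. Because every step is differentiable, gradients can flow end-to-end through the attention weights.

```python
# Sketch of differentiable temporal attention over frame features.
import numpy as np

rng = np.random.default_rng(1)
T, D, C = 30, 128, 10                      # frames, feature dim, action classes

H = rng.standard_normal((T, D))            # per-frame features (LSTM stand-in)
w_att = rng.standard_normal(D)             # attention scoring vector
W_cls = rng.standard_normal((D, C))        # linear classifier

scores = H @ w_att                         # one relevance score per frame
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                       # softmax: attention weights over frames
clip = alpha @ H                           # attention-weighted clip descriptor
logits = clip @ W_cls
print(alpha.round(3))                      # which frames the model attends to
print(int(logits.argmax()))                # predicted action class
```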
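For contribution (4), the sketch below mimics a single-stage detector: a toy temporal convolution produces per-frame class logits directly, and runs of identical non-background labels are merged into (start, end, class) detections. The kernel width, the decoding rule, and all shapes are illustrative assumptions rather than SMC's actual architecture; the point is that no proposal generation or feature resampling stage is involved.

```python
# Sketch of single-stage, frame-level temporal action detection.
import numpy as np

rng = np.random.default_rng(2)
T, D, C = 100, 64, 5                       # frames, feature dim, classes (0 = background)

X = rng.standard_normal((T, D))            # per-frame features from a backbone
K = rng.standard_normal((3, D, C)) * 0.1   # toy temporal conv kernel (width 3)

# 1D temporal convolution producing per-frame class logits.
pad = np.pad(X, ((1, 1), (0, 0)))
logits = sum(pad[i:i + T] @ K[i] for i in range(3))
labels = logits.argmax(axis=1)             # frame-level class decisions

def decode(labels):
    """Merge runs of identical non-background labels into detections."""
    detections, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            if labels[start] != 0:
                detections.append((start, t - 1, int(labels[start])))
            start = t
    return detections

print(decode(labels))                      # [(start, end, class), ...]
```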
Keywords/Search Tags: Action Recognition, Feature Extractor, Action Representation, Classifier, Convolutional Neural Networks, Temporal Action Detection