
Variational Deep Networks For Action Classification And Prediction

Posted on: 2021-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Lubamba Kasangu Eric
GTID: 1488306128965289
Subject: Computer application technology
Abstract/Summary:
This thesis addresses the recognition of human actions in videos. Action recognition can be defined as the ability to determine whether a given action is taking place in a given frame of a video. The problem is difficult because of the high complexity of human behavior (actions, activities) and of confounding factors such as camera angle, variability in appearance, motion patterns, and occlusions. Although current state-of-the-art approaches have achieved acceptable results, with models relying primarily on local spatio-temporal features and convolutional neural networks, challenges remain in fully tackling action classification because human activities are complex. In this work, we propose three solutions that model the relationships among local features, particularly in their spatio-temporal context. In the first method, we propose a classification model based on dense trajectories coupled with local spatio-temporal features, where the features are encoded with a classical descriptor pipeline such as the Fisher vector. In contrast to previous solutions, our second approach instead learns features via a capsule network, which allows us to refine the model discriminatively and to leverage the weight-update process through a dynamic routing algorithm. Finally, in our third proposal, we employ a deformable convolution network together with the Self-Balanced SENsitivity SEgmenter (SuBSENSE) approach for foreground subtraction, both to classify actions and to predict future ones. Our proposed solutions are generic and can improve both hand-crafted and deep-learning-based methods.

1) A novel hybrid framework for action classification via dense feature trajectories is proposed. We combine not only the shape-trajectory information used in traditional approaches but also dense motion trajectories. In this study, video samples were classified from a more comprehensive feature set that carries more information about the activity or category class of the video. Thus, contrary to trajectory-aligned appearance-based methods, our dense trajectories (which additionally contain motion information) can decrease computation time.

2) An action classification model based on a variant of the capsule neural network with a modified weight update is proposed. Instead of performing the weight update in the default way, the proposed solution dynamically routes the loss during backpropagation. The model checks whether the captured information actually matches its representation. In addition, the solution does not require a large training set, and its resource consumption is lower than that of the traditional approach.

3) A deformable-convolution and sequential framework for action detection, classification, and prediction is proposed. This solution both classifies the current action and predicts future activities. The results show that leveraging background-foreground subtraction increases accuracy, and that combining it with deformable convolution markedly improves classification performance over traditional approaches.

We showed that our models achieve significant improvements over most previous standard approaches while substantially reducing the number of trainable parameters and the size of the inputs. The experimental results suggest that the ability to process video in real time will be a critical factor in practical action-recognition applications, and all of the methods proposed in this thesis are suitable for real-time operation; we support this claim empirically with working implementations. Overall, this work advances the field of action classification and prediction from video content. The contributions rest on several key notions: feature trajectories, weight updates, capsule networks, background subtraction, and temporal information. Moreover, contrary to traditional approaches that require broader sample sets and expensive resources, our work considerably reduces the need for large training sets and lowers resource requirements.
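The capsule-network contribution above builds on routing-by-agreement between capsule layers. As a minimal sketch of that standard mechanism (not the thesis's modified loss-routing variant, whose details are not given here), the following NumPy fragment implements the usual dynamic routing loop; all shapes and the `n_iters` value are illustrative assumptions:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Capsule non-linearity: shrinks the vector norm into [0, 1)
    # while preserving direction.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement between n_in input and n_out output capsules.

    u_hat has shape (n_in, n_out, dim): the prediction each input
    capsule makes for each output capsule."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                   # routing logits
    for _ in range(n_iters):
        # Coupling coefficients: softmax over output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Weighted sum of predictions -> (n_out, dim), then squash.
        s = (c[..., None] * u_hat).sum(axis=0)
        v = squash(s)
        # Increase logits where prediction and output agree.
        b += np.einsum('iod,od->io', u_hat, v)
    return v

# Illustrative run: 8 input capsules, 4 output capsules of dimension 16.
rng = np.random.default_rng(0)
v = dynamic_routing(rng.standard_normal((8, 4, 16)))
print(v.shape)  # (4, 16)
```

Because of the squash non-linearity, every output capsule's norm stays below 1 and can be read as the probability that the entity it represents is present, which is what makes the representation check described in contribution 2) possible.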
Keywords/Search Tags:Action Classification, Action Prediction, Dense Trajectories, Background Subtraction, Deformable Convolution, Capsule Network, Long Short-Term Memory