Latent Hierarchical Model For Activity Recognition Under A Two-Stream Network Architecture

Posted on: 2019-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Liu
GTID: 2428330548985889
Subject: Electronic and communication engineering
Abstract/Summary:
In machine vision, human activity recognition has long been a central research topic. In complex scenes, human subjects are affected by factors such as background, occlusion, illumination, weather and shadow, which lowers the efficiency of feature extraction. In addition, the hierarchical models used in most activity recognition work have limited capacity for feature representation. Extracting target features effectively in complex scenes and representing them richly therefore remains both a focus and a difficulty. Building on a review of previous studies, this thesis proposes a Latent Hierarchical Model for Activity Recognition under a Two-Stream Network Architecture.

For feature extraction, the model replaces the traditional histogram of optical flow and histogram of oriented gradients with convolutional neural networks, and uses two streams to extract the appearance features and motion features of the video data separately. No complicated preprocessing of the videos is needed, which substantially improves the efficiency of feature extraction. For feature representation, a hidden layer is embedded in the action layer of the hierarchical model to capture richer context information from the video sequence, so that features are expressed more effectively. For model training, a latent linear-chain conditional random field framework is constructed to estimate actions and activities jointly, unlike most previous work, which estimates actions first and activities afterwards.

The input video sequence is first divided into several segments, each containing a single action, and then fed into the two-stream convolutional neural network, which extracts the appearance features and motion features of the sequence separately. The two types of features are then concatenated and passed to a Structured Support Vector Machine (Structured-SVM) for classification and training. Before training, a data-driven method is used to initialize the variables in the hidden layer so that the parameters of the latent model are complete; these variables are then updated automatically during training. Next, the max-margin method is used to transform the parameters to be solved into an equivalent form. Finally, actions and activities are estimated jointly in the latent linear-chain conditional random field framework to obtain the final results.

Experiments on three public datasets (CAD-120, UCF50 and UCF101) suggest that the proposed method outperforms the traditional feature extraction methods used in previous work. The improvement is attributed to the effectiveness of the two-stream network at extracting target features in complex scenes and to the latent model's richer representation and estimation of those features.

The thesis also extends human activity recognition to first-person video. A hand segmentation network and an object localization network process the videos to generate local regions of interest; the two-stream network then extracts features from these regions, and the features are passed to a support vector machine for training. Experiments on the Gaze40 and Gaze+44 datasets indicate that the proposed method achieves high recognition accuracy.
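
The abstract and keywords point to exact inference over a linear-chain conditional random field, applied to per-segment features obtained by joining the appearance and motion streams. The following is a minimal sketch of that idea only, not the thesis's implementation: it omits the hidden layer and the Structured-SVM training, and the function names (`concat_features`, `segment_scores`, `viterbi`), the linear scoring form and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def concat_features(appearance: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Join the two stream outputs for one segment into a single vector (assumed fusion)."""
    return list(appearance) + list(motion)


def segment_scores(feature: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear score of every label for one segment: weights[label] . feature (assumed form)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def viterbi(unary: Sequence[Sequence[float]],
            pairwise: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Exact max-sum inference over a linear chain of segments.

    unary[t][k]    : score of label k at segment t
    pairwise[j][k] : score of label j being followed by label k
    Returns the highest-scoring label sequence and its total score.
    """
    n_labels = len(unary[0])
    score = list(unary[0])           # best score of any path ending in each label
    back: List[List[int]] = []       # back-pointers, one list per transition

    for t in range(1, len(unary)):
        new_score, pointers = [], []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            best_prev = max(range(n_labels), key=lambda j: score[j] + pairwise[j][k])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + pairwise[best_prev][k] + unary[t][k])
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

With the toy inputs `unary = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.5]]` and `pairwise = [[0.5, -1.0], [-1.0, 0.5]]`, `viterbi(unary, pairwise)` returns `([0, 1, 1], 4.0)`. In a fuller model the unary scores would come from `segment_scores` applied to each segment's concatenated features, and the chain would also carry the activity label and the hidden variables.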
Keywords/Search Tags: two-stream network, latent hierarchical model, structured support vector machine, linear-chain conditional random field, exact inference