
Research On Action Prediction In Videos

Posted on: 2019-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Lu
GTID: 2428330545454775
Subject: Computer software and theory
Abstract/Summary:
Action recognition is an active research topic in computer vision, with wide applications in human-computer interaction, studio entertainment, intelligent video surveillance, intelligent medical care, and other fields. Action prediction is a special case of action recognition. Whereas conventional action recognition aims to recognize complete actions, action prediction aims to identify an action before it has been fully executed, so that goals such as early accident warning and crime prevention can be achieved by analyzing the possible impact of the action. This thesis studies how to predict human actions in videos and proposes a method that addresses three major issues: the feature representation, the prediction mechanism, and the learning model.

First, this thesis proposes a novel feature representation that combines the convolutional neural network feature (the contour descriptor), which describes incomplete actions well by dispersing the key information across all frames of the video, with the motion boundary histogram feature (the motion descriptor). The combined feature describes unfinished actions better than either feature alone: it overcomes the weakness of the convolutional neural network feature in describing the temporal structure of actions while retaining its advantages.

Second, this thesis proposes to predict the categories of unfinished actions through semantic reasoning, and presents an unsupervised semantic mining approach to realize that reasoning. The approach first extracts semantic concepts from the feature representation of the input videos, then models the contextual relationships among semantic concepts with a General Mixture Transform Distribution model, and finally infers the missing semantic concepts of incomplete videos jointly from these contextual relationships and the observed semantic sequences. The complete semantic sequences, which contain both the observed and the inferred semantic sequences, are then used to train the action prediction model.

Third, this thesis develops a discriminative structural model that takes the inferred semantic sequence as part of its input. The model adopts a maximum-margin learning framework that relates global features, observed semantic concepts, and the inferred semantic concepts obtained from semantic reasoning. It establishes this relationship implicitly, by associating a latent variable tied to temporal states with the global feature, in order to predict actions in videos.

Experimental results on the UT-Interaction and UCF-Sports datasets show that the proposed feature representation, action prediction mechanism, and discriminative structural model improve the accuracy of action prediction.
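
The semantic inference step described above can be illustrated with a minimal sketch. The abstract does not give the model's equations, so everything here is an assumption: the context model is simplified to a weighted mixture of lag-specific transition matrices, missing concepts are filled in greedily, and the names `predict_next`, `complete_sequence`, `transitions`, and `lag_weights` are hypothetical rather than taken from the thesis.

```python
def predict_next(history, transitions, lag_weights):
    """Return a distribution over the next semantic concept.

    Assumed simplification: transitions[g] is a row-stochastic matrix
    for lag g + 1, and lag_weights are non-negative mixture weights.
    """
    n = len(transitions[0])
    probs = [0.0] * n
    for lag, (weight, matrix) in enumerate(zip(lag_weights, transitions), start=1):
        if lag <= len(history):  # skip lags that reach before the first observation
            row = matrix[history[-lag]]
            for k in range(n):
                probs[k] += weight * row[k]
    total = sum(probs)
    if total == 0:
        return [1.0 / n] * n  # no usable context: fall back to uniform
    return [p / total for p in probs]


def complete_sequence(observed, transitions, lag_weights, target_len):
    """Greedily append the most likely concept until target_len is reached."""
    seq = list(observed)
    while len(seq) < target_len:
        probs = predict_next(seq, transitions, lag_weights)
        seq.append(max(range(len(probs)), key=probs.__getitem__))
    return seq


# Toy example with three concepts (0, 1, 2) and two lags.
transitions = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # lag 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # lag 2
]
lag_weights = [0.7, 0.3]
completed = complete_sequence([0], transitions, lag_weights, target_len=4)
```

In this toy setup only concept 0 is observed, and the remaining three positions are inferred from the assumed context model. In the thesis, the observed and inferred sequences together are used to train the prediction model.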
Keywords/Search Tags: action prediction, feature representation, action semantics, General Mixture Transform Distribution model, discriminative structural model