Research On Video Action Recognition Technology Based On Spatiotemporal Feature Extraction

Posted on: 2019-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhao
GTID: 2438330551960872
Subject: Software engineering methods
Abstract/Summary:
Video action recognition is the task of identifying the category of human action in a video sequence. It is widely used in multimedia content analysis, human-computer interaction, intelligent real-time surveillance, and related fields. Recognition is typically achieved by extracting feature vectors from the video and then classifying those vectors with a classifier. This thesis focuses on extracting the spatial and temporal features of video. Traditional methods usually capture the dynamic changes of a video by treating the three-dimensional space-time domain as a whole. This is one-sided: it loses much of the variation that is specific to the two-dimensional image spatial domain or to the one-dimensional time domain. Video action recognition therefore needs to treat the temporal and spatial structure of the video separately, so as to capture the spatio-temporal information of its dynamic changes more comprehensively. To this end, this thesis proposes two spatio-temporal feature extraction algorithms. The main research contents are as follows:

1) A video spatio-temporal feature extraction method based on a multi-channel spatio-temporal pyramid is proposed. The core of the method is the construction of a multi-channel pyramid model. The model abandons the traditional multi-scale subspace segmentation performed only in the three-dimensional space-time domain and instead partitions the video in three separate channels, namely the three-dimensional space-time domain, the two-dimensional image spatial domain, and the one-dimensional time domain, to construct multi-scale subspaces. A frequency histogram is then computed for each subspace using the bag-of-words model. Finally, the frequency histograms of all subspaces are concatenated into the final feature vector of the video, which is classified by the classifier. This method captures the dynamic characteristics specific to the three-dimensional spatio-temporal domain, the two-dimensional image spatial domain, and the one-dimensional time domain, and enriches the temporal and spatial structure information of the video features.

2) A spatio-temporal feature extraction method that combines rank pooling with the spatial features of the image is proposed. First, every frame of the video is segmented at multiple scales in the two-dimensional spatial domain. Then, for each subspace separately, a ranking function is learned in a supervised manner on the ordered sequence of that subspace's feature vectors, which captures the temporal information of the subspace. The parameters of the learned model serve as the feature descriptor of the subspace. Finally, the model parameters of all subspaces are concatenated to obtain the final descriptor of the video, and a classifier is used to identify the action in the video. The algorithm inherits the advantage of rank pooling, which captures rich dynamic features in the temporal domain, and also compensates for its lack of spatial dynamic information, which effectively improves the accuracy of video action recognition.
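The two feature extractors described above can be illustrated with a minimal sketch. It is not the thesis implementation, and several things in it are assumptions: the function names (`pyramid_histogram`, `rank_pool`, `spatial_rank_pool`), the halving of each split axis at every pyramid level, the normalization of feature positions to [0, 1], and the use of ridge regression onto the frame index as a simple stand-in for the ranking function (the abstract does not say which ranking model is used). Local feature extraction, the visual vocabulary, and the classifier are assumed to exist elsewhere.

```python
import numpy as np

# Assumed channel layout: which of the (x, y, t) axes each channel splits.
CHANNELS = {"xyt": [0, 1, 2], "xy": [0, 1], "t": [2]}

def pyramid_histogram(positions, words, vocab_size, levels=2):
    """Concatenate bag-of-words histograms over all cells of all channels.

    positions: (N, 3) array of (x, y, t) coordinates normalized to [0, 1].
    words:     (N,) array of visual-word indices in [0, vocab_size).
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for axes in CHANNELS.values():
        for level in range(levels):
            bins = 2 ** level  # assumed: each split axis is halved per level
            # Cell index of every feature along each split axis.
            idx = np.minimum((positions[:, axes] * bins).astype(int), bins - 1)
            cell = np.ravel_multi_index(tuple(idx.T), (bins,) * len(axes))
            hist = np.zeros((bins ** len(axes), vocab_size))
            np.add.at(hist, (cell, words), 1)  # count words per cell
            parts.append(hist.ravel())
    return np.concatenate(parts)

def rank_pool(frames, reg=1.0):
    """Return parameters w of a linear function whose score grows with time.

    frames: (T, d) array of per-frame feature vectors in temporal order.
    Stand-in: ridge regression of the frame index on time-averaged features.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames, dim = frames.shape
    t = np.arange(1, num_frames + 1)
    smoothed = np.cumsum(frames, axis=0) / t[:, None]  # running mean over time
    gram = smoothed.T @ smoothed + reg * np.eye(dim)
    return np.linalg.solve(gram, smoothed.T @ t)

def spatial_rank_pool(cell_frames, reg=1.0):
    """Rank-pool each spatial cell separately and concatenate the parameters.

    cell_frames: (T, cells, d) array of per-frame features for every cell.
    """
    cell_frames = np.asarray(cell_frames, dtype=float)
    return np.concatenate(
        [rank_pool(cell_frames[:, c], reg) for c in range(cell_frames.shape[1])]
    )
```

With `levels=2`, `pyramid_histogram` covers 9 cells in the space-time channel, 5 in the spatial channel, and 3 in the temporal channel, so the descriptor has 17 × `vocab_size` entries. `spatial_rank_pool` returns one parameter vector per spatial cell, so its length is the number of cells times the per-frame feature dimension. Either descriptor would then be passed to a classifier.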
Keywords/Search Tags: action recognition, spatio-temporal features, multi-channel spatio-temporal pyramid, rank pooling