Research On Feature Extraction And Recognition Of Human Actions In Video Sequences

Posted on:2021-02-04

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S L Cheng

Full Text:PDF

GTID:1368330626955638

Subject:Signal and Information Processing

Abstract/Summary:

Human action recognition from video sequences (hereinafter, action recognition) is one of the most active topics in computer vision, with significant theoretical value and promising application prospects. Because human actions are complex and diverse, action recognition research is still maturing. Two problems remain to be addressed. First, intra-class and inter-class uncertainty causes a high degree of confusion when recognizing similar actions. Second, long video sequences exhibit irregular, hard-to-predict structure; in particular, when the main action is composed of multiple sub-actions, the distribution of the sub-actions strongly affects recognition performance. This thesis studies the problem from two aspects: feature extraction and action classification. Its main contributions are as follows.

1. An action recognition algorithm based on a local spatio-temporal covariance matrix is proposed. Traditional feature concatenation simply stacks feature vectors along a single dimension, and this kind of fusion usually cannot accurately describe how features are correlated across the spatial and temporal domains. The thesis therefore fuses spatial gradient features and temporal gradient features through their covariance matrix over a local neighborhood, which strengthens the joint representation of appearance information and action information and improves the discriminative power of the features. However, covariance matrices lie in a Riemannian space and cannot be measured with conventional Euclidean metrics. The thesis finds that a covariance matrix in Riemannian space can be mapped to a Euclidean vector space using the Log-Euclidean operation. Experiments show that the proposed local spatio-temporal covariance matrix outperforms the traditional concatenated feature.

2. An action recognition algorithm based on low-rank and sparse joint representation is proposed. Locality-constrained linear coding describes each feature sample with its nearest visual words, rather than simply counting unordered visual words as in the bag-of-words model, and thereby captures the spatial layout of local features. However, this local description is sensitive to noise and ignores the global information of the visual words, which limits how fully the action is described. In this thesis, the low-rank and sparse representation of the features with respect to the template is used to obtain globally salient information for the action description. The experiments use the local spatio-temporal covariance matrix features proposed above. They show that the low-rank and sparse representation has better saliency characteristics than locality-constrained linear coding and other linear coding algorithms: it both suppresses irrelevant features and extracts relevant information from background noise. The proposed algorithm achieves better results on public datasets.

3. An action recognition algorithm based on discriminative subspace learning with a low-rank constraint is proposed. In the low-rank and sparse joint representation algorithm, the template lacks an update mechanism, and a large number of samples are needed to build the template so that the model generalizes sufficiently, which in turn raises the computational cost of template construction. The proposed model therefore exploits the low-dimensional mapping property of subspaces to reduce the feature dimension and the computational cost. It also introduces an explicit discrimination constraint and combines it with the low-rank representation, which preserves robustness to noise while making the action representation more discriminative, yielding better separation of intra-class and inter-class samples and higher recognition accuracy. Experimental results show that the discriminative constraint improves action recognition performance and that the algorithm is competitive with similar algorithms.

4. An action recognition framework based on an improved NetVLAD is proposed. Long video sequences generally carry a large amount of redundant information and consume substantial resources. In such videos, the main action is usually composed of multiple sub-actions. To recognize such actions, it is necessary to define the spatial and temporal characteristics relating the main action to its sub-actions, and to accurately capture the quantitative relationship between sub-action features and background noise. The thesis therefore introduces NetVLAD, which quantifies the residuals between local descriptors and aggregation centers and finds the feature distributions of the sub-actions, thereby aggregating an action description with strong semantic information. The thesis improves NetVLAD in three respects. First, a segment-based sampling strategy distributes the sampled frames evenly over the video sequence, which preserves the integrity of the action representation and is more efficient than dense sampling. Second, a NetVLAD algorithm based on spatio-temporal soft assignment is proposed. Because the locally aggregated vector network uses 2D convolution to compute the soft assignment, it does not capture spatio-temporal characteristics; a spatio-temporal aware module based on 3D convolution is therefore constructed to enhance the spatio-temporal characteristics of the soft assignment. Third, a self-attention weighted NetVLAD is proposed. Because the spatio-temporal aware module ignores the correlations between sequence segments, its spatio-temporal perception scope is limited; a soft assignment algorithm based on a self-attention module is therefore proposed. Extensive experiments verify that the soft assignment obtained from the self-attention module effectively expands the perception scope and captures more contextual and spatio-temporal information. Compared with other advanced algorithms, the proposed algorithm achieves competitive recognition performance on the UCF101 and HMDB51 datasets.
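
The Log-Euclidean mapping mentioned in the first contribution can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the three-channel feature layout (two spatial gradients and one temporal gradient), the function names, the regularization constant and the random input are all assumptions made for illustration. It shows only the general idea of taking the matrix logarithm of a local covariance matrix and flattening it into an ordinary Euclidean vector.

```python
import numpy as np

def local_covariance(features, eps=1e-6):
    """Covariance of per-pixel features from one local neighborhood.

    features: (n_samples, d) array. A small multiple of the identity is
    added (an assumption) so the matrix is strictly positive definite.
    """
    features = np.asarray(features, dtype=float)
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

def spd_log(mat):
    """Matrix logarithm of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V diag(log w) V^T: scale each eigenvector column by log of its eigenvalue.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_vector(mat):
    """Flatten log(mat) into a Euclidean vector.

    Off-diagonal entries are scaled by sqrt(2) so the vector's Euclidean
    norm equals the Frobenius norm of log(mat).
    """
    log_mat = spd_log(mat)
    rows, cols = np.triu_indices(mat.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_mat[rows, cols] * scale

# Hypothetical input: 200 pixels, each with (Ix, Iy, It) gradient values.
rng = np.random.default_rng(0)
patch_features = rng.normal(size=(200, 3))
descriptor = log_euclidean_vector(local_covariance(patch_features))
```

For a d-dimensional feature the resulting vector has d(d+1)/2 entries, and vectors produced this way can be compared with ordinary Euclidean distance.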

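The segment-based sampling and residual aggregation described in the fourth contribution can be sketched as follows. This is a baseline NetVLAD-style forward pass in NumPy under stated assumptions, not the improved framework of the thesis: the soft assignment here is a plain linear layer followed by a softmax, whereas the thesis computes it with a 3D-convolutional and a self-attention module. Taking the center frame of each segment, the function names, and all parameter values are hypothetical.

```python
import numpy as np

def segment_indices(num_frames, num_segments):
    """Pick the center frame of each of num_segments equal-length segments."""
    edges = np.linspace(0, num_frames, num_segments + 1)
    return ((edges[:-1] + edges[1:]) / 2).astype(int)

def netvlad_aggregate(descriptors, centers, weights, biases):
    """Aggregate local descriptors into one vector of length K * D.

    descriptors: (N, D) local descriptors
    centers:     (K, D) aggregation centers
    weights:     (K, D) and biases: (K,) define the soft assignment
    """
    logits = descriptors @ weights.T + biases            # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)          # softmax over centers
    # Residual of every descriptor to every center: (N, K, D).
    residuals = descriptors[:, None, :] - centers[None, :, :]
    vlad = (assign[:, :, None] * residuals).sum(axis=0)  # (K, D)
    # Intra-normalization per center, then global L2 normalization.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)

# Hypothetical usage: 100 frames, 4 segments, 8-dim descriptors, 4 centers.
rng = np.random.default_rng(0)
frame_ids = segment_indices(100, 4)
frame_descriptors = rng.normal(size=(100, 8))[frame_ids]
video_vector = netvlad_aggregate(
    frame_descriptors,
    centers=rng.normal(size=(4, 8)),
    weights=rng.normal(size=(4, 8)),
    biases=np.zeros(4),
)
```

In a trained network the centers, weights and biases would be learned; random values are used here only so that the sketch runs on its own.
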
Keywords/Search Tags:

human action recognition, spatio-temporal local covariance matrix, low-rank and sparse joint representation, subspace learning, NetVLAD

Related items

1	Study Of Video Human Action Recognition Based On Local Spatio-temporal Features
2	A Study Of Human Action Recognition Based On Spatio-temporal Features
3	Research On The Local Spatio-Temporal Relationships Based Feature Model For Action Recognition
4	Human Body Action Recognition Based On Joint Data And Extreme Learning Machine
5	Structural Low-rank Representation Based Human Action Recognition
6	Human Action Recognition Based On Sparsely Coded Spatio-temporal Video Features
7	Research On Algorithms Of Human Action Recognition Based On Videos
8	Multi-modal Human Action Recognition
9	Learning Human Pose And Action Similarity Metric Using Hierarchical Sparse Models
10	Human Action Recognition Based On Spatial-temporal Manifold Learning