Study On Key Technologies Of Video-based Human Action Recognition

Posted on: 2018-07-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Wang
Full Text: PDF
GTID: 1318330515485586
Subject: Image Processing and Scientific Visualization
Abstract/Summary:
Human action recognition is one of the most promising applications of computer vision. It is in strong demand in video retrieval, intelligent video surveillance, human-computer interaction, human motion analysis, and intelligent monitoring, and it has therefore attracted considerable attention from industry, academia, commerce, and security agencies. However, analyzing actions in videos remains very challenging because actions lack a conclusive definition and because action style, viewpoint, illumination, and background all vary. This thesis studies human action recognition from four aspects: local feature extraction and description, mid-level action video representation, a strategy for large-scale action recognition, and cross-domain action recognition. The main contributions are summarized as follows:

(1) A dense trajectory sampling method based on a saliency map, together with a color difference descriptor, is proposed. Compared with classical dense trajectory sampling, the method evaluates the usefulness of each trajectory from the motion saliency and visual saliency of the areas the trajectory traverses. This removes uninformative trajectories, which reduces the memory needed for feature storage and improves both processing speed and recognition accuracy. Whereas classical feature descriptors ignore color information, the proposed color difference descriptor uses the color differences between spatially and temporally neighboring patches in video frames as the local feature description. Experimental results show that the color difference descriptor is complementary to existing appearance and motion features and can effectively improve the accuracy of action recognition.

(2) A nonnegative component representation with spatio-temporal information is proposed. The classical bag-of-visual-words (BoVW) representation ignores both the relationships between visual words and the spatio-temporal distribution of the local features. In this work, action units are learned automatically from the low-level local features by graph regularized nonnegative matrix factorization (NMF), which leads to a part-based nonnegative component representation. In addition, a Gaussian mixture model is used to compute the temporal and spatial distribution of the local features associated with each visual word, and a spatial-temporal Fisher vector (STFV) is calculated to represent the distribution of all local features. The STFV is used as part of the graph regularization for NMF so that spatio-temporal cues are incorporated into the final representation. Experimental results show that, compared with BoVW, the proposed representation can effectively improve the accuracy of action recognition.

(3) A novel hierarchical dictionary learning strategy (HDLS) for large-scale action recognition datasets is proposed. To tackle the high variability of action types, HDLS distinguishes disjoint classes from correlated classes and processes them separately. First, it clusters similar classes into groups and builds a two-layer hierarchical class model. Then, HDLS accounts for the different properties of the two layers by using a different dictionary learning algorithm for each: discriminant class-specific dictionary learning for the first layer and discriminant joint dictionary learning for the second layer. Finally, a classification method for the two-layer dictionary learning model is given. Experimental results on several large-scale datasets demonstrate the effectiveness of HDLS.

(4) A cross-domain action recognition method based on nonnegative joint dictionary learning is proposed. The data in the source domain and the labeled data in the target domain are used to learn a joint dictionary for each class, which contains a common dictionary shared by both domains as well as a domain-specific dictionary for each domain. Based on the joint dictionary, the sample representations in both domains have a common part corresponding to the common dictionary, which can serve as a bridge for cross-domain action recognition. To minimize the distribution divergence of the common representation part between the source and target domains, a Maximum Mean Discrepancy criterion is incorporated into the objective function of the joint dictionary learning. Experimental results demonstrate the effectiveness of this method.
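
As a rough illustration of the Maximum Mean Discrepancy criterion mentioned in (4), the sketch below computes the empirical squared MMD between two sets of representation vectors. It assumes a linear kernel, in which case the squared MMD reduces to the squared Euclidean distance between the two sample means. The abstract does not state which kernel the thesis uses or how the term is weighted in the objective, so the kernel choice and the names `column_means` and `linear_mmd_squared` are illustrative assumptions, not the author's implementation.

```python
from typing import List, Sequence


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Return the per-dimension mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def linear_mmd_squared(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Empirical squared MMD under a linear kernel (an assumption of this sketch).

    With k(x, y) = <x, y>, the squared MMD equals the squared Euclidean
    distance between the mean source vector and the mean target vector.
    """
    if not source or not target:
        raise ValueError("both sample sets must be non-empty")
    mu_s = column_means(source)
    mu_t = column_means(target)
    if len(mu_s) != len(mu_t):
        raise ValueError("source and target vectors must share a dimension")
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


if __name__ == "__main__":
    # Hypothetical common-part codes for a few source and target samples.
    source_codes = [[1.0, 0.0], [3.0, 0.0]]
    target_codes = [[0.0, 0.0], [0.0, 2.0]]
    print(linear_mmd_squared(source_codes, target_codes))
```

A value near zero means the two sets of common-part representations have nearly the same mean, which is the sense in which adding such a term to a dictionary learning objective would pull the source and target distributions together. A nonlinear kernel would compare more than the means.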
Keywords/Search Tags:Human Action Recognition, Dense Trajectory, Bag of Visual Words, Nonnegative Component Representation, Dictionary Learning, Cross-Domain Knowledge Transfer