Research On Unsupervised Video Action Clustering

Posted on: 2021-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Peng
GTID: 1488306548474674
Subject: Information and Communication Engineering

Abstract/Summary:
With the development of mobile Internet and multimedia technology, the rise of a large number of video applications has led to explosive growth in online video data. As one of the mainstream means of disseminating information, video is widely used in fields such as digital media, science and technology education, and security monitoring. However, while it meets users' needs, this volume of video poses major challenges for organization, classification, and application. Although existing supervised and semi-supervised video classification methods have achieved strong performance, they usually require large amounts of high-quality labeled data for model learning. Therefore, how to accurately extract valid information from unlabeled video data, and how to explore the essential structural characteristics and category distribution of that data, has become a hot research topic in computer vision and artificial intelligence.

Unsupervised video action clustering aims to cluster videos by learning the inherent structural characteristics of the video data without labels. Based on a self-representation subspace model, this thesis takes the characteristics of video action data into account and studies three aspects of video action clustering: multi-context multi-view joint constraints, multi-context recursive constraints, and deep spatio-temporal feature learning.

1. Multi-context multi-view joint constraints. Scene context and motion context are two key cues in video action analysis, and effectively mining the associations among multi-context and multi-view features helps boost the performance of unsupervised video action clustering. To this end, this thesis proposes an unsupervised video action clustering method based on a motion-scene interaction constraint. The method considers both static scene characteristics and dynamic motion characteristics, and constructs an intra-context constraint and a contextual interaction constraint within the self-representation subspace clustering framework. The intra-context constraint mines the structural similarity of same-view features and the complementarity between multi-view features within each context, while the contextual interaction constraint ensures that the subspace representations in the scene context and the motion context are consistent. By constructing a subspace clustering model with the motion-scene interaction constraint, the method effectively improves video action clustering performance.

2. Multi-context recursive constraints. Clustering performance can be improved by considering the association between action context clustering and scene context clustering, as well as the interaction between the subspace representation and spectral clustering processes. To this end, this thesis proposes a recursive constrained framework for unsupervised video action clustering. The method obtains multi-context clustering results simultaneously from the multi-context information in the video. Through a recursive prior propagation method, the joint information gain in the prior clustering results is mined and fed back to guide subspace representation learning and spectral clustering. Based on the multi-context clustering results, the method constructs a constraint-guided subspace representation and a prior-inherited multi-view spectral clustering method to learn discriminative spectral embedding feature representations, thereby improving video action clustering performance.

3. Deep spatio-temporal feature learning. Given the success of deep learning in feature extraction, it is important to explore deep-learning-based video action clustering that fuses the self-representation subspace property with the non-linear characteristics of deep networks. To this end, this thesis proposes a deep video action clustering method based on spatio-temporal feature learning. The method constructs a deep network model to learn the spatio-temporal feature representation of videos and the subspace representation matrix. To obtain clustering-friendly video feature representations, a clustering information feedback mechanism is designed to mine valid information from existing clustering results, and the network is further trained with a cluster-driven objective function. By jointly optimizing spatio-temporal feature extraction, subspace representation learning, and spectral clustering, the method realizes deep-learning-based video action clustering.
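For orientation, the self-representation subspace model that the abstract builds on is commonly written in the generic form sketched below. The symbols (X, Z, W, D, L, λ, and the regularizer R) are illustrative notation assumed here, not taken from the thesis, and the thesis's own constraints (intra-context, contextual interaction, recursive prior) are not reproduced because the abstract does not state them.

```latex
% Generic self-representation subspace clustering (illustrative notation, not from the thesis)
% X = [x_1, ..., x_n] \in \mathbb{R}^{d \times n}: one feature vector per video
\min_{Z}\; \lVert X - XZ \rVert_F^2 + \lambda\, \mathcal{R}(Z)
\quad \text{s.t.}\quad \operatorname{diag}(Z) = 0

% Symmetric affinity built from the learned coefficients
W = \tfrac{1}{2}\left( \lvert Z \rvert + \lvert Z \rvert^{\top} \right)

% Normalized graph Laplacian used for spectral clustering
L = I - D^{-1/2} W D^{-1/2}, \qquad D_{ii} = \sum_{j} W_{ij}
```

In this generic setting, typical choices of the regularizer are the ℓ1 norm (sparse subspace clustering) or the nuclear norm (low-rank representation). Cluster labels are then usually obtained by running k-means on the eigenvectors of L associated with its k smallest eigenvalues.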
Keywords/Search Tags: Video Action Clustering, Subspace Representation, Contextual Interaction Constraint, Recursive Constrained Framework, Deep Spatio-temporal Feature Learning