With the rapid development of Internet technology, a large amount of multimedia data has been generated, and video understanding and action recognition in real-time surveillance have gradually become popular research directions with broad application prospects in many fields. In recent years, deep neural networks have achieved superior performance on many visual recognition tasks, but such models rely heavily on manually labeled datasets, which are costly in both labor and time to obtain. In contrast, large amounts of unlabeled data are readily available on the Internet, and unsupervised learning from such data has attracted considerable attention from researchers; it is therefore worth investigating in depth how to exploit unlabeled data to improve the performance of video action recognition. In this paper, we study self-supervised learning methods based on deep neural networks from the multimodal perspective of video data, mining supervisory signals from unlabeled data itself and learning representations that are useful for downstream tasks, so as to build an efficient human action recognition model. The main research includes the following two aspects:

(1) Cross-modal temporal contrastive learning for self-supervised action recognition

To address the limited performance of video feature modeling in fine-grained scenes, and considering the temporal continuity of video action sequences and the semantic relevance of multimodal information, this paper proposes a self-supervised algorithm for cross-modal temporal contrastive learning (CMTCL). A local temporal contrastive learning method is designed that adopts different positive- and negative-sample division strategies to explore the temporal correlation and discriminability between non-overlapping segments of the same instance, enhancing fine-grained feature expression; a global contrastive learning method is studied that increases the number of positive samples through cross-modal semantic co-training, learning the semantic consistency of different views of multiple instances and improving the generalization capability of the model. Extensive experiments on two publicly available action recognition datasets, UCF101 and HMDB51, show that the proposed method improves on average by 2% to 3.5% over state-of-the-art mainstream methods.

(2) Cross-view consistency mining for self-supervised skeleton action recognition

To address the problem that the deep feature expression of a single skeleton-sequence view is semantically limited, and considering the information consistency across multiple skeleton views, this paper proposes a self-supervised algorithm for cross-view consistency mining (CVSCL). Multiple skeleton augmentation methods are combined to generate positive sample pairs for contrastive learning, increasing the spatio-temporal diversity of skeleton sequences and improving the generalization of single-view representations; building on the prior knowledge captured by single-view skeleton representations, a cross-view consistency mining method is investigated that mines hard positive examples through correlation constraints between views and learns a cooperative representation of multiple views. Experimental results show that the proposed method effectively improves action recognition accuracy on the NTU RGB+D 60/120 datasets under unlabeled settings.
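Both contributions build on an instance-discrimination contrastive objective, in which an anchor embedding is pulled toward its positive (another view, modality, or augmentation of the same instance) and pushed away from negatives drawn from other instances. As a rough illustration of that shared idea only (not the thesis's actual CMTCL/CVSCL losses), the following is a minimal NumPy sketch of a standard InfoNCE-style loss; the function name, batch size, and temperature are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE loss: the positive for each anchor is the
    same-index row of `positives`; all other rows act as negatives."""
    # L2-normalize embeddings so dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with the diagonal (matched pairs) as the target class.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy check: perfectly aligned anchor/positive pairs should incur a
# lower loss than randomly paired embeddings.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce_loss(z, z)
loss_random = info_nce_loss(z, rng.normal(size=(8, 16)))
print(loss_aligned, loss_random)
```

In the methods summarized above, the positives would come from cross-modal views or skeleton augmentations of the same action instance rather than from identical embeddings as in this toy check.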