
Research On Zero-Shot Recognition Algorithms Based On Deep Learning

Posted on: 2024-01-21    Degree: Master    Type: Thesis
Country: China    Candidate: B Wang    Full Text: PDF
GTID: 2568306914965669    Subject: Information and Communication Engineering
Abstract/Summary:
With the continuous advancement of communication technology and the development of social media, video data, as an increasingly popular media form, has gradually come to occupy the dominant position on social media platforms. As a result, the number of actions to be recognized in massive video data keeps growing. At the same time, the same batch of videos may carry action labels along different dimensions, making it costly to annotate sufficient samples for training. This poses the challenge of recognizing actions from unseen categories when the training data covers only a limited set of classes, which is crucial for downstream applications such as video review and user recommendation. To address this challenge, zero-shot learning has been proposed as an effective solution that can identify unseen classes by learning the visual and semantic features of seen classes. This thesis focuses on zero-shot action recognition, and the main work includes:

1. To address the problem that existing semantic features cannot distinguish close or loosely defined categories under the conventional zero-shot learning (CZSL) setting, the thesis proposes a Video Attribute Prototype Network (VAPNet) to generate a new semantic feature for zero-shot action recognition: the video attribute. Through a cross-attention module, the video attribute captures both instance-specific information and category-level commonality, which improves the generalization ability of zero-shot action recognition. Because the capability of the video captioning model is limited, the generated video captions may contain inaccuracies. The thesis therefore learns caption uncertainty by modeling each video caption embedding as a Gaussian distribution with learnable mean and variance, thus mitigating the noise in video captions. Besides, the thesis utilizes a joint video-to-attribute and video-to-video contrastive loss to calibrate the joint visual-semantic space. Experiments show that the video attribute outperforms state-of-the-art (SoTA) alternatives on different benchmarks, especially for close or loosely defined categories.

2. To address the insufficient discriminative ability of visual features in generative models under the generalized zero-shot learning (GZSL) setting, the thesis proposes a Combination of Embedding and Generative Network (CEGNet). Existing generative models convert the zero-shot recognition problem into a traditional supervised recognition problem by generating unseen visual features from the corresponding semantic features. However, the generated visual features are not discriminative enough in the original visual feature space. To address this, the thesis introduces instance-level and class-level contrastive learning to learn a more discriminative visual space on top of existing generative models. Furthermore, to distinguish seen from unseen samples, the thesis utilizes an out-of-distribution detector. Experiments show that CEGNet equips existing generative models with superior performance under both the GZSL and CZSL settings.

Based on deep learning techniques, the thesis proposes improved algorithms for different settings of zero-shot action recognition. The thesis conducts extensive ablations and compares performance with existing SoTA methods. The experimental results demonstrate the effectiveness of the proposed algorithms, which improve recognition accuracy under different settings and benchmarks and thus have both theoretical significance and application value.
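The abstract describes two mechanisms in VAPNet: modeling each caption embedding as a Gaussian with learnable mean and variance, and a video-to-attribute contrastive loss that calibrates the joint visual-semantic space. The snippet below is a minimal, hypothetical PyTorch sketch of those two ideas only; the module names, feature dimensions, and the InfoNCE-style formulation are illustrative assumptions, not the VAPNet implementation from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianCaptionEmbedding(nn.Module):
    """Illustrative sketch: map a caption feature to a Gaussian with a
    learnable mean and (log-)variance, then sample via reparameterization
    to model caption uncertainty. Dimensions are assumptions."""
    def __init__(self, in_dim=512, emb_dim=256):
        super().__init__()
        self.mu_head = nn.Linear(in_dim, emb_dim)      # learnable mean
        self.logvar_head = nn.Linear(in_dim, emb_dim)  # learnable log-variance

    def forward(self, caption_feat):
        mu = self.mu_head(caption_feat)
        logvar = self.logvar_head(caption_feat)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std, mu, logvar  # sampled embedding, mean, log-variance

def video_to_attribute_nce(video_emb, attr_emb, temperature=0.07):
    """InfoNCE-style sketch of a video-to-attribute contrastive loss:
    the i-th video embedding should match the i-th attribute embedding."""
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(attr_emb, dim=-1)
    logits = v @ a.t() / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random features (no real data involved).
enc = GaussianCaptionEmbedding()
cap = torch.randn(8, 512)   # 8 caption features
vid = torch.randn(8, 256)   # 8 matching video embeddings
attr, mu, logvar = enc(cap)
loss = video_to_attribute_nce(vid, attr)
```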
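For CEGNet, the abstract mentions class-level contrastive learning on generated visual features and an out-of-distribution detector for separating seen and unseen samples. Below is a minimal sketch of a supervised (class-level) contrastive loss and an energy-based OOD score; the actual losses, detector, and threshold used in the thesis are not given here, so every name and formulation in this snippet is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def class_level_contrastive(features, labels, temperature=0.1):
    """Sketch of a class-level contrastive loss: generated features that
    share a class label are pulled together, all others are pushed apart."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.t() / temperature
    n = f.size(0)
    mask_self = torch.eye(n, dtype=torch.bool, device=f.device)
    sim = sim.masked_fill(mask_self, -1e9)             # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~mask_self
    pos_counts = pos.sum(1).clamp(min=1)
    return -(log_prob * pos.float()).sum(1).div(pos_counts).mean()

def energy_ood_score(logits, threshold=-5.0):
    """Sketch of an energy-based OOD detector: higher energy suggests the
    sample falls outside the seen classes. The threshold is a hypothetical
    hyperparameter that would be tuned on validation data."""
    energy = -torch.logsumexp(logits, dim=1)
    return energy, energy > threshold  # True -> route to the unseen-class branch

# Toy usage: 6 generated visual features from 3 classes, random logits.
feats = torch.randn(6, 128)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
loss = class_level_contrastive(feats, labels)
logits = torch.randn(6, 10)
scores, is_unseen = energy_ood_score(logits)
```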
Keywords/Search Tags:video action recognition, zero-shot learning, video attribute, contrastive learning