Few-Shot Action Recognition Based On Prototype Learning

Posted on: 2024-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: S W Liu
GTID: 2568307127454154
Subject: Computer Science and Technology
Abstract/Summary:
Video-based action recognition plays a key role in machine vision and is of great research significance to the academic community. It also has considerable application value in intelligent security, smart cities, human-machine interaction, video understanding, and other fields, which makes it highly relevant to industry. As an important branch of action recognition, few-shot action recognition needs only a few training samples for a model to recognize new categories, and it has attracted wide attention from researchers. At present, mainstream video-based few-shot action recognition methods mainly study how to match query samples with support samples, for example by designing better networks to extract more effective spatiotemporal features, learning better class representations, or designing better sequence matching algorithms. However, several key problems remain unresolved, such as poor model generalization, weak representational capability of action prototypes, large intra-class differences, and small inter-class differences. How to learn action representations effectively is therefore still a challenging and significant issue. Enhancing prototype representations is the key to few-shot action recognition based on prototype matching. This thesis takes prototype-matching-based few-shot action recognition as its research object and studies action representations through multi-dimensional prototype enhancement. Combining spatiotemporal feature enhancement with prototype learning, it proposes and implements three few-shot action recognition methods. The main work is as follows:

(1) A dynamic temporal feature enhancement method for few-shot action recognition. To capture the temporal information between video frames effectively, a Spatiotemporal Feature Fusion (SFF) module is proposed. It uses temporal convolutional networks to capture distinct temporal representations of a video and fuses these features with weights that the network learns automatically. To compensate for a possible deviation in temporal information distribution between the support set and the test set, a Dynamic Temporal Transformation (DTT) module is proposed. DTT scrambles the different spatiotemporal features generated by SFF to produce new mask features with a different temporal information distribution, simulating the temporal feature distributions that may occur in the test set. As a supplement to the original features, the mask features enrich the training samples and improve the network's generalization on the test set. In addition, a Transformation Consistency Loss (TCL) is embedded in DTT so that the network learns to recognize the scrambling operation correctly, which in turn strengthens its learning of the importance of each spatiotemporal feature. Extensive experiments on the HMDB51, UCF101, and Kinetics-100 datasets demonstrate the effectiveness of the proposed method.

(2) A cross-domain prototype refactor learning method for few-shot action recognition. Building on dynamic temporal feature enhancement, this work studies few-shot action recognition from the perspective of "prototype enhancement" and adds a Cross-domain Prototype Refactor Learning (CPRL) module. CPRL consists of a Support-Corrective Prototype Learning (SCPL) module and a Query-Supplemental Prototype Learning (QSPL) module. SCPL computes class prototypes with a weighted average instead of the plain average. This reduces the negative impact of extreme samples on the prototype features and makes each prototype more representative of its category as a whole. QSPL uses a pseudo-labeling strategy to obtain pseudo-samples from the query set for prototype learning; the additional samples reduce intra-class differences and strengthen the prototype features. To better learn the similarity between features during prototype learning, a Re-weighting Similarity Attention (RSA) mechanism is also proposed for both SCPL and QSPL. Experiments on the UCF101, HMDB51, and Kinetics-100 datasets show that the two kinds of prototypes complement each other and improve the network's classification performance. Moreover, CPRL provides a new prototype learning idea for prototype-based few-shot tasks and can be embedded in such networks to enhance the representational ability of prototype features.

(3) An adaptive aggregation triplet optimization method for few-shot action recognition. Building on the cross-domain prototype refactor learning architecture, this work further studies few-shot action recognition from the perspective of "optimizing the classification space". Based on an analysis of the conventional cross-entropy loss and triplet loss, an Adaptive Aggregation Triple Loss (AATL) is designed to optimize the prototype classification space with triplets. AATL is composed of a Prototype-Driven Adaptive Triple Loss (PATL) and a Query-Driven Adaptive Triple Loss (QATL). PATL builds triplets centered on prototype features to make full use of the feature distribution of the query set. It optimizes the similarity between prototype features and query sample features and improves the prototypes' ability to distinguish subtle differences. QATL builds triplets centered on query samples to optimize the similarity between prototype features and support-set features. This improves the linear separability of the support-set categories, increases inter-class differences, and reduces intra-class differences. Instead of a fixed margin, an adaptive margin generates a task-specific value for separating positive and negative samples. Experiments on the HMDB51, UCF101, and Kinetics-100 datasets demonstrate the effectiveness of optimizing the classification space with triplets.
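The SFF module in contribution (1) can be illustrated with a minimal sketch. It rests on three assumptions:
- A video is a list of T per-frame feature vectors.
- Each temporal branch is a 1D convolution over time whose single kernel is shared by all channels.
- The learned fusion weights come from a softmax over per-branch logits.

The names `temporal_conv` and `sff_fuse`, the kernels, and the logits are hypothetical stand-ins for trained parameters, not the thesis's implementation.

```python
import math
from typing import List

# A clip is T frames, each a D-dimensional feature vector (assumed layout).
Frames = List[List[float]]


def temporal_conv(frames: Frames, kernel: List[float]) -> Frames:
    """1D convolution (cross-correlation, as in deep-learning libraries) over time.

    Hypothetical simplification: one kernel is shared by every channel.
    Zero padding keeps the output length equal to the input length T.
    """
    pad = len(kernel) // 2
    num_frames, dim = len(frames), len(frames[0])
    out = []
    for t in range(num_frames):
        row = [0.0] * dim
        for j, w in enumerate(kernel):
            src = t + j - pad
            if 0 <= src < num_frames:  # frames outside the clip act as zeros
                for d in range(dim):
                    row[d] += w * frames[src][d]
        out.append(row)
    return out


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sff_fuse(frames: Frames, kernels: List[List[float]],
             fusion_logits: List[float]) -> Frames:
    """Run one temporal branch per kernel and fuse them with softmax weights.

    `fusion_logits` stands in for parameters a network would learn (assumption).
    """
    branches = [temporal_conv(frames, k) for k in kernels]
    weights = softmax(fusion_logits)
    num_frames, dim = len(frames), len(frames[0])
    return [
        [sum(w * b[t][d] for w, b in zip(weights, branches)) for d in range(dim)]
        for t in range(num_frames)
    ]


if __name__ == "__main__":
    clip = [[1.0], [2.0], [3.0]]
    # Branch 1 keeps each frame; branch 2 sums a 3-frame window.
    print(sff_fuse(clip, [[1.0], [1.0, 1.0, 1.0]], [0.0, 0.0]))  # prints [[2.0], [4.0], [4.0]]
```

Normalizing the fusion logits with a softmax keeps the weights positive and summing to one. The fused feature therefore stays on the same scale as the individual branches.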
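One possible reading of DTT and TCL from contribution (1) is sketched below under stated assumptions:
- The SFF branch features of a clip are scrambled by a random permutation before a weighted fusion.
- The fused result plays the role of a mask feature.
- TCL is modeled as a cross-entropy loss that asks a classifier to predict which permutation was applied.

The permutation-index labels, the function names, and the fixed fusion weights are hypothetical. The abstract does not specify the exact scrambling operation or loss form.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def fuse(branches: Sequence[Vec], weights: Sequence[float]) -> Vec:
    """Weighted sum of branch features (weights assumed to come from SFF)."""
    dim = len(branches[0])
    return [sum(w * b[d] for w, b in zip(weights, branches)) for d in range(dim)]


def dtt_mask_feature(branches: Sequence[Vec], weights: Sequence[float],
                     rng: random.Random) -> Tuple[Vec, int]:
    """Scramble the branch order, fuse, and return (mask feature, label).

    The label is the index of the sampled permutation, which a TCL-style
    classifier head would be trained to recognize (hypothetical formulation).
    """
    perms = list(itertools.permutations(range(len(branches))))
    label = rng.randrange(len(perms))
    scrambled = [branches[i] for i in perms[label]]
    return fuse(scrambled, weights), label


def tcl_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for predicting the applied permutation: -log softmax(logits)[label]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    branch_feats = [[1.0], [2.0], [3.0]]  # three SFF branch outputs (toy values)
    fusion_weights = [0.5, 0.3, 0.2]
    mask, perm_label = dtt_mask_feature(branch_feats, fusion_weights, random.Random(0))
    print("original:", fuse(branch_feats, fusion_weights))
    print("mask:", mask, "permutation label:", perm_label)
    # A classifier head with 3! = 6 outputs would be trained on this label.
    print("TCL with uninformative logits:", tcl_loss([0.0] * 6, perm_label))
```

The scrambled fusion differs from the original only when the fusion weights are unequal. A loss that forces the network to identify the permutation therefore also pushes it to distinguish the role of each branch.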
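The SCPL idea from contribution (2) replaces the plain average with a weighted average, so that extreme support samples pull the prototype less. It can be sketched as follows. The weighting rule here is an assumption chosen for illustration:
- Each support sample is scored by its cosine similarity to the plain class mean.
- A temperature-scaled softmax turns those scores into weights.

The abstract attributes similarity learning to RSA; this sketch does not model RSA and uses a fixed rule instead. Queries are then assigned to their most similar prototype, the usual prototype-matching step. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def mean(vectors: Sequence[Vec]) -> Vec:
    """Element-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prototype(support: Sequence[Vec], temperature: float = 10.0) -> Vec:
    """Hypothetical SCPL-style prototype: samples closer to the class mean weigh more."""
    center = mean(support)
    weights = softmax([temperature * cosine(s, center) for s in support])
    dim = len(support[0])
    return [sum(w * s[d] for w, s in zip(weights, support)) for d in range(dim)]


def classify(query: Vec, prototypes: Sequence[Vec]) -> int:
    """Index of the prototype most similar to the query (prototype matching)."""
    sims = [cosine(query, p) for p in prototypes]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    support = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]  # the last sample is an outlier
    print("plain mean:", mean(support))
    print("weighted prototype:", weighted_prototype(support))
```

With `temperature=0` every weight is equal and the plain mean is recovered. Larger temperatures concentrate the weight on the most central samples.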
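QSPL in contribution (2) enlarges each class with pseudo-labeled query samples. Below is a minimal sketch built on three assumptions:
- Pseudo-labels come from the nearest prototype under cosine similarity.
- Only queries whose softmax confidence reaches a threshold are added.
- Each prototype is then recomputed as a plain mean.

The threshold, temperature, and function names are hypothetical choices, not values from the thesis.

```python
import math
from typing import List, Sequence

Vec = List[float]


def mean(vectors: Sequence[Vec]) -> Vec:
    """Element-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_prototypes(support_by_class: Sequence[Sequence[Vec]],
                      queries: Sequence[Vec],
                      temperature: float = 10.0,
                      threshold: float = 0.9) -> List[Vec]:
    """Add confidently pseudo-labeled queries to their class, then recompute prototypes."""
    prototypes = [mean(s) for s in support_by_class]
    pools = [list(s) for s in support_by_class]
    for q in queries:
        probs = softmax([temperature * cosine(q, p) for p in prototypes])
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:  # keep only confident pseudo-labels
            pools[label].append(q)
    return [mean(pool) for pool in pools]


if __name__ == "__main__":
    support = [[[1.0, 0.0]], [[0.0, 1.0]]]  # 2-way 1-shot toy task
    queries = [[0.9, 0.1], [0.5, 0.5]]      # the second query is ambiguous
    print(refine_prototypes(support, queries))
```

The confidence threshold keeps ambiguous queries, such as `[0.5, 0.5]` above, from contaminating either prototype. Confidently matched queries act as extra support samples.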
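The prototype-driven triplet term of AATL in contribution (3) can be sketched as a hinge loss over cosine distances:
- The anchor is a class prototype.
- The positive is a query of the same class.
- The negative is a query of another class.

The adaptive margin rule below is a hypothetical stand-in for the thesis's task-specific margin, whose exact form the abstract does not give. Here the margin is a fixed fraction `alpha` of the mean pairwise distance between the task's prototypes. QATL would analogously center its triplets on query samples. Query labels are assumed to be the episode's training labels, and all names are illustrative.

```python
import math
from typing import List, Sequence

Vec = List[float]


def cosine_distance(a: Vec, b: Vec) -> float:
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def adaptive_margin(prototypes: Sequence[Vec], alpha: float = 0.5) -> float:
    """Hypothetical task-specific margin: alpha times the mean pairwise prototype distance."""
    n = len(prototypes)
    dists = [cosine_distance(prototypes[i], prototypes[j])
             for i in range(n) for j in range(i + 1, n)]
    return alpha * sum(dists) / len(dists) if dists else 0.0


def prototype_anchored_triplet_loss(prototypes: Sequence[Vec],
                                    queries: Sequence[Vec],
                                    labels: Sequence[int],
                                    alpha: float = 0.5) -> float:
    """Mean hinge loss over all (prototype, positive query, negative query) triplets."""
    margin = adaptive_margin(prototypes, alpha)
    losses = []
    for c, anchor in enumerate(prototypes):
        positives = [q for q, y in zip(queries, labels) if y == c]
        negatives = [q for q, y in zip(queries, labels) if y != c]
        for pos in positives:
            for neg in negatives:
                # Penalize when the positive is not closer than the negative by `margin`.
                gap = cosine_distance(anchor, pos) - cosine_distance(anchor, neg) + margin
                losses.append(max(0.0, gap))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    protos = [[1.0, 0.0], [0.0, 1.0]]
    queries = [[0.9, 0.2], [0.3, 0.8]]
    print("margin:", adaptive_margin(protos))
    print("loss:", prototype_anchored_triplet_loss(protos, queries, [0, 1]))
```

Because this margin is computed from the current task's prototypes, tasks whose classes are already far apart demand a larger separation. Tasks with closely spaced classes get a gentler margin.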
Keywords/Search Tags: Few-Shot Action Recognition, Spatiotemporal Feature Learning, Prototype Enhancement, Similarity Optimization