Object recognition is one of the core tasks in computer vision. As recognition tasks extend to large-scale and fine-grained categories, research in this area faces problems such as data imbalance and the high cost of labeled data, and traditional deep learning methods become difficult to apply. Zero-Shot Learning (ZSL) methods have been proposed in response. The goal of ZSL is to transfer the knowledge learned from seen classes to unseen classes through shared semantic information, so that unseen categories can be recognized without labeled training samples. ZSL is divided into conventional zero-shot learning and generalized zero-shot learning (GZSL), depending on whether the test samples may also come from seen classes. GZSL is more challenging and more practical than conventional ZSL. Early ZSL methods mainly focus on learning a mapping function between the image feature space and the semantic space. However, because of domain bias and other issues, such methods perform poorly in the GZSL setting. To tackle this problem, ZSL methods based on generative models have been proposed. The performance of these methods is in turn limited, because the generated features are only loosely connected to the semantic information and lack fine-grained information and discriminability. Improving the quality of the generated features is therefore a current focus of zero-shot learning research. In this work, we systematically study zero-shot learning methods based on generative models and investigate the shortcomings of current generative models in depth. We also validate the proposed methods and compare them with other classic methods. The main contents are as follows:

(1) To tackle the problem that generated visual features are not closely related to semantic information, this paper proposes a zero-shot learning method based on aligned variational autoencoders
combined with triplets. The method consists of a basic variational autoencoder module, a parameter alignment module, a cross-reconstruction module, and a triplet module. The basic variational autoencoder module generates latent-space features. It is constrained by the parameter alignment, cross-reconstruction, and triplet modules so that the generated latent features are better integrated with the semantic information, which helps transfer the knowledge learned from seen classes to unseen classes through that semantic information. Depending on the triplet selection rule, the method yields four variants. We conduct extensive experiments on all variants, and the results demonstrate the effectiveness of the method.

(2) To address the problem that the original feature space is far from the semantic information and lacks semantic relevance and discriminability, this paper proposes a hybrid zero-shot learning method based on feature contrastive optimization. The method combines a generative model and an embedding model into a hybrid framework: real visual features and visual features synthesized by the generative model are mapped into an embedding space, and the final generalized zero-shot learning classification is performed in that space. In addition, to improve the quality of the visual features of seen and unseen class samples, the method adds to the basic hybrid framework a feature contrastive optimization module that integrates a visual → semantic mapping. This module introduces a semantic consistency loss and a contrastive loss to explicitly encourage intra-class compactness and inter-class separability, guiding the model to learn more semantically relevant and discriminative visual features. Experiments show that the method significantly improves recognition accuracy and robustness.

(3) In the field of flower recognition, considering that flower
recognition models trained with traditional deep learning methods require a large amount of labeled data, and that for some endangered or rare flower species enough samples cannot be collected even with modern search methods, this paper builds on the above model to apply zero-shot learning to flower recognition and develops a zero-shot flower recognition system. The paper describes the design and architecture of the system in detail and explains and demonstrates its functions and operation. By using the semantic information shared among flower categories, the system can recognize flowers of categories that were not available during model training. It also compensates for the fact that flower categories with few labeled samples cannot be trained and recognized with traditional deep learning models, which gives it practical value.
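
The idea of recognizing unseen categories through shared semantic information can be illustrated with a minimal Python sketch. It is not the system or the generative methods described above. It assumes a classic embedding-style classifier in which a hypothetical linear mapping `W` projects a visual feature into an attribute space, and the unseen class whose attribute vector is closest by cosine similarity gives the label. All class names, attribute values, and weights are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(W, x):
    """Linear visual -> semantic mapping: returns the matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(x, W, class_attributes):
    """Return the class whose attribute vector is most similar to the projected feature."""
    s = project(W, x)
    return max(class_attributes, key=lambda name: cosine(s, class_attributes[name]))


# Hypothetical attribute vectors for two unseen classes:
# [has_yellow_petals, has_thorns, grows_in_water]
unseen_classes = {
    "water_lily": [0.1, 0.0, 1.0],
    "wild_rose": [0.0, 1.0, 0.0],
}

# Hypothetical mapping from a 4-d visual feature to the 3-d attribute space.
# In practice it would be learned from labeled seen-class samples only.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

x = [0.2, 0.1, 0.7, 0.9]  # made-up visual feature of a test image
print(predict(x, W, unseen_classes))
```

No training sample of either class is used at prediction time; only the class attribute vectors are needed, which is what allows categories absent from training to be recognized.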