| With the rapid development of computer vision and in-depth research on deep neural networks, the accuracy and speed of image recognition have surpassed those of humans. Image recognition is now widely used in many areas of production and daily life, but it requires a large number of labeled samples for training. In practice, however, image data for some categories cannot be collected at scale, so a supervised model cannot reach the desired recognition accuracy on data from unseen categories. In addition, new categories are created or discovered every day; to recognize them accurately, a supervised model requires that the new categories be labeled, the dataset rebuilt, and the model retrained. Zero-shot learning (ZSL), which recognizes unseen classes after training only on seen classes, is therefore of considerable research significance. Current work mainly uses prior features such as attributes as a bridge between image features and category labels, building a mapping between prior knowledge and image features to achieve domain transfer between seen and unseen classes and to solve the problem of missing class labels for unseen classes. As research has deepened, zero-shot learning has improved in both classification models and recognition accuracy, but some problems remain. On the one hand, there is a semantic gap between the prior features (such as attributes) and the image features in the high-dimensional mapping space, which causes domain shift and reduces recognition accuracy. On the other hand, prior features such as attributes are not fully exploited, so the network model does not learn enough from the auxiliary information. To address these problems, a generative zero-shot learning model, which alleviates the semantic gap, is adopted as the research framework and improved according to
the following points: (1) TransGAN is used as the generative adversarial network within the generative zero-shot learning framework and is modified: convolutional layers are added to the generator to extract image features, and the discriminator is simplified to reduce computational cost. In addition, cosine similarity is introduced into the loss function to measure how well the generated features match the image features. (2) To address the low utilization of prior features, the correlation between attributes is taken into account: a co-occurrence matrix between attributes is computed and fed into the generator so that the generator can learn the relationships between attributes, which makes the generated features fit the real features more closely. (3) Because different networks extract different image features, an image feature extraction network suited to the generative zero-shot learning model must be selected to achieve the best results. An attention mechanism is used to fuse the discriminator's input features with attribute information, so that the discriminator attends to the key information in the attribute features and thus guides the generator better. In the zero-shot recognition experiments, recognition accuracy on unseen classes is used as the evaluation metric. The results show that Vision Transformer performs best as the feature extraction network. The optimized loss function, the attribute fusion based on the co-occurrence matrix, and the attribute fusion based on the attention mechanism all achieve good results, which verifies the effectiveness of the proposed method. |
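
The abstract does not give formulas for the attribute co-occurrence matrix or for the cosine-similarity term, so the following is only a minimal sketch of one plausible reading. It assumes binary class-level attribute annotations, counts how often two attributes appear together across seen classes, normalizes each row by the attribute's own frequency, and uses `1 - cos` as the matching term. All function names and the toy attribute data are hypothetical and are not taken from the original work.

```python
import math


def cooccurrence_matrix(class_attributes):
    """Count, for each attribute pair (i, j), the classes in which both are present.

    class_attributes: one row per seen class, each row a list of 0/1 flags.
    (Binary class-level attributes are an assumption of this sketch.)
    """
    n_attr = len(class_attributes[0])
    counts = [[0] * n_attr for _ in range(n_attr)]
    for row in class_attributes:
        present = [k for k, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                counts[i][j] += 1
    return counts


def row_normalize(counts):
    """Convert counts to conditional frequencies P(attribute j | attribute i)."""
    normalized = []
    for i, row in enumerate(counts):
        total = row[i]  # diagonal entry: number of classes that have attribute i
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_matching_loss(generated, real):
    """Assumed form of the matching term: small when the features point the same way."""
    return 1.0 - cosine_similarity(generated, real)


# Toy data: 3 seen classes, 3 hypothetical attributes (striped, four-legged, aquatic).
toy_attributes = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
counts = cooccurrence_matrix(toy_attributes)
conditional = row_normalize(counts)
```

In a full model, a matrix like `conditional` would be supplied to the generator alongside the attribute vector, and the cosine term would be added to the adversarial loss with some weighting; how the original work does either is not stated in the abstract.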