
Research On Zero-shot Learning Based On Generative Model

Posted on: 2022-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: J R Liu
GTID: 2568307070452734
Subject: Computer technology
Abstract/Summary:
With the growth of dataset sizes and the rapid development of deep learning, classification models based on deep neural networks (DNNs) have made great breakthroughs. In particular, since the advent of CNN-based and Transformer-based models, top-5 accuracy on ImageNet has exceeded 95%, surpassing the estimated human level. However, the enormous number of species in nature, and their variation across regions, make it impossible to obtain accurately labeled data for every class for fully supervised training. When faced with classes that did not appear during training, a trained model cannot adapt, and new training data must be collected for retraining. Zero-shot learning offers an alternative: it generalizes the knowledge learned from seen classes in order to classify or detect samples from unseen classes. Because of its research significance and broad application prospects in computer vision, zero-shot learning has attracted increasing attention in the academic community in recent years.

Conventional zero-shot learning (CZSL) now achieves excellent performance. However, the CZSL setting differs from real-world scenarios, because in practice a model cannot know in advance whether an object to be classified belongs to the seen or the unseen domain. A new and more challenging task has therefore emerged: generalized zero-shot learning (GZSL). GZSL expands the label search space at test time from the unseen domain alone to both the seen and unseen domains, and it is gradually replacing CZSL as the main topic in the zero-shot field. Against this background, this paper studies generative-model-based generalized zero-shot learning and addresses shortcomings of existing methods, notably that the generated samples lack realism and diversity. Four methods are proposed to improve the performance of generalized zero-shot recognition and detection. The main content and results of this paper are as follows:

(1) This paper proposes a method that goes beyond the normal distribution and generates visual feature distributions closer to reality. At present, most generative methods combine semantic prototypes with normally distributed noise to generate visual features: the semantic prototypes act as anchor points in a high-dimensional space, and the noise determines the visual distribution around each anchor point. To make the manifold structure of the noise match that of the visual features, this paper defines a noise encoder and a noise space in which vectors from the visual space and the noise space are aligned. To address the absence of unseen-domain samples in the visual space, this paper uses the cosine similarity between class semantics to select the material used for generation. Finally, this paper implements two completely different noise encoders and constrains the noise space with different loss functions; both achieve results far better than those obtained with normally distributed noise.

(2) This paper proposes using the Cramér distance to optimize the generative network. On the one hand, the Cramér distance in principle increases the diversity of the generated samples; on the other hand, WGAN-div and a perturbation-based attack strategy are used to ensure the realism of the generated features. Unlike previous WGAN-based methods, this approach not only departs from the relatively limited set of generative network paradigms (VAE and WGAN) used in generalized zero-shot learning, but also outperforms them.

(3) For generative networks, the richer the input material, the more refined the generated visual features. In generative generalized zero-shot learning, the semantic prototype is usually the only input that anchors the position of a visual feature in the high-dimensional space. To improve the discriminability and uniqueness of the generated features, this paper proposes a latent space with orthogonal properties: semantics are first used to generate more discriminative latent vectors, which are then combined with the semantics to further refine the generation of visual features. Experiments with two generative networks show that the latent space further improves the representativeness and uniqueness of the generated features, and extensive experiments verify that it benefits GZSL.

(4) This paper combines generative models with curriculum learning and applies them to zero-shot object detection. Imitating the way humans learn, this paper defines several "teachers" from different fields that continually correct the results of the generator (the "student") during training. Each teacher evaluates the generator at different stages and keeps giving "rewards" based on its progress until training converges to the optimal result. The generated proposals for unseen classes are then used to train a zero-shot classifier with the same structure as the classifier in Faster R-CNN. This zero-shot classifier replaces the classifier in the original detection network, giving the detector the ability to detect unseen objects.
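Contribution (1) in the abstract selects generation material for unseen classes by the cosine similarity between class semantics. The abstract does not say exactly what is selected, so the sketch below assumes it means ranking, for each unseen class, the seen classes whose semantic vectors (for example attribute vectors) are most similar, so that their features or encoded noise could serve as material. The helper `select_source_classes` and the toy attribute values are hypothetical and only illustrate the similarity ranking.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_source_classes(unseen_semantics, seen_semantics, k=1):
    """Hypothetical helper: for each unseen class, return the names of the k seen
    classes whose semantic vectors are most cosine-similar to it."""
    selection = {}
    for unseen_name, unseen_vec in unseen_semantics.items():
        ranked = sorted(
            seen_semantics,
            key=lambda seen_name: cosine_similarity(unseen_vec, seen_semantics[seen_name]),
            reverse=True,
        )
        selection[unseen_name] = ranked[:k]
    return selection


# Toy attribute vectors, invented for illustration: [striped, hooved, aquatic, carnivore]
seen = {
    "horse": [0.0, 1.0, 0.0, 0.0],
    "tiger": [1.0, 0.0, 0.0, 1.0],
    "dolphin": [0.0, 0.0, 1.0, 1.0],
}
unseen = {"zebra": [1.0, 1.0, 0.0, 0.0]}

print(select_source_classes(unseen, seen, k=2))  # {'zebra': ['horse', 'tiger']}
```

Here "zebra" scores about 0.71 against "horse", 0.5 against "tiger" and 0 against "dolphin". A real pipeline would compute the same ranking over full semantic matrices with a vectorized library.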
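Contribution (2) in the abstract optimizes the generator with the Cramér distance. For one-dimensional distributions the squared Cramér distance is the integral of (F − G)² over the two CDFs, and the energy distance 2E‖X − Y‖ − E‖X − X′‖ − E‖Y − Y′‖ is its standard multivariate generalization (equal to twice the squared Cramér distance in 1-D). The sketch below, assuming the thesis uses this energy-distance form, only computes the value between a set of real and a set of generated feature vectors; the function names are illustrative, and the exact estimator used in the thesis is not stated in the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs and y in ys."""
    return sum(euclidean(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(real, fake):
    """V-statistic estimate of 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||.

    For one-dimensional samples this equals twice the squared Cramér distance
    between the two empirical distributions, i.e. 2 * integral of (F - G)^2.
    """
    return (2.0 * mean_pairwise_distance(real, fake)
            - mean_pairwise_distance(real, real)
            - mean_pairwise_distance(fake, fake))


real_features = [[0.0], [1.0]]
fake_features = [[10.0], [11.0]]
print(energy_distance(real_features, fake_features))  # 19.0
print(energy_distance(real_features, real_features))  # 0.0
```

As a check on the 1-D relation: (F − G)² is 0.25 on [0, 1), 1 on [1, 10) and 0.25 on [10, 11), so the integral is 9.5 and twice it is 19.0. In actual training the same formula would be written in an autodiff framework so the generator can minimize it.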
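Contribution (3) in the abstract introduces a latent space with orthogonal properties but does not say how orthogonality is enforced. One common choice, assumed here, is to penalize ‖ZZᵀ − I‖²_F for the matrix Z whose rows are the L2-normalized class latent vectors, which drives the latent vectors of different classes toward mutual orthogonality. The function name `orthogonality_penalty` is hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def orthogonality_penalty(latents):
    """||Z Z^T - I||_F^2 for the row-normalized matrix Z of latent vectors.

    Because each normalized row has unit length, the diagonal of Z Z^T is 1, so
    the penalty reduces to the sum of squared cosine similarities between
    distinct latent vectors; it is 0 exactly when they are mutually orthogonal.
    """
    z = [l2_normalize(v) for v in latents]
    penalty = 0.0
    for i in range(len(z)):
        for j in range(len(z)):
            if i != j:
                dot = sum(a * b for a, b in zip(z[i], z[j]))
                penalty += dot * dot
    return penalty


print(orthogonality_penalty([[1.0, 0.0], [0.0, 2.0]]))  # 0.0
print(orthogonality_penalty([[1.0, 0.0], [1.0, 1.0]]))  # about 1.0 (two off-diagonal entries of 0.5)
```

Normalizing before taking the Gram matrix makes the penalty depend only on the angles between latent vectors, not on their lengths, so it can be added to a generation loss without fighting other terms that control feature magnitude.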
Keywords/Search Tags: zero-shot learning, generalized zero-shot learning, generative adversarial network, classification and recognition, object detection