| With the arrival of the big data era, massive amounts of image data have become available online, which has driven in-depth research on deep learning. However, to achieve high accuracy, deep learning requires a large amount of manually labeled data, and manual labeling is costly and extremely tedious. Driven by practical needs and the continuous development of technology, Zero-shot Learning emerged. Zero-shot Learning aims to overcome the recognition bottleneck that current models face on large amounts of unlabeled data. Unlike traditional deep learning classification, Zero-shot Learning trains a model on data from known classes, supplemented by relevant common-sense information or prior knowledge, and uses it to predict and identify the categories of data from unknown classes. Addressing the common problems of domain shift and auxiliary information utilization in Zero-shot Learning, the main work of this article is as follows. First, a transductive single-label Zero-shot Learning model based on a convolutional neural network (CNN) is proposed. For the single-label Zero-shot Learning task, the study adopts transductive learning in order to make full use of the prior knowledge of unknown-class samples and improve the discriminability of the visual features. The visual feature module uses ResNet-50 to extract targeted visual features. The loss function is then improved on the basis of the cross-entropy function by adding a penalty constraint on the bias of known-class samples. Distances between classes are computed from the centers of the visual feature distributions, so as to enlarge the inter-class distance and reduce the intra-class distance. A semantic autoencoder model is then used to align visual features with semantic features. Finally, the model is verified by simulation experiments. The experimental results show that
the model performs significantly better than the comparison model in the traditional Zero-shot Learning setting, and also achieves good results on the two datasets in the generalized Zero-shot Learning setting. This demonstrates the feasibility of the proposed model for single-label Zero-shot Learning tasks, and the improved loss function greatly alleviates the domain shift problem of single-label Zero-shot Learning. Second, a multi-label Zero-shot Learning model fusing a CNN and a graph convolutional neural network (GCN) is proposed for the multi-label Zero-shot Learning task. The proposed network includes two feature extraction channels, one semantic and one visual. The semantic feature channel uses a graph attention network for semantic feature extraction and constructs a directed weighted graph as the input of the graph attention network. The visual feature channel uses the ResNet-50 model for visual feature extraction. In addition, two graph feature embedding methods, local and global, are proposed. Local graph feature embedding converts category semantic features into convolution kernels applied to the visual features to obtain local feature responses. Global graph feature embedding focuses on the global interaction and response between visual and semantic features. Together, the two graph embedding methods guide the extraction of targeted visual features from both local and global perspectives. The fused visual features are passed through the prediction module to output the prediction results. The proposed model was verified by simulation on two datasets. The experimental results show that it achieves the best average accuracy under both the traditional and generalized Zero-shot Learning settings, with a clear advantage. This demonstrates the feasibility of the proposed multi-label Zero-shot Learning model that fuses a convolutional neural network and a graph convolutional neural network, and shows that it significantly improves the recognition and prediction of unknown classes. |
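
The abstract describes the improved loss only in words: a cross-entropy term plus a penalty built from the centers of the visual feature distributions, intended to enlarge the inter-class distance and reduce the intra-class distance. The following is a minimal sketch of one way such a loss could be composed, not the formulation used in this work. The function name `center_penalized_loss`, the weights `alpha` and `beta`, the use of squared Euclidean distance, and the additive/subtractive combination are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_centers(features, labels):
    """Mean visual feature vector (distribution center) of each class."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    return {y: [sum(col) / len(fs) for col in zip(*fs)]
            for y, fs in groups.items()}


def center_penalized_loss(logits, features, labels, alpha=0.1, beta=0.1):
    """Hypothetical loss: CE + alpha * intra-class spread - beta * center separation.

    alpha and beta are illustrative weights, not values from the source.
    """
    n = len(labels)
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    centers = class_centers(features, labels)
    # Intra-class term: mean squared distance of each sample to its own center.
    intra = sum(sq_dist(f, centers[y]) for f, y in zip(features, labels)) / n
    # Inter-class term: mean squared distance over all pairs of class centers.
    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = (sum(sq_dist(centers[a], centers[b]) for a, b in pairs) / len(pairs)
             if pairs else 0.0)
    return ce + alpha * intra - beta * inter
```

With identical logits, a batch whose classes form tight, well-separated clusters receives a lower loss than a batch whose classes are spread out and close together, which is the behavior the abstract describes. Because the inter-class term in this sketch is unbounded, a practical variant would likely cap it with a margin or clipping.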