With the continuous advancement of computer networks and information technology, social media platforms and image-sharing websites provide massive numbers of images from different visual domains, and in work and daily life people increasingly need to retrieve images across domains. Cross-domain image retrieval is proposed to address this need: a model automatically and accurately matches a pair of images from two different domains. The data distributions of different domains differ significantly, that is, they are heterogeneous in low-level features, while their high-level semantic information remains related. The key challenge of this task is therefore to bridge the domain gap. Most current methods adopt domain-invariant feature learning algorithms based on adversarial networks or generative models to extract features shared by the two visual domains. Although these models have made significant progress, they still face the following problems. First, the large difference between cross-domain data distributions easily causes overfitting, so the domain-invariant features extracted by the model generalize poorly. Second, model training takes long and convergence is slow. Third, taking domain-invariant features as the only learning target, and thus removing domain-specific features merely implicitly, also limits the training effect of the model.

Motivated by these issues, this thesis designs two models and selects two mainstream cross-domain image retrieval tasks for experimentation. To address the first two problems, this thesis proposes a domain-invariant feature extraction model based on the theory of the variational information bottleneck and applies it to cross-view geo-localization, the task of retrieving images of one view from those of another. First, the image features are partitioned into regions to achieve feature alignment across platforms. Then the regional features undergo variational information bottleneck optimization to remove view-specific features and noise. Comparative experiments on the University-1652 and CVACT datasets show that the proposed method improves cross-view retrieval performance, and a series of ablation experiments further demonstrates its fast convergence and good generalization.

To address the third problem, this thesis proposes a model that explicitly learns and removes domain-specific features and applies it to sketch-based image retrieval, the task of retrieving natural images from sketches. The model designs two domain-specific classifiers to learn and remove the features of the original data that are specific to each visual domain. To be closer to real-world conditions, a zero-shot setting is added, in which the category sets of the test set and the training set are disjoint. To bridge the semantic gap in zero-shot learning, knowledge distillation is used so that the features learned by the model generalize to the test set. Experiments on the Sketchy and TU-Berlin datasets, together with further ablation experiments, show that the proposed model improves the performance of both traditional and zero-shot sketch-based image retrieval.
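The variational information bottleneck objective that the first model builds on has a standard closed form: a task loss (a lower bound on the information the bottleneck keeps about the label) plus a weighted KL term to a standard normal prior (an upper bound on the information kept about the input). The NumPy sketch below is illustrative only; the names `mu`, `logvar`, `task_loss`, and the weight `beta` are assumptions for exposition, not the thesis's actual implementation.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch.

    mu, logvar: arrays of shape (batch, dim), parameters of the
    approximate posterior q(z|x) produced by the encoder.
    """
    per_sample = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(per_sample))

def vib_objective(task_loss, mu, logvar, beta=0.01):
    """VIB loss = task term + beta * compression term.

    The compression term penalizes bits the bottleneck keeps about the
    input, which is what pushes out domain-specific detail and noise.
    """
    return task_loss + beta * kl_to_standard_normal(mu, logvar)
```

When `mu = 0` and `logvar = 0` the posterior already equals the prior, so the compression term vanishes and the objective reduces to the task loss alone; larger `beta` trades task accuracy for stronger compression.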
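The knowledge distillation used in the zero-shot setting is typically realized as a soft-label term: the student is trained to match the teacher's temperature-softened class distribution, which transfers semantic relations between categories and helps features generalize to unseen test classes. The sketch below shows this standard formulation in NumPy; the temperature `T` and the function names are assumptions, not the thesis's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label KD term: KL(teacher || student) on distributions
    softened by temperature T, scaled by T^2 so gradients keep a
    comparable magnitude across temperatures."""
    p = softmax(teacher_logits / T)   # teacher's softened distribution
    q = softmax(student_logits / T)   # student's softened distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(T * T * np.mean(kl))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as their softened distributions diverge; in practice it is added to the usual hard-label classification loss with a mixing weight.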