With the advent of the big data era, image data on the Internet has grown explosively. Hash retrieval methods map image features into Hamming space, which reduces both feature dimensionality and retrieval cost. Hash retrieval has therefore become a fundamental research direction in multimedia and computer vision, and it has been widely studied and applied to large-scale image retrieval. Although existing hash retrieval methods achieve good results on many tasks, none of them accounts for the background noise in images, so the generalization and robustness of existing hash retrieval models still need to be improved. Traditional hash retrieval methods also generally assume that training data and test data follow the same distribution. In practical applications, however, different datasets often have different distributions because of lighting, weather, and other factors, which makes it difficult to apply a model trained in one domain directly to data from another domain. In addition, in practical scenarios such as fine-grained image retrieval, the distinction between categories depends mainly on subtle differences in one or more local regions, yet existing hash retrieval methods mainly target coarse-grained retrieval. Moreover, existing feature learning methods usually focus only on global features or on the features of a single local region; they have difficulty capturing the contextual relationships among multiple local regions and therefore cannot learn fine-grained features effectively.

To address these problems, this paper constructs a graph network to mine the correlation information among multiple samples and thereby improve the domain adaptability of the model. A self-attention module is used to mine the contextual relationships among the local regions within a single fine-grained sample and thereby improve the model's ability to learn fine-grained features. Both of them combine coarse to
fine correlation information to study robust and efficient hash retrieval methods. The specific work of this paper is as follows.

First, to address the background noise problem, an inverse spatial transformation hash retrieval method based on mutual learning is proposed. Existing hash retrieval methods usually augment the training data by applying random image transformations with fixed parameters. However, this kind of data augmentation cannot effectively remove background noise and sometimes even contaminates the training data. Inspired by the spatial transformer network, an inverse spatial transformation network is placed before the input layer of the main network. The inverse spatial transformation network adaptively learns a variety of transformations according to the image content and can apply affine transformation attacks during training to improve the anti-interference ability of the model. This enables the network to obtain good retrieval performance even when trained on smaller datasets. In addition, the mutual learning training strategy both improves the stability of training and accelerates convergence. Systematic experiments on multiple datasets demonstrate that the method achieves state-of-the-art performance. With 64-bit hash codes, for example, the method achieves mAP of 82.6% and 83.5% on the CIFAR-10 and NUS-WIDE datasets, respectively.

Second, an unsupervised domain adaptation strategy based on an association graph is proposed for the domain adaptation problem caused by differences in data distribution across domains. Transfer learning from the source domain to the target domain poses two main difficulties: what to transfer and how to transfer it. Existing domain adaptation methods typically use maximum mean discrepancy or discriminator networks to transfer low-order feature distribution knowledge across domains. For the question of what
to transfer, this paper designs a residual graph convolutional network that constructs an association graph among samples and mines the association information among them. For the question of how to transfer, considering the geometric properties of the probability distribution space, this paper introduces an optimized Hellinger distance to measure, on the statistical manifold, the difference between the source domain and the target domain in the distribution of association information. This method not only has a theoretical basis but also provides a new solution to the problem of what to transfer in domain-adaptive learning. On the Office-31, Office-Home, and VisDA2021 datasets, the method reaches average accuracies of 92.1%, 71.5%, and 89.1%, respectively, across multiple domain adaptation tasks.

Finally, a weakly supervised local feature fusion strategy based on self-attention is proposed for feature learning in fine-grained scenarios. Existing fine-grained feature learning methods usually focus only on global features or on the features of a single local region, and they have difficulty capturing the contextual relationships among multiple local regions. First, attention maps are generated through weakly supervised learning to represent multiple salient local parts of the target object. Second, inspired by bilinear pooling, multi-attention pooling is proposed to extract the local semantic features of the different parts. Finally, the features of each local part are fed into an optimized self-attention network that mines the contextual relationships among the parts and fuses the local visual features. This method effectively addresses the problem that existing hash retrieval methods are fast but inaccurate on fine-grained data. On the CUB-200-2011, Aircraft, Stanford Cars, and Stanford Dogs datasets, the method achieves classification accuracies of 90.5%, 93.9%, 95.7%, and 93.1%, respectively. In addition, on the CUB-200-2011 and Stanford Cars datasets, taking the
64-bit hash code as an example, the proposed fine-grained hashing method achieves mAP of 84.54% and 91.74%, respectively.

In summary, the research in this paper offers reference value at the theoretical level of computer vision as well as considerable reference value for the design and development of robust, practical application systems.
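
As a rough illustration of two building blocks named above, the following sketch shows how binary codes can be ranked by Hamming distance and how a Hellinger distance between two discrete distributions can be computed. It is a minimal sketch under stated assumptions, not the method of this paper: thresholding at zero stands in for a learned hash function, the standard Hellinger distance stands in for the optimized variant (whose form is not given here), and all function names and the random feature data are hypothetical.

```python
import numpy as np


def binarize(features: np.ndarray) -> np.ndarray:
    """Map real-valued features to {0, 1} hash bits by thresholding at zero.

    Assumption: a stand-in for a learned hash layer, for illustration only.
    """
    return (features > 0).astype(np.uint8)


def hamming_distances(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Count differing bits between one query code and every database code."""
    return np.count_nonzero(database != query, axis=1)


def rank_by_hamming(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Return database indices sorted from nearest to farthest (ties keep order)."""
    return np.argsort(hamming_distances(query, database), kind="stable")


def hellinger(p: np.ndarray, q: np.ndarray) -> float:
    """Standard Hellinger distance between two discrete distributions.

    Assumption: p and q are non-negative and each sums to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 64-dimensional network outputs for 5 database images.
    db_codes = binarize(rng.standard_normal((5, 64)))
    query_code = db_codes[2].copy()
    query_code[:3] ^= 1  # flip three bits to simulate a near-duplicate query
    print(rank_by_hamming(query_code, db_codes))
    print(hellinger(np.array([0.5, 0.5]), np.array([0.9, 0.1])))
```

Comparing codes bit by bit is what makes retrieval in Hamming space cheap, and the Hellinger distance is bounded between 0 (identical distributions) and 1 (disjoint support), which makes it convenient as a measure of distribution difference.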