
Metric Learning And Indexing For Large-Scale Image Retrieval

Posted on: 2022-03-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S C Kan
Full Text: PDF
GTID: 1488306560490044
Subject: Signal and Information Processing
Abstract/Summary:
With the proliferation of mobile devices and surveillance equipment, the amount of image and video data has grown explosively. How to quickly and accurately extract useful information from this ever-increasing mass of data is the core problem of large-scale image retrieval, and also one of the hot issues in fields such as image processing and computer vision. In recent years, many studies have greatly improved the performance of large-scale image retrieval by addressing the "semantic gap" between low-level image representations and high-level semantics; a typical approach is to generate robust deep feature embeddings by combining representation learning and metric learning. In practical applications, large-scale image retrieval still faces the following challenges, owing to the limitations of metric learning and deep feature embedding, as well as those of image indexing. First, traditional metric learning methods were mainly based on a single image feature, so the complementary information among multiple features could not be learned. Second, the image representation model cannot be updated by traditional metric learning techniques; and although existing deep feature embedding models can update the feature embedding and metric learning modules end to end, they still fall short in multi-feature processing. Finally, most deep metric learning methods are designed for data with category labels, which incurs a high cost for annotating massive data, while existing unsupervised deep metric learning methods cannot fully mine the semantic information of unlabeled data. In addition, existing index models rely strongly on the test dataset, and the index structure must be updated whenever new category data is added, which is a limitation in scenarios where data grows continuously. To solve the above problems, this dissertation proposed new methods in four aspects: kernel metric learning, supervised deep feature embedding, unsupervised deep metric learning, and image indexing. The main contributions of this dissertation include the following five points:

1) A kernel metric learning model for feature fusion was proposed. Based on matrix partitioning, kernel metric learning for feature fusion was derived as learning kernel metric models in the respective feature spaces and learning related kernel metric models in the reciprocal spaces. In addition, triplet and label constraints were added to the optimization model based on the theories of the LogDet decomposition and the kernel extreme learning machine (ELM). Finally, the optimization model was solved with the alternating direction method of multipliers (ADMM). In the image retrieval experiments, the 4-Root HSV feature, the SURF-based vector of locally aggregated descriptors (VLAD), the DenseNet feature, and the SENet feature were fused in pairs. The experimental results showed that the proposed multi-feature kernel metric learning method achieved the best performance in most feature fusion scenarios.

2) A deep feature embedding model with handcrafted features was proposed. Following the idea of deep metric learning, handcrafted features were incorporated into the deep learning model. First, converter and merger modules were proposed. Then, by combining the ideas of metric learning and classification, label and distance information were unified into a class-metric loss function to train the model. Finally, extensive experiments were carried out on general image retrieval, person re-identification, and vehicle re-identification. Experimental results indicated that the proposed model obtains better feature embeddings.

3) A graph neural network model of local semantic correlation for deep feature embedding was proposed. First, based on the K-nearest-neighbor information of an image, a graph neural network was established in the feature space to characterize and predict the local correlation structure of the image. Then, an edge correlation prediction network and a node feature embedding network were built on the graph to learn the correlation information and the correlation-weighted feature embeddings of neighboring images. In addition, similarities between the features of mini-batch images and those of all training images were calculated following the memory bank idea, and a metric loss function was constructed to train the network. Experimental results on fine-grained image retrieval demonstrated the effectiveness of the local semantic correlation graph neural network model.

4) A relative order analysis and optimization model for unsupervised deep metric learning was proposed. First, high-confidence similar pairs were constructed from self-augmentations and from images near the cluster center of the anchor image, while high-confidence dissimilar pairs were constructed from images of other categories far from the anchor image's class. Then, a relative order consistency loss and a metric order consistency loss were established to train the relative order prediction network and the feature embedding network cooperatively. Finally, the predicted relative order was used to refine the retrieval results ranked by feature distance. Experimental results on image retrieval showed that the relative order analysis and optimization model achieved state-of-the-art performance in unsupervised scenarios.

5) A zero-shot learning-to-index model on semantic trees for scalable image retrieval was proposed. First, semantic tree codings of the training images were built from the features and category information of the training dataset. Then, a convolutional neural network was trained with index supervision established from the encoded semantic tree over the training images. Finally, the trained model predicted the index information of test images whose categories differ from those of the training set, and the predicted index information was decoded back onto the semantic tree to obtain the index. Experimental results on the SOP and ImageNet datasets indicated that the proposed zero-shot learning-to-index model was superior to existing mainstream methods and offers the advantage of scalability.
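To make two recurring ingredients of the contributions above concrete, the following is a minimal NumPy sketch of a triplet margin loss (the kind of triplet constraint used in the metric learning models of points 1, 2, and 4) and a memory bank for comparing mini-batch features against stored features of all training images (point 3). All names, the Euclidean distance, the fixed margin, and the momentum update rule are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the positive closer to the
    anchor than the negative is, by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

class MemoryBank:
    """Stores an L2-normalized feature vector per training image so a
    mini-batch can be compared against the whole training set."""

    def __init__(self, num_images, dim):
        self.bank = np.random.randn(num_images, dim)
        self.bank /= np.linalg.norm(self.bank, axis=1, keepdims=True)

    def similarities(self, batch_feats):
        # Cosine similarity between each mini-batch feature and
        # every stored training feature: shape (batch, num_images).
        batch = batch_feats / np.linalg.norm(batch_feats, axis=1, keepdims=True)
        return batch @ self.bank.T

    def update(self, indices, feats, momentum=0.5):
        # Momentum update of the stored entries, then renormalize
        # so bank rows stay unit length.
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        self.bank[indices] = momentum * self.bank[indices] + (1 - momentum) * f
        self.bank[indices] /= np.linalg.norm(self.bank[indices], axis=1, keepdims=True)
```

In practice such a loss would be applied to embeddings produced by the network and backpropagated through it; the memory bank sidesteps recomputing features for the entire training set at every step, at the cost of the stored entries being slightly stale.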
Keywords/Search Tags:Image retrieval, feature fusion, kernel metric learning, deep feature embedding, deep metric learning, unsupervised metric learning, learning to index