Study On Multimodal Image Reranking Algorithm

Posted on: 2020-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: T J Wang
GTID: 2428330578957085
Subject: Computer technology
Abstract/Summary:
With the rapid development of mobile devices, the Internet, and storage technology, digital images have become increasingly accessible. Millions of images are uploaded to social platforms every day, and quickly and accurately finding the pictures users need in such massive data has become an urgent problem for image retrieval. Generally, there are two kinds of image retrieval query methods: text-based and content-based. A text-based query matches the keywords entered by the user against the tags attached to images. A content-based query requires the user to supply a picture and then searches according to visual similarity. A mainstream approach today is a hybrid of the two: a preliminary result is obtained through a text query and is then reordered according to the visual features of the images. In industry, this method is called image reranking.

Image reranking currently faces two problems. First, the initial result serves as the supervisory signal for reranking, so its quality affects the reranked result. However, the text in current image datasets is of low quality and contains a lot of noise, which degrades the initial result. Second, existing multimodal fusion retrieval methods do not adaptively adjust the weights of the multimodal features, nor do they consider sparsity as the number of features grows. To address these problems, this thesis proposes a multimodal image reranking algorithm. The main work is as follows:

(1) Tag refinement. To deal with noisy text data, this thesis applies tag refinement, which consists of tag denoising and tag supplementation. Tags are denoised by neighbor voting, based on the observation that the tag sets of visually similar images largely overlap. After denoising, some images may be left with an empty tag set, which causes a substantial loss of information. The solution is to supplement their tags using the tags of neighboring images. Experimental comparison verifies the effectiveness of the initial results after tag refinement.

(2) Adaptive fusion of multimodal features. The visual features of multiple modalities are integrated into a unified framework in which each modality is assigned a weight, and the weights are adjusted adaptively by solving the model. The model includes a regularization term on the weights, which takes one of two forms: the L1 norm or the L2 norm. The L2-norm model is solved using the idea of the SMO algorithm from support vector machines, which computes faster than traditional quadratic programming. In addition, when the model uses many features, not every feature is guaranteed to be effective, so feature selection is needed. This thesis therefore also proposes a method that combines multimodal fusion with sparse learning, using the L1 norm to induce sparsity. Extensive comparative experiments show that, compared with the other methods, the L1 norm achieves the goal of feature selection.
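The tag refinement step in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the cosine similarity measure, the function names, the parameters `k` and `min_votes`, and the toy data are all assumptions made for illustration. Each image keeps only the tags that enough of its visual neighbors also carry, and an image left with no tags receives the most-voted tags of its neighbors.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(features, i, k):
    """Indices of the k images most visually similar to image i."""
    sims = [(cosine(features[i], features[j]), j)
            for j in range(len(features)) if j != i]
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in sims[:k]]


def refine_tags(features, tags, k=2, min_votes=1):
    """Denoise tags by neighbor voting, then supplement empty tag sets.

    Hypothetical rule: a tag survives if at least `min_votes` of the k
    visual neighbors carry it; an emptied tag set is refilled with the
    most-voted neighbor tags.
    """
    refined = []
    for i, own_tags in enumerate(tags):
        votes = Counter()
        for j in nearest_neighbors(features, i, k):
            votes.update(set(tags[j]))  # votes use the original tags
        kept = {t for t in own_tags if votes[t] >= min_votes}
        if not kept and votes:
            top = max(votes.values())
            kept = {t for t, count in votes.items() if count == top}
        refined.append(kept)
    return refined


# Toy data: two visual clusters; "car" and "party" are noisy tags.
features = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
            [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
tags = [{"cat", "pet"}, {"cat", "pet", "car"}, {"cat", "pet"},
        {"sea", "beach"}, {"sea", "beach"}, {"party"}]

refined = refine_tags(features, tags, k=2, min_votes=1)
for original, cleaned in zip(tags, refined):
    print(sorted(original), "->", sorted(cleaned))
```

In this toy run, the tag "car" is dropped because neither neighbor carries it, and the image tagged only "party" ends up empty after denoising and is then supplemented with "sea" and "beach" from its neighbors. Voting on the original tags rather than on already-refined ones keeps the result independent of the order in which images are processed.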
Keywords/Search Tags:Image retrieval, Visual reranking, Multimodal, Sparse, Tag refinement