
Diversity Induced Image Retrieval Algorithms With Multi-modal Information Fusion

Posted on: 2020-11-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Yuan
GTID: 1368330602950302
Subject: Intelligent information processing
Abstract/Summary:
Image retrieval on the internet has long been a focus of multimedia research. Much of this work aims to improve users' retrieval efficiency by returning highly matched results quickly and accurately. However, recent research indicates that users often have only a vague idea of what they need during retrieval, which suggests that search engines should offer a broad range of candidate content to guide users toward their requirements. Moreover, highly developed internet technologies make a large amount of information available. This information, which includes descriptions, relationships and so on, is called multi-modal information, and it brings both opportunities and challenges for image retrieval. Diversity-based retrieval can summarize multi-modal information, mine the characteristics of entities and analyze the multiple aspects of a query, helping users locate what they need quickly by presenting an overview. Most existing diversity-based algorithms focus on relationships among low-level features, and they fail on natural-scene images when the semantic gap is severe. How to design an efficient retrieval algorithm that models the multiple aspects of a query needs further investigation. In addition, the development of multimedia technologies and newly emerging data formats call for new approaches.

For this purpose, this dissertation is dedicated to diversity-based image retrieval. Targeting different types of multimedia data, it designs algorithms suited to general images, multi-scale images, images surrounded by irrelevant information and images with mixed modalities, according to the scenario. It covers feature extraction, multi-modal information fusion, continuous information modeling and joint inference over multi-modal data. The main contributions are summarized as follows:

To address the noise introduced during textual feature extraction, we propose an algorithm based on discriminative features and multi-modality fusion. The textual labels of images contain wrongly and repeatedly marked information, which degrades retrieval performance. A diversity-based textual feature extraction method is therefore proposed in this thesis. First, a proper threshold is learned by maximizing the inter-class cost function on the labels of the images in the training set. Then, according to the threshold, the textual information is extracted and separated into common words and unique words. Next, these words are transformed into textual features using a word occurrence matrix built from social media. Finally, the images in the test set are assigned to topics. Within each topic, the adjacency graph of the images' textual information is constructed to find and remove the images that lie far from the other clusters, which improves the quality of the topic. The experimental results show that the proposed feature extraction method extracts diversity-based features effectively, which improves the diversity score of the retrieval results.

To address the variable scales of objects and the fusion of multiple features, a new retrieval algorithm based on region analysis is proposed in this thesis. Faced with these problems, traditional methods usually cannot accurately model the relationships between images. In addition, weighted summation of multi-modal features does not allow the information of each modality to play its best role. For these reasons, a new retrieval framework based on higher-level semantics and diversified textual features is proposed. First, the images are separated into regions, in which local features are extracted and quantized into visual words. Then, a latent Dirichlet allocation model is constructed using regions as documents and visual words as words, with the aim of assigning images to different topics. Finally, within each topic, we propose a three-stage strategy based on textual features to improve topic coherence. Extensive comparisons indicate that the proposed framework fuses visual and textual features more effectively and achieves good performance on three mainstream evaluation criteria.

To address the quantization error in the modeling process, a retrieval algorithm that models the complete visual information is proposed in this thesis. Traditional topic models need to quantize continuous visual features into discrete visual words during modeling, which leads to information loss. To solve this problem, an image retrieval framework based on Gaussian distributions is proposed to model the continuous visual features directly. First, local visual features are extracted, and the means and deviations of the Gaussian distributions are initialized according to the dimension of the features. Then, a Gaussian latent Dirichlet model is constructed to generate topic features for regions. Finally, the proposed dual-spectral clustering algorithm is applied to transform the features into image groups. The experimental results show that the proposed method achieves better performance on challenging landmarks with complex backgrounds.

To address the problems of traditional topic models in the new multimedia environment, such as their inability to infer higher-level semantics and the difficulty of combining multi-modal information, we propose a multi-topic semantic model for image retrieval. Traditional topic models only assign topics to visual points, while images are represented by histograms over topics. In fact, each form of histogram corresponds to a definite higher-level topic (a topic for an image). A new topic model is therefore proposed that uses interactive data and visual features to infer the higher-level topic of an image. First, the relationship matrix between the two levels of topics is initialized using Dirichlet prior parameters. Then, a pre-learned parameter is used to fuse metadata into interactive scores. Finally, the high-level topic of an image and the topics of its visual points are computed simultaneously by iterating a two-layer model. The experimental results on a universal database show that our method improves retrieval performance by using metadata and accurately infers the topic of an image.

In conclusion, four algorithms are proposed in this thesis, covering diversity-based feature extraction, multi-information fusion, modeling of continuous visual information and unified modeling of textual and visual features. Extensive experimental results and comparisons demonstrate the effectiveness of the proposed methods.
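
As a rough illustration of the first contribution's step that separates textual information into common words and unique words, the following minimal Python sketch splits a tag vocabulary by a threshold. It is only a sketch under stated assumptions: the abstract does not say what quantity the threshold is applied to, so the per-image document frequency used here, the fixed threshold value (the thesis learns it from training labels), the function name `split_tags` and the toy tags are all hypothetical.

```python
from collections import Counter


def split_tags(image_tags, threshold):
    """Split the tag vocabulary into common and unique words.

    image_tags: list of tag lists, one per image (duplicates allowed).
    threshold:  fraction of images in (0, 1]; a tag counts as 'common'
                when it appears in at least this fraction of images.
                (Assumption: the thesis learns its threshold; here it
                is simply passed in.)
    """
    n_images = len(image_tags)
    # Count each tag once per image so repeated labels do not inflate it.
    doc_freq = Counter(tag for tags in image_tags for tag in set(tags))
    common = {t for t, c in doc_freq.items() if c / n_images >= threshold}
    unique = set(doc_freq) - common
    return common, unique


# Hypothetical toy labels for four images returned by one query.
tags = [
    ["tower", "paris", "paris", "night"],
    ["tower", "paris", "river"],
    ["tower", "fireworks"],
    ["tower", "paris", "crowd"],
]
common, unique = split_tags(tags, threshold=0.75)
print(sorted(common))
print(sorted(unique))
```

Counting a tag at most once per image is one simple way to keep repeatedly marked labels from dominating the split; how the thesis itself handles repeated labels is not stated in the abstract.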
Keywords/Search Tags: image retrieval, topic model, diversity, interactive information, spectral clustering