Research On Multimodal Fusion Based Image Reranking

Posted on: 2016-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: G X Zhang
GTID: 2308330461486266
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer software and hardware and the emergence of a variety of smart mobile devices, such as smartphones and digital cameras, it has become easy for people to upload and share photos on the Internet, which has led to explosive growth of multimedia content on the web. Owing to the great success of text document retrieval, most popular image search systems still retrieve images from this ocean of online multimedia content using the surrounding text associated with each image. However, the textual information attached to images is usually noisy and describes image content poorly, so the results of text-based image search are often unsatisfactory. To address this problem, image search reranking has been proposed and has received considerable attention. Image search reranking reorders the initially ranked visual documents or images, based on the initial search results or some auxiliary knowledge, to improve search performance. Most existing reranking algorithms use only one modality of an image, e.g., textual features or visual features, and their results are still unsatisfactory. Some researchers have tried to fuse multiple image modalities to rerank the initial search results and have obtained promising results. However, these methods all neglect the effect of the correlation between multimodal representations on the reranking results. Moreover, visual features and textual features represent an image from different views: their semantic meanings are correlated, but they are essentially heterogeneous, so it is hard to measure the similarity between them directly.
Motivated by these observations, in this thesis we propose two graph-based image reranking approaches: Canonical Correlation Analysis Random Walk Reranking (CCA-RW) and Latent Semantic Sparse Hashing Random Walk Reranking (LSSH-RW). CCA-RW maps the heterogeneous data to a joint abstract feature space by linear projections, so that the similarity between them can be measured directly and conveniently. It then constructs an isomorphic complete graph to model the image set, computes the similarity matrix on this graph, and employs a random walk algorithm to rerank the images. LSSH-RW is based on the view that modeling the correlation between the image modalities in a latent semantic space, together with a high-level abstract representation of the image, benefits the reranking process. LSSH-RW first uses Sparse Coding and Matrix Factorization to map the visual feature space and the textual feature space to two isomorphic latent semantic spaces, and then maps these two latent semantic spaces to a joint high-level abstract space. In this high-level abstract space, the similarity between any two feature nodes can be computed to obtain the similarity matrix, and a random walk on the isomorphic complete graph is then used to rerank the images. Experiments on the dataset, comparing against several other algorithms, demonstrate the effectiveness of CCA-RW and LSSH-RW.
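As an illustration of the CCA-RW pipeline described above, the following is a minimal Python/NumPy sketch, not the thesis's implementation. The regularized CCA solver, the choice of one graph node per image per modality (2n nodes in total), the Gaussian edge weights with a median-distance bandwidth, the random walk with restart toward the initial text-based scores, and the hypothetical names `cca_project` and `cca_rw_rerank` are all assumptions made for this example.

```python
import numpy as np

# Hypothetical sketch of CCA-RW. The graph weighting and random-walk
# formulation are assumptions for illustration, not taken from the thesis.


def _inv_sqrt(C):
    # Inverse square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_project(X, Y, k, reg=1e-3):
    """Regularized linear CCA: project X (n x dx) and Y (n x dy) to k dims."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx, Wy = Kx @ U[:, :k], Ky @ Vt.T[:, :k]
    return Xc @ Wx, Yc @ Wy


def cca_rw_rerank(X, Y, init_scores, k=2, alpha=0.85, iters=100):
    """Rerank n images from visual features X, textual features Y and the
    initial (text-based) relevance scores. Returns (new order, scores)."""
    n = X.shape[0]
    Zx, Zy = cca_project(X, Y, k)
    Z = np.vstack([Zx, Zy])  # 2n feature nodes in the joint space (assumed)
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(D2[D2 > 0]) if np.any(D2 > 0) else 1.0
    W = np.exp(-D2 / sigma2)  # complete graph with Gaussian edge weights
    np.fill_diagonal(W, 0.0)  # no self-loops
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    v = np.concatenate([init_scores, init_scores]).astype(float)
    v /= v.sum()  # restart distribution built from the initial ranking
    r = v.copy()
    for _ in range(iters):
        r = alpha * P.T @ r + (1 - alpha) * v  # random walk with restart
    image_scores = r[:n] + r[n:]  # merge each image's visual and textual node
    return np.argsort(-image_scores), image_scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    latent = rng.normal(size=(n, 2))  # synthetic shared semantics
    X = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(n, 8))
    Y = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(n, 5))
    init = 1.0 / np.arange(1, n + 1)  # scores from a hypothetical initial ranking
    order, scores = cca_rw_rerank(X, Y, init)
    print("reranked image indices:", order)
```

Because the transition matrix is row-stochastic and the restart vector sums to one, the walk's scores remain a probability distribution, so the final image scores can be compared directly. Under the same assumptions, LSSH-RW would replace only the `cca_project` step with the sparse-coding and matrix-factorization mapping into the latent semantic spaces; the graph construction and random walk could stay as they are.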
Keywords/Search Tags:image reranking, multimodal, canonical correlation analysis, latent semantic sparse hashing, random walk