
Research on Multi-Modal Web Image Retrieval

Posted on: 2008-07-17    Degree: Doctor    Type: Dissertation
Country: China    Candidate: R H He    Full Text: PDF
GTID: 1118360272966681    Subject: Computer system architecture

Abstract/Summary:
Text-Based Image Retrieval (TBIR) is the mainstream technology in current commercial image search engines; it relies solely on the associated text to retrieve Web images indirectly. In contrast, Content-Based Image Retrieval (CBIR) has recently received a great deal of interest in the research community. Its major challenge is the "semantic gap": the gap between low-level visual features (such as color, texture, and shape) and high-level semantic concepts (such as mountain, monkey, and flower).

Web images are embedded in a Web context and thus carry clearly multi-modal characteristics. In this dissertation, a hierarchical, fine-grained Web image model and a corresponding retrieval model are built on the multi-modal attributes of Web images, integrating TBIR and CBIR techniques to implement multi-modal Web image retrieval. The key idea is to simultaneously leverage all types of data related to a Web image (chiefly textual features, visual features, and hyperlink information), exploit their mutual reinforcement, and construct associations between keywords and visual features so as to bridge the semantic gap and improve retrieval performance. On top of this retrieval model, three approaches are proposed for different perspectives and usage scenarios.

First, a Multi-Relationship Based Relevance Feedback (MRBRF) scheme is proposed to fully exploit the multi-modal attributes of Web images. This approach places the user at the center and is mainly applied in the multi-example feedback scenario. Building on traditional relevance feedback, it extends and seamlessly integrates the Manifold Ranking Algorithm (MRA) and the Similarity Propagation Algorithm (SPA): it simultaneously exploits the intrinsic global manifold structure of Web images in both the textual and the visual feature space, and uses hyperlinks to achieve mutual reinforcement between the two spaces. The scheme thereby realizes a non-linear combination of textual and visual features, making full use of the multi-modal characteristics and multiple relationships of Web images across the iterations of interactive relevance feedback.
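For illustration, the core of the MRA component can be written as the standard manifold ranking iteration, f <- alpha*S*f + (1-alpha)*y, which spreads relevance from the user's feedback examples over a normalized similarity graph. The minimal sketch below covers a single feature space only; the function name and parameter values are illustrative assumptions, and the full MRBRF scheme additionally couples the textual and visual graphs through hyperlink-based similarity propagation.

    import numpy as np

    def manifold_ranking(W, y, alpha=0.99, tol=1e-6, max_iter=500):
        # W: (n, n) symmetric affinity matrix (zero diagonal) over retrieved images.
        # y: (n,) initial scores, 1.0 for user-marked positive examples, else 0.0.
        W = np.asarray(W, dtype=float)
        y = np.asarray(y, dtype=float)
        d = W.sum(axis=1)
        d[d == 0] = 1e-12                      # guard against isolated nodes
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        S = D_inv_sqrt @ W @ D_inv_sqrt        # symmetric normalization of W
        f = y.copy()
        for _ in range(max_iter):              # f <- alpha*S*f + (1-alpha)*y
            f_next = alpha * (S @ f) + (1 - alpha) * y
            if np.abs(f_next - f).sum() < tol:
                return f_next                  # converged ranking scores
            f = f_next
        return f

In the MRBRF setting the same iteration would run in both the textual and the visual feature space, with the hyperlink structure allowing scores in one space to reinforce the other, as described above.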
Second, a new automatic combination method based on cross-modal association rules is proposed to avoid the "lazy user" problem in the Web context. This approach suits the majority of ordinary Web users and is applied in the automatic feedback scenario. Starting from the inverted file, it discovers cross-modal associations between a keyword and several visual feature clusters through Frequent Itemset Mining (FIM) and Association Rule (AR) techniques, and thereby realizes sequential multi-modal retrieval automatically, with no user feedback required. The approach thus sidesteps the "lazy user" problem and inherits the good extensibility of data mining techniques.
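A hedged sketch of this mining step, assuming each Web image is represented as a transaction whose items are its context keywords plus the IDs of the visual clusters its features fall into: keyword -> cluster rules are kept when they reach minimum support and confidence. The direct two-itemset counting below stands in for general FIM/AR machinery such as Apriori; the data layout and thresholds are assumptions, not the dissertation's implementation.

    from collections import Counter
    from itertools import product

    def mine_keyword_cluster_rules(transactions, min_support=0.01, min_conf=0.3):
        # transactions: list of (keywords, clusters) pairs, one per Web image,
        # where keywords is a set of terms from the image's text context and
        # clusters is a set of visual-feature cluster IDs assigned to the image.
        n = len(transactions)
        kw_count, pair_count = Counter(), Counter()
        for keywords, clusters in transactions:
            kw_count.update(keywords)
            pair_count.update(product(keywords, clusters))  # co-occurrences
        rules = {}
        for (kw, cl), c in pair_count.items():
            support = c / n                    # fraction of images with both items
            confidence = c / kw_count[kw]      # P(cluster | keyword)
            if support >= min_support and confidence >= min_conf:
                rules[(kw, cl)] = confidence   # keep rule: keyword -> cluster
        return rules

A query keyword can then be mapped to its associated visual clusters, which drive a follow-up content-based search without any feedback from the user.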
Finally, an approach that integrates a semantic network with long-term relevance feedback learning is proposed to balance user relevance feedback against the "lazy user" problem. It combines automatic feedback and long-term relevance feedback learning; the main idea is to fully utilize, but never depend on, user feedback. The approach therefore avoids the "lazy user" problem while still exploiting the feedback accumulated from previous users. Concretely, it builds a semantic network between keywords and visual feature clusters based on the inverted file, and updates this network through long-term relevance feedback learning. The resulting system adapts well over time and serves both the majority of ordinary Web users (who exhibit the "lazy user" problem) and the few advanced users (who do provide relevance feedback).
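One way to picture the semantic network and its long-term update, sketched under the assumption that it is a weighted bipartite graph between keywords and visual clusters, seeded from the mined rules and nudged whenever a feedback session confirms or rejects a link (all class and method names below are hypothetical):

    from collections import defaultdict

    class SemanticNetwork:
        # Weighted keyword <-> visual-cluster links, refined across user sessions.

        def __init__(self, learning_rate=0.1):
            self.w = defaultdict(float)   # (keyword, cluster) -> association weight
            self.lr = learning_rate

        def init_from_rules(self, rules):
            # Seed the network from mined association rules (confidence as weight).
            for edge, conf in rules.items():
                self.w[edge] = conf

        def update_from_feedback(self, keyword, positive_clusters, negative_clusters):
            # Long-term learning: strengthen links confirmed by a feedback
            # session, weaken links the user implicitly rejected.
            for cl in positive_clusters:
                edge = (keyword, cl)
                self.w[edge] += self.lr * (1.0 - self.w[edge])  # bounded in [0, 1]
            for cl in negative_clusters:
                edge = (keyword, cl)
                self.w[edge] -= self.lr * self.w[edge]          # decay toward 0

        def expand_query(self, keyword, top_k=5):
            # Return the visual clusters most strongly associated with a keyword.
            linked = [(cl, wt) for (kw, cl), wt in self.w.items() if kw == keyword]
            return sorted(linked, key=lambda x: x[1], reverse=True)[:top_k]

Under this reading, ordinary users are served by expand_query alone, while the occasional advanced user's feedback gradually improves the network for everyone.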
The three approaches are evaluated in our VAST (VisuAl & SemanTic image search) system and compared with other methods. The experimental results show that the proposed approaches realize multi-modal retrieval from different perspectives and adapt to different kinds of Web users. They help to overcome the limitations of TBIR and CBIR, alleviate the semantic gap, and improve retrieval performance.

Keywords/Search Tags: Web Image Retrieval, Multi-Modal, Relevance Feedback (RF), Association Rule, Semantic Network, Long-Term Relevance Feedback Learning (LT-RF)