
Multi-Modal Image Retrieval

Posted on: 2014-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 1228330395994952
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet and mobile networks, it has become possible to access information on the Internet from anywhere, and search engines are the most important means of doing so. A traditional search is launched with a text query and returns only text results. However, multimedia content such as images, videos, and audio is growing dramatically on the Internet and now overwhelms text information. Traditional text-based search methods can retrieve multimedia only via text associated with it, such as tags and the surrounding text on web pages, rather than via the content of the multimedia itself. Recently, breakthroughs in content-based image retrieval have changed this situation. Aiming to find the duplicate images that exist in large numbers on the Internet, researchers have developed image feature matching and indexing schemes so that one can search for images with images. But this image-to-image search still limits the user's ability to initiate a search that reflects his or her own intent; conveying complex user intent remains an open problem.

On the other hand, mobile devices such as tablets and mobile phones are gradually taking over the role of PCs as access points to the Internet. These devices provide a range of interactive sensors, such as microphones, cameras, and touch screens, which opens a door to conveying user intent interactively. However, we still face the same text-based search engines and web pages as on a PC, without making full use of the capabilities that mobile devices provide.
In this thesis, we exploit these advantages of mobile devices and develop a multimodal mobile visual search system that combines voice/text and visual information. The thesis focuses on visual image search, especially multimodal query formulation, feature structure, and video indexing. The main contributions can be summarized as follows:

(1) This thesis proposes a multimodal and interactive query formulation. Based on the multimodal query, we design a visual search system comprising a user interface and a visual search algorithm. The user first issues a voice query describing the desired image. After the voice is analyzed and keywords are extracted, the system offers the user a series of exemplar images based on those keywords; the user chooses some of them and composes a composite visual query by placing and resizing them on a blank canvas. The system then re-ranks the text-based search results according to the visual query and returns the images that visually match it. The proposed position-based visual matching scheme makes the system practical.

(2) This thesis proposes a method for searching large-scale image collections with a positioned multi-exemplar visual query, using a region-based indexing scheme and relative position checking. At large scale, searching with a positioned collage must account for both the existence and the position of each component. Instead of the absolute position matching widely used in existing work, our method performs relative position checking based on position estimation during similarity calculation. A two-step strategy is used: first verify the existence of each component, then check the relative position of each pair of components. Experiments show a clear advantage of the proposed method over existing methods.
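The two-step strategy can be sketched as follows. This is an illustrative simplification, not the thesis implementation: the function name, the (x, y) center representation of components, and the scoring by pairwise ordering agreement are all assumptions made for the sketch.

```python
def relative_position_score(query, candidate):
    """Score a candidate image against a positioned multi-exemplar query.

    `query` and `candidate` map component names to (x, y) centers:
    query positions come from the user's canvas, candidate positions
    from position estimation during similarity calculation.
    """
    # Step 1: every query component must exist in the candidate.
    if not all(name in candidate for name in query):
        return 0.0

    # Step 2: check the relative position of each pair of components.
    names = sorted(query)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        return 1.0  # single-component query: existence alone suffices

    consistent = 0
    for a, b in pairs:
        qdx = query[b][0] - query[a][0]
        qdy = query[b][1] - query[a][1]
        cdx = candidate[b][0] - candidate[a][0]
        cdy = candidate[b][1] - candidate[a][1]
        # A pair agrees when its horizontal and vertical orderings match.
        if (qdx >= 0) == (cdx >= 0) and (qdy >= 0) == (cdy >= 0):
            consistent += 1
    return consistent / len(pairs)

query = {"sun": (0.8, 0.2), "boat": (0.4, 0.7)}
good = {"sun": (0.75, 0.15), "boat": (0.3, 0.8)}
flipped = {"sun": (0.2, 0.8), "boat": (0.6, 0.3)}
print(relative_position_score(query, good))     # 1.0
print(relative_position_score(query, flipped))  # 0.0
```

Note how the score tolerates the shifted positions in `good` but penalizes `flipped`, which is the point of relative rather than absolute position matching: the user's canvas layout constrains only the arrangement of components, not their exact coordinates.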
Moreover, using the multi-exemplar visual search, we can also break a single image into many exemplars to perform similar-image search.

(3) This thesis also proposes a method to compactly index video data. By extracting from the original video several virtual frames that best cover its content, we turn video search into an image search problem. First, we extract local features from the video. Then we break the video into separate shots by feature clustering. From each cluster we generate a virtual frame that best represents the shot, keeping the stable features and discarding unstable ones. Experiments show the advantage of the virtual-frame method over the keyframe method in compactness and representativeness.

To summarize, in the context of mobile visual search, this thesis discusses input modality, image features, structure matching, and video indexing. It proposes a new perspective on mobile visual search together with new methodologies, and experiments demonstrate their viability and effectiveness.
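The virtual-frame idea in contribution (3) can be sketched as follows. This is a hypothetical simplification: frames are reduced to sets of quantized local-feature ids, shot boundaries are detected by a feature-overlap threshold rather than the thesis's clustering, and the stability threshold is an assumed parameter.

```python
from collections import Counter

def split_into_shots(frames, min_overlap=0.3):
    """Start a new shot when consecutive frames share too few features."""
    shots, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        overlap = len(prev & cur) / max(len(prev | cur), 1)
        if overlap < min_overlap:
            shots.append(current)
            current = []
        current.append(cur)
    shots.append(current)
    return shots

def virtual_frame(shot, stability=0.5):
    """Keep features persisting in at least `stability` of the shot's frames."""
    counts = Counter(f for frame in shot for f in frame)
    return {f for f, c in counts.items() if c / len(shot) >= stability}

# Toy video: three frames of one scene, then two of another.
frames = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3}, {7, 8, 9}, {7, 8}]
shots = split_into_shots(frames)
print([virtual_frame(s) for s in shots])  # [{1, 2, 3}, {7, 8, 9}]
```

The key property of a virtual frame appears in the first shot: the stable features 1, 2, and 3 survive, while the transient feature 4 is discarded, so the whole shot is indexed by a single compact feature set instead of several keyframes.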
Keywords/Search Tags: image retrieval, multi-modal input, multi-exemplar-based search, relative position checking, similar image search, virtual frame