
Content Based Multimodal Video Retrieval

Posted on: 2009-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Lu
GTID: 2178360242476848
Subject: Communication and Information System

Abstract/Summary:
Nowadays, personal digital video devices are widespread, storage capacity keeps growing, and network conditions have improved. As a result, more and more people enjoy capturing interesting videos of the world around them, sharing them with others on the Internet, and searching for videos that interest them. In the film and multimedia production industries, processing the enormous volume of video data involved is normally time-consuming and laborious. In military affairs, reconnaissance is essential, and extracting useful content from the mass of video recorded by scouts and unmanned aerial vehicles (UAVs) is crucial.

Video retrieval applications have emerged, but the prevailing video search engines, such as YouTube and Yahoo! Video, are essentially text-based. Their text usually comes from manual annotations, such as a video's title and tags, or from the text surrounding the videos or images on a page, rather than from the video content itself. This approach is quite similar to traditional document search. Retrieving videos using only their audio and visual information, without such external information, is both a great challenge and an attractive goal. It is an interdisciplinary research topic spanning image processing, speech recognition, information retrieval, machine learning, pattern recognition, and other fields.

TRECVID promotes progress in content-based video retrieval through open, metrics-based evaluation. Funded by the Disruptive Technology Office (DTO) and the National Institute of Standards and Technology (NIST), it has run a TREC-style video retrieval evaluation since 2003. Over time, TRECVID and other research groups have invested considerable effort in understanding how systems can retrieve video effectively and how their performance can be benchmarked reliably.
This thesis is part of the TRECVID 2007 submission carried out jointly by Microsoft Research Asia (MSRA) and Shanghai Jiao Tong University (SJTU). Its contributions fall into three areas.

First, in text-based retrieval, the thesis proposes different text combination strategies for the different languages of the video text. Combining too few text segments lowers recall, while combining too many lowers precision. Choosing the combination size per language addresses both problems.

The thesis also improves the BM25 formula, which works well in traditional text retrieval. In traditional text retrieval, a query word that matches a document actually appears in that document. In video retrieval, however, a query word matched in the speech transcript may or may not be visible in the video. How likely it is to be visible depends on its part of speech. Keyword weights based on part of speech are therefore introduced into BM25 to improve text-based retrieval.

Second, for fusing text-based and concept-based retrieval, the thesis proposes a query-dependent classified linear fusion. This method assigns different weights to the modalities depending on the class of the query topic. It is no less accurate than nonlinear fusion, is easy to implement, has a low computational cost, and can be used in real-time systems. Because positive results tend to be visually similar, the fused results are then re-ranked by K-means clustering, which further improves the final results.

Finally, in concept-based retrieval, both the method used in this thesis and the prevailing concept detection methods operate mostly at the image level. In other words, the low-level features they use are global. When an image is complex and contains many concepts, the features of different concepts interfere with each other, which degrades concept detection.
To address this, the thesis proposes segmentation-based concept detection as a hypothesis. It then analyzes and discusses this approach using existing segmentation algorithms and a large number of image segmentation results. The analysis indicates that detailed (fine-grained) segmentation does not help concept detection, whereas approximate segmentation does. This finding guides the author's future work.
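The part-of-speech keyword weighting for BM25 described above can be sketched as follows. The abstract does not give the exact formula, so the sketch makes three assumptions. The weight multiplies each query term's standard BM25 contribution. IDF uses the common non-negative variant ln((N - n + 0.5)/(n + 0.5) + 1). The `POS_WEIGHTS` values are hypothetical placeholders, not values from the thesis.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (placeholders, not the thesis's values):
# nouns are assumed more likely than verbs or adjectives to be visible on screen.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4}
DEFAULT_WEIGHT = 0.5


def weighted_bm25(query, docs, k1=1.2, b=0.75):
    """Score tokenized documents against a query given as [(term, pos), ...].

    Each matched term contributes pos_weight * idf * saturated_tf, i.e. ordinary
    BM25 with an extra per-term multiplier chosen by part of speech.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term, pos in query:
            f = tf[term]
            if f == 0:
                continue  # term absent from this document: no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # BM25 term-frequency saturation with document-length normalization.
            norm = f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
            score += POS_WEIGHTS.get(pos, DEFAULT_WEIGHT) * idf * norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Toy speech-transcript "documents", one per video shot.
    docs = [["car", "x"], ["drive", "y"], ["boat", "z"]]
    print(weighted_bm25([("car", "NOUN"), ("drive", "VERB")], docs))
```

In this toy run, the shot whose transcript matches only the noun "car" outranks the shot that matches only the verb "drive", even though both terms have identical statistics. The part-of-speech weight alone separates them.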
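The query-dependent classified linear fusion of text-based and concept-based scores mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each query topic has already been assigned to a class. It also assumes min-max normalization of each modality's scores and a fixed weight per modality for each class. The class names and weights in `CLASS_WEIGHTS` are hypothetical and stand in for weights the thesis would learn per topic class.

```python
# Hypothetical per-class modality weights; placeholders, not the thesis's values.
CLASS_WEIGHTS = {
    "person": {"text": 0.7, "concept": 0.3},
    "object": {"text": 0.4, "concept": 0.6},
    "scene": {"text": 0.3, "concept": 0.7},
}


def min_max(scores):
    """Rescale a {shot: score} dict to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def classified_linear_fusion(query_class, modality_scores):
    """Fuse {modality: {shot: score}} with weights chosen by the query's class.

    Returns (shot, fused_score) pairs sorted from best to worst.
    """
    weights = CLASS_WEIGHTS[query_class]
    fused = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for shot, s in min_max(scores).items():
            fused[shot] = fused.get(shot, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    text = {"s1": 2.0, "s2": 1.0, "s3": 0.0}
    concept = {"s1": 0.1, "s2": 0.9, "s3": 0.5}
    for cls in ("person", "scene"):
        print(cls, classified_linear_fusion(cls, {"text": text, "concept": concept}))
```

With the same two score lists, a "person" query ranks shot s1 first because the text score dominates. A "scene" query ranks s2 first because the concept score dominates. Fusion stays a weighted sum, so each query requires only one weight lookup and one pass over the scores. This is consistent with the abstract's point about low computational cost.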
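The K-means re-ranking of fused results rests on the observation that positive results look alike. The sketch below is one possible version of that idea, not the thesis's exact procedure. It assumes the top fused shots come with visual feature vectors and that K-means uses deterministic farthest-point initialization. It also assumes a hypothetical re-scoring rule: each new score equals `alpha` times the shot's own fused score plus `(1 - alpha)` times the mean fused score of its cluster. Shots that are visually close to many high-scoring shots therefore move up.

```python
import math


def kmeans(points, k, iters=20):
    """Plain K-means with deterministic farthest-point initialization (k <= len(points))."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def kmeans_rerank(shots, k=2, alpha=0.5):
    """Re-rank (shot_id, fused_score, feature_vector) triples using cluster mean scores."""
    labels = kmeans([f for _, _, f in shots], k)
    cluster_mean = {}
    for c in set(labels):
        member_scores = [s for (_, s, _), lab in zip(shots, labels) if lab == c]
        cluster_mean[c] = sum(member_scores) / len(member_scores)
    rescored = [(sid, alpha * s + (1 - alpha) * cluster_mean[lab])
                for (sid, s, _), lab in zip(shots, labels)]
    return sorted(rescored, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    shots = [("a", 0.9, [0.0, 0.0]), ("b", 0.5, [0.1, 0.0]),
             ("c", 0.6, [5.0, 5.0]), ("d", 0.1, [5.1, 5.0])]
    print(kmeans_rerank(shots))
```

Before re-ranking, the order is a, c, b, d. Shot b sits in the same visual cluster as the top-scoring shot a, so it rises above c, and the new order is a, b, c, d.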
Keywords/Search Tags: video retrieval, content-based retrieval, text retrieval, concept detection, image segmentation, fusion, re-ranking