
Video Fusion Analyzing And Semantic Understanding

Posted on: 2007-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: H. Yang
GTID: 2178360182466705
Subject: Computer application technology

Abstract/Summary:
The explosive growth of video data in large-scale information repositories such as digital libraries and the World Wide Web poses new challenges to conventional video analysis and information retrieval technologies. A typical repository holds a huge volume of heterogeneous data and serves a large population of non-expert users. A video retrieval tool that supports semantic-level queries is therefore highly desirable in this setting, yet such queries remain beyond the capability of state-of-the-art video retrieval technologies. The work presented in this thesis extends conventional video analysis and retrieval research by proposing two key techniques for digital libraries: multimodal fusion analysis of video, and automatic video annotation and retrieval. These techniques provide critical building blocks for video analysis and retrieval facilities in digital libraries and similar information repositories.

The thesis opens with an overview of the background, motivation, and basic approach of our research.

Chapter 2 reviews fundamental research on video structure and shot boundary detection. It also introduces and discusses the most advanced work on multimodal fusion analysis, automatic annotation, and semantic-level retrieval for video, including basic approaches, related techniques, and typical systems.

Chapter 3 proposes a novel algorithm for multimodal video fusion analysis based on the Maximum Entropy model. Video carries rich semantic information that can be represented by multimodal features, including text transcripts and visual/audio features. The Maximum Entropy model is trained on these multimodal features and applied to video semantic understanding and story segmentation.

Chapter 4 describes a comprehensive mechanism for automatic video annotation.
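The Chapter 3 idea — a Maximum Entropy model over multimodal features deciding, for example, whether a shot begins a new story — can be sketched as a minimal binary maximum-entropy (logistic) classifier. Everything here is illustrative: the feature dimensions, the synthetic data, and the hyperparameters are invented, not taken from the thesis.

```python
import numpy as np

# Toy "multimodal" feature vectors for video shots: each row stands in for a
# concatenation of hypothetical text, visual, and audio features.
rng = np.random.default_rng(0)
X_boundary = rng.normal(loc=1.0, scale=0.5, size=(40, 6))   # story-boundary shots
X_interior = rng.normal(loc=-1.0, scale=0.5, size=(40, 6))  # within-story shots
X = np.vstack([X_boundary, X_interior])
y = np.array([1] * 40 + [0] * 40)

def train_maxent(X, y, lr=0.5, steps=300):
    """Binary maximum-entropy (logistic) model fit by gradient ascent
    on the log-likelihood."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(boundary | features)
        w += lr * (X.T @ (y - p)) / len(y)      # gradient of log-likelihood
        b += lr * np.mean(y - p)
    return w, b

w, b = train_maxent(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = float(np.mean(pred == y))
```

In a real system each shot's features would come from transcript terms, color/motion descriptors, and audio cues, and the model would typically be multiclass over several semantic labels rather than the binary case shown here.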
Manual annotation has traditionally been used to build linguistic indexes that support semantic-based video management and retrieval. However, this approach is prone to subjectivity and requires a huge amount of human effort, especially for large video collections. Our automatic annotation method exploits the temporal information in transcripts that accompany broadcast news, enabling deeper semantic understanding and improving search results.

Chapter 5 proposes a video analysis and retrieval system built on the preceding work, comprising an off-line video fusion analysis subsystem and an on-line semantic video retrieval subsystem. The complete application has been tested in a digital library, and the results show that the proposed solution effectively improves the efficiency of video retrieval.

Chapter 6 concludes the thesis with a brief discussion of application prospects and future research directions.
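The Chapter 4 annotation mechanism — attaching transcript terms to shots via their timestamps — can be sketched as a simple interval assignment. The data structures and values below are hypothetical, chosen only to illustrate the temporal-alignment idea.

```python
def annotate_shots(shots, transcript):
    """Assign time-stamped transcript words to video shots.

    shots: list of (start, end) times in seconds for each shot.
    transcript: list of (timestamp, word) pairs from broadcast-news text.
    Returns one keyword list per shot, built from the words whose
    timestamp falls inside that shot's interval.
    """
    annotations = [[] for _ in shots]
    for t, word in transcript:
        for i, (start, end) in enumerate(shots):
            if start <= t < end:
                annotations[i].append(word)
                break
    return annotations

shots = [(0.0, 5.0), (5.0, 12.0)]
transcript = [(1.2, "election"), (4.8, "senate"), (7.5, "weather")]
print(annotate_shots(shots, transcript))
# [['election', 'senate'], ['weather']]
```

A production pipeline would add keyword filtering (stop-word removal, term weighting) and tolerate imperfect transcript timing, but the core alignment step is this interval lookup.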
Keywords/Search Tags: multimedia, video, multimodal, fusion analysis, semantic annotation, statistical learning, information retrieval, digital library