
Multi-view Video Content Analysis And Summarization

Posted on: 2012-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Fu
Full Text: PDF
GTID: 2178330335463275
Subject: Computer software and theory

Abstract/Summary:
With the rapid development of computation, communication, and storage infrastructures, multi-view video systems, which simultaneously capture a group of videos and record events with considerably overlapping fields of view (FOVs) across multiple cameras, have become increasingly popular. In contrast to the rapid progress of video collection and storage techniques, however, consuming these multi-view videos remains a problem: watching a large number of videos in order to grasp the important information quickly is a major challenge. This thesis discusses the problems involved in multi-view video content analysis and summarization and proposes two novel algorithms.

First, effectively and efficiently analyzing multi-view video content, especially detecting foreground video objects, is a critical technique. We propose an interactive, sampling-based object segmentation method for multi-view videos and key-frame image sets, named content-sensitive collection snapping. Interactive segmentation methods have greatly simplified the task of cutting an object out of a single image; segmenting the large number of images in multi-view videos, however, is still tedious. With our method, a user provides only a small number of strokes to segment and refine a few sampled key-frame images. For each of the remaining frames, our method finds relevant sample images and applies the corresponding appearance models to guide the segmentation. To improve unsatisfactory segmentation results using the user strokes given on a few images, our method calculates a relevance map that measures the probability that a stroke can be appropriately applied at each pixel or region of an image, and applies the strokes accordingly.
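To make the appearance-model idea above concrete, here is a minimal sketch of stroke-driven, appearance-model-guided labeling. It is not the thesis's actual algorithm: it models foreground and background stroke pixels each with a single Gaussian over RGB (the function names and toy data are hypothetical, and covariance normalizers are omitted for brevity) and labels each pixel of a new frame by comparing the two likelihoods.

```python
import numpy as np

def fit_appearance_model(pixels):
    """Fit one Gaussian over RGB stroke pixels (rows: pixels, cols: channels)."""
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(3)  # regularize
    return mean, np.linalg.inv(cov)

def log_likelihood(pixels, model):
    """Unnormalized Gaussian log-likelihood of each pixel under the model."""
    mean, inv_cov = model
    d = pixels - mean
    return -0.5 * np.einsum('ij,jk,ik->i', d, inv_cov, d)

def segment(frame_pixels, fg_model, bg_model):
    """Label a pixel 1 (foreground) when the foreground model fits it better."""
    return (log_likelihood(frame_pixels, fg_model) >
            log_likelihood(frame_pixels, bg_model)).astype(int)

# Toy example: bright "foreground" strokes vs. dark "background" strokes
rng = np.random.default_rng(0)
fg_strokes = rng.normal(200, 10, (100, 3))
bg_strokes = rng.normal(50, 10, (100, 3))
fg_model = fit_appearance_model(fg_strokes)
bg_model = fit_appearance_model(bg_strokes)

# A new frame: five bright pixels followed by five dark pixels
frame = np.vstack([rng.normal(200, 10, (5, 3)), rng.normal(50, 10, (5, 3))])
labels = segment(frame, fg_model, bg_model)
print(labels)  # first five pixels foreground, last five background
```

In the full method, such per-sample models would be selected per frame via the relevance map rather than applied globally.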
Our experiments show that our method can effectively segment multi-view video and key-frame image sets with a wide variety of contents while significantly reducing user input.

Second, effectively representing, extracting, and summarizing the critical information embedded in multi-view video content is challenging. We present, for the first time, a multi-view video summarization method. Video summarization, an important video content service, produces a condensed and succinct representation of video content, which facilitates the browsing, retrieval, and storage of the original videos. Beyond good content analysis, users also need an excellent summary of multi-view videos in order to rapidly grasp their major contents. Unfortunately, previous video summarization studies focused on monocular videos, and their results are not good enough when applied directly to multi-view videos, owing to problems such as redundancy across views. We construct a spatio-temporal shot graph and formulate summarization as a graph labeling task. The shot graph is derived from a hypergraph whose hyperedges encode correlations with different attributes among multi-view video shots. We then partition the shot graph and identify clusters of event-centered shots with similar contents via random walks. The summarization result is generated by solving a multi-objective optimization problem based on shot importance, which is evaluated using a Gaussian entropy fusion scheme. Different summarization objectives, such as minimum summary length and maximum information coverage, can be accommodated within the framework. Moreover, multi-level summarization can be achieved easily by configuring the optimization parameters.

We also propose the multi-view storyboard and the event board for presenting multi-view summaries.
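The random-walk clustering step can be illustrated with a small sketch. This is a generic random-walk-with-restart scheme over a shot-affinity matrix, not the thesis's exact formulation: one walker is restarted at each seed shot, and every shot is assigned to the seed whose walker accumulates the most probability on it. The affinity values and seed choices below are hypothetical.

```python
import numpy as np

def random_walk_clusters(similarity, seeds, steps=50, restart=0.15):
    """Cluster graph nodes via random walks with restart from seed nodes.

    similarity: symmetric (n, n) affinity matrix among shots.
    seeds: one seed node index per desired cluster.
    Returns, for each node, the index of the seed that "owns" it.
    """
    P = similarity / similarity.sum(axis=1, keepdims=True)  # row-stochastic
    probs = []
    for s in seeds:
        p = np.zeros(len(similarity))
        p[s] = 1.0
        for _ in range(steps):
            p = (1.0 - restart) * (p @ P)  # take one walk step
            p[s] += restart                # restart at the seed
        probs.append(p)
    return np.argmax(np.stack(probs), axis=0)

# Two loosely connected groups of "shots" (e.g., two distinct events)
A = np.array([
    [0.0, 1.0, 1.0, 0.1, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
labels = random_walk_clusters(A, seeds=[0, 3])
print(labels)  # nodes 0-2 form one event cluster, nodes 3-5 the other
```

In the thesis's setting the affinities would come from the hypergraph-derived shot graph, whose hyperedges mix several correlation attributes within and across views.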
The storyboard naturally reflects correlations among multi-view summarized shots that describe the same important event. The event board serially assembles event-centered multi-view shots in temporal order. A single-video summary, which facilitates quick browsing of the summarized multi-view video, can easily be generated from the event-board representation.

To sum up, the major contributions of this thesis are as follows.

Collection snapping: our method can effectively segment multi-view video and key-frame image sets with a wide variety of image contents while significantly reducing user input.

Multi-view video summarization:
1. A spatio-temporal shot graph is used to represent multi-view videos. This representation makes the multi-view summarization problem tractable in the light of graph theory. The shot graph is derived from a hypergraph which embeds different correlations among video shots within each view as well as across multiple views.
2. Random walks are used to identify event-centered shot clusters, and the final summary is generated by multi-objective optimization, which can be flexibly configured to meet different summarization requirements. Additionally, multi-level summaries can be achieved easily by setting different parameters. In contrast, most previous methods can summarize videos only from a single, fixed perspective.
3. The multi-view storyboard and the event board are presented for representing multi-view video summaries. The storyboard naturally reflects correlations among multi-view summarized shots that describe the same important event; the event board serially assembles event-centered multi-view shots in temporal order.
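The trade-off between minimum summary length and maximum information coverage can be illustrated with a deliberately simple greedy sketch. This is only a stand-in for the thesis's multi-objective optimization over the shot graph: shots are picked by importance per second until a length budget is exhausted, and varying the budget yields the coarser or finer (multi-level) summaries the framework supports. The importance scores and durations below are hypothetical.

```python
def select_summary(importance, durations, budget):
    """Greedy knapsack heuristic for the length/coverage trade-off.

    importance: per-shot importance scores (e.g., from an importance
                fusion scheme); durations: shot lengths in seconds;
    budget: maximum total summary length in seconds.
    Returns the indices of the selected shots in temporal order.
    """
    # Rank shots by importance gained per second of summary spent
    order = sorted(range(len(importance)),
                   key=lambda i: importance[i] / durations[i],
                   reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + durations[i] <= budget:
            chosen.append(i)
            used += durations[i]
    return sorted(chosen)

importance = [0.9, 0.2, 0.7, 0.4]   # hypothetical shot importances
durations = [10, 8, 5, 12]          # shot lengths in seconds
print(select_summary(importance, durations, budget=16))  # -> [0, 2]
```

A tighter budget drops low-yield shots first, which is the intuition behind obtaining multi-level summaries by re-running the optimization with different parameters.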
Keywords/Search Tags: multi-view video, random walks, spatio-temporal shot graph, video summarization, collection segmentation