
Extraction Of Semantics In Films

Posted on: 2009-12-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z C Zhao    Full Text: PDF
GTID: 1118360278465438    Subject: Communication and Information System
Abstract/Summary:
With the development of computer, Internet, and telecommunication technology, and of video and audio compression, digital video has streamed into everyday life. However, the diversity of video content and the high-dimensional spatiotemporal structure of video data make the efficient organization, management, storage, rapid retrieval, and browsing of video data both crucial and difficult, and traditional data-management and retrieval methods no longer suffice. Content-based video retrieval (CBVR) has emerged in response.

CBVR has recently made great progress in many respects, so the extraction of semantic information has become a research focus and several prototype systems for semantic video retrieval have appeared. However, because important problems such as semantic-object extraction and video-content understanding remain unresolved, large-scale applications have not yet materialized. In this dissertation we therefore propose new frameworks and methods built on perceptual cues, film theory, and cross-domain analysis. The main contributions are as follows.

The representation of visual content is the basis of CBVR. Static features such as color and texture describe only the internal characteristics of a single image and cannot capture the temporal cues of an image sequence, so an algorithm for global motion (GM) estimation in the compressed domain is proposed to dynamically describe the context of visual content. The GM parameters are first extracted with a six-parameter motion model, as sketched below; a motion-segmentation method then segments and annotates videos according to the GMs, and the motion information is described by a feature-point sequence; finally, to validate the effectiveness of the extracted motion features, a GM-based video-retrieval framework is proposed. Experimental results show that the algorithm segments videos into motion sub-segments accurately and achieves high retrieval precision. Query-by-keywords is also realized on an XQuery engine.

Shot boundary detection (SBD) is the basic task of structure analysis in CBVR. This dissertation develops a fast, high-performance SBD system built around three key factors: the representation of visual content, the construction of a continuity feature signal, and pattern classification and recognition. A solution is proposed for each factor: for the first, we analyze the trade-off between the invariance and the sensitivity of various visual features; for the second, the context of the feature signal is taken into account; for the last, support vector machines (SVMs) are used to detect both cuts and gradual transitions. In addition, independent detectors such as an edge detector and a motion detector are developed to improve overall performance. In the TRECVID 2007 SBD evaluation, our system achieved a satisfying result among the 15 participating groups worldwide.
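To make the six-parameter global-motion model described above more concrete, the following minimal sketch fits the affine parameters to block motion vectors (e.g., macroblock vectors taken from the compressed domain) by least squares with simple outlier rejection. The function name, the residual threshold, and the plain least-squares fit with re-fitting are illustrative assumptions, not the dissertation's exact estimator.

```python
import numpy as np

def estimate_global_motion(points, vectors, iterations=3, thresh=2.0):
    """Fit u = a1 + a2*x + a3*y, v = a4 + a5*x + a6*y to block motion vectors.

    points  -- (N, 2) array of block-centre coordinates (x, y)
    vectors -- (N, 2) array of motion vectors (u, v) at those blocks
    Returns the six global-motion parameters (a1, ..., a6).
    """
    mask = np.ones(len(points), dtype=bool)
    params = np.zeros(6)
    for _ in range(iterations):
        x, y = points[mask, 0], points[mask, 1]
        u, v = vectors[mask, 0], vectors[mask, 1]
        A = np.column_stack([np.ones_like(x), x, y])
        pu = np.linalg.lstsq(A, u, rcond=None)[0]   # horizontal parameters (a1, a2, a3)
        pv = np.linalg.lstsq(A, v, rcond=None)[0]   # vertical parameters (a4, a5, a6)
        params = np.concatenate([pu, pv])
        # Blocks with large residuals likely belong to foreground objects;
        # drop them and refit so the estimate reflects camera (global) motion.
        residual = np.hypot(u - A @ pu, v - A @ pv)
        keep = residual < thresh
        if keep.all():
            break
        mask[np.flatnonzero(mask)[~keep]] = False
        if not mask.any():      # degenerate case: everything rejected
            break
    return params
```

Comparing successive parameter estimates over time could then serve as a basis for cutting a sequence into motion sub-segments, in the spirit of the motion-segmentation step above.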
The extraction of semantic objects in videos is another difficult problem in CBVR. An algorithm for selectively extracting visually salient objects from color images and videos is presented. Color quantization based on vector quantization (VQ) is performed first; the quantized image is then segmented according to its color and spatial distribution; next, focuses of attention (FOAs) on objects are selected with a visual attention model; finally, guided by the shifts of the FOAs and Gestalt principles, salient objects are extracted by merging color, texture, boundary, and homogeneity features. Subjectively evaluated experiments on the Corel image database and the TRECVID videos demonstrate the effectiveness of the proposed approach.

To generate video summaries for movies, we propose a computable film-structure model, the nine-plot (NP) model, which parses an entire film into three hierarchical semantic levels: act, NP, and scene. The model is motivated by a systematic analysis of the "Hollywood mode" and of the generic narrative structure of a story. A set of modeling methods for film-making rules and grammars, based on scene segmentation and classification, is also proposed. As an important application of the model, a hierarchical video-summarization framework, covering static key-frame extraction and dynamic video skimming, is established by combining a perceptual attention model with an emotion model. Concretely, an attention model is first built by quantifying and integrating the visual-aural dramatic elements of the film; affective arousal is then extracted to estimate the importance of each scene, so that the proportion of material extracted into the summary can be allotted adaptively. Promising experimental results on seven full-length Hollywood movies demonstrate the effectiveness and generality of the proposed framework.

Semantic analysis and extraction from video remains a challenging problem. From a new viewpoint, we present a content-understanding framework for movies based on social network analysis and a film ontology. First, we summarize the difficulties of semantic analysis and identify a possible way forward: using social network analysis and a constructed film ontology to narrow the semantic gap in automatic content understanding of movies. Second, a set of modeling algorithms is developed: a full-length movie is parsed into a series of causal action events and dialogue events; a hierarchical method for detecting high-level action events is proposed based on the temporal cues and context of these basic events; finally, a semantic graph of the dialogue is built to summarize dialogue events, as sketched below. At the same time, important semantic information such as the social community structure and the career classification of characters is extracted. Experiments on two Hollywood action movies demonstrate the feasibility of the proposed framework: the basic semantic elements "who", "when", "where", and "what", which are central to understanding visual information, can be obtained.
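To illustrate the social-network step just described, the toy sketch below builds a weighted character co-occurrence graph from detected dialogue events and then derives communities and a most central character. The character labels, the networkx-based modularity community detection, and the use of degree centrality are illustrative assumptions, not the dissertation's exact algorithms.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_character_network(dialogue_events):
    """Build a weighted co-occurrence graph of characters.

    dialogue_events -- iterable of character lists, one list per detected
    dialogue event (assumed output of the upstream event-detection stage).
    """
    G = nx.Graph()
    for characters in dialogue_events:
        for i, a in enumerate(characters):
            for b in characters[i + 1:]:
                # Edge weight = number of dialogue events the pair shares
                w = G.get_edge_data(a, b, default={"weight": 0})["weight"] + 1
                G.add_edge(a, b, weight=w)
    return G

# Hypothetical dialogue events from a parsed film (names are placeholders)
events = [["hero", "mentor"], ["hero", "ally"], ["hero", "mentor", "ally"],
          ["villain", "henchman"], ["villain", "henchman"], ["hero", "villain"]]
G = build_character_network(events)

communities = greedy_modularity_communities(G, weight="weight")  # social groups
centrality = nx.degree_centrality(G)                             # character prominence
print([sorted(c) for c in communities])
print(max(centrality, key=centrality.get))
```

In this setting the detected communities roughly correspond to the social groups in the movie, and the most central node is a natural candidate answer to the "who" question.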
Keywords: semantic extraction, structural analysis, video summarization, semantic object, video retrieval, attention model