Adaptive and multimodal approach to multimedia content analysis

Posted on: 2002-01-08
Degree: Ph.D.
Type: Thesis
University: Polytechnic University
Candidate: Liu, Zhu
Full Text: PDF
GTID: 2468390011499161
Subject: Engineering
Abstract/Summary:
The volume of multimedia data being generated is growing rapidly. To access and retrieve desired information efficiently, tools that enable automated content-based analysis are becoming indispensable. Multimedia content is defined at both the perceptual and the conceptual level. The former refers to content characterized purely by intrinsic perceptual properties such as color, motion, or acoustic features. The latter refers to content specified in terms of concepts or semantics, such as sunsets, anchors, or headline news stories. At both levels, the content is embedded in multiple forms that are usually complementary to each other. The main objective of this thesis is to adaptively analyze multimedia content by integrating cues from multiple modalities, including audio, video, and text, mainly within the scope of news broadcasts.

At the perceptual level, news broadcast data is segmented and classified into different video events such as news reporting and commercials. Audio and visual features are developed and integrated with the aim of discriminating different events effectively. Various classification mechanisms are benchmarked, including linear fuzzy threshold, maximum likelihood using Gaussian Mixture Model and Hidden Markov Model, Neural Network, and Support Vector Machine.

At the conceptual level, algorithms and demonstration systems for three applications are developed. The News Broadcast Browsing System addresses the recovery and presentation of the embedded hierarchical structure of a news broadcast. Important semantic objects such as hosting characters and headline news stories are adaptively extracted using audio/visual models that are bootstrapped from on-line data. The Query-by-example in Audio System addresses the problem of efficient search and retrieval of segmented multimedia objects based on audio. A distance metric framework is proposed to determine the difference between mixture-type probability density functions, and is applied to measure the dissimilarity of audio segments based on their model parameters. In the Major Cast Detection System, we developed an algorithm to detect the major cast members in video, for example, anchor persons in news broadcasts and major characters in movies. The algorithm integrates both speaker and face information and constructs a ranked list of major cast members based on their temporal and spatial presence.
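
The abstract does not state which distance is used between mixture-type probability density functions. As a rough illustration of comparing two audio segments through their model parameters alone, the sketch below computes the closed-form L2 distance between two one-dimensional Gaussian mixtures. This is a stand-in technique chosen for illustration, not the metric framework proposed in the thesis. The function names, the `(weight, mean, variance)` tuple layout, and the example parameters are all hypothetical.

```python
import math

# Illustrative only: L2 distance between 1-D Gaussian mixtures, used here as
# a stand-in for the thesis's (unspecified) mixture-PDF distance framework.
# A mixture is a list of (weight, mean, variance) tuples; this layout is an
# assumption made for the example.


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_cross_term(p, q):
    """Integral of p(x) * q(x) over the real line for two 1-D mixtures.

    Uses the identity: the integral of N(x; m1, v1) * N(x; m2, v2) dx
    equals N(m1; m2, v1 + v2).
    """
    return sum(
        wp * wq * gaussian_pdf(mp, mq, vp + vq)
        for wp, mp, vp in p
        for wq, mq, vq in q
    )


def gmm_l2_distance(p, q):
    """Square root of the integral of (p(x) - q(x))^2, from parameters only."""
    squared = gmm_cross_term(p, p) - 2.0 * gmm_cross_term(p, q) + gmm_cross_term(q, q)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical mixture models fitted to one feature of three audio segments.
segment_a = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
segment_b = [(0.5, 0.2, 1.1), (0.5, 2.8, 0.6)]
segment_c = [(1.0, 8.0, 2.0)]

print(gmm_l2_distance(segment_a, segment_b))  # similar models
print(gmm_l2_distance(segment_a, segment_c))  # dissimilar models
```

Because the distance depends only on the mixture weights, means, and variances, the raw audio of a stored segment does not need to be revisited at query time. Real audio features are multidimensional, so a practical version would replace the scalar variance with a covariance matrix.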
Keywords/Search Tags:Multimedia, News broadcast, Major