
A content-adaptive analysis and representation framework for summarization using audio cues

Posted on: 2006-04-06 | Degree: Ph.D. | Type: Dissertation
University: Polytechnic University | Candidate: Radhakrishnan, Regunathan
GTID: 1458390008957714 | Subject: Engineering
Abstract/Summary:
We propose a content-adaptive analysis and representation framework that postpones content-specific processing to as late a stage as possible. For this task we propose an inlier/outlier-based representation derived from audio analysis. It rests on the key observation that audio features in the vicinity of "interesting" events are outliers against a background of "uninteresting" events.

The analysis framework that supports this inlier/outlier-based representation detects outlier subsequences in a time series of audio features or semantic audio labels. Using a sliding window, we sample the whole time series and estimate statistical models of the usual "uninteresting" background. We then construct an affinity (kernel) matrix from the pairwise distances between the estimated statistical models. Using a graph-theoretic approach to grouping, we detect outlier subsequences: subsequences that cause the statistical models estimated over their time span to differ from the other estimates of the dominant background. We also rank the detected outliers by how far each one deviates from the background. Once all subsequences that are outliers with respect to the background have been detected, we bring in domain knowledge or content-specific processing to pick out the subset of outliers that are correlated with "interesting" events for that domain or content genre. The framework also helps in choosing key audio classes in a data-driven way rather than by intuition.

We apply the proposed framework to consumer video browsing. For sports content, we show that commercials and highlight events are among the outliers in sports audio and can be extracted effectively with this analysis and representation framework. We also show that the key highlight audio class obtained systematically through the outlier detection procedure outperforms the cheering audio class (chosen by intuition) for sports highlights extraction.
For situation comedy video, the outlier detection framework successfully detects scene transitions and laughter tracks. The proposed framework also effectively detects suspicious events in elevator surveillance audio as outliers. Finally, we show that key audio classes correlated with events of interest can be acquired systematically with the proposed framework.
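The outlier-detection steps described in the abstract can be sketched in Python. These steps are sliding-window statistical models, a pairwise affinity matrix, and ranking of windows that differ from the dominant background. This is a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes a one-dimensional feature sequence and models each window with a single Gaussian. It measures the distance between models with the symmetric Kullback-Leibler divergence and uses an exponential kernel with a hypothetical width `sigma`. In place of the dissertation's graph-theoretic grouping, it uses a simpler graph quantity to flag and rank outliers: each window's degree in the affinity graph. All function names and parameters (`window_models`, `rank_outliers`, `merge_spans`, `win`, `hop`, `sigma`, `frac`) are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Hypothetical sketch: one Gaussian per window, symmetric KL distance,
# exponential kernel, degree-based outlier scores (a stand-in for the
# dissertation's graph-theoretic grouping step).
Model = Tuple[float, float]  # (mean, variance) of one window


def window_models(x: Sequence[float], win: int, hop: int) -> Tuple[List[int], List[Model]]:
    """Slide a window of length win (step hop) over x; fit a Gaussian to each position."""
    starts: List[int] = []
    models: List[Model] = []
    for s in range(0, len(x) - win + 1, hop):
        seg = x[s:s + win]
        mu = sum(seg) / win
        var = sum((v - mu) ** 2 for v in seg) / win + 1e-6  # small floor keeps variance > 0
        starts.append(s)
        models.append((mu, var))
    return starts, models


def sym_kl(a: Model, b: Model) -> float:
    """Symmetric KL divergence KL(a||b) + KL(b||a) between univariate Gaussians."""
    (m1, v1), (m2, v2) = a, b
    # The log-variance terms of the two KL directions cancel in the sum.
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0 + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2))


def affinity_matrix(models: List[Model], sigma: float) -> List[List[float]]:
    """Kernel matrix A[i][j] = exp(-d(i, j) / sigma) over all pairs of window models."""
    return [[math.exp(-sym_kl(a, b) / sigma) for b in models] for a in models]


def rank_outliers(x: Sequence[float], win: int, hop: int,
                  sigma: float = 1.0, frac: float = 0.5) -> List[Tuple[int, float]]:
    """Return (window start, score) for windows whose degree in the affinity
    graph is below frac * median degree, most deviant first."""
    starts, models = window_models(x, win, hop)
    A = affinity_matrix(models, sigma)
    # Node degree, excluding the self-loop A[i][i] == 1.
    degree = [sum(row) - row[i] for i, row in enumerate(A)]
    typical = statistics.median(degree)  # background windows dominate the median
    flagged = [(starts[i], 1.0 - degree[i] / typical)  # score: higher = more deviant
               for i in range(len(starts)) if degree[i] < frac * typical]
    return sorted(flagged, key=lambda p: p[1], reverse=True)


def merge_spans(starts: Sequence[int], win: int) -> List[Tuple[int, int]]:
    """Merge overlapping or touching flagged windows into [start, end) sample spans."""
    spans: List[Tuple[int, int]] = []
    for s in sorted(starts):
        if spans and s <= spans[-1][1]:
            spans[-1] = (spans[-1][0], max(spans[-1][1], s + win))
        else:
            spans.append((s, s + win))
    return spans
```

Consider a 300-sample background alternating between +1 and -1, with a +5 offset added to samples 150 to 179. Here `rank_outliers(x, win=20, hop=10)` flags the windows starting at 140, 150, 160 and 170. It ranks the two windows that lie entirely inside the event first, and `merge_spans` joins the four windows into the single span (140, 190). Under the framework, deciding which flagged spans correspond to "interesting" events would be the later, content-specific step. The degree score is used here only because it is short; a spectral partition of the same affinity matrix would be closer to a graph-theoretic grouping.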
Keywords/Search Tags: Framework, Audio, Using, Events, Key