
Contextual modeling of audio signals toward information retrieval

Posted on: 2011-02-24
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Kim, Samuel
Full Text: PDF
GTID: 1448390002959570
Subject: Engineering
Abstract/Summary:
This dissertation focuses on audio modeling and indexing for audio information retrieval. It proposes several novel methods for capturing audio context across a wide spectrum of audio content, from well-structured music to unstructured environmental sound. The dissertation consists of two major parts, divided by type of audio content: music information retrieval and general audio information retrieval.
In the first part, an efficient context-based music information retrieval method using a music fingerprint is introduced. The music fingerprint is proposed to encapsulate the musical context of a given music recording in a compact representation obtained directly from the audio signal; it provides an efficient handle for music information retrieval in terms of both accuracy and computing requirements. The musically meaningful aspects considered in deriving this representation include harmonic structures and their temporal dynamics (i.e., chord progressions). Empirical results on various music information retrieval tasks, such as opus identification, composer identification, and semantic description annotation, show that the proposed music fingerprint is competitive with state-of-the-art systems in terms of accuracy and computing power requirements.
In the second part, a new contextual modeling algorithm for general audio information retrieval is introduced. Assuming that hidden acoustic topics exist and that they represent the context of an audio clip, we propose a latent acoustic topic model that learns, in an unsupervised manner, a probability distribution over a set of hidden topics for a given audio clip. We use the latent Dirichlet allocation (LDA) method to implement the latent acoustic topic model and introduce the notion of acoustic words to support modeling within this framework.
The proposed audio information retrieval system also aims to provide users with flexibility in formulating their retrieval queries using naive text as well as pre-determined categories or audio examples. To mitigate interoperability issues between the annotation and retrieval processes inherent in text descriptions, we propose an intermediate audio description layer (iADL) spanned by onomatopoeic and semantic labels in conjunction with context-based text transformation methods that map naive descriptions onto the proposed iADL.
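The latent acoustic topic model described in the abstract can be illustrated with a small sketch. The abstract does not specify the audio features, the codebook, or the inference procedure, so everything below is an assumption made for illustration: frames are quantized to their nearest codeword to obtain "acoustic words", and LDA is fitted with a generic collapsed Gibbs sampler. The function names `quantize` and `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy data are all hypothetical and are not taken from the dissertation.

```python
import random


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword.

    The resulting indices play the role of 'acoustic words' (assumed scheme).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling (a generic sampler, assumed here).

    docs: list of audio clips, each a list of acoustic-word indices.
    Returns one topic distribution (list of n_topics floats) per clip.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]            # clip-topic counts
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every acoustic word.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-clip topic distributions.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Toy usage: two hypothetical clips with 2-D frames and a 4-word codebook.
codebook = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
clip_a = [(0.1, 0.0), (0.0, 0.9), (0.1, 0.1), (0.0, 1.1)]
clip_b = [(0.9, 0.1), (1.0, 1.0), (1.1, 0.0), (0.9, 0.9)]
docs = [quantize(clip_a, codebook), quantize(clip_b, codebook)]
theta = lda_gibbs(docs, n_topics=2, vocab_size=len(codebook))
```

Each row of `theta` is a clip's estimated distribution over hidden acoustic topics; in a retrieval setting such vectors could serve as the compact contextual representation that clips are compared on, though how the dissertation actually uses them is not stated in the abstract.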
Keywords/Search Tags:Information retrieval, Audio, Modeling, Text, Proposed