
Underdetermined source separation using speaker subspace models

Posted on: 2010-01-07
Degree: Ph.D.
Type: Thesis
University: Columbia University
Candidate: Weiss, Ron J
Full Text: PDF
GTID: 2448390002475473
Subject: Engineering
Abstract/Summary:
Sounds rarely occur in isolation. Despite this, significant effort has been dedicated to the design of computer audition systems, such as speech recognizers, that can only analyze isolated sound sources. In fact, there are a variety of applications in both human and computer audition for which it is desirable to understand more complex auditory scenes. In order to extend such systems to operate on mixtures of many sources, the ability to recover the source signals from the mixture is required. This process is known as source separation.

In this thesis we focus on the problem of underdetermined source separation, where the number of sources is greater than the number of channels in the observed mixture. In the worst case, when the observations are derived from a single microphone, it is often necessary for a separation algorithm to utilize prior information about the sources present in the mixture to constrain possible source reconstructions. A common approach for separating such signals is based on the use of source-specific statistical models. In most cases this approach requires that significant training data be available to train models for the sources known in advance to be present in the mixed signal. We propose a speaker subspace model for source adaptation that alleviates this requirement.

We report a series of experiments on monaural mixtures of speech signals and demonstrate that the use of the proposed speaker subspace model can separate sources far better than the use of unadapted, source-independent models. The proposed method also outperforms other state-of-the-art approaches when training data is not available for the exact speakers present in the mixed signal.

Finally, we describe a system for binaural speech separation that combines constraints based on interaural localization cues with constraints derived from source models. Although a simpler system based only on localization cues is sometimes able to adequately isolate sources, the incorporation of a source-independent model is shown to significantly improve performance. Further improvements are obtained by using the proposed speaker subspace model to adapt to the sources present in the signal.
Keywords/Search Tags:Speaker subspace model, Source, Present