
Stereo music source separation via Bayesian modeling

Posted on: 2007-03-22
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Master, Aaron Steven
GTID: 1448390005463074
Subject: Music
Abstract/Summary:
It is often useful to separate the individual musical sources in a stereo recording. Separation lets the end user easily remix and transcribe sources and perform karaoke. It also allows single-source techniques to be applied to previously mixed music: speech or voice recognition, higher-accuracy pitch detection, and efficient source-specific audio coding. Here we consider separating sources mixed in the stereo (two-channel) format common in commercial recordings. This can be viewed as a special case of blind source separation in which the mixtures are generally underdetermined (there are usually more than two sources), yet some information about the sources and their mixing is available. Specifically, we often know that the sources (voices and instruments) are amplitude-panned between the left and right channels, that each is active only at certain times, and that each has characteristic loudness and spectral properties. To exploit this, we propose a Bayesian system in the short-time Fourier transform (STFT) domain that considers, at each time-frequency point of the input, the frequency-dependent observed panning, the phase offset between channels, and the combined loudness. From these observations and training data, it computes "expected median value" estimates of the sources. The modeling of source combinations is significant for two reasons: it takes frequency and loudness information into account, and it lets the separation system choose any number of active sources, rather than just two, for a given set of input parameters. We place this system in a newly proposed framework that describes existing and proposed demixing systems as (possibly nonlinear) beamforming. This unifying framework helps us visualize and understand how various stereo source separation systems relate to each other. It also lets us break a separation system into components that attenuate magnitude, control panning, and select phase.
We use this decomposition to build a karaoke system that preserves stereo imaging. Using newly proposed psychoacoustic metrics, we also show that our system demixes synthetic examples better than other systems do.
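As a rough illustration of the per-bin observations and of the magnitude/panning/phase decomposition described above, the following Python sketch computes panning, inter-channel phase offset, and combined loudness for a single stereo STFT bin, then applies a simple center-suppression gain for karaoke. It is not the dissertation's Bayesian "expected median value" estimator: the constant-power panning law, the exact feature definitions, and the hypothetical `center_suppression_gain` notch (with its `width` and `depth` parameters) are assumptions chosen for clarity, and the STFT itself is omitted.

```python
import cmath
import math


def pan_mix(source, theta):
    """Amplitude-pan one complex STFT value with an ASSUMED constant-power law.

    theta = 0 is hard left, pi/4 is center, pi/2 is hard right.
    """
    return math.cos(theta) * source, math.sin(theta) * source


def bin_features(left, right):
    """Observed features for one stereo STFT bin (illustrative definitions).

    Returns (pan_angle, phase_offset, loudness_db):
      pan_angle:    atan2(|R|, |L|) in [0, pi/2]; recovers theta for a
                    single source mixed with pan_mix
      phase_offset: phase of R * conj(L), in (-pi, pi]
      loudness_db:  combined magnitude sqrt(|L|^2 + |R|^2), in dB
    """
    mag_l, mag_r = abs(left), abs(right)
    pan_angle = math.atan2(mag_r, mag_l)
    phase_offset = cmath.phase(right * left.conjugate())
    combined = math.hypot(mag_l, mag_r)
    loudness_db = 20.0 * math.log10(combined) if combined > 0 else -math.inf
    return pan_angle, phase_offset, loudness_db


def center_suppression_gain(pan_angle, width=0.1, depth=1.0):
    """HYPOTHETICAL magnitude attenuation: a Gaussian notch centered on pi/4.

    width (radians) and depth are illustrative parameters, not values from
    the dissertation. For 0 <= depth <= 1 the gain lies in [0, 1].
    """
    distance = (pan_angle - math.pi / 4) / width
    return 1.0 - depth * math.exp(-distance * distance)


def karaoke_bin(left, right, width=0.1, depth=1.0):
    """Attenuate center-panned energy in one bin while preserving stereo imaging.

    The same real gain multiplies both channels, so only the magnitude
    changes: the L/R magnitude ratio (panning) and each channel's phase
    are left as they were.
    """
    pan_angle, _, _ = bin_features(left, right)
    gain = center_suppression_gain(pan_angle, width, depth)
    return gain * left, gain * right
```

Because the gain is a single real number shared by both channels, this sketch only attenuates magnitude and never alters panning or phase, which is one simple way to keep the stereo image intact. A complete system would apply such a gain to every bin of every frame and resynthesize with an inverse STFT; note that a fixed center notch also removes any other center-panned instruments that overlap the voice in frequency.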