
Vocal Separation Based On Time-Frequency Analysis

Posted on: 2010-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Xie
Full Text: PDF
GTID: 2178360278472440
Subject: Circuits and Systems
Abstract/Summary:
As the demand for automatically analyzing, organizing, and retrieving the vast amount of online music data explodes, musical sound separation has attracted significant attention in recent years. Monaural separation, which attempts to recover each source/instrument line from single-channel polyphonic music, is a particularly challenging problem. We separate the vocal from single-channel polyphonic music and obtain good separation results. Broadly speaking, existing monaural musical sound separation systems are based on traditional signal processing techniques (mainly sinusoidal modeling), statistical techniques (such as sparse coding and nonnegative matrix factorization), or psychoacoustic studies (computational auditory scene analysis, CASA).

Time-Frequency (T-F) analysis is very effective for studying musical signals, which are typical non-stationary signals. T-F analysis is an important branch of non-stationary signal processing: it employs a joint function of time and frequency to represent, analyze, and process non-stationary signals. According to the joint function, T-F analysis methods can be classified into linear and nonlinear representations. Linear methods include the short-time Fourier transform (STFT), the Gabor transform, and the wavelet transform; nonlinear methods include the Wigner-Ville distribution and Cohen's class. In addition, auditory filters have become an important T-F analysis technique.

Analyzing an auditory scene and identifying the various sounds present in it has been the primary focus of the research field called CASA. Drawing inspiration from CASA, we design a vocal separation system based on T-F analysis. The system consists of T-F decomposition, predominant pitch detection, extraction of vocal T-F information, and synthesis of the vocal. Because both the STFT and the Gammatone filter are used to decompose the signal in the T-F decomposition stage, we design two different separation methods.
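As an illustration of why a linear T-F representation such as the STFT suits non-stationary musical signals, the following Python sketch (an illustrative example, not from the thesis; the signal, window length, and hop are arbitrary choices) analyzes a tone whose frequency jumps halfway through and recovers the jump from the spectrogram:

```python
import numpy as np
from scipy.signal import stft

# Synthetic non-stationary "musical" signal: a 440 Hz tone that
# jumps to 660 Hz halfway through a one-second recording.
fs = 16000
t = np.arange(fs) / fs
x = np.concatenate([np.sin(2 * np.pi * 440 * t[: fs // 2]),
                    np.sin(2 * np.pi * 660 * t[fs // 2:])])

# Linear T-F representation via STFT (window/hop sizes are illustrative).
f, frames, Z = stft(x, fs=fs, nperseg=1024, noverlap=768)
mag = np.abs(Z)

# The dominant frequency bin tracks the jump: pick one frame from each half.
early = f[np.argmax(mag[:, len(frames) // 4])]       # expected near 440 Hz
late = f[np.argmax(mag[:, 3 * len(frames) // 4])]    # expected near 660 Hz
print(early, late)
```

A single Fourier transform of the whole signal would show both peaks but lose the information about *when* each frequency is present, which is exactly what the joint time-frequency function preserves.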
In the STFT-based vocal separation method, the time-domain signal is transformed into the time-frequency domain using the STFT, so the resulting spectrum varies with time. The Gammatone-based system instead uses a Gammatone filterbank to decompose the original signal into many time-domain signals in different frequency bands, and each filtered output is then divided into overlapping frames. The predominant pitch detection stage is the same in the two separation methods. Although many pitch detection methods exist, detecting the vocal pitch is very difficult when the musical signal is polyphonic; we extract the vocal pitch by exploiting the harmonic structure of music. In the third stage, the T-F information of the vocal is extracted: the STFT system extracts the harmonics in the spectrum of each frame according to the detected predominant pitch, while the Gammatone-based method additionally computes the correlogram, cross-channel correlation, and onset detection features. In the last stage, the vocal is synthesized: the STFT method computes the inverse transform of the extracted vocal STFT, and the Gammatone method synthesizes the vocal by summing all channels.
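To make the STFT pipeline concrete, here is a minimal self-contained Python sketch (an illustrative reconstruction, not the thesis code; the mixture, the known pitch value, and the one-bin mask tolerance are all assumptions) that keeps only the spectral bins near the harmonics of an already-detected predominant pitch and resynthesizes the vocal estimate by inverse STFT:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(2 * fs) / fs
# Hypothetical mixture: a 200 Hz "voice" with two harmonics plus
# an accompaniment tone at 310 Hz (off the voice's harmonic grid).
voice = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
accomp = 0.8 * np.sin(2 * np.pi * 310 * t)
mix = voice + accomp

f, frames, Z = stft(mix, fs=fs, nperseg=512, noverlap=384)

# Assume the predominant pitch (200 Hz) has already been detected.
# Build a crude binary harmonic mask: keep bins within one bin spacing
# of each harmonic of the pitch, discard everything else.
f0 = 200.0
tol = fs / 512  # one frequency-bin spacing
mask = np.zeros_like(f, dtype=bool)
for h in range(1, int(f[-1] // f0) + 1):
    mask |= np.abs(f - h * f0) <= tol
sep = Z * mask[:, None]

# Synthesis stage: inverse STFT of the masked (extracted) vocal spectrum.
_, vocal_est = istft(sep, fs=fs, nperseg=512, noverlap=384)
```

The estimate correlates strongly with the harmonic "voice" component and only weakly with the 310 Hz accompaniment, which falls outside the harmonic mask. A real system would of course track a time-varying pitch per frame rather than a single fixed `f0`.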
Keywords/Search Tags:Time-Frequency analysis, vocal separation, predominant pitch detection, auditory filter, computational auditory scene analysis