
Research On Time-Frequency Analysis Based Music Identification And Singing Separation Algorithms

Posted on: 2015-03-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B L Zhu
Full Text: PDF
GTID: 1108330464455351
Subject: Computer application technology
Abstract/Summary:
With the popularity of the Internet and the rapid development of multimedia technologies, the amount of music data on the web is increasing dramatically, and more and more users with diverse music information needs are using music applications. The problems of how to organize and manage this massive music data properly, and how to extract various kinds of information from music, have therefore become important. Music Information Retrieval (MIR) is the research field that aims to solve these problems.

In this thesis, we focus on two important MIR tasks: music identification and monaural singing voice separation. We propose one new algorithm for the first task and two for the second. All three algorithms are based on time-frequency analysis: each first decomposes the input music signal into a time-frequency representation, and then analyzes the music in the time and frequency domains simultaneously.

Firstly, to make music identification robust against time stretching and pitch shifting, we propose a novel music identification algorithm based on Scale Invariant Feature Transform (SIFT) features extracted from the spectrogram image. The algorithm is inspired by our observation that time stretching and pitch shifting of an audio signal appear, respectively, as time-axis stretching and frequency-axis translation of the corresponding log-frequency spectrogram. Since SIFT features are invariant to image stretching and translation, SIFT features extracted from the spectrogram image are likewise robust to audio time stretching and pitch shifting.

Secondly, to address the singing voice separation problem, we propose a novel algorithm with two stages of spectrogram factorization. In the two stages, we construct a long-window and a short-window spectrogram of the input song, respectively, and perform Non-Negative Matrix Factorization (NMF) on each spectrogram.
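The observation underlying the identification algorithm, that pitch shifting becomes a pure translation on a log-frequency axis, can be verified numerically. The following is a minimal sketch, not code from the thesis; the semitone count and partial frequencies are illustrative values:

```python
import numpy as np

# A pitch shift by s semitones scales every frequency by 2**(s/12).
# On a log2-frequency axis this becomes a constant additive offset,
# i.e. a pure translation of the spectrogram along the frequency axis.
s = 3                                            # semitones up (illustrative)
freqs = np.array([110.0, 220.0, 440.0, 880.0])   # original partials (Hz)
shifted = freqs * 2 ** (s / 12)                  # pitch-shifted partials
offset = np.log2(shifted) - np.log2(freqs)       # per-partial log-frequency offset

# Every partial moves by the same amount, s/12, regardless of its frequency.
print(np.allclose(offset, s / 12))  # True
```

This frequency-independent offset is exactly the kind of image translation that SIFT features are designed to be invariant to.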
A spectral discontinuity thresholding method is devised for the long-window NMF to select components of pitched instruments, and a temporal discontinuity thresholding method is designed for the short-window NMF to select components of percussive instruments. After eliminating the selected components in each stage, the pitched and percussive elements of the musical accompaniment are filtered out of the song, leaving the singing voice.

In addition to the above method, we also propose an extension of the traditional pitch-based singing voice separation algorithm. In our extension, we apply NMF to decompose the time-frequency representation of the input song into a set of non-overlapping time-frequency segments, each segment originating from a single sound source. Integrated into the pitch-based inference framework, these segments significantly improve the performance of voice/music separation.
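The spectrogram factorization at the heart of both separation methods can be sketched in plain NumPy. The toy dimensions, synthetic data, and the simple temporal-discontinuity measure below are illustrative assumptions, not the thesis's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude "spectrogram": 64 frequency bins x 40 frames, built from
# 3 hidden nonnegative sources so the factorization has structure to find.
true_W = rng.random((64, 3))
true_H = rng.random((3, 40))
V = true_W @ true_H

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Frobenius-norm NMF via Lee-Seung multiplicative updates: V ~ W @ H."""
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases
    return W, H

W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# One simple temporal-discontinuity score per component: mean absolute
# frame-to-frame change of its activation, normalized by its mean level.
# Percussive components, whose activations spike and decay quickly,
# score high under measures of this kind.
disc = np.abs(np.diff(H, axis=1)).mean(axis=1) / (H.mean(axis=1) + 1e-9)
```

In the thesis's pipeline, thresholding scores like `disc` decides which components belong to the accompaniment; components below threshold are retained and the rest are subtracted before resynthesis.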
Keywords/Search Tags: Time-Frequency Analysis, Music Information Retrieval, Music Identification, Monaural Singing Voice Separation