
Study on Single-Channel Source Separation of Singing Voice and Music

Posted on: 2016-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2308330461477888
Subject: Electronic and Communication Engineering
Abstract/Summary:
Blind source separation (BSS) refers to techniques that recover source signals from observed signals when both the mixing process and the source signals are unknown. Over roughly the last decade of development, the overdetermined and determined BSS problems have been solved successfully, and research now focuses mainly on the underdetermined case. Within this area, single-channel voice/music source separation has become a hot topic: "single channel" means there is only one observation, which is a mixture of multiple sources. Single-channel voice/music separation is valuable for tasks such as accompaniment extraction, pitch extraction, chord extraction, lyrics alignment, and lyrics recognition.

This thesis studies single-channel voice/music source separation under a 0 dB mixing model; the specific task is singing voice separation. The work covers the following two aspects:

(1) An unsupervised single-channel voice/music separation method based on the REpeating Pattern Extraction Technique (REPET) and nonnegative tensor factorization (NTF) is proposed. NTF is used to decompose the single-channel mixture into components without supervision. REPET exploits the periodicity, self-similarity, and repeating patterns of the background music to construct a periodic mask, with which the periodic accompaniment can be extracted from the mixture. The proposed method therefore combines NTF and REPET: the periodic mask built from the repeating accompaniment selects the background-music components from the NTF decomposition, and the singing voice and the background music are then separated (a minimal sketch of the REPET masking step is given after the abstract).

(2) Building on a detailed study of deep recurrent neural networks (DRNNs), a supervised single-channel voice/music separation method is proposed. Experiments show that deep recurrent networks achieve better results than ordinary deep neural networks, so DRNNs are applied to single-channel voice/music separation. Jointly training the soft time-frequency mask with the recurrent network further improves the separation results. To reduce the correlation between the estimated voice and music signals, an improved discriminative training objective is used, which strengthens the separation capability of the whole network. Fine-tuning the learning rate accelerates the convergence of the whole network toward the global optimum while avoiding local optima. To improve the generalization ability of the whole network, the parameters of the activation functions are learned adaptively, which improves separation further (a sketch of the joint mask and discriminative objective is given after the abstract).

Experiments are carried out on the MIR-1K data set. Extensive experiments on the two proposed methods are reported, and the proposed methods are compared with other state-of-the-art methods. The results show that the proposed methods achieve good performance.
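To make the first, unsupervised method concrete, the following is a minimal sketch of the REPET idea only (the NTF stage is omitted): the repeating period of the accompaniment is estimated from the mixture spectrogram, a repeating model is built by taking the median over period-spaced frames, and a soft time-frequency mask separates accompaniment from voice. The use of librosa/NumPy, the parameter values, and the file name are assumptions for illustration, not details taken from the thesis.

# Minimal sketch of the REPET masking step (assumed tooling: librosa, NumPy).
import numpy as np
import librosa

def repet_mask(y, sr, n_fft=1024, hop=256):
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    V = np.abs(S)                                   # magnitude spectrogram
    n_frames = V.shape[1]

    # Beat spectrum: autocorrelate the per-frame energy to find the lag at
    # which the background repeats (a simple stand-in for the full method).
    e = (V ** 2).sum(axis=0)
    e = e - e.mean()
    ac = np.correlate(e, e, mode="full")[n_frames - 1:]
    max_lag = max(2, n_frames // 3)
    period = int(np.argmax(ac[1:max_lag]) + 1)      # strongest repeating lag

    # Repeating model: median of frames spaced one period apart.
    W = np.zeros_like(V)
    for t in range(n_frames):
        idx = np.arange(t % period, n_frames, period)
        W[:, t] = np.median(V[:, idx], axis=1)

    # Soft mask for the repeating background (accompaniment); its complement
    # keeps the non-repeating part, i.e. the singing voice.
    M = np.minimum(W, V) / (V + 1e-8)
    background = librosa.istft(M * S, hop_length=hop)
    voice = librosa.istft((1.0 - M) * S, hop_length=hop)
    return voice, background

# Usage (hypothetical file name):
# y, sr = librosa.load("mixture.wav", sr=None, mono=True)
# voice, accompaniment = repet_mask(y, sr)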
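Similarly, the joint soft-mask and discriminative-training ideas of the second, supervised method can be sketched as follows. PyTorch, the layer sizes, and the weight gamma are assumptions; the thesis does not specify a framework or these values. Because the mask is computed inside the network, the mask and the recurrent layers are trained jointly, and the discriminative term penalizes similarity to the interfering source.

# Minimal PyTorch sketch of a mask-predicting recurrent network with a
# discriminative training objective (all sizes are illustrative only).
import torch
import torch.nn as nn

class MaskDRNN(nn.Module):
    def __init__(self, n_freq=513, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.out_voice = nn.Linear(hidden, n_freq)
        self.out_music = nn.Linear(hidden, n_freq)

    def forward(self, mix_mag):                  # (batch, time, freq)
        h, _ = self.rnn(mix_mag)
        v = torch.relu(self.out_voice(h))        # raw voice estimate
        m = torch.relu(self.out_music(h))        # raw music estimate
        mask_v = v / (v + m + 1e-8)              # soft mask, trained jointly
        voice = mask_v * mix_mag                 # masked mixture spectra
        music = (1.0 - mask_v) * mix_mag
        return voice, music

def discriminative_loss(voice_hat, music_hat, voice, music, gamma=0.05):
    # MSE to the target source minus a gamma-weighted MSE to the interfering
    # source, which pushes the two estimated sources apart.
    mse = nn.functional.mse_loss
    return (mse(voice_hat, voice) - gamma * mse(voice_hat, music)
            + mse(music_hat, music) - gamma * mse(music_hat, voice))

# Usage with random tensors standing in for spectrogram batches:
# model = MaskDRNN()
# mix, voice, music = (torch.rand(4, 100, 513) for _ in range(3))
# loss = discriminative_loss(*model(mix), voice, music)
# loss.backward()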
Keywords/Search Tags: NTF, REPET, Deep Neural Networks, Source Separation