Research On Mono Source Separation Algorithm Based On Feature Enhancement And Data Enhancement

Posted on: 2020-07-17; Degree: Master; Type: Thesis
Country: China; Candidate: B X He; Full Text: PDF
GTID: 2438330626464266; Subject: Computer technology
Abstract/Summary:
Source separation refers to recovering the individual source signals from a mixture signal when the mixing process is unknown. When the mixture is recorded on a single (monaural) channel, separation is more challenging because little channel information is available. This thesis considers monaural mixtures of vocals and accompaniment music, a task known as monaural singing voice separation (MSVS). Effectively separating vocals from music is an active research topic in signal processing and a key technology for practical applications such as accompaniment extraction, lyrics recognition, and singer information recognition. Early shallow separation models gave unsatisfactory results because of their weak ability to express the nonlinear relationships in the mixture signal. With the development of deep learning, deep neural networks can capture spatial structure information from the mixture signal, which enables better separation. Current separation models based on deep neural networks usually take the magnitude spectra of the mixture signal as input features. These high-dimensional features contain some redundant information and often increase the training burden of the network. In addition, to further improve the representation of the mixture signal, separation networks are usually designed as multi-level, multi-unit architectures, and for such large networks a lack of training data often leads to severe overfitting. This thesis therefore studies the input features of the separation model and the expansion of the training data. The main work is as follows:

(1) A separation model based on an enhanced feature is proposed. Instead of using the magnitude spectra of the mixture signal directly, the model first applies a convolution filter to produce a low-dimensional, de-redundant magnitude-spectrum feature. Then, based on the characteristics of MSVS and the particular distributions of vocals and music, this feature is concatenated with the high-resolution magnitude spectra of the mixture signal to form the enhanced feature. Designed specifically for MSVS, the enhanced feature keeps the key part of the magnitude spectra, which reduces the computational load of the model, while the concatenated high-resolution magnitude spectra compensate for the sparsely distributed vocal components. Experiments show that, compared with using the magnitude spectra directly, the separation network based on the enhanced feature further improves separation quality and effectively shortens training time.

(2) A data augmentation model based on a generative adversarial network (GAN) and a variational autoencoder (VAE) is proposed. Traditional manual data augmentation methods assume that the sources in a mixture are independent. However, correlation between sources is the main reason source separation is difficult, so manual methods cannot generate high-quality mixtures with correlated sources. The proposed augmentation network first uses a variational autoencoder to model the vocal and music training sets separately, reversing the data generation process. Adversarial training is then added in the latent space produced by the variational encoder, so that the GAN discriminator learns to distinguish original mixtures from generated mixtures in that latent space. Experiments show that the proposed network generates high-quality mixture samples and further improves the separation performance of complex separation networks.
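The enhanced-feature construction in (1) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: a fixed averaging kernel stands in for the learned convolution filter, and the function names (`magnitude_spectrogram`, `conv_reduce`, `enhanced_feature`), the STFT settings, the stride, and the high-resolution band `(0, 128)` are all hypothetical.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=1024, hop=256):
    # Magnitude STFT with a Hann window; returns shape (n_fft // 2 + 1, n_frames).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def conv_reduce(mag, kernel, stride):
    # Strided 1-D filter (correlation) along the frequency axis.
    # Hypothetical stand-in for the learned convolution filter: it maps
    # groups of neighbouring bins to one row, lowering the feature dimension.
    k = len(kernel)
    n_out = 1 + (mag.shape[0] - k) // stride
    return np.stack([kernel @ mag[i * stride : i * stride + k] for i in range(n_out)])

def enhanced_feature(mag, kernel, stride, band):
    # Concatenate the low-dimensional feature with a full-resolution slice
    # of the magnitude spectrum (the band choice here is an assumption).
    low_dim = conv_reduce(mag, kernel, stride)
    lo, hi = band
    return np.concatenate([low_dim, mag[lo:hi]], axis=0)

# Usage on a synthetic 1-second, 16 kHz two-tone "mixture".
sr = 16000
t = np.arange(sr) / sr
mixture = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 110 * t)
mag = magnitude_spectrogram(mixture)
feature = enhanced_feature(mag, kernel=np.ones(8) / 8, stride=8, band=(0, 128))
print(mag.shape, feature.shape)  # (513, 59) (192, 59)
```

With these settings the 513-bin spectrum becomes a 192-row feature: 64 pooled rows plus 128 full-resolution bins. In a trained model the kernel weights would be learned, and the retained band would follow the vocal distribution, which the abstract does not specify.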
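For contrast with the proposed augmentation network, the sketch below shows random remixing, one common manual augmentation method (our choice of example; the abstract does not name the manual methods it refers to). It pairs vocal and accompaniment tracks drawn independently of each other, which is exactly the source-independence assumption the abstract criticizes. `random_remix` and its gain range are hypothetical.

```python
import numpy as np

def random_remix(vocals, accompaniments, n_mixtures, gain_range=(0.5, 1.5), rng=None):
    # Draw a vocal track and an accompaniment track independently (usually
    # from different songs), scale each by a random gain, and sum them.
    # Any correlation between real vocals and their own accompaniment is lost.
    rng = np.random.default_rng() if rng is None else rng
    mixtures, targets = [], []
    for _ in range(n_mixtures):
        v = vocals[rng.integers(len(vocals))]
        a = accompaniments[rng.integers(len(accompaniments))]
        n = min(len(v), len(a))  # truncate to the shorter track
        gv, ga = rng.uniform(*gain_range, size=2)
        v_s, a_s = gv * v[:n], ga * a[:n]
        mixtures.append(v_s + a_s)
        targets.append((v_s, a_s))  # ground-truth sources for training
    return mixtures, targets

# Usage with random noise standing in for real stems.
rng = np.random.default_rng(0)
vocals = [rng.standard_normal(16000) for _ in range(3)]
accompaniments = [rng.standard_normal(12000) for _ in range(3)]
mixtures, targets = random_remix(vocals, accompaniments, n_mixtures=5, rng=rng)
print(len(mixtures), mixtures[0].shape)  # 5 (12000,)
```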
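The augmentation network in (2) combines a VAE encoder with adversarial training in its latent space. The standard loss terms involved can be sketched in NumPy; this is an illustration under assumptions, not the thesis model. The linear discriminator, the made-up encoder outputs, the latent size, and all function names are hypothetical, and the non-saturating generator loss is a common GAN choice that the abstract does not specify.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # VAE reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so sampling stays differentiable with respect to mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent
    # dimensions and averaged over the batch.
    return np.mean(-0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1))

def discriminator(z, w, b):
    # Hypothetical linear discriminator on latent codes: returns the
    # probability that each code comes from an original (real) mixture.
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

def bce(p, target, eps=1e-12):
    # Binary cross-entropy between predicted probabilities and a constant label.
    return -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

def latent_adversarial_losses(z_real, z_fake, w, b):
    # Discriminator: label original-mixture codes 1, generated-mixture codes 0.
    d_loss = bce(discriminator(z_real, w, b), 1.0) + bce(discriminator(z_fake, w, b), 0.0)
    # Generator side (non-saturating form): push generated codes toward label 1.
    g_loss = bce(discriminator(z_fake, w, b), 1.0)
    return d_loss, g_loss

# Usage with made-up encoder outputs (batch of 4, latent size 16).
rng = np.random.default_rng(0)
latent_dim = 16
mu_real, logvar_real = rng.standard_normal((4, latent_dim)), np.zeros((4, latent_dim))
mu_fake, logvar_fake = rng.standard_normal((4, latent_dim)), np.zeros((4, latent_dim))
z_real = reparameterize(mu_real, logvar_real, rng)
z_fake = reparameterize(mu_fake, logvar_fake, rng)
kl = kl_to_standard_normal(mu_real, logvar_real)
d_loss, g_loss = latent_adversarial_losses(z_real, z_fake, w=np.zeros(latent_dim), b=0.0)
print(round(d_loss, 4), round(g_loss, 4))  # 1.3863 0.6931
```

With an all-zero discriminator every code receives probability 0.5, so the losses reduce to 2 ln 2 and ln 2. In actual training, the VAE and discriminator parameters would be updated alternately by gradient descent.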
Keywords/Search Tags: Deep learning, source separation, variational autoencoder, generative adversarial network, data augmentation