
Research On Sound Source Separation Algorithm Based On Deep Neural Network

Posted on: 2022-11-03    Degree: Master    Type: Thesis
Country: China    Candidate: T H Li    Full Text: PDF
GTID: 2518306782452044    Subject: Automation Technology
Abstract/Summary:
During speech processing, it is common to encounter several people talking at the same time, with their voices mixed together. Depending on whether reverberation is present, such mixtures can be classified as dry mixtures or reverberated mixtures. These mixtures reduce the efficiency and accuracy of speech processing, so we would like to separate the clean sounds from the mixture quickly and efficiently. With the rapid development of deep neural networks, many excellent speech separation algorithms based on them have been developed. These algorithms fall into three classes: deep clustering, semantic segmentation models, and "encoder-separator-decoder" architectures. In this study, we find that all three classes can be abstracted into one: separation with a reference signal. Based on this formulation, we try to separate clean sounds from single-channel reverberated mixtures. The main contributions of this thesis are as follows.

Firstly, we improve the method of generating mixture-clean data pairs. Very few open-source datasets of reverberated mixtures exist, so we generate reverberated mixtures by convolving clean sounds with impulse responses. Building on previous work, we refine the impulse response generation method and improve the generated data pairs by taking the characteristics of reverberated mixtures into account.

Secondly, following the scheme of reference-signal-assisted separation, we design neural networks for two targets: generating reference signals and separating speech. The reference signals are required to be reverberation-free and to characterize the main features of the clean signal. Following these requirements, and taking into account the complexity, the model size and the inference speed, we propose a pre-processing scheme of stride-four sampling and two complementary networks: the reverberation removal network and the mixture separation network.

This study also proposes a sample rate restoration network for speech separation with a reference signal. This network is designed to make full use of the reference signal and to exploit the hidden information in the reverberated mixture. A high-pass filter is added to the loss function of the sample rate restoration network, which implicitly assigns large weights to the high-frequency components, cancels the negative effects of downsampling, and improves the separation quality. For training, specialized strategies are designed according to the features of each task, which maximize the efficiency of both subnetworks and of the combined network.

Lastly, we verify the superiority of the proposed method through experiments. We compare the proposed network with previous networks: SVoice, SuDoRM-RF, LSTM-TasNet, Conv-TasNet, DPRNN-TasNet and DPTNet. The reverberated mixtures are computed by convolving clean audio with impulse responses, where the clean audio is taken from the LibriSpeech dataset and the impulse responses are taken from both the FUSS dataset and a dataset generated with virtual rooms. In terms of inference speed, the proposed network is faster. In terms of separation quality, the proposed network performs slightly better when the SI-SNR of the input signal is large, and performs better when the SI-SNR of the input signal is small.
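
The abstract describes two standard operations: building a reverberated mixture by convolving each clean source with an impulse response, and scoring signals with SI-SNR. The sketch below illustrates both in plain Python and is not the thesis's implementation. The function names (`convolve`, `reverberated_mixture`, `si_snr`) and the toy signals are assumptions made for illustration, and the SI-SNR shown is the commonly used zero-mean definition, which the thesis may or may not follow exactly.

```python
import math


def convolve(signal, impulse_response):
    """Full linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def reverberated_mixture(sources, impulse_responses):
    """Convolve each clean source with its own impulse response, then sum.

    Shorter reverberated signals are treated as zero-padded at the end.
    """
    wet = [convolve(s, h) for s, h in zip(sources, impulse_responses)]
    mix = [0.0] * max(len(w) for w in wet)
    for w in wet:
        for n, v in enumerate(w):
            mix[n] += v
    return mix


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (zero-mean variant)."""
    n = min(len(estimate), len(target))
    est, tgt = estimate[:n], target[:n]
    mean_e, mean_t = sum(est) / n, sum(tgt) / n
    est = [e - mean_e for e in est]
    tgt = [t - mean_t for t in tgt]
    # Project the estimate onto the target to isolate the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    scale = dot / (sum(t * t for t in tgt) + eps)
    proj = [scale * t for t in tgt]
    noise = [e - p for e, p in zip(est, proj)]
    num = sum(p * p for p in proj)
    den = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(num / den + eps)


# Toy data (hypothetical): two short "clean" signals and two impulse responses.
clean = [[1.0, 0.0], [0.0, 1.0]]
rirs = [[1.0], [1.0, 1.0]]
mixture = reverberated_mixture(clean, rirs)
```

Because SI-SNR projects the estimate onto the target before comparing energies, rescaling the estimate leaves the score essentially unchanged. For real audio, an FFT-based convolution from a numerical library would be a more practical choice than the double loop used here.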
Keywords/Search Tags:Reverberated Mixture, Generation of The Mixture, Neural Network, Reference Signal, Speaker Separation