
The Research And Implementation Of A Speech Separation Method Based On Domain Adversarial Training

Posted on: 2023-10-03    Degree: Master    Type: Thesis
Country: China    Candidate: J. R. Chen    Full Text: PDF
GTID: 2558306905495604    Subject: Software engineering
Abstract/Summary:
Speech separation, also known as the cocktail party problem, processes mixed speech from multiple people speaking at the same time and separates the clean audio of each speaker from the mixture. Since the cocktail party problem was proposed, many speech separation algorithms have been developed with promising results. However, the demand for large amounts of training data makes it difficult to deploy speech separation systems in scenarios with limited training samples, such as complex and changeable real-life settings or low-resource language environments. Although separation in sample-limited scenarios can, in principle, rely on the generalization ability of the model itself, the generalization ability of neural networks is extremely limited. Existing solutions tend to fine-tune a well-trained model with speech data, but fine-tuning is powerless for unlabeled test samples. Therefore, improving the performance of a speech separation model in a new, sample-limited environment remains an urgent problem to be solved before such models can be put into practical use.

In view of the above problems, this paper studies the cross-language transfer ability of speech separation models and observes how test speech data from a new language affects the performance of a well-trained speech separation model. Combined with the study of transfer learning, an unsupervised domain adversarial training method is proposed that uses only a small amount of unlabeled speech data to fine-tune the well-trained speech separation model and improve its performance in the new environment. The research content of this paper is mainly divided into the following aspects:

(1) Defining the cross-language speech separation problem. First, we define the speech separation problem; this paper mainly studies single-channel speech separation in which two speakers speak at the same time. We then adopt the domain adaptation method from transfer learning to improve the performance of the speech separation model in the new language environment. Finally, we generalize knowledge transfer between speech datasets of different languages as a cross-lingual speech separation problem and formulate it from a mathematical perspective.

(2) Designing a cross-lingual transfer method based on domain adversarial training. We implement a language transfer method based on domain adversarial training, called the LTDAT algorithm. We choose Conv-TasNet as the basic speech separation model and pre-train it on a sample-sufficient dataset to obtain a well-trained separation model. Adversarial training is then used to narrow the distribution difference between the two language datasets, improving the effect of the well-trained Conv-TasNet on the resource-limited speech dataset.

(3) Exploring the effect of the transformation network's depth and the training data volume on transfer performance. Two network architecture configurations are used to study the effect of the transformation network's depth on the speech separation model: the first uses a shallower network as the generator in adversarial training, and the second uses a deeper network. Experiments are carried out for both configurations, comparing the effects of different amounts of new-language training data on transfer performance, with results evaluated using metrics such as SDR and SI-SNR. The experimental results show that the proposed cross-language transfer algorithm successfully improves the performance of the speech separation model in the resource-limited environment. The deeper transformation network configuration achieves the best performance improvement, and the degree of improvement grows with the number of new-language training samples. Compared with the original Conv-TasNet model, the LTDAT algorithm uses only 1000 unlabeled training utterances to achieve absolute SI-SNR and SDR improvements of 0.93 dB and 0.89 dB, respectively, on the Chinese test dataset ST-CMDS.
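The abstract reports results in SI-SNR, the standard scale-invariant metric for single-channel separation. For reference, here is a minimal NumPy sketch of the SI-SNR computation (the function name and the `eps` stabilizer are illustrative choices, not taken from the thesis):

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    # Remove DC offset so the metric ignores constant bias.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "true signal" part.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    # Everything orthogonal to the target counts as error.
    e_noise = estimate - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps)
                         / (np.dot(e_noise, e_noise) + eps))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by any constant leaves the score unchanged, which is why SI-SNR is preferred over plain SNR for separation models whose outputs have arbitrary gain.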
Keywords/Search Tags:Speech Separation, Transfer Learning, Domain Adaptation, Cross-Language Transfer