Research And Implementation Of Speech Enhancement Based On Domain-Adversarial Training Of Neural Networks

Posted on: 2023-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: D W Yang
GTID: 2568306911986149
Subject: Software engineering
Abstract/Summary:
Speech plays an increasingly important role in everyday life, but real-world speech is often mixed with background noise, which degrades voice quality. Speech enhancement has therefore become an active research topic. Alongside traditional methods, deep learning approaches have taken hold in the field. A neural network can map noisy speech directly to the target speech, which avoids the drawback that traditional methods often need to estimate the noise spectrum, and it improves enhancement performance. Speech enhancement models based on convolutional neural networks (CNNs) perform well, but training a deep neural network requires a large amount of labeled data, and in practice there may be only a small amount of labeled data or none at all. In this situation, a model that performs well on the training set degrades significantly on a test set drawn from a different distribution. Improving the generalization of the model when labeled data is insufficient is therefore a pressing problem.

To address this problem, this thesis combines Domain-Adversarial Training of Neural Networks (DANN) with speech enhancement and proposes a convolutional speech enhancement optimization algorithm, called the Convolutional Speech Enhancement Algorithm Based on DANN (CSEDANN). The basic structure of the algorithm is a convolutional neural network, into which an attention module is inserted and which is then adapted following the idea of DANN to obtain a new model. Compared with the original model, the new model generalizes noticeably better. The research content of this thesis is divided into the following aspects:

(1) Define the optimization problem of speech enhancement. First, the performance indices of speech quality and their calculation methods are introduced in detail, and the speech enhancement problem is defined. Then the transfer learning problem in speech enhancement is analyzed in detail and defined. Finally, the overall speech enhancement optimization problem is defined.

(2) Design and implement CSEDANN, a speech enhancement optimization algorithm based on DANN. The algorithm first cascades convolutional, batch normalization, and activation layers to form a convolution-based speech enhancement network. Second, an attention module is inserted into the network to improve its performance. Finally, the feature extractor, predictor, and domain classifier are constructed from the CNN model with attention, which completes the CSEDANN design. Training steps are given for two cases. When only a few labeled target samples are available, the model is first trained on the source dataset and then fine-tuned. When the target samples are unlabeled, the labeled samples of the source dataset and the unlabeled samples of the target dataset are used for joint training.

(3) Verify the effectiveness of CSEDANN through experiments. Experiments are carried out on different combinations of source-domain and target-domain datasets. The evaluation focuses on performance in the target domain and uses the signal-to-distortion ratio (SDR) as the metric. Inserting the attention module increases SDR by 0.11 dB. With a small number of labeled samples, the SDR of CSEDANN is 1.54 dB higher than that of the CNN speech enhancement method with attention, 0.08 dB higher than that of the freeze fine-tuning method, and 0.07 dB higher than that of the ablation experiment. With no labeled samples, the SDR of CSEDANN is 1.05 dB higher than that of the CNN speech enhancement method with attention. These results show the effectiveness of the method and indicate that the proposed algorithm achieves a stable and significant improvement over the original algorithm.
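
The abstract reports its results as SDR gains in dB but does not state which SDR definition the thesis uses. As a rough illustration only, the sketch below assumes the simplest energy-ratio form, SDR = 10 log10(||s||^2 / ||s - s_hat||^2), where s is the clean reference and s_hat is the enhanced estimate. The function name `sdr_db`, the `eps` guard, and the synthetic signals are all hypothetical and are not taken from the thesis.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Energy-ratio SDR in dB (assumed definition, not confirmed by the thesis)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    # Energy of the clean reference signal.
    signal_energy = sum(s * s for s in reference)
    # Energy of the residual between reference and estimate.
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


# Synthetic stand-ins: a clean tone, a noisy version, and an "enhanced"
# version in which the interference amplitude has been halved.
clean = [math.sin(0.1 * n) for n in range(1000)]
noisy = [s + 0.1 * math.cos(1.7 * n) for n, s in enumerate(clean)]
enhanced = [s + 0.05 * math.cos(1.7 * n) for n, s in enumerate(clean)]

# SDR gain of the enhanced signal over the noisy input, in dB.
improvement = sdr_db(clean, enhanced) - sdr_db(clean, noisy)
print(f"SDR improvement: {improvement:.2f} dB")
```

Under this definition, halving the residual amplitude quarters the distortion energy, so the gain is about 6 dB regardless of the signal. Gains such as the 1.54 dB and 1.05 dB reported above would be differences of this kind between two systems evaluated on the same target-domain test set.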
Keywords/Search Tags:Speech Enhancement, DANN, Convolutional Neural Network, Transfer Learning