Joint Training Algorithms For Generative Adversarial Networks And Their Application To Speech Separation

Posted on:2022-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:T Wang

Full Text:PDF

GTID:2518306524951919

Subject:Electronics and Communications Engineering

Abstract/Summary:


Speech is the most efficient mode of human-computer interaction and a major research topic in AI. However, environmental and other factors mean that the speech signal received by a machine is susceptible to severe interference. This makes it difficult for smart devices to obtain accurate speech information and reduces the efficiency of human-computer interaction. Speech separation is an important branch of speech enhancement that mainly addresses interference between speakers. Unlike noise, interfering speech cannot be described by an approximate distribution assumption, so speech separation has long been an active research topic. Traditional separation methods based on signal processing often lose part of the useful information during separation. Deep learning methods train on large numbers of speech samples and improve the quality of separated speech through their strong nonlinear fitting ability. However, as the Signal-to-Noise Ratio (SNR) decreases, the features of the target speech become masked and system performance is limited. This thesis therefore focuses on improving the separation of two-speaker mixed speech at low SNR, and studies the problem from both the time-domain and the frequency-domain perspectives.

First, the training error of existing speech separation methods comes from their loss functions. These loss functions struggle to measure the differences between speech signals comprehensively and also perform poorly at low SNR. This thesis proposes a cooperatively trained Generative Adversarial Network (GAN) in the frequency domain to learn the distribution differences between speech signals. It makes full use of the adversarial mechanism: the generative model learns the features of the target speech and the discriminative model learns those of the interfering speech. Building on the principle of time-frequency masking, the thesis further proposes a speech separation system based on time-frequency masking. In experiments to find the optimal value of the hyperparameter ?, performance is best when ? is around 100. Using the time-frequency mask as the output target also provides more separation information during training, which helps improve the overall quality of the separated speech at low SNR.

Next, frequency-domain separation ignores phase, so part of the information is easily lost when time-frequency features are extracted. The thesis therefore builds a cooperatively trained GAN speech separation system in the time domain. This system uses the time-domain waveform directly as the training object to preserve the integrity of the speech as far as possible. Because waveform amplitudes vary widely, a nonlinear sigmoid function is proposed for normalization. The results show that the time-domain system also performs best when ? = 100 and achieves better separation performance, but it is weaker than the frequency-domain method at spectrum recovery.

Finally, the method was evaluated on same-sex mixed speech and on non-homologous test speech, that is, test speech from a different source than the training data. Same-sex mixtures are separated less effectively than opposite-sex mixtures because the speakers' frequencies are similar. The proposed method still generalizes well to non-homologous speech. Real mixed speech was also recorded for testing: separation is effective for opposite-sex mixtures but poor for same-sex mixtures.
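The abstract does not give the mask formula or STFT settings used in the thesis. The following is a minimal sketch of a time-frequency mask as a separation target, under several assumptions. It uses the common ideal ratio mask (IRM) and a Hann-window STFT with `n_fft=256` and `hop=128`. Two tones stand in for the two speakers, mixed at a hypothetical SNR of -5 dB. The helpers `mix_at_snr`, `stft`, and `ideal_ratio_mask` are illustrative names, not the author's code.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio equals `snr_db`, then add."""
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    scale = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
    scaled = scale * interferer
    return target + scaled, scaled

def stft(x, n_fft=256, hop=128):
    """Complex STFT with a Hann window; frames that would run past the end are dropped."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def ideal_ratio_mask(target_spec, interf_spec, eps=1e-12):
    """Ideal ratio mask sqrt(|S|^2 / (|S|^2 + |N|^2)), one value in [0, 1] per T-F bin."""
    s_pow = np.abs(target_spec) ** 2
    n_pow = np.abs(interf_spec) ** 2
    return np.sqrt(s_pow / (s_pow + n_pow + eps))

# Hypothetical toy data: two tones at different pitches stand in for two speakers.
sr = 16000
t = np.arange(sr) / sr
speaker_a = np.sin(2 * np.pi * 220.0 * t)
speaker_b = 0.5 * np.sin(2 * np.pi * 330.0 * t)

mixture, interf = mix_at_snr(speaker_a, speaker_b, snr_db=-5.0)  # low-SNR mixture
S, N, Y = stft(speaker_a), stft(interf), stft(mixture)
mask = ideal_ratio_mask(S, N)

# Masked estimate: scale the mixture magnitude by the mask and reuse the mixture phase.
estimate_spec = mask * np.abs(Y) * np.exp(1j * np.angle(Y))
```

Here the mask is computed from the known sources, i.e. an oracle. In a trained system the generator would have to estimate `mask` from the mixture alone. Note that the estimate reuses the mixture phase, which is the phase limitation that motivates the time-domain system.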
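The abstract says only that a nonlinear sigmoid is used to normalize time-domain waveforms with widely varying amplitudes. The following is a minimal sketch under assumptions. It uses a plain logistic sigmoid with a hypothetical slope parameter `alpha`, plus its logit inverse so that network outputs can be mapped back to waveform samples. The function names and the value of `alpha` are illustrative and not taken from the thesis.

```python
import numpy as np

def sigmoid_normalize(x, alpha=1.0):
    """Squash raw waveform samples into (0, 1) with a logistic sigmoid of slope `alpha`.

    `alpha` is a hypothetical parameter, not a value reported in the thesis."""
    return 1.0 / (1.0 + np.exp(-alpha * x))

def sigmoid_denormalize(y, alpha=1.0, eps=1e-7):
    """Invert sigmoid_normalize via the logit, clipping y away from 0 and 1 first."""
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y)) / alpha

# Hypothetical int16-scale samples: quiet speech plus one large peak.
raw = np.array([-3000.0, -50.0, 0.0, 40.0, 12000.0])
alpha = 1.0 / 1000.0  # maps +/-1000 to roughly 0.73 / 0.27

y = sigmoid_normalize(raw, alpha)
recovered = sigmoid_denormalize(y, alpha)
```

Near zero the sigmoid has slope `alpha / 4`, so quiet samples are mapped almost linearly. Large peaks saturate toward 0 or 1 instead of stretching a linear scale and pushing quiet speech toward zero. Because the mapping is strictly increasing, it can be inverted exactly, except for values clipped at the extremes.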

Keywords/Search Tags:

speech separation, generative adversarial network, cooperative training, time-frequency masking, sigmoid normalization


Related items

1	Research On End-to-end Multi-speech Separation Technology Based On Generative Adversarial Nets
2	Blind Separation Of Multiple Speech Signals
3	Study On Speech Separation And Speech Enhancement Methods
4	Research On Underdetermined Blind Speech Separation Based On Sparsity In Time-frequency Domain
5	Research On Underdetermined Convolutive Speech Signal Separation Methods
6	The Application Of Adversarial Networks To Speech And Language Tasks
7	Single-Channel Speech Separation Using Sequential Dictionary Learning
8	Research On Underdetermined Speech Blind Separation Based On Sparsity
9	Data Privacy Masking Of Text Sequence Dataset Based On Generative Adversarial Network
10	Research And Application Of A Generative Adversarial Algorithm And Its Impact On Adversarial Examples