Research On Monophonic Speech Enhancement Algorithm Based On Attention Mechanism

Posted on: 2022-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
Full Text: PDF
GTID: 2518306341451774
Subject: Electronics and Communications Engineering

Abstract/Summary:
As a pre-processing technology for speech tasks, speech enhancement is widely used in speech recognition and speech separation. Its goal is to remove noise from noisy speech and recover the clean signal. Previous research falls mainly into traditional algorithms and deep learning methods. Traditional methods rely on many assumptions and cause speech distortion in low-SNR environments; deep learning methods have since proven superior to them.

The main work of this thesis is a time-domain speech enhancement algorithm. Earlier deep learning methods for speech enhancement usually operate in the frequency domain, which requires converting the noisy speech between the time and frequency domains; this conversion adds considerable computation and inevitably discards the phase information of the speech signal. Recent studies adopt time-domain speech enhancement to avoid these problems. Because each frame of a time-domain speech signal is correlated with its neighboring frames, and conventional convolution cannot learn correlations between distant frames, this thesis proposes an end-to-end time-domain model built on an encoder-decoder architecture, inserting a Bi-LSTM and a non-local block between the encoder and decoder to learn correlations between distant speech frames. To perform well on the SI-SNR evaluation metric, training uses an SI-SNR-based loss function, keeping the training objective consistent with the evaluation metric. Experimental results show that the model achieves good results on all evaluation metrics selected in this thesis; compared with the baseline, its SI-SNR improves by 9.6% at an SNR of 0 dB.

The model above inserts a non-local block into the encoder-decoder architecture, but that block occupies a large amount of memory. Replacing it with a Criss-Cross Attention module between the encoder and decoder reduces both computation and memory use. Experimental results show that the Criss-Cross Attention module reduces FLOPs by about 83.3% and memory by about 73%.
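To make the long-range modelling concrete, below is a minimal PyTorch sketch of a 1-D non-local block of the kind described above; the embedded-Gaussian form, the channel-halving bottleneck, and the (batch, channels, time) tensor layout are assumptions, since the abstract gives no implementation details.

```python
import torch
import torch.nn as nn

class NonLocalBlock1d(nn.Module):
    """Embedded-Gaussian non-local block over a (batch, channels, time) feature map.

    Assumes channels is even so the bottleneck width channels // 2 is nonzero.
    """
    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv1d(channels, inter, kernel_size=1)  # query projection
        self.phi = nn.Conv1d(channels, inter, kernel_size=1)    # key projection
        self.g = nn.Conv1d(channels, inter, kernel_size=1)      # value projection
        self.out = nn.Conv1d(inter, channels, kernel_size=1)    # restore channel count

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.theta(x).transpose(1, 2)       # (b, t, inter)
        k = self.phi(x)                         # (b, inter, t)
        v = self.g(x).transpose(1, 2)           # (b, t, inter)
        attn = torch.softmax(q @ k, dim=-1)     # (b, t, t): every frame attends to every frame
        y = (attn @ v).transpose(1, 2)          # (b, inter, t)
        return x + self.out(y)                  # residual connection around the block
```

The (batch, time, time) attention map is what lets every frame attend to every other frame, but it is also why memory grows quadratically with sequence length, which is the cost that motivates swapping in Criss-Cross Attention.

The SI-SNR-based training loss is likewise a standard objective; the following is a hedged sketch, where the mean removal, the eps stabilizer, and the (batch, samples) waveform shape are assumptions rather than details taken from the thesis.

```python
import torch

def si_snr_loss(estimate: torch.Tensor, target: torch.Tensor,
                eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SNR, averaged over the batch.

    estimate, target: (batch, samples) time-domain waveforms.
    """
    # Remove the mean so the measure is invariant to DC offset.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target: s_target = <x_hat, s> s / ||s||^2
    dot = (estimate * target).sum(dim=-1, keepdim=True)
    target_energy = (target ** 2).sum(dim=-1, keepdim=True) + eps
    s_target = dot / target_energy * target
    # Residual the projection does not explain.
    e_noise = estimate - s_target
    si_snr = 10 * torch.log10(
        (s_target ** 2).sum(dim=-1) / ((e_noise ** 2).sum(dim=-1) + eps) + eps
    )
    # Minimizing the negative SI-SNR maximizes the SI-SNR metric itself.
    return -si_snr.mean()
```

Minimizing this loss maximizes SI-SNR directly, which is how the training objective stays consistent with the evaluation metric as described above.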
Keywords/Search Tags:speech enhancement, time-domain, end-to-end, attention mechanism