Research On Monophonic Speech Enhancement Algorithm Based On Attention Mechanism

Posted on: 2022-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
Full Text: PDF
GTID: 2518306341451774
Subject: Electronics and Communications Engineering

Abstract/Summary:
As a pre-processing technology for speech tasks, speech enhancement is widely used in speech recognition and speech separation. Its goal is to remove noise from noisy speech and recover the clean signal. Previous research falls mainly into traditional algorithms and deep learning methods. Traditional methods rely on many assumptions and cause speech distortion in low-SNR environments; deep learning methods have since proven superior to them.

The main work of this thesis is a time-domain speech enhancement algorithm. Earlier deep learning methods for speech enhancement usually operate in the frequency domain, which requires converting the noisy speech between the time and frequency domains; this conversion adds considerable computation and inevitably discards the phase information of the speech signal. Recent studies adopt time-domain speech enhancement to avoid these problems. Because each frame of a time-domain speech signal is correlated with its neighboring frames, and conventional convolution cannot learn correlations between distant frames, this thesis proposes an end-to-end time-domain model built on an encoder-decoder architecture, inserting a Bi-LSTM and a non-local block between the encoder and decoder to learn correlations between distant speech frames. To perform well on the SI-SNR evaluation metric, training uses an SI-SNR-based loss function, keeping the training objective consistent with the evaluation metric. Experimental results show that the model achieves good results on all evaluation metrics selected in this thesis; compared with the baseline, its SI-SNR improves by 9.6% at an SNR of 0 dB.

The model above inserts a non-local block into the encoder-decoder architecture, but that block occupies a large amount of memory. Replacing it with a Criss-Cross Attention module between the encoder and decoder reduces both computation and memory use. Experimental results show that the Criss-Cross Attention module reduces FLOPs by about 83.3% and memory by about 73%.
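To make the long-range modelling concrete, below is a minimal PyTorch sketch of a 1-D non-local block of the kind described above; the embedded-Gaussian form, the channel-halving bottleneck, and the (batch, channels, time) tensor layout are assumptions, since the abstract gives no implementation details.

```python
import torch
import torch.nn as nn

class NonLocalBlock1d(nn.Module):
    """Embedded-Gaussian non-local block over a (batch, channels, time) feature map.

    Assumes channels is even so the bottleneck width channels // 2 is nonzero.
    """
    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv1d(channels, inter, kernel_size=1)  # query projection
        self.phi = nn.Conv1d(channels, inter, kernel_size=1)    # key projection
        self.g = nn.Conv1d(channels, inter, kernel_size=1)      # value projection
        self.out = nn.Conv1d(inter, channels, kernel_size=1)    # restore channel count

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.theta(x).transpose(1, 2)       # (b, t, inter)
        k = self.phi(x)                         # (b, inter, t)
        v = self.g(x).transpose(1, 2)           # (b, t, inter)
        attn = torch.softmax(q @ k, dim=-1)     # (b, t, t): every frame attends to every frame
        y = (attn @ v).transpose(1, 2)          # (b, inter, t)
        return x + self.out(y)                  # residual connection around the block
```

The (batch, time, time) attention map is what lets every frame attend to every other frame, but it is also why memory grows quadratically with sequence length, which is the cost that motivates swapping in Criss-Cross Attention.

The SI-SNR-based training loss is likewise a standard objective; the following is a hedged sketch, where the mean removal, the eps stabilizer, and the (batch, samples) waveform shape are assumptions rather than details taken from the thesis.

```python
import torch

def si_snr_loss(estimate: torch.Tensor, target: torch.Tensor,
                eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SNR, averaged over the batch.

    estimate, target: (batch, samples) time-domain waveforms.
    """
    # Remove the mean so the measure is invariant to DC offset.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target: s_target = <x_hat, s> s / ||s||^2
    dot = (estimate * target).sum(dim=-1, keepdim=True)
    target_energy = (target ** 2).sum(dim=-1, keepdim=True) + eps
    s_target = dot / target_energy * target
    # Residual the projection does not explain.
    e_noise = estimate - s_target
    si_snr = 10 * torch.log10(
        (s_target ** 2).sum(dim=-1) / ((e_noise ** 2).sum(dim=-1) + eps) + eps
    )
    # Minimizing the negative SI-SNR maximizes the SI-SNR metric itself.
    return -si_snr.mean()
```

Minimizing this loss maximizes SI-SNR directly, which is how the training objective stays consistent with the evaluation metric as described above.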
Keywords/Search Tags:speech enhancement, time-domain, end-to-end, attention mechanism