Research on Speech Enhancement Algorithm Based on Attention Fusion Convolutional Neural Network

Posted on: 2022-10-12   Degree: Master   Type: Thesis
Country: China   Candidate: Z Z Xu
GTID: 2518306338967929   Subject: Electronics and Communications Engineering
Abstract:
The purpose of speech enhancement is to recover the speaker's signal from noisy speech corrupted by interfering sources, and improving the quality and intelligibility of the enhanced speech has long been the central goal of researchers in this field. Traditional speech enhancement algorithms based on digital signal processing leave severe residual noise or introduce speech distortion, and can even render the system unusable, when the surrounding environment deteriorates or no longer meets their design assumptions. To address these problems, deep-learning-based speech enhancement algorithms have developed rapidly in recent years. They can recover clean speech at extremely low signal-to-noise ratios (SNRs) and in complex background environments, and have achieved remarkable results. Among them, convolutional neural networks have attracted wide attention because they greatly reduce the number of network parameters while preserving denoising performance. However, the limited receptive field of the convolution operation makes it difficult to model global context, which limits further improvement of the network and hinders the recovery of the enhanced speech. To address this problem, this thesis uses attention mechanisms to improve the ability of convolutional networks to capture long-range dependencies, thereby further improving speech enhancement performance.

This thesis first proposes an attention-augmented fully convolutional neural network (AAUNet) for monaural speech enhancement, which integrates a new two-dimensional relative self-attention mechanism into UNet. Specifically, the outputs of the convolution operation and of the proposed attention mechanism are concatenated along the channel dimension to form a new feature map. This design makes it possible to adjust the proportion of attention channels flexibly and to find the best combination of convolution, which learns local details, and self-attention, which captures global context. Experimental results show that AAUNet outperforms all compared methods under various unseen noise types and SNR conditions, improving denoising ability.

However, this thesis also finds that speech enhancement performance degrades when attention accounts for 100% of the channels. To solve this problem, a second speech enhancement model based on a stand-alone self-attention mechanism (SAUNet) is proposed. The stand-alone self-attention mechanism can freely set the size of the region over which it operates, and it improves distance-based perception through a multi-valued matrix. Experimental results show that, compared with AAUNet, SAUNet improves the PESQ and STOI scores by 7.93% and 4.16%, respectively. With only a small increase in the number of parameters, SAUNet achieves a significant performance improvement.
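The two ideas above, concatenating convolution and self-attention outputs along the channel axis (AAUNet) and restricting attention to a local region of adjustable size (SAUNet's stand-alone attention), can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the helper names (`conv2d_same`, `global_self_attention`, `local_self_attention`, `init_aa_conv`, `aa_conv`), the random weight initialisation, and the parameter `attn_ratio` (the fraction of output channels produced by attention) are all hypothetical. The relative positional encoding of AAUNet's attention and SAUNet's multi-valued matrix are omitted; plain scaled dot-product attention stands in for both.

```python
import numpy as np

# Hypothetical sketch: all names, shapes and the random weight init are
# illustrative assumptions, not taken from the thesis. Positional encodings
# (relative attention, multi-valued matrix) are deliberately omitted.


def conv2d_same(x, w):
    """Naive zero-padded 'same' 2-D convolution (cross-correlation).
    x: (C_in, T, F) time-frequency feature map; w: (C_out, C_in, k, k), k odd."""
    k = w.shape[2]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, t, f = x.shape
    out = np.zeros((w.shape[0], t, f))
    for i in range(t):
        for j in range(f):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) local neighbourhood
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_self_attention(x, wq, wk, wv):
    """Every time-frequency bin attends to every other bin (global context).
    wq, wk: (d_k, C_in); wv: (d_v, C_in). Returns (d_v, T, F)."""
    c, t, f = x.shape
    flat = x.reshape(c, t * f)                        # (C_in, N), N = T*F
    q, k, v = wq @ flat, wk @ flat, wv @ flat
    attn = softmax((q.T @ k) / np.sqrt(q.shape[0]))   # (N, N) attention weights
    return (v @ attn.T).reshape(-1, t, f)


def local_self_attention(x, wq, wk, wv, window=3):
    """Stand-alone style attention: each bin attends only to a
    window x window neighbourhood centred on itself (window odd)."""
    _, t, f = x.shape
    p = window // 2
    q, k, v = (np.einsum("dc,ctf->dtf", w, x) for w in (wq, wk, wv))
    kp = np.pad(k, ((0, 0), (p, p), (p, p)))
    vp = np.pad(v, ((0, 0), (p, p), (p, p)))
    valid = np.pad(np.ones((t, f)), p)  # 0 marks padded (non-existent) bins
    out = np.zeros((v.shape[0], t, f))
    for i in range(t):
        for j in range(f):
            kw = kp[:, i:i + window, j:j + window].reshape(k.shape[0], -1)
            vw = vp[:, i:i + window, j:j + window].reshape(v.shape[0], -1)
            mw = valid[i:i + window, j:j + window].reshape(-1)
            scores = q[:, i, j] @ kw / np.sqrt(q.shape[0])
            scores = np.where(mw > 0, scores, -np.inf)  # ignore padding
            out[:, i, j] = vw @ softmax(scores)
    return out


def init_aa_conv(c_in, c_out, attn_ratio, k=3, d_k=8, seed=0):
    """Split c_out output channels between attention and convolution."""
    rng = np.random.default_rng(seed)
    d_v = int(round(attn_ratio * c_out))  # channels produced by attention
    c_conv = c_out - d_v                   # channels produced by convolution
    return {
        "w_conv": 0.1 * rng.standard_normal((c_conv, c_in, k, k)),
        "wq": 0.1 * rng.standard_normal((d_k, c_in)),
        "wk": 0.1 * rng.standard_normal((d_k, c_in)),
        "wv": 0.1 * rng.standard_normal((d_v, c_in)),
    }


def aa_conv(x, params):
    """Attention-augmented conv: concatenate conv and attention outputs
    along the channel axis (axis 0)."""
    parts = []
    if params["w_conv"].shape[0] > 0:
        parts.append(conv2d_same(x, params["w_conv"]))
    if params["wv"].shape[0] > 0:
        parts.append(global_self_attention(x, params["wq"], params["wk"], params["wv"]))
    return np.concatenate(parts, axis=0)


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((4, 6, 5))  # (channels, time, freq)
    for ratio in (0.0, 0.25, 0.5, 1.0):
        y = aa_conv(x, init_aa_conv(c_in=4, c_out=8, attn_ratio=ratio))
        print(ratio, y.shape)  # always (8, 6, 5); only the channel split changes
```

In this sketch, `attn_ratio=1.0` leaves no convolutional channels at all, which corresponds to the 100%-attention configuration described above as degrading performance. In `local_self_attention`, the `window` argument makes the operating region a free parameter; when the window covers the whole feature map, it reduces to `global_self_attention`.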
Keywords: speech enhancement, fully convolutional neural network, attention mechanism