
Research On End-to-end Speech Enhancement Algorithm Based On Attention Joint Convolutional Network

Posted on: 2022-10-12 | Degree: Master | Type: Thesis
Country: China | Candidate: X Feng
GTID: 2518306560492854 | Subject: Software engineering
Abstract/Summary:
The purpose of speech enhancement is to remove the various interfering noises present in noisy speech by means of an efficient signal-processing algorithm and to recover clean speech, while ensuring that the enhanced speech has high quality and intelligibility. Traditional speech enhancement algorithms require strict assumptions about the speech and noise signals before they can be applied, which limits their use in real-world scenarios. In recent years, neural networks, which require no such assumptions and have strong data-modeling capabilities, have received extensive attention from researchers and have become the mainstream approach in this field.

This thesis focuses on improving the global modeling ability and speech enhancement performance of convolutional neural networks. Convolution operations are good at capturing local details of the input speech signal, but their receptive field is very limited, which makes it difficult to capture global information. Multiple layers therefore have to be stacked to learn the contextual dependencies of the speech signal. However, as the network grows deeper, it generates a large amount of redundant information, which hinders learning as it is propagated layer by layer. To address these problems, this thesis combines three different types of self-attention mechanisms with convolutional neural networks, helping the network obtain global information about the speech signal from multiple perspectives, focus on effective features, and suppress redundant ones. The specific research content is as follows:

(1) Taking the Wave-Unet convolutional neural network as the basic structure, the thesis combines the Stand-alone full attention layer with Wave-Unet and proposes a new speech enhancement model, Wave-sa-Unet. The speech feature map output by the CLP layer is fed into the Stand-alone full attention layer for pixel focusing and feature reconstruction, which helps the model attend to useful information and suppress redundant information. The thesis adopts an end-to-end speech enhancement framework: through a careful design of the network structure, the complex speech feature-extraction process is omitted, the noisy speech signal is fed directly into the model for training, and the enhanced speech waveform is output. In addition, the thesis uses the scale-invariant signal-to-noise ratio (SI-SNR) as the model's objective function, directly optimizing a speech evaluation metric to improve the enhancement capability of the model. Experimental results show that, compared with the Wave-Unet baseline model, Wave-sa-Unet achieves an SI-SNR gain of 0.54 dB.

(2) By introducing two further self-attention mechanisms (a non-local module and a channel squeeze-and-excitation mechanism) into Wave-sa-Unet, the thesis proposes a multi-attention joint convolutional speech enhancement model, Wave-ma-Unet. The three self-attention mechanisms assist and calibrate the convolutional network from different perspectives, further improving its denoising level and speech enhancement capability. Experimental results show that Wave-ma-Unet achieves an SI-SNR gain of 0.66 dB over Wave-sa-Unet.
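The SI-SNR objective mentioned above is commonly defined by projecting the zero-mean estimate ŝ onto the zero-mean reference s: s_target = (<ŝ, s> / ||s||²) s, e_noise = ŝ − s_target, and SI-SNR = 10 log10(||s_target||² / ||e_noise||²). The following is a minimal plain-Python sketch of that common definition and of the negated value used as a training loss; the function names and the small `eps` stabilizer are illustrative assumptions, since the abstract does not give the thesis's exact variant.

```python
import math

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals (sketch)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean of each signal so DC offsets do not affect the measure.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    s_hat = [x - est_mean for x in estimate]
    s = [x - ref_mean for x in reference]
    # Project the estimate onto the reference: s_target = (<s_hat, s> / ||s||^2) * s.
    dot = sum(a * b for a, b in zip(s_hat, s))
    scale = dot / (sum(b * b for b in s) + eps)
    s_target = [scale * b for b in s]
    # Everything not explained by the scaled reference counts as noise/distortion.
    e_noise = [a - t for a, t in zip(s_hat, s_target)]
    target_energy = sum(t * t for t in s_target)
    noise_energy = sum(e * e for e in e_noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))

def si_snr_loss(estimate, reference):
    # Training minimizes the negative SI-SNR, which maximizes SI-SNR.
    return -si_snr(estimate, reference)
```

Because the reference is rescaled to match the estimate, multiplying the estimate by any nonzero constant leaves the score essentially unchanged, so the network cannot raise its score simply by changing its output gain.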
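A stand-alone self-attention layer replaces a convolution's fixed weighted sum over a local window with a content-dependent one: each position forms a query and attends to the keys and values of its neighbors. The sketch below is a simplified single-head, 1-D version operating on a channels x time feature map stored as nested Python lists. The names (`local_self_attention`, `window`) are hypothetical, relative position embeddings and multiple heads are deliberately omitted, and the thesis's actual layer may differ in these details.

```python
import math

def conv1x1(weights, x):
    """Pointwise (1x1) projection: weights is C_out x C_in, x is C_in x T."""
    T = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(T)]
            for row in weights]

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def local_self_attention(x, w_q, w_k, w_v, window):
    """
    Single-head local self-attention over the time axis (simplified sketch).
    x: C x T feature map; w_q, w_k, w_v: C_out x C projection matrices.
    Output position i attends only to positions j with |i - j| <= window // 2,
    clipped at the sequence boundaries. Relative position embeddings are omitted.
    """
    q, k, v = conv1x1(w_q, x), conv1x1(w_k, x), conv1x1(w_v, x)
    c_out, T = len(q), len(q[0])
    half = window // 2
    y = [[0.0] * T for _ in range(c_out)]
    for i in range(T):
        neighbors = range(max(0, i - half), min(T, i + half + 1))
        # Dot-product similarity between the query at i and each neighboring key.
        scores = [sum(q[c][i] * k[c][j] for c in range(c_out)) for j in neighbors]
        weights = softmax(scores)
        for c in range(c_out):
            y[c][i] = sum(a * v[c][j] for a, j in zip(weights, neighbors))
    return y
```

With `window=1` every position attends only to itself and the layer reduces to the value projection; widening the window enlarges the region each output can draw on without adding parameters.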
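A non-local module lets every time step aggregate features from every other time step, so a single layer covers the whole sequence instead of relying on many stacked convolutions. The sketch below uses the common embedded-Gaussian form (softmax over dot products of query and key projections) with an output projection and a residual connection, again on a channels x time feature map in plain Python. The function and weight names are hypothetical, and the thesis's actual configuration is not stated in the abstract.

```python
import math

def conv1x1(weights, x):
    """Pointwise (1x1) projection: weights is C_out x C_in, x is C_in x T."""
    T = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(T)]
            for row in weights]

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """
    Embedded-Gaussian non-local block over the time axis (sketch).
    x: C x T feature map. w_theta, w_phi, w_g: C' x C. w_out: C x C'.
    """
    theta = conv1x1(w_theta, x)  # C' x T queries
    phi = conv1x1(w_phi, x)      # C' x T keys
    g = conv1x1(w_g, x)          # C' x T values
    c_inner, T = len(theta), len(theta[0])
    y = [[0.0] * T for _ in range(c_inner)]
    for i in range(T):
        # Similarity of position i to all positions j, normalized with softmax.
        scores = [sum(theta[c][i] * phi[c][j] for c in range(c_inner)) for j in range(T)]
        attn = softmax(scores)
        for c in range(c_inner):
            y[c][i] = sum(a * g[c][j] for j, a in enumerate(attn))
    # Project back to C channels and add the residual connection.
    z = conv1x1(w_out, y)
    return [[xv + zv for xv, zv in zip(xr, zr)] for xr, zr in zip(x, z)]
```

If `w_out` is all zeros, the block returns its input unchanged, which is why the residual form can be inserted into an existing network without disturbing it at initialization.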
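The channel squeeze-and-excitation mechanism recalibrates channels rather than time steps: it averages each channel over time ("squeeze"), passes the resulting vector through a small bottleneck of two fully connected layers ("excitation"), and rescales each channel by a sigmoid gate. A minimal plain-Python sketch of that standard formulation follows; the weight shapes, the reduction ratio r, and the function names are illustrative assumptions, since the abstract does not specify how the thesis configures this module.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def squeeze_excite(features, w1, b1, w2, b2):
    """
    features: C x T list of lists (channels x time steps).
    w1: (C // r) x C, b1: length C // r   (bottleneck layer)
    w2: C x (C // r), b2: length C        (expansion layer)
    Returns the features with each channel rescaled by a gate in (0, 1).
    """
    # Squeeze: global average pooling over time gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(w2, b2)]
    # Scale: reweight each channel by its gate.
    return [[g * x for x in ch] for g, ch in zip(gates, features)]
```

A gate near 1 passes a channel through almost unchanged and a gate near 0 suppresses it, which is how this module can down-weight channels that carry mostly redundant information.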
Keywords/Search Tags: Speech enhancement, Convolutional neural network, Self-attention mechanism