| In everyday communication, voice is the most frequently used channel, yet in real environments it is highly susceptible to interference from ambient noise, which degrades both the quality of the speech signal and the efficiency of communication. Speech enhancement technology is therefore needed to remove environmental noise and improve speech quality. Concretely, the task of speech enhancement is to increase the signal-to-noise ratio of a speech signal so that it is easier to listen to and understand. Speech signals are non-stationary signals that are continuous in time and amplitude, and deep neural networks are able to model complex nonlinear relationships; with the development of deep learning in recent years, they have therefore been increasingly applied to speech enhancement. The main research contents of this paper are as follows: (1) A gated network model based on a multi-head attention mechanism is proposed to address the long-term dependence problem of traditional convolutional recurrent neural networks. The model builds on the Gated Network (GCN) architecture, combines it with the Transformer’s multi-head attention mechanism, and uses residual connections to alleviate the vanishing-gradient problem caused by the Post-LN structure. To some extent, this solves the problem that the gated network does not take the important information at each time step into account, while also reducing the computational complexity of the model. (2) A GAN model is proposed that uses the GCN as part of the generator and combines an LSTM with a multi-scale discriminator. The gated network controls the input to the GAN generator so that it can generate more diverse speech signals, improving the diversity and stability of the generated speech. The gated network can also adaptively select the features fed into the GAN generator, which reduces the training time and the amount of data required and thus speeds up training. Because the dataset in this paper is preprocessed at five signal-to-noise ratios, a multi-scale discriminator can capture the characteristics of the input speech more comprehensively and distinguish more accurately between speech signals with different signal-to-noise ratios, thereby improving speech enhancement performance. (3) Inspired by the application of U-Net in computer vision, an optimized network model, GCU-N, is proposed. Its encoder and decoder use two four-layer U-Net units to capture dynamic long-term context information, and its intermediate layer uses the same structure as the GCN model proposed in this paper. |
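
The abstract describes preprocessing the dataset at several signal-to-noise ratios. As a minimal sketch of what such a step could look like, the following Python code scales a noise signal so that the mixture reaches a target SNR in dB. This is an illustration under stated assumptions, not the authors' pipeline: the function names `mix_at_snr` and `measure_snr_db`, the synthetic sine and Gaussian signals, and the five SNR levels listed are all hypothetical, since the abstract does not give the actual values.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def measure_snr_db(clean, noise):
    """SNR in dB between a clean signal and an additive noise signal."""
    return 10 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the target SNR (dB).

    Returns the noisy mixture and the scaled noise.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10*log10(P_clean / (g^2 * P_noise)), solved for the gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for real recordings (assumption for illustration).
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    # Hypothetical SNR levels; the abstract does not state the five used.
    for target in (-5, 0, 5, 10, 15):
        noisy, scaled = mix_at_snr(clean, noise, target)
        print(target, round(measure_snr_db(clean, scaled), 2))
```

A speech enhancement model would then be trained to map each noisy mixture back to its clean signal, and the gain in measured SNR between input and output is one simple way to quantify the improvement.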