Speech enhancement is widely used in human-computer interaction. To improve the noise-reduction performance and generalization ability of speech enhancement algorithms in practical applications, and to reduce the interference that redundant information causes in neural networks, this thesis combines attention mechanisms with deep neural networks for the speech enhancement task and achieves good enhancement results. The main work of this thesis is as follows:

(1) A speech enhancement algorithm combined with an efficient channel attention mechanism. The generator and discriminator of a traditional generative adversarial network respectively minimize and maximize the same objective function, and its loss function cannot fully describe the relationship between the estimated spectrum and the clean speech spectrum. To solve this problem, a relativistic generative adversarial network is introduced: its discriminator requires the clean spectrum to appear more realistic than the enhanced spectrum, while its generator in turn requires the enhanced spectrum to appear more realistic than the clean spectrum. In addition, an L1 norm term is introduced into the generator's loss function. Finally, an efficient channel attention mechanism is introduced into the generator of the relativistic generative adversarial network, which suppresses information irrelevant to speech enhancement and improves the flexibility and accuracy of the model. Compared with the baseline model on the Nonspeech-100 dataset, this model improves the perceptual evaluation of speech quality (PESQ) by 2.79% on average and the short-time objective intelligibility (STOI) by 0.95% on average; on the NoiseX-92 dataset, PESQ increases by 3.8% on average and STOI by 2.03% on average. The experimental results show that the method improves the performance of the model without adding a large amount of computation.
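As a rough illustration of the efficient channel attention described in (1), the following is a minimal PyTorch-style sketch rather than the thesis's exact implementation; the fixed 1-D convolution kernel size of 3 and the (batch, channel, time, frequency) feature layout are assumptions made for illustration.

```python
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient channel attention: pool each channel to one statistic, let a
    1-D convolution model local cross-channel interaction, then gate the
    channels with a sigmoid so irrelevant feature maps are suppressed."""
    def __init__(self, kernel_size=3):  # kernel size is an illustrative assumption
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: (B, C, T, F) spectrogram features
        y = self.avg_pool(x)                   # (B, C, 1, 1) per-channel statistic
        y = y.squeeze(-1).transpose(-1, -2)    # (B, 1, C) for the channel-wise 1-D conv
        y = self.conv(y)                       # local cross-channel interaction
        y = y.transpose(-1, -2).unsqueeze(-1)  # back to (B, C, 1, 1)
        return x * self.sigmoid(y)             # reweight channels of the generator features
```

Such a layer can be dropped in after a convolutional block of the generator; because it only adds a single small 1-D convolution, it is consistent with the claim that the performance gain comes without a large amount of extra computation.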
(2) A speech enhancement algorithm based on a complex convolutional recurrent network with a parameter-free attention mechanism. Taking the convolutional recurrent network as the backbone, a complex convolutional recurrent network with a parameter-free attention mechanism is designed. In the downsampling module, an asymmetric convolution block composed of depthwise over-parameterized convolution layers replaces the traditional convolution layer, strengthening the horizontal and vertical features learned during feature extraction, providing more learnable parameters, and improving the expressive ability of the model; a residual learning module with parameter-free attention is added to suppress redundant information; and downsampling is completed by a SoftPool module. In the upsampling module, transposed convolution replaces the parametric convolution for upsampling, and the SoftPool module is removed. In the middle layer, a bidirectional gated recurrent unit is used to capture bidirectional context when processing the temporal information of speech. Finally, a feature fusion enhancement module is added to the skip connections to compensate for the information lost during downsampling and upsampling. This model improves accuracy without adding much computation. Compared with the baseline model on the VoiceBank+DEMAND dataset, the PESQ of this model improves by 0.15 (5.49%), CBAK by 0.14 (4.34%), COVL by 0.40 (12.42%), and CSIG by 0.57 (15.28%). The experimental results show that the method has theoretical significance and practical value for real-world speech enhancement.
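The abstract does not specify which parameter-free attention is used in the residual learning module; the sketch below assumes a SimAM-style energy-based formulation, which weights every time-frequency unit by the sigmoid of its inverse energy and introduces no learnable parameters. The (batch, channel, time, frequency) layout and the eps value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ParameterFreeAttention(nn.Module):
    """SimAM-style attention (assumed): units that deviate strongly from their
    channel mean receive higher weights, with no trainable parameters added."""
    def __init__(self, eps=1e-4):  # eps is an illustrative choice
        super().__init__()
        self.eps = eps

    def forward(self, x):                                   # x: (B, C, T, F)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation from channel mean
        v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance estimate
        e_inv = d / (4 * (v + self.eps)) + 0.5              # inverse energy of each unit
        return x * torch.sigmoid(e_inv)                     # gate redundant units, keep salient ones
```

Placed inside the residual learning blocks of the downsampling path, a module of this kind suppresses redundant information while leaving the parameter count of the complex convolutional recurrent network unchanged.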