Speech is one of the most important carriers of information and a primary means of everyday communication. Speech enhancement is an important pre-processing technique in speech signal processing. Its purpose is to suppress the noise in a noisy speech signal as much as possible while enhancing the speech component, so as to restore a clean speech signal with high intelligibility. This study takes speech enhancement as its research direction and summarizes previous research. Research on speech enhancement has gradually shifted from traditional analog and digital signal processing methods, such as spectral subtraction, statistical methods, and filtering methods, to methods based on machine learning. In this paper, a neural network is used to estimate the a priori signal-to-noise ratio. The gain function is then calculated according to the minimum mean square error criterion, and the enhanced speech signal is restored using this gain together with the phase information of the noisy speech signal. A neural network architecture based on a multi-head attention mechanism combined with a convolutional structure is designed. The network consists of three parts: an input layer, an encoder layer, and an output layer. The encoder layer, the most important of the three, is based on the multi-head attention mechanism. The attention mechanism enables the model to calculate the correlation between frames, and the multi-head mechanism allows the model to attend to feature details in different parts of the noisy speech. To make the model respect the causality of the time series, a frame mask is added to the calculation of the attention values. To compensate for the attention mechanism's weakness in feature extraction, a convolutional structure is added after the attention layer. In the experiments, a Chinese speech dataset and several datasets of common everyday noise are used, and an objective quality evaluation standard is adopted as the basis for evaluating the enhancement effect. Three different neural network structures suitable for processing sequences with long-term dependencies are selected as the control group. The experimental results show that the proposed model can effectively enhance noisy signals generated at different signal-to-noise ratios, is suitable for speech signals with long-term dependencies, and is superior to LSTM and TCN in both objective quality evaluation and system performance overhead.
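The two core steps described above, frame-masked multi-head attention and gain-based reconstruction with the noisy phase, can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper. The function names, the random projection matrices standing in for learned weights, and the choice of the Wiener gain xi / (1 + xi) as one simple MMSE-type gain are all assumptions made for illustration; the abstract does not state which MMSE estimator is used.

```python
import numpy as np


def causal_multi_head_attention(x, num_heads, rng):
    """Masked multi-head self-attention over frames.

    x: array of shape (frames, d_model), one feature vector per frame.
    Returns (output, weights), where weights has shape (heads, frames, frames).
    The projection matrices are random placeholders (an assumption), standing
    in for weights that a trained model would have learned.
    """
    n_frames, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    def split_heads(t):
        # (frames, d_model) -> (heads, frames, d_head)
        return t.reshape(n_frames, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    # Scaled dot-product scores between every pair of frames, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Frame mask: frame t may attend only to frames 0..t (causality).
    future = np.triu(np.ones((n_frames, n_frames), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis; masked entries become 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Merge heads back to (frames, d_model) and apply the output projection.
    merged = (weights @ v).transpose(1, 0, 2).reshape(n_frames, d_model)
    return merged @ w_o, weights


def enhance_spectrum(noisy_stft, xi):
    """Apply a gain derived from the a priori SNR and reuse the noisy phase.

    noisy_stft: complex spectrum of the noisy speech.
    xi: a priori SNR estimate (linear scale), same shape as noisy_stft.
    The Wiener gain used here is an assumed example of an MMSE-type gain.
    """
    gain = xi / (1.0 + xi)
    magnitude = gain * np.abs(noisy_stft)
    phase = np.angle(noisy_stft)  # phase is taken from the noisy signal
    return magnitude * np.exp(1j * phase)
```

Because of the frame mask, changing the features of later frames leaves the attention output for earlier frames unchanged, which is the property required for causal, frame-by-frame processing.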