Study On Speech Enhancement Based On Deep Learning

Posted on: 2020-12-05    Degree: Master    Type: Thesis
Country: China    Candidate: H M Zhang    Full Text: PDF
GTID: 2428330590997172    Subject: Information and Communication Engineering

Abstract/Summary:
In DNN-based speech enhancement, the DNN model learns a mapping from noisy speech features to clean speech features. To exploit contextual information, the DNN concatenates several frames of speech features as its input, which can impair the enhanced speech. Moreover, each frame is treated independently during training, so the model struggles to learn the correlation between adjacent speech frames. The LSTM model, in turn, takes the speech features as a flat input and cannot exploit the intrinsic link between the time and frequency dimensions of the spectrogram, nor can it use the future frames of the speech. It also has a large number of parameters and demands substantial computing power. To address these problems, this thesis studies speech enhancement methods based on deep learning. The main contributions are as follows:

(1) A DNN speech enhancement method incorporating an attention mechanism is proposed. This method applies the idea of the attention mechanism to speech enhancement by adding an attention layer before the fully connected layers. First, the attention layer computes a weight for each context frame; then each frame is multiplied by its weight and the weighted frames are concatenated into a long vector; finally, this vector is fed into the DNN model.

(2) An LSTM-based speech enhancement method is improved. Several frames are concatenated into a long vector and fed into the model, so that the LSTM is trained with rich contextual information. At the same time, an attention layer is added to the model, and global variance is applied to it. Experiments demonstrate the effectiveness of the improved method.

(3) A speech enhancement method combining a CNN and a GRU is proposed. The input spectrogram is encoded into high-dimensional features by a convolutional network; the feature vectors are then modeled by a two-layer GRU network and finally passed to a fully connected layer with a linear activation function. The model exploits the feature extraction capability of the CNN and the temporal modeling capability of the GRU network.
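The frame-weighting step described in method (1) can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the scoring parameters (w, b), the 7-frame context, and the 64-dimensional feature size are illustrative assumptions, and the thesis's attention layer is presumably learned jointly with the DNN rather than fixed as here.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_concat(frames, w, b):
    """Score each context frame, weight it, and flatten for the DNN input.

    frames: (T, F) array of T context frames with F features each.
    w, b:   parameters of a hypothetical per-frame scoring layer, shapes (F,) and scalar.
    Returns a (T*F,) vector that would feed the fully connected DNN layers.
    """
    scores = frames @ w + b             # one scalar score per frame, shape (T,)
    alpha = softmax(scores)             # attention weights, non-negative, sum to 1
    weighted = frames * alpha[:, None]  # scale each frame by its weight
    return weighted.reshape(-1)         # concatenate weighted frames into one long vector

# Toy usage with assumed shapes: a 7-frame context of 64-dim features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((7, 64))
w = rng.standard_normal(64)
x = attention_concat(frames, w, 0.0)
print(x.shape)  # (448,)
```

In a trained model the weights would let the network emphasize the frames most relevant to the current output frame instead of treating all context frames equally, which is the stated motivation for inserting the attention layer before the fully connected layers.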
Keywords/Search Tags:Speech Enhancement, Long Short-Term Memory, Attention Mechanism, Convolutional Neural Network, Gated Recurrent Unit