Study On Speech Enhancement Based On Deep Learning

Posted on: 2020-12-05    Degree: Master    Type: Thesis
Country: China    Candidate: H M Zhang    Full Text: PDF
GTID: 2428330590997172    Subject: Information and Communication Engineering

Abstract/Summary:
In DNN-based speech enhancement, the DNN model learns a mapping from noisy speech features to clean speech features. To exploit contextual information, the DNN concatenates several frames of speech features as its input, which can impair the enhanced speech. Moreover, each frame is treated independently during training, so the model struggles to learn the correlation between adjacent speech frames. The LSTM model, in turn, takes the speech features as a flat input and cannot exploit the intrinsic link between the time and frequency dimensions of the spectrogram, nor can it use the future frames of the speech. It also has a large number of parameters and demands substantial computing power. To address these problems, this thesis studies speech enhancement methods based on deep learning. The main contributions are as follows:

(1) A DNN speech enhancement method incorporating an attention mechanism is proposed. This method applies the idea of the attention mechanism to speech enhancement by adding an attention layer before the fully connected layers. First, the attention layer computes a weight for each context frame; then each frame is multiplied by its weight and the weighted frames are concatenated into a long vector; finally, this vector is fed into the DNN model.

(2) An LSTM-based speech enhancement method is improved. Several frames are concatenated into a long vector and fed into the model, so that the LSTM is trained with rich contextual information. At the same time, an attention layer is added to the model, and global variance is applied to it. Experiments demonstrate the effectiveness of the improved method.

(3) A speech enhancement method combining a CNN and a GRU is proposed. The input spectrogram is encoded into high-dimensional features by a convolutional network; the feature vectors are then modeled by a two-layer GRU network and finally passed to a fully connected layer with a linear activation function. The model exploits the feature extraction capability of the CNN and the temporal modeling capability of the GRU network.
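The frame-weighting step described in method (1) can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the scoring parameters (w, b), the 7-frame context, and the 64-dimensional feature size are illustrative assumptions, and the thesis's attention layer is presumably learned jointly with the DNN rather than fixed as here.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_concat(frames, w, b):
    """Score each context frame, weight it, and flatten for the DNN input.

    frames: (T, F) array of T context frames with F features each.
    w, b:   parameters of a hypothetical per-frame scoring layer, shapes (F,) and scalar.
    Returns a (T*F,) vector that would feed the fully connected DNN layers.
    """
    scores = frames @ w + b             # one scalar score per frame, shape (T,)
    alpha = softmax(scores)             # attention weights, non-negative, sum to 1
    weighted = frames * alpha[:, None]  # scale each frame by its weight
    return weighted.reshape(-1)         # concatenate weighted frames into one long vector

# Toy usage with assumed shapes: a 7-frame context of 64-dim features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((7, 64))
w = rng.standard_normal(64)
x = attention_concat(frames, w, 0.0)
print(x.shape)  # (448,)
```

In a trained model the weights would let the network emphasize the frames most relevant to the current output frame instead of treating all context frames equally, which is the stated motivation for inserting the attention layer before the fully connected layers.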
Keywords/Search Tags:Speech Enhancement, Long Short-Term Memory, Attention Mechanism, Convolutional Neural Network, Gated Recurrent Unit