Single Channel Speech Enhancement Based On Deep Learning

Posted on: 2021-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: J D Li
Full Text: PDF
GTID: 2428330620976430
Subject: Computer Science and Technology
Abstract/Summary:
The purpose of speech enhancement is to suppress the noise in noisy speech while retaining as much of the clean speech as possible. In recent years, speech enhancement has been formulated as a supervised learning problem, in which the discriminative patterns of speech and noise are learned directly from training data. In particular, deep learning based speech enhancement methods have shown promising performance. This thesis conducts two studies on supervised speech enhancement.

(1) Capsule network based speech enhancement: Deep neural networks (DNN) have achieved good performance on the speech enhancement task, but they generalize poorly to unseen noise. To improve the generalization performance of the model, we propose a capsule network based speech enhancement method. Capsule networks were proposed in the image processing field, where they show robustness to affine transformations of the input and are good at recognizing overlapping objects. We regard noisy speech as the overlap of speech and noise, so capsule networks are also suitable for speech enhancement. Experimental results show that the capsule network based method has better noise generalization performance than DNN.

(2) Temporal convolutional recurrent network based speech enhancement: Most deep learning based models process signals in the time-frequency domain. The phase of the target is difficult to estimate directly with a model. Therefore, models generally estimate only the amplitude spectrum of the speech and reuse the phase of the noisy speech to resynthesize the waveform. In this thesis, we propose to use a temporal convolutional recurrent network (TCRN) for speech enhancement, which directly maps noisy speech to clean speech. TCRN is an end-to-end model that efficiently models short- and long-term information in speech. The experimental results show that TCRN outperforms previous LSTM and CRN based methods in terms of speech intelligibility and speech quality.
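The conventional time-frequency pipeline that study (2) contrasts with TCRN, in which only the amplitude spectrum is estimated and the noisy phase is reused, can be illustrated with a minimal single-frame sketch. This is not the thesis's code: the names `dft`, `idft`, `enhance_frame`, and `estimate_magnitude` are hypothetical, the magnitude estimator is a stand-in for a trained model, and a naive DFT on one frame replaces a real windowed STFT with overlap-add.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Naive inverse DFT; keeps only the real part of each sample."""
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def enhance_frame(noisy_frame, estimate_magnitude):
    """Magnitude-only enhancement of one frame, reusing the noisy phase.

    `estimate_magnitude` is a hypothetical stand-in for a trained model:
    it maps the noisy magnitudes to estimated clean magnitudes.
    """
    noisy_spec = dft(noisy_frame)
    noisy_mag = [abs(c) for c in noisy_spec]
    noisy_phase = [cmath.phase(c) for c in noisy_spec]

    # The model only sees and predicts magnitudes.
    enhanced_mag = estimate_magnitude(noisy_mag)

    # The phase is copied from the noisy input, not estimated.
    enhanced_spec = [cmath.rect(m, p) for m, p in zip(enhanced_mag, noisy_phase)]
    return idft(enhanced_spec)


# Toy usage: an "estimator" that simply attenuates every bin by half.
frame = [0.0, 1.0, 0.5, -0.5, -1.0, 0.25, 0.75, -0.25]
half = enhance_frame(frame, lambda mags: [0.5 * m for m in mags])
```

Because the phase is copied unchanged, any mismatch between the noisy phase and the clean phase remains in the output. An end-to-end model that maps waveform to waveform, as the abstract describes for TCRN, avoids this step entirely.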
Keywords/Search Tags: speech enhancement, deep learning, deep neural network, temporal convolution, capsule network