Research On Speech Enhancement Based On Deep Learning

Posted on:2022-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:Q Y Zhong

Full Text:PDF

GTID:2518306740496504

Subject:Signal and Information Processing

Abstract/Summary:

Speech enhancement has a wide range of application scenarios and has attracted extensive attention in the field of speech signal processing. As the front end of a speech signal processing system, it is widely used in video conferencing, hearing aids, smart homes and smart vehicles. However, common speech enhancement algorithms based on spectrum mapping or masking have shortcomings. First, multi-to-one mask regression relies on the statistical information learned by the neural network and fails to capture the inter-frame two-dimensional information. Second, spectrum mapping captures two-dimensional information well, but, compared with a mask, a spectrum target lacks some artificial prior knowledge. Considering these two points, this thesis proposes a speech enhancement method based on mask mapping, together with two single-channel speech enhancement algorithms built on this modeling method: the mask-mapping-based residual dense network (MM-RDN) and the mask-mapping-based hybrid dilated convolutional network (MM-HDCN).

To adapt the mapping network to the texture-rich two-dimensional spectrum, we build on the U-Net structure and introduce hybrid dilated convolution into a convolutional encoder-decoder (CED), which maximizes the receptive field of the network, eliminates the gridding effect and reduces the number of model parameters. Simulation results show that the mask-mapping framework effectively enhances speech in both known and unknown scenes in multiple respects, and that it outperforms multi-to-one mask regression and spectrum mapping. MM-HDCN is also shown to be robust and lightweight and to generalize well.

To make full use of feature maps, the residual dense block (RDB) is used to improve the fitting ability of the neural network. Through densely connected layers, local feature fusion and local residual learning, the RDB forms a contiguous memory mechanism, and it also stabilizes the training process. The proposed MM-RDN likewise takes log-power spectra (LPS) as the input feature and the ideal ratio mask (IRM) as the training target. Simulations show that increasing the window length has a positive effect on mask-mapping-based speech enhancement, and that MM-RDN effectively uses both the two-dimensional information of LPS and the artificial prior information of the IRM. Compared with MM-HDCN and other methods, MM-RDN significantly improves the evaluation metrics and better enhances signal quality, perceptual quality and speech intelligibility.

Comparing the two proposed algorithms, MM-RDN has better overall performance, while MM-HDCN has a lighter structure. Both show good robustness and generalization in speech enhancement and surpass the existing algorithms.
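
The gridding effect mentioned in the abstract can be illustrated with a small one-dimensional sketch. It is not taken from the thesis: the kernel size of 3 and the dilation rates [2, 2, 2] and [1, 2, 5] are assumptions chosen only for illustration, and the helper names are hypothetical. The sketch counts which input positions can influence one output position after a stack of stride-1 dilated convolutions. A stack with a single repeated rate leaves holes inside its receptive field, whereas a mixed (hybrid) set of rates can cover every position while reaching further.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def covered_offsets(kernel_size, dilations):
    """Input offsets along one axis that can influence a single output position."""
    half = kernel_size // 2  # assumes an odd kernel size
    offsets = {0}
    for d in dilations:
        # Each layer lets every reachable offset spread to its dilated taps.
        offsets = {o + tap * d for o in offsets for tap in range(-half, half + 1)}
    return offsets


def coverage(kernel_size, dilations):
    """Fraction of positions inside the receptive field that are actually used."""
    used = len(covered_offsets(kernel_size, dilations))
    return used / receptive_field(kernel_size, dilations)


if __name__ == "__main__":
    # Illustrative rates only (assumed, not the thesis configuration).
    for rates in ([2, 2, 2], [1, 2, 5]):
        print(rates, receptive_field(3, rates), round(coverage(3, rates), 2))
```

Under these assumed rates, the repeated-rate stack reaches only the even offsets within its receptive field, while the mixed-rate stack reaches every offset within a wider field using the same number of layers and weights.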

Keywords/Search Tags:

Deep Learning, Speech Enhancement, Mask Mapping, Convolutional Neural Network

Related items

1	Research On Deep Learning Based Speech Enhancement
2	Speech Enhancement Based On Iterative Mask Estimation And Generative Adversarial Networks
3	Research On Deep Learning Based Speech Enhancement
4	Research On Deep Neural Network Based Speech Enhancement
5	A Speech Enhancement System Based On Deep Learning And Parallel Computing
6	Speech Enhancement Research Based On Sparse Representation And Deep Neural Network
7	Research On Speech Enhancement Algorithms Based On Deep Learning
8	Research And Implementation Of Lightweight Speech Enhancement Algorithm For Air Control
9	Study On Speech Enhancement Based On Deep Learning
10	Codebook-based Speech Enhancement Using Deep Neural Network