Research On Speech Enhancement Based On Deep Learning

Posted on: 2022-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhong
Full Text: PDF
GTID: 2518306740496504
Subject: Signal and Information Processing
Abstract/Summary:
Speech enhancement has attracted extensive attention in speech signal processing because of its wide range of application scenarios. As the front end of a speech signal processing system, it is widely used in video conferencing, hearing aids, smart homes and smart vehicles. Yet common speech enhancement algorithms based on spectrum mapping or masking have shortcomings. First, multi-to-one mask regression relies on the statistical information learned by the neural network and fails to capture the inter-frame two-dimensional information. Second, spectrum mapping captures two-dimensional information well, but, compared with a mask, a spectrum lacks some artificial prior knowledge. Considering these two points, this thesis proposes a speech enhancement method based on mask mapping and two single-channel speech enhancement algorithms built on this modeling method: the mask-mapping-based residual dense network (MM-RDN) and the mask-mapping-based hybrid dilated convolutional network (MM-HDCN).

To adapt the mapping network to the texture-rich two-dimensional spectrum, we build on the U-net structure and introduce hybrid dilated convolution into a convolutional encoder-decoder (CED), which maximizes the receptive field of the network, eliminates the gridding effect and reduces the number of model parameters. Simulation results show that the mask-mapping framework effectively enhances speech in both known and unknown scenes across multiple dimensions, and that it outperforms multi-to-one mask regression and spectrum mapping. MM-HDCN is also shown to be robust, lightweight and able to generalize.

To make full use of feature maps, the residual dense block (RDB) is used to improve the fitting ability of the neural network. An RDB forms a contiguous memory mechanism through densely connected layers, local feature fusion and local residual learning, and it also stabilizes the training process. The proposed MM-RDN likewise takes log-power spectra (LPS) as the input feature and the ideal ratio mask (IRM) as the training target. Simulations show that increasing the window length has a positive effect on mask-mapping-based speech enhancement, and that MM-RDN effectively exploits both the two-dimensional information of LPS and the artificial prior information of the IRM. Compared with MM-HDCN and other methods, MM-RDN significantly improves the evaluation metrics and better enhances signal quality, perceptual quality and speech intelligibility.

Comparing the two proposed algorithms, MM-RDN has better overall performance, while MM-HDCN has a lighter structure. Both show good robustness and generalization in speech enhancement and surpass the existing algorithms.
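
The claim that hybrid dilated convolution enlarges the receptive field while eliminating the gridding effect can be illustrated with a small sketch. It is a one-dimensional simplification with 3-tap kernels, and the dilation rates (1, 2, 5) and (2, 2, 2) are assumptions chosen for illustration, not the rates used in MM-HDCN. The function names are hypothetical.

```python
from itertools import product


def contributing_offsets(dilations, kernel_size=3):
    """Input offsets that reach one output sample of stacked 1-D dilated convs."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Tap positions of one dilated kernel, relative to its centre.
        taps = [k * d for k in range(-half, half + 1)]
        offsets = {o + t for o, t in product(offsets, taps)}
    return sorted(offsets)


def receptive_field(dilations, kernel_size=3):
    """Span (in samples) covered by the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def has_gridding(dilations, kernel_size=3):
    """True if some samples inside the receptive field are never used."""
    used = contributing_offsets(dilations, kernel_size)
    return len(used) < receptive_field(dilations, kernel_size)


if __name__ == "__main__":
    for rates in [(1, 1, 1), (2, 2, 2), (1, 2, 5)]:
        print(rates, receptive_field(rates), has_gridding(rates))
```

Under these assumptions, three layers with a constant rate of 2 span 13 samples but touch only the even offsets, which is the gridding pattern. The mixed rates (1, 2, 5) span 17 samples with no holes, using the same number of weights per layer as an undilated 3-tap kernel.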
Keywords/Search Tags: Deep Learning, Speech Enhancement, Mask Mapping, Convolutional Neural Network