TSEGAN Speech Enhancement Based On Dynamic Convolution And Narrow-band Conformer Network

Posted on: 2024-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Lu
GTID: 2568307136492284
Subject: Electronic information
Abstract/Summary:
Speech enhancement (SE) is a central research topic in speech signal processing. Its aim is to suppress or eliminate the background noise in a speaker's speech without distorting the speech itself. An SE algorithm is therefore judged on two criteria: how well it suppresses or eliminates the background noise, and whether the enhanced speech retains high perceptual quality and intelligibility. In recent years, researchers have turned increasingly to deep learning, and SE methods based on deep neural networks have become the main research direction in the field, producing many high-performance models. Taking TSEGAN as the baseline model, this thesis studies how to improve perceptual quality and intelligibility and proposes a series of methods that improve the performance of the model.

First, to improve the baseline model and generate higher-quality enhanced speech, this thesis introduces the Narrow-band Conformer network into the generator of TSEGAN and proposes NBC-TSEGAN. Because it combines the global modeling capability of the Transformer with the local modeling capability of the convolutional network, the multi-level Narrow-band Conformer network enables the generator to process the feature information of the speech signal more effectively, which improves the perceptual quality and intelligibility of the enhanced speech. Subjective and objective evaluations show that, compared with the baseline model, the proposed method increases the average STOI value by 4.92% and the average PESQ value by 2.85%, indicating improved perceptual quality and intelligibility. The average CSIG, CBAK, and COVL values increase by 4.85%, 3.28%, and 4.30% respectively, indicating that the proposed method effectively improves the overall auditory effect of the enhanced speech.

Second, building on NBC-TSEGAN, this thesis proposes DyConv-NBC-TSEGAN to further improve the quality of the enhanced speech. The model exploits the fact that dynamic convolution can improve model performance without adding much extra computation. It replaces the 2-D convolutions with dynamic convolutions, which improves the discriminator's ability to resist deception and the generator's generation and expression capability. Subjective and objective evaluations show that, compared with the baseline model, the proposed method increases the average STOI value by 6.56%, the average PESQ value by 4.63%, and the average CSIG, CBAK, and COVL values by 9.7%, 7.38%, and 5.73% respectively. Compared with NBC-TSEGAN, it increases the average STOI value by 1.96%, the average PESQ value by 1.73%, and the average CSIG, CBAK, and COVL values by 4.63%, 2.38%, and 1.37% respectively. These results indicate that the proposed method improves the intelligibility and overall auditory effect of the enhanced speech and effectively improves the enhancement performance of the model.

In summary, introducing the Narrow-band Conformer network into the baseline model improves the generator's generation and expression capabilities and, with them, the perceptual quality and intelligibility of the enhanced speech. To improve the quality of the enhanced speech further, this thesis proposes a TSEGAN based on the Narrow-band Conformer network and dynamic convolution, in which ordinary convolution is replaced with dynamic convolution. Dynamic convolution generates different convolution kernels over time, dynamically adjusting the parameters of each kernel, which effectively improves the performance of the model as well as the intelligibility and overall auditory effect of the enhanced speech.
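
The abstract does not say which dynamic convolution formulation the thesis uses. As an illustration only, the sketch below assumes one common formulation: K candidate kernels are mixed with attention weights computed from the input, so the effective kernel changes with the signal. It is written in 1-D and pure Python for brevity, whereas the thesis replaces 2-D convolutions. The function name `dynamic_conv1d` and the parameters `attn_weights` and `attn_bias` are hypothetical and do not come from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(x, kernels, attn_weights, attn_bias):
    """Illustrative 1-D dynamic convolution (assumed formulation).

    x            : input segment, list of floats
    kernels      : K candidate kernels, each an odd-length list of floats
    attn_weights : K weights of a hypothetical linear attention layer
    attn_bias    : K biases of that layer
    Returns the filtered segment and the K attention weights.
    """
    # 1. Summarise the input with global average pooling.
    pooled = sum(x) / len(x)
    # 2. Input-dependent attention over the K candidate kernels.
    scores = [w * pooled + b for w, b in zip(attn_weights, attn_bias)]
    pi = softmax(scores)
    # 3. Aggregate the candidates into one kernel for this input.
    size = len(kernels[0])
    mixed = [sum(p * k[i] for p, k in zip(pi, kernels)) for i in range(size)]
    # 4. Apply the aggregated kernel ("same" zero padding, cross-correlation
    #    as in most deep learning frameworks).
    pad = size // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = [
        sum(mixed[j] * padded[t + j] for j in range(size))
        for t in range(len(x))
    ]
    return out, pi
```

Under this assumption, calling `dynamic_conv1d` on successive segments of a signal yields a different aggregated kernel for each segment, which is one way to read "different convolution kernels over time". The extra cost over a static convolution is the pooling, the small attention layer, and the kernel mixing.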
Keywords/Search Tags:speech enhancement, TSEGAN, deep neural network, Narrow-band Conformer network, dynamic convolution