Research And Implementation Of Single-channel Speech Enhancement Model Based On Deep Learning

Posted on: 2024-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zhu
GTID: 2558307127959259
Subject: Electronic information
Abstract/Summary:
With the arrival of the 5G era, applications such as smart homes, real-time communication, and online courses have become part of everyday life. In real-world scenes, complex and highly random noise and reverberation interfere with and corrupt the speech signal, which seriously degrades the performance of the intelligent terminal devices that depend on it. Speech enhancement, an important branch of speech signal processing, aims to clean and restore speech signals damaged by environmental interference in order to improve speech intelligibility and perceptual quality. Most traditional speech enhancement models are unsupervised. They rely on unrealistic assumptions about noise statistics, and their ability to suppress non-stationary noise is limited. In recent years, speech enhancement algorithms based on deep neural networks have developed rapidly. Building on previous research on deep speech enhancement algorithms, this thesis studies the time-domain Multi-view Attention Network (MANNER). To address the slow training, high memory usage, and limited enhancement quality of the MANNER network, a series of improvements is proposed. The main work is as follows.

First, because the large number of parameters in the MANNER model makes computation expensive and fails to meet lightweight requirements, flash attention is used to improve the model. To further improve enhancement performance and counter the loss of local speech detail during decoding, information interaction is established between global and local features, a fuse block module is designed to couple the two, and neighborhood attention replaces local attention to further optimize the local feature representation. Experiments show that, after 50 iterations, the FLOPs of the improved model are reduced by 32.76%, GPU memory usage is reduced by 66.51%, and forward and backward propagation parameters are reduced by 82.8% compared with the original model. Compared with the baseline model, the STOI score of the enhanced speech increases by 2.08% and the PESQ score increases by 6.32%.

Second, to prevent the deep network from focusing too heavily on noise removal while neglecting speech quality, this thesis introduces perceptual contrast stretching of target features into the MANNER network to emphasize the perception of those features. Because the original loss function is mismatched with the speech evaluation metrics, a new loss function is designed on top of a non-linear feature mapping module, using the normalized PESQ score as the label, so that minimizing the loss aligns better with the metrics and further improves enhancement performance. Experiments show that, after 50 iterations, the STOI score of the improved network is 2.2% higher than that of the baseline network and the PESQ score is 22.6% higher.

Finally, the improved network AP-MANNER is built and evaluated. The experiment was run for 300 iterations, by which point the model had converged. The STOI score of the speech enhanced by AP-MANNER was 94.79, which is 0.46% higher than the baseline network, and the PESQ score was 3.3576, which is 7.61% higher than the baseline network. Based on the proposed AP-MANNER network, a speech enhancement visualization system is also designed so that users can analyze the enhancement performance of the model more intuitively and conveniently.
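The abstract reports the final AP-MANNER scores together with percentage gains but does not state the baseline scores. The sketch below back-computes the baselines those figures would imply. It assumes the percentages are relative improvements over the baseline (improved = baseline × (1 + gain/100)); the abstract does not confirm this reading, and the STOI gain in particular could instead be an absolute difference in percentage points. The function names are hypothetical and are not taken from the thesis.

```python
def implied_baseline(improved: float, gain_percent: float) -> float:
    """Baseline b such that improved == b * (1 + gain_percent / 100).

    Assumes gain_percent is a RELATIVE improvement, which the abstract
    does not state explicitly.
    """
    return improved / (1.0 + gain_percent / 100.0)


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Final AP-MANNER figures as quoted in the abstract (300 iterations).
stoi_baseline = implied_baseline(94.79, 0.46)
pesq_baseline = implied_baseline(3.3576, 7.61)

print(f"implied baseline STOI: {stoi_baseline:.2f}")
print(f"implied baseline PESQ: {pesq_baseline:.4f}")
```

These implied values are derived numbers, not results reported in the thesis; the full text would be needed to check them against the actual baseline scores.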
Keywords: Speech enhancement, MANNER network, attention mechanism, feature coupling, perceptual contrast stretching, nonlinear feature loss function