Research On Improved ME-MGCRN Speech Enhancement Algorithm Based On Low Signal-to-noise Ratio Case

Posted on: 2024-07-28 | Degree: Master | Type: Thesis | Country: China | Candidate: Z X Fan | GTID: 2558306920453864 | Subject: Communication Engineering (including broadband network, mobile communication, etc.) (Professional Degree)

Abstract/Summary:

Speech signal processing is increasingly integrated with research on intelligent computing and intelligent robotics, and has become an important branch of intelligent information technology. In real-world settings, interactive speech systems are exposed to many kinds of noise that can significantly degrade their performance, so speech enhancement is widely applied in speech systems. Compared with traditional algorithms, deep neural network (DNN)-based speech enhancement can model nonlinear relationships more completely, which is why DNN methods are now more widely used. This thesis improves the empirical mode decomposition (EMD) method and the convolutional recurrent neural network, combining traditional and deep-learning methods to enhance noisy speech at low signal-to-noise ratios (SNRs). The main work is as follows.

First, at low SNR, traditional neural networks extract speech features inadequately and their enhancement effect is limited. To address this, Gated Linear Units (GLU) are combined with a Convolutional Recurrent Neural Network (CRN) to improve speech feature extraction. Because the EMD algorithm analyses the full frequency band of arbitrary nonlinear signals well, this thesis proposes the Adaptive Mean Median-Empirical Mode Decomposition-Multilayer Gated Convolutional Recurrent Neural Network (ME-MGCRN) speech enhancement model. An improved EMD algorithm that incorporates noise correlation decomposes the preprocessed noisy speech into low-dimensional and high-dimensional signal features. Using the LibriSpeech ASR dataset, the performance of the adaptive mean-median EMD algorithm is analysed, and the ME-MGCRN model is compared with the baseline model and conventional models in terms of perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and other evaluation metrics. The results show that ME-MGCRN improves PESQ and STOI over both the baseline and the traditional models, with the best enhancement obtained when the Huber loss function is used. At an SNR of -5 dB, across different noise types, the proposed method outperforms GCRN by at least 0.02 in PESQ and 1.1% in STOI, demonstrating the denoising effectiveness of ME-MGCRN.

Second, current neural networks do not make full use of information at different dimensions in the time domain. To address this and further improve enhancement performance, the MGCRN is developed further: a Temporal Convolutional Network (TCN) processes the low-level features produced by the CNN, and a Feature Fusion Module (FFM) fuses features of different dimensions, exploiting the FFM's sensitivity to such features. On this basis, the Adaptive Mean Median-Empirical Mode Decomposition-Multilayer Gated Feature Fusion Module Convolutional Recurrent Neural Network (ME-MGFCRN) model is proposed. The experiments use the LibriSpeech ASR dataset to analyse the performance of the TCN module, examine the benefits of the feature fusion module, and compare the models in terms of PESQ and frequency-weighted segmental signal-to-noise ratio (fwSegSNR); the model's generalisation to different noise types is also discussed. The experiments show that ME-MGFCRN performs well on both subjective and objective evaluation metrics, with the best results obtained when W-SDR is used as the loss function. At an SNR of -5 dB and across different noise types, ME-MGFCRN improves fwSegSNR by at least 0.86 dB and PESQ by at least 0.02 over the other compared models. In the subjective evaluation, its scores exceed 2.75 for the different noise types at SNR = -4 dB, indicating good generalisation.

The algorithm is applicable to human-computer interaction scenarios such as in-vehicle central-control voice systems and smart homes, and can contribute to the development of speech enhancement.

Keywords/Search Tags: low signal-to-noise ratio, speech enhancement, EMD, GCRN, ME-MGCRN, TCN, feature fusion module, ME-MGFCRN
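The evaluations above are run at very low SNRs such as -5 dB, where the noise carries about 3.16 times the power of the speech. A noisy test signal at a target SNR is commonly built by scaling the noise so that 10·log10(P_speech / P_noise) equals the target. The thesis abstract does not describe its mixing procedure, so the following Python sketch is only an assumption of how such data could be prepared; `mix_at_snr` and `snr_db` are hypothetical helpers.

```python
import math
from typing import Sequence


def power(signal: Sequence[float]) -> float:
    """Mean power of a signal (average of squared samples)."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], target_snr_db: float) -> list[float]:
    """Scale `noise` so that clean + scaled noise has the requested SNR in dB.

    Hypothetical helper: assumes equal-length inputs and a non-silent noise signal.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal is silent")
    # Solve 10*log10(p_clean / (gain**2 * p_noise)) = target_snr_db for the noise gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR of `noisy` relative to `clean`, treating their difference as noise."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    # Two synthetic tones at a 16 kHz sampling rate stand in for speech and noise.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [math.sin(2 * math.pi * 1234 * t / 16000 + 0.3) for t in range(16000)]
    noisy = mix_at_snr(clean, noise, -5.0)
    print(round(snr_db(clean, noisy), 6))  # prints -5.0
```

Scaling only the noise keeps the speech level fixed, so results at different SNRs share the same clean reference.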
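The first model pairs Gated Linear Units (GLU) with a CRN. In its standard form, a GLU splits its input into two halves a and b and returns a ⊙ σ(b), so the second half acts as a learned gate on the first; in gated convolutional layers the two halves come from two separate convolutions. The abstract does not say how the gates are wired into the MGCRN layers, so this pure-Python sketch only illustrates the gating operation itself. It follows the same half-split convention as PyTorch's `torch.nn.GLU`; the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(values: Sequence[float]) -> list[float]:
    """Gated linear unit: split the input into halves a, b and return a * sigmoid(b)."""
    if len(values) % 2 != 0:
        raise ValueError("GLU input length must be even")
    half = len(values) // 2
    a, b = values[:half], values[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> list[float]:
    """'Valid' 1-D cross-correlation (what deep-learning libraries call convolution)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def gated_conv1d(
    signal: Sequence[float],
    linear_kernel: Sequence[float],
    gate_kernel: Sequence[float],
) -> list[float]:
    """Gated convolution: one convolution produces values, a second produces gates."""
    values = conv1d_valid(signal, linear_kernel)
    gates = conv1d_valid(signal, gate_kernel)
    return [v * sigmoid(g) for v, g in zip(values, gates)]
```

For example, `glu([2.0, -4.0, 0.0, 0.0])` returns `[1.0, -2.0]`, because σ(0) = 0.5 halves both values. Since every gate lies in (0, 1), it scales its value channel rather than adding to it, which lets the network suppress noise-dominated features.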
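The abstract reports that the Huber loss gives the best results for ME-MGCRN and W-SDR for ME-MGFCRN, but it does not give their exact definitions. The sketch below assumes the standard piecewise Huber loss (the same form as PyTorch's `torch.nn.HuberLoss` with mean reduction) and the weighted SDR (wSDR) loss as formulated by Choi et al. (2019) for the Deep Complex U-Net. The function names are hypothetical, and the losses operate on plain time-domain sample lists.

```python
import math
from typing import Sequence


def huber_loss(target: Sequence[float], estimate: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for residuals up to `delta`, linear beyond it."""
    total = 0.0
    for t, e in zip(target, estimate):
        r = abs(e - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(target)


def _neg_cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """-<a, b> / (||a|| ||b||): the SDR term used inside the weighted SDR loss."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return -dot / (norm_a * norm_b + eps)


def wsdr_loss(
    noisy: Sequence[float],
    clean: Sequence[float],
    estimate: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Weighted SDR loss (assumed Choi et al. form): balances speech and noise terms."""
    noise = [x - y for x, y in zip(noisy, clean)]         # true noise z = x - y
    noise_est = [x - y for x, y in zip(noisy, estimate)]  # estimated noise x - y_hat
    energy_clean = sum(y * y for y in clean)
    energy_noise = sum(z * z for z in noise)
    # alpha = ||y||^2 / (||y||^2 + ||z||^2) weights the speech term by its energy share.
    alpha = energy_clean / (energy_clean + energy_noise + eps)
    return alpha * _neg_cosine(clean, estimate, eps) + (1 - alpha) * _neg_cosine(
        noise, noise_est, eps
    )
```

Both losses are minimised during training; the wSDR loss approaches -1 for a perfect estimate. Because α is the speech share of the total energy, at -5 dB it is only about 0.24, so the noise-reconstruction term receives most of the weight in exactly the low-SNR regime this thesis targets.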