
Research On Speech Enhancement Optimization Based On Evaluation Net

Posted on: 2022-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2518306509454464
Subject: Computer technology
Abstract/Summary:
Speech enhancement aims to improve the quality and intelligibility of speech using signal processing techniques and related algorithms. As the front-end module of speech recognition, it plays an important role in speech interaction, teleconferencing, hearing assistance, military eavesdropping and other scenarios, and it has received extensive attention from academia and industry. Compared with traditional methods, deep-learning-based speech enhancement performs outstandingly under low signal-to-noise ratios and non-stationary noise. It still has shortcomings in some respects, however.

Speech enhancement methods built on deep learning usually use the mean square error (MSE) as the objective function for optimizing model parameters. However, several studies have shown that enhanced speech with a lower MSE does not necessarily obtain higher speech quality scores. This is because speech evaluation metrics are designed around human auditory perception, whereas MSE only measures the Euclidean distance between features of the enhanced speech and the clean speech. As a result, the current objective function cannot reflect how the human ear perceives speech, and the loss function is mismatched with the evaluation metrics. The evaluation metrics commonly used in speech enhancement are usually highly complex, non-differentiable functions through which gradients cannot be back-propagated. They therefore cannot be used directly as objective functions for optimizing speech enhancement models.

To solve these problems, we propose a speech enhancement optimization method based on an Evaluation Net.

First, the Evaluation Net is designed to simulate speech evaluation metrics. It is trained separately with the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores as training targets. This yields an Evaluation Net equivalent to each metric. The Evaluation Net not only reflects the metric scores correctly but also serves as an optimization network that guides the training of the speech enhancement network, thereby addressing the mismatch between the loss function and the evaluation metrics.

Second, the speech enhancement network is trained in series with the Evaluation Net, whose weights are kept fixed. In this way the evaluation metrics indirectly guide the training of the speech enhancement network, and the enhancement network achieves higher speech quality and intelligibility scores.

To verify the generality of the Evaluation Net approach, this thesis builds speech enhancement networks in both the frequency domain and the time domain and evaluates them experimentally. The experimental results show that the optimized speech enhancement models outperform speech enhancement networks trained with the MSE loss function on both PESQ and STOI.
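The two-stage scheme described in the abstract can be illustrated with a deliberately tiny, pure-Python sketch. It is not the thesis's implementation, and every name in it is hypothetical:

- The "enhancement network" is a single gain `w` applied to a toy noisy signal.
- The "true metric" `true_metric` is a made-up, rounded and clipped SNR score standing in for a non-differentiable metric such as PESQ.
- The "Evaluation Net" is a linear model on one SNR feature, fitted by least squares.

The two stages then work as follows:

1. Stage 1 (`fit_evaluation_net`) fits the surrogate to the non-differentiable score.
2. Stage 2 (`train_enhancer`) freezes the surrogate's weights and updates only `w` by gradient ascent on the predicted score. The gradient is obtained by the chain rule through the frozen surrogate.

```python
import math
import random


def make_signals(n=64, noise_std=0.5, seed=0):
    """Toy stand-in for speech: a clean sinusoid plus Gaussian noise."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * t / 16) for t in range(n)]
    noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
    return clean, noisy


def snr_db(enhanced, clean):
    """Signal-to-error ratio in dB: the single input feature of the toy Evaluation Net."""
    err = sum((e - c) ** 2 for e, c in zip(enhanced, clean))
    sig = sum(c * c for c in clean)
    return 10.0 * math.log10(sig / err)


def true_metric(enhanced, clean):
    """Hypothetical non-differentiable score (rounded and clipped), a stand-in for PESQ."""
    return min(4.5, max(1.0, round(1.0 + 0.5 * snr_db(enhanced, clean), 1)))


def evaluation_net(enhanced, clean, a, b):
    """Toy differentiable surrogate of true_metric: predicted score = a * SNR_dB + b."""
    return a * snr_db(enhanced, clean) + b


def fit_evaluation_net(clean, noisy, gains):
    """Stage 1: least-squares fit of (a, b) so the surrogate mimics true_metric."""
    xs, ys = [], []
    for g in gains:
        enhanced = [g * x for x in noisy]
        xs.append(snr_db(enhanced, clean))
        ys.append(true_metric(enhanced, clean))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def grad_wrt_gain(w, a, clean, noisy):
    """d(predicted score)/dw by the chain rule through the frozen surrogate.

    The offset b is constant, so it does not appear in the gradient.
    """
    resid = [w * x - c for x, c in zip(noisy, clean)]
    err = sum(r * r for r in resid)
    derr_dw = 2.0 * sum(r * x for r, x in zip(resid, noisy))
    dsnr_dw = -(10.0 / math.log(10.0)) * derr_dw / err  # d/dw of 10*log10(sig/err)
    return a * dsnr_dw


def train_enhancer(a, clean, noisy, w=1.0, lr=0.01, steps=500):
    """Stage 2: gradient ascent on the predicted score; only the enhancer gain w changes."""
    for _ in range(steps):
        w += lr * grad_wrt_gain(w, a, clean, noisy)
    return w


if __name__ == "__main__":
    clean, noisy = make_signals()
    a, b = fit_evaluation_net(clean, noisy, [0.2 + 0.05 * k for k in range(27)])
    w = train_enhancer(a, clean, noisy)
    enhanced = [w * x for x in noisy]
    print("gain:", round(w, 3))
    print("true score, noisy vs enhanced:", true_metric(noisy, clean), true_metric(enhanced, clean))
    print("surrogate score of enhanced:", round(evaluation_net(enhanced, clean, a, b), 3))
```

This toy surrogate depends only on the signal-to-error ratio, so its optimum coincides with the least-squares (MSE-optimal) gain. The sketch therefore shows only the mechanics of training through a frozen metric surrogate, not the perceptual advantage reported in the thesis. An Evaluation Net of the kind described above would instead take representations of the enhanced and clean speech as input and be trained on PESQ or STOI labels.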
Keywords/Search Tags: speech enhancement, evaluation net, speech quality assessment, hearing optimization