Research On High Sampling Rate Speech Enhancement Model Based On Deep Neural Network

Posted on: 2022-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhang
GTID: 2518306572958569
Subject: Electronic Science and Technology
Abstract/Summary:
Speech enhancement is an important part of speech signal pre-processing. It is widely used in hearing aids, voice communication, human-computer interaction and other fields. The development of intelligent devices is driving up hardware specifications in consumer products, and microphones with high sampling rates are becoming a trend, so studying speech enhancement at high sampling rates is of great practical significance. This thesis studies single-channel speech enhancement at a sampling rate of 48 kHz and proposes a method that combines deep learning with traditional methods.

Among traditional speech enhancement methods, the minimum mean square error (MMSE) spectral amplitude estimator offers high performance and good robustness. Its effectiveness depends mainly on the accuracy of the a priori SNR and a posteriori SNR estimates. Because traditional methods have difficulty estimating the a priori SNR accurately for speech corrupted by non-stationary noise, this thesis uses the Deep Xi a priori SNR estimation framework in their place. Deep Xi uses the nonlinear mapping capability of a deep neural network to map the amplitude spectrum of noisy speech to the a priori SNR. The original Deep Xi framework performs this estimation with a long short-term memory (LSTM) network with residual connections, which suffers from poor parallelism and high resource consumption. This thesis instead proposes estimating the a priori SNR of 48 kHz speech signals with three network structures: a temporal convolutional network, a multi-branch temporal convolutional network, and a multi-head attention mechanism.

The MMSE short-time spectral amplitude estimator and the MMSE log-spectral amplitude estimator are commonly used to estimate the amplitude spectrum of clean speech, but neither takes the masking effect of the auditory system into account. This thesis therefore improves the amplitude spectrum estimator with a weighted Euclidean distortion measure, which accounts for the masking effect of the human auditory system. Under most conditions, the improved estimator achieves better speech enhancement performance than the short-time and logarithmic amplitude spectrum estimators, and it effectively reduces residual noise.

Speech and noise data sampled at 48 kHz are used to construct the training, validation, and test sets. Experimental results show that the proposed method clearly outperforms the traditional speech enhancement method. In terms of a priori SNR estimation accuracy, the multi-branch temporal convolutional network effectively enlarges the receptive field and performs clearly better than the temporal convolutional network. The multi-head attention mechanism effectively improves the accuracy of a priori SNR estimation and has a clear advantage over both the temporal convolutional network and the multi-branch temporal convolutional network. Among the amplitude estimators, the overall performance of the MMSE spectral estimator improved with the weighted Euclidean distortion measure is slightly better than that of the MMSE log-spectral estimator and the MMSE short-time spectral estimator.
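
The dependence of the MMSE amplitude estimator on the a priori and a posteriori SNR can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `bessel_i`, `mmse_stsa_gain` and `enhance_frame` are hypothetical, the a priori SNR values are assumed to come from some estimator (for example a neural network), and the a posteriori SNR is approximated here as the a priori SNR plus one, which is a simplifying assumption. The gain shown is the standard MMSE short-time spectral amplitude form; the weighted Euclidean variant studied in the thesis is not reproduced.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind I_order(x), via its power series.

    Sufficient for the moderate arguments used below; a library routine
    would normally be preferred.
    """
    half = x / 2.0
    return sum(
        half ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )


def mmse_stsa_gain(xi, gamma):
    """MMSE short-time spectral amplitude gain for one frequency bin.

    xi    : a priori SNR (linear scale)
    gamma : a posteriori SNR (linear scale, must be > 0)
    """
    v = xi / (1.0 + xi) * gamma
    if v > 40.0:
        # For large v the gain approaches the Wiener gain xi / (1 + xi);
        # using the asymptote avoids evaluating very large Bessel values.
        return xi / (1.0 + xi)
    return (
        (math.sqrt(math.pi) / 2.0)
        * (math.sqrt(v) / gamma)
        * math.exp(-v / 2.0)
        * ((1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0))
    )


def enhance_frame(noisy_magnitudes, xi_estimates):
    """Apply the gain to each bin of one noisy amplitude-spectrum frame.

    Assumption for illustration: gamma is approximated as xi + 1.
    """
    return [
        mmse_stsa_gain(xi, xi + 1.0) * mag
        for mag, xi in zip(noisy_magnitudes, xi_estimates)
    ]
```

Calling `enhance_frame([1.0, 1.0], [100.0, 0.01])` attenuates the low-SNR bin far more than the high-SNR bin, which is why errors in the a priori SNR estimate translate directly into residual noise or speech distortion.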
Keywords/Search Tags: high sampling rate speech enhancement, a priori SNR estimation, minimum mean square error estimator, temporal convolutional network, attention mechanism