End-to-end Speech Enhancement Technique Based On Residual Neural Network

Posted on: 2021-11-03; Degree: Master; Type: Thesis
Country: China; Candidate: D J Wang; Full Text: PDF
GTID: 2518306470968949; Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of information technology, sound-based social applications such as live video and voice chat are becoming increasingly popular, and the quality of speech communication is receiving growing attention. During speech communication, speech signals are always subject to interference from various background noises, which degrades communication quality. Consequently, the performance of many speech processing systems is seriously affected. To improve both system performance and the quality of speech communication, speech enhancement techniques have been widely studied.

Residual neural networks, a relatively recent type of deep neural network, add residual modules built on skip connections to alleviate the vanishing-gradient problem caused by increasing network depth. They have already achieved good performance in image classification and object recognition, so this thesis applies them to speech enhancement. Most speech enhancement techniques use transform-domain speech features as the network input, which not only requires signal conversion but also usually ignores phase information. In this thesis, therefore, the raw waveform of the speech signal is used as the input of the residual neural network to implement end-to-end (i.e., waveform-in and waveform-out) speech enhancement. The main research contents are as follows:

1. An end-to-end single-channel speech enhancement method based on the residual neural network is proposed. Most existing speech enhancement methods operate in the transform domain and extract features from the speech signal through operations such as the Fourier transform. However, this process may destroy the high-frequency components and the time-domain correlation information of the speech signal. This thesis therefore processes the raw waveform directly in the time domain to achieve end-to-end single-channel speech enhancement. Experimental results show that the proposed method effectively suppresses noise and improves the quality of the enhanced speech.

2. Microphone arrays are widely used in intelligent devices. To improve the noise suppression of a microphone array, a multi-channel speech enhancement method is proposed that combines a Minimum Variance Distortionless Response (MVDR) beamformer with a postfilter. In this method, an MVDR beamformer based on complex ratio masking is first developed. The proposed end-to-end single-channel speech enhancement technique is then employed as its postfilter to further suppress residual noise, which effectively improves speech enhancement performance.

3. To verify the generalization ability and practicability of the proposed end-to-end single-channel speech enhancement technique based on the residual neural network, the technique is used to suppress noise collected from different everyday scenarios, and its performance is measured with several objective quality evaluation methods. In addition, the technique is applied to the Speex codec as a front-end speech enhancement module to improve the quality of speech communication.
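
As a rough illustration of the waveform-in, waveform-out idea described in the abstract, the sketch below applies a residual block directly to time-domain samples: the block output is the input plus a learned correction, y = x + F(x), where the addition is the skip connection. This is only a minimal sketch under stated assumptions. The abstract does not give the thesis's actual architecture, so the conv, ReLU, conv form of F, the function names (`conv1d_same`, `residual_block`, `enhance`), and the kernel values are all hypothetical, and the kernels here are untrained placeholders rather than learned weights.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x, kernel_a, kernel_b):
    """Compute y = x + F(x), with F = conv -> ReLU -> conv (hypothetical choice)."""
    fx = conv1d_same(relu(conv1d_same(x, kernel_a)), kernel_b)
    # Skip connection: the input waveform bypasses F and is added back.
    return [xi + fi for xi, fi in zip(x, fx)]


def enhance(noisy, blocks):
    """Pass a raw waveform through a stack of residual blocks.

    `blocks` is a list of (kernel_a, kernel_b) pairs; in a trained model these
    would be learned, here they are illustrative placeholders.
    """
    out = list(noisy)
    for kernel_a, kernel_b in blocks:
        out = residual_block(out, kernel_a, kernel_b)
    return out


if __name__ == "__main__":
    noisy = [0.0, 0.4, -0.3, 0.8, -0.1, 0.2]
    blocks = [([0.2, 0.5, 0.2], [0.1, 0.3, 0.1])] * 2
    enhanced = enhance(noisy, blocks)
    print(len(noisy), len(enhanced))  # both lengths are the same: samples in, samples out
```

Because each block only adds a correction to its input, a block whose kernels are all zero reduces to the identity mapping, which is one common way of explaining why skip connections make deeper stacks easier to train.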
Keywords/Search Tags:Speech enhancement, End-to-end model, Residual neural network, Postfilter, Speex codec