
Research On Speech Enhancement Technology In Low Signal-to-Noise Ratio Environment

Posted on: 2022-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Xiao
Full Text: PDF
GTID: 2518306575469084
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech enhancement is an important topic in speech signal processing. Its main purpose is to improve the quality of speech and increase its intelligibility. As a front-end module, it has a wide range of applications in speech recognition systems, speech encoders, hearing aids, and other fields. In practical application scenarios, however, speech is inevitably affected by environmental noise and interfering human voices, and many efficient speech enhancement algorithms have therefore been proposed. Traditional unsupervised methods are represented by algorithms based on the minimum mean square error, while supervised methods are represented by algorithms based on deep neural networks. Owing to their promising performance, speech enhancement algorithms based on deep learning have attracted widespread attention in recent years. This thesis focuses on single-channel and multi-channel speech enhancement algorithms using supervised approaches. The research is divided into the following two parts:

1. For single-channel speech enhancement, an algorithm based on a joint weighted loss function is proposed to address the severe performance degradation of speech enhancement algorithms in low signal-to-noise ratio environments. First, a weighted loss function that represents the speech distortion error and the residual noise error is developed, in which a fixed weighting factor balances speech distortion against noise suppression. To perform the denoising task, a convolutional encoder-decoder long short-term memory (CED-LSTM) network is proposed, which combines the feature extraction ability of the convolutional encoder-decoder network with the ability of the long short-term memory network to exploit long-term context information. The optimal value of the weighting factor is determined through a large number of experiments. The results show that the proposed loss performs significantly better than the scale-invariant signal-to-distortion ratio loss and the mean square error loss in the short-time Fourier transform domain, and that it also generalizes well in low signal-to-noise ratio environments.

2. For multi-channel speech enhancement, the case in which human voice interference and background noise coexist is studied, and a multi-channel speech enhancement algorithm based on a sub-array configuration is proposed. First, the microphone array is divided into two sub-arrays, and beamforming is then performed to remove the human voice interference and background noise. Because beamforming alone cannot effectively suppress the unwanted signals, convolutional neural networks (CNNs) with shared weights are developed at each sub-array to further enhance the speech. The experimental results show that the performance indicators of the proposed algorithm at various signal-to-noise ratios are significantly improved compared with single-channel CNN and single-channel LSTM networks.
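
The abstract does not give the exact form of the joint weighted loss used in the single-channel part. As a rough illustration only, the sketch below assumes a common mask-based formulation: the speech distortion term measures how much the estimated mask alters the clean magnitude spectrum, the residual noise term measures how much noise the mask lets through, and a fixed factor trades one against the other. The function name `joint_weighted_loss`, the parameter `alpha`, and the use of magnitude spectra are assumptions for this example, not details taken from the thesis.

```python
import numpy as np


def joint_weighted_loss(mask, clean_mag, noise_mag, alpha=0.5):
    """Hypothetical weighted loss over time-frequency bins.

    mask      : estimated gain per bin, same shape as the spectra
    clean_mag : magnitude spectrum of the clean speech
    noise_mag : magnitude spectrum of the noise
    alpha     : fixed weighting factor in [0, 1] (assumed convention:
                larger alpha penalizes speech distortion more)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Speech distortion error: clean speech removed by the mask.
    speech_distortion = np.mean((clean_mag - mask * clean_mag) ** 2)

    # Residual noise error: noise that survives the mask.
    residual_noise = np.mean((mask * noise_mag) ** 2)

    return alpha * speech_distortion + (1.0 - alpha) * residual_noise


# Toy example with one frame and two frequency bins.
clean = np.array([[1.0, 2.0]])
noise = np.array([[2.0, 0.0]])

pass_all = joint_weighted_loss(np.ones_like(clean), clean, noise, alpha=0.5)
block_all = joint_weighted_loss(np.zeros_like(clean), clean, noise, alpha=0.5)
```

Under this assumed form, a mask of all ones leaves the speech undistorted but passes all the noise, while a mask of all zeros removes the noise together with the speech. The weighting factor decides which of the two errors dominates during training, which is why its value would need to be tuned experimentally.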
Keywords/Search Tags: Speech enhancement, deep learning, low signal-to-noise ratio, joint weighted loss function, sub-array