Research On Deep Learning Based Monaural Speech Enhancement

Posted on: 2022-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: H F Li
Full Text: PDF
GTID: 2518306737976509
Subject: Computer Science and Technology
Abstract/Summary:
Speech enhancement aims to improve the perceptual quality and intelligibility of speech by removing non-speech noise or reverberation from audio. It has a wide range of applications in communication systems, speech recognition, hearing aids, and other scenarios. Speech enhancement was originally a research topic in digital signal processing. With the development of deep learning, more and more studies have begun to apply deep learning methods to speech enhancement tasks. Benefiting from the powerful modeling capabilities of neural networks, current supervised neural-network-based speech enhancement methods hold an overwhelming advantage over traditional methods based on digital signal processing. However, deep learning based monaural speech enhancement can still be improved.

To further improve speech enhancement performance, this thesis proposes two methods: the relative loss, which is more in line with human perception, and the μ-law spectrum GAN.

The mean square error is commonly used as the objective function for supervised training of speech enhancement neural networks, but the way it measures error does not conform to human sensory response. For this reason, the relative loss is proposed, which introduces the reference magnitude of the spectrum into the error calculation. Experimental results show that the relative loss achieves higher scores than the mean square error on objective metrics such as perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI).

Moreover, inspired by generative adversarial networks and speech compression algorithms, the μ-law spectrum GAN is proposed for speech enhancement. It introduces a novel trainable spectrum compression layer into the discriminator to help it distinguish real from fake examples. This layer also constrains the generator's training so that the generator can better learn the difference between the generated spectrum and the clean spectrum. Experimental results show that the μ-law spectrum GAN surpasses current state-of-the-art monaural speech enhancement studies on various evaluation criteria.
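The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulas, so everything below is an assumption made for illustration: `relative_loss` divides the absolute error by the reference magnitude plus a small hypothetical constant `eps`, and `mu_law_compress` applies the standard μ-law companding curve with a fixed `mu`, whereas the thesis describes a trainable compression layer inside the discriminator.

```python
import math


def mu_law_compress(magnitudes, mu=255.0):
    """Standard mu-law curve for magnitudes normalized to [0, 1].

    Assumption: mu is fixed here; the thesis describes a trainable layer.
    """
    return [math.log1p(mu * m) / math.log1p(mu) for m in magnitudes]


def mse_loss(estimate, reference):
    """Mean square error between two magnitude spectra."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


def relative_loss(estimate, reference, eps=1e-3):
    """Hypothetical relative error: |e - r| scaled by the reference magnitude.

    This is an illustrative form only, not the thesis's definition.
    """
    return sum(abs(e - r) / (r + eps) for e, r in zip(estimate, reference)) / len(reference)


# One loud bin and one quiet bin in the clean (reference) spectrum.
reference = [1.0, 0.0625]
error_in_loud_bin = [0.875, 0.0625]   # off by 0.125 in the loud bin
error_in_quiet_bin = [1.0, 0.1875]    # off by 0.125 in the quiet bin

# MSE treats both estimates identically; the relative form penalizes
# the error in the quiet bin more heavily.
print(mse_loss(error_in_loud_bin, reference), mse_loss(error_in_quiet_bin, reference))
print(relative_loss(error_in_loud_bin, reference), relative_loss(error_in_quiet_bin, reference))

# mu-law compression expands low magnitudes relative to high ones.
print(mu_law_compress([0.0, 0.1, 1.0]))
```

In this sketch the same absolute error counts for more when the reference magnitude is small, which is one plausible reading of "introduces the reference magnitude of the spectrum into the error calculation". The μ-law curve similarly gives low-energy spectral bins more weight before any comparison is made.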
Keywords/Search Tags: Monaural speech enhancement, Relative loss, μ-law spectrum GAN, Deep learning, Digital signal processing