
Research On Fully Convolutional Neural Network Based Speech Enhancement Algorithm In The Time Domain

Posted on: 2021-08-22 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Dong
GTID: 2518306110497274 | Subject: Software engineering
Abstract/Summary:
The purpose of speech enhancement is to recover the original clean speech, with complete content and correct semantics, from noisy speech. Speech enhancement is a key preprocessing module in human-computer interaction, speech communication, and similar systems, so the enhancement algorithm has a strong effect on overall system performance. As speech technology is used more widely, the requirements placed on speech enhancement algorithms keep rising. However, noise environments are diverse, and traditional time-domain speech enhancement based on fully convolutional neural networks (FCNs) uses time-domain features as training targets. As a result, these methods have difficulty separating noise from clean speech and generalize poorly across noise environments. To address this problem, this thesis studies and improves the traditional FCN-based time-domain speech enhancement algorithm. The main research contents are as follows:

(1) The signal-processing knowledge related to speech enhancement was summarized from the literature and reference books, and classic and leading-edge papers in the field were reviewed to understand the history and the likely mainstream development directions of speech enhancement technology. The core ideas and structure of fully convolutional neural networks were explored, and FCN-based speech enhancement was studied and its advantages and disadvantages analyzed.

(2) Traditional FCN-based time-domain speech enhancement uses a time-domain loss, which leaves distortion or residual noise in the enhanced speech. In addition, because the loss function does not take the harmonic structure of speech into account, the algorithm fails in low signal-to-noise ratio (SNR) environments. This thesis proposes a speech enhancement algorithm based on a frequency-domain harmonic loss. In this method, the clean speech is modeled with the harmonic noise model (HNM), and the resulting frequency-domain HNM component is used as the training target in the loss function. The FCN used for time-domain speech enhancement is then trained by minimizing this frequency-domain harmonic loss. Experimental results show that the improved algorithm effectively removes the noise between harmonics at low SNR and significantly improves speech quality and intelligibility.

(3) The goal of this part is a time-domain speech enhancement algorithm that denoises well under a wide range of noise conditions. For the time-domain algorithm based on the harmonic loss function (FCN-HR) proposed in this thesis, an enhancement model trained with a fixed speech residual coefficient does not adapt well enough to different noise environments. First, the parameters of FCN-HR are optimized to find the optimal parameter model for each SNR condition. These optimal parameter models are then combined with an SNR-aware classifier to form a time-domain speech enhancement algorithm based on SNR perception. Experimental results show that the proposed algorithm adapts well to different noise environments and achieves satisfactory denoising under various SNR conditions.
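The frequency-domain harmonic loss in (2) can be illustrated with a minimal Python sketch. It assumes the fundamental frequency `f0` of the clean frame is already known, and it replaces the thesis's HNM analysis with a crude stand-in: the target keeps only clean-speech DFT bins within a hypothetical `width_hz` of a multiple of `f0`. The function names, the 40 Hz width, and the plain mean-squared-error form are illustrative assumptions, not the thesis's exact formulation.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude of the one-sided DFT of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def harmonic_target(clean_frame, f0, sample_rate, width_hz=40.0):
    """Hypothetical HNM stand-in: keep clean bins near a harmonic of f0, zero the rest."""
    n = len(clean_frame)
    target = []
    for k, m in enumerate(dft_magnitude(clean_frame)):
        freq = k * sample_rate / n
        harmonic_index = round(freq / f0)
        near_harmonic = harmonic_index >= 1 and abs(freq - harmonic_index * f0) <= width_hz
        target.append(m if near_harmonic else 0.0)
    return target


def harmonic_loss(enhanced_frame, clean_frame, f0, sample_rate):
    """Mean squared error between the enhanced magnitude spectrum and the harmonic target."""
    est = dft_magnitude(enhanced_frame)
    tgt = harmonic_target(clean_frame, f0, sample_rate)
    return sum((e - t) ** 2 for e, t in zip(est, tgt)) / len(tgt)


# Demo: a 200 Hz voiced frame (3 harmonics) and a copy with a 300 Hz tone between harmonics.
sr, n, f0 = 8000, 400, 200.0
clean = [sum(math.sin(2 * math.pi * h * f0 * t / sr) / h for h in (1, 2, 3)) for t in range(n)]
noisy = [c + 0.5 * math.sin(2 * math.pi * 300.0 * t / sr) for t, c in enumerate(clean)]

print(harmonic_loss(clean, clean, f0, sr))  # near zero: output matches the harmonic target
print(harmonic_loss(noisy, clean, f0, sr))  # larger: inter-harmonic energy is penalized
```

Because bins between harmonics are zero in the target, any energy the network leaves there (the 300 Hz tone above) is penalized directly, which mirrors the abstract's claim that the harmonic loss targets noise between harmonics.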
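The SNR-aware scheme in (3) is, at its core, a dispatch step: estimate the input SNR, assign it to a class, and run the FCN-HR model tuned for that class. The minimal Python sketch below shows only that control flow. The class boundaries, the residual-coefficient values, the noise-only reference segment used by `estimate_snr_db`, and the threshold rule standing in for the thesis's learned SNR classifier are all hypothetical placeholders; `enhance` does not run a real network.

```python
import math
from dataclasses import dataclass


@dataclass
class EnhancementModel:
    """Hypothetical stand-in for an FCN-HR model trained with one residual coefficient."""
    name: str
    residual_coefficient: float  # placeholder value, not taken from the thesis

    def enhance(self, noisy):
        # Placeholder: a trained FCN-HR network would run here; the input is returned unchanged.
        return list(noisy)


# Hypothetical SNR classes as (upper bound in dB, label); boundaries are illustrative only.
SNR_CLASSES = [(0.0, "low"), (10.0, "mid"), (math.inf, "high")]

MODELS = {
    "low": EnhancementModel("fcn_hr_low", 0.2),
    "mid": EnhancementModel("fcn_hr_mid", 0.5),
    "high": EnhancementModel("fcn_hr_high", 0.8),
}


def estimate_snr_db(noisy, noise_only):
    """Crude power-subtraction SNR estimate, assuming a noise-only segment is available."""
    noisy_power = sum(x * x for x in noisy) / len(noisy)
    noise_power = max(sum(x * x for x in noise_only) / len(noise_only), 1e-12)
    speech_power = max(noisy_power - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)


def classify_snr(snr_db):
    """Threshold rule standing in for a learned SNR classifier."""
    for upper, label in SNR_CLASSES:
        if snr_db <= upper:
            return label
    return SNR_CLASSES[-1][1]


def snr_aware_enhance(noisy, noise_only):
    """Pick the model for the estimated SNR class and apply it; returns (label, output)."""
    label = classify_snr(estimate_snr_db(noisy, noise_only))
    return label, MODELS[label].enhance(noisy)


noise = [0.1, -0.1] * 50
label, out = snr_aware_enhance([1.0, -1.0] * 50, noise)
print(label)
```

Keeping one model per SNR class, rather than one model with a fixed coefficient, is what lets the selection step trade noise suppression against speech distortion differently at low and high SNR.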
Keywords/Search Tags: Speech Enhancement in Time Domain, Fully Convolutional Neural Network, Harmonic Noise Model, Harmonic Loss Function in Frequency Domain, Signal-to-Noise Ratio Classification