
Single Channel Speech Enhancement Based On Deep Neural Network

Posted on: 2021-05-18    Degree: Master    Type: Thesis
Country: China    Candidate: B Li    Full Text: PDF
GTID: 2428330614968306    Subject: Engineering
Abstract/Summary:
Speech enhancement is one of the most challenging tasks in speech signal processing. As intelligent terminal devices have entered everyday life, traditional speech enhancement methods can no longer meet practical performance requirements. Most traditional methods are unsupervised: they estimate the noise from non-speech segments and subtract that estimate from the speech segments. Real-world noise, however, is highly random, and the assumptions these methods rely on are often violated in practice. As a result, traditional methods suffer from residual musical noise, distortion of the speech signal, and poor suppression of non-stationary noise, which limits their use in real noisy environments.

In recent years, thanks to greater computing power and the availability of large datasets, deep neural networks (DNNs) have achieved strong results in image and speech processing, laying the foundation for their use in speech enhancement. DNN-based enhancement learns a nonlinear mapping from noisy to clean speech from a large corpus of paired training data. A plain fully connected DNN, however, does not capture temporal dependence well, whereas a convolutional neural network (CNN) preserves dependence along both the time and frequency dimensions through convolution. Building on CNNs, this thesis carries out a series of studies aimed at improving speech enhancement performance.

First, we consider the most common additive noise model and propose a CNN speech enhancement method based on the log-magnitude spectrum. The log-magnitude spectra of noisy and clean speech serve as the network input and training target, respectively. The network estimates the clean log-magnitude spectrum by predicting a logarithmic mask, which both exploits temporal information and learns the complex nonlinear relationship between input and output.

Second, the log-magnitude method ignores the influence of phase when reconstructing the time-domain signal. We recover the phase indirectly by processing the real and imaginary parts of the spectrum. Because it is difficult to train on the real and imaginary spectra directly, we propose compressing the real and imaginary masks. In addition, we introduce multi-input features and train the network in both single-task and multi-task settings.

Furthermore, applying the network directly can remove background noise effectively, but it damages the speech segments and causes unpleasant listening artifacts. We therefore propose a training method called Share Net that reduces speech distortion while suppressing noise.

Finally, we present a training and inference software framework in which training runs on a GPU server and inference runs on a Raspberry Pi hardware system.
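To illustrate the log-magnitude masking approach described above, the following is a minimal PyTorch sketch; the layer sizes, STFT parameters, and sigmoid mask activation are assumptions for illustration, since the abstract does not specify the thesis's exact architecture. The enhanced magnitude is recombined with the noisy phase, which is precisely the limitation that motivates the complex (real/imaginary) mask method in the second contribution.

    # Minimal sketch, assuming PyTorch and a hypothetical encoder-decoder CNN.
    import torch
    import torch.nn as nn

    class MaskCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.Sigmoid(),  # mask in (0, 1)
            )

        def forward(self, log_mag):               # log_mag: (batch, 1, freq, time)
            return self.decoder(self.encoder(log_mag))

    n_fft, hop = 512, 128
    model = MaskCNN()

    noisy = torch.randn(1, 16000)                 # 1 s of dummy noisy audio at 16 kHz
    window = torch.hann_window(n_fft)
    spec = torch.stft(noisy, n_fft, hop_length=hop, window=window, return_complex=True)
    mag, phase = spec.abs(), spec.angle()

    log_mag = torch.log(mag + 1e-8).unsqueeze(1)  # log-magnitude spectrum as network input
    mask = model(log_mag).squeeze(1)              # predicted mask on the magnitude spectrum

    enhanced_spec = mag * mask * torch.exp(1j * phase)   # reuse the noisy phase
    enhanced = torch.istft(enhanced_spec, n_fft, hop_length=hop, window=window)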
Keywords/Search Tags: Speech Enhancement, Convolutional Neural Network, Encoder-Decoder, Phase Correlation, Shared Network