Research on Single-Channel Speech Separation Technology Based on Deep Neural Networks

Posted on: 2020-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Feng
Full Text: PDF
GTID: 2428330575453370
Subject: Applied Mathematics
Abstract/Summary:
Speech is the most important means of human communication and has a wide range of real-world applications, but during communication it is easily degraded by the surrounding environment, which lowers speech quality. Speech separation technology is therefore particularly important. Speech separation covers both the separation of speech from noise and the separation of one speaker's voice from another's, and most everyday application scenarios involve only a single microphone. The main research work in this thesis is the single-channel separation of speech from noise.

In recent years, deep learning has been widely used in speech signal processing, and deep neural networks (DNNs) have shown strong advantages in speech separation. Hierarchical nonlinear processing gives DNNs a powerful capacity for representation learning, but because speech is a non-stationary, time-varying signal, speech separated by a DNN still suffers from residual noise interference and inaccurate noise estimation, which degrade the quality of the speech signal. To address the noise interference in conventional DNN speech separation, two improved methods are proposed:

(1) In the DNN pre-training phase, the network is pre-trained with a restricted Boltzmann machine. Training then minimizes the error between the speech features output by the network and the clean speech features, and the weights are updated by back-propagating this error. After a series of update iterations, the trained DNN speech separation model is obtained.

(2) The DNN is combined with spectral subtraction. Based on the principle that the energy of adjacent time-frequency units of a speech signal is continuous, the noise energy of each time-frequency unit in the corresponding time-frequency block is estimated by spectral subtraction. The estimated noise energy spectrum is then subtracted from the power spectrum of the mixed speech signal to separate the speech from the noise.

The quality of the training target used for speech separation is related to the quality of the resynthesized target speech. The ideal binary mask and the ideal ratio mask are used as the training targets, and the experiments are run with the same experimental parameter settings and the same data set. Two experiments are completed in this thesis, each using a DNN on the same data selected from the standard TIMIT speech database. The experimental results show that Method 1 significantly reduces the error between the output speech and the clean speech, and significantly improves the intelligibility and similarity coefficients of the separated speech. Method 2 significantly improves the intelligibility and signal-to-noise ratio of the separated speech, and the separated speech signal is closer to the clean speech signal.
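The two training targets and the spectral subtraction step named in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions of the ideal binary mask and ideal ratio mask together with classical power spectral subtraction, and it is not the thesis's own implementation. The function names, the threshold `lc_db`, the exponent `beta`, the spectral `floor`, and the choice to estimate noise by averaging a few leading frames are all assumptions made for illustration; in particular, the thesis estimates noise per time-frequency block using the continuity of adjacent units, which is not reproduced here. Power spectra are plain nested lists indexed as `[frame][frequency_bin]`.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1.0 for T-F units whose local SNR exceeds lc_db (in dB), else 0.0."""
    return [
        [1.0 if 10.0 * math.log10((s + EPS) / (n + EPS)) > lc_db else 0.0
         for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_power, noise_power)
    ]


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Return (S / (S + N)) ** beta per T-F unit; beta=0.5 is a common choice."""
    return [
        [(s / (s + n + EPS)) ** beta for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_power, noise_power)
    ]


def estimate_noise_power(mixture_power, n_noise_frames):
    """Average the first n_noise_frames frames per frequency bin.

    Assumption: the leading frames contain noise only (classical approach,
    not the block-wise estimate described in the abstract).
    """
    frames = mixture_power[:n_noise_frames]
    if not frames:
        raise ValueError("need at least one frame to estimate the noise")
    return [sum(bin_values) / len(frames) for bin_values in zip(*frames)]


def spectral_subtract(mixture_power, noise_estimate, floor=0.0):
    """Subtract the per-bin noise estimate from every frame of the mixture.

    Results are clamped from below at floor * mixture power so that the
    estimated speech power never becomes negative.
    """
    return [
        [max(m - n, floor * m) for m, n in zip(frame, noise_estimate)]
        for frame in mixture_power
    ]
```

In a mask-based system of the kind the abstract describes, a DNN would be trained to predict one of these masks from features of the mixed signal, and the predicted mask would be applied to the mixture spectrum before resynthesis. The masks above are "ideal" because they are computed from the clean speech and noise, which are available only when building training data.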
Keywords/Search Tags: Speech separation, Single channel, Deep neural network, Spectral subtraction, Training targets