Study On Single Channel Speech Enhancement Algorithm Based On Deep Neural Network

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Li
GTID: 2428330611454121
Subject: Electronic and communication engineering
Abstract/Summary:
Information exchange plays a vital role in human development, and voice communication is one of its most commonly used forms. In everyday use, however, voice communication is susceptible to interference from background noise, room reverberation, and echo. This interference reduces speech quality, clarity, and intelligibility, which degrades user experience and comfort. To mitigate the effect of noise, a speech enhancement module must be added at the front end of the voice communication system. Traditional speech enhancement algorithms mainly assume that the noise is stationary, so they often handle non-stationary noise poorly and their enhancement performance is limited. In recent years, with the rapid development of computer technology and the maturing of deep learning, speech enhancement based on deep learning has attracted growing attention from researchers, and its performance and robustness are greatly improved over traditional techniques. How to design a training target that the network learns well, and an activation function that converges quickly and well, has become a research focus for deep-learning-based speech enhancement. Taking the deep neural network as its starting point, this thesis studies how the network's training target and activation function affect speech enhancement, and uses a vocal recognition system on the enhanced speech to verify the reliability, practicality, and stability of the proposed speech enhancement technique.

First, this thesis proposes an exponential compression algorithm that compresses and bounds the Gammatone-domain amplitude spectrum, which improves how well the neural network learns the training target and thereby improves speech enhancement. The Gammatone-domain amplitude spectrum does not rely on the unreasonable assumption that noise and speech are independent of each other, so it is closer to real conditions and can in theory improve enhancement performance. However, it has no upper bound, which makes it hard for a deep neural network to learn as a training target. The proposed algorithm therefore applies exponential compression to bound the Gammatone-domain amplitude spectrum and decompresses the network output when predicting the training target. Simulation results show that the algorithm clearly improves how well the network learns the training target and generally improves speech enhancement performance.

Second, to improve the speed and quality of network convergence without changing the network structure, this thesis proposes a locally linear, controllable Tanh activation function (Parametric Tanh, PTanh). PTanh builds on two observations: the ReLU activation function converges quickly, and the Tanh activation function converges well. Drawing on the idea of the Taylor series, it merges the advantages of the Tanh and ReLU activation functions. Experimental results show that the PTanh activation function improves both the convergence speed and the convergence quality of deep neural networks.

Finally, to verify the reliability and stability of the speech enhancement algorithm that uses the improved Gammatone-domain amplitude spectrum as the training target and PTanh as the activation function, this thesis designs a vocal spectrum recognition system that can be applied in practice. A singing recognition model is first designed and trained on clean speech; deep neural networks then perform speech enhancement on noisy speech; finally, the enhanced speech is passed to singing recognition. The recognition rates obtained verify the reliability and stability of the improved speech enhancement technique in a practical recognition system.
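
The compress-then-decompress step described for the Gammatone-domain amplitude spectrum can be illustrated with a minimal sketch. The abstract does not state the thesis's actual formulas, so everything here is an assumption made for illustration: the target is taken to be the ratio of clean to noisy magnitude per channel, the compression is taken to be `1 - exp(-alpha * r)`, and the names `amplitude_ratio`, `compress`, `decompress`, `alpha`, and `ceiling` are hypothetical. The Gammatone filterbank itself is omitted and the magnitudes are made-up numbers.

```python
import math

def amplitude_ratio(clean_mag, noisy_mag, eps=1e-8):
    """Clean/noisy magnitude ratio per channel: non-negative, unbounded above."""
    return [s / (y + eps) for s, y in zip(clean_mag, noisy_mag)]

def compress(ratio, alpha=1.0):
    """Assumed compression: map [0, inf) into [0, 1) via c = 1 - exp(-alpha * r)."""
    return [1.0 - math.exp(-alpha * r) for r in ratio]

def decompress(compressed, alpha=1.0, ceiling=1.0 - 1e-6):
    """Invert compress(); clip first so an output of exactly 1 stays finite."""
    restored = []
    for c in compressed:
        c = min(max(c, 0.0), ceiling)
        restored.append(-math.log(1.0 - c) / alpha)
    return restored

# Hypothetical magnitudes for one frame of four Gammatone channels.
clean = [0.2, 1.0, 3.0, 0.0]
noisy = [0.4, 1.0, 1.5, 0.5]

# Bounded target the network would be trained to predict.
target = compress(amplitude_ratio(clean, noisy))

# At prediction time the network output is decompressed, then applied
# to the noisy magnitudes to estimate the clean magnitudes.
restored = decompress(target)
enhanced = [r * y for r, y in zip(restored, noisy)]
```

In this sketch the bounded `target` stays inside the output range of a saturating output layer, which is the motivation the abstract gives for compressing an unbounded target; a larger `alpha` saturates sooner and so trades resolution at high ratios for resolution at low ones.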
Keywords/Search Tags: Speech enhancement, Deep neural network, Energy amplitude spectrum, PTanh activation function, Singing spectrum recognition