
Speech Enhancement Based On Representation Learning

Posted on: 2018-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: J W Wu
GTID: 2428330512494314
Subject: Circuits and Systems
Abstract/Summary:
Speech is one of the most important means of human-computer interaction. Intelligible speech is essential for successful human-computer interaction, and in particular for the recognition accuracy of speech recognition systems. Studying speech enhancement to improve the intelligibility of speech signals therefore has considerable theoretical and practical value, and speech enhancement remains an active research topic in speech signal processing. The key to speech enhancement is finding an effective representation of the speech signal. An effective representation is one that distinguishes clean speech from noisy speech and also separates the different components within the speech, so that the components of interest can be enhanced while noise and unneeded components are suppressed. This thesis studies speech enhancement based on representation learning from two perspectives: adaptive dictionary learning and deep neural networks. The main contents and contributions are as follows:

(1) A Bayesian adaptive dictionary method based on sparse representation is introduced into speech representation for the first time. Using Beta Process Factor Analysis (BPFA), dictionary learning, sparse coefficient estimation and noise variance estimation are integrated into a single joint Bayesian posterior estimation procedure. Because the parameters are described by probability distributions, the method overcomes a shortcoming of traditional dictionary learning methods, which depend heavily on parameter settings. Time-domain speech enhancement experiments were conducted on the NOIZEUS database, and the method's ability to learn the dictionary and the sparse representation adaptively is discussed. The results show that the method effectively removes environmental noise and improves perceived listening quality without requiring the noise variance to be estimated beforehand.

(2) Research on deep learning indicates that adaptive dictionary methods based on sparse representation amount to shallow networks, which can extract only low-level features of the signal, whereas high-level features are needed to make speech enhancement algorithms more robust. In addition, speech signals exhibit strong temporal correlation, which current adaptive dictionary methods have difficulty modeling effectively. To address this, a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network is used to learn the mapping between noisy speech features and clean speech features, making effective use of both the temporal correlation of the speech signal and its high-level semantic features. The features used are Mel-Frequency Cepstral Coefficients (MFCCs), and the experiments were conducted on a Chinese speech database. Speech recognition results in noisy environments show that the BLSTM-based speech enhancement method is robust to background noise.
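The BPFA model behind contribution (1) can be illustrated by sampling from its prior. The sketch below is a minimal, assumed finite-truncation version of the standard BPFA generative model (Gaussian atoms d_k, Beta-distributed usage probabilities pi_k, binary indicators z_ik, Gaussian weights s_ik and additive Gaussian noise). It only generates synthetic patches and does not perform the posterior inference used in the thesis. The function name `sample_bpfa`, the hyperparameter values, and the choice to fix the precisions `gamma_s` and `gamma_eps` (instead of giving them Gamma hyperpriors) are illustrative assumptions.

```python
import numpy as np


def sample_bpfa(n_patches, patch_len, n_atoms, a=1.0, b=1.0,
                gamma_s=1.0, gamma_eps=100.0, seed=0):
    """Draw synthetic patches from a truncated BPFA prior (hypothetical helper).

    x_i = D @ (z_i * s_i) + eps_i, with
      d_k   ~ N(0, (1/patch_len) I)    dictionary atoms (columns of D)
      pi_k  ~ Beta(a/K, b(K-1)/K)      atom usage probabilities
      z_ik  ~ Bernoulli(pi_k)          binary sparsity pattern
      s_i   ~ N(0, (1/gamma_s) I)      real-valued weights
      eps_i ~ N(0, (1/gamma_eps) I)    additive noise
    Requires n_atoms >= 2 so that the Beta parameters are positive.
    """
    rng = np.random.default_rng(seed)
    K = n_atoms
    D = rng.normal(0.0, np.sqrt(1.0 / patch_len), size=(patch_len, K))
    pi = rng.beta(a / K, b * (K - 1) / K, size=K)
    Z = rng.random((K, n_patches)) < pi[:, None]    # K x N boolean usage mask
    S = rng.normal(0.0, np.sqrt(1.0 / gamma_s), size=(K, n_patches))
    clean = D @ (Z * S)                              # noise-free patches
    noise = rng.normal(0.0, np.sqrt(1.0 / gamma_eps), size=clean.shape)
    return clean + noise, clean, D, Z, pi


X, clean, D, Z, pi = sample_bpfa(n_patches=500, patch_len=64, n_atoms=32)
print("average active atoms per patch:", Z.sum(axis=0).mean())
```

For enhancement the direction is reversed: given noisy speech patches, the joint posterior over D, Z, S and the noise precision is approximated (commonly by Gibbs sampling or variational inference), and the denoised patches are reconstructed as D(Z*S). Treating the noise precision as a random variable is what lets it be inferred from the data rather than set by hand.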
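Contribution (2) uses MFCCs as the network's input and target features. The following numpy sketch shows a conventional MFCC pipeline: pre-emphasis, Hamming-windowed framing, power spectrum, triangular mel filterbank, log compression and an orthonormal DCT-II. The abstract does not give the MFCC configuration, so the 16 kHz sampling rate, 25 ms frames with a 10 ms hop, 26 filters, 13 coefficients and the 0.97 pre-emphasis factor are assumed common defaults, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling slope, peak 1 at the center bin
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix; signal must be at least frame_len long."""
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Orthonormal DCT-II over the filter axis, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    basis *= np.sqrt(2.0 / n_filters)
    basis[0] *= np.sqrt(0.5)
    return log_e @ basis.T
```

For one second of 16 kHz audio these settings give 98 frames of 13 coefficients each; a noisy utterance and its clean counterpart would be processed the same way to form input and target sequences.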
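The BLSTM mapping in contribution (2) can be sketched as a forward pass: one LSTM reads the noisy feature sequence left to right, a second reads it right to left, and their hidden states are concatenated so that each frame's estimate draws on both past and future context. This is a minimal, untrained numpy illustration under assumed settings (one BLSTM layer, hidden size 64, packed gate order input/forget/candidate/output, a linear output layer); the abstract does not describe the thesis network's architecture or training setup, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(x, W, U, b):
    """Run one LSTM over x (T x d_in). W: d_in x 4H, U: H x 4H, b: 4H.
    Packed gate order: input, forget, candidate, output."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((x.shape[0], H))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g                  # cell state carries long-range context
        h = o * np.tanh(c)
        out[t] = h
    return out


def init_params(d_in, hidden, rng):
    scale = 1.0 / np.sqrt(hidden)
    return (rng.uniform(-scale, scale, (d_in, 4 * hidden)),
            rng.uniform(-scale, scale, (hidden, 4 * hidden)),
            np.zeros(4 * hidden))


def blstm_regressor(noisy_feats, hidden=64, seed=0):
    """Map a noisy feature sequence (T x d) to a clean-feature estimate (T x d).
    Weights are random here, so the output only demonstrates data flow and shapes."""
    rng = np.random.default_rng(seed)
    d = noisy_feats.shape[1]
    fwd = init_params(d, hidden, rng)
    bwd = init_params(d, hidden, rng)
    h_fwd = lstm_pass(noisy_feats, *fwd)
    h_bwd = lstm_pass(noisy_feats[::-1], *bwd)[::-1]   # re-align backward states to time t
    h = np.concatenate([h_fwd, h_bwd], axis=1)          # T x 2H: past and future context
    W_out = rng.normal(0.0, 0.1, (2 * hidden, d))
    return h @ W_out
```

In practice the weights would be learned by backpropagation through time, for example by minimising the mean squared error between the network output and the clean MFCCs of parallel clean/noisy utterances. Because of the backward pass, changing the last input frame alters the estimate for the first frame, which a unidirectional LSTM could not do.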
Keywords/Search Tags: Speech Enhancement, Representation Learning, Dictionary Learning, Recurrent Neural Network