Study On Single-Channel Speech Enhancement Under Supervised Learning Condition

Posted on:2018-06-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:1318330512982670

Subject:Information and Communication Engineering

Abstract/Summary:

Language is a convenient and important tool for communication between humans and machines, conveyed through speech signals, which are the medium that carries the information. In real-life and production environments, however, these signals are almost always corrupted by interferers and noise. Because the structure of the speech is damaged and interfering components are added, the degraded signal not only lowers the perceived quality for human listeners but also reduces the intelligibility of the speech for both humans and machines. The purpose of speech enhancement is to recover the speech from the degraded signal by suppressing or removing interferers, noise, and even reverberant components, thereby improving perceived quality and intelligibility. According to the source of the degradation (reflections of the speech signal itself, speech from other speakers, or environmental noise), speech enhancement can be divided into three tasks: dereverberation, separation, and denoising. According to the number of channels or microphones used, enhancement algorithms can also be classified into single-channel and multi-channel methods. Single-channel algorithms are the foundation of speech enhancement and are often used together with multi-channel methods, so research on single-channel speech enhancement is both extensive and important.

In recent years, the rapid spread of the internet, especially the mobile internet and smart devices, has made speech data much easier to collect, providing rich material for training-based enhancement methods. Against this background, and given the limited ability of conventional speech enhancement methods to extend to and handle non-stationary noise, this dissertation focuses on single-channel speech enhancement under supervised learning conditions. Building on recently developed dictionary learning (DL) and sparse representation (SR) theories and methods, it proposes three novel algorithms covering single-channel speech denoising and joint speech dereverberation and denoising. The main contributions are as follows.

First, we propose a single-channel noise reduction algorithm based on discriminative joint dictionary learning (JDL). Enhancement algorithms based on DL and SR face two key issues: first, how to improve the discrimination between the learned speech and interferer dictionaries; second, how to keep the SR consistent between the training and enhancement stages. We propose a new discriminative JDL method whose cost function combines cross-SR error terms with inter-atom correlation terms between the different dictionaries, which promotes discrimination between the dictionaries and thereby improves the accuracy of the SR. In addition, by using mixed signals that combine clean speech and noise data, we unify the SR method used in the training and enhancement stages and so maintain consistency. Together, these two improvements yield a better speech enhancement algorithm.

Second, because typical single-channel noise reduction algorithms based on DL and SR use only the time-frequency (TF) magnitude spectrum of the signals, they do not fully exploit the relationship between the speech and the interferer in the noisy signal. This dissertation introduces the concepts of the ratio mask (RM) and the mask dictionary to exploit the implicit relationship that the RMs of speech and interferer in the TF magnitude spectrum of the noisy signal sum to approximately 1, a property that follows mainly from the sparsity of the speech and interferer components in the TF domain. We then propose a JDL algorithm that combines the TF magnitude spectrum with the RM information to learn a signal dictionary and a mask dictionary for both speech and interferer. Next, the composite dictionary formed from the signal and mask dictionaries is used to sparsely represent the noisy signal and the mixed RM. The resulting SR coefficients are then combined with the corresponding signal and mask dictionaries to build two different soft mask (SM) filters that perform the final noise reduction. Experimental results verify the effectiveness of the proposed algorithm.

Finally, this dissertation studies the single-channel speech dereverberation and noise reduction problem. It points out that previous single-channel algorithms for this problem, based on nonnegative matrix factorization or the nonnegative convolution model, suffer from convergence and computational complexity problems when the room impulse response (RIR) is long. To alleviate these problems, we propose a two-stage, model-based sparse representation method for single-channel speech dereverberation and noise reduction. The key idea is to decompose a long RIR into the convolution of two shorter RIRs and then to estimate the parameters of the two RIR models in a stepwise, sequential manner. Filters designed from the estimated parameters then enhance the speech in the noisy reverberant signal. In addition, drawing on the advantages of ensemble learning and fusion algorithms, this dissertation designs two different fusion algorithms to achieve better enhancement. Experimental results verify the effectiveness of the proposed algorithms.
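
The ratio mask relationship mentioned in the abstract can be illustrated with a small numerical sketch. This is not the dissertation's algorithm. It assumes that the RM of each source is its TF magnitude divided by the magnitude of the mixture, and it uses random, sparse toy spectrograms in place of real speech and noise. The names `ratio_masks` and `sparse_complex`, the array shape, and the density value are all hypothetical choices made for illustration.

```python
import numpy as np

def ratio_masks(speech_spec, noise_spec, eps=1e-12):
    """Magnitude ratio masks of speech and noise relative to their mixture.

    Assumed definition (not taken from the dissertation):
    RM_source = |source| / |speech + noise|, computed per TF bin.
    """
    mix_mag = np.abs(speech_spec + noise_spec)
    rm_speech = np.abs(speech_spec) / (mix_mag + eps)
    rm_noise = np.abs(noise_spec) / (mix_mag + eps)
    return rm_speech, rm_noise

rng = np.random.default_rng(0)
shape = (64, 50)  # (frequency bins, frames); arbitrary toy size

def sparse_complex(density):
    """Toy complex spectrogram in which only a fraction of TF bins is non-zero."""
    mag = rng.random(shape) * (rng.random(shape) < density)
    phase = rng.uniform(0.0, 2.0 * np.pi, shape)
    return mag * np.exp(1j * phase)

speech = sparse_complex(0.15)  # stand-in for a sparse speech spectrogram
noise = sparse_complex(0.15)   # stand-in for a sparse interferer spectrogram

rm_s, rm_n = ratio_masks(speech, noise)
mask_sum = rm_s + rm_n

# Bins where the mixture has energy, and bins where exactly one source is active.
active = np.abs(speech + noise) > 1e-6
single_source = active & ((np.abs(speech) == 0) ^ (np.abs(noise) == 0))
frac_single = single_source.sum() / active.sum()

print("fraction of active bins with a single source:", round(float(frac_single), 3))
print("median mask sum over active bins:", float(np.median(mask_sum[active])))
```

In bins where only one source is active the two masks sum to 1, and by the triangle inequality the sum can never fall below 1 in any active bin. The sparser the two sources are in the TF domain, the larger the share of bins in which the sum is close to 1, which is the property the mask dictionary is said to exploit.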

Keywords/Search Tags:

Single-channel speech enhancement, supervised learning condition, speech denoising, speech dereverberation and denoising, joint dictionary learning, nonnegative matrix factorization, nonnegative matrix convolution

Related items

1	Research On Single Channel Speech Enhancement Algorithm Based On Supervised Learning
2	Study On Speech Enhancement Based On NMF Algorithm
3	Speech Enhancement Based On Sparse Representation And Dictionary Learning
4	Speech Enhancement Using Nonnegative Matrix Factorization With The Constrained Speech Spectrum
5	Research On Sparse Representations And Deep Learning Based Supervised Speech Enhancement
6	Single Channel Speech Enhancement And Separation
7	Research On Two Methods Of Single Channel Speech Separation
8	Single Channel Speech Enhancement Based On Nonnegative Matrix Factorization
9	Research On The Improvement Of Speech Enhancement Algorithm Based On Sparse Representation And Dictionary Learning
10	Speech Denoising Using Joint Dictionary Learning And Sparse Representation