
Research On Speech Signal Preprocessing Based On Deep Learning In Complex Environment

Posted on: 2019-10-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Gao
Full Text: PDF
GTID: 1368330551456961
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, the AI (Artificial Intelligence) craze triggered by deep learning has been changing the way people live. People are no longer satisfied with human-computer interaction through text or commands alone and look forward to voice interaction, a more convenient and faster way to communicate. Speech has become an indispensable information medium. In actual transmission, however, background noise and interfering human voices degrade speech quality: they not only reduce the intelligibility and perceived quality of the speech but also pose challenges for downstream applications such as speech recognition and speaker recognition. In complex application environments, speech signal preprocessing is therefore particularly important.

As the front end of speech applications, speech signal preprocessing can be subdivided into speech enhancement, which handles noise interference, and speech separation, which handles interference from human voices. Speech enhancement refers to suppressing noise and extracting the useful speech signal from a mixture when the speech is corrupted by noise. In general, the noise types considered in speech enhancement do not include interfering voice signals. Speech separation is a front-end processing technology for human voice interference; it aims to extract the target speaker's speech and remove the speech of other speakers, as in the "cocktail party problem".

Speech enhancement algorithms can be divided into traditional algorithms and deep learning-based algorithms. Traditional speech enhancement algorithms are mostly unsupervised and often need to make assumptions about the characteristics of the speech and noise signals and about the interaction between them. They handle stationary noise well but have difficulty with non-stationary
noise. Recently, deep learning has been applied successfully in many fields and has attracted growing attention in speech enhancement research. Early studies found that speech enhancement algorithms based on deep neural networks (DNN) achieve significant performance gains over traditional algorithms, especially under non-stationary noise. However, DNN-based supervised speech enhancement has generalization problems in practice, such as speech loss and low intelligibility, when faced with real noise scenes, differences in speaking style, and low SNR (Signal-to-Noise Ratio). To address these problems, this dissertation focuses on training data preparation, model fusion, and new model structure design to improve the capability of deep learning-based speech enhancement in complex real environments.

Firstly, within the existing DNN speech enhancement framework and based on an analysis of the training data at low SNR, a voice activity detection (VAD) algorithm is used to process the training data, yielding two DNN speech enhancement models with different emphases. Exploiting the complementarity between them, the test phase combines the two DNN enhancement models through VAD to improve speech enhancement performance at low SNR, so that the system not only removes noise but also retains the necessary target speech.

Secondly, to address the generalization of deep learning-based speech enhancement models, this dissertation proposes a new progressive speech enhancement framework. Progressive learning under this framework decomposes the speech enhancement problem into stages of increasing SNR, which makes the function of the network clear, in contrast to traditional "black box" neural network training. Further, densely connected progressive learning is developed to improve the learning ability of the
model so that deeper and better speech enhancement models can be trained. Progressive learning has been applied successfully to both DNN and long short-term memory (LSTM) network structures, improving the generalization ability of speech enhancement models in practical application scenarios.

Speech separation algorithms can likewise be divided into traditional algorithms and deep learning-based algorithms. Traditional speech separation is based on computational auditory scene analysis (CASA). CASA builds on the perceptual theory of auditory scene analysis and uses grouping cues such as pitch to track the speech of the same speaker. Deep learning-based speech separation can be subdivided into speaker-dependent and speaker-independent separation, and speaker-dependent separation achieves good separation performance. This dissertation studies two speaker-dependent scenarios: speech separation in noisy environments and speech separation when training data for the target speaker is lacking. Firstly, in noisy environments, the speaker-dependent separation model treats both noise and interfering voices as interference with the target speech and models them jointly with a neural network; the experiments show that the two kinds of interference are complementary. Secondly, for the scenario in which target speaker training data is insufficient, this dissertation proposes a two-stage speech separation scheme to address the data problem and verifies it experimentally on real CHiME-5 data.
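
The idea of decomposing enhancement into stages of increasing SNR can be illustrated with a small data-preparation sketch. This is not the dissertation's implementation: the function names, the 10 dB step, the number of stages, and the synthetic sine-plus-Gaussian signals are all assumptions chosen for illustration. The sketch only shows how intermediate training targets at progressively higher SNR might be built from one clean utterance and one noise recording; the networks themselves are not modelled.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, mixture):
    """SNR in dB of `mixture`, given the clean `speech` it contains."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10 * math.log10(power(speech) / power(residual))


def progressive_targets(speech, noise, input_snr_db, step_db=10.0, stages=2):
    """Build the noisy input and a list of targets of increasing SNR.

    Each intermediate target is the same speech mixed with the same noise,
    but `step_db` cleaner than the previous one (hypothetical step size).
    The final target is the clean speech itself.
    """
    noisy = mix_at_snr(speech, noise, input_snr_db)
    targets = [
        mix_at_snr(speech, noise, input_snr_db + step_db * k)
        for k in range(1, stages + 1)
    ]
    targets.append(list(speech))
    return noisy, targets


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for a clean utterance and a noise recording.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

    noisy, targets = progressive_targets(speech, noise, input_snr_db=-5.0)
    for stage, target in enumerate(targets[:-1], start=1):
        print(f"stage {stage} target SNR: {measure_snr_db(speech, target):.1f} dB")
    print("final target: clean speech")
```

In such a setup, each stage of a network would be trained to map its input toward the next, slightly cleaner target, which is one way to read the abstract's remark that the function of the network becomes clear.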
Keywords/Search Tags: speech signal preprocessing, deep learning, speech enhancement, speech separation, voice activity detection, low SNR, progressive learning, CHiME-5 challenge