
Research On Speech Separation And Tracking Algorithms Based On LSTM And Clustering Analysis

Posted on: 2020-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
Full Text: PDF
GTID: 2428330596994927
Subject: Instrument Science and Technology
Abstract/Summary:
Voice interaction technologies such as speech synthesis and automatic speech recognition (ASR) are now widely used in everyday life. In real environments, however, interference such as background noise, competing speakers, and reverberation reduces the audibility and intelligibility of the speaker's voice and thus degrades voice interaction in practice. Speech separation and tracking technology aims to recover a high-fidelity, clean speech signal of the target speaker from interfering speakers or other background noise. It can be applied to conference recording, public security criminal investigation and monitoring, and voice identity authentication in noisy environments, and it has broad application prospects and research value.

This thesis studies the theory of speech separation and related algorithms and describes the relevant algorithm modules, such as speech separation and voiceprint recognition, in detail. Two topics are studied in depth: speech separation in which a generative adversarial network (GAN) improves time-frequency masking, and speech tracking based on speaker recognition. The main work is as follows.

First, the principle of neural-network speech separation based on time-frequency masking is introduced. The advantages of long short-term memory (LSTM) networks in extracting temporal features of the speech signal are explained, and the shortcomings of current supervised speech separation are analyzed. A speech separation method based on a GAN is then adopted. In the speech generation stage, a recursive deduction algorithm and sparse coding are introduced to improve the generation of the time-frequency mask, and a discriminator classifies speech signals as real or generated, so that the generated signals continually approach the target speech signals and interference between the signal sources is reduced.

Second, a speech tracking method based on speaker recognition and clustering analysis is proposed. It relies only on audio information; that is, speaker recognition is used to track the target speaker's speech. For speaker recognition, the shortcomings of the Gaussian mixture model (GMM) when only a small corpus is available are analyzed, and the GMM-UBM (universal background model) speaker model is then used to model the target speaker and build the speaker recognition system. For speaker speech separation, the class permutation (category substitution) problem caused by running K-means clustering multiple times is analyzed, and it is verified that increasing the number of input speech time frames avoids this problem. The speaker speech separation method is then improved in two ways: buffering the centroids in K-means clustering and reducing the sampling rate improve real-time performance, while optimizing the loss function and introducing regularization terms into the embedding feature space improve separation quality.

Finally, simulation experiments on the above methods are carried out with the MIR-1K and TIMIT speech data sets. The results show that the GAN strongly suppresses introduced artifacts, as measured by the signal-to-artifacts ratio (SAR). The GMM-UBM model retains a high recognition rate on short test utterances, and optimizing the loss function and the clustering process of the speaker speech separation algorithm effectively improves both the real-time performance of the algorithm and the quality of speech separation.
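
The idea of buffering the centroids in K-means clustering can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the audio is processed in consecutive chunks, that each chunk yields a list of embedding vectors, and that the centroids found for one chunk are reused to initialize clustering of the next chunk so that cluster indices keep referring to the same speaker. The function names `kmeans` and `cluster_stream`, the toy 2-D embeddings, and the two-speaker setting are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd's K-means started from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_stream(chunks, n_speakers=2, seed=0):
    """Cluster each chunk, carrying centroids over between chunks.

    Assumption: reusing (buffering) the previous chunk's centroids as the
    next chunk's initialization keeps cluster index k tied to one speaker.
    """
    rng = random.Random(seed)
    buffered = None
    all_labels = []
    for chunk in chunks:
        # Only the first chunk is initialized randomly.
        init = buffered if buffered is not None else rng.sample(chunk, n_speakers)
        labels, buffered = kmeans(chunk, init)
        all_labels.append(labels)
    return all_labels


# Toy embeddings: speaker A lies near (0, 0), speaker B near (5, 5).
chunk_1 = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
chunk_2 = [[5.2, 5.0], [4.9, 5.1], [0.0, 0.1], [0.2, 0.0]]
labels_per_chunk = cluster_stream([chunk_1, chunk_2])
```

Because the second chunk starts from the buffered centroids rather than from a fresh random draw, the index assigned to each speaker in the first chunk carries over, even though the speakers appear in a different order. Starting from centroids that are already close to the solution also tends to need fewer iterations, which fits the real-time motivation stated in the abstract.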
Keywords/Search Tags: Speech separation, Speaker recognition, Voice tracking, Time-frequency masking, Category substitution