
Speech Separation Method And Implementation Based On Deep Learning

Posted on: 2022-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 2518306752497364
Subject: Computer technology
Abstract/Summary:
As the demand for intelligent voice control grows, speech recognition becomes increasingly important. In complex real-world environments, however, speech is corrupted by various kinds of interference, which degrades recognition performance. Speech separation provides clean speech for voice interaction applications and has become an indispensable front-end processing step for speech recognition. Traditional denoising methods based on signal processing do not exploit the fundamental features of speech, namely harmonics and pitch, so their performance is limited. Speech separation methods based on Computational Auditory Scene Analysis also face difficulties: for example, unvoiced segments are difficult to separate and the pitch feature is vulnerable to interference. With the development of deep learning in recent years, an increasing number of deep neural networks have been applied to speech separation. By virtue of their multi-layer non-linear structure, deep neural networks are good at modeling deep features of speech signals, so applying deep learning to speech separation is of considerable significance.

Although deep learning-based methods perform well in speech separation, the phase problem remains. Current speech separation mainly separates the amplitude, and the phase of the mixed speech is reused when the separated speech is reconstructed. The phase of a single speaker's voice varies continuously and gradually, whereas this continuity is disrupted in the phase of mixed speech. Reusing the mixture phase may therefore degrade the sound quality of the separated speech and introduce crosstalk between the separated signals. To solve this problem, this thesis uses the Griffin-Lim algorithm to reconstruct the separated speech. The Griffin-Lim algorithm reconstructs the waveform from the model's amplitude estimates, yields a continuous, gradually varying phase, and thus offers a solution to the phase separation problem. Reconstructing the separated speech in this way avoids the influence of the mixed-speech phase and improves the perceptual quality of the separated speech.

The main focus of this thesis is single-channel speech separation. The thesis designs a phase separation method based on the Griffin-Lim algorithm. The method takes Mel-frequency cepstral coefficients as input for mask estimation; after the amplitude spectrum is extracted and smoothed, the Griffin-Lim algorithm reconstructs the separated speech. Experimental results show that the method effectively removes various kinds of noise from the mixed speech, and that the separated speech retains good intelligibility and perceptual quality even at low SNR and under sudden (burst) noise.

The thesis also presents a separation method for mixed male and female speech based on deep clustering and the Griffin-Lim algorithm. The method feeds joint features, consisting of Mel-frequency cepstral coefficients and the amplitude spectrum, into the model, maps the speech features and their context into a high-dimensional space, and clusters the high-dimensional features to obtain binary mask estimates. After the amplitude spectra are extracted, the separated speech is reconstructed with the Griffin-Lim algorithm. Experimental results show that the method effectively separates the mixed speech of simultaneous male and female speakers and yields clear separated speech for each speaker.

Finally, the thesis designs and implements a preliminary speech separation system that validates the above methods. The system can separate a single speech signal from noise or separate male and female speech. After reading in the speech file to be separated and selecting the type of separation, the system outputs the separated speech, plots its waveform and plays it back. Functional testing and safety testing of the system show that it works as intended and avoids errors.
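As an illustration of the reconstruction step described in the abstract, the following is a minimal NumPy sketch of Griffin-Lim phase recovery from a masked magnitude spectrogram. It is not the thesis's implementation: the frame length, hop size, iteration count, random phase initialization and every function name here are assumptions chosen for the example, and the binary mask in the usage part is an ideal mask computed from known sources, standing in for the mask that the trained model would estimate.

```python
import numpy as np

N_FFT, HOP = 256, 64  # assumed frame length and hop size (75% overlap)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        signal[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    # The floor keeps the poorly covered edge samples from being amplified.
    return signal / np.maximum(norm, 1e-3)


def griffin_lim(magnitude, n_iter=30, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    Starts from random phase, then alternates between the time domain and
    the STFT domain, keeping the target magnitude and updating the phase.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase)
        phase = np.exp(1j * np.angle(stft(signal)))
    return istft(magnitude * phase)


def spectral_error(signal, magnitude):
    """Relative distance between the signal's STFT magnitude and the target."""
    estimate = np.abs(stft(signal))
    return np.linalg.norm(estimate - magnitude) / np.linalg.norm(magnitude)


if __name__ == "__main__":
    # Toy mixture: a 220 Hz tone (stand-in for speech) plus white noise.
    rng = np.random.default_rng(1)
    t = np.arange(8000) / 8000.0
    speech = np.sin(2 * np.pi * 220 * t)
    noise = 0.3 * rng.standard_normal(t.size)
    mixture = speech + noise

    # Ideal binary mask (assumption: replaces the model's mask estimate).
    mask = (np.abs(stft(speech)) > np.abs(stft(noise))).astype(float)
    target = mask * np.abs(stft(mixture))

    # Only the masked magnitude is used; the mixture phase is discarded.
    separated = griffin_lim(target, n_iter=30)
    print("relative spectral error:", spectral_error(separated, target))
```

Because only the masked magnitude enters `griffin_lim`, the reconstructed waveform never inherits the phase of the mixture, which is the property the abstract relies on. The masked magnitude is generally not the spectrogram of any real signal, so the iteration reduces the spectral error without driving it to zero.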
Keywords/Search Tags:speech separation, deep learning, Griffin-Lim signal estimation algorithm, long short-term memory