Speech Separation Method And Implementation Based On Deep Learning

Posted on:2022-06-12

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wang

Full Text:PDF

GTID:2518306752497364

Subject:Computer technology

Abstract/Summary:

As the need for intelligent voice control grows, speech recognition becomes increasingly important. In complex real-world environments, however, speech is corrupted by various kinds of interference, which degrades recognition performance. Speech separation provides clean speech for voice interaction applications and has become an indispensable front-end stage of speech recognition. Traditional denoising methods based on signal processing do not exploit the fundamental features of speech, namely harmonics and pitch, so their performance is limited. Speech separation methods based on Computational Auditory Scene Analysis also face difficulties: for example, unvoiced segments are difficult to separate and the pitch feature is vulnerable to interference. With the development of deep learning in recent years, an increasing number of deep neural networks have been applied to speech separation. By virtue of their multi-layer non-linear structure, deep neural networks are good at modeling deep features in speech signals, so applying deep learning to speech separation is of considerable significance.

Although deep learning-based methods perform well in speech separation, the problem of phase separation remains. Current speech separation mainly separates the amplitude, and the phase of the mixed speech is reused when the separated speech is reconstructed. The phase of a single speaker's speech varies continuously and gradually, whereas this continuity is disrupted in the phase of mixed speech. Reusing the mixed phase can therefore degrade the sound quality of the separated speech and introduce crosstalk between the separated signals. To solve this problem, this thesis uses the Griffin-Lim algorithm to reconstruct the separated speech. The Griffin-Lim algorithm reconstructs the waveform from the model's amplitude estimates alone, yielding a continuous and gradually varying phase, and thus offers a solution to the phase separation problem. Reconstructing the separated speech in this way avoids the influence of the mixed-speech phase and improves the perceptual quality of the separated speech.

The main focus of this thesis is single-channel speech separation. First, the thesis designs a phase separation method based on the Griffin-Lim algorithm. The method takes Mel-frequency cepstral coefficients as input for mask estimation; after the amplitude spectrum is extracted and smoothed, the Griffin-Lim algorithm is used to reconstruct the separated speech. The experimental results show that the method effectively removes several kinds of noise from the mixed speech, and that the separated speech retains good intelligibility and perceptual quality even at low SNR and under sudden, bursty noise.

Second, the thesis presents a separation method for mixed male and female speech based on deep clustering and the Griffin-Lim algorithm. The method takes joint features of Mel-frequency cepstral coefficients and the amplitude spectrum as model input, maps the speech features and their context into a high-dimensional space, and clusters the high-dimensional features to obtain binary mask estimates. After the amplitude spectra are extracted, the separated speech is reconstructed with the Griffin-Lim algorithm. The experimental results show that the method effectively separates mixed speech from simultaneous male and female speakers and yields clear separated speech for each speaker.

Finally, the thesis designs and implements a preliminary speech separation system to validate the above methods. The system can separate a single speech signal from noise or separate male and female speech. After reading in the speech file to be separated and selecting the type of separation, the system outputs the separated speech, draws its waveform, and plays it back. Functional testing and safety testing of the system were conducted to confirm that it works as intended and avoids errors.
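
The Griffin-Lim reconstruction step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the frame length (512), hop size (128), Hann window, iteration count, and the helper names `stft`, `istft`, and `griffin_lim` are all assumptions chosen for illustration, since the abstract does not give these details. In the thesis's pipeline, `magnitude` would be the amplitude spectrum obtained after masking; here it can be any non-negative magnitude spectrogram.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a periodic Hann window.

    Returns a complex array of shape (n_frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT (windowed overlap-add, normalized)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes.
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`.

    Starts from random phase, then alternates between resynthesizing a
    signal and re-imposing the target magnitude on that signal's STFT.
    The mixture phase is never used.
    """
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        # Keep only the phase of the consistent spectrogram.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)


if __name__ == "__main__":
    # Toy input: two sinusoids standing in for a separated magnitude spectrum.
    sr = 8000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
    target = np.abs(stft(clean))
    estimate = griffin_lim(target, n_iter=30)
    mismatch = np.linalg.norm(np.abs(stft(estimate)) - target)
    print("spectral mismatch:", mismatch)
```

Because each iteration keeps the target magnitude fixed and updates only the phase, the mismatch between the target magnitude and the magnitude of the resynthesized signal does not increase from one iteration to the next, which is why the method can recover a smoothly varying phase without access to the mixture phase.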

Keywords/Search Tags:

speech separation, deep learning, Griffin-Lim signal estimation algorithm, long short-term memory

Related items

1	Speech Separation Technology Based On Deep Learning
2	Research On Speech Separation Algorithm Based On Traditional Method And Deep Learning Method
3	Research And Application Of The Short-term Memory Network For Adjusting Gate Length
4	Deep Learning For Spoken Term Detection
5	Research On FBMC Modulation Signal Detection Technology Based On Deep Learning
6	Speech Emotion Recognition Based On Deep Learning Technology
7	Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning
8	Complex Target RCS Estimation Based On Deep Learning
9	Research On Fall Detection Based On Long Short-term Memory Artificial Neural Network And Wrist Sensor
10	Research On Tibetan Lhasa Dialect Speech Recognition Based On Deep Learning