
Research And Design Of Speech Separation Algorithm Based On Deep Learning

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Li
Full Text: PDF
GTID: 2428330623468137
Subject: Software engineering
Abstract/Summary:
After decades of development, speech separation has made considerable progress, and the rise of deep learning has further accelerated it. This thesis studies speech separation in the single-channel case, that is, separating the target speech from mixed speech captured by a single microphone, and focuses on using deep learning techniques to improve speech separation algorithms.

An in-depth analysis of the modeling approaches of current speech separation algorithms reveals two main points for improvement. First, when separating human speech from noise, multi-layer perceptrons have powerful feature-extraction capabilities, but they generally model the speech data with fixed, limited windows. This not only increases the dimensionality of the input data but also ignores the temporal correlation of speech, so the feature information contained in the mixed speech signal is underutilized. Second, when separating mixed speech from two people talking at the same time, most current algorithms must track both speakers' features in the mixture simultaneously. Humans find it difficult to follow two speakers at once, and designing an effective separation model that departs from common human auditory cognition is likewise difficult.

This thesis studies the above problems and proposes corresponding solutions. The main contributions are as follows:

1. The modeling methods of speech separation algorithms are studied in depth, the structural characteristics of different neural networks are discussed, and a speech/noise separation algorithm based on a deep recurrent neural network is proposed. On top of a composite neural network, a feature-frame stitching strategy based on an attention mechanism is designed and implemented: the attention mechanism extracts contextual information, a weighted sum forms a context feature frame, and this frame is concatenated with the current input to form the input feature of the deep recurrent network. This method effectively captures the temporal correlation of the speech signal's context. In addition, a mask layer is added to the model to constrain the output of the deep recurrent network, which improves the accuracy of the estimated speech. Comparative experiments verify the effectiveness of the model and show that the algorithm is more robust under unknown noise than the baselines.

2. From the perspective of auditory perception theory, a flexible speaker-information extraction mechanism is designed, and a multi-speaker speech separation algorithm based on a multi-layer attention mechanism is proposed. The model introduces speaker information to generate voiceprint features and designs a multi-layer iterative mechanism; the attention mechanism computes the similarity between the voiceprint features and the mixed-speech features to obtain a cleaner separation of the target speech. Experimental results show that the proposed algorithm achieves higher separation performance than speaker-independent separation. Compared with related work, it improves on multiple speech metrics and performs better in more complex mixed-speech environments.
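The feature-frame stitching idea in contribution 1 can be illustrated with a toy sketch. This is not the thesis model; it is a minimal NumPy illustration assuming dot-product attention over a few preceding frames, whose weighted sum forms a context frame that is concatenated with the current frame.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def stitch_with_attention(frames, t, context=4):
    """Form an attention-weighted context frame from the `context`
    preceding frames and concatenate it with the current frame.
    A sketch of the attention-based frame-stitching strategy."""
    lo = max(0, t - context)
    ctx = frames[lo:t] if t > 0 else frames[t:t + 1]
    query = frames[t]
    scores = ctx @ query            # dot-product attention scores
    weights = softmax(scores)       # attention weights over the context
    context_frame = weights @ ctx   # weighted sum -> one context frame
    return np.concatenate([frames[t], context_frame])

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 8))   # 10 frames, 8-dim features each
x = stitch_with_attention(frames, t=5)
print(x.shape)  # (16,): current frame + attended context frame
```

Compared with stacking a fixed window of raw frames, the stitched input stays at twice the frame dimension regardless of context length, which is the dimensionality argument made above.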
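The voiceprint-conditioned attention in contribution 2 can likewise be sketched. The following is a hypothetical illustration, not the thesis architecture: it assumes a target-speaker embedding (voiceprint) and iteratively refines a soft mask over mixed-speech frames by similarity to that embedding, mimicking the multi-layer iterative mechanism described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def speaker_attention_mask(mixed, voiceprint, n_layers=3):
    """Iteratively refine a soft per-frame mask by attention-style
    similarity between mixed-speech features and the voiceprint.
    `n_layers` plays the role of the multi-layer iteration."""
    mask = np.ones(len(mixed))           # start with an all-pass mask
    for _ in range(n_layers):
        sims = mixed @ voiceprint        # similarity of each frame to speaker
        mask = sigmoid(sims * mask)      # refine using the previous estimate
    return mask

rng = np.random.default_rng(1)
mixed = rng.standard_normal((6, 4))      # 6 frames of mixed-speech features
vp = rng.standard_normal(4)              # assumed speaker voiceprint vector
m = speaker_attention_mask(mixed, vp)
print(m.shape)  # (6,): one soft mask value per frame
```

Conditioning on a single speaker's voiceprint sidesteps the need to track both speakers at once, which is the auditory-cognition motivation given above.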
Keywords/Search Tags:single channel, speech separation, deep neural network, attention mechanism, noise